Two real findings from the security reviewer:
1. urllib auto-follows 3xx redirects via the default HTTPRedirectHandler,
so the previous code's hop loop never saw a redirect: urllib had already
followed it silently. Replaced with http.client + a manual hop loop.
Every hop re-runs _validate_url, so an open redirect to 127.0.0.1, an
RFC1918 address, or a cloud metadata endpoint is rejected at validation
on the second hop, before any connection is made to it.
2. DNS TOCTOU: _resolve() validated the addresses it got, but
urllib.request resolved the hostname again on connect, and that second
answer (e.g. from a DNS-rebinding server) was never checked. Now the
connection is pinned to the validated IP via PinnedHTTPConn /
PinnedHTTPSConn subclasses whose connect() calls
socket.create_connection((addr, port)) directly instead of resolving the
hostname. For HTTPS, TLS server_hostname is set to the original host, so
SNI + cert verification still work against the named host while the TCP
destination is the pinned IP.
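A minimal sketch of the two pinned connection classes from item 2, assuming they subclass http.client.HTTPConnection / HTTPSConnection and take the validated IP as an extra constructor argument (the `pinned_ip` parameter and `tls_context` attribute names are hypothetical). Only connect() changes: the Host header and TLS server_hostname keep the original name, while the socket goes to the pinned address.

```python
import http.client
import socket
import ssl
from typing import Optional


class PinnedHTTPConn(http.client.HTTPConnection):
    """Plain-HTTP connection whose TCP target is a pre-validated IP."""

    def __init__(self, host: str, pinned_ip: str, port: Optional[int] = None,
                 timeout: float = 10.0) -> None:
        # `host` still drives the Host header; `pinned_ip` is what we dial.
        super().__init__(host, port=port, timeout=timeout)
        self.pinned_ip = pinned_ip

    def connect(self) -> None:
        # No DNS lookup here: dial the address that already passed validation.
        self.sock = socket.create_connection((self.pinned_ip, self.port), self.timeout)


class PinnedHTTPSConn(http.client.HTTPSConnection):
    """HTTPS connection that dials a pinned IP but verifies the original hostname."""

    def __init__(self, host: str, pinned_ip: str, port: Optional[int] = None,
                 timeout: float = 10.0, context: Optional[ssl.SSLContext] = None) -> None:
        # The default context verifies the certificate chain and the hostname.
        self.tls_context = context or ssl.create_default_context()
        super().__init__(host, port=port, timeout=timeout, context=self.tls_context)
        self.pinned_ip = pinned_ip

    def connect(self) -> None:
        raw = socket.create_connection((self.pinned_ip, self.port), self.timeout)
        try:
            # server_hostname sets SNI and the name the certificate must match,
            # even though the TCP peer was chosen by IP.
            self.sock = self.tls_context.wrap_socket(raw, server_hostname=self.host)
        except Exception:
            raw.close()  # don't leak the TCP socket if the handshake fails
            raise
```

For example, `PinnedHTTPSConn("example.com", "203.0.113.10")` would dial 203.0.113.10 but still require a certificate valid for example.com; a mismatch fails the handshake with ssl.SSLCertVerificationError. Unlike the stock connect(), this sketch ignores proxy tunnelling via set_tunnel().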
Tests added: redirect-to-loopback short-circuits at validation;
too-many-redirects exhausts max_hops; 2xx returns body; non-2xx raises.
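A minimal sketch of the manual hop loop from item 1, shaped to show the four tested behaviors. The `fetch` signature, the injected `validate` and `open_conn` callables, `FetchError`, and the redirect-status set are hypothetical: in the commit, validation is _validate_url and connections come from the pinned classes, but injecting them here lets the sketch run without a network.

```python
import http.client
from typing import Callable
from urllib.parse import SplitResult, urljoin, urlsplit

REDIRECT_STATUSES = {301, 302, 303, 307, 308}


class FetchError(Exception):
    """Raised for blocked URLs, non-2xx responses, and over-long redirect chains."""


def fetch(url: str,
          validate: Callable[[str], str],
          open_conn: Callable[[SplitResult, str], http.client.HTTPConnection],
          max_hops: int = 5) -> bytes:
    """GET `url`, following redirects by hand so every hop is re-validated."""
    for _ in range(max_hops):
        # Validate before connecting: a redirect target gets the same checks as
        # the original URL and is rejected before any socket is opened to it.
        pinned_ip = validate(url)
        parts = urlsplit(url)
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn = open_conn(parts, pinned_ip)
        try:
            conn.request("GET", path)
            resp = conn.getresponse()
            if resp.status in REDIRECT_STATUSES:
                location = resp.getheader("Location")
                if not location:
                    raise FetchError(f"HTTP {resp.status} without Location")
                url = urljoin(url, location)  # handles relative redirects
                continue
            if 200 <= resp.status < 300:
                return resp.read()
            raise FetchError(f"HTTP {resp.status} for {url}")
        finally:
            conn.close()
    raise FetchError(f"gave up after {max_hops} hops")
```

Because validation runs at the top of each iteration, a 302 to http://127.0.0.1/ fails inside validate() on hop two, and a redirect chain longer than max_hops ends in an error instead of looping.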
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Mirrors lib/ingest/safe_fetch.js. Same scheme + IP-range checks and
VOID_INGEST_ALLOW_PRIVATE env gate. Used by sync.source_doc and any
future Python workers that fetch user-controlled URLs.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>