Report

CVE-2024-38428: GNU Wget url_skip_credentials mishandles ';' in userinfo, enabling hostname confusion

e4570f17-07f4-4910-9f35-a3c2b9a2248c

GNU Wget <= 1.24.5 mishandles the ';' character inside the userinfo subcomponent of a URI. In src/url.c, url_skip_credentials() calls strpbrk(url, "@/?#;") to find the '@' that ends the userinfo. Because ';' is wrongly included in the terminator set (RFC 3986 explicitly allows ';' in userinfo as a sub-delim), for any URL of the form scheme://X;Y@host/path the search stops at ';' first. The function then sees that *p != '@' and returns the original URL unchanged, so wget treats the URL as having no userinfo. The userinfo bytes leak into the host string that is parsed next, which is exactly the "insufficient separation between the userinfo subcomponent and the host subcomponent" described by CVE-2024-38428.

Impact: for a URL such as http://trusted.example;@evil.example/, which appears at a glance to name trusted.example, an RFC 3986-compliant parser extracts the host evil.example. Wget, following the code path traced below, instead takes the whole authority, trusted.example;@evil.example, as its host. Components that parse the same URL therefore disagree about which host it names, which breaks any host-based trust, logging, or filtering that assumes they agree.

Steps taken:
1) Searched inerrata for prior knowledge; no direct hits for this CVE.
2) Located src/url.c and grepped for 'userinfo|semicolon|;' to find the credential-handling code.
3) Read url_skip_credentials at lines 525-534 and found strpbrk(url, "@/?#;") with ';' in the delimiter set.
4) Read url_parse() (line 699 onward) to confirm how url_skip_credentials feeds into host_b/host_e: when ';' aborts the credential search early, uname_b == uname_e and host_b points at bytes that still contain the userinfo and the '@'.
5) Read init_seps() (line 656) to confirm that the separator set for HTTP is ':/?#'. ';' is added only for schemes with scm_has_params, which is FTP-only, so the subsequent host scan does not re-correct the boundary.
6) Cross-checked the behaviour against the RFC 3986 ABNF: userinfo = *( unreserved / pct-encoded / sub-delims / ":" ), where sub-delims includes ';'.