Accept CONNECT and absolute-form request-targets (RFC 9112 §3.2)#923
Open
CodyPubNub wants to merge 1 commit into
Open
Accept CONNECT and absolute-form request-targets (RFC 9112 §3.2)#923CodyPubNub wants to merge 1 commit into
CodyPubNub wants to merge 1 commit into
Conversation
As of http 1.4.1, PathAndQuery rejects any input that is not origin-form
or "*" (RFC 9112 §3.2), so set_raw_path() — which routed every target
through Uri::builder().path_and_query() — fails to parse CONNECT and
absolute-form request-targets. pingora-http declares http = "1", so a
routine dependency bump silently breaks every forward-proxy and CONNECT
request. (Before 1.4.1 the same call mis-parsed absolute-form, folding
scheme+authority into the path so route matching failed — broken either
way.)
- absolute-form ("http://host/path", §3.2.2): a server MUST accept it.
Parse as a full Uri, keeping scheme+authority available to callers,
and re-serialize origin-form on the wire (raw_path() derives from
path_and_query()). An empty path is "/" (§3.2.1); the no-path-with-
query shape ("http://host?q") is canonicalized to "/?q" on the stored
Uri so a later rebuild (e.g. the H1->H2 :path) stays valid.
- authority-form ("host:443", §3.2.3): accepted only for CONNECT and
only as uri-host ":" port — a scheme, path, query, missing port, or
userinfo are rejected. The target is re-serialized verbatim so the
existing CONNECT tunneling path (allow_connect_method_proxying) runs.
- raw_path() no longer unwraps path_and_query(), and set_raw_path()
clears any stale fallback on entry so a reused header cannot serialize
a previous target.
Origin-form and asterisk-form keep their fast path; malformed targets
still error. The non-UTF-8 lossy branch is untouched.
Verified by request-target unit tests and end-to-end re-serialization
through http_req_header_to_wire().
Resolves cloudflare#909
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Resolves #909. Alternative to #912.
What
set_raw_path()passed every request-target throughUri::builder().path_and_query(). As ofhttp1.4.1 that builder rejects anyinput that is not origin-form or
*(RFC 9112 §3.2), so two valid request formsnow fail to parse:
GET http://host/path, which clients send through a forwardproxy. RFC 9112 §3.2.2 requires a server to accept it.
CONNECT host:443, the only form CONNECT uses (RFC 9112 §3.2.3).pingora-httpdeclareshttp = "1", so a routine dependency bump silently turnsevery CONNECT and absolute-form request into an
InvalidHTTPHeadererror. Before1.4.1 the same builder mis-parsed absolute-form instead, folding the scheme and
authority into the path, so
uri.path()returned the whole URL and route matchingfailed. Broken either way.
This change detects the non-origin forms and handles each one:
Uri. Scheme and authority stay on theUrisothe request-target host is available to callers (§3.2.2: an origin server "MUST
ignore the received Host header field ... and instead use the host information of
the request-target"). The wire target comes out as origin-form because
http_req_header_to_wire()builds the request-line fromraw_path(), whichderives from
path_and_query(). An empty path is already/(§3.2.1). Theno-path-with-query shape (
http://host?q, whosepath_and_query()is?q) getsrewritten to
/?qon the storedUri, not just on the wire, so a later rebuildlike the H1 to H2
:pathinproxy_h2also sees valid origin-form.method == CONNECT) parses as aUriforauthority()access and re-serializes the original bytes via
raw_path(), so the existingCONNECT tunneling path (
allow_connect_method_proxying) can run. Acceptance isexactly
uri-host ":" port. A scheme, path, query, missing port, or userinfo areall rejected, since
http::Uriotherwise parsesexample.com(host-only) anduser@host:portas authorities that are not valid CONNECT targets.raw_path()no longerunwrap()spath_and_query(). AUriwithout one returnsthe CONNECT authority, otherwise
/.set_raw_path()clearsraw_path_fallbackon entry, so reusing a header (forexample mutating a CONNECT request into a normal one) can't serialize a stale
target. This also closes the same latent issue on a non-UTF-8 to UTF-8 path change.
Origin-form and asterisk-form keep their existing fast path. Malformed non-origin
targets (
http:///x,http://ho{st/x) still error.Request-target forms covered
uri.path()uri.authority()raw_path()(wire)/path?q(origin)/path/path?q*(asterisk)**http://host/path?q/pathhost/path?qhttp://host/host/http://host?q/host/?qhost:443(CONNECT)""host:443host:443http:///x,http://ho{st/xErrPositioning
This is how servers and reverse proxies already treat absolute-form. nginx strips
scheme and authority in
ngx_http_parse_request_line()and routes on the path, andRFC 9112 §3.2.2 requires any server to accept absolute-form. Same here: origin-form
on the wire, path-based routing, with the request-target authority still reachable
on the
Uri. It does not change request-forwarding to prefer the request-targetauthority over the
Hostheader; that is a separate behavior that would need itsown tests.
Compared with #912: for absolute-form without a path, that PR stores the raw target
in
raw_path_fallback, soraw_path()re-emitshttp://hoston the wire. RFC 9112§3.2.1 requires origin-form to send at least
/, so this normalizes to/instead.Both keep the non-UTF-8 lossy branch on
path_and_query()(see Out of scope); #912'sreview flagged that rerouting it through
parse::<Uri>()panics thepatched_http1tests.
Tests
cargo test -p pingora-http --libcovers request-target parsing across every form:?-only; with port; uppercase scheme; non-http schemeUri(http://host:8080/p?qgives authorityhost:8080, path/p)www.example.com:80example and IPv6[::1]:443Errcargo test -p pingora-core --lib test_absolute_form_and_connect_to_wirere-serializes each form through
http_req_header_to_wire()and asserts the exactrequest-line, using the RFC's own examples:
Reproducer for the parse failure this fixes:
Out of scope
path_and_query()call also rejects these on
http>= 1.4.1 (test_invalid_path/test_override_invalid_pathfail at the base commit, independent of this change).A separate manifestation of the same crate change. Those tests are
patched_http1-gated and not run by default CI.forwarding an OPTIONS request in absolute-form with an empty path sends
*. Thisnormalizes it to
/, which is correct for a server receiving it; theproxy-forwarding rewrite is left out.
Runtime cost
No new fields, and no change to origin-form, the common case (the
starts_with('/')fast path is unchanged). The CONNECT and absolute-form branches run only for
non-origin-form targets, and
raw_path()gains twomatcharms on an already-coldpath.
If maintainers would rather also fold in the non-UTF-8 lossy branch or the OPTIONS to
asterisk forwarding (see Out of scope), happy to add either in a follow-up commit here.