Skip to content

Accept CONNECT and absolute-form request-targets (RFC 9112 §3.2)#923

Open
CodyPubNub wants to merge 1 commit into
cloudflare:mainfrom
CodyPubNub:connect-absolute-form-parsing
Open

Accept CONNECT and absolute-form request-targets (RFC 9112 §3.2)#923
CodyPubNub wants to merge 1 commit into
cloudflare:mainfrom
CodyPubNub:connect-absolute-form-parsing

Conversation

@CodyPubNub

Copy link
Copy Markdown
Contributor

Resolves #909. Alternative to #912.

What

set_raw_path() passed every request-target through
Uri::builder().path_and_query(). As of http 1.4.1 that builder rejects any
input that is not origin-form or * (RFC 9112 §3.2), so two valid request forms
now fail to parse:

  • absolute-form, GET http://host/path, which clients send through a forward
    proxy. RFC 9112 §3.2.2 requires a server to accept it.
  • authority-form, CONNECT host:443, the only form CONNECT uses (RFC 9112 §3.2.3).

pingora-http declares http = "1", so a routine dependency bump silently turns
every CONNECT and absolute-form request into an InvalidHTTPHeader error. Before
1.4.1 the same builder mis-parsed absolute-form instead, folding the scheme and
authority into the path, so uri.path() returned the whole URL and route matching
failed. Broken either way.

This change detects the non-origin forms and handles each one:

  • absolute-form parses as a full Uri. Scheme and authority stay on the Uri so
    the request-target host is available to callers (§3.2.2: an origin server "MUST
    ignore the received Host header field ... and instead use the host information of
    the request-target"). The wire target comes out as origin-form because
    http_req_header_to_wire() builds the request-line from raw_path(), which
    derives from path_and_query(). An empty path is already / (§3.2.1). The
    no-path-with-query shape (http://host?q, whose path_and_query() is ?q) gets
    rewritten to /?q on the stored Uri, not just on the wire, so a later rebuild
    like the H1 to H2 :path in proxy_h2 also sees valid origin-form.
  • authority-form (when method == CONNECT) parses as a Uri for authority()
    access and re-serializes the original bytes via raw_path(), so the existing
    CONNECT tunneling path (allow_connect_method_proxying) can run. Acceptance is
    exactly uri-host ":" port. A scheme, path, query, missing port, or userinfo are
    all rejected, since http::Uri otherwise parses example.com (host-only) and
    user@host:port as authorities that are not valid CONNECT targets.
  • raw_path() no longer unwrap()s path_and_query(). A Uri without one returns
    the CONNECT authority, otherwise /.
  • set_raw_path() clears raw_path_fallback on entry, so reusing a header (for
    example mutating a CONNECT request into a normal one) can't serialize a stale
    target. This also closes the same latent issue on a non-UTF-8 to UTF-8 path change.

Origin-form and asterisk-form keep their existing fast path. Malformed non-origin
targets (http:///x, http://ho{st/x) still error.

Request-target forms covered

request-line target uri.path() uri.authority() raw_path() (wire)
/path?q (origin) /path /path?q
* (asterisk) * *
http://host/path?q /path host /path?q
http://host / host /
http://host?q / host /?q
host:443 (CONNECT) "" host:443 host:443
http:///x, http://ho{st/x Err

Positioning

This is how servers and reverse proxies already treat absolute-form. nginx strips
scheme and authority in ngx_http_parse_request_line() and routes on the path, and
RFC 9112 §3.2.2 requires any server to accept absolute-form. Same here: origin-form
on the wire, path-based routing, with the request-target authority still reachable
on the Uri. It does not change request-forwarding to prefer the request-target
authority over the Host header; that is a separate behavior that would need its
own tests.

Compared with #912: for absolute-form without a path, that PR stores the raw target
in raw_path_fallback, so raw_path() re-emits http://host on the wire. RFC 9112
§3.2.1 requires origin-form to send at least /, so this normalizes to / instead.
Both keep the non-UTF-8 lossy branch on path_and_query() (see Out of scope); #912's
review flagged that rerouting it through parse::<Uri>() panics the patched_http1
tests.

Tests

cargo test -p pingora-http --lib covers request-target parsing across every form:

  • origin-form and asterisk-form unchanged
  • absolute-form with path, without path, and ?-only; with port; uppercase scheme; non-http scheme
  • authority retained on the Uri (http://host:8080/p?q gives authority host:8080, path /p)
  • CONNECT authority-form, including the §3.2.3 www.example.com:80 example and IPv6 [::1]:443
  • CONNECT rejects origin/absolute/asterisk targets, a missing port, and userinfo
  • stale-fallback cleared on header reuse
  • malformed targets return Err

cargo test -p pingora-core --lib test_absolute_form_and_connect_to_wire
re-serializes each form through http_req_header_to_wire() and asserts the exact
request-line, using the RFC's own examples:

GET http://www.example.org/pub/WWW/TheProject.html HTTP/1.1  ->  GET /pub/WWW/TheProject.html HTTP/1.1
GET http://host?q=1 HTTP/1.1                                 ->  GET /?q=1 HTTP/1.1
GET http://host HTTP/1.1                                     ->  GET / HTTP/1.1
CONNECT www.example.com:80 HTTP/1.1                          ->  CONNECT www.example.com:80 HTTP/1.1

Reproducer for the parse failure this fixes:

// On http >= 1.4.1, before this change both return Err(InvalidHTTPHeader):
RequestHeader::build("CONNECT", b"example.com:443", None);
RequestHeader::build("GET", b"http://example.com/index.html", None);
// After: both Ok. CONNECT keeps "example.com:443", the GET resolves to path "/index.html".

Out of scope

  • Non-UTF-8 paths without a leading slash. The lossy branch's path_and_query()
    call also rejects these on http >= 1.4.1 (test_invalid_path /
    test_override_invalid_path fail at the base commit, independent of this change).
    A separate manifestation of the same crate change. Those tests are
    patched_http1-gated and not run by default CI.
  • OPTIONS absolute-form to asterisk forwarding. RFC 9112 §3.2.4 says a proxy
    forwarding an OPTIONS request in absolute-form with an empty path sends *. This
    normalizes it to /, which is correct for a server receiving it; the
    proxy-forwarding rewrite is left out.

Runtime cost

No new fields, and no change to origin-form, the common case (the starts_with('/')
fast path is unchanged). The CONNECT and absolute-form branches run only for
non-origin-form targets, and raw_path() gains two match arms on an already-cold
path.

If maintainers would rather also fold in the non-UTF-8 lossy branch or the OPTIONS to
asterisk forwarding (see Out of scope), happy to add either in a follow-up commit here.

As of http 1.4.1, PathAndQuery rejects any input that is not origin-form
or "*" (RFC 9112 §3.2), so set_raw_path() — which routed every target
through Uri::builder().path_and_query() — fails to parse CONNECT and
absolute-form request-targets. pingora-http declares http = "1", so a
routine dependency bump silently breaks every forward-proxy and CONNECT
request. (Before 1.4.1 the same call mis-parsed absolute-form, folding
scheme+authority into the path so route matching failed — broken either
way.)

- absolute-form ("http://host/path", §3.2.2): a server MUST accept it.
  Parse as a full Uri, keeping scheme+authority available to callers,
  and re-serialize origin-form on the wire (raw_path() derives from
  path_and_query()). An empty path is "/" (§3.2.1); the no-path-with-
  query shape ("http://host?q") is canonicalized to "/?q" on the stored
  Uri so a later rebuild (e.g. the H1->H2 :path) stays valid.
- authority-form ("host:443", §3.2.3): accepted only for CONNECT and
  only as uri-host ":" port — a scheme, path, query, missing port, or
  userinfo are rejected. The target is re-serialized verbatim so the
  existing CONNECT tunneling path (allow_connect_method_proxying) runs.
- raw_path() no longer unwraps path_and_query(), and set_raw_path()
  clears any stale fallback on entry so a reused header cannot serialize
  a previous target.

Origin-form and asterisk-form keep their fast path; malformed targets
still error. The non-UTF-8 lossy branch is untouched.

Verified by request-target unit tests and end-to-end re-serialization
through http_req_header_to_wire().

Resolves cloudflare#909
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

HTTP/1 CONNECT and absolute-form requests fail to parse with http crate >= 1.4.1 (set_raw_path rejects non-origin-form request targets)

1 participant