Parse HTTP headers from RFC 9110 (and a bunch of others) using the full ABNF.
If there is a specified and non-deprecated header you want parsed and it is
not yet supported, please file an issue. I won't be tracking all of the
revisions to all of the docs, but I will fix issues if they are pointed out to
me.
This code was tested against the headers returned by the top 50 websites as
reported by Wikipedia on the day that I looked in November 2024. I made sure
that all of the non-custom headers that were in use that day by 3 or more of
those sites were supported here.
Installation
npm install @cto.af/http-headers
Caveats
- Check for max headers size before calling this parser. Many servers
choose 8k or 16k as their maximum.
- Check the unknown property of headers. Headers that are supported, but
have syntax errors, are treated as if they are unknown, un-parseable
headers. They will always have these properties:
- kind: lowercased header name
- name: original header name
- value: full text of the header, to the first newline
- unknown: true
- The option obsolete: true can be passed in to the parse function to enable
a bunch of obsolete rules in processing email addresses (and a few other
obs_* productions). Hopefully none of those productions have ever
actually been used on the web, but I have included them for completeness,
and left the obsolete flag in place mostly for testing purposes.
- I've tried to stay as faithful to the ABNF for each header as possible.
However, the definitions are rife with different understandings of how ABNF
works. In particular, Parsing Expression Grammars (PEGs) parse by trying
each alternate successively until one matches. If an alternate always
matches (e.g. *"foo", which matches the empty string), then none of the
subsequent alternates are ever checked. Similarly, if one of two alternates
is a prefix of another (e.g. "foo" and "foobar"), the longer alternate must
be checked first. There are several places where look-ahead assertions were
required to deal with these sorts of issues, or to ensure testability.
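The ordered-choice behavior described in the last caveat can be seen in a toy parser. This is a minimal sketch, not how this package or Peggy is implemented. The helper names (literal, choice, zeroOrMore, matchesAll) are made up for illustration.

```javascript
// A parser takes (input, pos) and returns the new position, or -1 on failure.
const literal = str => (input, pos) =>
  (input.startsWith(str, pos) ? pos + str.length : -1);

// PEG ordered choice: commit to the first alternate that matches.
const choice = (...alts) => (input, pos) => {
  for (const alt of alts) {
    const next = alt(input, pos);
    if (next !== -1) {
      return next; // Later alternates are never tried.
    }
  }
  return -1;
};

// Zero or more repetitions: always succeeds, possibly consuming nothing.
const zeroOrMore = p => (input, pos) => {
  let cur = pos;
  for (;;) {
    const next = p(input, cur);
    if (next === -1 || next === cur) {
      return cur;
    }
    cur = next;
  }
};

// True only if the parser consumes the whole input.
const matchesAll = (p, input) => p(input, 0) === input.length;

const shortFirst = choice(literal('foo'), literal('foobar'));
const longFirst = choice(literal('foobar'), literal('foo'));
// *"foo" matches the empty string, so "bar" is never attempted.
const emptyFirst = choice(zeroOrMore(literal('foo')), literal('bar'));

console.log(matchesAll(shortFirst, 'foobar')); // false
console.log(matchesAll(longFirst, 'foobar')); // true
console.log(matchesAll(longFirst, 'foo')); // true
console.log(matchesAll(emptyFirst, 'bar')); // false
```

With the shorter alternate first, the choice stops after "foo" and the rest of the input is left unconsumed, which is why the longer alternate has to come first.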
API
Example:
import {parse} from '@cto.af/http-headers';
const headers = parse('Date: Sun, 06 Nov 1994 08:49:37 GMT\r\n\r\n');
const contentType = parse('text/html;charset=utf8', {
startRule: 'Content_Type',
});
const unknownHeader = parse('Foo: bar=baz', {startRule: 'Header'});
See the Peggy docs for more information on the parse function.
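The first two caveats (size limits and the unknown property) could be handled with a couple of small helpers around parse. This is a sketch under assumptions: parseWithLimit, splitUnknown, and MAX_HEADER_BYTES are hypothetical names that are not part of this package, parseFn stands in for the parse export, and the result is assumed to be an array of header objects.

```javascript
// Hypothetical limit; use whatever your server enforces (often 8k or 16k).
const MAX_HEADER_BYTES = 16 * 1024;

/**
 * Reject oversized input before handing it to a parser.
 * `parseFn` stands in for the `parse` export of this package.
 */
function parseWithLimit(text, parseFn, maxBytes = MAX_HEADER_BYTES) {
  // Measure UTF-8 bytes, not UTF-16 code units.
  const size = new TextEncoder().encode(text).length;
  if (size > maxBytes) {
    throw new RangeError(`Header block is ${size} bytes; limit is ${maxBytes}`);
  }
  return parseFn(text);
}

/**
 * Split header objects into parsed and unknown groups.
 * Assumes an array of objects where unparseable ones carry `unknown: true`.
 */
function splitUnknown(headers) {
  const parsed = [];
  const unknown = [];
  for (const h of headers) {
    (h.unknown === true ? unknown : parsed).push(h);
  }
  return {parsed, unknown};
}
```

Something like splitUnknown(parseWithLimit(text, parse)) would then give you the headers that need a closer look, each with its kind, name, and value.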
Here are the supported values for startRule:
- 'Headers': default
- 'Headers_Loose': Accept "\r\n" or "\n" at the end of lines
- 'Header': Any single header line
- 'Accept'
- 'Accept_CH'
- 'Accept_Charset'
- 'Accept_Encoding'
- 'Accept_Language'
- 'Accept_Ranges'
- 'Access_Control_Allow_Credentials'
- 'Access_Control_Allow_Headers'
- 'Access_Control_Allow_Methods'
- 'Access_Control_Allow_Origin'
- 'Access_Control_Expose_Headers'
- 'Access_Control_Max_Age'
- 'Access_Control_Request_Headers'
- 'Access_Control_Request_Method'
- 'Age'
- 'Allow'
- 'ALPN'
- 'Alt_Svc'
- 'Authentication_Info'
- 'Authorization'
- 'Cache_Control'
- 'Connection'
- 'Content_Encoding'
- 'Content_Language'
- 'Content_Length'
- 'Content_Location'
- 'Content_Range'
- 'Content_Security_Policy'
- 'Content_Security_Policy_Report_Only'
- 'Content_Type'
- 'Cross_Origin_Embedder_Policy'
- 'Cross_Origin_Embedder_Policy_Report_Only'
- 'Cross_Origin_Opener_Policy'
- 'Cross_Origin_Opener_Policy_Report_Only'
- 'Cross_Origin_Resource_Policy'
- 'Date'
- 'ETag'
- 'Expect'
- 'Expires'
- 'From'
- 'Host'
- 'If_Match'
- 'If_Modified_Since'
- 'If_None_Match'
- 'If_Range'
- 'If_Unmodified_Since'
- 'Last_Modified'
- 'Location'
- 'Link'
- 'Max_Forwards'
- 'NEL'
- 'Permissions_Policy'
- 'Proxy_Authenticate'
- 'Proxy_Authentication_Info'
- 'Proxy_Authorization'
- 'Range'
- 'Referer'
- 'Referrer_Policy'
- 'Reporting_Endpoints'
- 'Retry_After'
- 'Server'
- 'Server_Timing'
- 'Set_Cookie'
- 'Strict_Transport_Security'
- 'TE'
- 'Trailer'
- 'Upgrade'
- 'User_Agent'
- 'Vary'
- 'Via'
- 'WWW_Authenticate'
- 'Unknown_Header'
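If you have a header name from the wire and want the matching start rule, a lookup like the following may help. It is only a sketch: startRuleFor and START_RULES are hypothetical names, the list is a small subset of the rules above, and it assumes that rule names are the header name with hyphens replaced by underscores, compared without regard to case.

```javascript
// A subset of the start rules listed above; extend as needed.
const START_RULES = [
  'Accept_CH',
  'Content_Type',
  'ETag',
  'Set_Cookie',
  'WWW_Authenticate',
];

// Index by lowercased name so that lookups ignore case.
const RULE_BY_KEY = new Map(START_RULES.map(r => [r.toLowerCase(), r]));

/**
 * Map a header name such as "content-type" to a start rule name,
 * or undefined if the header is not in the list.
 */
function startRuleFor(headerName) {
  const key = headerName.trim().toLowerCase().replace(/-/g, '_');
  return RULE_BY_KEY.get(key);
}

console.log(startRuleFor('content-type')); // Content_Type
console.log(startRuleFor('ETAG')); // ETag
console.log(startRuleFor('X-Custom')); // undefined
```

The table lookup is there because capitalization is not regular ('ETag', 'Accept_CH', 'WWW_Authenticate'), so rule names cannot be rebuilt from the header name by title-casing alone.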
Development
To try a rule out without having to rebuild, do a variation of this:
curl -si --head https://github.com/ | \
tail -n +2 | \
node_modules/.bin/peggy src/index.peggy --format es -T- -S Headers_Loose
