
ReURL
ReUrl is a library for parsing and manipulating URLs. It supports relative and non-normalized URLs and a number of operations on them. It can be used to parse, resolve, normalize and serialize URLs in separate phases, in a way that conforms to the WHATWG URL Standard.
Motivation
I wrote this library because I needed one that supported non-normalized and relative URLs, but I also wanted to be certain that it followed the specification completely.
The WHATWG URL Standard defines URLs in terms of a parser algorithm that resolves URLs, normalizes URLs and serializes URL components in one pass. Thus, to implement a library that follows the Standard but also supports a versatile set of operations on relative and non-normalized URLs, I had to disentangle these phases from the specification and, to some extent, rephrase the specification in more elementary terms.
Eventually I came up with a small 'theory' of URLs that I found very helpful and I based the library on that. Over time, this theory has become thoroughly documented in this new URL Specification.
API
Overview
The ReUrl library exposes a Url class and a RawUrl class with an identical API. They differ only in their handling of percent escape sequences.
Url
For Url objects the URL parser decodes percent escape sequences, getters report percent-decoded values and the set method assumes that its input is percent-decoded unless explicitly specified otherwise.
var url = new Url ('//host/%61bc')
url.file
url = url.set ({ query:'%def' })
url.query
url.toString ()
RawUrl
For RawUrl objects the parser preserves percent escape sequences, getters report values with percent escape sequences preserved, and the set method expects values in which % signs start a percent escape sequence.
var url = new RawUrl ('//host/%61bc')
url.file
url = url.set ({ query:'%25%64ef' })
url.query
url.toString ()
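The two classes differ only in whether %XX sequences are decoded. As a rough illustration that does not use ReUrl, JavaScript's built-in decodeURIComponent and encodeURIComponent show the same round trip for the values used in the examples above. Note that encodeURIComponent's encode set is not the per-component set prescribed by the WHATWG Standard.

```javascript
// Decoded view, as reported by Url: '%61' is the letter 'a'.
const decodedFile = decodeURIComponent('%61bc');

// Encoded view, as expected by RawUrl: a literal '%' has to be written as '%25'.
const encodedQuery = encodeURIComponent('%def');

// Decoding the RawUrl-style query value gives back the Url-style value.
const roundTrip = decodeURIComponent('%25%64ef');

console.log(decodedFile, encodedQuery, roundTrip); // abc %25def %def
```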
Url and RawUrl objects are immutable. Modifying URLs is accomplished through methods that return new Url and/or RawUrl objects, such as the url.set (patch) method described below.
Constructors
new Url (string [, conf])
Construct a new Url object from a URL string. The optional conf argument, if present, must be a configuration object as described below.
var url = new Url ('sc:/foo/bar')
console.log (url)
new Url (object)
Construct a new Url object from any object, possibly a Url object itself. The optional conf argument, if present, must be a configuration object as described below.
Throws an error if the object cannot be coerced into a valid URL.
var url = new Url ({ scheme:'file', dirs:['foo', 'buzz'], file:'abc' })
console.log (url.toString ())
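To make the component model concrete, here is a hypothetical serializer for a plain object that uses the same property names as Url objects. It is a sketch under stated assumptions, not ReUrl's url.toString (): it performs no percent-encoding, ignores drives, and assumes that a path following an authority always needs a root.

```javascript
// Hypothetical serializer for a plain object using ReUrl's property names.
// No percent-encoding and no drives; this is not ReUrl's url.toString ().
function serialize(u) {
  let s = u.scheme != null ? u.scheme + ':' : '';
  if (u.host != null) {
    s += '//';
    if (u.user != null) s += u.user + (u.pass != null ? ':' + u.pass : '') + '@';
    s += u.host;
    if (u.port != null) s += ':' + u.port; // port may be a number or ''
  }
  const hasPath = u.dirs != null || u.file != null;
  // Assumption: a path after an authority always needs a root.
  if (u.root || (u.host != null && hasPath)) s += '/';
  if (u.dirs != null) s += u.dirs.map((d) => d + '/').join('');
  if (u.file != null) s += u.file;
  if (u.query != null) s += '?' + u.query;
  if (u.hash != null) s += '#' + u.hash;
  return s;
}

console.log(serialize({ scheme: 'file', dirs: ['foo', 'buzz'], file: 'abc' })); // file:foo/buzz/abc
```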
conf.parser
You can pass a configuration object with a parser property to the Url constructor to trigger scheme-specific parsing behaviour for relative, scheme-less URL-strings.
The scheme determines support for Windows drive letters and backslash separators.
Drive letters are only supported in file URL strings, and backslash separators are limited to file, http, https, ws, wss and ftp URL strings.
var url = new Url ('/c:/foo\\bar', { parser:'file' })
console.log (url)
var url = new Url ('/c:/foo\\bar', { parser:'http' })
console.log (url)
var url = new Url ('/c:/foo\\bar')
console.log (url)
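For comparison, the WHATWG parser built into Node.js (the global URL class, not ReUrl) makes the same scheme-dependent decisions when a relative string is resolved against a base URL. In this sketch the base URLs are arbitrary choices: a file base keeps the drive letter when .. is applied, and only the special http scheme treats a backslash as a separator.

```javascript
// Node's built-in WHATWG URL parser, not ReUrl. A relative string is resolved against a base.
const asFile = new URL('/c:/../x', 'file:///');
const asHttp = new URL('/c:/../x', 'http://host/');

// Backslashes act as separators for special schemes such as http, but not for other schemes.
const backslashHttp = new URL('/foo\\bar', 'http://host/');
const backslashOther = new URL('/foo\\bar', 'sc://host/');

console.log(asFile.pathname);         // /c:/x  (the drive letter survives '..')
console.log(asHttp.pathname);         // /x
console.log(backslashHttp.pathname);  // /foo/bar
console.log(backslashOther.pathname); // /foo\bar
```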
Properties
Url and RawUrl objects have the following optional properties.
url.scheme
The scheme of a URL as a string. This property is absent if no scheme part is present, e.g. in scheme-relative URLs.
new Url ('http://foo?search#baz') .scheme
new Url ('/abc/?') .scheme
url.user
The username of a URL as a string. This property is absent if the URL does not have an authority or does not have credentials.
new Url ('http://joe@localhost') .user
new Url ('//host/abc') .user
url.pass
The password of a URL as a string.
This property is absent if the URL does not have an authority, credentials or password.
new Url ('http://joe@localhost') .pass
new Url ('http://host') .pass
new Url ('http://joe:pass@localhost') .pass
new Url ('http://joe:@localhost') .pass
url.host
The hostname of a URL as a string. This property is absent if the URL does not have an authority.
new Url ('http://localhost') .host
new Url ('http:foo') .host
new Url ('/foo') .host
url.port
The port of (the authority part of) a URL. If present, it is either a number or the empty string. The property is absent if the URL does not have an authority or a port.
new Url ('http://localhost:8080') .port
new Url ('foo://host:/foo') .port
new Url ('foo://host/foo') .port
url.root
The path root of a URL. Its value is '/' if the URL has an absolute path; the property is absent otherwise.
new Url ('foo://localhost?q') .root
new Url ('foo://localhost/') .root
new Url ('foo/bar')
new Url ('/foo/bar')
It is possible for file URLs to have a drive, but not a root.
new Url ('file:/c:')
new Url ('file:/c:/')
url.drive
The drive of a URL as a string, if present.
Note that the presence of drives depends on the parser settings and/or the URL scheme.
new Url ('file://c:') .drive
new Url ('http://c:') .drive
new Url ('/c:/foo/bar', { parser:'file' }) .drive
new Url ('/c:/foo/bar') .drive
url.dirs
The directory segments of the path. If present, a non-empty array of strings. Note that a trailing slash determines whether the last path segment is part of dirs or is set as the file property.
new Url ('/foo/bar/baz/').dirs
new Url ('/foo/bar/baz').dirs
url.file
The last path segment, if it is not followed by a slash. If present, a non-empty string.
new Url ('/foo/bar/baz') .file
new Url ('/foo/bar/baz/') .file
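The role of the trailing slash can be sketched with a small hypothetical helper, splitPath, that is not part of ReUrl. It splits a plain path string into root, dirs and file following the rules above, and it ignores drives, backslashes and percent-coding.

```javascript
// Hypothetical helper, not part of ReUrl: splits a path string into root, dirs and file.
// Drives, backslashes and percent-coding are ignored.
function splitPath(path) {
  const parts = {};
  let rest = path;
  if (rest.startsWith('/')) {
    parts.root = '/';
    rest = rest.slice(1);
  }
  const segments = rest.split('/');
  const last = segments.pop(); // '' when the path ends in a slash
  if (segments.length > 0) parts.dirs = segments;
  if (last !== '') parts.file = last; // the file is never the empty string
  return parts;
}

console.log(splitPath('/foo/bar/baz/')); // root '/', dirs ['foo', 'bar', 'baz'], no file
console.log(splitPath('/foo/bar/baz'));  // root '/', dirs ['foo', 'bar'], file 'baz'
```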
url.query
The query part of url as a string, if present.
new Url ('http://foo?search#baz') .query
new Url ('/abc/?') .query
new Url ('/abc/') .query
url.hash
The hash part of url as a string, if present.
new Url ('http://foo#baz') .hash
new Url ('/abc/#') .hash
new Url ('/abc/') .hash
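The distinction between an absent property and an empty one (compare the query examples above) also shows up in the regular expression from RFC 3986, Appendix B, which splits any URI reference into scheme, authority, path, query and fragment. The helper splitReference below is hypothetical and is not ReUrl's parser: it does not split the authority or the path, and it knows nothing about backslashes or drives.

```javascript
// RFC 3986, Appendix B: one regular expression that splits any URI reference.
const RFC3986_PARTS = /^(([^:\/?#]+):)?(\/\/([^\/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?/;

// Hypothetical helper: only components that are present become properties.
function splitReference(str) {
  const m = RFC3986_PARTS.exec(str);
  const parts = { path: m[5] };
  if (m[2] !== undefined) parts.scheme = m[2];
  if (m[4] !== undefined) parts.authority = m[4];
  if (m[7] !== undefined) parts.query = m[7];
  if (m[9] !== undefined) parts.hash = m[9];
  return parts;
}

console.log(splitReference('http://foo?search#baz'));
console.log(splitReference('/abc/?')); // query is '' (present but empty)
console.log(splitReference('/abc/'));  // no query property
```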
Setting Properties
Url and RawUrl objects are immutable, therefore setting and removing components is achieved via a set method that takes a patch object.
url.set (patch)
The patch object may contain one or more of the keys scheme, user, pass, host, port, drive, root, dirs, file, query and/or hash. To remove a component, set its value in the patch to null.
If present:
– port must be null, a string, or a number
– dirs must be an array of strings
– root may be anything; it is converted to '/' if truthy and is interpreted as null otherwise
– all others must be null or a string.
new Url ('//host/dir/file')
.set ({ host:null, query:'q', hash:'h' })
.toString ()
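The patch rules can be modelled on plain objects. The sketch below is hypothetical and is not ReUrl's implementation; it assumes that null is accepted for every key (including dirs) as a way to remove the component, and it returns a frozen copy to mirror the immutability of Url objects.

```javascript
// Hypothetical model of the patch rules, acting on plain objects; not ReUrl's code.
const PATCH_KEYS = ['scheme', 'user', 'pass', 'host', 'port', 'drive',
  'root', 'dirs', 'file', 'query', 'hash'];

function applyPatch(url, patch) {
  const result = { ...url }; // copy, so the input is never modified
  for (const key of PATCH_KEYS) {
    if (!(key in patch)) continue;
    let value = patch[key];
    if (key === 'root') {
      value = value ? '/' : null; // truthy becomes '/', anything else removes the root
    } else if (value !== null) {
      const ok = key === 'port' ? typeof value === 'string' || typeof value === 'number'
        : key === 'dirs' ? Array.isArray(value) && value.every((d) => typeof d === 'string')
        : typeof value === 'string';
      if (!ok) throw new TypeError(`invalid value for ${key}`);
    }
    if (value === null) delete result[key]; // null removes the component
    else result[key] = value;
  }
  return Object.freeze(result);
}

const patched = applyPatch({ host: 'host', root: '/', dirs: ['dir'], file: 'file' },
  { host: null, query: 'q', hash: 'h' });
console.log(patched); // no host; root, dirs and file kept; query 'q' and hash 'h' added
```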
Resets
For security reasons, setting the user will remove pass, unless a value is supplied for it as well.
Setting the host will remove user, pass and port, unless values are supplied for them as well.
new Url ('http://joe:secret@example.com')
.set ({ user:'jane' })
.toString ()
new Url ('http://joe:secret@localhost:8080')
.set ({ host:'example.com' })
.toString ()
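The reset rules amount to expanding the patch with null values before it is applied. The functions withResets and setWithResets below are a hypothetical sketch on plain objects, not ReUrl's implementation; values given explicitly in the patch always win over the implied resets.

```javascript
// Hypothetical sketch of the reset rules on plain objects; not ReUrl's implementation.
function withResets(patch) {
  const resets = {};
  // A new host drops the old credentials and port...
  if ('host' in patch) Object.assign(resets, { user: null, pass: null, port: null });
  // ...and a new user drops the old password.
  if ('user' in patch) resets.pass = null;
  return { ...resets, ...patch }; // values supplied in the patch take precedence
}

function setWithResets(url, patch) {
  const result = { ...url };
  for (const [key, value] of Object.entries(withResets(patch))) {
    if (value === null) delete result[key]; // null removes the component
    else result[key] = value;
  }
  return result;
}

const jane = setWithResets({ user: 'joe', pass: 'secret', host: 'example.com' }, { user: 'jane' });
const moved = setWithResets({ user: 'joe', pass: 'secret', host: 'localhost', port: 8080 },
  { host: 'example.com' });
console.log(jane);  // user 'jane', host 'example.com', no pass
console.log(moved); // only host 'example.com'
```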
patch.percentCoded
The patch may have an additional key percentCoded with a boolean value to indicate whether the strings in the patch contain percent escape sequences.
This means that you can pass percent-encoded values to Url.set by explicitly setting percentCoded to true. The values will then be decoded.
var url = new Url ('//host/')
url = url.set ({ file:'%61bc-%25-sign', percentCoded:true })
url.file
url.toString ()
You can pass percent-decoded values to RawUrl.set by explicitly setting percentCoded to false. Percent characters in values will then be encoded; specifically, they will be replaced with %25.
var rawUrl = new RawUrl ('//host/')
rawUrl = rawUrl.set ({ file:'abc-%-sign', percentCoded:false })
rawUrl.file
rawUrl.toString ()
Note that if no percentCoded value is specified, then Url.set assumes percentCoded to be false whilst RawUrl.set assumes percentCoded to be true.
var url = new Url ('//host/') .set ({ file:'%61bc' })
url.file
url.toString ()
var rawUrl = new RawUrl ('//host/') .set ({ file:'%61bc' })
rawUrl.file
rawUrl.toString ()
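The defaults above can be summarised in a few lines. The function prepareValue is a hypothetical sketch, not ReUrl's code: raw: true stands for RawUrl (which stores escapes) and raw: false for Url (which stores decoded text). It uses decodeURIComponent, which throws a URIError on malformed escapes; ReUrl's own handling of such input may differ.

```javascript
// Hypothetical sketch of how percentCoded could be interpreted; not ReUrl's code.
// raw: true models RawUrl (stores escapes), raw: false models Url (stores decoded text).
function prepareValue(value, { raw, percentCoded = raw }) {
  if (raw && !percentCoded) return value.replace(/%/g, '%25'); // RawUrl: escape literal '%'
  if (!raw && percentCoded) return decodeURIComponent(value);  // Url: decode escapes
  return value; // already in the stored form
}

console.log(prepareValue('%61bc-%25-sign', { raw: false, percentCoded: true })); // abc-%-sign
console.log(prepareValue('abc-%-sign', { raw: true, percentCoded: false }));     // abc-%25-sign
console.log(prepareValue('%61bc', { raw: false }));                              // %61bc
console.log(prepareValue('%61bc', { raw: true }));                               // %61bc
```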
Conversions
url.toString ()
Converts a Url object to a string. Percent-encodes only a minimal set of code points. The resulting string may contain non-ASCII code points.
var url = new Url ('http://🌿🌿🌿/{braces}/hʌɪ')
url.toString ()
url.toASCII (), url.toJSON (), url.href
Converts a Url object to a string that contains only ASCII code points. Non-ASCII code points in components will be percent-encoded and/or punycoded.
var url = new Url ('http://🌿🌿🌿/{braces}/hʌɪ')
url.toASCII ()
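Node's built-in URL class (not ReUrl) always serializes href in this ASCII-only form, which makes it a convenient point of comparison. The host bücher.example is a hypothetical stand-in chosen because its punycode form is well known; the exact ASCII string that ReUrl produces for the example above is not claimed here.

```javascript
// Node's built-in URL class (not ReUrl) always serializes href as ASCII.
// The host bücher.example is a hypothetical stand-in with a well-known punycode form.
const asciiUrl = new URL('http://bücher.example/{braces}/hʌɪ');
console.log(asciiUrl.href); // http://xn--bcher-kva.example/%7Bbraces%7D/h%CA%8C%C9%AA

// Decoding the path gives back the non-ASCII form.
console.log(decodeURI(asciiUrl.pathname)); // /{braces}/hʌɪ
```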
url.toURI ()
Uses url.toASCII () to convert url to an RFC3986 URI. Throws an error if url does not have a scheme, because URIs must always have a scheme.
Normalisation
url.normalize (), url.normalise ()
Returns a new Url object by normalizing url. Among other things, this interprets . and .. segments within the path and removes default ports and trivial usernames/passwords from the authority of url.
new Url ('http://foo/bar/baz/./../bee') .normalize () .toString ()
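Node's built-in WHATWG URL class (not ReUrl) applies the same kinds of normalization, but always during parsing, whereas ReUrl keeps normalization as a separate step. The sketch below only shows the effects listed above for a few hand-picked inputs.

```javascript
// Node's built-in WHATWG URL class normalizes while parsing (ReUrl keeps this as a separate step).
const normalized = {
  dotSegments: new URL('http://foo/bar/baz/./../bee').href, // '.' and '..' interpreted
  defaultPort: new URL('http://foo:80/bar').href,           // default port 80 dropped
  emptyCredentials: new URL('http://:@foo/bar').href,       // empty username/password dropped
};
console.log(normalized);
// dotSegments: http://foo/bar/bee, defaultPort: http://foo/bar, emptyCredentials: http://foo/bar
```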
Percent Coding
url.percentEncode ()
Returns a RawUrl object by percent-encoding the properties of url according to the Standard. In the case of RawUrl objects, existing percent-encoded bytes are not escaped a second time.
url.percentDecode ()
Returns a Url object by percent-decoding the properties of url if it is a RawUrl, and leaving them as-is otherwise.
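Avoiding double escaping means leaving valid %XX sequences untouched while encoding everything else. The helper encodeOnce below is a hypothetical sketch of that idea, not ReUrl's code; for brevity it uses encodeURIComponent, whose encode set differs from the per-component sets of the WHATWG Standard.

```javascript
// Hypothetical helper, not ReUrl's code: percent-encode a value but keep existing %XX escapes,
// so that encoding twice gives the same result as encoding once.
// encodeURIComponent is used for brevity; its encode set differs from the WHATWG Standard's.
function encodeOnce(value) {
  return value
    .split(/(%[0-9A-Fa-f]{2})/) // the capturing group keeps the escapes at odd indices
    .map((piece, i) => (i % 2 === 1 ? piece : encodeURIComponent(piece)))
    .join('');
}

const once = encodeOnce('%61bc %');       // '%61bc%20%25': '%61' kept, space and stray '%' encoded
const twice = encodeOnce(once);           // unchanged
const decoded = decodeURIComponent(once); // 'abc %', the percent-decoded view
console.log(once, twice, decoded);
```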
Goto
url.goto (url2)
Returns a new Url object by 'extending' url with url2, where url2 may be a string, a Url or a RawUrl object.
new Url ('/foo/bar') .goto ('baz/index.html') .toString ()
new Url ('/foo/bar') .goto ('//host/path') .toString ()
new Url ('http://foo/bar/baz/') .goto ('./../bee') .toString ()
If url2 is a string, it will be parsed with the scheme of url as a fallback scheme. TODO: if url has no scheme then …
new Url ('file://host/dir/') .goto ('c|/dir2/') .toString ()
new Url ('http://host/dir/') .goto ('c|/dir2/') .toString ()
Base URLs
url.isBase ()
Returns a boolean indicating whether url is a base-URL. What is and is not a base-URL depends on the scheme of a URL. For example, http- and file-URLs that do not have a host are not base-URLs.
url.force ()
Forcibly convert a Url to a base-URL according to this URL Specification, in accordance with the WHATWG Standard.
- In file URLs without a hostname, the hostname will be set to ''.
- For URLs whose scheme is one of http, https, ws, wss or ftp and that have an absent or empty authority, the authority will be 'stolen' from the first non-empty path segment.
- In the latter case, an error is thrown if url cannot be forced. This happens if it has no scheme, or if it has an empty host and no non-empty path segment.
new Url ('http:foo/bar') .force () .toString ()
new Url ('http:/foo/bar') .force () .toString ()
new Url ('http://foo/bar') .force () .toString ()
new Url ('http:///foo/bar') .force () .toString ()
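The WHATWG parser performs this forcing while it parses, which can be observed with Node's built-in URL class (not ReUrl). The inputs mirror the examples above; where ReUrl makes forcing an explicit step, Node's URL applies it unconditionally.

```javascript
// Node's built-in WHATWG URL class (not ReUrl) applies this forcing while it parses.
const forced = ['http:foo/bar', 'http:/foo/bar', 'http://foo/bar', 'http:///foo/bar']
  .map((s) => new URL(s).href);
console.log(forced); // four times http://foo/bar

// A file URL without a host gets the empty host, serialized as 'file:///'.
console.log(new URL('file:foo/bar').href); // file:///foo/bar

// With no non-empty segment to take the host from, parsing fails with a TypeError.
let failed = false;
try {
  new URL('http:///');
} catch (e) {
  failed = e instanceof TypeError;
}
console.log(failed); // true
```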
Reference Resolution
url.genericResolve (base) — RFC3986 - strict
Resolve a Url object url against a base URL base according to the strict reference resolution algorithm as defined in RFC 3986.
url.legacyResolve (base) — RFC 3986 - non-strict
Resolve a Url object url against a base URL base according to the non-strict reference resolution algorithm as defined in RFC 3986.
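The only difference between the two RFC 3986 variants is that the non-strict one ignores a reference scheme that equals the base scheme. The sketch below is a compact, string-based rendering of RFC 3986 section 5.2 that uses the Appendix B regular expression for parsing; it is not ReUrl's code, and it performs no WHATWG-style forcing, backslash handling or percent-coding. The strict parameter is an illustrative name, not part of ReUrl's API.

```javascript
// A string-based sketch of RFC 3986, section 5.2 (not ReUrl's implementation).
// Parsing uses the regular expression from RFC 3986, Appendix B.
const SPLIT = /^(([^:\/?#]+):)?(\/\/([^\/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?/;

function parse(str) {
  const m = SPLIT.exec(str);
  return { scheme: m[2], authority: m[4], path: m[5], query: m[7], fragment: m[9] };
}

// Section 5.2.4: the output holds segments, each with its leading '/' (if any).
function removeDotSegments(path) {
  let input = path;
  const output = [];
  while (input.length > 0) {
    if (input.startsWith('../')) input = input.slice(3);
    else if (input.startsWith('./')) input = input.slice(2);
    else if (input.startsWith('/./')) input = input.slice(2);
    else if (input === '/.') input = '/';
    else if (input.startsWith('/../')) { input = input.slice(3); output.pop(); }
    else if (input === '/..') { input = '/'; output.pop(); }
    else if (input === '.' || input === '..') input = '';
    else {
      const next = input.indexOf('/', 1); // end of the first segment
      output.push(next === -1 ? input : input.slice(0, next));
      input = next === -1 ? '' : input.slice(next);
    }
  }
  return output.join('');
}

// Section 5.2.3: merge a relative-path reference with the base path.
function merge(base, refPath) {
  if (base.authority !== undefined && base.path === '') return '/' + refPath;
  return base.path.slice(0, base.path.lastIndexOf('/') + 1) + refPath;
}

// Section 5.2.2; strict = false ignores a reference scheme equal to the base scheme.
function resolve(refStr, baseStr, strict = true) {
  const r = parse(refStr);
  const b = parse(baseStr);
  if (!strict && r.scheme === b.scheme) r.scheme = undefined;
  const t = { fragment: r.fragment };
  if (r.scheme !== undefined) {
    Object.assign(t, { scheme: r.scheme, authority: r.authority, path: removeDotSegments(r.path), query: r.query });
  } else if (r.authority !== undefined) {
    Object.assign(t, { scheme: b.scheme, authority: r.authority, path: removeDotSegments(r.path), query: r.query });
  } else if (r.path === '') {
    Object.assign(t, { scheme: b.scheme, authority: b.authority, path: b.path,
      query: r.query !== undefined ? r.query : b.query });
  } else {
    const path = r.path.startsWith('/') ? r.path : merge(b, r.path);
    Object.assign(t, { scheme: b.scheme, authority: b.authority, path: removeDotSegments(path), query: r.query });
  }
  return recompose(t);
}

// Section 5.3: join the components back into a string.
function recompose(t) {
  let s = t.scheme !== undefined ? t.scheme + ':' : '';
  if (t.authority !== undefined) s += '//' + t.authority;
  s += t.path;
  if (t.query !== undefined) s += '?' + t.query;
  if (t.fragment !== undefined) s += '#' + t.fragment;
  return s;
}

const base = 'http://a/b/c/d;p?q';
console.log(resolve('http:g', base));        // http:g          (strict)
console.log(resolve('http:g', base, false)); // http://a/b/c/g  (non-strict)
```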
url.WHATWGResolve (base), aka. url.resolve
Resolve a Url object url against a base URL base in a way that is compatible with the error-correcting, forcing reference resolution algorithm as defined in the WHATWG Standard.
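The behaviour of the WHATWG algorithm can be observed with Node's built-in URL class (not ReUrl), where new URL (reference, base) resolves and normalizes in one go. The sketch reuses the RFC 3986 example base to highlight two differences from the strict RFC algorithm: a reference with the same special scheme is treated as relative, and special URLs always get a path.

```javascript
// WHATWG-style resolution with Node's built-in URL class (not ReUrl): new URL (reference, base).
const rfcBase = 'http://a/b/c/d;p?q';
const viaWhatwg = {
  parent: new URL('../g', rfcBase).href,       // http://a/b/g
  sameScheme: new URL('http:g', rfcBase).href, // http://a/b/c/g (same special scheme: relative)
  authority: new URL('//g', rfcBase).href,     // http://g/ (special URLs get a '/' path)
  dots: new URL('./../bee', 'http://foo/bar/baz/').href, // http://foo/bar/bee
};
console.log(viaWhatwg);
```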
Changelog
Version 1.0.0-rc.2
- Converted the project from a CommonJS Module to an ES Module.
- Updated the core to use spec-url version 2.0.0-dev.1
- Changes to the API for reference resolution.
ReUrl now exposes three methods for reference resolution:
- url.genericResolve (base)
- url.legacyResolve (base)
- url.WHATWGResolve (base), also known as url.resolve (base)
License
MIT.
Enjoy!