tldts
tldts is a TypeScript library for working with complex domain names, subdomains and well-known TLDs. It is a fork of the very good tld.js JavaScript library.
It accurately answers questions like: what is mail.google.com's domain? What is a.b.ide.kyoto.jp's subdomain? Is https://big.data's TLD a well-known one?
tldts runs fast (even faster than the original tld.js library thanks to additional optimizations), is fully tested and is safe to use in the browser (UMD bundles are provided, as well as an ES6 module version and TypeScript type declarations). Because it relies on Mozilla's public suffix list, now is a good time to say thank you Mozilla!
Install
npm install --save tldts
It ships by default with the latest version of the public suffix list, but you
can provide your own version (more up-to-date or modified) using the update
method.
Using It
const { parse } = require('tldts');
parse('http://www.writethedocs.org/conf/eu/2017/');
⬇️ Read the documentation below to find out the available functions.
tldts.parse()
This method returns handy properties about a URL or a hostname.
const tldts = require('tldts');
tldts.parse('https://spark-public.s3.amazonaws.com/dataanalysis/loansData.csv');
tldts.parse('https://spark-public.s3.amazonaws.com/dataanalysis/loansData.csv', { allowPrivateDomains: true });
tldts.parse('gopher://domain.unknown/');
tldts.parse('https://192.168.0.0');
Property Name | Type | Description |
---|---|---|
host | String | host part of the input extracted automatically |
isValid | Boolean | Is the hostname valid according to the RFC? |
publicSuffix | String | |
isIcann | Boolean | Does the TLD come from the public (ICANN) part of the list? |
isPrivate | Boolean | Does the TLD come from the private part of the list? |
domain | String | |
subdomain | String | |
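To make the relationship between publicSuffix, domain and subdomain concrete, here is a minimal sketch of how these values could be derived from a bare hostname. It is not the tldts implementation: the function name sketchParse, the two tiny hard-coded suffix sets and the simplified matching (no wildcard or exception rules, no URL handling) are assumptions made purely for illustration.

```javascript
// Hypothetical, tiny stand-ins for the two sections of the public suffix list.
const ICANN_SUFFIXES = new Set(['com', 'uk', 'co.uk']);
const PRIVATE_SUFFIXES = new Set(['s3.amazonaws.com']);

// Illustrative only: finds the longest known suffix, then splits around it.
function sketchParse(hostname, { allowPrivateDomains = false } = {}) {
  const labels = hostname.toLowerCase().split('.');
  const result = {
    publicSuffix: null,
    isIcann: false,
    isPrivate: false,
    domain: null,
    subdomain: null,
  };

  // Candidates go from longest to shortest, so 'co.uk' wins over 'uk'.
  for (let i = 0; i < labels.length; i += 1) {
    const candidate = labels.slice(i).join('.');
    const isPrivate = allowPrivateDomains && PRIVATE_SUFFIXES.has(candidate);
    const isIcann = !isPrivate && ICANN_SUFFIXES.has(candidate);
    if (!isPrivate && !isIcann) continue;

    result.publicSuffix = candidate;
    result.isIcann = isIcann;
    result.isPrivate = isPrivate;
    if (i > 0) {
      // The domain is the suffix plus one label to its left;
      // everything further left is the subdomain.
      result.domain = labels.slice(i - 1).join('.');
      result.subdomain = labels.slice(0, i - 1).join('.');
    }
    break;
  }
  return result;
}
```

With these sample sets, sketchParse('foo.google.co.uk') reports google.co.uk as the domain and foo as the subdomain, while enabling allowPrivateDomains makes s3.amazonaws.com count as a suffix of its own.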
Single purpose methods
These methods are shorthands for when you only need a single value. They
perform better than parse because less work is needed.
getDomain()
Returns the fully qualified domain from a given string.
const { getDomain } = tldts;
getDomain('google.com');
getDomain('fr.google.com');
getDomain('fr.google.google');
getDomain('foo.google.co.uk');
getDomain('t.co');
getDomain('fr.t.co');
getDomain('https://user:password@example.co.uk:8080/some/path?and&query#hash');
getSubdomain()
Returns the complete subdomain for a given string.
const { getSubdomain } = tldts;
getSubdomain('google.com');
getSubdomain('fr.google.com');
getSubdomain('google.co.uk');
getSubdomain('foo.google.co.uk');
getSubdomain('moar.foo.google.co.uk');
getSubdomain('t.co');
getSubdomain('fr.t.co');
getSubdomain('https://user:password@secure.example.co.uk:443/some/path?and&query#hash');
getPublicSuffix()
Returns the public suffix for a given string.
const { getPublicSuffix } = tldts;
getPublicSuffix('google.com');
getPublicSuffix('fr.google.com');
getPublicSuffix('google.co.uk');
getPublicSuffix('s3.amazonaws.com');
getPublicSuffix('s3.amazonaws.com', { allowPrivateDomains: true });
getPublicSuffix('tld.is.unknown');
isValidHostname()
Checks if the given string is a valid hostname according to RFC 1035.
It does not check if the TLD is well-known.
const { isValidHostname } = tldts;
isValidHostname('google.com');
isValidHostname('.google.com');
isValidHostname('my.fake.domain');
isValidHostname('localhost');
isValidHostname('https://user:password@example.co.uk:8080/some/path?and&query#hash');
isValidHostname('192.168.0.0');
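As a rough picture of what a purely syntactic hostname check involves, the sketch below tests the overall length and the shape of each dot-separated label. It is an assumption-laden illustration, not the check tldts performs: the name sketchIsValidHostname is hypothetical, it expects a bare hostname rather than a URL, and it is more lenient than the strict RFC 1035 grammar in that it accepts labels starting with a digit (as RFC 1123 permits).

```javascript
// Hypothetical helper, not the tldts implementation.
// Checks only syntax: total length and the shape of each label.
function sketchIsValidHostname(hostname) {
  if (typeof hostname !== 'string') return false;
  if (hostname.length === 0 || hostname.length > 255) return false;

  // Each label: 1 to 63 characters, letters/digits/hyphens,
  // and no hyphen at either end.
  const label = /^[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?$/i;
  return hostname.split('.').every((part) => label.test(part));
}
```

A leading dot produces an empty first label, which is why a value such as .google.com is rejected by this sketch, while localhost and my.fake.domain pass because no TLD lookup is involved.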
Troubleshooting
Retrieving the subdomain of localhost and custom hostnames
The tldts methods getDomain and getSubdomain are designed to work only with known and valid TLDs. This way, you can trust what a domain is.
localhost is a valid hostname but not a TLD. You can pass additional options to each method exposed by tldts:
const tldts = require('tldts');
tldts.getDomain('localhost');
tldts.getSubdomain('vhost.localhost');
tldts.getDomain('localhost', { validHosts: ['localhost'] });
tldts.getSubdomain('vhost.localhost', { validHosts: ['localhost'] });
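One way to picture the effect of validHosts is to treat each listed host as an acceptable domain in its own right, with anything to its left counted as the subdomain. The sketch below follows that idea. The helper name sketchSplitWithValidHosts and its return shape are hypothetical, and the logic is an assumption about the behaviour, not code taken from tldts.

```javascript
// Hypothetical helper: treats every entry of validHosts as a trusted domain.
function sketchSplitWithValidHosts(hostname, validHosts = []) {
  for (const host of validHosts) {
    if (hostname === host) {
      // The hostname is exactly a trusted host: no subdomain part.
      return { domain: host, subdomain: '' };
    }
    if (hostname.endsWith('.' + host)) {
      // Strip the trusted host and the dot in front of it.
      const subdomain = hostname.slice(0, -(host.length + 1));
      return { domain: host, subdomain };
    }
  }
  // Nothing matched: this sketch reports no domain at all.
  return { domain: null, subdomain: null };
}
```

In this sketch, vhost.localhost with localhost listed as a valid host splits into the domain localhost and the subdomain vhost; without the option, nothing is recognised.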
Updating the TLDs List
Many libraries offer a list of TLDs. But are they up to date? And how do you update them?
tldts bundles a list of known TLDs, but this list can become outdated.
This is especially true if the package has not been updated on npm for a while.
Thankfully, you can pass your own version of the list to the update method of tldts. It can be fetched from https://publicsuffix.org/list/public_suffix_list.dat:
const { update } = require('tldts');
update(lists_as_a_string);
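For orientation, the file at that URL is plain text: lines starting with // are comments, and marker comments delimit an ICANN section and a private section. The sketch below splits such text into its two groups of rules, which can be handy for inspecting a list before handing the raw string to update. The function name sketchSplitSuffixList is hypothetical, and this is not how tldts processes the list internally.

```javascript
// Hypothetical helper: groups public suffix list rules by section.
function sketchSplitSuffixList(listText) {
  const rules = { icann: [], private: [] };
  let section = null;

  for (const rawLine of listText.split('\n')) {
    const line = rawLine.trim();

    // Marker comments switch the current section on and off.
    if (line.includes('===BEGIN ICANN DOMAINS===')) section = 'icann';
    else if (line.includes('===BEGIN PRIVATE DOMAINS===')) section = 'private';
    else if (line.includes('===END ')) section = null;

    // Skip blank lines, comments, and anything outside a section.
    if (line === '' || line.startsWith('//') || section === null) continue;

    // A rule is the text up to the first whitespace character.
    rules[section].push(line.split(/\s+/)[0]);
  }
  return rules;
}
```

Rules are kept verbatim, so wildcard entries such as *.ck and exception entries such as !www.ck appear exactly as written in the list.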
Open an issue to request an update of the bundled TLDs.
Contributing
Provide a pull request (with tested code) to include your work in this main project.
Some issues may be awaiting help, so feel free to lend a hand, with code or ideas.
Performance
tldts is fast, but keep in mind that its speed might vary depending on your
own use case. Because the library tries to be smart, the speed can be
drastically different depending on the input (it will be faster if you
provide an already cleaned hostname, compared to a random URL).
On an Intel i7-6600U (2.60-3.40 GHz) using Node.js v10.9.0:
For already cleaned hostnames
Method | ops/sec |
---|---|
isValidHostname | ~7,500,000 |
getHostname | ~3,200,000 |
getPublicSuffix | ~1,100,000 |
getDomain | ~1,100,000 |
getSubdomain | ~1,100,000 |
parse | ~1,000,000 |
For random URLs
Method | ops/sec |
---|---|
isValidHostname | ~12,000,000 |
getHostname | ~2,640,000 |
getPublicSuffix | ~800,000 |
getDomain | ~760,000 |
getSubdomain | ~760,000 |
parse | ~750,000 |
You can measure the performance of tldts on your hardware by running the following command:
npm run benchmark
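If you would rather time a single function against your own inputs, a rough ops/sec figure can be obtained with a few lines of Node.js. The helper below is a hypothetical sketch and is unrelated to the project's own benchmark script. The name sketchOpsPerSecond and the default number of rounds are arbitrary choices, and the result is only indicative because it ignores warm-up and JIT effects.

```javascript
// Hypothetical helper: rough ops/sec estimate for a one-argument function.
function sketchOpsPerSecond(fn, inputs, rounds = 1000) {
  const start = process.hrtime.bigint();
  for (let r = 0; r < rounds; r += 1) {
    for (const input of inputs) {
      fn(input);
    }
  }
  // Elapsed time in nanoseconds, clamped to avoid a division by zero.
  const elapsedNs = Math.max(Number(process.hrtime.bigint() - start), 1);
  const totalCalls = rounds * inputs.length;
  return totalCalls / (elapsedNs / 1e9);
}

// Example with a stand-in function; pass getDomain or parse from tldts
// instead to measure the library on your own hostnames or URLs.
const opsPerSecond = sketchOpsPerSecond((s) => s.toLowerCase(), ['Example.COM']);
console.log(`~${Math.round(opsPerSecond)} ops/sec`);
```

Running it once with cleaned hostnames and once with full URLs shows how much the kind of input matters on your machine.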
Note: if this is not fast enough for your use case, please get in touch via an issue so that we can analyze why that is.
Contributors
This project exists thanks to all the people who contributed to tld.js as well as tldts. [Contribute].
License
MIT License.