What is tldts?
The 'tldts' npm package is a powerful tool for parsing and manipulating domain names. It helps in extracting various parts of a domain, such as the top-level domain (TLD), subdomain, and domain name. It is particularly useful for tasks involving URL validation, domain categorization, and security checks.
What are tldts's main functionalities?
Extract Domain Parts
This feature allows you to parse a URL and extract its components such as the subdomain, domain, and TLD. The code sample demonstrates how to parse a URL and log the parsed components.
const tldts = require('tldts');
const parsed = tldts.parse('https://sub.example.co.uk/path');
console.log(parsed);
Get Domain Without Subdomain
This feature extracts the domain name without the subdomain. The code sample shows how to get the domain name from a URL.
const tldts = require('tldts');
const domain = tldts.getDomain('https://sub.example.co.uk/path');
console.log(domain);
Get Public Suffix
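This feature retrieves the public suffix of a given URL. A public suffix can span several labels (for example co.uk), so it is not always a single top-level domain. The code sample demonstrates how to extract the public suffix from a URL.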
const tldts = require('tldts');
const publicSuffix = tldts.getPublicSuffix('https://sub.example.co.uk/path');
console.log(publicSuffix);
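Check for a Valid Domain
The API section of the tldts README lists parse, getHostname, getDomain, getPublicSuffix and getSubdomain, and no dedicated validation function. One way to approximate a validity check is to parse the input and confirm that a domain was found under an ICANN-listed suffix. The code sample is a sketch of that approach.
const tldts = require('tldts');
const { domain, isIcann } = tldts.parse('https://sub.example.co.uk/path');
console.log(domain !== null && isIcann === true);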
Other packages similar to tldts
psl
The 'psl' package is a similar tool that provides functions to parse domain names and extract the public suffix. It is often used for purposes similar to those of 'tldts', such as URL validation and domain categorization. However, 'tldts' offers more comprehensive parsing capabilities, such as accepting full URLs as well as hostnames and detecting IP addresses.
url-parse
The 'url-parse' package is a robust URL parser that can decompose URLs into their constituent parts. While it provides general URL parsing capabilities, it does not specialize in domain-specific parsing like 'tldts'. 'tldts' offers more focused functionality for domain and TLD extraction.
parse-domain
The 'parse-domain' package is another tool for parsing domain names and extracting subdomains, domains, and TLDs. It is similar to 'tldts' but may not be as actively maintained or feature-rich. 'tldts' provides a more modern and comprehensive solution for domain parsing.
tldts - Blazing Fast URL Parsing
tldts is a JavaScript library to extract hostnames, domains, public suffixes, top-level domains and subdomains from URLs.
Features:
- Tuned for performance (order of 0.1 to 1 μs per input)
- Handles both URLs and hostnames
- Full Unicode/IDNA support
- Supports parsing email addresses
- Detects IPv4 and IPv6 addresses
- Continuously updated version of the public suffix list
- TypeScript, ships with umd, esm, cjs bundles and type definitions
- Small bundles and small memory footprint
- Battle tested: full test coverage and production use
Install
npm install --save tldts
Usage
Using the command-line interface:
$ npx tldts 'http://www.writethedocs.org/conf/eu/2017/'
{
  "domain": "writethedocs.org",
  "hostname": "www.writethedocs.org",
  "isIcann": true,
  "isIp": false,
  "isPrivate": false,
  "publicSuffix": "org",
  "subdomain": "www"
}
Programmatically:
const { parse } = require('tldts');
parse('http://www.writethedocs.org/conf/eu/2017/');
Modern ES6 modules import is also supported:
import { parse } from 'tldts';
Alternatively, you can try it directly in your browser here: https://npm.runkit.com/tldts
API
tldts.parse(url | hostname, options)
tldts.getHostname(url | hostname, options)
tldts.getDomain(url | hostname, options)
tldts.getPublicSuffix(url | hostname, options)
tldts.getSubdomain(url | hostname, options)
The behavior of tldts can be customized using an options argument for all the functions exposed as part of the public API. This is useful both to change the behavior of the library and to fine-tune performance depending on your inputs.
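{
  allowIcannDomains: boolean; // use suffixes from the ICANN section (default: true)
  allowPrivateDomains: boolean; // use suffixes from the Private section (default: false)
  extractHostname: boolean; // extract the hostname from the input (default: true)
  validateHostname: boolean; // validate hostnames after parsing (default: true)
  detectIp: boolean; // perform IP address detection (default: true)
  mixedInputs: boolean; // accept both URLs and hostnames as input (default: true)
  validHosts: string[] | null; // extra valid suffixes, e.g. ['localhost'] (default: null)
}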
The parse method returns handy properties about a URL or a hostname.
const tldts = require('tldts');
tldts.parse('https://spark-public.s3.amazonaws.com/dataanalysis/loansData.csv'); // default options: private suffixes are ignored
tldts.parse('https://spark-public.s3.amazonaws.com/dataanalysis/loansData.csv', { allowPrivateDomains: true }); // private suffixes enabled
tldts.parse('gopher://domain.unknown/'); // suffix not found in the list
tldts.parse('https://192.168.0.0'); // IPv4 address
tldts.parse('https://[::1]'); // IPv6 address
tldts.parse('tldts@emailprovider.co.uk'); // email address
Property Name | Type | Description |
---|---|---|
hostname | str | Hostname of the input, extracted automatically |
domain | str | Domain (tld + sld) |
subdomain | str | Subdomain (what comes before domain) |
publicSuffix | str | Public suffix (tld) of hostname |
isIcann | bool | Does the public suffix come from the ICANN part of the list? |
isPrivate | bool | Does the public suffix come from the Private part of the list? |
isIp | bool | Is hostname an IP address? |
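To make the properties above more concrete, here is a minimal sketch that turns a parse result into a one-line summary. The sample object is copied from the command-line output shown in the Usage section. The helper `describeResult` is hypothetical and not part of tldts, and the sketch assumes that `domain` is null when no domain can be determined.

```javascript
// Sample result copied from the CLI output in the Usage section.
const result = {
  domain: 'writethedocs.org',
  hostname: 'www.writethedocs.org',
  isIcann: true,
  isIp: false,
  isPrivate: false,
  publicSuffix: 'org',
  subdomain: 'www',
};

// Hypothetical helper (not part of tldts): summarize a parse() result.
function describeResult(r) {
  if (r.isIp) return `IP address ${r.hostname}`;
  // Assumption: domain is null when no domain could be determined.
  if (r.domain === null) return `no domain found for ${r.hostname}`;
  const kind = r.isPrivate ? 'private' : r.isIcann ? 'ICANN' : 'unlisted';
  const sub = r.subdomain ? `subdomain "${r.subdomain}" of ` : '';
  return `${sub}${r.domain} (${kind} suffix "${r.publicSuffix}")`;
}

console.log(describeResult(result));
// prints: subdomain "www" of writethedocs.org (ICANN suffix "org")
```

In real code the object would come from `parse(input)` rather than a literal.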
Single purpose methods
These methods are shorthands if you want to retrieve only a single value (and will perform better than parse because less work will be needed).
getHostname(url | hostname, options?)
Returns the hostname from a given string.
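const { getHostname } = require('tldts');
getHostname('google.com'); // returns 'google.com'
getHostname('fr.google.com'); // returns 'fr.google.com'
getHostname('fr.google.google'); // returns 'fr.google.google'
getHostname('foo.google.co.uk'); // returns 'foo.google.co.uk'
getHostname('t.co'); // returns 't.co'
getHostname('fr.t.co'); // returns 'fr.t.co'
getHostname('https://user:password@example.co.uk:8080/some/path?and&query#hash'); // returns 'example.co.uk'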
getDomain(url | hostname, options?)
Returns the fully qualified domain from a given string.
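const { getDomain } = require('tldts');
getDomain('google.com'); // returns 'google.com'
getDomain('fr.google.com'); // returns 'google.com'
getDomain('fr.google.google'); // returns 'google.google'
getDomain('foo.google.co.uk'); // returns 'google.co.uk'
getDomain('t.co'); // returns 't.co'
getDomain('fr.t.co'); // returns 't.co'
getDomain('https://user:password@example.co.uk:8080/some/path?and&query#hash'); // returns 'example.co.uk'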
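One common use of getDomain is checking whether two URLs belong to the same domain. The sketch below illustrates the idea. `isSameDomain` is a hypothetical helper, not part of tldts, and the `getDomain` function is passed in as an argument so the sketch does not depend on how the library is loaded. It assumes getDomain returns null when no domain can be determined.

```javascript
// Hypothetical helper (not part of tldts).
// `getDomain` is injected, for example: require('tldts').getDomain
function isSameDomain(urlA, urlB, getDomain) {
  const a = getDomain(urlA);
  const b = getDomain(urlB);
  // Assumption: null means no domain could be determined;
  // such inputs never count as a match.
  return a !== null && a === b;
}
```

With tldts installed, a call could look like `isSameDomain(urlA, urlB, require('tldts').getDomain)`.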
getSubdomain(url | hostname, options?)
Returns the complete subdomain for a given string.
const { getSubdomain } = require('tldts');
getSubdomain('google.com'); // returns ''
getSubdomain('fr.google.com'); // returns 'fr'
getSubdomain('google.co.uk'); // returns ''
getSubdomain('foo.google.co.uk'); // returns 'foo'
getSubdomain('moar.foo.google.co.uk'); // returns 'moar.foo'
getSubdomain('t.co'); // returns ''
getSubdomain('fr.t.co'); // returns 'fr'
getSubdomain('https://user:password@secure.example.co.uk:443/some/path?and&query#hash'); // returns 'secure'
getPublicSuffix(url | hostname, options?)
Returns the public suffix for a given string.
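const { getPublicSuffix } = require('tldts');
getPublicSuffix('google.com'); // returns 'com'
getPublicSuffix('fr.google.com'); // returns 'com'
getPublicSuffix('google.co.uk'); // returns 'co.uk'
getPublicSuffix('s3.amazonaws.com'); // returns 'com' (private suffixes are ignored by default)
getPublicSuffix('s3.amazonaws.com', { allowPrivateDomains: true }); // matches against private suffixes in the bundled list as well
getPublicSuffix('tld.is.unknown'); // returns 'unknown'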
Troubleshooting
Retrieving subdomain of localhost and custom hostnames
tldts methods getDomain and getSubdomain are designed to work only with known and valid TLDs. This way, you can trust what a domain is.
localhost is a valid hostname but not a TLD. You can pass additional options to each method exposed by tldts:
const tldts = require('tldts');
tldts.getDomain('localhost'); // returns null
tldts.getSubdomain('vhost.localhost'); // returns null
tldts.getDomain('localhost', { validHosts: ['localhost'] }); // returns 'localhost'
tldts.getSubdomain('vhost.localhost', { validHosts: ['localhost'] }); // returns 'vhost'
Updating the TLDs List
tldts made the opinionated choice of shipping with a list of suffixes directly in its bundle. There is currently no mechanism to update the lists yourself, but we make sure that the version shipped is always up-to-date.
If you keep tldts updated, the lists should be up-to-date as well!
Performance
tldts is the fastest JavaScript library available for parsing hostnames. It is able to parse millions of inputs per second (typically 2-3M depending on your hardware and inputs). It also offers granular options to fine-tune the behavior and performance of the library depending on the kind of inputs you are dealing with (e.g.: if you know you only manipulate valid hostnames you can disable the hostname extraction step with { extractHostname: false }).
Please see this detailed comparison with other available libraries.
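As an illustration of the fine-tuning described above, the following sketch processes a batch of inputs that are already known to be bare, valid hostnames and therefore passes `{ extractHostname: false }`. `countByDomain` is a hypothetical helper, not part of tldts, and `getDomain` is injected so the sketch can run on its own. It makes no claim about how much time the option saves.

```javascript
// Options for inputs that are already bare, valid hostnames
// (no scheme, port, or path), as described in the Performance section.
const HOSTNAME_OPTIONS = { extractHostname: false };

// Hypothetical helper (not part of tldts): tally hostnames by domain.
// `getDomain` is injected, for example: require('tldts').getDomain
function countByDomain(hostnames, getDomain) {
  const counts = new Map();
  for (const hostname of hostnames) {
    const domain = getDomain(hostname, HOSTNAME_OPTIONS);
    // Assumption: null means no domain could be determined.
    const key = domain === null ? '(none)' : domain;
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return counts;
}
```

Defining the options object once and reusing it also avoids allocating a new object for every input.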
Contributors
tldts is based upon the excellent tld.js library and would not exist without the many contributors who worked on the project.
This project would not be possible without Mozilla's amazing public suffix list. Thank you for your hard work!
License
MIT License.