Huge News!Announcing our $40M Series B led by Abstract Ventures.Learn More →

parse-domain

Package Overview

Dependencies

Advanced tools

Install Socket

Detect and block malicious and high-risk dependencies

Install

parse-domain

Splits a hostname into subdomains, domain and (effective) top-level domains

3.0.4
Source
npm

Version published: 3 years ago

Weekly downloads: 107K; decreased by-14.03%

Maintainers: 9

Weekly downloads

Created: 11 years ago

What is parse-domain?

The parse-domain npm package is used to parse domain names into their constituent parts, such as the top-level domain (TLD), second-level domain (SLD), and subdomain. This can be useful for various applications, including URL validation, domain name extraction, and more.

What are parse-domain's main functionalities?

Basic Domain Parsing

This feature allows you to parse a domain name into its constituent parts. The code sample demonstrates how to parse 'www.example.com' into its subdomain, domain, and TLD.

const parseDomain = require('parse-domain');
const parsed = parseDomain('www.example.com');
console.log(parsed);

Handling Different TLDs

This feature allows you to handle domains with different TLDs, including country-code TLDs. The code sample demonstrates parsing 'example.co.uk' into its parts.

const parseDomain = require('parse-domain');
const parsed = parseDomain('example.co.uk');
console.log(parsed);

Parsing URLs with Protocols

This feature allows you to parse full URLs, including the protocol and path, to extract the domain parts. The code sample demonstrates parsing 'https://www.example.com/path'.

const parseDomain = require('parse-domain');
const parsed = parseDomain('https://www.example.com/path');
console.log(parsed);

Other packages similar to parse-domain

parse-domain

Splits a hostname into subdomains, domain and (effective) top-level domains.

Since domain name registrars organize their namespaces in different ways, it's not straight-forward to split a hostname into subdomains, the domain and top-level domains. In order to do that parse-domain uses a large list of known top-level domains from publicsuffix.org:

import {parseDomain, ParseResultType} from "parse-domain";

const parseResult = parseDomain(
	// This should be a string with basic latin characters only.
	// More information below.
	"www.some.example.co.uk",
);

// Check if the domain is listed in the public suffix list
if (parseResult.type === ParseResultType.Listed) {
	const {subDomains, domain, topLevelDomains} = parseResult;

	console.log(subDomains); // ["www", "some"]
	console.log(domain); // "example"
	console.log(topLevelDomains); // ["co", "uk"]
} else {
	// Read more about other parseResult types below...
}

This package has been designed for modern Node and browser environments, supporting both CommonJS and ECMAScript modules. It assumes an ES2015 environment with Symbol() and URL() globally available. You need to transpile it down to ES5 (e.g. by using Babel) if you need to support older environments.

The list of top-level domains is stored in a trie data structure and serialization format to ensure the fastest lookup and the smallest possible library size. The library is side-effect free (this is important for proper tree-shaking).

Installation

npm install parse-domain

Updates

💡 Please note: publicsuffix.org is updated several times per month. This package comes with a prebuilt list that has been downloaded at the time of npm publish. In order to get an up-to-date list, you should run npx parse-domain-update everytime you start or build your application. This will download the latest list from https://publicsuffix.org/list/public_suffix_list.dat.

Expected input

⚠️ parseDomain does not parse whole URLs. You should only pass the puny-encoded hostname section of the URL:

❌ Wrong	✅ Correct
`https://user@www.example.com:8080/path?query`	`www.example.com`
`münchen.de`	`xn--mnchen-3ya.de`
`食狮.com.cn?query`	`xn--85x722f.com.cn`

There is the utility function fromUrl which tries to extract the hostname from a (partial) URL and puny-encodes it:

import {parseDomain, fromUrl} from "parse-domain";

const {subDomains, domain, topLevelDomains} = parseDomain(
	fromUrl("https://www.münchen.de?query"),
);

console.log(subDomains); // ["www"]
console.log(domain); // "xn--mnchen-3ya"
console.log(topLevelDomains); // ["de"]

// You can use the 'punycode' NPM package to decode the domain again
import {toUnicode} from "punycode";

console.log(toUnicode(domain)); // "münchen"

fromUrl parses the URL using new URL(). Depending on your target environments you need to make sure that there is a polyfill for it. It's globally available in all modern browsers (no IE) and in Node v10.

Expected output

When parsing a hostname there are 5 possible results:

invalid
it is an ip address
it is formally correct and the domain is
- reserved
- not listed in the public suffix list
- listed in the public suffix list

parseDomain returns a ParseResult with a type property that allows to distinguish these cases.

👉 Invalid domains

The given input is first validated against RFC 1034. If the validation fails, parseResult.type will be ParseResultType.Invalid:

import {parseDomain, ParseResultType} from "parse-domain";

const parseResult = parseDomain("münchen.de");

console.log(parseResult.type === ParseResultType.Invalid); // true

Check out the API if you need more information about the validation error.

👉 IP addresses

If the given input is an IP address, parseResult.type will be ParseResultType.Ip:

import {parseDomain, ParseResultType} from "parse-domain";

const parseResult = parseDomain("192.168.2.1");

console.log(parseResult.type === ParseResultType.Ip); // true
console.log(parseResult.ipVersion); // 4

It's debatable if a library for parsing domains should also accept IP addresses. In fact, you could argue that parseDomain should reject an IP address as invalid. While this is true from a technical point of view, we decided to report IP addresses in a special way because we assume that a lot of people are using this library to make sense from an arbitrary hostname (see #102).

👉 Reserved domains

There are 5 top-level domains that are not listed in the public suffix list but reserved according to RFC 6761 and RFC 6762:

localhost
local
example
invalid
test

In these cases, parseResult.type will be ParseResultType.Reserved:

import {parseDomain, ParseResultType} from "parse-domain";

const parseResult = parseDomain("pecorino.local");

console.log(parseResult.type === ParseResultType.Reserved); // true
console.log(parseResult.labels); // ["pecorino", "local"]

👉 Domains that are not listed

If the given hostname is valid, but not listed in the downloaded public suffix list, parseResult.type will be ParseResultType.NotListed:

import {parseDomain, ParseResultType} from "parse-domain";

const parseResult = parseDomain("this.is.not-listed");

console.log(parseResult.type === ParseResultType.NotListed); // true
console.log(parseResult.labels); // ["this", "is", "not-listed"]

If a domain is not listed, it can be caused by an outdated list. Make sure to update the list once in a while.

⚠️ Do not treat parseDomain as authoritative answer. It cannot replace a real DNS lookup to validate if a given domain is known in a certain network.

👉 Effective top-level domains

Technically, the term top-level domain describes the very last domain in a hostname (uk in example.co.uk). Most people, however, use the term top-level domain for the public suffix which is a namespace "under which Internet users can directly register names".

Some examples for public suffixes:

com in example.com
co.uk in example.co.uk
co in example.co
but also com.co in example.com.co

If the hostname is listed in the public suffix list, the parseResult.type will be ParseResultType.Listed:

import {parseDomain, ParseResultType} from "parse-domain";

const parseResult = parseDomain("example.co.uk");

console.log(parseResult.type === ParseResultType.Listed); // true
console.log(parseResult.labels); // ["example", "co", "uk"]

Now parseResult will also provide a subDomains, domain and topLevelDomains property:

const {subDomains, domain, topLevelDomains} = parseResult;

console.log(subDomains); // []
console.log(domain); // "example"
console.log(topLevelDomains); // ["co", "uk"]

👉 Switch over `parseResult.type` to distinguish between different parse results

We recommend switching over the parseResult.type:

switch (parseResult.type) {
	case ParseResultType.Listed: {
		const {hostname, topLevelDomains} = parseResult;

		console.log(`${hostname} belongs to ${topLevelDomains.join(".")}`);
		break;
	}
	case ParseResultType.Reserved:
	case ParseResultType.NotListed: {
		const {hostname} = parseResult;

		console.log(`${hostname} is a reserved or unknown domain`);
		break;
	}
	default:
		throw new Error(`${hostname} is an ip address or invalid domain`);
}

⚠️ Effective TLDs !== TLDs acknowledged by ICANN

What's surprising to a lot of people is that the definition of public suffix means that regular user domains can become effective top-level domains:

const {subDomains, domain, topLevelDomains} = parseDomain(
	"parse-domain.github.io",
);

console.log(subDomains); // []
console.log(domain); // "parse-domain"
console.log(topLevelDomains); // ["github", "io"] 🤯

In this case, github.io is nothing else than a private domain name registrar. github.io is the effective top-level domain and browsers are treating it like that (e.g. for setting document.domain).

If you want to deviate from the browser's understanding of a top-level domain and you're only interested in top-level domains acknowledged by ICANN, there's an icann property:

const parseResult = parseDomain("parse-domain.github.io");
const {subDomains, domain, topLevelDomains} = parseResult.icann;

console.log(subDomains); // ["parse-domain"]
console.log(domain); // "github"
console.log(topLevelDomains); // ["io"]

⚠️ `domain` can also be `undefined`

const {subDomains, domain, topLevelDomains} = parseDomain("co.uk");

console.log(subDomains); // []
console.log(domain); // undefined
console.log(topLevelDomains); // ["co", "uk"]

⚠️ `""` is a valid (but reserved) domain

The empty string "" represents the DNS root and is considered to be valid. parseResult.type will be ParseResultType.Reserved in that case:

const {type, subDomains, domain, topLevelDomains} = parseDomain("");

console.log(type === ParseResultType.Reserved); // true
console.log(subDomains); // []
console.log(domain); // undefined
console.log(topLevelDomains); // []

API

🧩 = JavaScript export
🧬 = TypeScript export

🧩 `export parseDomain(hostname: string | typeof NO_HOSTNAME): ParseResult`

Takes a hostname (e.g. "www.example.com") and returns a ParseResult. The hostname must only contain basic latin characters, digits, hyphens and dots. International hostnames must be puny-encoded. Does not throw an error, even with invalid input.

import {parseDomain} from "parse-domain";

const parseResult = parseDomain("www.example.com");

🧩 `export fromUrl(input: string): string | typeof NO_HOSTNAME`

Takes a URL-like string and tries to extract the hostname. Requires the global URL constructor to be available on the platform. Returns the NO_HOSTNAME symbol if the input was not a string or the hostname could not be extracted. Take a look at the test suite for some examples. Does not throw an error, even with invalid input.

🧩 `export NO_HOSTNAME: unique symbol`

NO_HOSTNAME is a symbol that is returned by fromUrl when it was not able to extract a hostname from the given string. When passed to parseDomain, it will always yield a ParseResultInvalid.

🧬 `export ParseResult`

A ParseResult is either a ParseResultInvalid, ParseResultIp, ParseResultReserved, ParseResultNotListed or ParseResultListed.

All parse results have a type property that is either "INVALID", "IP","RESERVED","NOT_LISTED"or"LISTED". Use the exported ParseResultType to check for the type instead of checking against string literals.

All parse results also have a hostname property that provides access to the sanitized hostname that was passed to parseDomain.

🧩 `export ParseResultType`

An object that holds all possible ParseResult type values:

const ParseResultType = {
	Invalid: "INVALID",
	Ip: "IP",
	Reserved: "RESERVED",
	NotListed: "NOT_LISTED",
	Listed: "LISTED",
};

🧬 `export ParseResultType`

This type represents all possible ParseResult type values.

🧬 `export ParseResultInvalid`

Describes the shape of the parse result that is returned when the given hostname does not adhere to RFC 1034:

The hostname is not a string
The hostname is longer than 253 characters
A domain label is shorter than 1 character
A domain label is longer than 63 characters
A domain label contains a character that is not a basic latin character, digit or hyphen

type ParseResultInvalid = {
	type: ParseResultType.INVALID;
	hostname: string | typeof NO_HOSTNAME;
	errors: Array<ValidationError>;
};

// Example

{
	type: "INVALID",
	hostname: ".com",
	errors: [...]
}

🧬 `export ValidationError`

Describes the shape of a validation error as returned by parseDomain

type ValidationError = {
	type: ValidationErrorType;
	message: string;
	column: number;
};

// Example

{
	type: "LABEL_MIN_LENGTH",
	message: `Label "" is too short. Label is 0 octets long but should be at least 1.`,
	column: 1,
}

🧩 `export ValidationErrorType`

An object that holds all possible ValidationError type values:

const ValidationErrorType = {
	NoHostname: "NO_HOSTNAME",
	DomainMaxLength: "DOMAIN_MAX_LENGTH",
	LabelMinLength: "LABEL_MIN_LENGTH",
	LabelMaxLength: "LABEL_MAX_LENGTH",
	LabelInvalidCharacter: "LABEL_INVALID_CHARACTER",
};

🧬 `export ValidationErrorType`

This type represents all possible type values of a ValidationError.

🧬 `export ParseResultIp`

This type describes the shape of the parse result that is returned when the given hostname was an IPv4 or IPv6 address.

type ParseResultIp = {
	type: ParseResultType.Ip;
	hostname: string;
	ipVersion: 4 | 6;
};

// Example

{
	type: "IP",
	hostname: "192.168.0.1",
	ipVersion: 4
}

According to RFC 3986, IPv6 addresses need to be surrounded by [ and ] in URLs. parseDomain accepts both IPv6 address with and without square brackets:

// Recognized as IPv4 address
parseDomain("192.168.0.1");
// Both are recognized as proper IPv6 addresses
parseDomain("::");
parseDomain("[::]");

🧬 `export ParseResultReserved`

This type describes the shape of the parse result that is returned when the given hostname

is the root domain (the empty string "")
belongs to the top-level domain localhost, local, example, invalid or test

type ParseResultReserved = {
	type: ParseResultType.Reserved;
	hostname: string;
	labels: Array<string>;
};

// Example

{
	type: "RESERVED",
	hostname: "pecorino.local",
	labels: ["pecorino", "local"]
}

⚠️ Reserved IPs, such as 127.0.0.1, will not be reported as reserved, but as ParseResultIp. See #117.

🧬 `export ParseResultNotListed`

Describes the shape of the parse result that is returned when the given hostname is valid and does not belong to a reserved top-level domain, but is not listed in the downloaded public suffix list.

type ParseResultNotListed = {
	type: ParseResultType.NotListed;
	hostname: string;
	labels: Array<string>;
};

// Example

{
	type: "NOT_LISTED",
	hostname: "this.is.not-listed",
	labels: ["this", "is", "not-listed"]
}

🧬 `export ParseResultListed`

Describes the shape of the parse result that is returned when the given hostname belongs to a top-level domain that is listed in the public suffix list.

type ParseResultListed = {
	type: ParseResultType.Listed;
	hostname: string;
	labels: Array<string>;
	subDomains: Array<string>;
	domain: string | undefined;
	topLevelDomains: Array<string>;
	icann: {
		subDomains: Array<string>;
		domain: string | undefined;
		topLevelDomains: Array<string>;
	};
};

// Example

{
	type: "LISTED",
	hostname: "parse-domain.github.io",
	labels: ["parse-domain", "github", "io"]
	subDomains: [],
	domain: "parse-domain",
	topLevelDomains: ["github", "io"],
	icann: {
		subDomains: ["parse-domain"],
		domain: "github",
		topLevelDomains: ["io"]
	}
}

License

Unlicense

3.0.4 (2021-09-02)

Bug Fixes

Improve scheme support in fromUrl (4c10202), closes #section-3 #130

Keywords

FAQs

What is parse-domain?

Is parse-domain popular?

Is parse-domain well maintained?

Package last updated on 02 Sep 2021

Did you know?

Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.

Install

parse-domain

What is parse-domain?

What are parse-domain's main functionalities?

Other packages similar to parse-domain

parse-domain

Installation

Updates

Expected input

Expected output

👉 Invalid domains

👉 IP addresses

👉 Reserved domains

👉 Domains that are not listed

👉 Effective top-level domains

👉 Switch over `parseResult.type` to distinguish between different parse results

⚠️ Effective TLDs !== TLDs acknowledged by ICANN

⚠️ `domain` can also be `undefined`

⚠️ `""` is a valid (but reserved) domain

API

🧩 `export parseDomain(hostname: string | typeof NO_HOSTNAME): ParseResult`

🧩 `export fromUrl(input: string): string | typeof NO_HOSTNAME`

🧩 `export NO_HOSTNAME: unique symbol`

🧬 `export ParseResult`

🧩 `export ParseResultType`

🧬 `export ParseResultType`

🧬 `export ParseResultInvalid`

🧬 `export ValidationError`

🧩 `export ValidationErrorType`

🧬 `export ValidationErrorType`

🧬 `export ParseResultIp`

🧬 `export ParseResultReserved`

🧬 `export ParseResultNotListed`

🧬 `export ParseResultListed`

License

Sponsors

3.0.4 (2021-09-02)

Bug Fixes

Keywords

Related posts

parse-domain

What is parse-domain?

What are parse-domain's main functionalities?

Other packages similar to parse-domain

tldjs

psl

url-parse

parse-domain

Installation

Updates

Expected input

Expected output

👉 Invalid domains

👉 IP addresses

👉 Reserved domains

👉 Domains that are not listed

👉 Effective top-level domains

👉 Switch over parseResult.type to distinguish between different parse results

⚠️ Effective TLDs !== TLDs acknowledged by ICANN

⚠️ domain can also be undefined

⚠️ "" is a valid (but reserved) domain

API

🧩 export parseDomain(hostname: string | typeof NO_HOSTNAME): ParseResult

🧩 export fromUrl(input: string): string | typeof NO_HOSTNAME

🧩 export NO_HOSTNAME: unique symbol

🧬 export ParseResult

🧩 export ParseResultType

🧬 export ParseResultType

🧬 export ParseResultInvalid

🧬 export ValidationError

🧩 export ValidationErrorType

🧬 export ValidationErrorType

🧬 export ParseResultIp

🧬 export ParseResultReserved

🧬 export ParseResultNotListed

🧬 export ParseResultListed

License

Sponsors

3.0.4 (2021-09-02)

Bug Fixes

Keywords

Related posts

GitHub Removes Malicious Pull Requests Targeting Open Source Repositories

RubyGems.org Adds New Maintainer Role

Node.js Implements Stricter Policies for Semver-Major Pull Requests Ahead of Release Deadlines

👉 Switch over `parseResult.type` to distinguish between different parse results

⚠️ `domain` can also be `undefined`

⚠️ `""` is a valid (but reserved) domain

🧩 `export parseDomain(hostname: string | typeof NO_HOSTNAME): ParseResult`

🧩 `export fromUrl(input: string): string | typeof NO_HOSTNAME`

🧩 `export NO_HOSTNAME: unique symbol`

🧬 `export ParseResult`

🧩 `export ParseResultType`

🧬 `export ParseResultType`

🧬 `export ParseResultInvalid`

🧬 `export ValidationError`

🧩 `export ValidationErrorType`

🧬 `export ValidationErrorType`

🧬 `export ParseResultIp`

🧬 `export ParseResultReserved`

🧬 `export ParseResultNotListed`

🧬 `export ParseResultListed`