What is link-preview-js?
The link-preview-js npm package is used to generate link previews from URLs. It fetches metadata from the provided URL and returns information such as the title, description, image, and more. This is useful for creating rich link previews in applications like social media platforms, messaging apps, and content management systems.
What are link-preview-js's main functionalities?
Fetch Link Preview
This feature allows you to fetch a link preview from a given URL. The `getLinkPreview` function returns a promise that resolves with metadata such as the title, description, and images from the URL.
const { getLinkPreview } = require('link-preview-js');
getLinkPreview('https://www.example.com').then((data) => {
  console.log(data);
});
Custom Fetch Options
This feature allows you to customize the fetch options, such as setting custom headers. This can be useful for bypassing restrictions or simulating different user agents.
const { getLinkPreview } = require('link-preview-js');
getLinkPreview('https://www.example.com', {
  headers: {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'
  }
}).then((data) => {
  console.log(data);
});
Handle Different Content Types
This feature allows you to handle different content types returned by the URL. The `getLinkPreview` function provides the content type in the response, enabling you to process HTML, JSON, or other types of content accordingly.
const { getLinkPreview } = require('link-preview-js');
getLinkPreview('https://www.example.com').then((data) => {
  if (data.contentType === 'text/html') {
    console.log('HTML content:', data);
  } else if (data.contentType === 'application/json') {
    console.log('JSON content:', data);
  }
});
Other packages similar to link-preview-js
unfurl.js
unfurl.js is a package that also fetches metadata from URLs to generate link previews. It provides similar functionality to link-preview-js but offers more customization options and supports additional metadata extraction methods.
metascraper
metascraper is a library designed to scrape metadata from web pages. It is highly customizable and allows you to define your own rules for extracting metadata. Compared to link-preview-js, metascraper offers more flexibility and control over the scraping process.
open-graph-scraper
open-graph-scraper is a package focused on extracting Open Graph metadata from URLs. It is specifically designed for Open Graph tags, making it a good choice if you primarily need Open Graph data. It is more specialized compared to the broader functionality of link-preview-js.
Link Preview JS
Before creating an issue
It's more than likely there is nothing wrong with the library:
- It's very simple; fetch HTML, parse HTML, and search for OpenGraph HTML tags.
- Unless HTML or the OpenGraph standard changes, the library will not break.
- If the target website you are trying to preview redirects you to a login page, the preview will fail because the library will parse the login page instead.
- If the target website does not have OpenGraph tags, the preview will most likely fail. There are some fallbacks, but in general it will not work.
- You cannot preview (fetch) another web page from YOUR web page. Browsers intentionally block such cross-origin requests as a security feature (the same-origin policy, which you will usually encounter as a CORS error).
If you use this library and find it useful, please consider sponsoring me; open source takes a lot of time and effort.
Link Preview
Allows you to extract information from an HTTP URL/link (or parse an HTML string) and retrieve meta information such as title, description, images, videos, etc. via OpenGraph tags.
GOTCHAs
- You cannot request a different domain from your web app (browsers block cross-origin requests under the same-origin policy). This library therefore works on Node.js and certain mobile runtimes (Cordova or React Native).
- This library acts as if the user were visiting the page, so sites might redirect you to sign-up pages, consent screens, etc. You can try changing the user-agent header (try `google-bot` or `Twitterbot`), but you need to work around these issues yourself.
API
`getLinkPreview`: pass a string, either a bare URL or a piece of text that contains a URL. The library parses it and returns the info for the first valid HTTP(S) URL it finds.
`getPreviewFromContent`: useful for passing a pre-fetched Response object from an existing async/etc. call. Refer to the example below for required object values.
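import { getLinkPreview, getPreviewFromContent } from "link-preview-js";
// Pass the link directly
getLinkPreview("https://www.youtube.com/watch?v=MejbOFk7H6c").then((data) =>
  console.debug(data)
);
// Pass a chunk of text; the first link found is previewed
getLinkPreview(
  "This is a text supposed to be parsed and the first link displayed https://www.youtube.com/watch?v=MejbOFk7H6c"
).then((data) => console.debug(data));
// Pass a pre-fetched response object (yourAjaxCall stands in for your own request code).
// The response is expected to carry at least the body (data), the headers
// (including content-type), and the resolved url.
yourAjaxCall(url, (response) => {
  getPreviewFromContent(response).then((data) => console.debug(data));
});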
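The "first valid HTTP(S) URL" behaviour described for `getLinkPreview` can be pictured with a small standalone sketch. The helper below is hypothetical (`firstHttpUrl` is not a library export) and is not the library's actual detection logic; it only assumes that candidate links are separated by whitespace and relies on the standard `URL` constructor for validation.

```typescript
// Hypothetical helper, not part of link-preview-js: returns the first
// whitespace-delimited token that parses as an absolute http(s) URL.
function firstHttpUrl(text: string): string | undefined {
  for (const token of text.split(/\s+/)) {
    try {
      const parsed = new URL(token);
      if (parsed.protocol === "http:" || parsed.protocol === "https:") {
        return token;
      }
    } catch {
      // Not a parseable absolute URL; keep scanning.
    }
  }
  return undefined;
}

console.log(
  firstHttpUrl(
    "This is a text supposed to be parsed and the first link displayed https://www.youtube.com/watch?v=MejbOFk7H6c"
  )
);
```

A check like this can be handy for deciding up front whether a piece of user text contains anything worth previewing, since the promise rejects when no URL is found.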
Options
Additionally, you can pass an options object to customize how the link is fetched and parsed.
Property Name | Result |
---|---|
imagesPropertyType (optional) (ex: 'og') | Fetches images only with the specified property, meta[property='${imagesPropertyType}:image'] |
headers (optional) (ex: { 'user-agent': 'googlebot', 'Accept-Language': 'en-US' }) | Adds request headers to the fetch call |
timeout (optional) (ex: 1000) | Timeout (in milliseconds) after which the request fails |
followRedirects (optional) (default 'error') | For security reasons, the library does not automatically follow redirects (the 'error' value), because a malicious agent can exploit redirects to steal data. Possible values: 'error', 'follow', 'manual' |
handleRedirects (optional) (with followRedirects 'manual') | When followRedirects is set to 'manual', you need to pass a function that validates whether the redirection is secure; an example appears in the Redirections section below |
resolveDNSHost (optional) | Function that resolves the final address of the detected/parsed URL to prevent SSRF attacks |
getLinkPreview("https://www.youtube.com/watch?v=MejbOFk7H6c", {
  imagesPropertyType: "og",
  headers: {
    "user-agent": "googlebot",
    "Accept-Language": "fr-CA",
  },
  timeout: 1000
}).then(data => console.debug(data));
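To make the imagesPropertyType row concrete, the selector pattern quoted in the options table can be expanded for a given value. The helper below is a hypothetical illustration (`imageMetaSelector` is not a library export); it only builds the selector string and does not query any document.

```typescript
// Hypothetical helper, not part of link-preview-js: expands the selector
// pattern quoted in the options table for a given property prefix.
function imageMetaSelector(imagesPropertyType: string): string {
  return `meta[property='${imagesPropertyType}:image']`;
}

console.log(imageMetaSelector("og")); // meta[property='og:image']
```

With "og", only images declared through Open Graph image tags would match; another prefix would narrow the match to that prefix's image tag instead.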
SSRF Concerns
Making requests on behalf of your users or using user-provided URLs is dangerous. One such attack is tricking your server into fetching a domain that redirects to localhost, so that the user receives contents from your server (this doesn't affect mobile runtimes). To mitigate this attack you can use the resolveDNSHost option:
const dns = require("node:dns");
getLinkPreview("http://maliciousLocalHostRedirection.com", {
  // Resolve the hostname up front so the final address can be checked
  resolveDNSHost: async (url: string) => {
    return new Promise((resolve, reject) => {
      const hostname = new URL(url).hostname;
      dns.lookup(hostname, (err, address, family) => {
        if (err) {
          reject(err);
          return;
        }
        resolve(address);
      });
    });
  },
}).catch((e) => {
  // The promise rejects here when a redirection to localhost is detected
});
This might add some latency to your request but prevents loopback attacks.
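To illustrate the kind of address a loopback attack resolves to, here is a minimal sketch of a classifier for loopback, private, and link-local addresses. It is an assumption-laden illustration, not the library's implementation: `isLoopbackOrPrivate` is a hypothetical name, the ranges are the well-known reserved blocks, and IPv4-mapped IPv6 addresses are deliberately not handled.

```typescript
// Hypothetical helper, not part of link-preview-js: flags addresses that
// point at the local machine or a private network.
function isLoopbackOrPrivate(address: string): boolean {
  const normalized = address.toLowerCase();
  // IPv6 loopback, unique-local (fc00::/7), and link-local (fe80::/10).
  if (
    normalized === "::1" ||
    /^f[cd][0-9a-f]{2}:/.test(normalized) ||
    /^fe[89ab][0-9a-f]:/.test(normalized)
  ) {
    return true;
  }
  const match = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(normalized);
  if (!match) {
    return false; // not a dotted-quad IPv4 address
  }
  const a = Number(match[1]);
  const b = Number(match[2]);
  return (
    a === 0 || // "this network"
    a === 10 || // 10.0.0.0/8
    a === 127 || // loopback
    (a === 169 && b === 254) || // link-local
    (a === 172 && b >= 16 && b <= 31) || // 172.16.0.0/12
    (a === 192 && b === 168) // 192.168.0.0/16
  );
}

console.log(isLoopbackOrPrivate("127.0.0.1")); // true
console.log(isLoopbackOrPrivate("93.184.216.34")); // false
```

A resolveDNSHost callback could apply a check like this to the address returned by `dns.lookup` and reject before resolving, as an extra layer on top of whatever validation the library performs itself.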
Redirections
As with SSRF, following redirections is dangerous, so the library errors by default when the response tries to redirect the user. There are, however, some simple redirections that are valid (e.g. HTTP to HTTPS) and that you might want to allow. You can do so via:
await getLinkPreview(`http://google.com/`, {
  followRedirects: `manual`,
  handleRedirects: (baseURL: string, forwardedURL: string) => {
    const urlObj = new URL(baseURL);
    const forwardedURLObj = new URL(forwardedURL);
    if (
      forwardedURLObj.hostname === urlObj.hostname ||
      forwardedURLObj.hostname === "www." + urlObj.hostname ||
      "www." + forwardedURLObj.hostname === urlObj.hostname
    ) {
      return true;
    } else {
      return false;
    }
  },
});
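The handler above can also be written as a named function that is easy to unit-test on its own. The variant below is a sketch under stated assumptions: `isSafeRedirect` is a hypothetical name, the host comparison ignores a single leading "www." (similar to, but not identical with, the checks above), and it additionally refuses HTTPS-to-HTTP downgrades, which the original handler does not do.

```typescript
// Hypothetical handler, not part of link-preview-js: allows a redirect only
// when the host stays the same (ignoring a leading "www.") and the protocol
// is not downgraded from https to http.
function isSafeRedirect(baseURL: string, forwardedURL: string): boolean {
  const from = new URL(baseURL);
  const to = new URL(forwardedURL);
  const stripWww = (host: string): string => host.replace(/^www\./, "");
  const sameHost = stripWww(from.hostname) === stripWww(to.hostname);
  const downgrade = from.protocol === "https:" && to.protocol === "http:";
  return sameHost && !downgrade;
}

console.log(isSafeRedirect("http://google.com/", "https://www.google.com/")); // true
console.log(isSafeRedirect("http://google.com/", "https://evil.example/")); // false
```

Because it takes the same two string arguments and returns a boolean, a function like this could be passed as the handleRedirects option together with followRedirects set to 'manual'.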
Response
Returns a Promise that resolves with an object describing the provided link.
The info object returned varies depending on the content type (MIME type) returned in the HTTP response (see below for variations of response). Rejects with an error if the response cannot be parsed or if there was no URL in the text provided.
Text/HTML URL
{
  url: "https://www.youtube.com/watch?v=MejbOFk7H6c",
  title: "OK Go - Needing/Getting - Official Video - YouTube",
  siteName: "YouTube",
  description: "Buy the video on iTunes: https://itunes.apple.com/us/album/needing-getting-bundle-ep/id508124847 See more about the guitars at: http://www.gretschguitars.com...",
  images: ["https://i.ytimg.com/vi/MejbOFk7H6c/maxresdefault.jpg"],
  mediaType: "video.other",
  contentType: "text/html",
  charset: "utf-8",
  videos: [],
  favicons: ["https://www.youtube.com/yts/img/favicon_32-vflOogEID.png","https://www.youtube.com/yts/img/favicon_48-vflVjB_Qk.png","https://www.youtube.com/yts/img/favicon_96-vflW9Ec0w.png","https://www.youtube.com/yts/img/favicon_144-vfliLAfaB.png","https://s.ytimg.com/yts/img/favicon-vfl8qSV2F.ico"]
}
Image URL
{
  url: "https://media.npr.org/assets/img/2018/04/27/gettyimages-656523922nunes-4bb9a194ab2986834622983bb2f8fe57728a9e5f-s1100-c15.jpg",
  mediaType: "image",
  contentType: "image/jpeg",
  favicons: [ "https://media.npr.org/favicon.ico" ]
}
Audio URL
{
  url: "https://ondemand.npr.org/anon.npr-mp3/npr/atc/2007/12/20071231_atc_13.mp3",
  mediaType: "audio",
  contentType: "audio/mpeg",
  favicons: [ "https://ondemand.npr.org/favicon.ico" ]
}
Video URL
{
  url: "https://www.w3schools.com/html/mov_bbb.mp4",
  mediaType: "video",
  contentType: "video/mp4",
  favicons: [ "https://www.w3schools.com/favicon.ico" ]
}
Application URL
{
  url: "https://assets.curtmfg.com/masterlibrary/56282/installsheet/CME_56282_INS.pdf",
  mediaType: "application",
  contentType: "application/pdf",
  favicons: [ "https://assets.curtmfg.com/favicon.ico" ]
}
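Since the shape of the result depends on the content type, calling code usually needs to narrow it before reading HTML-only fields. The types below are hypothetical, modeled only on the sample responses above (the library's own type definitions may differ), and `isHtmlPreview` and `summarize` are illustrative names.

```typescript
// Hypothetical types modeled on the sample responses; the library's own
// type definitions may differ.
interface MediaPreview {
  url: string;
  mediaType: string;
  contentType: string;
  favicons: string[];
}

interface HtmlPreview extends MediaPreview {
  title: string;
  siteName?: string;
  description?: string;
  images: string[];
  videos: unknown[];
  charset?: string;
}

type Preview = MediaPreview | HtmlPreview;

// Type guard: in the samples, only the text/html variant carries a title.
function isHtmlPreview(preview: Preview): preview is HtmlPreview {
  return "title" in preview;
}

function summarize(preview: Preview): string {
  if (isHtmlPreview(preview)) {
    return `${preview.title} (${preview.images.length} image(s))`;
  }
  return `${preview.mediaType} file (${preview.contentType}) at ${preview.url}`;
}

const video: Preview = {
  url: "https://www.w3schools.com/html/mov_bbb.mp4",
  mediaType: "video",
  contentType: "video/mp4",
  favicons: ["https://www.w3schools.com/favicon.ico"],
};
console.log(summarize(video));
// video file (video/mp4) at https://www.w3schools.com/html/mov_bbb.mp4
```

Checking for the presence of a field rather than comparing mediaType strings keeps the guard independent of values such as "video.other", which the HTML sample shows can appear for ordinary web pages.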
License
MIT license