openGraphScraper
A simple Node.js module for scraping Open Graph and Twitter Card info off a site. For browser usage, we recommend using ky (or a backend service) to make the request, then passing the HTML into `open-graph-scraper` with the `html` option.
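Here is a minimal sketch of that flow. The helper name `scrapeFromUrl` is hypothetical. Both the HTML fetcher and the `ogs` function are passed in as parameters so the sketch runs on its own. In practice you might pass something like `(url) => ky.get(url).text()` as the fetcher and the real `ogs` function.

```javascript
// Hypothetical helper: fetch the page yourself, then let ogs parse the HTML.
// fetchHtml: any async (url) => string, e.g. (url) => ky.get(url).text() (assumption)
// ogsFn:     the ogs function from open-graph-scraper
async function scrapeFromUrl(url, fetchHtml, ogsFn) {
  const html = await fetchHtml(url);
  // Pass only `html`; it is meant to be used without `url`.
  const { error, result } = await ogsFn({ html });
  if (error) {
    throw new Error('ogs could not parse the HTML');
  }
  return result;
}
```

Keeping the network request outside ogs means the browser only needs a fetch library it already trusts, and ogs does nothing but parse.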
Installation
```bash
npm install open-graph-scraper --save
```
Usage
Callback Example:
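```javascript
const ogs = require('open-graph-scraper');

const options = { url: 'http://ogp.me/' };

ogs(options, (error, results, response) => {
  console.log('error:', error); // true if there was an error, otherwise false
  console.log('results:', results); // the scraped Open Graph data
  console.log('response:', response); // the raw response from the request
});
```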
Promise Example:
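```javascript
const ogs = require('open-graph-scraper');

const options = { url: 'http://ogp.me/' };

ogs(options)
  .then((data) => {
    const { error, result, response } = data;
    console.log('error:', error); // true if there was an error, otherwise false
    console.log('result:', result); // the scraped Open Graph data
    console.log('response:', response); // the raw response from the request
  })
  .catch((err) => {
    // Handle rejections (for example, network failures) so they are not left unhandled.
    console.error('request failed:', err);
  });
```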
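The same call can also be written with `async`/`await`. The sketch below assumes the resolved shape `{ error, result, response }` used in the promise example. The function name `getTitle` is hypothetical, and `ogsFn` is a parameter (pass the real `ogs` in practice) so the sketch runs standalone.

```javascript
// Hypothetical async/await version of the promise example.
// ogsFn is the ogs function; it is injected so this runs without the package.
async function getTitle(ogsFn, url) {
  try {
    const { error, result } = await ogsFn({ url });
    // Treat an error flag or success: false as "no title".
    if (error || !result || !result.success) return null;
    return result.ogTitle || null;
  } catch (err) {
    // A rejected promise (e.g. a network failure) also yields null here.
    return null;
  }
}
```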
Results JSON
Check the returned result for a `success` flag. If `success` is `true`, the URL input was valid; otherwise it is `false`. The example above returns something like:
```javascript
{
  ogTitle: 'Open Graph protocol',
  ogType: 'website',
  ogUrl: 'http://ogp.me/',
  ogDescription: 'The Open Graph protocol enables any web page to become a rich object in a social graph.',
  ogImage: {
    url: 'http://ogp.me/logo.png',
    width: '300',
    height: '300',
    type: 'image/png'
  },
  requestUrl: 'http://ogp.me/',
  success: true
}
```
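A small sketch of consuming a result like this. The helper `summarize` is hypothetical: it checks `success` before reading any fields and converts the image width, which comes back as a string. It also accepts `ogImage` as an array; that is an assumption about what a multi-image result (for example with `allMedia: true`) looks like, not something shown above.

```javascript
// Hypothetical helper: pull a few display fields out of an ogs result.
function summarize(result) {
  // success is false when the URL input was invalid, so bail out early.
  if (!result || !result.success) return null;
  // ogImage is a single object in the sample above; an array is also
  // accepted here (assumed shape when more than one image is returned).
  const image = Array.isArray(result.ogImage) ? result.ogImage[0] : result.ogImage;
  return {
    title: result.ogTitle || null,
    description: result.ogDescription || null,
    imageUrl: image && image.url ? image.url : null,
    // width comes back as a string such as '300', so convert it.
    imageWidth: image && image.width ? Number(image.width) : null,
  };
}
```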
Options
| Name | Info | Default Value | Required |
|---|---|---|---|
| url | URL of the site. Required unless `html` is provided. | | x |
| timeout | Timeout of the request. | 2000 ms | |
| html | An HTML string to run ogs on instead of fetching a page (use without `url`). | | |
| blacklist | An array of sites you don't want ogs to run on. | [] | |
| onlyGetOpenGraphInfo | Only fetch Open Graph info; don't fall back on anything else. | false | |
| ogImageFallback | Fetch other images if no Open Graph ones are found. | true | |
| customMetaTags | Custom meta tags you want to scrape. | [] | |
| allMedia | Return every image/video found. By default, ogs only returns the first one. | false | |
| decompress | Set the accept-encoding header to gzip/deflate. | true | |
| followRedirect | Whether redirect responses are followed automatically. | true | |
| maxRedirects | Maximum number of redirects ogs will follow. | 10 | |
| retry | Number of times ogs will retry the request. | 2 | |
| headers | An object containing request headers. Useful for setting the user-agent. | {} | |
| peekSize | Sets the peekSize for the request. | 1024 | |
| agent | Used for proxies; see the Proxy Example below. | null | |
| downloadLimit | Maximum size of the content downloaded from the server, in bytes. | 1000000 (1 MB) | |
| urlValidatorSettings | Options passed to validator.js when testing the URL. | Here | |
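For reference, the documented defaults can be collected into one object. The sketch below is a hypothetical pre-flight helper, not how ogs merges options internally. It applies the defaults from the table (leaving out `urlValidatorSettings`, whose default is not listed) and enforces the url/html rule that exactly one of the two is set. The names `DEFAULT_OPTIONS` and `withDefaults` are made up for illustration.

```javascript
// Defaults copied from the options table (timeout in ms, downloadLimit in bytes).
const DEFAULT_OPTIONS = {
  timeout: 2000,
  blacklist: [],
  onlyGetOpenGraphInfo: false,
  ogImageFallback: true,
  customMetaTags: [],
  allMedia: false,
  decompress: true,
  followRedirect: true,
  maxRedirects: 10,
  retry: 2,
  headers: {},
  peekSize: 1024,
  agent: null,
  downloadLimit: 1000000,
};

// Hypothetical helper: fill in defaults and check that exactly one of url/html is set.
function withDefaults(userOptions) {
  const hasUrl = Boolean(userOptions.url);
  const hasHtml = Boolean(userOptions.html);
  if (hasUrl === hasHtml) {
    throw new Error('Provide either `url` or `html`, but not both.');
  }
  // Copy the array/object defaults so callers never mutate the shared ones;
  // user-supplied values are spread last, so they win over defaults.
  return {
    ...DEFAULT_OPTIONS,
    blacklist: [...DEFAULT_OPTIONS.blacklist],
    customMetaTags: [...DEFAULT_OPTIONS.customMetaTags],
    headers: { ...DEFAULT_OPTIONS.headers },
    ...userOptions,
  };
}
```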
Note: `open-graph-scraper` uses got for requests, and most of got's options should also work as `open-graph-scraper` options.
Custom Meta Tag Example
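```javascript
const ogs = require('open-graph-scraper');

const options = {
  url: 'https://github.com/jshemas/openGraphScraper',
  customMetaTags: [{
    multiple: false, // return a single value rather than an array
    property: 'hostname', // the meta tag property to look for
    fieldName: 'hostnameMetaTag', // the key the value is stored under in the result
  }],
};

ogs(options)
  .then((data) => {
    const { error, result, response } = data;
    console.log('hostnameMetaTag:', result.hostnameMetaTag);
  })
  .catch((err) => console.error('request failed:', err));
```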
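To make the `property` / `fieldName` / `multiple` fields concrete, here is a deliberately simplified, regex-based sketch of the idea: find `<meta>` tags whose property matches, then store their `content` under `fieldName`, as an array when `multiple` is true. ogs itself does not work this way internally, and several details here are assumptions: matching on `name` as well as `property`, and keeping the first match when `multiple` is false. `parseAttributes` and `extractCustomTags` are hypothetical names.

```javascript
// Simplified illustration only: real HTML should go through an HTML parser.
function parseAttributes(tag) {
  const attrs = {};
  const attrRe = /([\w:-]+)\s*=\s*(?:"([^"]*)"|'([^']*)')/g;
  let m;
  while ((m = attrRe.exec(tag)) !== null) {
    // Group 2 holds double-quoted values, group 3 single-quoted ones.
    attrs[m[1].toLowerCase()] = m[2] !== undefined ? m[2] : m[3];
  }
  return attrs;
}

// Hypothetical helper mirroring the customMetaTags idea:
// { property, fieldName, multiple } -> result[fieldName]
function extractCustomTags(html, customMetaTags) {
  const metas = (html.match(/<meta\b[^>]*>/gi) || []).map(parseAttributes);
  const result = {};
  for (const { property, fieldName, multiple } of customMetaTags) {
    const values = metas
      .filter((a) => a.property === property || a.name === property)
      .map((a) => a.content)
      .filter((c) => c !== undefined);
    if (values.length === 0) continue; // leave the field unset when nothing matches
    // multiple: true keeps every match; otherwise keep the first one.
    result[fieldName] = multiple ? values : values[0];
  }
  return result;
}
```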
Proxy Example
Proxies are set up through the `agent` option, which is handed to got; see got's documentation for more on using proxies.
```javascript
const ogs = require('open-graph-scraper');
const tunnel = require('tunnel');

// Placeholder: replace with your proxy's port (and 'proxy_ip' below with its host).
const proxyPort = 8080;

const options = {
  url: 'https://whatismyipaddress.com/',
  timeout: 15000,
  agent: {
    https: tunnel.httpsOverHttp({
      proxy: {
        host: 'proxy_ip',
        port: proxyPort,
        rejectUnauthorized: false,
      },
    }),
  },
};

ogs(options)
  .then((data) => {
    const { error, result, response } = data;
    console.log('response:', response);
  })
  .catch((err) => console.error('request failed:', err));
```
User Agent Example
```javascript
const ogs = require('open-graph-scraper');

const options = {
  url: 'https://twitter.com/elonmusk/status/1364826301027115008',
  headers: {
    'user-agent': 'Googlebot/2.1 (+http://www.google.com/bot.html)',
  },
};

ogs(options, (error, results) => {
  console.log('error:', error);
  console.log('results:', results);
});
```
Tests
After installing the dependencies, you can run the tests with:

```bash
npm run test
```