open-graph-scraper

Node.js scraper module for Open Graph and Twitter Card info


openGraphScraper

A simple Node.js module (with TypeScript declarations) for scraping Open Graph, Twitter Card, and other metadata from a site.

Note: open-graph-scraper doesn't support browser usage at this time, but you can use open-graph-scraper-lite if you already have the HTML and can't use Node's Fetch API.

Installation

npm install open-graph-scraper --save

Usage

const ogs = require('open-graph-scraper');
const options = { url: 'http://ogp.me/' };
ogs(options)
  .then((data) => {
    const { error, html, result, response } = data;
    console.log('error:', error);  // This returns true or false. True if there was an error. The error itself is inside the result object.
    console.log('html:', html); // This contains the HTML of the page
    console.log('result:', result); // This contains all of the Open Graph results
    console.log('response:', response); // This contains the response from the Fetch API
  });

Results JSON

Check the result for a success flag: if success is true, the URL input was valid; otherwise it is false. The example above returns something like this:

{
  ogTitle: 'Open Graph protocol',
  ogType: 'website',
  ogUrl: 'https://ogp.me/',
  ogDescription: 'The Open Graph protocol enables any web page to become a rich object in a social graph.',
  ogImage: [
    {
      height: '300',
      type: 'image/png',
      url: 'https://ogp.me/logo.png',
      width: '300'
    }
  ],
  charset: 'utf-8',
  requestUrl: 'http://ogp.me/',
  success: true
}
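The error field is only a true/false flag, so it can help to inspect the resolved object in one place before using it. The following is a minimal sketch of a hypothetical helper, describeResult, which is not part of open-graph-scraper. It assumes the { error, result } shape from the Usage section and the sample result above. It also assumes that failure details, when present, are found in result.error.

```javascript
// Hypothetical helper (not part of open-graph-scraper). Assumes the
// { error, result } shape shown above; failure details are assumed to
// live in result.error when present.
function describeResult(data) {
  const { error, result } = data;
  // Treat a true error flag or a missing/false success flag as failure.
  if (error || !result || result.success !== true) {
    return { ok: false, message: (result && result.error) || 'unknown error' };
  }
  // ogImage is an array of image objects; take the first url if any.
  const image = Array.isArray(result.ogImage) && result.ogImage.length > 0
    ? result.ogImage[0].url
    : undefined;
  return { ok: true, title: result.ogTitle, image };
}

// Usage with a trimmed copy of the sample result above.
const sample = {
  error: false,
  result: {
    ogTitle: 'Open Graph protocol',
    ogImage: [{ url: 'https://ogp.me/logo.png', type: 'image/png' }],
    success: true,
  },
};
console.log(describeResult(sample));
```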

Options

| Name | Info | Default Value | Required |
| --- | --- | --- | --- |
| url | URL of the site. | | x (unless html is used) |
| html | An HTML string to run ogs on. Use instead of options.url. | | |
| fetchOptions | Options that are passed to the Fetch API. | {} | |
| timeout | Request timeout for Fetch, in seconds. | 10 | |
| blacklist | An array of sites you don't want ogs to run on. | [] | |
| onlyGetOpenGraphInfo | Only fetch Open Graph info and don't fall back on anything else. Also accepts an array of properties for which no fallback should be used. | false | |
| customMetaTags | Custom meta tags you want to scrape. | [] | |
| urlValidatorSettings | Options used by validator.js when testing the URL. | See the library source | |

Note: open-graph-scraper uses the Fetch API for requests, so most of Fetch's options should also work when passed as open-graph-scraper's fetchOptions.
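The options in the table can be assembled in one place before calling ogs(). Below is a minimal sketch using a hypothetical buildOgsOptions helper, which is not part of the library and covers only a subset of the options. It treats passing both url and html (or neither) as an error. That reading of the table's "use without options.url" note is this sketch's own choice; the library itself may behave differently. Options you don't pass are left unset, so the library defaults listed above still apply.

```javascript
// Hypothetical helper (not part of open-graph-scraper): builds an options
// object for ogs() from a subset of the documented options.
function buildOgsOptions({
  url,
  html,
  timeoutSeconds,
  blacklist,
  onlyGetOpenGraphInfo,
  fetchOptions,
} = {}) {
  // This sketch requires exactly one of url or html (its own rule).
  if ((url === undefined) === (html === undefined)) {
    throw new Error('Provide exactly one of url or html');
  }
  const options = url !== undefined ? { url } : { html };
  // Only set what the caller passed, so library defaults otherwise apply.
  if (timeoutSeconds !== undefined) options.timeout = timeoutSeconds; // seconds, default 10
  if (blacklist !== undefined) options.blacklist = blacklist; // default []
  if (onlyGetOpenGraphInfo !== undefined) {
    options.onlyGetOpenGraphInfo = onlyGetOpenGraphInfo; // false, true, or an array
  }
  if (fetchOptions !== undefined) options.fetchOptions = fetchOptions; // forwarded to Fetch
  return options;
}

// Usage: pass the returned object to ogs() as in the Usage section.
const options = buildOgsOptions({
  url: 'http://ogp.me/',
  timeoutSeconds: 5,
  fetchOptions: { headers: { 'accept-language': 'en-US' } },
});
console.log(options);
```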

Types And Import Example

// example of how to get types
import type { SuccessResult } from 'open-graph-scraper/types';
const example: SuccessResult = {
  result: { ogTitle: 'this is a title' },
  error: false,
  response: {},
  html: '<html></html>'
}

// import example
import ogs from 'open-graph-scraper';
const options = { url: 'http://ogp.me/' };
ogs(options)
  .then((data) => {
    const { error, html, result, response } = data;
    console.log('error:', error);  // This returns true or false. True if there was an error. The error itself is inside the result object.
    console.log('html:', html); // This contains the HTML of the page
    console.log('result:', result); // This contains all of the Open Graph results
    console.log('response:', response); // This contains the response from the Fetch API
  });

Custom Meta Tag Example

const ogs = require('open-graph-scraper');
const options = {
  url: 'https://github.com/jshemas/openGraphScraper',
  customMetaTags: [{
    multiple: false, // is there more than one of these tags on a page (normally this is false)
    property: 'hostname', // meta tag name/property attribute
    fieldName: 'hostnameMetaTag', // name of the result variable
  }],
};
ogs(options)
  .then((data) => {
    const { result } = data;
    console.log('hostnameMetaTag:', result.customMetaTags.hostnameMetaTag); // hostnameMetaTag: github.com
  })

HTML Example

const ogs = require('open-graph-scraper');
const options = {
  html: `<html><head>
  <link rel="icon" type="image/png" href="https://bar.com/foo.png" />
  <meta charset="utf-8" />
  <meta property="og:description" name="og:description" content="html description example" />
  <meta property="og:image" name="og:image" content="https://www.foo.com/bar.jpg" />
  <meta property="og:title" name="og:title" content="foobar" />
  <meta property="og:type" name="og:type" content="website" />
  </head></html>`
};
ogs(options)
  .then((data) => {
    const { result } = data;
    console.log('result:', result);
    // result: {
    //   ogDescription: 'html description example',
    //   ogTitle: 'foobar',
    //   ogType: 'website',
    //   ogImage: [ { url: 'https://www.foo.com/bar.jpg', type: 'jpg' } ],
    //   favicon: 'https://bar.com/foo.png',
    //   charset: 'utf-8',
    //   success: true
    // }
  })

User Agent Example

By default, the user-agent request header is set to undici. Some sites block this, so setting a different user agent may help. If it doesn't, you can fetch the page through a proxy and then pass the HTML to open-graph-scraper with the html option.

const ogs = require('open-graph-scraper');
const userAgent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36';
ogs({ url: 'https://www.wikipedia.org/', fetchOptions: { headers: { 'user-agent': userAgent } } })
  .then((data) => {
    const { error, html, result, response } = data;
    console.log('error:', error);  // This returns true or false. True if there was an error. The error itself is inside the result object.
    console.log('html:', html); // This contains the HTML of the page
    console.log('result:', result); // This contains all of the Open Graph results
    console.log('response:', response); // This contains the response from the Fetch API
  });
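The proxy route described above means fetching the page yourself and handing only the HTML to open-graph-scraper. Below is a minimal sketch that assumes Node 18+ for the global fetch. The names scrapeFromOwnFetch, fetchImpl, ogsImpl, and headers are all hypothetical. Injecting the fetch and ogs functions lets you swap in a proxy-aware fetch, and it also lets you exercise the flow without network access.

```javascript
// Hypothetical helper: fetch the page yourself (fetchImpl can be any
// proxy-aware fetch), then let open-graph-scraper parse the HTML.
// fetchImpl defaults to Node's global fetch (Node 18+); ogsImpl would
// normally be require('open-graph-scraper').
async function scrapeFromOwnFetch(url, { fetchImpl = globalThis.fetch, ogsImpl, headers = {} } = {}) {
  const response = await fetchImpl(url, { headers });
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  const html = await response.text();
  // Pass only html (not url), as the options table describes.
  return ogsImpl({ html });
}

// Usage (needs network access and the open-graph-scraper package;
// userAgent is the string from the example above):
// const ogs = require('open-graph-scraper');
// scrapeFromOwnFetch('https://www.wikipedia.org/', { ogsImpl: ogs, headers: { 'user-agent': userAgent } })
//   .then(({ result }) => console.log(result.ogTitle))
//   .catch(console.error);
```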

Running the example app

The example folder contains a simple Express app. To start it, run npm ci && npm run start inside that folder. Once the app is running, open http://localhost:3000/scraper?url=http://ogp.me/ in a web browser to test it. There is also a Dockerfile if you want to run the example app in a Docker container.
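If the page you want to test has its own query string, encode it before putting it in the url parameter; otherwise characters such as & split it apart. Below is a small sketch using the standard URL API. The name scraperTestUrl is hypothetical. The sketch assumes the example app reads the target from the url query parameter, which Express's default query parsing decodes.

```javascript
// Hypothetical helper (not part of the example app): builds a test URL for
// the /scraper endpoint. URLSearchParams percent-encodes the target, so
// targets with their own query strings (?a=1&b=2) arrive intact.
function scraperTestUrl(targetUrl, base = 'http://localhost:3000') {
  const url = new URL('/scraper', base);
  url.searchParams.set('url', targetUrl);
  return url.href;
}

console.log(scraperTestUrl('http://ogp.me/'));
// http://localhost:3000/scraper?url=http%3A%2F%2Fogp.me%2F
console.log(scraperTestUrl('https://example.com/page?a=1&b=2'));
// http://localhost:3000/scraper?url=https%3A%2F%2Fexample.com%2Fpage%3Fa%3D1%26b%3D2
```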

Package last updated on 29 Aug 2024