metaweb

get some metadata for a web page

Version: 0.1.0 (latest)
Source: npm
Weekly downloads: 1
Maintainers: 1


metaweb extracts metadata for a web page. Only metadata for the page itself is extracted, not metadata for items within the page. metaweb attempts to extract common metadata from standard HTML, Twitter Cards, and Facebook's Open Graph Protocol. It is not meant to be perfect or to adhere to any particular overarching standard; it exists to scratch a particular itch I had at the time. If you have your own itch to scratch, please open an issue.

The name pays homage to Metaweb, one of the more forward-looking startups, which created one of the first community-driven entity databases on the web.

Install

npm install metaweb

Command Line

When you install metaweb you will get a command line program:

% metaweb http://www.washingtonpost.com/wp-srv/special/politics/prism-collection-documents/
{
  "url": "http://www.washingtonpost.com/wp-srv/special/politics/prism-collection-documents/",
  "canonical": "http://www.washingtonpost.com/wp-srv/special/politics/prism-collection-documents/",
  "status": 200,
  "content_type": "text/html",
  "title": "NSA slides explain the PRISM data-collection program - The Washington Post",
  "description": "Through a Top-Secret program authorized by federal judges working under the Foreign Intelligence Surveillance Act (FISA), the U.S. intelligence community can gain access to the servers of nine internet companies for a wide range of digital data. Documents describing the previously undisclosed program, obtained by The Washington Post, show the breadth of U.S. electronic surveillance capabilities.",
  "image": "http://www.washingtonpost.com/wp-srv/special/politics/prism-collection-documents/images/upstream-promo-296.jpg"
}

Use the --includeRaw parameter to include all the raw meta and link content:

% metaweb http://www.washingtonpost.com/wp-srv/special/politics/prism-collection-documents/ --includeRaw
{
  "url": "http://www.washingtonpost.com/wp-srv/special/politics/prism-collection-documents/",
  "canonical": "http://www.washingtonpost.com/wp-srv/special/politics/prism-collection-documents/",
  "status": 200,
  "content_type": "text/html",
  "title": "NSA slides explain the PRISM data-collection program - The Washington Post",
  "description": "Through a Top-Secret program authorized by federal judges working under the Foreign Intelligence Surveillance Act (FISA), the U.S. intelligence community can gain access to the servers of nine internet companies for a wide range of digital data. Documents describing the previously undisclosed program, obtained by The Washington Post, show the breadth of U.S. electronic surveillance capabilities.",
  "image": "http://www.washingtonpost.com/wp-srv/special/politics/prism-collection-documents/images/upstream-promo-296.jpg",
  "raw": {
    "link": {
      "canonical": [
        "http://www.washingtonpost.com/wp-srv/special/politics/prism-collection-documents/"
      ],
      "shorturl": [
        "http://www.washingtonpost.com/wp-srv/special/politics/prism-collection-documents/"
      ],
      "stylesheet": [
        "http://css.wpdigital.net/wpost/css/combo?context=eidos&c=true&m=true&r=/2.0.0/reset.css&r=/2.0.0/structure.css&r=/2.0.0/header.css&r=/2.0.0/footer.css&r=/2.0.0/right-rail.css&r=/2.0.0/rules.css&r=/2.0.0/forms.css&r=/2.0.0/base.css&r=/2.0.0/flipper.css&r=/2.0.0/modules.css&r=/2.0.0/wsodEWA.css&r=/2.0.0/ads.css&r=/2.0.0/fonts/font_FranklinITCProBold.css",
        "http://css.wpdigital.net/wp-srv/graphics/css/pretty-comments.css",
        "http://css.wpdigital.net/wp-srv/graphics/css/staticbase-2.0.css",
        "http://www.washingtonpost.com/wp-srv/special/politics/prism-collection-documents/css/prism.css"
      ]
    },
    "meta": {
      "twitter:title": [
        "NSA slides explain the PRISM data-collection program"
      ],
      "description": [
        "Through a Top-Secret program authorized by federal judges working under the Foreign Intelligence Surveillance Act (FISA), the U.S. intelligence community can gain access to the servers of nine internet companies for a wide range of digital data. Documents describing the previously undisclosed program, obtained by The Washington Post, show the breadth of U.S. electronic surveillance capabilities."
      ],
      "twitter:description": [
        "Through a Top-Secret program authorized by federal judges working under the Foreign Intelligence Surveillance Act (FISA), the U.S. intelligence community can gain access to the servers of nine internet companies for a wide range of digital data. Documents describing the previously undisclosed program, obtained by The Washington Post, show the breadth of U.S. electronic surveillance capabilities."
      ],
      "keywords": [
        "nsa, security, privacy, government data collection, nsa data collection, nsa prism program, prism data collection, prism program"
      ],
      "twitter:url": [
        "http://www.washingtonpost.com/wp-srv/special/politics/prism-collection-documents/"
      ],
      "og:image": [
        "http://www.washingtonpost.com/wp-srv/special/politics/prism-collection-documents/images/upstream-promo-296.jpg"
      ],
      "twitter:image": [
        "http://www.washingtonpost.com/wp-srv/special/politics/prism-collection-documents/images/upstream-promo-296.jpg"
      ],
      "twitter:site": [
        "@postgraphics"
      ],
      "twitter:card": [
        "summary"
      ],
      "fb:app_id": [
        "41245586762"
      ],
      "og:site_name": [
        "The Washington Post"
      ]
    },
    "title": "NSA slides explain the PRISM data-collection program - The Washington Post"
  }
}
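The raw block groups every meta value into an array keyed by the tag's name, as the output above shows. Here is a minimal sketch of reading it, assuming other pages produce the same shape. `firstMeta` is a hypothetical helper written for illustration and is not part of metaweb.

```javascript
// Hypothetical helper (not part of metaweb): return the first value found
// under any of the given keys in result.raw.meta, or null if none exist.
function firstMeta(result, keys) {
  const meta = (result && result.raw && result.raw.meta) || {}
  for (const key of keys) {
    const values = meta[key]
    if (Array.isArray(values) && values.length > 0) {
      return values[0]
    }
  }
  return null
}

// Sample data trimmed from the output above.
const sample = {
  raw: {
    meta: {
      'twitter:title': ['NSA slides explain the PRISM data-collection program'],
      'og:site_name': ['The Washington Post']
    }
  }
}

console.log(firstMeta(sample, ['og:title', 'twitter:title']))
// prints: NSA slides explain the PRISM data-collection program
```

The key order passed to `firstMeta` is the caller's choice. It says nothing about the precedence metaweb itself uses when it fills in the top-level title, description, and image.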

JavaScript

You will usually want to use metaweb as a library in your own JavaScript applications:

const metaweb = require('metaweb')

// url is the address of the page to inspect
metaweb.get(url).then((metadata) => {
  // do something with the metadata
})
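The same call can be written with async/await. The sketch below keeps only the summary fields shown in the command line output and takes the library as an argument so it can be tried without network access. `fetchSummary` is a hypothetical name, and the error handling rests on an assumption: that the promise rejects on failure, which this README does not state.

```javascript
// Sketch: fetch metadata and keep only the summary fields.
// `client` is anything with a promise-returning get(url), for example the
// object returned by require('metaweb').
async function fetchSummary(client, url) {
  try {
    const metadata = await client.get(url)
    return {
      title: metadata.title,
      description: metadata.description,
      image: metadata.image
    }
  } catch (err) {
    // Assumption: failures surface as a rejected promise. This sketch
    // logs the problem and reports "no metadata" as null.
    console.error(`could not get metadata for ${url}: ${String(err)}`)
    return null
  }
}
```

With the package installed, a call would look like `fetchSummary(require('metaweb'), url).then(console.log)`.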

To also get the raw link and meta content, use the includeRaw parameter:

metaweb.get(url, true) // the second argument is includeRaw
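Once the raw content is included, fields that have no top-level equivalent, such as keywords, can be read from it. Here is a minimal sketch, assuming the raw shape shown in the command line output. `keywordsOf` is a hypothetical helper, not part of metaweb.

```javascript
// Hypothetical helper (not part of metaweb): split the comma-separated
// keywords meta value into a list. Returns [] when the raw data is absent,
// for example when includeRaw was not set.
function keywordsOf(result) {
  const values = result && result.raw && result.raw.meta && result.raw.meta.keywords
  if (!Array.isArray(values)) return []
  return values
    .flatMap((value) => value.split(','))
    .map((keyword) => keyword.trim())
    .filter((keyword) => keyword.length > 0)
}

const example = { raw: { meta: { keywords: ['nsa, security, privacy'] } } }
console.log(keywordsOf(example))
// prints: [ 'nsa', 'security', 'privacy' ]
```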


Package last updated on 05 Feb 2022
