
site-metadata-extractor


web(site) resource metadata extractor

1.0.7
latest

Version published: 9 months ago

Weekly downloads: 4; increased by 33.33%

Maintainers: 1


Created: 2 years ago


Site Metadata Extractor

Cleans and extracts a web(site) resource's metadata.

Metadata extraction fields currently supported:

Name	Data Type
author	array (jsonb)
canonical_url	string
copyright	string
date (publish date)	date
description	text
favicon	text
image (primary/og image)	text
jsonld (structured data)	object (jsonb)
keywords	array (jsonb)
lang	string
locale	string
origin	string
publisher	string
site_name	string
tags	array (jsonb)
title	string
type	string
truncated_text	text
status	string
videos	array (jsonb)
links	array (jsonb)
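
The field names in the table can be used to select a subset of the result. The following is a minimal sketch, assuming the extractor returns a plain object keyed by the names above; `pickFields` and the sample object are hypothetical and not part of the package.

```javascript
// Hypothetical helper: not part of site-metadata-extractor.
// Assumes the extractor's result is a plain object keyed by the table's field names.
const pickFields = (metadata, fields) => {
  const source = metadata ?? {};
  return Object.fromEntries(
    fields
      .filter((field) => field in source)
      .map((field) => [field, source[field]])
  );
};

// Hand-written sample for illustration only; not real extractor output.
const sample = { title: "Example", lang: "en", keywords: ["a", "b"] };

// Fields missing from the object (here site_name) are skipped.
console.log(pickFields(sample, ["title", "lang", "site_name"]));
```

Skipping absent fields rather than filling them with `undefined` keeps the result safe to serialize or store as-is.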

Install

NPM:

$ npm install site-metadata-extractor --save

Yarn:

$ yarn add site-metadata-extractor

Usage

Pass in the raw markup of a web page to get the extracted metadata fields.

From .html file:

import fs from "fs";
import path from "path";
import siteMetadataExtractor from "site-metadata-extractor";

const getMetadataFromFile = (filename) => {
  // __dirname is available in CommonJS or transpiled builds
  const filepath = path.resolve(__dirname, `../data/${filename}.html`);
  const markup = fs.readFileSync(filepath, "utf8");
  // feel free to use localhost as the second parameter for testing
  const metadata = siteMetadataExtractor(markup, "YOUR_SITE_ORIGIN_HERE");
  return metadata;
};

getMetadataFromFile("example");
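
The second argument in the example above is a placeholder for a site origin. As a sketch of one way to obtain it, assuming the extractor expects a value such as `https://example.com`, the origin can be derived from a full page URL with the standard `URL` class. `getOrigin` is a hypothetical helper, not part of the package.

```javascript
// Hypothetical helper: derives a site origin (scheme + host + port) from a page URL.
// URL is a global in Node.js and in browsers, so no import is needed.
const getOrigin = (pageUrl) => {
  try {
    return new URL(pageUrl).origin;
  } catch {
    return null; // the input was not an absolute URL
  }
};

console.log(getOrigin("http://localhost:3000/articles/example.html"));
// prints "http://localhost:3000"
```

Returning `null` for unparseable input lets the caller decide whether to fall back to a default such as localhost.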

From a server request:

import axios from 'axios';
import siteMetadataExtractor from 'site-metadata-extractor';

// Fetch a page and return its markup only when the response is HTML
const processSite = async (url) => {
  try {
    const res = await axios.get(url);
    // the header may be missing, so fall back to an empty string
    const contentType = res.headers['content-type'] || '';
    if (contentType.includes('text/html')) {
      return { body: res.data, url };
    }
    return {};
  } catch (err) {
    console.log(err);
    return {};
  }
};

processSite('https://www.cnbc.com/guide/personal-finance-101-the-complete-guide-to-managing-your-money/')
  .then((data) => {
    // ...
    // pass the markup string (data.body), not the wrapper object
    siteMetadataExtractor(data.body, "https://www.cnbc.com/guide/personal-finance-101-the-complete-guide-to-managing-your-money/", "en");
    // ...
  });
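
The request example only hands HTML responses to the extractor. A slightly more defensive version of that check is sketched below; `isHtml` is a hypothetical helper, not part of the package or of axios, and it takes the raw `Content-Type` header value as its input.

```javascript
// Hypothetical helper: true only when a Content-Type header value names HTML.
// Tolerates a missing header and ignores case and parameters such as charset.
const isHtml = (contentType) => {
  if (typeof contentType !== "string") return false;
  const mediaType = contentType.split(";")[0].trim().toLowerCase();
  return mediaType === "text/html";
};

console.log(isHtml("text/html; charset=utf-8")); // true
console.log(isHtml("application/json"));         // false
console.log(isHtml(undefined));                  // false
```

Comparing the parsed media type instead of using a substring match avoids a false positive when `text/html` appears only inside a parameter.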

Development

Run: git clone https://github.com/sc10ntech/site-metadata-extractor.git
Change into project directory and install deps: cd site-metadata-extractor && npm i

Credits & Disclaimer

site-metadata-extractor was inspired by, and tries to be the spiritual successor to, node-unfluff.


Package last updated on 10 Apr 2024


1.0.7 (2024-04-10)
