metascraper
A library to easily scrape metadata from an article on the web using Open Graph metadata, regular HTML metadata, and a series of fallbacks.
This metadata...
{
  "author": "Ellen Huet",
  "date": "2016-05-24T18:00:03.894Z",
  "description": "The HR startups go to war.",
  "image": "https://assets.bwbx.io/images/users/iqjWHBFdfxIU/ioh_yWEn8gHo/v1/-1x-1.jpg",
  "publisher": "Bloomberg.com",
  "title": "As Zenefits Stumbles, Gusto Goes Head-On by Selling Insurance"
}
...would be scraped from this article...
scrapeUrl(url, [rules])
import { scrapeUrl } from 'metascraper'
const metadata = await scrapeUrl('http://www.bloomberg.com/news/articles/2016-05-24/as-zenefits-stumbles-gusto-goes-head-on-by-selling-insurance')
Scrapes the url with matching rules.
scrapeHtml(html, [rules])
import { scrapeHtml } from 'metascraper'
// fetch returns a promise, and the HTML string comes from res.text(), not res.body
const res = await fetch('http://www.bloomberg.com/news/articles/2016-05-24/as-zenefits-stumbles-gusto-goes-head-on-by-selling-insurance')
const html = await res.text()
const metadata = await scrapeHtml(html)
Scrapes the html string with matching rules.
scrapeWindow(window, [rules])
import { scrapeWindow } from 'metascraper'
const metadata = await scrapeWindow(window)
Scrapes the window object with matching rules.
Scraping rules are just asynchronous functions that get passed the window object and return a promise that resolves with the value of the metadata. Like so:
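async function rule(window) {
  // document.title is a plain string, so look up the <title> element instead
  const el = window.document.querySelector('title')
  return el ? el.textContent : null
}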
In the browser, window will be the global you'd expect, and server-side it would be a JSDOM instance.
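Because a rule only needs something that behaves like a window, a custom rule can be tried out without a browser or JSDOM. The sketch below is not one of the library's rules: titleRule and fakeWindow are hypothetical names introduced here for illustration. It prefers the Open Graph title and falls back to the title element.

```javascript
// Hypothetical custom rule: prefer the Open Graph title, fall back to <title>.
async function titleRule(window) {
  const doc = window.document
  const og = doc.querySelector('meta[property="og:title"]')
  const content = og ? og.getAttribute('content') : null
  if (content && content.trim()) return content.trim()
  const el = doc.querySelector('title')
  return el ? el.textContent.trim() : null
}

// Minimal stand-in for a window (an assumption, only for exercising the rule).
// It maps exact selector strings to node-like objects.
function fakeWindow(nodes) {
  return { document: { querySelector: (selector) => nodes[selector] || null } }
}

// Usage: no Open Graph tag is present, so the rule falls back to <title>.
const win = fakeWindow({ title: { textContent: ' Example article ' } })
titleRule(win).then((title) => console.log(title))
```

Returning null when nothing matches is a design choice of this sketch, so a caller can tell "no value" apart from an empty string.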
Passing rules is optional, and the defaults have been configured for the best results when scraping web articles. But if you want to tweak the results, or want to scrape for additional metadata, you can pass in additional rules, or completely overwrite them.
For an idea of how rules work, check out the default rules.
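The text above does not spell out the shape of the rules argument, so the following is only a sketch under an assumed shape: an object that maps each metadata key to an ordered list of rule functions, tried in turn as fallbacks. defaultRules, mergeRules and applyRules are hypothetical names, not part of metascraper's API. The sketch shows the two options described above: adding a rule for a new key, or replacing the rules for an existing key.

```javascript
// Assumed shape (not confirmed by the library): key -> ordered list of rules.
const defaultRules = {
  title: [async (window) => window.document.title || null]
}

// Merge extra rules over the defaults; a repeated key replaces its default list.
function mergeRules(defaults, overrides) {
  return { ...defaults, ...overrides }
}

// Try each rule for a key in order and keep the first non-empty value.
async function applyRules(window, rules) {
  const metadata = {}
  for (const [key, candidates] of Object.entries(rules)) {
    metadata[key] = null
    for (const rule of candidates) {
      const value = await rule(window)
      if (value) {
        metadata[key] = value
        break
      }
    }
  }
  return metadata
}

// Usage with a stub window: keep the default title rule and add a language rule.
const stubWindow = { document: { title: 'Example article', documentElement: { lang: 'en' } } }
const rules = mergeRules(defaultRules, {
  language: [async (window) => window.document.documentElement.lang || null]
})
applyRules(stubWindow, rules).then((metadata) => console.log(metadata))
```

Keeping the rules for each key in an ordered list is what makes the fallback behaviour explicit: an earlier rule that finds nothing simply hands over to the next one.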