
# web-meta-scraper
A URL scraper for extracting various metadata, including Open Graph, JSON-LD, and more
English | 한국어
A lightweight, plugin-based TypeScript library for extracting web page metadata. Supports Open Graph, Twitter Cards, JSON-LD, oEmbed, and standard meta tags with smart priority-based merging.
| Feature | web-meta-scraper | metascraper | open-graph-scraper |
|---|---|---|---|
| Dependencies | 1 (cheerio) | 10+ | 4+ |
| Bundle size | ~5KB min+gzip | ~50KB+ | ~15KB+ |
| Plugin system | Composable plugins | Rule-based | Monolithic |
| Custom plugins | Simple function | Complex rules | Not supported |
| TypeScript | First-class | Partial | Partial |
| oEmbed support | Built-in plugin | Separate package | Not supported |
| Custom resolve rules | Configurable priority | Fixed | Fixed |
| Native fetch | Uses native fetch() | Uses got | Uses undici |
- Uses native `fetch()` for HTTP requests.
- Fully typed: exports `ResolvedMetadata`, `ScraperResult`, and plugin types.
- Returns resolved `metadata` and the raw sources from each plugin for full transparency.

## Installation

```bash
npm install web-meta-scraper
# or
pnpm add web-meta-scraper
# or
yarn add web-meta-scraper
# or
bun add web-meta-scraper
```
## scrape()

The easiest way to get started. Auto-detects URL vs HTML and uses all built-in plugins:
```typescript
import { scrape } from 'web-meta-scraper';

// From a URL
const result = await scrape('https://example.com');

// From an HTML string
const htmlResult = await scrape('<html><head><title>Hello</title></head></html>');

console.log(result.metadata);
// {
//   title: "Example",
//   description: "An example page",
//   image: "https://example.com/og-image.png",
//   url: "https://example.com",
//   type: "website",
//   siteName: "Example",
//   ...
// }

// Raw plugin outputs are also available
console.log(result.sources);
// { "open-graph": { title: "Example", ... }, "meta-tags": { ... }, ... }
```
## createScraper()

For full control over plugins, resolve rules, fetch options, and post-processing:
```typescript
import { createScraper, metaTags, openGraph, twitter, jsonLd, oembed } from 'web-meta-scraper';

const scraper = createScraper({
  plugins: [metaTags, openGraph, twitter, jsonLd, oembed],
  fetch: {
    timeout: 10000,
    userAgent: 'MyBot/1.0',
  },
  postProcess: {
    maxDescriptionLength: 150,
    secureImages: true,
  },
});

// Scrape from a URL
const result = await scraper.scrapeUrl('https://example.com');

// Or parse raw HTML
const htmlResult = await scraper.scrape(html, { url: 'https://example.com' });
```
## Built-in plugins

| Plugin | Import | Extracts |
|---|---|---|
| Meta Tags | metaTags | title, description, keywords, author, favicon, canonicalUrl |
| Open Graph | openGraph | og:title, og:description, og:image, og:url, og:type, og:site_name, og:locale |
| Twitter Cards | twitter | twitter:title, twitter:description, twitter:image, twitter:card, twitter:site, twitter:creator |
| JSON-LD | jsonLd | Structured data (Article, Product, Organization, FAQPage, BreadcrumbList, etc.) |
| oEmbed | oembed | oEmbed data (title, author_name, thumbnail_url, html, etc.) |
| Favicons | favicons | All icon links (icon, apple-touch-icon, mask-icon, manifest) with sizes and type |
| Feeds | feeds | RSS (application/rss+xml) and Atom (application/atom+xml) feed links with title |
| Robots | robots | Robots meta directives (noindex, nofollow, noarchive, nosnippet, etc.) with indexability flags |
| Date | date | Publication date (article:published_time, Dublin Core, JSON-LD, <time>) and modification date |
| Logo | logo | Site logo URL from og:logo, Schema.org microdata, JSON-LD Organization/Publisher |
| Lang | lang | Document language as BCP 47 tag from <html lang>, og:locale, content-language, JSON-LD |
| Video | video | Video resources from og:video, twitter:player, <video> elements, JSON-LD VideoObject |
| Audio | audio | Audio resources from og:audio, <audio> elements, JSON-LD AudioObject |
| iFrame | iframe | Embeddable iframe HTML from twitter:player with oEmbed fallback |
```typescript
// Use only what you need
const scraper = createScraper({
  plugins: [openGraph, twitter],
});
```
> **Note:** The `scrape()` shorthand uses only the core plugins (`metaTags`, `openGraph`, `jsonLd`) by default. To use other plugins like `favicons`, `feeds`, `robots`, `date`, `logo`, `lang`, `video`, `audio`, or `iframe`, pass them explicitly via `createScraper()`.
## Batch scraping

Scrape multiple URLs concurrently with `batchScrape()`. It uses a promise-based worker pool with no external dependencies, and each URL is processed independently — one failure won't stop the rest.
```typescript
import { batchScrape } from 'web-meta-scraper';

const results = await batchScrape(
  ['https://example.com', 'https://github.com', 'https://nodejs.org'],
  { concurrency: 3 },
);

for (const r of results) {
  if (r.success) {
    console.log(r.url, r.result.metadata.title);
  } else {
    console.error(r.url, r.error);
  }
}
```
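The worker-pool pattern described above can be sketched in plain TypeScript. This is an illustrative reimplementation, not the library's actual code; `pool` and `BatchResult` are invented names. N runners pull indices from a shared queue, and each task's failure is captured per item so the rest continue:

```typescript
// Per-URL outcome: either a result or a captured error.
type BatchResult<T> =
  | { url: string; success: true; result: T }
  | { url: string; success: false; error: unknown };

async function pool<T>(
  urls: string[],
  worker: (url: string) => Promise<T>,
  concurrency: number,
): Promise<BatchResult<T>[]> {
  const results: BatchResult<T>[] = new Array(urls.length);
  let next = 0;

  // Each runner loops until the shared queue is drained.
  // `next++` is safe here: JavaScript is single-threaded and there is
  // no await between the bounds check and the increment.
  const runner = async () => {
    while (next < urls.length) {
      const i = next++;
      const url = urls[i];
      try {
        results[i] = { url, success: true, result: await worker(url) };
      } catch (error) {
        results[i] = { url, success: false, error };
      }
    }
  };

  await Promise.all(
    Array.from({ length: Math.min(concurrency, urls.length) }, runner),
  );
  return results;
}
```

Because failed tasks are stored as values rather than thrown, one rejected fetch never tears down the whole batch.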
## Resolve rules

When the same field exists in multiple sources, the highest-priority value wins:
| Field | Priority (high → low) |
|---|---|
| title | Open Graph → Meta Tags → Twitter |
| description | Open Graph → Meta Tags → Twitter |
| image | Open Graph → Twitter |
| url | Open Graph → Meta Tags (canonical) |
Source-specific fields (twitterCard, siteName, locale, jsonLd, oembed, etc.) are always included directly.
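The merging behavior can be sketched in a few lines. This is an illustrative reimplementation, not the library's actual code; `resolve`, `Rule`, and `Sources` are made-up names. For each field, sources are tried from highest priority down, and the first non-empty value wins:

```typescript
type Rule = {
  field: string;
  sources: { plugin: string; key: string; priority: number }[];
};
type Sources = Record<string, Record<string, unknown>>;

function resolve(sources: Sources, rules: Rule[]): Record<string, unknown> {
  const metadata: Record<string, unknown> = {};
  for (const rule of rules) {
    // Try sources from highest priority to lowest.
    const candidates = [...rule.sources].sort((a, b) => b.priority - a.priority);
    for (const { plugin, key } of candidates) {
      const value = sources[plugin]?.[key];
      if (value != null && value !== '') {
        metadata[rule.field] = value; // first non-empty value wins
        break;
      }
    }
  }
  return metadata;
}
```

Note that a lower-priority source still contributes when the higher-priority one is missing the field entirely, which is why priority merging is a superset of simple "first plugin wins" behavior.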
You can override the default rules:
```typescript
import { createScraper, metaTags, openGraph, twitter } from 'web-meta-scraper';

const scraper = createScraper({
  plugins: [metaTags, openGraph, twitter],
  rules: [
    {
      field: 'title',
      sources: [
        { plugin: 'twitter', key: 'title', priority: 3 }, // Twitter first
        { plugin: 'open-graph', key: 'title', priority: 2 },
        { plugin: 'meta-tags', key: 'title', priority: 1 },
      ],
    },
    // ... other rules
  ],
});
```
## ScraperConfig

```typescript
import { createScraper, metaTags, openGraph, twitter, jsonLd, oembed, DEFAULT_RULES } from 'web-meta-scraper';

const scraper = createScraper({
  // Plugins to use
  plugins: [metaTags, openGraph, twitter, jsonLd, oembed],

  // Resolve rules (default: DEFAULT_RULES)
  rules: DEFAULT_RULES,

  // Fetch options (for scrapeUrl)
  fetch: {
    timeout: 30000,            // Request timeout in ms (default: 30000)
    userAgent: 'MyBot/1.0',    // Custom User-Agent header
    followRedirects: true,     // Follow HTTP redirects (default: true)
    maxContentLength: 5242880, // Max response size in bytes (default: 5MB)
  },

  // Post-processing options
  postProcess: {
    maxDescriptionLength: 200, // Truncate description (default: 200)
    secureImages: true,        // Convert image URLs to HTTPS (default: true)
    omitEmpty: true,           // Remove empty/null values (default: true)
    fallbacks: true,           // Apply fallback logic (default: true)
  },
});
```
## Stealth mode

Some websites block automated requests via TLS fingerprinting. Enable stealth mode to use HTTP/2 with a browser-like TLS fingerprint:
```typescript
const scraper = createScraper({
  plugins: [metaTags, openGraph],
  fetch: {
    stealth: true,
  },
});
```
> **Warning:** Stealth mode is disabled by default. Rapid requests with stealth mode may trigger rate limiting (e.g. JS challenge pages). Always respect `robots.txt` and site terms of service. Use responsibly.
## Fallbacks

When `fallbacks: true` (default):

- If `title` is missing, `siteName` is used instead
- If `description` is missing, it's extracted from JSON-LD structured data

## Custom plugins

A plugin is a function that receives a `ScrapeContext` and returns a `PluginResult`:
```typescript
import type { Plugin } from 'web-meta-scraper';
import { createScraper, openGraph, DEFAULT_RULES } from 'web-meta-scraper';

const pricePlugin: Plugin = (ctx) => {
  const { $ } = ctx; // Cheerio instance
  const price = $('[itemprop="price"]').attr('content');
  const currency = $('[itemprop="priceCurrency"]').attr('content');
  return {
    name: 'price',
    data: { price, currency },
  };
};

const scraper = createScraper({
  plugins: [openGraph, pricePlugin],
  rules: [
    ...DEFAULT_RULES,
    { field: 'price', sources: [{ plugin: 'price', key: 'price', priority: 1 }] },
    { field: 'currency', sources: [{ plugin: 'price', key: 'currency', priority: 1 }] },
  ],
});
```
## Error handling

```typescript
import { scrape, ScraperError } from 'web-meta-scraper';

try {
  const result = await scrape('https://example.com');
} catch (error) {
  if (error instanceof ScraperError) {
    console.error(error.message); // e.g. "Request timeout after 30000ms"
    console.error(error.cause);   // Original error, if any
  }
}
```
## Metadata validation

`validateMetadata()` scores metadata quality (0–100) and reports issues across 14 SEO rules:

```typescript
import { scrape, validateMetadata } from 'web-meta-scraper';

const result = await scrape('https://example.com');
const validation = validateMetadata(result);

console.log(validation.score); // 85
console.log(validation.issues);
// [
//   { field: "description", severity: "warning", message: "Description is too short (under 50 characters)" },
// ]
```
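The rule-based scoring idea can be sketched as follows. This is a toy version with two invented rules and arbitrary point weights — the real `validateMetadata()` applies 14 rules and its own scoring — but it shows the shape of the result: a numeric score derived from a list of structured issues:

```typescript
type Issue = { field: string; severity: 'warning' | 'error'; message: string };

function scoreMetadata(meta: { title?: string; description?: string }): {
  score: number;
  issues: Issue[];
} {
  const issues: Issue[] = [];

  // Rule 1 (invented): a title must be present.
  if (!meta.title) {
    issues.push({ field: 'title', severity: 'error', message: 'Title is missing' });
  }

  // Rule 2 (invented): a present description should be at least 50 chars.
  if (meta.description !== undefined && meta.description.length < 50) {
    issues.push({
      field: 'description',
      severity: 'warning',
      message: 'Description is too short (under 50 characters)',
    });
  }

  // Arbitrary weights: each error costs 25 points, each warning 10.
  const penalty = issues.reduce(
    (sum, i) => sum + (i.severity === 'error' ? 25 : 10),
    0,
  );
  return { score: Math.max(0, 100 - penalty), issues };
}
```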
## Content extraction

`extractContent()` strips navigation, ads, and sidebars to extract the main text content from a web page:

```typescript
import { extractContent } from 'web-meta-scraper';

const content = await extractContent('https://example.com/article');

console.log(content.content);   // "Article body text..."
console.log(content.wordCount); // 1234
console.log(content.language);  // "en"
console.log(content.metadata);  // { title: "Article Title", description: "..." }
```

Supports CJK word counting and provides `extractFromHtml()` for parsing raw HTML strings.
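CJK word counting matters because Chinese, Japanese, and Korean text has no spaces between words, so whitespace splitting undercounts badly. One common approximation — an illustrative sketch, not the library's actual algorithm — counts each CJK character as one word and splits the remaining text on whitespace:

```typescript
// Covers Hiragana/Katakana, CJK Unified Ideographs (plus Extension A),
// and Hangul syllables. Not exhaustive, but enough for illustration.
const CJK = /[\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uac00-\ud7af]/g;

function countWords(text: string): number {
  // Each CJK character counts as one word.
  const cjkChars = text.match(CJK)?.length ?? 0;
  // Everything else is split on whitespace as usual.
  const nonCjk = text.replace(CJK, ' ');
  const spaceWords = nonCjk.split(/\s+/).filter(Boolean).length;
  return cjkChars + spaceWords;
}
```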
## MCP server

`web-meta-scraper-mcp` provides an MCP (Model Context Protocol) server that exposes web-meta-scraper as tools for AI assistants like Claude Code and Claude Desktop.

**Claude Code:**

```bash
claude mcp add web-meta-scraper -- npx -y web-meta-scraper-mcp
```

**Claude Desktop / Cursor:**

Add to your config file:

```json
{
  "mcpServers": {
    "web-meta-scraper": {
      "command": "npx",
      "args": ["-y", "web-meta-scraper-mcp"]
    }
  }
}
```
| Tool | Description |
|---|---|
| scrape_url | Extract metadata from a URL (Open Graph, Twitter Cards, JSON-LD, meta tags, favicons, feeds, robots) |
| scrape_html | Extract metadata from raw HTML string with optional base URL for resolving relative paths |
| batch_scrape | Scrape metadata from multiple URLs concurrently |
| detect_feeds | Detect RSS and Atom feed links from a web page |
| check_robots | Check robots meta tag directives and indexing status |
| validate_metadata | Validate metadata quality and generate an SEO score report |
| extract_content | Extract main text content from a web page |
See the MCP package README for detailed usage and examples.
## License

MIT