
meta-oxide
Universal metadata extraction library supporting 13 formats (HTML Meta, Open Graph, Twitter Cards, JSON-LD, Microdata, Microformats, RDFa, Dublin Core, Web App Manifest, oEmbed, rel-links, Images, SEO) with 7 language bindings
The Universal Metadata Extraction Library - Blazing-fast, production-ready metadata extraction from HTML in 7 programming languages.
MetaOxide is 200-570x faster than traditional metadata extraction libraries while extracting 13 metadata formats out of the box. Built in Rust with native bindings for Python, Go, Node.js, Java, C#, and WebAssembly.
cargo add meta_oxide
use meta_oxide::MetaOxide;
let html = r#"<!DOCTYPE html>..."#;
let extractor = MetaOxide::new(html, "https://example.com")?;
let metadata = extractor.extract_all()?;
println!("Title: {:?}", metadata.get("title"));
→ Full Rust Guide | API Reference
pip install meta-oxide
from meta_oxide import MetaOxide
html = "<!DOCTYPE html>..."
extractor = MetaOxide(html, "https://example.com")
metadata = extractor.extract_all()
print(f"Title: {metadata['title']}")
Performance: 233x faster than BeautifulSoup
→ Full Python Guide | API Reference
go get github.com/yourusername/meta-oxide-go
import metaoxide "github.com/yourusername/meta-oxide-go"
extractor, _ := metaoxide.NewExtractor(html, "https://example.com")
defer extractor.Free()
metadata, _ := extractor.ExtractAll()
fmt.Printf("Title: %v\n", metadata["title"])
Only Go library with 13 metadata formats
→ Full Go Guide | API Reference
npm install meta-oxide
const { MetaOxide } = require('meta-oxide');
const html = '<!DOCTYPE html>...';
const extractor = new MetaOxide(html, 'https://example.com');
const metadata = extractor.extractAll();
console.log('Title:', metadata.title);
Performance: 280x faster than metascraper
→ Full Node.js Guide | API Reference
<dependency>
    <groupId>com.metaoxide</groupId>
    <artifactId>meta-oxide</artifactId>
    <version>0.1.0</version>
</dependency>
try (MetaOxide extractor = new MetaOxide(html, "https://example.com")) {
    Metadata metadata = extractor.extractAll();
    System.out.println("Title: " + metadata.get("title"));
}
Performance: 311x faster than jsoup + Any23
→ Full Java Guide | API Reference
dotnet add package MetaOxide
using var extractor = new MetaOxideExtractor(html, "https://example.com");
var metadata = extractor.ExtractAll();
Console.WriteLine($"Title: {metadata["title"]}");
Performance: 200x faster than HtmlAgilityPack
→ Full C# Guide | API Reference
npm install meta-oxide-wasm
import init, { MetaOxide } from 'meta-oxide-wasm';
await init(); // Initialize WASM
const extractor = new MetaOxide(html, 'https://example.com');
const metadata = extractor.extractAll();
console.log('Title:', metadata.title);
Performance: 260x faster than native JavaScript parsers
→ Full WASM Guide | API Reference
MetaOxide extracts 13 metadata formats out of the box:
| Format | Description | Adoption | Use Cases |
|---|---|---|---|
| Basic HTML | title, description, keywords, canonical | 100% | SEO, browser display |
| Open Graph | og:* properties | 60%+ | Social media sharing (Facebook, LinkedIn, WhatsApp) |
| Twitter Cards | twitter:* meta tags | 45% | Twitter/X link previews |
| JSON-LD | Structured data (schema.org) | 41% | Google Rich Results, AI/LLM training |
| Microdata | itemscope, itemprop | 26% | E-commerce, recipes, reviews |
| Microformats | h-card, h-entry, h-event | 15% | Distributed social web, contacts |
| Dublin Core | DC metadata | 8% | Digital libraries, archives |
| RDFa | RDF in attributes | 5% | Linked data, semantic web |
| RelLinks | Link relations | 100% | Canonical URLs, alternate versions |
| Web Manifest | PWA manifest | 12% | Progressive web apps |
| Images | Image metadata | 100% | Image alt text, dimensions |
| Authors | Author information | 80% | Authorship, copyright |
| SEO | Robots, language, viewport | 100% | Search engine optimization |
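To make the table more concrete, here is a minimal sketch of what extracting two of these formats (Open Graph and Twitter Cards) involves, using only Python's standard library. It is not MetaOxide's implementation; `MetaTagCollector` and `collect_meta_tags` are hypothetical names chosen for illustration.

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Collects og:* and twitter:* <meta> tags (hypothetical helper, not MetaOxide)."""

    def __init__(self):
        super().__init__()
        self.open_graph = {}
        self.twitter = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        # Open Graph uses property="og:..."; Twitter Cards use name="twitter:..."
        key = attr.get("property") or attr.get("name") or ""
        value = attr.get("content")
        if value is None:
            return
        if key.startswith("og:"):
            # Keep the first occurrence of each property
            self.open_graph.setdefault(key[len("og:"):], value)
        elif key.startswith("twitter:"):
            self.twitter.setdefault(key[len("twitter:"):], value)


def collect_meta_tags(html):
    parser = MetaTagCollector()
    parser.feed(html)
    return {"open_graph": parser.open_graph, "twitter": parser.twitter}


sample = """
<html><head>
  <meta property="og:title" content="Example Product">
  <meta property="og:image" content="https://example.com/a.png">
  <meta name="twitter:card" content="summary">
</head></html>
"""
tags = collect_meta_tags(sample)
print(tags["open_graph"]["title"])
print(tags["twitter"]["card"])
```

A real extractor also has to cope with duplicate properties, structured properties such as `og:image:width`, and malformed markup, which is where a dedicated library earns its keep.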
MetaOxide is dramatically faster than traditional libraries:
| Library | Language | Docs/Sec | Speedup vs. alternative |
|---|---|---|---|
| MetaOxide | Rust | 125,000 | 1x (baseline) |
| MetaOxide | Python | 83,333 | 233x faster than BeautifulSoup |
| MetaOxide | Go | 100,000 | N/A (only option with 13 formats) |
| MetaOxide | Node.js | 66,666 | 280x faster than metascraper |
| MetaOxide | Java | 55,555 | 311x faster than jsoup + Any23 |
| MetaOxide | C# | 62,500 | 200x faster than HtmlAgilityPack |
| MetaOxide | WASM | 40,000 | 260x faster than JS parsers |
| BeautifulSoup | Python | 357 | - |
| metascraper | Node.js | 238 | - |
| jsoup + Any23 | Java | 178 | - |
| HtmlAgilityPack | C# | 312 | - |
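The speedup column can be reproduced from the Docs/Sec figures in the same table. The short check below only divides the published numbers; it does not run any benchmark.

```python
# Docs/sec figures copied from the benchmark table above.
metaoxide = {"Python": 83_333, "Node.js": 66_666, "Java": 55_555, "C#": 62_500}
baseline = {"Python": 357, "Node.js": 238, "Java": 178, "C#": 312}

# Speedup is simply the ratio of the two throughputs.
speedups = {lang: metaoxide[lang] / baseline[lang] for lang in metaoxide}

for lang, ratio in speedups.items():
    print(f"{lang}: {ratio:.0f}x")
```

The ratios land on the quoted multipliers for Python, Node.js, and C#. For Java the division gives roughly 312x rather than 311x, presumably because the table's throughput figures are themselves rounded.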
Processing 1 million e-commerce product pages:
| Solution | Time | CPU Hours | AWS Cost |
|---|---|---|---|
| MetaOxide | 22 seconds | 0.006 | $0.0012 |
| BeautifulSoup | 140 minutes | 2.33 | $0.47 |
| Savings | 381x faster | 388x less | 391x cheaper |
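The cost figures follow from the run times by simple arithmetic. The sketch below assumes a rate of about $0.20 per CPU-hour, which is what the table implies ($0.47 / 2.33 h); the source does not state the instance type or price, so treat `RATE` as an assumption.

```python
RATE = 0.20  # assumed USD per CPU-hour, inferred from the table, not stated in the source


def cpu_cost(seconds, usd_per_cpu_hour):
    """Convert a single-core run time into CPU-hours and a dollar cost."""
    cpu_hours = seconds / 3600
    return cpu_hours, cpu_hours * usd_per_cpu_hour


fast_hours, fast_cost = cpu_cost(22, RATE)        # MetaOxide: 22 seconds
slow_hours, slow_cost = cpu_cost(140 * 60, RATE)  # BeautifulSoup: 140 minutes
speedup = (140 * 60) / 22

print(f"MetaOxide:     {fast_hours:.3f} CPU-hours, ${fast_cost:.4f}")
print(f"BeautifulSoup: {slow_hours:.2f} CPU-hours, ${slow_cost:.2f}")
print(f"Speedup:       {speedup:.0f}x")
```

The small differences between the three "Savings" multipliers in the table (381x, 388x, 391x) come from rounding each column before dividing.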
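from flask import Flask, request, jsonify
from meta_oxide import MetaOxide
import requests

app = Flask(__name__)

@app.route('/extract')
def extract():
    url = request.args.get('url')
    if not url:
        return jsonify({'error': 'missing url parameter'}), 400
    # The timeout keeps a slow host from blocking the worker indefinitely
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    extractor = MetaOxide(response.text, url)
    metadata = extractor.extract_all()
    return jsonify(metadata)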
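const express = require('express');
const axios = require('axios');
const { MetaOxide } = require('meta-oxide');

const app = express();

app.get('/extract', async (req, res) => {
  const { url } = req.query;
  if (!url) {
    return res.status(400).json({ error: 'missing url parameter' });
  }
  try {
    const response = await axios.get(url);
    const extractor = new MetaOxide(response.data, url);
    res.json(extractor.extractAll());
  } catch (err) {
    // Express 4 does not catch rejections from async handlers, so report them here
    res.status(502).json({ error: err.message });
  }
});

app.listen(3000);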
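// extractConcurrently fetches and parses each URL in its own goroutine.
// fetchHTML is a caller-supplied helper; URLs that fail leave a zero-value entry.
func extractConcurrently(urls []string) []Metadata {
	var wg sync.WaitGroup
	results := make([]Metadata, len(urls))
	for i, url := range urls {
		wg.Add(1)
		go func(index int, targetURL string) {
			defer wg.Done()
			html := fetchHTML(targetURL)
			extractor, err := metaoxide.NewExtractor(html, targetURL)
			if err != nil {
				return // skip pages that cannot be parsed
			}
			defer extractor.Free()
			// Each goroutine writes only its own slot, so no mutex is needed
			if metadata, err := extractor.ExtractAll(); err == nil {
				results[index] = metadata
			}
		}(i, url)
	}
	wg.Wait()
	return results
}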
MetaOxide is built on a multi-layer architecture for maximum performance and compatibility:
┌─────────────────────────────────────────────────────────┐
│ Application Layer (Your Code)                           │
│ Rust, Python, Go, Node.js, Java, C#, WebAssembly        │
└──────────────────┬──────────────────────────────────────┘
                   │
┌──────────────────▼──────────────────────────────────────┐
│ Language Bindings                                       │
│ PyO3, CGO, N-API, JNI, P/Invoke, wasm-bindgen           │
└──────────────────┬──────────────────────────────────────┘
                   │
┌──────────────────▼──────────────────────────────────────┐
│ C-ABI Layer (Stable Foreign Function Interface)         │
└──────────────────┬──────────────────────────────────────┘
                   │
┌──────────────────▼──────────────────────────────────────┐
│ Rust Core (16,500+ lines)                               │
│ • HTML Parser (html5ever)                               │
│ • 13 Metadata Extractors                                │
│ • URL Resolution & Utilities                            │
└─────────────────────────────────────────────────────────┘
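The core lists URL resolution among its responsibilities, which is presumably why every constructor in the quick-start snippets takes a base URL alongside the HTML. The sketch below shows the general technique with Python's standard `urljoin`; it illustrates the idea only and says nothing about how MetaOxide's own resolver behaves. The sample values are made up.

```python
from urllib.parse import urljoin

# Base URL of the page the HTML was fetched from (hypothetical example)
base_url = "https://example.com/products/widget"

# Values as they might appear in href/src/content attributes
raw_values = [
    "/images/widget.png",                # root-relative
    "../about",                          # path-relative
    "https://cdn.example.net/logo.svg",  # already absolute
    "//cdn.example.net/x.js",            # scheme-relative
]

resolved = [urljoin(base_url, value) for value in raw_values]
for raw, absolute in zip(raw_values, resolved):
    print(f"{raw} -> {absolute}")
```

Without the base URL, relative image and canonical links extracted from a page would not be usable outside that page's context.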
Key Design Principles:
| Feature | Rust | Python | Go | Node.js | Java | C# | WASM |
|---|---|---|---|---|---|---|---|
| Basic Meta | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Open Graph | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Twitter Cards | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| JSON-LD | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Microdata | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Microformats | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Dublin Core | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| RDFa | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| All 13 Formats | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Type Hints | ✅ | ✅ | ✅ | ✅ (TS) | ✅ | ✅ | ✅ (TS) |
| Async Support | ✅ | ✅* | ✅ | ✅* | ✅ | ✅ | ✅* |
| Thread-Safe | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Memory-Safe | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
*Extraction is synchronous, but compatible with async I/O
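One common way to combine a synchronous extraction call with async I/O is to hand the call to a worker thread so the event loop stays responsive. The sketch below uses `asyncio.to_thread` (Python 3.9+) with a stand-in function; `extract_sync` is hypothetical and only mimics the shape of a blocking extractor. Whether the real binding releases the GIL during extraction is not stated in the source.

```python
import asyncio


def extract_sync(html):
    """Stand-in for a synchronous extraction call (hypothetical; not the MetaOxide API)."""
    start = html.find("<title>")
    end = html.find("</title>")
    if start == -1 or end == -1:
        return {}
    return {"title": html[start + len("<title>"):end]}


async def extract_many(pages):
    # Each blocking call runs in a worker thread, so the event loop
    # remains free to service network I/O in the meantime.
    tasks = [asyncio.to_thread(extract_sync, page) for page in pages]
    # gather returns results in the same order as the input tasks
    return await asyncio.gather(*tasks)


pages = ["<title>One</title>", "<title>Two</title>"]
results = asyncio.run(extract_many(pages))
print(results)
```

For very fast extractions, calling the function inline inside the coroutine may be cheaper than the thread hand-off; measuring on real pages is the only reliable guide.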
Extract metadata from millions of pages efficiently:
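# At the benchmarked rates, parsing 1M pages takes about 12 seconds
# (vs. about 46 minutes with BeautifulSoup); fetch time is extra
from concurrent.futures import ThreadPoolExecutor

# extract_from_url(url) is your own function: fetch the page, then run MetaOxide on it
with ThreadPoolExecutor(max_workers=10) as pool:
    results = list(pool.map(extract_from_url, urls))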
Analyze metadata for SEO optimization:
const og = extractor.extractOpenGraph();
const twitter = extractor.extractTwitterCard();
const jsonld = extractor.extractJSONLD();
// Check for missing or malformed metadata
Generate link previews like Facebook/Twitter:
og, _ := extractor.ExtractOpenGraph()
fmt.Printf("Title: %s\n", og.Title)
fmt.Printf("Image: %s\n", og.Image)
fmt.Printf("Description: %s\n", og.Description)
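Pages often carry only some of the preview fields in Open Graph, so preview generators usually fall back to other formats. The sketch below shows one such fallback order (Open Graph, then Twitter Cards, then basic HTML meta) over plain dictionaries. The precedence and the `build_preview` helper are assumptions for illustration, not behavior documented for MetaOxide.

```python
def build_preview(open_graph, twitter, basic):
    """Pick preview fields, preferring Open Graph, then Twitter Cards, then basic HTML."""

    def first(field):
        # Return the first non-empty value across the sources, in priority order
        for source in (open_graph, twitter, basic):
            value = source.get(field)
            if value:
                return value
        return None

    return {
        "title": first("title"),
        "description": first("description"),
        "image": first("image"),
    }


preview = build_preview(
    open_graph={"title": "OG Title", "image": "https://example.com/og.png"},
    twitter={"description": "From the Twitter card"},
    basic={"title": "Plain title", "description": "Meta description"},
)
print(preview)
```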
Extract structured data for machine learning:
let jsonld = extractor.extract_jsonld()?;
let microdata = extractor.extract_microdata()?;
// Feed to AI models for training
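For readers unfamiliar with JSON-LD, the structured data in question lives in `<script type="application/ld+json">` blocks. The following standard-library sketch shows roughly what pulling those blocks out involves; `JsonLdCollector` is a hypothetical helper and not how MetaOxide is implemented.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collects parsed JSON-LD blocks from a page (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._buffer = None  # holds script text while inside a JSON-LD block

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._buffer is not None:
            try:
                self.items.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip malformed blocks rather than failing the whole page
            self._buffer = None


page = """
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Product", "name": "Widget"}
</script>
<script>console.log("not JSON-LD");</script>
"""
collector = JsonLdCollector()
collector.feed(page)
items = collector.items
print(items[0]["@type"], items[0]["name"])
```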
Extract product metadata:
List<MicrodataItem> products = extractor.extractMicrodata();
for (MicrodataItem item : products) {
    if (item.getType().contains("Product")) {
        System.out.println(item.getProperties().get("name"));
        System.out.println(item.getProperties().get("price"));
    }
}
Client-side metadata extraction:
import init, { MetaOxide } from 'meta-oxide-wasm';
await init();
const html = document.documentElement.outerHTML;
const extractor = new MetaOxide(html, window.location.href);
const metadata = extractor.extractAll();
Contributions are welcome! See CONTRIBUTING.md for guidelines.
# Clone repository
git clone https://github.com/yourusername/meta_oxide.git
cd meta_oxide
# Build Rust core
cargo build --release
# Run tests
cargo test
# Build language bindings (each in a subshell, so you stay in the repository root)
# Python
(cd bindings/python && pip install -e .)
# Go
(cd bindings/go && go test ./...)
# Node.js
(cd bindings/nodejs && npm install && npm test)
# Java
(cd bindings/java && mvn test)
# C#
(cd bindings/csharp && dotnet test)
# WASM
(cd bindings/wasm && wasm-pack build)
MetaOxide is released under the MIT License.
MIT License
Copyright (c) 2025 MetaOxide Contributors
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
MetaOxide is an open-source project. Consider sponsoring to support development:
MetaOxide builds on excellent open-source projects:
Made with ❤️ by the MetaOxide team
Star ⭐ this repository if you find it useful!
We found that meta-oxide demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.