@marbec/web-auto-extractor


Automatically extracts structured information from webpages

Source: npm
Version: 2.1.1
Weekly downloads: 8K (123.62%)
Maintainers: 1

Web Auto Extractor 2.0

This project is a fork of indix/web-auto-extractor.

Parse semantically structured information from any HTML webpage.

Supported formats:

  • Encodings that support Schema.org vocabularies:
    • Microdata
    • RDFa-lite
    • JSON-LD
  • Meta tags

Many websites mark up their webpages with Schema.org vocabularies to improve SEO. This library parses that information into JSON.
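To make the idea of embedded structured data concrete, here is a minimal sketch of what JSON-LD markup looks like inside a page and how it could be pulled out by hand. This is not how the library works internally. The helper name extractJsonLd and the sample markup are assumptions made for illustration, and a regular expression is used only to keep the example dependency-free.

```javascript
// Illustrative only: a hand-rolled JSON-LD extractor, NOT the library's implementation.
// extractJsonLd is a hypothetical helper name introduced for this sketch.
function extractJsonLd(html) {
  // Match every <script type="application/ld+json"> ... </script> block.
  const pattern =
    /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  const items = [];
  for (const match of html.matchAll(pattern)) {
    try {
      items.push(JSON.parse(match[1]));
    } catch {
      // Skip blocks whose body is not valid JSON.
    }
  }
  return items;
}

// Hypothetical sample page containing one Schema.org Product description.
const sampleHTML = `
<html><head>
  <script type="application/ld+json">
    {"@context": "https://schema.org", "@type": "Product", "name": "Example Widget"}
  </script>
</head><body></body></html>`;

const items = extractJsonLd(sampleHTML);
console.log(items[0]['@type'], items[0].name); // prints "Product Example Widget"
```

Microdata and RDFa-lite are spread across element attributes rather than held in a single script block, so they need a real HTML parser, which is the kind of work the library takes care of.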

Installation

npm i --save @marbec/web-auto-extractor

Usage

import WebAutoExtractor from '@marbec/web-auto-extractor';

// sampleHTML is assumed to be a string holding the HTML of the page to parse.
const parsed = new WebAutoExtractor({
  // Add location information to the root elements in the parsed data.
  // Location is stored as start,end offset values in the @location property.
  addLocation: false,

  // Embed the source HTML in the root elements in the parsed data using the @source property.
  // This property is either a boolean to embed sources for all data types or an array of data types to embed sources for.
  embedSource: false,
}).parse(sampleHTML);

// Output format
/* {
    "metatags": {},
    "microdata": {},
    "rdfa": {},
    "jsonld": {}
} */
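Since the result always has the four top-level keys shown above, a small helper can report which formats actually yielded data for a given page. The sketch below is hypothetical: nonEmptySections is not part of the library, and the contents of each section in exampleResult are invented stand-ins, because the output format above only documents the top-level keys.

```javascript
// Hypothetical helper (not part of the library): lists which top-level
// sections of a parse result contain any data.
function nonEmptySections(parsed) {
  return Object.entries(parsed)
    .filter(([, value]) => value && Object.keys(value).length > 0)
    .map(([name]) => name);
}

// Stand-in for a real parse result. The inner shape of each section is an
// assumption; only the four top-level keys come from the documented output.
const exampleResult = {
  metatags: { description: ['An example page'] },
  microdata: {},
  rdfa: {},
  jsonld: { Product: [{ '@type': 'Product', name: 'Example Widget' }] },
};

console.log(nonEmptySections(exampleResult)); // prints [ 'metatags', 'jsonld' ]
```

In real use, the object returned by parse() would be passed in place of exampleResult.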

Browser

You can run the parser directly in the browser on any website using the following commands:

const { default: WebAutoExtractor } = await import(
  'https://unpkg.com/@marbec/web-auto-extractor@latest/dist/index.js'
);
new WebAutoExtractor().parse(document.documentElement.outerHTML);

Examples

See the test cases for sample inputs and outputs.

Keywords

crawler

Package last updated on 17 Jul 2025
