@themaximalist/scrape.js

Simple but feature-packed web scraping library for Node.js.

Version: 0.1.1

Version published: 2 years ago

Maintainers: 1

Created: 3 years ago

Scrape.js

Scrape.js is an easy-to-use web scraping library for Node.js.

const data = await scrape("https://example.com");
// { url, html }

Features

Fast
Scrape nearly any website
Headless JavaScript scraping
Auto proxy rotation
...it just works
MIT License

Install

Install Scrape.js from npm:

npm install @themaximalist/scrape.js

Config

Scrape.js uses Zen Rows for proxy rotation. To use it, get a Zen Rows API key and set it in the environment:

ZENROWS_API_KEY=abcxyz123

Scrape.js can be used without proxies, but is less effective.
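
One way to act on this in code is to request the proxy only when a key is actually present. The following is a minimal sketch, not part of Scrape.js: `proxyOptions` is a hypothetical helper, and it assumes the key is read from `process.env`. How the library itself behaves when `proxy` is true and no key is set is not documented here.

```javascript
// Hypothetical helper (not part of Scrape.js): build scrape() options
// depending on whether a Zen Rows key is available in the environment.
function proxyOptions(env = process.env) {
    const key = env.ZENROWS_API_KEY;
    const hasKey = typeof key === "string" && key.trim() !== "";
    return { proxy: hasKey };
}

console.log(proxyOptions({ ZENROWS_API_KEY: "abcxyz123" })); // { proxy: true }
console.log(proxyOptions({})); // { proxy: false }
```

The result can be passed as the second argument to scrape(), merged with any other options you need.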

Usage

Using Scrape.js is as simple as calling a function with a website URL.

const scrape = require("@themaximalist/scrape.js");
scrape("http://example.com").then((data) => console.log(data)); // { url, html }

You can specify additional options to scrape() for more control:

const data = await scrape("https://example.com", {
    headless: true,
    proxy: true
});
// { url, html }
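
Network requests can fail, so it may be useful to wrap the call in a small retry loop. The sketch below is illustrative only: `scrapeWithRetry` is a hypothetical helper, and it assumes scrape() rejects its promise on failure, which this README does not state. The scrape function is passed in as an argument so the helper works with any function of the same shape.

```javascript
// Hypothetical helper (not part of Scrape.js): retry a scrape a few times.
// Assumes `scrape(url, options)` returns a promise that rejects on failure.
async function scrapeWithRetry(scrape, url, options = {}, attempts = 3) {
    let lastError;
    for (let i = 0; i < attempts; i++) {
        try {
            return await scrape(url, options);
        } catch (err) {
            lastError = err; // remember the failure and try again
        }
    }
    throw lastError; // every attempt failed
}

// Usage, assuming the package is installed:
// const scrape = require("@themaximalist/scrape.js");
// scrapeWithRetry(scrape, "https://example.com", { headless: true })
//     .then(({ url, html }) => console.log(url, html.length));
```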

API

The Scrape.js API is a simple function you call with your URL, with an optional config object.

await scrape(
    url, // URL to scrape
    {
        headless: true, // Use JavaScript headless scraping
        proxy: true, // Use proxy rotation
        method: "GET", // HTTP Request method
        timeout: 3000, // Scrape timeout in ms
        userAgent: "Mozilla/5.0...", // User Agent
    }
);

URL (required)

url <string>: URL to scrape

Options

headless <bool>: Enable JavaScript. Default is true.
proxy <bool>: Use proxy with request. Default is true.
method <string>: HTTP request method, usually GET or POST. Default is GET.
timeout <int>: Max request time in ms. Default is 3500.
userAgent <string>: User agent for request. Default is Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36.
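
To see how the documented defaults combine with your own settings, the sketch below merges an overrides object onto the defaults listed above. `DEFAULTS` and `effectiveOptions` are hypothetical names used for illustration; how Scrape.js merges options internally is an assumption, not something this README shows.

```javascript
// Defaults copied from the Options list above.
const DEFAULTS = {
    headless: true,
    proxy: true,
    method: "GET",
    timeout: 3500,
    userAgent:
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 " +
        "(KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36",
};

// Hypothetical helper: the options a call would end up with
// if overrides simply replace the matching defaults.
function effectiveOptions(overrides = {}) {
    return { ...DEFAULTS, ...overrides };
}

// A plain HTTP fetch with a longer timeout; other fields keep their defaults.
console.log(effectiveOptions({ headless: false, timeout: 10000 }));
```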

Response

Scrape.js returns an object containing the final url and html content.

const { url, html } = await scrape("https://example.com");
console.log(url); // https://example.com/
console.log(html); // <html...

The Scrape.js API is a simple way to scrape the HTML from nearly any website.
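
Since html is returned as a plain string, any post-processing is up to you. As a small illustration, the sketch below pulls the page title out of that string. `extractTitle` is a hypothetical helper, not part of Scrape.js, and the regular expression is a shortcut that suits simple pages; a real HTML parser is the more robust choice.

```javascript
// Hypothetical helper (not part of Scrape.js): read the <title> text
// from an HTML string, or return null when there is no title element.
function extractTitle(html) {
    const match = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(html);
    return match ? match[1].trim() : null;
}

const sample = "<html><head><title>Example Domain</title></head></html>";
console.log(extractTitle(sample)); // Example Domain
```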

Debug

Scrape.js uses the debug npm module with the scrape.js namespace.

View debug logs by setting the DEBUG environment variable.

> DEBUG=scrape.js* node src/get_website_html.js
# debug logs

Examples

View the tests for examples of how to use Scrape.js.

Projects

Scrape.js is currently used in the following projects:

News Score: score the news, rewrite the headlines

License

MIT

Author

Created by The Maximalist. See our open-source projects.


Package last updated on 17 Oct 2024
