
@themaximalist/scrape.js

Simple but feature-packed web scraping library for Node.js.

Version: 0.1.1 (latest) · Source: npm · Maintainers: 1

Scrape.js

Scrape.js — Web Scraping Library for Node.js

Scrape.js is an easy-to-use web scraping library for Node.js.

const data = await scrape("https://example.com");
// { url, html }

Features

  • Fast
  • Scrape nearly any website
  • Headless JavaScript scraping
  • Auto proxy rotation
  • ...it just works
  • MIT License

Install

Install Scrape.js from NPM:

npm install @themaximalist/scrape.js

Config

Scrape.js uses ZenRows for proxy rotation. To enable it, get a ZenRows API key and set the ZENROWS_API_KEY environment variable:

ZENROWS_API_KEY=abcxyz123

Scrape.js can be used without proxies, but it is less effective.
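
Because the key is optional, one approach is to request proxy rotation only when the key is set. The sketch below assumes the key is available in `process.env.ZENROWS_API_KEY`. `proxyOptionsFromEnv` is a hypothetical helper written for this example, not part of Scrape.js.

```javascript
// Hypothetical helper (not part of Scrape.js): decide whether to request
// proxy rotation based on whether a ZenRows API key is configured.
function proxyOptionsFromEnv(env = process.env) {
    const key = (env.ZENROWS_API_KEY || "").trim();
    return { proxy: key.length > 0 };
}

// Usage, assuming scrape() is imported as shown in the Usage section:
// const data = await scrape("https://example.com", proxyOptionsFromEnv());
```

Passing the environment as a parameter keeps the helper easy to test without touching the real process environment.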

Usage

Using Scrape.js is as simple as calling a function with a website URL.

const scrape = require("@themaximalist/scrape.js");
// With require(), await must run inside an async function
(async () => {
    const data = await scrape("http://example.com"); // { url, html }
})();

You can specify additional options to scrape() for more control:

const data = await scrape("https://example.com", {
    headless: true,
    proxy: true
});
// { url, html }

API

The Scrape.js API is a single function that takes a URL and an optional config object.

await scrape(
    url, // URL to scrape
    {
        headless: true, // Use JavaScript headless scraping
        proxy: true, // Use proxy rotation
        method: "GET", // HTTP Request method
        timeout: 3000, // Scrape timeout in ms
        userAgent: "Mozilla/5.0...", // User Agent
    }
);

URL (required)

  • url <string>: URL to scrape

Options

  • headless <bool>: Enable JavaScript. Default is true.
  • proxy <bool>: Use proxy with request. Default is true.
  • method <string>: HTTP request method, usually GET or POST. Default is GET.
  • timeout <int>: Max request time in ms. Default is 3500.
  • userAgent <string>: User agent for request. Default is Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36.
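
To make the defaults concrete, here is a minimal sketch of how caller options could be overlaid on the documented defaults. The `DOCUMENTED_DEFAULTS` object and the `resolveOptions` helper are assumptions for illustration. They mirror the list above and are not taken from the Scrape.js source.

```javascript
// Defaults as documented in the Options list. This object is an
// illustration; it is not read from the Scrape.js source.
const DOCUMENTED_DEFAULTS = Object.freeze({
    headless: true,
    proxy: true,
    method: "GET",
    timeout: 3500,
    userAgent: "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36",
});

// Hypothetical helper: overlay caller options on the documented defaults.
function resolveOptions(options = {}) {
    return { ...DOCUMENTED_DEFAULTS, ...options };
}

// Only the keys you pass are overridden; the rest keep their defaults.
console.log(resolveOptions({ headless: false, timeout: 10000 }));
```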

Response

Scrape.js returns an object containing the final url and html content.

const { url, html } = await scrape("https://example.com");
console.log(url); // https://example.com/
console.log(html); // <html...
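
Once you have the response, you will usually want to do something with `url` and `html`. The sketch below shows two small helpers that work on those values. Both `extractTitle` and `urlChanged` are hypothetical names for this example, not part of Scrape.js.

```javascript
// Hypothetical helper: pull the <title> text out of an HTML string.
// A regex is enough for a quick check; use a real HTML parser for anything more.
function extractTitle(html) {
    const match = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(html);
    return match ? match[1].trim() : null;
}

// Hypothetical helper: report whether the final URL differs from the
// requested one (for example after a redirect). Both sides are normalized
// with the WHATWG URL parser so a trailing slash alone does not count.
function urlChanged(requestedUrl, finalUrl) {
    return new URL(requestedUrl).href !== new URL(finalUrl).href;
}

// Usage, assuming scrape() is imported as shown in the Usage section:
// const { url, html } = await scrape("https://example.com");
// console.log(extractTitle(html), urlChanged("https://example.com", url));
```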

The Scrape.js API is a simple way to scrape the HTML from nearly any website.
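
Requests can still fail or exceed the configured timeout. This README does not describe how failures surface, so the sketch below assumes that `scrape()` returns a promise that rejects on failure. `scrapeWithRetry` is a hypothetical wrapper, and the scrape function is passed in as a parameter so the wrapper does not depend on any undocumented behavior.

```javascript
// Hypothetical wrapper: retry a scrape function a few times before giving up.
// Assumes scrapeFn(url, options) returns a promise that rejects on failure.
async function scrapeWithRetry(scrapeFn, url, options = {}, { retries = 2, delayMs = 500 } = {}) {
    let lastError;
    for (let attempt = 0; attempt <= retries; attempt++) {
        try {
            return await scrapeFn(url, options);
        } catch (error) {
            lastError = error;
            if (attempt < retries) {
                // Wait before the next attempt.
                await new Promise((resolve) => setTimeout(resolve, delayMs));
            }
        }
    }
    // Every attempt failed: surface the last error to the caller.
    throw lastError;
}

// Usage, assuming scrape() is imported as shown in the Usage section:
// const data = await scrapeWithRetry(scrape, "https://example.com", { timeout: 5000 });
```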

Debug

Scrape.js uses the debug npm module with the scrape.js namespace.

View debug logs by setting the DEBUG environment variable when you run your script.

> DEBUG="scrape.js*" node src/get_website_html.js
# debug logs

Examples

See the tests for examples of how to use Scrape.js.

Projects

Scrape.js is currently used in the following projects:

  • News Score: score the news, rewrite the headlines

License

MIT

Author

Created by The Maximalist. See our open-source projects.

Keywords

web

Package last updated on 17 Oct 2024
