microfrontier

A web crawler frontier implementation in TypeScript backed by Redis. MicroFrontier is a scalable and distributed frontier implemented through Redis Queues.

Version: 1.0.2 (latest), published on npm
Maintainers: 1


  • Fast Ingestion & High throughput
  • Multiple priority queues
  • Custom priority strategy
  • Per-Hostname crawl rate limit or default delay fallback
  • Easy to use HTTP Microservice
  • Multi-processing support
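To make the "per-hostname crawl rate limit or default delay fallback" feature concrete, here is a minimal in-memory sketch of the idea. It is not MicroFrontier's implementation (which is backed by Redis); the `HostRateLimiter` class and its method names are hypothetical and exist only for illustration.

```typescript
// Hypothetical sketch of per-hostname politeness; not MicroFrontier's actual code.
class HostRateLimiter {
  private readonly nextAllowedAt = new Map<string, number>();
  private readonly defaultCrawlDelayMs: number;
  private readonly perHostDelayMs: Record<string, number>;

  constructor(defaultCrawlDelayMs: number, perHostDelayMs: Record<string, number> = {}) {
    this.defaultCrawlDelayMs = defaultCrawlDelayMs;
    this.perHostDelayMs = perHostDelayMs;
  }

  // Delay for a hostname: the per-host value if one is set, else the default.
  delayFor(hostname: string): number {
    return this.perHostDelayMs[hostname] ?? this.defaultCrawlDelayMs;
  }

  // Returns true (and reserves the next slot) if the URL's host may be crawled at nowMs.
  tryAcquire(url: string, nowMs: number): boolean {
    const hostname = new URL(url).hostname;
    const allowedAt = this.nextAllowedAt.get(hostname) ?? 0;
    if (nowMs < allowedAt) {
      return false;
    }
    this.nextAllowedAt.set(hostname, nowMs + this.delayFor(hostname));
    return true;
  }
}
```

The clock is passed in as `nowMs` so the behaviour is easy to reason about: with a 1000 ms default delay, a second request to the same host at 500 ms is refused, while one at 1000 ms is allowed.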

Example of a Mercator frontier [1]:

[Figure: Queue]

Usage

MicroFrontier can be used as a JavaScript library (SDK), from the command line, or as a Docker deployment.

Command Line

Install microfrontier with:

npm i -g microfrontier

Run microfrontier:

microfrontier --port 3035 --redis:host localhost # see Configuration for other parameters

As a package

Npm:

npm i microfrontier

Yarn:

yarn add microfrontier

Docker

docker pull adileo/microfrontier

Configuration

| ENV VAR | CLI PARAMS | Description |
|---|---|---|
| host | --host | Host name to start the microservice HTTP server. Default value: 127.0.0.1 |
| port | --port | Port to start the microservice HTTP server. Default value: 8090 |
| redis_host | --redis:host | Redis server host. Default value: 127.0.0.1 |
| redis_port | --redis:port | Redis server port. Default value: 6379 |
| redis_* | --redis:* | Parameters are interpreted by nconf and passed to ioredis as the client config. |
| config_frontierName | --config:frontierName | Prefix used for Redis keys. |
| config_* | --config:* | Parameters are interpreted by nconf, default value below. |

{
    frontierName: 'frontier',
    priorities: {
        'high':     {probability: 0.6},
        'normal':   {probability: 0.3},
        'low':      {probability: 0.1},
    },
    defaultCrawlDelay: 1000
}
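One plausible reading of the `probability` values is that each dequeue picks a priority queue at random, weighted by those values. The sketch below illustrates that idea only; it is an assumption about the semantics, not code taken from MicroFrontier, and `pickPriority` is a hypothetical helper.

```typescript
// Hypothetical sketch: weighted random choice over the configured priorities.
type PriorityConfig = Record<string, { probability: number }>;

function pickPriority(
  priorities: PriorityConfig,
  random: () => number = Math.random,
): string {
  const entries = Object.entries(priorities);
  if (entries.length === 0) {
    throw new Error("no priorities configured");
  }
  // Scale by the total so the weights do not have to sum to exactly 1.
  const total = entries.reduce((sum, [, p]) => sum + p.probability, 0);
  let threshold = random() * total;
  let last = "";
  for (const [name, p] of entries) {
    last = name;
    threshold -= p.probability;
    if (threshold < 0) {
      return name;
    }
  }
  return last; // guards against floating-point rounding at the upper edge
}

const defaultPriorities: PriorityConfig = {
  high: { probability: 0.6 },
  normal: { probability: 0.3 },
  low: { probability: 0.1 },
};
```

Under this reading, `pickPriority(defaultPriorities)` would select `high` about 60% of the time, so low-priority URLs are still served occasionally rather than starved.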

How to

Adding a URL to the frontier

Via HTTP

curl --location --request POST 'http://127.0.0.1:8090/frontier' \
--header 'Content-Type: application/json' \
--data-raw '{
    "url": "http://www.example.com",
    "priority": "normal",
    "meta": {
        "foo": "bar"
    }
}'
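The same request can be prepared from TypeScript. The sketch below only builds the endpoint and request options that mirror the curl call above; `buildAddUrlRequest` and the `AddUrlBody` type are hypothetical names, and the base URL assumes the default host and port.

```typescript
// Hypothetical helper mirroring the curl example; field names follow the JSON body above.
interface AddUrlBody {
  url: string;
  priority: string;
  meta?: Record<string, unknown>;
}

function buildAddUrlRequest(baseUrl: string, body: AddUrlBody) {
  return {
    endpoint: new URL("/frontier", baseUrl).toString(),
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(body),
    },
  };
}

const addRequest = buildAddUrlRequest("http://127.0.0.1:8090", {
  url: "http://www.example.com",
  priority: "normal",
  meta: { foo: "bar" },
});
// To send it (Node 18+ or a browser): await fetch(addRequest.endpoint, addRequest.init)
```

Keeping request construction separate from the network call makes the payload easy to inspect or log before it is sent.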

Via SDK

import { URLFrontier } from "microfrontier"

const frontier = new URLFrontier(config)

frontier.add("http://www.example.com", "normal", {"foo": "bar"}).then(() => {
    console.log('URL added')
})

Getting a URL from the frontier

Via HTTP

curl --location --request GET 'http://127.0.0.1:8090/frontier'

Via SDK

import { URLFrontier } from "microfrontier"

const frontier = new URLFrontier(config)

frontier.get().then((item) => {
    // {url: "http://www.example.com", meta: {"foo":"bar"}}
})
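A crawler consuming these items may want to check their shape before fetching. The sketch below assumes items look like the comment above (`url` plus optional `meta`); what the frontier returns when it is empty is not documented here, so anything that fails the check is simply treated as "nothing to crawl". `FrontierItem` and `isFrontierItem` are hypothetical names.

```typescript
// Hypothetical type guard for items shaped like {url, meta}; not part of the microfrontier API.
interface FrontierItem {
  url: string;
  meta?: Record<string, unknown>;
}

function isFrontierItem(value: unknown): value is FrontierItem {
  if (typeof value !== "object" || value === null) {
    return false;
  }
  const candidate = value as { url?: unknown; meta?: unknown };
  if (typeof candidate.url !== "string") {
    return false;
  }
  // meta is optional, but when present it must be a non-null object.
  return (
    candidate.meta === undefined ||
    (typeof candidate.meta === "object" && candidate.meta !== null)
  );
}
```

In a worker this could be used as `if (isFrontierItem(item)) { /* fetch item.url */ }`, skipping the iteration otherwise.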

Citations

[1]: High-Performance Web Crawling - Marc Najork, Allan Heydon

Keywords

Web Crawler

Package last updated on 01 Nov 2021
