@xapp/arachne

An extremely simple web crawler, based on puppeteer.

Version: 0.6.0 (npm)

Version published: 2 years ago

Weekly downloads: 590; increased by 72.51%

Maintainers: 5

Created: 4 years ago

Usage in a Lambda

Puppeteer requires Chromium, and Chromium's size is typically the limiting factor when running Puppeteer in a Lambda. A Lambda Layer works around this, specifically the community-maintained chrome-aws-lambda layer (see Lambda Layer Resources).

You can reference this layer directly in your Serverless Framework (SLS) configuration file or SAM template.

A Serverless Framework example:

functions:
  eventReceiver:
    handler: dist/index.receiver
    layers:
      - "arn:aws:lambda:us-east-1:764866452798:layer:chrome-aws-lambda:31"

In your Lambda source:

import { Browser, LaunchOptions, BrowserConnectOptions, BrowserLaunchArgumentOptions } from "puppeteer";
import { Arachne, ArachnePage, ArachneRequest, MemoryRequestQueue } from "@xapp/arachne";

// Other imports and code (log, launchOptions, and queue are defined there)
// The important part:

// Stays undefined when the layer is unavailable, e.g. when running locally.
let browser: Pick<Browser, "close" | "newPage"> | undefined;
// The try/catch lets you still run this locally if you want, assuming you
// have Chromium installed on your machine.
try {
    log().debug("Looking for chrome-aws-lambda");

    // eslint-disable-next-line @typescript-eslint/no-var-requires
    const chromium = require("@sparticuz/chrome-aws-lambda");

    browser = await chromium.puppeteer.launch({
        args: chromium.args,
        defaultViewport: chromium.defaultViewport,
        executablePath: await chromium.executablePath,
        headless: chromium.headless,
        ignoreHTTPSErrors: true,
    });
} catch (e) {
    log().debug("Could not find chrome-aws-lambda layer");
    console.error(e);
}

const crawler = Arachne.crawler({
    stealth: true,
    launchOptions, /* timeout set to 5 seconds; the default of 30 is too long */
    queue,
    browser,
    pageHandler: async (page: ArachnePage, request: ArachneRequest) => {
        // ... handle page load
    }
});
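
The try/catch fallback above can also be factored into a small helper, which keeps the handler body short and makes the "layer missing" path easy to exercise in a unit test. The following is only a sketch: launchFromLayer and ChromiumLike are hypothetical names introduced here, not part of @xapp/arachne or chrome-aws-lambda, and the interface lists only the properties the snippet above already reads.

```typescript
// Hypothetical minimal shape of the chrome-aws-lambda module, limited to
// the properties used in the Lambda snippet above.
interface ChromiumLike<B> {
    args: string[];
    defaultViewport: unknown;
    executablePath: Promise<string>;
    headless: boolean;
    puppeteer: { launch(options: Record<string, unknown>): Promise<B> };
}

// Hypothetical helper: resolves to a browser launched from the layer, or to
// undefined when loading the module or launching the browser fails.
async function launchFromLayer<B>(load: () => ChromiumLike<B>): Promise<B | undefined> {
    try {
        const chromium = load();
        return await chromium.puppeteer.launch({
            args: chromium.args,
            defaultViewport: chromium.defaultViewport,
            executablePath: await chromium.executablePath,
            headless: chromium.headless,
            ignoreHTTPSErrors: true,
        });
    } catch (e) {
        console.error("Could not find chrome-aws-lambda layer", e);
        return undefined;
    }
}

// Assumed usage inside the Lambda handler:
//   const browser = await launchFromLayer(() => require("@sparticuz/chrome-aws-lambda"));
```

Passing the loader as a function keeps the require call inside the try block, so a missing layer is handled the same way as a failed launch.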

Lambda Layer Resources

Google Chrome for AWS Lambda as a layer
Serverless Browser Automation with AWS Lambda and Puppeteer
- NOTE: The source code linked from this article uses require("chrome-aws-lambda"), which is wrong if you use the layer directly. Use require('@sparticuz/chrome-aws-lambda') instead.

Package last updated on 17 May 2023
