
@xapp/arachne


An extremely simple web crawler, based on puppeteer.




Usage in a Lambda

Puppeteer requires Chromium, and Chromium's size is typically the limiting factor when running puppeteer in an AWS Lambda function. A Lambda Layer works around this; the examples below use the community-maintained chrome-aws-lambda layer.

You can include this layer directly in your Serverless Framework (SLS) configuration file or your SAM template.

A Serverless Framework example:

functions:
  eventReceiver:
    handler: dist/index.receiver
    layers:
      - "arn:aws:lambda:us-east-1:764866452798:layer:chrome-aws-lambda:31"

In your Lambda source:

import { Browser, LaunchOptions, BrowserConnectOptions, BrowserLaunchArgumentOptions } from "puppeteer";
import { Arachne, ArachnePage, ArachneRequest, MemoryRequestQueue } from "@xapp/arachne";

// Other imports and code
// The important part..

        let browser: Pick<Browser, "close" | "newPage">;
        // The try/catch lets you still run this locally, assuming
        // Chromium is installed on your machine
        try {
            log().debug('Looking for chrome-aws-lambda');

            // eslint-disable-next-line @typescript-eslint/no-var-requires
            const chromium = require('@sparticuz/chrome-aws-lambda');

            browser = await chromium.puppeteer.launch({
                args: chromium.args,
                defaultViewport: chromium.defaultViewport,
                executablePath: await chromium.executablePath,
                headless: chromium.headless,
                ignoreHTTPSErrors: true,
            });
        } catch (e) {
            log().debug("Could not find chrome-aws-lambda layer");
            console.error(e);
        }

        const crawler = Arachne.crawler({
            stealth: true,
            launchOptions, // timeout set to 5 seconds; the default of 30 seconds is too long
            queue,
            browser,
            pageHandler: async (page: ArachnePage, request: ArachneRequest) => {
                // ... handle page load
            }
        });
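
The snippet above passes a launchOptions value without showing how it is built. A minimal sketch of one way to build it follows, assuming the comment refers to puppeteer's launch `timeout` option (given in milliseconds, default 30000) and that Arachne forwards launchOptions to puppeteer. The buildLaunchOptions helper and the LaunchTimeoutOptions type are hypothetical names used only for illustration; they are not part of @xapp/arachne or puppeteer.

```typescript
// Local stand-in for the one puppeteer launch option used here, so the
// sketch compiles without puppeteer installed. Assumption: Arachne hands
// launchOptions through to puppeteer's launch call.
interface LaunchTimeoutOptions {
    /** Maximum time, in milliseconds, to wait for the browser to start. */
    timeout: number;
}

// Hypothetical helper (not part of @xapp/arachne): converts a timeout in
// seconds into the millisecond value the `timeout` option expects.
function buildLaunchOptions(timeoutSeconds: number = 5): LaunchTimeoutOptions {
    if (!Number.isFinite(timeoutSeconds) || timeoutSeconds < 0) {
        throw new RangeError("timeoutSeconds must be a non-negative number");
    }
    return { timeout: Math.round(timeoutSeconds * 1000) };
}

// 5 seconds instead of the 30-second default.
const launchOptions = buildLaunchOptions();
```

Keeping the conversion in one place avoids mixing seconds and milliseconds, which is an easy mistake when the value is set inline.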

Lambda Layer Resources


Last updated on 11 Mar 2024
