@arcblock/crawler-middleware

This Express middleware serves pre-rendered HTML generated by SnapKit for Blocklets, so that they can return complete HTML content to web spiders. This is essential for SEO and ensures that search engines can properly index dynamically generated content.

Version: 1.3.4 (latest, npm)
Maintainers: 4

How it Works

  • The middleware intercepts incoming requests.
  • It checks whether the request comes from a web spider.
  • If so, it tries to read the HTML from the local cache (in-memory LRU cache + SQLite) and return it.
  • On a cache miss, it sends an asynchronous request to SnapKit and updates the local cache with the result.
  • The request that triggered the miss is not served from the cache; the next spider visit reaches the cache lookup (the third step) and returns the cached HTML directly.
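
The flow above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the package's actual implementation: a plain Map stands in for the LRU + SQLite cache, spider detection is assumed to be a User-Agent substring match, and isSpider, handleRequest, and fetchFromSnapKit are hypothetical names.

```typescript
// Simplified sketch of the cache-first flow; all names here are hypothetical.
type FetchHtml = (url: string) => Promise<string>;

// Stands in for the in-memory LRU cache backed by SQLite.
const cache = new Map<string, string>();

// Assumption: a spider is detected by a substring match on the User-Agent.
function isSpider(userAgent: string): boolean {
  return userAgent.toLowerCase().includes('spider');
}

// Returns cached HTML for spiders, or null when the request should fall
// through to the normal handler. `pending` is the background refresh, if any.
function handleRequest(
  url: string,
  userAgent: string,
  fetchFromSnapKit: FetchHtml,
): { html: string | null; pending: Promise<void> | null } {
  if (!isSpider(userAgent)) return { html: null, pending: null };

  const cached = cache.get(url);
  if (cached !== undefined) return { html: cached, pending: null }; // cache hit

  // Cache miss: refresh in the background. The current request is not
  // served from the cache; only a later visit will see the stored HTML.
  const pending = fetchFromSnapKit(url)
    .then((html) => {
      cache.set(url, html);
    })
    .catch(() => {
      /* a failed fetch leaves the cache untouched */
    });
  return { html: null, pending };
}
```

Returning immediately on a miss keeps spider requests from waiting on a remote render, at the cost of the first visit not receiving the snapshot.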

How to Verify

  • Update your browser's User-Agent string to include "spider".
  • Visit a page that has already been crawled by SnapKit.
  • First Visit (Cache Miss): The first visit should miss the cache. Check the server logs for a "Cache miss" message; a request is then sent to SnapKit to cache the page.
  • Second Visit (Cache Hit): Wait a moment and then revisit the same page. This time the cache should be hit: the server logs should show a "Cache hit" message, and the returned HTML should include the meta tag: <meta name="arcblock-crawler" content="true">.
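
The meta tag check in the last step can be automated with a small helper like the one below. This is a sketch only: hasCrawlerMeta is a hypothetical name and not part of the package, and it assumes the tag may appear with either attribute order or quote style.

```typescript
// Checks whether a response body carries the marker meta tag described above.
// Hypothetical helper, not part of @arcblock/crawler-middleware.
function hasCrawlerMeta(html: string): boolean {
  // Collect every <meta ...> tag, then look for the name/content pair
  // in either attribute order, with single or double quotes.
  const metaTags: string[] = html.match(/<meta\b[^>]*>/gi) ?? [];
  return metaTags.some(
    (tag) =>
      /name=["']arcblock-crawler["']/i.test(tag) &&
      /content=["']true["']/i.test(tag),
  );
}
```

To use it, request the page with a User-Agent containing "spider" and pass the response body to hasCrawlerMeta; it should return false on the first visit and true once the snapshot has been cached.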

Usage

import express from 'express';
import { createSnapshotMiddleware } from '@arcblock/crawler-middleware';

const app = express();
const snapshotMiddleware = createSnapshotMiddleware({
  endpoint: process.env.SNAP_KIT_ENDPOINT,
  accessKey: process.env.SNAP_KIT_ACCESS_KEY,
  // Return cached content only for requests to '/'
  allowCrawler: (req) => {
    return req.path === '/';
  },
});

// Apply to all routes
app.use(snapshotMiddleware);

// Or apply to a single route
app.use('/doc', snapshotMiddleware, (req, res) => {
  /* ... */
});

Options

The options for createSnapshotMiddleware:

{
  /** SnapKit endpoint */
  endpoint: string;
  /** SnapKit access key */
  accessKey: string;
  /** Maximum number of entries in the LRU cache */
  cacheMax?: number;
  /** When a cached entry is older than this interval, the middleware tries to fetch it again from SnapKit and update the cache */
  updateInterval?: number;
  /** When a failed cache entry is older than this interval, the middleware tries to fetch it again from SnapKit and update the cache */
  failedUpdateInterval?: number;
  /** Concurrency of the update queue */
  updatedConcurrency?: number;
  /** Call res.send(html) on a cache hit */
  autoReturnHtml?: boolean;
  /** Custom function that decides whether to return cached content for a request */
  allowCrawler?: (req: Request) => boolean;
};
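
As an illustration of the allowCrawler option, the sketch below shows one possible policy. It is hypothetical: the rule itself (GET requests only, no API or static asset paths) is an example choice, and RequestLike is a stand-in type so the snippet has no dependencies. It assumes that returning false means no cached content is returned for that request.

```typescript
// Minimal request shape so the sketch has no dependencies; Express's Request
// object also exposes `method` and `path`.
interface RequestLike {
  method: string;
  path: string;
}

// Hypothetical policy: allow snapshots only for GET page requests,
// never for API routes or static assets.
function allowCrawler(req: RequestLike): boolean {
  if (req.method !== 'GET') return false;
  if (req.path.startsWith('/api/')) return false;
  return !/\.(js|css|png|jpg|svg|ico)$/i.test(req.path);
}
```

A function like this could be passed as the allowCrawler option in place of the inline arrow function shown under Usage.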

Environment Variables

When using this middleware outside of a Blocklet environment, you need to configure the following environment variables:

  • BLOCKLET_DATA_DIR: (Required) Directory path for storing the SQLite file
  • BLOCKLET_LOG_DIR: (Required) Directory path for storing @blocklet/logger logs
  • BLOCKLET_APP_URL: (Optional) Deployed domain
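
Outside a Blocklet environment, it may help to fail fast when a required variable is missing. The following is a minimal sketch; missingEnvVars is a hypothetical helper and not part of the package, and it treats an empty string as missing.

```typescript
type Env = Record<string, string | undefined>;

// The two variables marked as required above.
const REQUIRED_VARS = ['BLOCKLET_DATA_DIR', 'BLOCKLET_LOG_DIR'];

// Returns the names of required variables that are unset or empty.
function missingEnvVars(env: Env): string[] {
  return REQUIRED_VARS.filter((name) => !env[name]);
}

// Usage in Node.js, before calling createSnapshotMiddleware:
//   const missing = missingEnvVars(process.env);
//   if (missing.length > 0) {
//     throw new Error(`Missing environment variables: ${missing.join(', ')}`);
//   }
```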

SQLite

When createSnapshotMiddleware is called, it attempts to create an SQLite database at BLOCKLET_DATA_DIR. This database is used to cache HTML content retrieved from SnapKit. Please ensure that the deployment environment supports SQLite.

Package last updated on 07 Sep 2025
