
Stealth crawler with Chrome-perfect TLS/H2 fingerprint, render pool, hooks, persistent queue
TLS, HTTP/2, JS fingerprint: every byte indistinguishable from real Chrome 149.
Rust core • Node SDK • Lua hooks • cross-platform binaries.
pnpm add -g crawlex && crawlex pages run --seed https://example.com --method render
Quickstart · Features · Examples · Docs · Why crawlex
Standard crawlers fail on the first Cloudflare wall. crawlex arrives the way real Chrome arrives: every fingerprint surface is identical, not approximated.
| Layer | What we match (exactly, not approximately) |
|---|---|
| TLS ClientHello | Extension order, ALPS, GREASE values, permute_extensions, X25519MLKEM768, signature algorithms; verified against the tls.peet.ws and ja4db.com oracles |
| HTTP/2 frame | Pseudo-header order :method :authority :scheme :path, SETTINGS frame parameters, WINDOW_UPDATE pattern; passes Akamai BMP signature checks |
| JS fingerprint | 29-section stealth shim: navigator, chrome.*, permissions, plugins, screen, timezone, battery, WebGL (vendor / params / extensions), canvas (zero-preserving noise), AudioContext (FFT + offline render), Function.prototype.toString proxy, WebGPU, performance.memory, sensors, iframe, requestAnimationFrame throttle, performance.now() 100 µs grain, mediaDevices, fonts, WebRTC SDP/ICE/getStats scrub |
| Behavior | Mouse jitter, scroll cadence, dwell time, idle drift; coherent motion:: profiles per persona |
| Catalog | 30 Chrome stable × 30 Chromium × 20 Firefox × Edge × Safari fingerprints. Era-fallback resolution: ask for chrome-149-linux, get the closest captured profile |
| Worker scope | Same shim auto-attached to dedicated / shared / service workers via CDP Target.setAutoAttach (Camoufox port) |

Validated against BrowserScan, CreepJS, Sannysoft, tls.peet.ws, ja4db.com.
# npm: bundled binary download via postinstall
pnpm add -g crawlex
# Rust: from source
cargo install crawlex
# Direct binary (linux x86_64/arm64, macOS x86_64/arm64, windows x86_64)
# https://github.com/forattini-dev/crawlex/releases/latest
⚠️ Production crawls run locally, never in CI. Datacenter IPs (GitHub Actions, AWS, Azure) are flagged instantly by every modern WAF.
# Stealth render with persona, sitemap discovery, NDJSON event stream
crawlex pages run \
--seed https://target.com \
--method render \
--persona atlas \
--max-depth 3 \
--screenshot \
--emit ndjson > events.ndjson
# Live tail what just happened
jq -c 'select(.event == "fetch.completed" or .event == "render.completed")' events.ndjson
Three integration paths, your pick:
| CLI | Node SDK | Embedded Rust |
|---|---|---|
| One-shot crawls, scripted pipelines. | Production services with hook logic. | In-process embedding, zero IPC. |
import { crawl } from 'crawlex';
for await (const ev of crawl({
seeds: ['https://stripe.com/pricing'],
args: {
method: 'render',
persona: 'atlas', // macOS Apple M1, Retina, en-US
screenshot: true,
screenshotMode: 'fullpage',
storage: 'filesystem',
storagePath: './out',
waitStrategy: '{"NetworkIdle":{"idle_ms":1500}}',
},
})) {
if (!('event' in ev)) continue;
switch (ev.event) {
case 'render.completed':
console.log(`✅ ${ev.url} | LCP=${ev.data.vitals.largest_contentful_paint_ms}ms | CLS=${ev.data.vitals.cumulative_layout_shift}`);
break;
case 'artifact.saved':
if (ev.data.kind === 'screenshot.full_page')
console.log(`📸 → out/${ev.data.path} (${(ev.data.size/1024).toFixed(0)}kB)`);
break;
case 'challenge.detected':
console.log(`🚧 ${ev.data.vendor} (${ev.data.level}) on ${ev.url}`);
break;
}
}
import { crawl, defineHooks } from 'crawlex';
const hooks = defineHooks({
// Rate-limit retry: 429/503 → re-enqueue (up to retry_max)
async onAfterFirstByte(ctx) {
if (ctx.response_status === 429 || ctx.response_status === 503) return 'retry';
return 'continue';
},
// Inject the canonical sitemap.xml for every host we touch
async onDiscovery(ctx) {
const host = new URL(ctx.url).host;
return {
decision: 'continue',
patch: { capturedUrls: [...ctx.captured_urls, `https://${host}/sitemap.xml`] },
};
},
// Tag the crawl with custom metadata that lands in user_data
async onJobStart(ctx) {
return {
decision: 'continue',
patch: { userData: { ...ctx.user_data, run_owner: 'qa-bot' } },
};
},
});
for await (const ev of crawl({
seeds: ['https://target.com'],
args: {
method: 'auto', // policy engine picks http vs render
maxConcurrentHttp: 8,
maxConcurrentRender: 2,
maxDepth: 5,
crtsh: true, // certificate-transparency seeding
storage: 'sqlite',
storagePath: './crawl.db',
queue: 'sqlite',
queuePath: './crawl.db',
proxies: ['http://user:pass@proxy1:8080', 'http://user:pass@proxy2:8080'],
proxyStrategy: 'health-weighted',
proxyStickyPerHost: true,
},
hooks,
signal: AbortSignal.timeout(30 * 60_000),
})) {
if (!('event' in ev)) continue;
if (ev.event === 'job.failed') console.error(`❌ ${ev.url}: ${ev.data.error}`);
if (ev.event === 'run.completed') console.log('done.');
}
use crawlex::{Config, Crawler, queue::FetchMethod};
use crawlex::hooks::{HookDecision, HookRegistry};
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
#[tokio::main]
async fn main() -> crawlex::Result<()> {
let hooks = HookRegistry::new();
let pages_seen = Arc::new(AtomicUsize::new(0));
// Closure-captured counter: observe without intervening
let counter = pages_seen.clone();
hooks.on_response_body(move |_ctx| {
let c = counter.clone();
Box::pin(async move {
c.fetch_add(1, Ordering::Relaxed);
Ok(HookDecision::Continue)
})
});
// Path-level deny list: skip /admin/ URLs before any fetch happens
hooks.on_before_each_request(|ctx| {
let url = ctx.url.clone();
Box::pin(async move {
if url.path().starts_with("/admin/") { return Ok(HookDecision::Skip); }
Ok(HookDecision::Continue)
})
});
let config = Config::builder()
.max_concurrent_http(16)
.build()?;
let crawler = Crawler::new(config)?.with_hooks(hooks);
crawler.seed_with(
vec!["https://target.com".parse().unwrap()],
FetchMethod::HttpSpoof,
).await?;
crawler.run().await?;
println!("Crawled {} pages", pages_seen.load(Ordering::Relaxed));
Ok(())
}
→ Full runnable example: examples/embedded_with_hooks.rs
# Browse 80+ ready-to-use fingerprints
crawlex stealth catalog list
crawlex stealth catalog list --filter chrome
crawlex stealth catalog show chrome-149-linux
# Pin a precise version + OS
crawlex pages run --seed https://target.com \
--profile chrome-149-linux
# Era fallback: chromium-122 not captured? falls back to closest era + warns
crawlex pages run --seed https://target.com \
--profile chromium-122-linux
# Mobile persona (touch viewport, sec-ch-ua-mobile: ?1)
crawlex pages run --seed https://target.com \
--method render --persona pixel
# Print active IdentityBundle + TLS profile summary
crawlex stealth inspect --profile chrome-149-linux
# Verify ALPN/cipher/JA4 against built-in expectations
crawlex stealth test
# Compare against tls.peet.ws / ja4db.com via the live oracle
crawlex stealth catalog show chrome-149-linux --json
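The era fallback used by `--profile chromium-122-linux` above can be pictured as a nearest-version lookup. The sketch below is an illustration only, not crawlex's resolver: `parseProfile`, `resolveProfile`, the matching rule (same browser family and OS, smallest major-version distance), and the sample catalog are all assumptions made for the example.

```typescript
// Hypothetical model of a profile name such as "chrome-149-linux".
interface ProfileId {
  browser: string;
  major: number;
  os: string;
}

// Split "<browser>-<major>-<os>" into its parts; returns null for other shapes.
function parseProfile(name: string): ProfileId | null {
  const m = /^([a-z]+)-(\d+)-([a-z]+)$/.exec(name);
  if (m === null) return null;
  return { browser: m[1], major: Number(m[2]), os: m[3] };
}

// Assumed rule: stay within the same browser family and OS, then pick the
// captured profile whose major version is numerically closest.
// On a tie, the profile listed first in `captured` wins.
function resolveProfile(
  requested: string,
  captured: string[],
): { profile: string; exact: boolean } | null {
  const want = parseProfile(requested);
  if (want === null) return null;

  let best: { name: string; distance: number } | null = null;
  for (const name of captured) {
    const have = parseProfile(name);
    if (have === null || have.browser !== want.browser || have.os !== want.os) continue;
    const distance = Math.abs(have.major - want.major);
    if (best === null || distance < best.distance) best = { name, distance };
  }
  // exact === false is the case where a caller would emit the fallback warning.
  return best === null ? null : { profile: best.name, exact: best.distance === 0 };
}

// Made-up catalog for illustration; the real list comes from `crawlex stealth catalog list`.
const captured = [
  'chrome-140-linux',
  'chrome-149-linux',
  'chromium-120-linux',
  'chromium-130-linux',
];
const resolved = resolveProfile('chromium-122-linux', captured);
```

With this sample catalog, `resolved` holds `chromium-120-linux` with `exact: false`, since 120 is closer to 122 than 130 is.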
Feature areas: Stealth core · Discovery · Antibot policy engine · Pipeline · Observability · Integrations
Every run emits one JSON envelope per line on stdout. Versioned, stable, 19 kinds:
{"v":1,"event":"run.started","ts":"2026-04-26T19:42:00.000Z","run_id":42,"data":{"policy_profile":"strict","max_concurrent_http":8,"max_concurrent_render":2}}
{"v":1,"event":"job.started","run_id":42,"url":"https://target.com/","data":{"job_id":"j_001","method":"render","depth":0,"priority":0,"attempts":0}}
{"v":1,"event":"fetch.completed","run_id":42,"url":"https://target.com/","data":{"final_url":"https://target.com/","status":200,"bytes":98234,"body_truncated":false,"dns_ms":12,"tcp_connect_ms":18,"tls_handshake_ms":24,"ttfb_ms":142,"download_ms":83,"total_ms":280,"alpn":"h2","tls_version":"TLSv1.3","cipher":"TLS_AES_128_GCM_SHA256"}}
{"v":1,"event":"render.completed","run_id":42,"session_id":"sess_abc","url":"https://target.com/","data":{"final_url":"https://target.com/","status":200,"manifest":true,"service_workers":1,"is_spa":true,"vitals":{"ttfb_ms":142,"first_contentful_paint_ms":380.5,"largest_contentful_paint_ms":920.1,"cumulative_layout_shift":0.03,"total_blocking_time_ms":50.0,"dom_nodes":1842,"js_heap_used_bytes":12345678,"resource_count":45,"total_transfer_bytes":982341}}}
{"v":1,"event":"artifact.saved","run_id":42,"url":"https://target.com/","data":{"kind":"screenshot.full_page","mime":"image/png","size":1234567,"sha256":"a1b2c3...","path":"artifacts/sess_abc/1714123456_screenshot_full_page_a1b2c3d4.png"}}
{"v":1,"event":"challenge.detected","run_id":42,"url":"https://protected.com/","data":{"vendor":"cloudflare_turnstile","level":"widget_present"}}
{"v":1,"event":"decision.made","run_id":42,"url":"https://protected.com/","why":"render:js-challenge","data":{"decision":"retry","reason":{"code":"render:js-challenge"}}}
{"v":1,"event":"run.completed","run_id":42}
Discriminator key: event (snake_case). TypeScript narrows via switch (ev.event) { … }. Fallback for malformed lines: { kind: 'raw', line }, so consumers can log and recover.
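For consumers that read the NDJSON stream directly rather than through the SDK, the envelope and the raw-line fallback can be handled with a small parser. This is a minimal sketch under stated assumptions: the `Envelope` and `RawLine` shapes are inferred from the sample lines above, and `parseLine` and `summarize` are hypothetical helpers, not part of the crawlex SDK.

```typescript
// Assumed envelope shape, based only on the sample events shown above.
interface Envelope {
  v: number;
  event: string;
  run_id?: number;
  url?: string;
  data?: Record<string, unknown>;
}

// Fallback for lines that are not a valid envelope.
interface RawLine {
  kind: 'raw';
  line: string;
}

// Parse one NDJSON line; anything that is not a JSON object carrying a numeric
// `v` and a string `event` is returned as a raw line instead of throwing.
function parseLine(line: string): Envelope | RawLine {
  try {
    const value: unknown = JSON.parse(line);
    if (typeof value === 'object' && value !== null && !Array.isArray(value)) {
      const obj = value as Record<string, unknown>;
      if (typeof obj['v'] === 'number' && typeof obj['event'] === 'string') {
        return obj as unknown as Envelope;
      }
    }
  } catch {
    // Malformed JSON falls through to the raw fallback below.
  }
  return { kind: 'raw', line };
}

// Count events per kind and tally how many lines fell back to raw.
function summarize(ndjson: string): { events: Record<string, number>; raw: number } {
  const events: Record<string, number> = {};
  let raw = 0;
  for (const line of ndjson.split('\n')) {
    if (line.trim() === '') continue; // ignore blank lines
    const parsed = parseLine(line);
    if ('event' in parsed) {
      events[parsed.event] = (events[parsed.event] ?? 0) + 1;
    } else {
      raw += 1;
    }
  }
  return { events, raw };
}
```

Feeding the contents of `events.ndjson` to `summarize` gives a per-kind count, and a non-zero `raw` value signals lines worth logging for inspection.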
before_each_request → after_dns → after_tls → after_first_byte → on_response_body
→ after_load → after_idle → on_discovery → on_job_start → on_job_end
→ on_error → on_robots_decision
| Language | API | Best for |
|---|---|---|
| Rust | hooks.on_after_first_byte(closure): full &mut HookContext access | Embedded library, latency-critical paths |
| JS / TS | defineHooks({...}) via SDK: IPC bridge, async closures | Production crawls, business logic |
| Lua | --hook-script foo.lua: page-driving helpers (page_click, page_eval) | Ad-hoc scripts, no build step |
All three modes return the same decision: continue / skip / retry / abort. Hooks can mutate ctx.captured_urls, inject extra URLs, write to user_data to communicate with downstream hooks, or override robots_allowed.
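Because every hook resolves to one of the same four decisions, hook logic can be written as plain functions and unit-tested without running a crawl. The sketch below is a hypothetical model, not the crawlex runtime: `HookInput`, `Hook`, and `runHooks` are invented names, and the rule that the first non-`continue` decision wins is an assumption chosen for the example.

```typescript
// The four decisions named above.
type HookDecision = 'continue' | 'skip' | 'retry' | 'abort';

// Hypothetical, trimmed-down stand-in for the hook context.
interface HookInput {
  path: string;
  status?: number;
}

type Hook = (input: HookInput) => HookDecision;

// Same rule as the SDK example earlier: 429 and 503 ask for a retry.
const retryOnRateLimit: Hook = (input) =>
  input.status === 429 || input.status === 503 ? 'retry' : 'continue';

// Same rule as the Rust example earlier: never touch /admin/ paths.
const skipAdminPaths: Hook = (input) =>
  input.path.startsWith('/admin/') ? 'skip' : 'continue';

// Assumed composition rule: run hooks in order and stop at the first
// decision that is not 'continue'.
function runHooks(hooks: Hook[], input: HookInput): HookDecision {
  for (const hook of hooks) {
    const decision = hook(input);
    if (decision !== 'continue') return decision;
  }
  return 'continue';
}

const decision = runHooks([skipAdminPaths, retryOnRateLimit], {
  path: '/pricing',
  status: 503,
});
```

Here `decision` is `'retry'`: the path check passes through and the rate-limit rule fires. Keeping each rule a pure function makes the ordering explicit and easy to test.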
Each persona is a complete bundle (UA + Sec-CH-UA + screen + viewport + DPR + GPU + fonts + media-device counts + TLS profile + motion timings), so every signal matches. No mismatched UA + WebGL combo gives you away.
| Codename | OS | GPU | Locale | Form factor |
|---|---|---|---|---|
| tux | Linux | Intel UHD 630 | en-US | desktop 1920×1080 |
| office | Windows 10 | Intel UHD 620 | en-US | laptop 1920×1080 (DPR 1.25) |
| gamer | Windows 10 | NVIDIA GTX 1060 | pt-BR | desktop 1920×1080 |
| atlas | macOS | Apple M1 | en-US | retina 1440×900 (DPR 2.0) |
| pixel | Android 14 | Adreno 640 | pt-BR | mobile 412×823 (DPR 2.625) |
crawlex pages run --seed https://target.com --persona atlas # macOS
crawlex pages run --seed https://target.com --persona pixel # mobile
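When choosing a persona programmatically, it can help to keep the table above as typed data. This is a sketch for illustration: `PersonaRow`, `findPersona`, and `secChUaMobile` are hypothetical names, the values are copied from the table, and DPR is left unset where the table does not state it. The real bundles carry many more fields.

```typescript
type FormFactor = 'desktop' | 'laptop' | 'retina' | 'mobile';

// One row of the persona table; not the actual crawlex IdentityBundle type.
interface PersonaRow {
  codename: string;
  os: string;
  gpu: string;
  locale: string;
  formFactor: FormFactor;
  width: number;
  height: number;
  dpr?: number; // set only where the table gives a value
}

const PERSONAS: PersonaRow[] = [
  { codename: 'tux', os: 'Linux', gpu: 'Intel UHD 630', locale: 'en-US', formFactor: 'desktop', width: 1920, height: 1080 },
  { codename: 'office', os: 'Windows 10', gpu: 'Intel UHD 620', locale: 'en-US', formFactor: 'laptop', width: 1920, height: 1080, dpr: 1.25 },
  { codename: 'gamer', os: 'Windows 10', gpu: 'NVIDIA GTX 1060', locale: 'pt-BR', formFactor: 'desktop', width: 1920, height: 1080 },
  { codename: 'atlas', os: 'macOS', gpu: 'Apple M1', locale: 'en-US', formFactor: 'retina', width: 1440, height: 900, dpr: 2.0 },
  { codename: 'pixel', os: 'Android 14', gpu: 'Adreno 640', locale: 'pt-BR', formFactor: 'mobile', width: 412, height: 823, dpr: 2.625 },
];

// Look a persona up by codename.
function findPersona(codename: string): PersonaRow | undefined {
  return PERSONAS.find((p) => p.codename === codename);
}

// The Sec-CH-UA-Mobile client hint is a structured-header boolean:
// "?1" for mobile form factors, "?0" otherwise.
function secChUaMobile(p: PersonaRow): '?0' | '?1' {
  return p.formFactor === 'mobile' ? '?1' : '?0';
}

// Codenames of the personas that use a given locale.
function codenamesForLocale(locale: string): string[] {
  return PERSONAS.filter((p) => p.locale === locale).map((p) => p.codename);
}
```

A caller could, for example, pick from `codenamesForLocale('pt-BR')` when the target site is Brazilian and pass the result to `--persona`.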
flowchart LR
S[Seeds] --> Q[Frontier<br/>+ dedupe + rate-limit]
Q --> P[Policy Engine]
P -->|http| F[ImpersonateClient<br/>BoringSSL + h2 patched]
P -->|render| R[RenderPool<br/>Chromium + stealth shim]
F --> X[Extractor<br/>+ Asset Refs]
R --> X
X --> D[Discovery<br/>Pipeline]
X --> ST[Storage<br/>5 traits]
D --> Q
P --> EV[NDJSON Events<br/>19 kinds]
R --> H1[Rust Hooks]
R --> H2[JS Bridge]
R --> H3[Lua Scripts]
Module map:
- impersonate/: TLS catalog + BoringSSL connector + ALPS + GREASE
- render/: Chromium pool + 29-section stealth shim + motion engine + ScriptSpec runner
- discovery/: 17-stage pipeline (DNS, RDAP, sitemap, robots, crtsh, wayback, well-known, …)
- policy/: pure engine (decide_pre_fetch, decide_post_fetch, decide_post_error, decide_post_challenge)
- antibot/: vendor classifier + 4 captcha solver adapters
- storage/: 5 concern-oriented traits (artifact / state / challenge / telemetry / intel)
- events/: NDJSON envelope + sink (stdout / null / memory)
- hooks/: registry + JS bridge + Lua host

| Layer | Implementation |
|---|---|
| TLS | boring-sys: BoringSSL fork with ALPS / permute_extensions / X25519MLKEM768 |
| HTTP/2 | Vendored h2 crate with pseudo-header order patch (vendor/h2) |
| CDP | chromiumoxide-derived, embedded behind cdp-backend feature |
| Async | tokio multi-thread |
| Storage | rusqlite (SQLite WAL), DashMap (memory), filesystem layout |
| Discovery | hickory-resolver (DNS), reqwest (RDAP), texting_robots (robots.txt) |
| Lua | mlua 0.10 (optional, lua-hooks feature) |
| SDK | Node 20+, CommonJS, zero runtime deps |
Two binaries ship from one source tree:
- crawlex: full build with HTTP impersonation + Chromium rendering + stealth shim + persistent queue
- crawlex-mini: HTTP-only worker, no Chromium dependency, same CLI surface (browser-only flags return Error::RenderDisabled)

| | crawlex | Playwright stealth | Puppeteer + plugins | curl-impersonate |
|---|---|---|---|---|
| TLS-perfect ClientHello | ✅ BoringSSL | ⚠️ relies on Chromium | ⚠️ relies on Chromium | ✅ |
| H2 pseudo-header order | ✅ patched h2 | ⚠️ Chromium default | ⚠️ Chromium default | ✅ |
| 29-section JS leak coverage | ✅ | ⚠️ partial | ⚠️ via plugins | ❌ no JS |
| Worker-scope stealth | ✅ auto-attach | ⚠️ manual | ⚠️ manual | ❌ |
| HTTP-only path (no browser) | ✅ crawlex-mini | ❌ | ❌ | ✅ |
| Persistent queue + resume | ✅ SQLite/Redis | ❌ external | ❌ external | ❌ |
| Discovery pipeline | ✅ 17 stages | ❌ | ❌ | ❌ |
| Streaming NDJSON events | ✅ versioned | ❌ | ❌ | ❌ |
| Rust embedding | ✅ | ❌ | ❌ | ⚠️ libcurl |
| Single binary | ✅ | ❌ | ❌ | ✅ |
git clone https://github.com/forattini-dev/crawlex
cd crawlex
# Unit tests + offline shim compliance
cargo test --lib # 386+ tests
cargo test --test fpjs_compliance # 27 cases
cargo test --test tls_catalog_coverage --test tls_catalog_roundtrip
# SDK tests
pnpm test # 21 node:test cases
# Quality gates
cargo fmt --check
cargo clippy --all-features -- -D warnings
cargo publish --dry-run --locked
# Live integration tests (require system Chromium)
cargo test --all-features --test stealth_runtime_live -- --ignored
cargo test --all-features --test worker_shim_live -- --ignored
CI runs all of the above on every PR. Contributions welcome: issues, feature requests, and PRs are all reviewed.
Dual-licensed under MIT OR Apache-2.0 at your option. SPDX: MIT OR Apache-2.0.
Third-party attribution: see NOTICE.
Built for crawlers who refuse to be detected.
Docs · Releases · Issues · Discussions
FAQs
The npm package crawlex receives a total of 80 weekly downloads. As such, crawlex popularity was classified as not popular.
We found that crawlex demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.