# LLMCrawl JavaScript SDK

The official JavaScript SDK for LLMCrawl, providing a simple and powerful way to scrape websites, crawl multiple pages, and extract structured data using AI.
## Installation

```bash
npm install @llmcrawl/llmcrawl-js
```
> **Note:** Version 1.0.0 introduces breaking changes. If you're upgrading from 0.x, use named imports: `import { LLMCrawl } from '@llmcrawl/llmcrawl-js'`.
## Quick Start

```javascript
import { LLMCrawl } from "@llmcrawl/llmcrawl-js";

const client = new LLMCrawl({
  apiKey: "your-api-key-here",
});

const result = await client.scrape("https://example.com");
console.log(result.data?.markdown);
```
## Features
- 🌐 **Single Page Scraping** - Extract content from individual web pages
- 🕷️ **Website Crawling** - Crawl entire websites with customizable options
- 🗺️ **Site Mapping** - Get all URLs from a website
- 🤖 **AI-Powered Extraction** - Extract structured data using custom schemas
- 📷 **Screenshot Capture** - Take screenshots of web pages
- ⚙️ **Flexible Configuration** - Extensive customization options
## API Reference
### Constructor

```javascript
import { LLMCrawl } from "@llmcrawl/llmcrawl-js";

const client = new LLMCrawl({
  apiKey: "your-api-key",
  baseUrl: "https://api.llmcrawl.dev",
});
```
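In practice, you'll usually read the key from the environment rather than hard-coding it. A minimal sketch (the `LLMCRAWL_API_KEY` variable name is arbitrary, not one the SDK reads itself):

```typescript
import { LLMCrawl } from "@llmcrawl/llmcrawl-js";

// LLMCRAWL_API_KEY is an arbitrary env var name chosen for this example.
const client = new LLMCrawl({
  apiKey: process.env.LLMCRAWL_API_KEY ?? "",
});
```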
### Scraping

#### `scrape(url, options?)`

Scrape a single web page and extract its content.
```javascript
const result = await client.scrape("https://example.com", {
  formats: ["markdown", "html", "links"],
  headers: {
    "User-Agent": "Mozilla/5.0...",
  },
  waitFor: 3000,
  extract: {
    schema: {
      type: "object",
      properties: {
        title: { type: "string" },
        price: { type: "number" },
        description: { type: "string" },
      },
      required: ["title", "price"],
    },
  },
});

if (result.success) {
  console.log("Markdown:", result.data.markdown);
  console.log("Extracted data:", result.data.extract);
  console.log("Links:", result.data.links);
}
```
**Options:**

- `formats`: Array of formats to return (`'markdown'`, `'html'`, `'rawHtml'`, `'links'`, `'screenshot'`, `'screenshot@fullPage'`; see the screenshot sketch below)
- `headers`: Custom HTTP headers
- `includeTags`: HTML tags to include in the output
- `excludeTags`: HTML tags to exclude from the output
- `timeout`: Request timeout in milliseconds (1000-90000)
- `waitFor`: Delay before capturing content (0-60000 ms)
- `extract`: AI extraction configuration
- `webhookUrls`: URLs to send results to
- `metadata`: Additional metadata to include
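The screenshot formats work like any other entry in `formats`. A minimal sketch, assuming the captured image comes back on `result.data.screenshot` (verify the exact response field in your environment):

```typescript
// Request a full-page screenshot alongside markdown.
const shot = await client.scrape("https://example.com", {
  formats: ["markdown", "screenshot@fullPage"],
  waitFor: 2000, // give the page time to render before capture
});

if (shot.success) {
  // Assumption: the image is exposed on data.screenshot
  // (e.g. as a base64 string or URL); check your response shape.
  console.log(shot.data?.screenshot);
}
```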
### Crawling

#### `crawl(url, options?)`

Start a crawl job to scrape multiple pages from a website.
```javascript
const crawlResult = await client.crawl("https://example.com", {
  limit: 100,
  maxDepth: 3,
  includePaths: ["/blog/*", "/docs/*"],
  excludePaths: ["/admin/*"],
  scrapeOptions: {
    formats: ["markdown"],
    extract: {
      schema: {
        type: "object",
        properties: {
          title: { type: "string" },
          content: { type: "string" },
        },
      },
    },
  },
});

if (crawlResult.success) {
  console.log("Crawl started with ID:", crawlResult.id);
}
```
**Options:**

- `limit`: Maximum number of pages to crawl
- `maxDepth`: Maximum crawl depth
- `includePaths`: Array of path patterns to include
- `excludePaths`: Array of path patterns to exclude
- `allowBackwardLinks`: Allow crawling backward links
- `allowExternalLinks`: Allow crawling external domains
- `ignoreSitemap`: Ignore the site's sitemap.xml when discovering pages
- `scrapeOptions`: Scraping options applied to each page
- `webhookUrls`: URLs for webhook notifications (see the sketch below)
- `webhookMetadata`: Additional metadata echoed back in webhook payloads
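Webhooks let a long-running crawl push results to you instead of requiring polling. A sketch, assuming you run an endpoint that accepts the POSTed notifications (the URL and metadata keys here are hypothetical):

```typescript
const crawlJob = await client.crawl("https://example.com", {
  limit: 50,
  scrapeOptions: { formats: ["markdown"] },
  // Hypothetical endpoint; replace with a URL you control.
  webhookUrls: ["https://yourapp.example.com/webhooks/llmcrawl"],
  // Arbitrary metadata included with each webhook notification.
  webhookMetadata: { project: "docs-sync", triggeredBy: "nightly-job" },
});
```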
#### `getCrawlStatus(jobId)`

Check the status of a crawl job.

```javascript
const status = await client.getCrawlStatus("crawl-job-id");

if (status.success) {
  console.log("Status:", status.status);
  console.log("Progress:", `${status.completed}/${status.total}`);
  console.log("Data:", status.data);
}
```
#### `cancelCrawl(jobId)`

Cancel a running crawl job.

```javascript
const result = await client.cancelCrawl("crawl-job-id");

if (result.success) {
  console.log("Crawl cancelled:", result.message);
}
```
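`getCrawlStatus` and `cancelCrawl` combine naturally into a deadline guard: poll until the job settles, and cancel it if it runs too long. A minimal sketch (the interval, deadline, and the `"scraping"` status check mirror the Documentation Crawling example later in this README):

```typescript
import { LLMCrawl } from "@llmcrawl/llmcrawl-js";

// Poll a crawl job, cancelling it if it exceeds a deadline.
async function crawlWithDeadline(client: LLMCrawl, jobId: string, deadlineMs: number) {
  const start = Date.now();
  let status = await client.getCrawlStatus(jobId);
  while (status.success && status.status === "scraping") {
    if (Date.now() - start > deadlineMs) {
      await client.cancelCrawl(jobId); // give up on slow crawls
      return null;
    }
    await new Promise((resolve) => setTimeout(resolve, 5000));
    status = await client.getCrawlStatus(jobId);
  }
  return status;
}
```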
### Site Mapping

#### `map(url, options?)`

Get all URLs from a website without scraping their content.
```javascript
const mapResult = await client.map("https://example.com", {
  limit: 1000,
  includeSubdomains: true,
  search: "documentation",
});

if (mapResult.success) {
  console.log("Found URLs:", mapResult.links);
}
```
**Options:**

- `limit`: Maximum number of links to return (1-5000)
- `includeSubdomains`: Include subdomain URLs
- `search`: Filter links by search query
- `ignoreSitemap`: Ignore the site's sitemap.xml when discovering URLs
- `includePaths`: Path patterns to include
- `excludePaths`: Path patterns to exclude
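`map` composes well with `scrape`: discover URLs first, then fetch only the ones you need. A minimal sketch:

```typescript
// Discover documentation URLs, then scrape the first few.
const mapResult = await client.map("https://example.com", {
  search: "documentation",
  limit: 100,
});

if (mapResult.success) {
  for (const url of mapResult.links.slice(0, 5)) {
    const page = await client.scrape(url, { formats: ["markdown"] });
    if (page.success) console.log(url, page.data?.markdown?.length);
  }
}
```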
### AI-Powered Data Extraction

LLMCrawl supports advanced AI-powered data extraction using custom schemas:
```javascript
const result = await client.scrape("https://example-store.com/product/123", {
  formats: ["markdown"],
  extract: {
    mode: "llm",
    schema: {
      type: "object",
      properties: {
        productName: { type: "string" },
        price: { type: "number" },
        description: { type: "string" },
        inStock: { type: "boolean" },
        reviews: {
          type: "array",
          items: {
            type: "object",
            properties: {
              rating: { type: "number" },
              comment: { type: "string" },
              author: { type: "string" },
            },
          },
        },
      },
      required: ["productName", "price"],
    },
    systemPrompt: "Extract product information from this e-commerce page.",
    prompt: "Focus on getting accurate pricing and availability information.",
  },
});

if (result.success && result.data.extract) {
  const productData = JSON.parse(result.data.extract);
  console.log("Product:", productData.productName);
  console.log("Price:", productData.price);
  console.log("In Stock:", productData.inStock);
}
```
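In TypeScript you can mirror the schema with a hand-written interface so the parsed result is typed. A sketch continuing the example above (the interface is this example's own, not something the SDK generates):

```typescript
interface ProductData {
  productName: string;
  price: number;
  description?: string;
  inStock?: boolean;
  reviews?: { rating: number; comment: string; author: string }[];
}

if (result.success && result.data.extract) {
  // JSON.parse returns `any`; the assertion documents the expected shape.
  const product = JSON.parse(result.data.extract) as ProductData;
  console.log(`${product.productName}: ${product.price}`);
}
```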
## Error Handling

All methods return a response object with a `success` field:

```javascript
const result = await client.scrape("https://example.com");

if (result.success) {
  console.log(result.data);
} else {
  console.error("Error:", result.error);
  console.error("Details:", result.details);
}
```
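Because failures surface on the response object rather than as thrown exceptions, a simple retry wrapper is easy to build. A sketch (attempt counts and delays are arbitrary; add a try/catch if network-level errors can still throw in your setup):

```typescript
import { LLMCrawl } from "@llmcrawl/llmcrawl-js";

// Retry a scrape a few times with a fixed delay between attempts.
async function scrapeWithRetry(client: LLMCrawl, url: string, attempts = 3) {
  for (let i = 0; i < attempts; i++) {
    const result = await client.scrape(url);
    if (result.success) return result;
    console.warn(`Attempt ${i + 1} failed:`, result.error);
    await new Promise((resolve) => setTimeout(resolve, 1000));
  }
  return null; // all attempts failed
}
```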
## Type Definitions

The SDK includes comprehensive TypeScript types:

```typescript
import {
  ScrapeResponse,
  CrawlResponse,
  CrawlStatusResponse,
  Document,
  ExtractOptions,
  ScrapeOptions,
  CrawlerOptions,
} from "@llmcrawl/llmcrawl-js";
```
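These types are handy for annotating your own helpers. A minimal sketch using `ScrapeOptions` and `ScrapeResponse` (assuming all scrape options are optional):

```typescript
import { LLMCrawl, ScrapeOptions, ScrapeResponse } from "@llmcrawl/llmcrawl-js";

// A thin wrapper that defaults to markdown output.
async function scrapeMarkdown(
  client: LLMCrawl,
  url: string,
  options: ScrapeOptions = {},
): Promise<ScrapeResponse> {
  return client.scrape(url, { formats: ["markdown"], ...options });
}
```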
## Examples

### E-commerce Product Scraping

```javascript
const product = await client.scrape("https://store.example.com/product/123", {
  formats: ["markdown"],
  extract: {
    schema: {
      type: "object",
      properties: {
        name: { type: "string" },
        price: { type: "number" },
        rating: { type: "number" },
        availability: { type: "string" },
      },
    },
  },
});
```
### News Article Extraction

```javascript
const article = await client.scrape("https://news.example.com/article/123", {
  formats: ["markdown", "html"],
  extract: {
    schema: {
      type: "object",
      properties: {
        headline: { type: "string" },
        author: { type: "string" },
        publishDate: { type: "string" },
        content: { type: "string" },
        tags: { type: "array", items: { type: "string" } },
      },
    },
  },
});
```
### Documentation Crawling

```javascript
const crawl = await client.crawl("https://docs.example.com", {
  limit: 500,
  includePaths: ["/docs/*", "/api/*"],
  excludePaths: ["/docs/internal/*"],
  scrapeOptions: {
    formats: ["markdown"],
    extract: {
      schema: {
        type: "object",
        properties: {
          title: { type: "string" },
          section: { type: "string" },
          content: { type: "string" },
        },
      },
    },
  },
});

if (crawl.success) {
  let status = await client.getCrawlStatus(crawl.id);
  while (status.success && status.status === "scraping") {
    await new Promise((resolve) => setTimeout(resolve, 5000));
    status = await client.getCrawlStatus(crawl.id);
  }

  if (status.success && status.status === "completed") {
    console.log("Crawl completed!");
    console.log("Pages scraped:", status.data.length);
  }
}
```
## License

MIT License - see the LICENSE file for details.