

# ts-web-scraper

A powerful, type-safe web scraping library for TypeScript and Bun with zero external dependencies. Built entirely on Bun's native APIs for maximum performance and minimal footprint.
## Features

- **Zero Dependencies** - Built entirely on Bun native APIs
- **Fully Typed** - Complete TypeScript support with type inference
- **High Performance** - Optimized for speed on Bun's native runtime
- **Rate Limiting** - Built-in token bucket rate limiter with burst support
- **Smart Caching** - LRU cache with TTL support
- **Automatic Retries** - Exponential backoff retry logic
- **Data Extraction** - Powerful pipeline-based data extraction and transformation
- **Validation** - Built-in schema validation for extracted data
- **Monitoring** - Performance metrics and analytics
- **Change Detection** - Track content changes over time with diff algorithms
- **Ethical Scraping** - Robots.txt support and user-agent management
- **Session Management** - Cookie jar and session persistence
- **Multiple Export Formats** - JSON, CSV, XML, YAML, Markdown, HTML
- **Pagination** - Automatic pagination detection and traversal
- **Client-Side Rendering** - Support for JavaScript-heavy sites
- **Comprehensive Docs** - Full documentation with examples
## Installation

```sh
bun add ts-web-scraper
```
## Quick Start

```ts
import { createScraper } from 'ts-web-scraper'

const scraper = createScraper({
  rateLimit: { requestsPerSecond: 2 },
  cache: { enabled: true, ttl: 60000 },
  retry: { maxRetries: 3 },
})

const result = await scraper.scrape('https://example.com', {
  extract: doc => ({
    title: doc.querySelector('title')?.textContent,
    headings: Array.from(doc.querySelectorAll('h1')).map(h => h.textContent),
  }),
})

console.log(result.data)
```
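The `retry` option retries failed requests with exponentially growing delays. A minimal standalone sketch of that pattern (the `withRetry` helper and its parameters are illustrative, not part of the library's API):

```ts
// Hypothetical helper illustrating exponential backoff:
// the delay doubles after each failed attempt (1000ms, 2000ms, 4000ms, ...).
async function withRetry<T>(
  fn: () => Promise<T>,
  maxRetries = 3,
  initialDelay = 1000,
): Promise<T> {
  let lastError: unknown
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await fn()
    }
    catch (err) {
      lastError = err
      if (attempt === maxRetries)
        break
      const delay = initialDelay * 2 ** attempt
      await new Promise(resolve => setTimeout(resolve, delay))
    }
  }
  throw lastError
}
```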
## Core Concepts

### Scraper

The main scraper class provides a unified API for all scraping operations:

```ts
import { createScraper } from 'ts-web-scraper'

const scraper = createScraper({
  rateLimit: {
    requestsPerSecond: 2,
    burstSize: 5
  },
  cache: {
    enabled: true,
    ttl: 60000,
    maxSize: 100
  },
  retry: {
    maxRetries: 3,
    initialDelay: 1000
  },
  monitor: true,
  trackChanges: true,
  cookies: { enabled: true },
})
```
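`requestsPerSecond` and `burstSize` map onto a token bucket: tokens refill at the request rate, the bucket holds at most `burstSize` tokens, and a request proceeds only when a token is available. A standalone sketch of the idea (not the library's internal implementation):

```ts
// Token bucket: refills continuously at `ratePerSecond`, caps at `burstSize`.
class TokenBucket {
  private tokens: number
  private lastRefill = Date.now()

  constructor(private ratePerSecond: number, private burstSize: number) {
    this.tokens = burstSize
  }

  tryRemoveToken(now = Date.now()): boolean {
    const elapsedSeconds = (now - this.lastRefill) / 1000
    this.tokens = Math.min(this.burstSize, this.tokens + elapsedSeconds * this.ratePerSecond)
    this.lastRefill = now
    if (this.tokens >= 1) {
      this.tokens -= 1
      return true
    }
    return false
  }
}
```

Allowing an initial burst lets a scraper fetch a handful of pages immediately while still holding the long-run average at `requestsPerSecond`.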
### Extraction Pipelines

Extract and transform data using pipelines:

```ts
import { extractors, pipeline } from 'ts-web-scraper'

const extractProducts = pipeline()
  .step(extractors.structured('.product', {
    name: '.product-name',
    price: '.product-price',
    rating: '.rating',
  }))
  .map('parse-price', p => ({
    ...p,
    price: Number.parseFloat(p.price.replace(/[^0-9.]/g, '')),
  }))
  .filter('in-stock', p => p.price > 0)
  .sort('by-price', (a, b) => a.price - b.price)

const result = await extractProducts.execute(document)
```
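Conceptually, each named step receives the previous step's output and returns a new value. A minimal standalone sketch of that chaining pattern (not the library's implementation; step names here are only labels):

```ts
// Minimal pipeline: each named step transforms the previous step's result.
interface Step { name: string, fn: (input: any) => any }

class MiniPipeline {
  private steps: Step[] = []

  step(name: string, fn: (input: any) => any): this {
    this.steps.push({ name, fn })
    return this
  }

  execute<T>(input: unknown): T {
    return this.steps.reduce((value, s) => s.fn(value), input) as T
  }
}

// Usage mirroring the example above: parse prices, drop items without one, sort.
const parsePrices = new MiniPipeline()
  .step('parse-price', (items: { name: string, price: string }[]) => items.map(p => ({
    ...p,
    price: Number.parseFloat(p.price.replace(/[^0-9.]/g, '')),
  })))
  .step('in-stock', (items: { name: string, price: number }[]) => items.filter(p => p.price > 0))
  .step('by-price', (items: { name: string, price: number }[]) => [...items].sort((a, b) => a.price - b.price))
```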
## Change Detection

Track content changes over time:

```ts
const scraper = createScraper({ trackChanges: true })

const result1 = await scraper.scrape('https://example.com', {
  extract: doc => ({ price: doc.querySelector('.price')?.textContent }),
})

const result2 = await scraper.scrape('https://example.com', {
  extract: doc => ({ price: doc.querySelector('.price')?.textContent }),
})
```
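With `trackChanges` enabled, each new snapshot is compared against the previous one. A minimal sketch of a field-level diff between two extracted snapshots (a hypothetical helper, not the library's diff algorithm):

```ts
// Compare two flat snapshots field by field and report what changed.
type Snapshot = Record<string, unknown>

interface FieldChange { field: string, before: unknown, after: unknown }

function diffSnapshots(prev: Snapshot, next: Snapshot): FieldChange[] {
  const fields = new Set([...Object.keys(prev), ...Object.keys(next)])
  const changes: FieldChange[] = []
  for (const field of fields) {
    if (prev[field] !== next[field])
      changes.push({ field, before: prev[field], after: next[field] })
  }
  return changes
}
```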
## Export Data

Export scraped data to multiple formats:

```ts
import { exportData, saveExport } from 'ts-web-scraper'

const json = exportData(data, { format: 'json', pretty: true })
const csv = exportData(data, { format: 'csv' })

await saveExport(data, 'output.csv')
await saveExport(data, 'output.json')
await saveExport(data, 'output.xml')
```
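CSV export is the format that most often goes wrong, because fields containing commas, quotes, or newlines need quoting. A standalone sketch of the RFC 4180-style rule (the library's `exportData` handles this internally; `toCsv` here is illustrative):

```ts
// Quote a field if it contains a comma, quote, or newline; double embedded quotes.
function toCsv(rows: Record<string, unknown>[]): string {
  if (rows.length === 0)
    return ''
  const headers = Object.keys(rows[0])
  const escape = (value: unknown): string => {
    const s = String(value ?? '')
    return /[",\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s
  }
  const lines = [
    headers.map(escape).join(','),
    ...rows.map(row => headers.map(h => escape(row[h])).join(',')),
  ]
  return lines.join('\n')
}
```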
## Advanced Features

### Pagination

Automatically traverse paginated content:

```ts
for await (const page of scraper.scrapeAll('https://example.com/posts', {
  extract: doc => ({
    posts: extractors.structured('article', {
      title: 'h2',
      content: '.content',
    }).execute(doc),
  }),
}, { maxPages: 10 })) {
  console.log(`Page ${page.pageNumber}:`, page.data)
}
```
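`scrapeAll` works with `for await` because it is an async generator. A standalone sketch of the underlying pattern, assuming a hypothetical `fetchPage` callback that returns each page's data plus the next page's URL:

```ts
interface Page<T> { pageNumber: number, data: T }

// Hypothetical fetcher: returns page data and the next URL (null when done).
type FetchPage<T> = (url: string) => Promise<{ data: T, nextUrl: string | null }>

// Yield pages lazily until there is no next URL or maxPages is reached.
async function* paginate<T>(
  startUrl: string,
  fetchPage: FetchPage<T>,
  maxPages = Infinity,
): AsyncGenerator<Page<T>> {
  let url: string | null = startUrl
  let pageNumber = 1
  while (url && pageNumber <= maxPages) {
    const { data, nextUrl } = await fetchPage(url)
    yield { pageNumber, data }
    url = nextUrl
    pageNumber++
  }
}
```

Because pages are yielded one at a time, the consumer can stop early without fetching the rest.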
### Performance Monitoring

Track and analyze scraping performance:

```ts
const scraper = createScraper({ monitor: true })

await scraper.scrape('https://example.com')
await scraper.scrape('https://example.com/page2')

const stats = scraper.getStats()
console.log(stats.totalRequests)
console.log(stats.averageDuration)
console.log(stats.cacheHitRate)

const report = scraper.getReport()
console.log(report)
```
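The stats above are simple aggregates over recorded requests. A minimal sketch of how such a monitor can compute them (the field names mirror the example; the `MiniMonitor` class itself is illustrative):

```ts
interface RequestRecord { durationMs: number, cacheHit: boolean }

// Record each request, then derive aggregates on demand.
class MiniMonitor {
  private records: RequestRecord[] = []

  record(durationMs: number, cacheHit: boolean): void {
    this.records.push({ durationMs, cacheHit })
  }

  get totalRequests(): number {
    return this.records.length
  }

  get averageDuration(): number {
    if (this.records.length === 0)
      return 0
    return this.records.reduce((sum, r) => sum + r.durationMs, 0) / this.records.length
  }

  get cacheHitRate(): number {
    if (this.records.length === 0)
      return 0
    return this.records.filter(r => r.cacheHit).length / this.records.length
  }
}
```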
### Content Validation

Validate extracted data against schemas:

```ts
const result = await scraper.scrape('https://example.com', {
  extract: doc => ({
    title: doc.querySelector('title')?.textContent,
    price: Number.parseFloat(doc.querySelector('.price')?.textContent || '0'),
  }),
  validate: {
    title: { type: 'string', required: true },
    price: { type: 'number', min: 0, required: true },
  },
})

if (result.success) {
  console.log(result.data.title, result.data.price)
}
else {
  console.error(result.error)
}
```
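A schema like the one above can be checked with a small validator. A sketch under the assumption that rules support `type`, `required`, and `min` (the three keys used in the example; `validate` here is a hypothetical helper, not the library's):

```ts
interface Rule { type: 'string' | 'number', required?: boolean, min?: number }

// Return a list of human-readable errors; an empty list means the data is valid.
function validate(data: Record<string, unknown>, schema: Record<string, Rule>): string[] {
  const errors: string[] = []
  for (const [field, rule] of Object.entries(schema)) {
    const value = data[field]
    if (value === undefined || value === null) {
      if (rule.required)
        errors.push(`${field} is required`)
      continue
    }
    if (typeof value !== rule.type)
      errors.push(`${field} must be a ${rule.type}`)
    else if (rule.type === 'number' && rule.min !== undefined && (value as number) < rule.min)
      errors.push(`${field} must be >= ${rule.min}`)
  }
  return errors
}
```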
## Documentation

For full documentation, visit https://ts-web-scraper.netlify.app
## Testing

```sh
bun test
```

All 482 tests pass, with comprehensive coverage of:

- Core scraping functionality
- Rate limiting and caching
- Data extraction pipelines
- Change detection and monitoring
- Export formats
- Error handling and edge cases
## Changelog

Please see our releases page for more information on what has changed recently.

## Contributing

Please see CONTRIBUTING for details.

## Community

For help, discussion about best practices, or any other conversation that would benefit from being searchable:

Discussions on GitHub

For casual chit-chat with others using this package:

Join the Stacks Discord Server
## Postcardware

"Software that is free, but hopes for a postcard." We love receiving postcards from around the world showing where Stacks is being used! We showcase them on our website too.

Our address: Stacks.js, 12665 Village Ln #2306, Playa Vista, CA 90094, United States

## Sponsors

We would like to extend our thanks to the following sponsors for funding Stacks development. If you are interested in becoming a sponsor, please reach out to us.
## License

The MIT License (MIT). Please see LICENSE for more information.
Made with 💙