scrape-brrr
Simple web page scraping.
Install
yarn add scrape-brrr
Usage examples
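The following examples use TypeScript-style imports. For plain Node.js, use the line below and wrap the await calls in an async function, since top-level await is only available in ES modules.
const { scrape } = require('scrape-brrr')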
Dead-simple usage
import { scrape } from 'scrape-brrr'
const data = await scrape('http://website.com', 'div p:not(:first-child)')
Scrape single item
import { scrape } from 'scrape-brrr'
const data = await scrape('http://website.com', [
  {
    name: 'stats',
    selector: 'div',
  },
  {
    name: 'another-stats',
    selector: 'span',
  },
])
Scrape multiple items
import { scrape } from 'scrape-brrr'
const data = await scrape('http://website.com', [{
  name: 'bestWofs',
  selector: 'div .name',
  many: true, // collect all matching elements
}])
Nested fields
import { scrape } from 'scrape-brrr'
const data = await scrape('http://website.com', [{
  name: 'bestWofs',
  selector: 'div',
  many: true,
  nested: [
    {
      name: 'name',
      selector: 'span',
    },
  ],
}])
Attributes
import { scrape } from 'scrape-brrr'
const data = await scrape('http://website.com', [
  {
    name: 'key',
    selector: 'span',
    attr: 'id', // read the value of the id attribute
  },
  {
    name: 'otherLink',
    selector: 'a',
    attr: 'href', // read the value of the href attribute
  },
])
Transform
import { scrape } from 'scrape-brrr'
const data = await scrape('http://website.com', [{
  name: 'best',
  selector: 'div',
  many: true,
  nested: [
    {
      name: 'rank',
      selector: '.rank',
    },
    {
      name: 'name',
      selector: '.name',
    },
  ],
  // transform receives the collected array; here only its first entry is kept
  transform: arr => arr[0],
}])
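To make the effect of the transform above easier to see, here is a minimal standalone sketch that applies the same function to sample data. The Entry type and the sample values are assumptions for illustration only; the real shape of the array passed to transform depends on the page and on how scrape-brrr assembles nested fields.

```typescript
// Hypothetical shape of one scraped entry (assumed: one key per nested field).
interface Entry {
  rank: string
  name: string
}

// Same logic as the transform option above: keep only the first entry.
const firstOnly = (arr: Entry[]): Entry | undefined => arr[0]

// Made-up data standing in for what many: true with nested fields might collect.
const sample: Entry[] = [
  { rank: '1', name: 'Alpha' },
  { rank: '2', name: 'Beta' },
]

console.log(firstOnly(sample))
```

Because arr[0] is undefined for an empty array, a transform like this may want a fallback value when nothing matches.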
Websites with JavaScript-rendered content
Pass { dynamic: true } to load the page with Puppeteer, so that content rendered by JavaScript can be scraped.
import { scrape } from 'scrape-brrr'
const data = await scrape('http://website.com', 'h1', { dynamic: true })
Other features
- Handles non-UTF-8 charsets in server responses (e.g. the Chinese encoding Big5)
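As an illustration of why charset handling matters, here is a minimal sketch using the standard TextDecoder API. It assumes a Node.js build with full ICU support, so that the big5 label is recognized. It does not show how scrape-brrr handles this internally; decodeBody is a hypothetical helper.

```typescript
// Bytes for the two characters "中文" encoded as Big5 (not valid UTF-8).
const big5Bytes = new Uint8Array([0xa4, 0xa4, 0xa4, 0xe5])

// Hypothetical helper (not part of scrape-brrr): decode raw response bytes
// using the charset declared by the server.
function decodeBody(bytes: Uint8Array, charset: string): string {
  return new TextDecoder(charset).decode(bytes)
}

// Decoding with the declared charset recovers the text.
const asBig5 = decodeBody(big5Bytes, 'big5')

// Decoding the same bytes as UTF-8 yields replacement characters instead.
const asUtf8 = decodeBody(big5Bytes, 'utf-8')

console.log(asBig5)
console.log(asUtf8 === asBig5)
```

A scraper that always assumes UTF-8 would pass the garbled string to its selectors, which is why the response charset has to be taken into account before parsing.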
Development
yarn install
yarn test