scrape-it
A Node.js scraper for humans.
:cloud: Installation
$ npm i --save scrape-it
:clipboard: Example
const scrapeIt = require("scrape-it");

// Promise interface
scrapeIt("http://ionicabizau.net", {
    title: ".header h1"
  , desc: ".header h2"
  , avatar: {
        selector: ".header img"
      , attr: "src"
    }
}).then(page => {
    console.log(page);
});

// Callback interface
scrapeIt("http://ionicabizau.net", {
    // Fetch the articles
    articles: {
        listItem: ".article"
      , data: {
            // Convert the date text into a Date object
            createdAt: {
                selector: ".date"
              , convert: x => new Date(x)
            }
          , title: "a.article-title"
            // Nested list
          , tags: {
                listItem: ".tags > span"
            }
            // Get the content as HTML
          , content: {
                selector: ".article-content"
              , how: "html"
            }
        }
    }
    // Fetch the blog pages
  , pages: {
        listItem: "li.page"
      , name: "pages"
      , data: {
            title: "a"
          , url: {
                selector: "a"
              , attr: "href"
            }
        }
    }
    // Fetch some other data from the page
  , title: ".header h1"
  , desc: ".header h2"
  , avatar: {
        selector: ".header img"
      , attr: "src"
    }
}, (err, page) => {
    console.log(err || page);
});
:question: Get Help
There are a few ways to get help:
- Please post questions on Stack Overflow. You can open issues with questions, as long as you add a link to your Stack Overflow question.
- For bug reports and feature requests, open issues. :bug:
- For direct and quick help from me, you can use Codementor. :rocket:
:memo: Documentation
scrapeIt(url, opts, cb)
A scraping module for humans.
Params
- String|Object url: The page url or request options.
- Object opts: The options passed to the scrapeHTML method.
- Function cb: The callback function.
Return
- Promise A promise object.
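The first example above handles only the success case of the returned promise. A minimal sketch of adding a failure path follows. It assumes the promise rejects when the request fails (mirroring the err argument of the callback form) and resolves with the scraped object, as in the examples. scrapeTitle and the two stub scrapers are hypothetical names, used so the snippet runs without network access; in real use you would pass require("scrape-it") as the scraper argument.

```javascript
// Hypothetical helper: `scraper` is any function with the documented
// (url, opts) => Promise shape, e.g. require("scrape-it").
function scrapeTitle(scraper, url) {
    return scraper(url, { title: ".header h1" })
        .then(page => page.title)
        .catch(err => {
            // Assumption: request failures surface as promise rejections.
            console.error("Scraping failed:", err.message);
            return null;
        });
}

// Stand-in scrapers (not part of scrape-it) to exercise both paths offline.
const okScraper = () => Promise.resolve({ title: "Example title" });
const failingScraper = () => Promise.reject(new Error("request failed"));

const okResult = scrapeTitle(okScraper, "http://ionicabizau.net");
const failedResult = scrapeTitle(failingScraper, "http://ionicabizau.net");

okResult.then(title => console.log(title));
failedResult.then(title => console.log(title));
```

Returning null on failure is only one possible policy; rethrowing the error keeps the failure visible to callers further up the chain.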
scrapeIt.scrapeHTML($, opts)
Scrapes the data in the provided element.
Params
- Cheerio $: The input element.
- Object opts: An object containing the scraping information. If you want to scrape a list, you have to use the listItem selector:
  - listItem (String): The list item selector.
  - data (Object): The fields to include in the list objects:
    - <fieldName> (Object|String): The selector or an object containing:
      - selector (String): The selector.
      - convert (Function): An optional function to change the value.
      - how (Function|String): A function or function name to access the value.
      - attr (String): If provided, the value will be taken based on the attribute name.
      - trim (Boolean): If false, the value will not be trimmed (default: true).
      - closest (String): If provided, returns the first ancestor of the given element.
      - eq (Number): If provided, it will select the nth element.
      - listItem (Object): An object, keeping the recursive schema of the listItem object. This can be used to create nested lists.
Example:
{
    articles: {
        listItem: ".article"
      , data: {
            createdAt: {
                selector: ".date"
              , convert: x => new Date(x)
            }
          , title: "a.article-title"
          , tags: {
                listItem: ".tags > span"
            }
          , content: {
                selector: ".article-content"
              , how: "html"
            }
          , traverseOtherNode: {
                selector: ".upperNode"
              , closest: "div"
              , convert: x => x.length
            }
        }
    }
}
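The trim and convert options described above can be pictured as a small post-processing step on the raw text of a matched element. The following is a minimal sketch of that idea in plain JavaScript, not scrape-it's actual implementation: applyFieldOptions is a hypothetical helper, and the order (trim first, then convert) is an assumption.

```javascript
// Hypothetical helper illustrating the documented `trim` and `convert` options.
// Assumption: trimming happens before conversion; check the library source
// if the order matters for your data.
function applyFieldOptions(rawValue, field) {
    let value = rawValue;

    // `trim` defaults to true; only an explicit `false` disables it.
    if (field.trim !== false && typeof value === "string") {
        value = value.trim();
    }

    // `convert` is optional and receives the (possibly trimmed) value.
    if (typeof field.convert === "function") {
        value = field.convert(value);
    }

    return value;
}

// Surrounding whitespace is removed before the text is parsed as a date.
const createdAt = applyFieldOptions("  2016-05-01  ", {
    selector: ".date"
  , convert: x => new Date(x)
});

// With `trim: false` the raw text is kept untouched.
const untrimmed = applyFieldOptions("  raw text  ", {
    selector: ".note"
  , trim: false
});

console.log(createdAt.getUTCFullYear());
console.log(JSON.stringify(untrimmed));
```

Keeping convert functions small and pure, as here, makes them easy to test without fetching a page.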
If you want to collect specific data from the page, just use the same schema used for the data field.
Example:
{
    title: ".header h1"
  , desc: ".header h2"
  , avatar: {
        selector: ".header img"
      , attr: "src"
    }
}
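Because page-level fields and list data fields share one schema, field definitions are ordinary objects that can be written once and reused. A minimal sketch, using only the option names documented above; linkFields, headerFields, and listOf are hypothetical names introduced for illustration:

```javascript
// Reusable field definitions (hypothetical names, documented option keys).
const linkFields = {
    title: "a"
  , url: { selector: "a", attr: "href" }
};

const headerFields = {
    title: ".header h1"
  , desc: ".header h2"
};

// Hypothetical helper: builds a list schema from an item selector and its fields.
function listOf(listItem, data) {
    return { listItem, data };
}

// Page-level fields and a list combined into one options object,
// suitable for passing as `opts`.
const opts = Object.assign({}, headerFields, {
    pages: listOf("li.page", linkFields)
});

console.log(JSON.stringify(opts, null, 2));
```

The resulting object has the same shape as the schemas in the examples above, so it can be passed wherever opts is expected.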
Return
:yum: How to contribute
Have an idea? Found a bug? See how to contribute.
:sparkling_heart: Support my projects
I open-source almost everything I can, and I try to reply to everyone needing help using these projects. Obviously, this takes time. You can integrate and use these projects in your applications for free! You can even change the source code and redistribute (even resell it).
However, if you get some profit from this or just want to encourage me to continue creating stuff, there are a few ways you can do it:
- Starring and sharing the projects you like :rocket:
- You can make one-time donations via PayPal. I'll probably buy a tea. :tea:
- Set up a recurring monthly donation and you will get interesting news about what I'm doing (things that I don't share with everyone).
- Bitcoin: You can send me bitcoins at this address: 1P9BRsmazNQcuyTxEqveUsnf5CERdq35V6
Thanks! :heart:
:dizzy: Where is this library used?
If you are using this library in one of your projects, add it to this list. :sparkles:
- 3abn: A 3ABN radio client in the terminal.
- bandcamp-scraper (by Simon Thiboutôt): A scraper for https://bandcamp.com
- cevo-lookup (by Zack Boehm): Searches the CEVO Suspension List for bans by SteamID
- codementor: A scraper for codementor.io.
- degusta-scrapper (by yohendry hurtado): desgusta scrapper for alexa skill
- proxylist (by self_refactor): Get free proxy list
- rs-api (by Alex Kempf): Simple wrapper for RuneScape APIs written in node.
- sahibinden (by Cagatay Cali): Simple sahibinden.com bot
- sahibindenServer (by Cagatay Cali): Simple sahibinden.com bot server side
- sgdq-collector (by Benjamin Congdon): Collects Twitch / Donation information and pushes data to Firebase
- trump-cabinet-picks (by Linda Haviv): NYT cabinet predictions for Trump admin.
- ubersetzung (by self_refactor): translate words with examples from German to English
- ui-studentsearch (by Rakha Kanz Kautsar): API for majapahit.cs.ui.ac.id/studentsearch
:scroll: License
MIT © Ionică Bizău