
Scrape a website efficiently, block by block, page by page. Based on cheerio and cURL.
This is a Cheerio-based scraper that extracts data from a website using CSS selectors.
The motivation behind this package is to provide a simple Cheerio-based scraping tool that divides a web page into blocks and transforms each block into a JSON object using CSS selectors.
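The block-to-JSON idea can be sketched in plain JavaScript. This is a minimal illustration under stated assumptions, not the cheers source: `extractValue`, `transformBlock`, `transformPage` and the `lookup` function are hypothetical names, and each node is assumed to be an adapter exposing `text()`, `html()`, `outerHTML()` and `attr(name)`, standing in for a parsed HTML element. The extract keywords (text, html, outerHTML, a RegExp, or an attribute name) follow the configuration options described in this README; returning the first RegExp match and `null` for missing nodes are assumptions.

```javascript
// Sketch of the block-to-JSON idea (an assumption, not the cheers source).
// `node` is a hypothetical adapter with text(), html(), outerHTML() and
// attr(name) methods, standing in for a parsed HTML element.
function extractValue(node, extract) {
  if (extract instanceof RegExp) {
    // Assumption: a RegExp yields the first match found in the node's text.
    const match = node.text().match(extract);
    return match ? match[0] : null;
  }
  switch (extract) {
    case "text":
      return node.text();
    case "html":
      return node.html();
    case "outerHTML":
      return node.outerHTML();
    default:
      // Any other string is read as an attribute name, e.g. "href".
      return node.attr(extract);
  }
}

// `lookup(block, selector)` is a hypothetical function returning the first node
// inside `block` that matches the CSS selector, or null; "." keeps the block itself.
function transformBlock(block, scrape, lookup) {
  const result = {};
  for (const [key, { selector, extract }] of Object.entries(scrape)) {
    const node = selector === "." ? block : lookup(block, selector);
    // Assumption: a missing node produces null for that key.
    result[key] = node ? extractValue(node, extract) : null;
  }
  return result;
}

// The page has already been split into blocks (e.g. one per "article" element);
// each block becomes one JSON object.
function transformPage(blocks, scrape, lookup) {
  return blocks.map((block) => transformBlock(block, scrape, lookup));
}
```

In a real scraper a library such as Cheerio would provide both the node methods and the selector lookup; injecting `lookup` here only keeps the sketch free of dependencies.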
Related projects:

- cheerio: https://github.com/cheeriojs/cheerio
- curlrequest: https://github.com/chriso/curlrequest
- q: https://github.com/kriskowal/q
- noodle: https://github.com/dharmafly/noodle
Install the module with: npm install cheers
Configuration options:

- config.url: the URL to scrape.
- config.blockSelector: the CSS selector applied to the page to divide it into scraping blocks. This field is optional (defaults to "body").
- config.scrape: the definition of what to extract from each block. Each key has two mandatory attributes: selector (a CSS selector, or "." to stay on the current node) and extract. The possible values for extract are text, html, outerHTML, a RegExp, or the name of an attribute of the HTML element (e.g. "href").
- config.curlOptions: additional options to pass to cURL. See the documentation at https://github.com/chriso/curlrequest for more information.

Example:

```javascript
var cheers = require('cheers');

// let's scrape this excellent JS news website
var config = {
  url: "http://www.echojs.com/",
  curlOptions: {
    'useragent': 'Cheers'
  },
  blockSelector: "article",
  scrape: {
    title: { selector: "h2 a", extract: "text" },
    link: { selector: "h2 a", extract: "href" },
    articleInnerHtml: { selector: ".", extract: "html" },
    articleOuterHtml: { selector: ".", extract: "outerHTML" },
    articlePublishedTime: { selector: 'p', extract: /\d* (?:hour[s]?|day[s]?) ago/ }
  }
};

cheers.scrape(config).then(function (results) {
  console.log(JSON.stringify(results));
}).catch(function (error) {
  console.error(error);
});
```
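The option rules above (url and scrape required, blockSelector optional with a "body" default, selector and extract mandatory on every scrape key) can be checked before scraping. The following is a minimal sketch under that reading of the documentation; `normalizeConfig` is a hypothetical helper, not a function exported by cheers, and treating url and scrape as required is inferred from only blockSelector being marked optional.

```javascript
// Hypothetical helper (not part of cheers): validates a cheers-style config
// and applies the documented default for blockSelector.
function normalizeConfig(config) {
  // Assumption: url is required, since only blockSelector is documented as optional.
  if (!config || typeof config.url !== "string" || config.url === "") {
    throw new Error("config.url is required");
  }
  if (!config.scrape || typeof config.scrape !== "object") {
    throw new Error("config.scrape is required");
  }
  for (const [key, rule] of Object.entries(config.scrape)) {
    // Both attributes are mandatory for every key in config.scrape.
    if (!rule || rule.selector === undefined || rule.extract === undefined) {
      throw new Error(`scrape.${key} needs both "selector" and "extract"`);
    }
  }
  // Return a copy so the caller's object is left untouched.
  return {
    ...config,
    // Documented default: the whole body is treated as a single block.
    blockSelector: config.blockSelector || "body",
  };
}
```

Failing fast on a malformed rule gives a clearer error than an empty or partial scrape result.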
Instead of using cheers from JavaScript, you can also use the provided shell script that wraps the library.
To install the shell script globally on your system, run:
npm install cheers -g
or npm install cheers --global
You will then be able to run the cheers command from a terminal.
The command scrapes content according to a config file with the same options as described above, written as a JSON file.
#### Example of config file (same config as above, without curlOptions):
config.json:

```json
{
  "url": "http://www.echojs.com/",
  "blockSelector": "article",
  "scrape": {
    "title": { "selector": "h2 a", "extract": "text" },
    "link": { "selector": "h2 a", "extract": "href" },
    "articleInnerHtml": { "selector": ".", "extract": "html" },
    "articleOuterHtml": { "selector": ".", "extract": "outerHTML" },
    "articlePublishedTime": { "selector": "p", "extract": "/\\d* (?:hour[s]?|day[s]?) ago/" }
  }
}
```
The main difference concerns regular expressions: in JSON a RegExp is written as a string, and every backslash (\) must be escaped as \\ to keep the file valid JSON.
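Because JSON has no RegExp type, a "/pattern/flags" string has to be turned back into a RegExp after the file is parsed. The sketch below shows one way to do that; how the cheers command actually performs this conversion is not documented here, so `toExtract` and `reviveScrapeRules` are hypothetical helpers, and treating only strings of the form /pattern/flags as regular expressions is an assumption.

```javascript
// Hypothetical helper (not part of cheers): turns a "/pattern/flags" string,
// as read from a JSON config, back into a RegExp. Other values pass through.
function toExtract(value) {
  if (typeof value !== "string") {
    return value;
  }
  // Capture everything between the first and the last slash, plus trailing flags.
  const match = /^\/(.*)\/([gimsuy]*)$/.exec(value);
  return match ? new RegExp(match[1], match[2]) : value;
}

// Apply toExtract to every rule of config.scrape without mutating the input.
function reviveScrapeRules(config) {
  const scrape = {};
  for (const [key, rule] of Object.entries(config.scrape || {})) {
    scrape[key] = { ...rule, extract: toExtract(rule.extract) };
  }
  return { ...config, scrape };
}

// Usage: JSON.parse turns the escaped "\\d" back into "\d" before the RegExp is built.
const config = reviveScrapeRules(JSON.parse(
  '{"url": "http://www.echojs.com/", "scrape": {"articlePublishedTime": ' +
  '{"selector": "p", "extract": "/\\\\d* (?:hour[s]?|day[s]?) ago/"}}}'
));
console.log(config.scrape.articlePublishedTime.extract); // /\d* (?:hour[s]?|day[s]?) ago/
```

Plain values such as "text" or "href" contain no surrounding slashes, so they pass through unchanged.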
#### Usage example:

```shell
cheers -conf /directory/config.json
```
Tests can be run with the command npm test.
If you don't need the test dependencies, install with npm install --production.
Cheers!
Copyright (c) 2015 Fabien Allanic
Licensed under the MIT license.