
A simple Node package that scrapes a web page and writes the results to a CSV file.
Install the module with: npm install -g scrape2csv
Scraping is pretty straightforward:
var scrape2csv = require('scrape2csv');
//let's scrape a very cool website
var url_to_scrape = "http://www.echojs.com";
var jquery_selector = "article";
//each article of the page will go through this
var handler = function($, elem, index){
    var title = $(elem).find("h2 a").text();
    var news_url = $(elem).find("p>a").attr("href");
    //returning a new row for the csv
    return [index, title, "http://www.echojs.com" + news_url];
};
//optional CSV header
var header = ["#", "Title of the article", "URL"];
scrape2csv.scrape("/tmp/echojs.csv", url_to_scrape, jquery_selector, handler, header);
The handler is called once for each element matching the jQuery selector. The array it returns becomes a new line in the CSV file.
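To make the row-to-line step concrete, here is a minimal sketch of how an array returned by a handler could be turned into a CSV line. The helper name toCsvLine, the quoting rules, and the sample rows are assumptions made for illustration. They are not taken from scrape2csv, whose actual formatting may differ.

```javascript
// Hypothetical helper, NOT part of scrape2csv's API.
// Serializes one row (an array like the one a handler returns) into a CSV line.
function toCsvLine(row) {
  return row
    .map(function (value) {
      var text = String(value);
      // Quote fields containing a comma, a double quote, or a line break,
      // and double any embedded double quotes.
      if (/[",\r\n]/.test(text)) {
        return '"' + text.replace(/"/g, '""') + '"';
      }
      return text;
    })
    .join(",");
}

// Made-up sample data shaped like the header and handler output above.
var header = ["#", "Title of the article", "URL"];
var rows = [
  [0, "Hello, world", "http://www.echojs.com/news/1"],
  [1, 'A "quoted" title', "http://www.echojs.com/news/2"]
];

// The header goes first, then one line per row.
var csv = [header].concat(rows).map(toCsvLine).join("\n");
console.log(csv);
```

The point of the sketch is that titles containing commas or quotes need quoting to stay in a single column, so it is worth checking the generated file when scraped text contains such characters.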
That's all folks!
Copyright (c) 2012 Fabien Allanic
Licensed under the MIT license.