Tantō is a simplified web scraping tool for Node.js that utilizes the excellent request and cheerio modules.
Install via npm:
$ npm install tanto
At its simplest, Tantō takes a URL and returns a hash containing the body of the response and the $ object, which allows extracting data out of the response with all of the jQuery syntax available in the cheerio module.
var tanto = require('tanto');

tanto("http://www.npmjs.org", function(err, data) {
  console.log(data.$('title').text()); // => 'npm'
});
Tantō also accepts an options hash in place of the URL string, and all request options are accepted here.
var options = {
  method: 'POST',
  url: 'http://service.com/api',
  json: {
    test: 'abc123'
  }
};

tanto(options, function(err, data) {
  console.log(data.$('title').text());
});
For more complex data retrieval, you can pass Tantō a schema that maps keys to selectors and returns them in the data hash as values.
var schema = {
  title: 'title',
  home: 'li.home'
};

tanto({url: "http://www.npmjs.org", schema: schema}, function(err, data) {
  console.log(data.values); // => { title: 'npm', home: 'Node.js Home' }
});
Schemas support an optional transform function which will run on the data returned from the selector.
var toCaps = function(data) {
  return data.toUpperCase();
};

var schema = {
  title: 'title',
  home: {
    selector: 'li.home',
    transform: toCaps
  }
};

tanto({url: "http://www.npmjs.org", schema: schema}, function(err, data) {
  console.log(data.values.home); // => 'NODE.JS HOME'
});
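The callbacks above ignore the err argument. As a minimal sketch of a more defensive callback, the wrapper below checks err first and then the errors hash on the data object (described in the callback reference) before handing over the values. The checked helper and its two handler parameters are hypothetical and not part of Tantō; whether errors is empty or absent when nothing fails is an assumption, so both cases are handled.

```javascript
// Hypothetical helper, not part of Tantō: wraps a handler so it only receives
// scraped values when the request succeeded and no schema key reported a problem.
function checked(onValues, onProblem) {
  return function (err, data) {
    if (err) {
      return onProblem(err);
    }
    // Assumption: data.errors may be an empty object or missing when all keys succeed.
    var failedKeys = Object.keys((data && data.errors) || {});
    if (failedKeys.length > 0) {
      return onProblem(new Error('Schema keys failed: ' + failedKeys.join(', ')));
    }
    return onValues(data ? data.values : undefined);
  };
}

// Exercised here with hand-built data objects, so no network access is needed.
var seen = [];
var callback = checked(
  function (values) { seen.push(values.title); },
  function (problem) { seen.push(problem.message); }
);

callback(null, { values: { title: 'npm' }, errors: {} });
callback(null, { values: {}, errors: { home: 'not found' } });
console.log(seen); // => [ 'npm', 'Schema keys failed: home' ]
```

With Tantō installed, the same wrapper could be passed in place of the inline callback, for example tanto({url: "http://www.npmjs.org", schema: schema}, checked(console.log, console.error)).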
Main entry point to scrape a single page. The first argument can be either a url string or an options object.
url: The url to scrape.
method: HTTP method to use.
schema: Scraping data schema. See below for details.
context: If specified with a schema, this object will be used to store the returned values rather than creating a new one.

The callback has two arguments, the standard error and a data object that contains the following keys:

body: The raw body of the response.
$: The entry point for selectors.
values: If a schema was passed in, this hash will contain the keys and values that were successfully gathered.
errors: If any of the schema keys encountered errors (or were not found), their keys and matching errors will be populated here.

Schemas are a simple object containing the name of the data element to be returned and an options hash that is used to gather that data. The options hash can be specified as a selector string or as the full object if more options are needed.
Schema definitions support the following options:
selector: The jQuery selector for this data element. Required.
type: The type of data to return from the selector. Defaults to text, also supports html and value.
transform: A transformation function to run on the retrieved data.
eq: If the selector used matches multiple elements, reduce it to the one at the specified index. Defaults to 0.

// Simple
var schema = {
  home: 'li.home',
  title: 'title'
};

// Full
var schema = {
  home: {selector: 'li.home', type: 'text', transform: toCaps},
  input: {selector: 'input.name', type: 'value', eq: 0}
};
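To make the relationship between the simple and full forms concrete, here is a minimal sketch that expands every schema entry into the full object form and fills in the defaults listed above (type text, eq 0). The normalizeSchema function is hypothetical and only illustrates the documented options; it is not Tantō's internal code.

```javascript
// Hypothetical helper, not part of Tantō's API: expands a schema so every
// entry uses the full object form with the documented defaults filled in.
function normalizeSchema(schema) {
  var normalized = {};
  Object.keys(schema).forEach(function (key) {
    var entry = schema[key];
    // The simple form is just a selector string.
    if (typeof entry === 'string') {
      entry = { selector: entry };
    }
    // selector is the only required option.
    if (!entry || typeof entry.selector !== 'string') {
      throw new Error('Schema key "' + key + '" needs a selector');
    }
    normalized[key] = {
      selector: entry.selector,
      type: entry.type || 'text',
      eq: typeof entry.eq === 'number' ? entry.eq : 0,
      transform: entry.transform // optional, stays undefined when not given
    };
  });
  return normalized;
}

var full = normalizeSchema({
  home: 'li.home',
  input: { selector: 'input.name', type: 'value' }
});
console.log(full.home.type, full.home.eq);   // => text 0
console.log(full.input.type, full.input.eq); // => value 0
```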
Transformation functions take data and context as parameters and return the new value. Use the context parameter to alter or create other keys in the returned values.
The following is an example of a transform function that sets the scraped title value to all lowercase and also saves an uppercase copy in a new key.
var formatTitle = function(data, context) {
  context.upperTitle = data.toUpperCase();
  return data.toLowerCase();
};

var schema = {
  title: {
    selector: 'title',
    transform: formatTitle
  }
};
// => {title: "npm", upperTitle: "NPM"}
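The interaction between a transform's return value and the context object can be sketched without any scraping. The applyTransform function below is a hypothetical illustration of the behavior described above, under the assumption that the context passed to the transform is the same object that holds the returned values; Tantō's actual implementation may differ.

```javascript
// Hypothetical sketch, not Tantō's internal code: stores a raw value under
// `key` in `context`, running the entry's transform first when one is given.
function applyTransform(key, rawValue, entry, context) {
  var transform = entry && entry.transform;
  context[key] = typeof transform === 'function'
    ? transform(rawValue, context) // the transform may also write other keys
    : rawValue;
  return context;
}

var formatTitle = function (data, context) {
  context.upperTitle = data.toUpperCase();
  return data.toLowerCase();
};

var values = applyTransform(
  'title',
  'Npm', // stands in for the text a selector would have returned
  { selector: 'title', transform: formatTitle },
  {}
);
console.log(values.title, values.upperTitle); // => npm NPM
```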
To run the test suite:
$ npm install
$ npm test
Note that the tests will generate several requests to the npm website.
(The ISC License)
Copyright (c) 2013 Charles Moncrief <cmoncrief@gmail.com>
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the 'Software'), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.