A JavaScript library that allows for the quick transformation of DOM documents into useful formats.
$ npm install chowdown
Let's suppose there's a webpage, http://somewebpage.com, with the following markup:
<div>
  <div class="author">
    <a href="/dennis" class="name">Dennis Reynolds</a>
    <span class="age">41</span>
    <img src="dennis.jpg"/>
    <div class="book">
      <span class="title">The Dennis System</span>
      <span class="year">2009</span>
    </div>
    <div class="book">
      <span class="title">Chardee MacDennis: A Guide</span>
      <span class="year">2011</span>
    </div>
  </div>
  <div class="author">
    <a href="/stephen" class="name">Stephen King</a>
    <span class="age">69</span>
    <img src="stephen.jpg"/>
    <div class="book">
      <span class="title">Clown Town</span>
      <span class="year">1990</span>
    </div>
  </div>
  <a class="next" href="/search?page=2"/>
</div>
To quickly pull out the name and age of each author into an array of objects, we can do the following:
const chowdown = require('chowdown');
// Returns a promise
chowdown('http://somewebpage.com')
  .collection('.author', {
    name: '.name',
    age: '.age'
  });
This will resolve to:
[
  { name: 'Dennis Reynolds', age: '41' },
  { name: 'Stephen King', age: '69' }
]
When executed, all chowdown queries return an instance of a bluebird Promise.
Chowdown is built on top of cheerio and hence it uses the familiar jQuery selector format.
However, chowdown's selectors also make it possible to get a DOM element's attribute by appending the attribute's name to the end of a selector (following a /).
This makes getting the src attribute of each author's image easy:
chowdown('http://somewebpage.com')
  .collection('.author', {
    name: '.name',
    age: '.age',
    image: 'img/src'
  });
This will resolve to:
[
  { name: 'Dennis Reynolds', age: '41', image: 'dennis.jpg' },
  { name: 'Stephen King', age: '69', image: 'stephen.jpg' }
]
If no attribute is specified in the selector for simple types of queries (i.e. string or number queries), then chowdown will automatically grab an element's inner text.
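The selector/attribute convention can be pictured with a small standalone sketch. The helper below, splitSelector, is hypothetical and not part of chowdown. It assumes only that the attribute name follows the last / in the path, and that a path with no / means "use the inner text".

```javascript
// Hypothetical helper for illustration; NOT chowdown's implementation.
// Splits a path such as 'img/src' into a selector and an attribute name.
function splitSelector(path) {
  const slash = path.lastIndexOf('/');
  if (slash === -1) {
    // No attribute given: a string or number query would read the inner text.
    return { selector: path, attribute: null };
  }
  return {
    selector: path.slice(0, slash),
    attribute: path.slice(slash + 1)
  };
}

console.log(splitSelector('img/src'));
// => { selector: 'img', attribute: 'src' }
console.log(splitSelector('.name'));
// => { selector: '.name', attribute: null }
```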
Using chowdown, we can construct much more complex queries. It's possible to construct queries for use inside of other queries.
If we wanted to retrieve each of the author's books, we could do the following:
chowdown('http://somewebpage.com')
  .collection('.author', {
    name: '.name',
    age: '.age',
    books: chowdown.query.collection('.book', {
      title: '.title',
      year: '.year'
    })
  });
// or, alternatively:
chowdown('http://somewebpage.com')
  .collection('.author', {
    name: '.name',
    age: '.age',
    books: (author) => author.collection('.book', {
      title: '.title',
      year: '.year'
    })
  });
These will both resolve to:
[
  {
    name: 'Dennis Reynolds',
    age: '41',
    books: [
      {
        title: 'The Dennis System',
        year: '2009'
      },
      {
        title: 'Chardee MacDennis: A Guide',
        year: '2011'
      }
    ]
  },
  {
    name: 'Stephen King',
    age: '69',
    books: [
      {
        title: 'Clown Town',
        year: '1990'
      }
    ]
  }
]
As seen above, it's possible to take shortcuts to describe queries. Anywhere a string is found in place of a query, it will be used as the selector parameter in a string query:
let scope = chowdown('http://somewebpage.com');
scope.collection('.author', '.name')
// => Resolves to: ['Dennis Reynolds', 'Stephen King']
scope.collection('.author', chowdown.query.string('.name'))
// => Resolves to: ['Dennis Reynolds', 'Stephen King']
Likewise, anywhere an object is found in place of a query, it will be used as the pick parameter in an object query:
let scope = chowdown('http://somewebpage.com');
scope.collection('.author', {name: '.name'})
// => Resolves to: [{name: 'Dennis Reynolds'}, {name: 'Stephen King'}]
scope.collection('.author', chowdown.query.object({name: '.name'}))
// => Resolves to: [{name: 'Dennis Reynolds'}, {name: 'Stephen King'}]
Finally, anywhere a function is found in place of a query, it will be used as the fn parameter in a callback query:
let scope = chowdown('http://somewebpage.com');
scope.collection('.author', (author) => author.string('.name'))
// => Resolves to: ['Dennis Reynolds', 'Stephen King']
scope.collection('.author', chowdown.query.callback((author) => author.string('.name')))
// => Resolves to: ['Dennis Reynolds', 'Stephen King']
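The three shorthand rules amount to a dispatch on the JavaScript type of the value. The sketch below is a standalone illustration: shorthandKind is a hypothetical name, not a chowdown function, and it deliberately ignores already-constructed queries, which a real implementation would have to recognise first.

```javascript
// Hypothetical classifier for illustration; NOT part of chowdown.
// Maps a shorthand value to the kind of query it would stand for.
function shorthandKind(value) {
  if (typeof value === 'string') return 'string';     // used as `selector`
  if (typeof value === 'function') return 'callback'; // used as `fn`
  if (value !== null && typeof value === 'object') return 'object'; // used as `pick`
  return 'unknown';
}

console.log(shorthandKind('.name'));
// => 'string'
console.log(shorthandKind({ name: '.name' }));
// => 'object'
console.log(shorthandKind((author) => author.string('.name')));
// => 'callback'
```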
Manually created queries can also be executed directly on a Scope like this:
let scope = chowdown('http://somewebpage.com');
scope.execute(chowdown.query.string('.author:nth-child(1) .name'))
// => Resolves to: 'Dennis Reynolds'
The library's main function is actually an alias for chowdown.request; this is one of three functions that allow for the creation of Scope objects:

chowdown.request(request, [options])
Issues a request using request-promise with the given request object or uri string and returns a Scope created from the response.
Parameters:
- request {string|object}: Either a uri or a request object that will be passed to request-promise.
- [options] {object}: An object of configuration options.
  - [client=rp] {function}: A client function to use in place of request-promise. It will be passed a request object or uri and should return a promise that resolves to a string or cheerio object.
Returns:
- Scope: A scope wrapping the response of the request.

Reads from the file located at file and returns a Scope created from the contents of the file.
Parameters:
- file {string}: The filename.
Returns:
- Scope: A scope wrapping the file's contents.

Loads a DOM document directly from a cheerio object or string and returns a Scope created from this document.
Parameters:
- body {cheerio|string}: Either an existing cheerio object or a DOM string.
Returns:
- Scope: A scope wrapping the body.

Scope instances have methods that allow you to query directly on a document (or part of a document):

scope.execute(query)
Executes the given query on the document used by this scope.
Parameters:
- query {Query<T>}: The query to execute within this scope.
Returns:
- Promise<T>: A promise resolving to the result of the query.

let scope = chowdown.request('http://somewebpage.com');
let query = chowdown.query.string('.author:nth-child(1) .name');
scope.execute(query);
This will resolve to:
'Dennis Reynolds'
The main chowdown function has a query property containing methods that allow for the creation of different types of queries:
All of the following examples use the same sample uri and markup as before.

chowdown.query.string(selector, [options])
Creates a query to find a string at the given selector in a document. Any retrieved non-string value will be coerced into a string.
Parameters:
- selector {string}: A selector to find the string in a document.
- [options] {object}: An object of configuration options.
  - [default=''] {string}: The default value to return if no string is found.
  - [throwOnMissing=false] {boolean}: A flag that dictates whether or not to throw an error if no string is found.
  - [format=[]] {function|function[]}: A function or array of functions used to format the retrieved string.
Returns:
- Query<string>: The constructed string query.

let scope = chowdown('http://somewebpage.com');
let query = chowdown.query.string('.author:nth-child(1) .name');
scope.execute(query);
This will resolve to:
'Dennis Reynolds'
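As a rough illustration of the format option, the sketch below applies either a single function or an array of functions to a retrieved string. applyFormat is a hypothetical helper, and the left-to-right application order is an assumption; the option's description only says the functions are used to format the string.

```javascript
// Hypothetical helper for illustration; NOT chowdown's implementation.
// Accepts one formatter or an array of formatters, as the `format` option does.
function applyFormat(value, format = []) {
  const steps = Array.isArray(format) ? format : [format];
  // Assumption: formatters run in array order, each receiving the previous result.
  return steps.reduce((current, step) => step(current), value);
}

console.log(applyFormat('  Dennis Reynolds ', (s) => s.trim()));
// => 'Dennis Reynolds'
console.log(applyFormat('  Dennis Reynolds ', [(s) => s.trim(), (s) => s.toUpperCase()]));
// => 'DENNIS REYNOLDS'
```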

chowdown.query.number(selector, [options])
Creates a query to find a number at the given selector in a document. Any retrieved non-number value will be coerced into a number.
Parameters:
- selector {string}: A selector to find the number in a document.
- [options] {object}: An object of configuration options.
  - [default=NaN] {number}: The default value to return if no number is found.
Returns:
- Query<number>: The constructed number query.

let scope = chowdown('http://somewebpage.com');
let query = chowdown.query.number('.author:nth-child(1) .age');
scope.execute(query);
This will resolve to:
41

chowdown.query.collection(selector, inner, [options])
Creates a query to find an array of values such that each value in the array is the result of the inner query executed on a child document. The set of child documents is pointed to by the selector parameter.
Parameters:
- selector {string}: A selector to find the child documents in a document.
- inner {Query<T>}: The inner query to execute on each child document.
- [options] {object}: An object of configuration options.
  - [default=[]] {any[]}: The default value to return if no child documents are found.
  - [filter] {function}: A function used to filter the resulting array. Every item in the array is passed through this function and the values for which the function is truthy are kept.
Returns:
- Query<T[]>: The constructed collection query.

let scope = chowdown('http://somewebpage.com');
let query = chowdown.query.collection('.author', chowdown.query.number('.age'));
scope.execute(query);
This will resolve to:
[41, 69]
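The behaviour described for a collection query (run the inner query on each child document, then optionally filter) can be modelled on plain data. The sketch below is only a model: collect is a hypothetical function, the child documents are stand-in objects rather than DOM nodes, and the inner query is an ordinary callback.

```javascript
// Hypothetical model for illustration; NOT chowdown's implementation.
// `children` stands in for the child documents matched by the selector.
function collect(children, inner, options = {}) {
  const { default: fallback = [], filter } = options;
  if (children.length === 0) return fallback; // no child documents found
  const results = children.map(inner);        // inner query per child
  return filter ? results.filter((item) => filter(item)) : results;
}

const authors = [
  { name: 'Dennis Reynolds', age: '41' },
  { name: 'Stephen King', age: '69' }
];

console.log(collect(authors, (author) => Number(author.age)));
// => [ 41, 69 ]
console.log(collect(authors, (author) => Number(author.age), { filter: (age) => age > 50 }));
// => [ 69 ]
```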

chowdown.query.object(pick, [options])
Creates a query that will find an object in a document such that each value in the object is the result of the corresponding query in the pick parameter.
Parameters:
- pick {object}: The object of queries to map.
- [options] {object}: An object of configuration options.
Returns:
- Query<object>: The constructed object query.

let scope = chowdown('http://somewebpage.com');
let query = chowdown.query.object({
  name: chowdown.query.string('.author:nth-child(1) .name'),
  age: chowdown.query.number('.author:nth-child(1) .age')
});
scope.execute(query);
This will resolve to:
{
  name: 'Dennis Reynolds',
  age: 41
}

chowdown.query.raw(fn, [options])
Creates a query that calls fn with the underlying cheerio function and cheerio context. The result of this query will be the result of this call.
Parameters:
- fn {function}: The raw function to be called with the cheerio instance.
- [options] {object}: An object of configuration options.
  - [default=undefined] {any}: The default value to return if undefined is returned from the function.
Returns:
- Query<any>: A query that resolves to the result of the raw function.

let scope = chowdown('http://somewebpage.com');
let query = chowdown.query.raw(($, context) => $('.author:nth-child(2) .name').text());
scope.execute(query);
This will resolve to:
'Stephen King'

chowdown.query.regex(selector, pattern, [group], [options])
Creates a query that will find a string in a document using the given selector and perform a regex match on it using pattern.
Parameters:
- selector {string}: A selector to find the string in a document.
- pattern {RegExp}: The pattern used to match on the retrieved string.
- [group] {number}: The index of a matched group to return.
- [options] {object}: An object of configuration options.
  - [default=[]] {any[]}: The default value to return if no matches are made.
Returns:
- Query<string|string[]>: The constructed regex query.

let scope = chowdown('http://somewebpage.com');
let query = chowdown.query.regex('.author:nth-child(2)', /(Stephen) (.*)/);
scope.execute(query);
This will resolve to (roughly):
['Stephen King', 'Stephen', 'King']
If we want a specific group:
let scope = chowdown('http://somewebpage.com');
let query = chowdown.query.regex('.author:nth-child(2)', /(Stephen) (.*)/, 2);
scope.execute(query);
This will resolve to:
'King'
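The optional group argument behaves much like indexing into the result of a plain JavaScript match. The standalone sketch below uses a hypothetical matchGroup helper and String.prototype.match on the text 'Stephen King' to reproduce the two results above. It is not chowdown's implementation, and the empty-array fallback simply mirrors the documented default.

```javascript
// Hypothetical helper for illustration; NOT chowdown's implementation.
function matchGroup(text, pattern, group) {
  const match = text.match(pattern);
  if (match === null) return [];    // mirrors the documented default of []
  const groups = Array.from(match); // drop the extra index/input properties
  return group === undefined ? groups : groups[group];
}

console.log(matchGroup('Stephen King', /(Stephen) (.*)/));
// => [ 'Stephen King', 'Stephen', 'King' ]
console.log(matchGroup('Stephen King', /(Stephen) (.*)/, 2));
// => 'King'
```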

chowdown.query.context(selector, inner, [options])
Creates a query that executes the inner query within the context of a child document pointed to by the given selector.
Parameters:
- selector {string}: A selector to find the child document.
- inner {Query<T>}: The inner query to execute on the child document.
- [options] {object}: An object of configuration options.
  - [default=undefined] {any}: The default value to return if the context can't be found.
Returns:
- Query<T>: The constructed context query.

let scope = chowdown('http://somewebpage.com');
let query = chowdown.query.context('.author:nth-child(1) .book:nth-of-type(1)',
  chowdown.query.object({
    title: '.title',
    year: (book) => book.number('.year')
  })
);
scope.execute(query);
This will resolve to:
{
  title: 'The Dennis System',
  year: 2009
}

chowdown.query.uri(selector, [base], [options])
Creates a query that finds a URI in a document using the given selector and resolves it relative to the given base URI. It will automatically attempt to grab the href attribute of the element specified by selector.
If no URI is retrieved from the document, chowdown will not attempt to resolve the default value against the base URI.
Parameters:
- selector {string}: A selector to find the URI.
- [base] {string}: The base URI for the retrieved URI.
- [options] {object}: An object of configuration options.
Returns:
- Query<string>: The constructed URI query.

let scope = chowdown('http://somewebpage.com');
let query = chowdown.query.uri('.author:nth-child(1) .name', 'http://somewebpage.com');
scope.execute(query);
This will resolve to:
'http://somewebpage.com/dennis'
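Resolving a relative href against a base can be reproduced with the URL class built into Node.js. This sketch only shows what "resolving relative to a base" means for the sample links; it makes no claim about which resolver chowdown uses internally.

```javascript
// Standalone illustration using the built-in WHATWG URL class.
const base = 'http://somewebpage.com';

// href of the first author's link in the sample markup
const authorUri = new URL('/dennis', base).href;
// href of the "next" link in the sample markup
const nextUri = new URL('/search?page=2', base).href;

console.log(authorUri);
// => 'http://somewebpage.com/dennis'
console.log(nextUri);
// => 'http://somewebpage.com/search?page=2'
```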

chowdown.query.follow(uri, inner, [options])
Creates a query that follows the URI pointed to by the uri query and executes the inner query on the document at this URI.
Parameters:
- uri {string|object|function}: A query to find the URI.
- inner {Query<T>}: A query to execute on the document at the URI.
- [options] {object}: An object of configuration options.
  - [default=undefined] {any}: The default value to return if there's an error accessing the page.
  - [client=rp] {function}: A client function to use in place of request-promise. It will be passed a request object or URI and should return a promise that resolves to a string or cheerio object.
  - [request] {object}: An object of other request options to pass to client.
Returns:
- Query<T>: The constructed follow query.

In the sample markup (for the uri http://somewebpage.com), we can see the first author's div contains a link to http://somewebpage.com/dennis.
Let's assume the markup at this uri is as follows:
<a id="favourite-food">DeVitos</a>
We can use a follow query to read such important information like this:
let scope = chowdown('http://somewebpage.com');
let query = chowdown.query.follow(
  (doc) => doc.uri('.author:nth-child(1) .name'),
  (otherPage) => otherPage.string('#favourite-food')
);
scope.execute(query);
This will resolve to:
'DeVitos'

chowdown.query.paginate(inner, uri, [max], [options])
Creates a query that executes the inner query on multiple pages. The link to the next page is pointed to by the uri query. Pagination will stop after max pages have been requested. If max is a function, pagination will stop whenever it returns false.
Parameters:
- inner {Query<T>}: A query to execute on each document.
- uri {string|object|function}: A query to find the next page's URI in each document.
- [max=Infinity] {number|function}: The maximum number of pages to retrieve or a function that takes the current number of pages and the last page and returns false when it's desirable to stop.
- [options] {object}: An object of configuration options.
  - [default=undefined] {any}: The default value to return if there's an error accessing a page.
  - [client=rp] {function}: A client function to use in place of request-promise. It will be passed a request object or URI and should return a promise that resolves to a string or cheerio object.
  - [request] {object}: An object of other request options to pass to client.
  - [merge=flatten] {function}: The function used to merge the paginated results. Takes one argument, pages, an array of all page results. Uses lodash.flatten by default.
Returns:
- Query<any>: The constructed paginate query.

In the sample markup, there exists a link to the next page of results, http://somewebpage.com/search?page=2, at the bottom of the page.
Let's assume the markup at this page is as follows:
<div>
  <div class="author">
    <a href="/william" class="name">William Shakespeare</a>
    <span class="age">453</span>
    <img src="william.jpg"/>
    <div class="book">
      <span class="title">Hamlet</span>
      <span class="year">1600</span>
    </div>
  </div>
  <a class="next" href="/search?page=3"/>
</div>
We can execute queries on both the first page and this page (and as many more as we'd like) with the following query:
let scope = chowdown('http://somewebpage.com');
let names = chowdown.query.collection('.author', '.name');
// The last argument is the maximum number of pages to read.
let pages = chowdown.query.paginate(names, '.next', 2);
scope.execute(pages);
This will resolve to:
['Dennis Reynolds', 'Stephen King', 'William Shakespeare']
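The stopping rule and the default merge step can be modelled without any network access. In the sketch below, site is a hypothetical in-memory stand-in for the two sample pages, paginate is a local function (not chowdown.query.paginate), and Array.prototype.flat stands in for lodash.flatten, which also flattens one level. It assumes a numeric max means "stop once that many pages have been read".

```javascript
// Hypothetical in-memory model for illustration; NOT chowdown's implementation.
const site = {
  '/': { names: ['Dennis Reynolds', 'Stephen King'], next: '/search?page=2' },
  '/search?page=2': { names: ['William Shakespeare'], next: '/search?page=3' }
};

function paginate(start, max = Infinity) {
  // A function is called with (page count so far, last page); false means stop.
  const shouldContinue = typeof max === 'function' ? max : (count) => count < max;
  const pages = [];
  let uri = start;
  while (uri !== undefined && site[uri] !== undefined) {
    const page = site[uri];
    pages.push(page.names); // result of the "inner query" for this page
    if (!shouldContinue(pages.length, page)) break;
    uri = page.next;        // result of the "uri query" for this page
  }
  return pages.flat();      // default merge: flatten one level
}

console.log(paginate('/', 2));
// => [ 'Dennis Reynolds', 'Stephen King', 'William Shakespeare' ]
console.log(paginate('/', 1));
// => [ 'Dennis Reynolds', 'Stephen King' ]
```

Passing a function such as (count, last) => last.names.length > 1 stops this model after the first page that lists a single author.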

chowdown.query.callback(fn, [options])
Creates a query that calls fn with a Scope that wraps a document (or part of a document) and returns the result of this call.
Parameters:
- fn {function}: A function to call with a Scope for a document.
- [options] {object}: An object of configuration options.
Returns:
- Query<any>: The constructed callback query.

let scope = chowdown('http://somewebpage.com');
let query = chowdown.query.callback((document) => document.string('.author:nth-child(2) .name'));
scope.execute(query);
This will resolve to:
'Stephen King'
If you have cloned this repository, it's possible to run the tests by executing the following command from the root of the repository:
$ npm test
See the LICENSE file for details.
FAQs
The npm package chowdown receives a total of 534 weekly downloads. As such, chowdown popularity was classified as not popular.
We found that chowdown demonstrated a not healthy version release cadence and project activity because the last version was released a year ago. It has 1 open source maintainer collaborating on the project.