
@coya/web-scraper
Web scraper on top of PhantomJS or Chromium.
If you choose PhantomJS, the module works as a client/server pair: a PhantomJS web scraper server, and a client that acts as a driver and sends scraping requests to the server over HTTP.
Chromium, by contrast, is driven directly from Node.js.
npm install @coya/web-scraper
git clone https://github.com/Cooya/WebScraper
cd WebScraper
npm install # also installs the development dependencies
npm install phantomjs -g # only if you need PhantomJS; installs it globally
npm run build
npm run example # runs the example script in the "examples" folder
The package lets you inject a JavaScript function into the requested page:
const { ChromiumScraper } = require('@coya/web-scraper');
// if you want to use PhantomJS instead of Chromium
// const { PhantomScraper } = require('@coya/web-scraper');
const scraper = ChromiumScraper.getInstance();

const getLinks = function() { // return all links from the requested page
  return $('a').map(function(i, elt) {
    return $(elt).attr('href');
  }).get();
};

scraper.request({
  url: 'cooya.fr',
  fct: getLinks // function injected in the page environment
})
.then(function(result) {
  console.log(result); // returned value of the injected function
  scraper.close(); // end the client/server connection and kill the web scraper subprocess
}, function(error) {
  console.error(error);
  scraper.close();
});
You can also inject a JavaScript function exported by an external script:
const { ChromiumScraper } = require('@coya/web-scraper');
// if you want to use PhantomJS instead of Chromium
// const { PhantomScraper } = require('@coya/web-scraper');
const scraper = ChromiumScraper.getInstance();

scraper.request({
  url: 'cooya.fr',
  fct: __dirname + '/externalScript.js', // external script exporting the function to be injected
})
.then(function(result) {
  console.log(result); // returned value of the injected function
  scraper.close(); // end the client/server connection and kill the web scraper subprocess
}, function(error) {
  console.error(error);
  scraper.close();
});
externalScript.js:
module.exports = function() { // return all links from the requested page
  return $('a').map(function(i, elt) {
    return $(elt).attr('href');
  }).get();
};
The ScraperClient object is a singleton: only one client can be created, so the getInstance() method is required to obtain the client instance.
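The singleton behaviour can be sketched generically as follows. This is an illustration only: `ExampleScraper` is a hypothetical class, not the package's actual implementation, which may differ.

```javascript
// Hypothetical class illustrating the singleton pattern; not the package's real code.
class ExampleScraper {
  // Return the single shared instance, creating it on first use.
  static getInstance() {
    if (!ExampleScraper.sharedInstance) {
      ExampleScraper.sharedInstance = new ExampleScraper();
    }
    return ExampleScraper.sharedInstance;
  }
}

const first = ExampleScraper.getInstance();
const second = ExampleScraper.getInstance();
console.log(first === second); // true: both names refer to the same object
```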
request(params) sends a request to the given URL and injects JavaScript into the loaded page. It returns a promise that resolves with the value returned by the injected function.
| Parameter | Type | Description | Default value |
|---|---|---|---|
| params | object | request options; see the parameter table below | none |
close() terminates the PhantomJS web scraper process, which lets the current Node.js script exit properly.
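Because the scraper has to be closed whether the request succeeds or fails, the cleanup can be centralised with `async`/`await` and `finally`. A minimal sketch, assuming only that the scraper object exposes a `request(params)` method returning a promise and a `close()` method; `requestThenClose` is a hypothetical helper, not part of the package.

```javascript
// Hypothetical helper: run one request and always close the scraper afterwards.
// Assumes `scraper.request(params)` returns a promise and `scraper.close()` exists.
async function requestThenClose(scraper, params) {
  try {
    // Resolves with the value returned by the injected function.
    return await scraper.request(params);
  } finally {
    // Runs on success and on failure, so the Node.js script can exit.
    scraper.close();
  }
}
```

It could be called as `requestThenClose(ChromiumScraper.getInstance(), { url: 'cooya.fr', fct: getLinks })`, leaving error handling to the caller's `.catch()`.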
| Parameter | Type | Description | Required |
|---|---|---|---|
| url | string | target url | yes |
| fct | function | JS function to inject into the page | yes |
| fct | string | alternative to passing a function: path to a script and the name of the function to inject, separated by a hash character (e.g. "path/to/script/script.js#functionToCall") | yes |
| referer | string | Referer header sent with each request | no |
| args | object | object passed to the injected function | no |
| debug | boolean | enable debug (verbose) mode | no |
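To make the table concrete, the sketch below checks a params object against the types listed above and splits the string form of `fct` at the hash character. Both `checkRequestParams` and `splitScriptReference` are hypothetical helpers written for illustration; the package's own validation, if any, may behave differently.

```javascript
// Hypothetical helpers; they mirror the parameter table, not the package's internals.

// Return a list of problems found in a request params object (empty if none).
function checkRequestParams(params) {
  const problems = [];
  if (typeof params.url !== 'string' || params.url === '') {
    problems.push('url is required and must be a string');
  }
  if (typeof params.fct !== 'function' && typeof params.fct !== 'string') {
    problems.push('fct is required and must be a function or a script path');
  }
  if (params.referer !== undefined && typeof params.referer !== 'string') {
    problems.push('referer must be a string');
  }
  if (params.args !== undefined && (params.args === null || typeof params.args !== 'object')) {
    problems.push('args must be an object');
  }
  if (params.debug !== undefined && typeof params.debug !== 'boolean') {
    problems.push('debug must be a boolean');
  }
  return problems;
}

// Split "path/to/script.js#functionToCall" into its two parts.
// functionName is null when the string contains no hash character.
function splitScriptReference(reference) {
  const hashIndex = reference.lastIndexOf('#');
  if (hashIndex === -1) {
    return { scriptPath: reference, functionName: null };
  }
  return {
    scriptPath: reference.slice(0, hashIndex),
    functionName: reference.slice(hashIndex + 1),
  };
}

console.log(checkRequestParams({ url: 'cooya.fr', fct: 'path/to/script/script.js#functionToCall' })); // []
console.log(splitScriptReference('path/to/script/script.js#functionToCall'));
```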