scrapingbee
ScrapingBee is a web scraping API that handles headless browsers and rotates proxies for you. The Node SDK makes it easier to interact with ScrapingBee's API.
You can install the ScrapingBee Node SDK with npm.
npm install scrapingbee
The ScrapingBee Node SDK is a wrapper around the axios library. ScrapingBee supports GET and POST requests.
Sign up for ScrapingBee to get your API key and some free credits to get started.
const scrapingbee = require('scrapingbee');

async function get(url) {
  var client = new scrapingbee.ScrapingBeeClient('REPLACE-WITH-YOUR-API-KEY');
  var response = await client.get({
    // The URL you want to scrape
    url: url,
    params: {
      // Block ads on the page you want to scrape
      block_ads: false,
      // Block images and CSS on the page you want to scrape
      block_resources: true,
      // Premium proxy geolocation
      country_code: '',
      // Control the device the request will be sent from
      device: 'desktop',
      // Use some data extraction rules
      extract_rules: { title: 'h1' },
      // Wrap response in JSON
      json_response: false,
      // JavaScript scenario to execute (clicking on a button, scrolling ...)
      js_scenario: {
        instructions: [
          { wait_for: '#slow_button' },
          { click: '#slow_button' },
          { scroll_x: 1000 },
          { wait: 1000 },
          { scroll_x: 1000 },
          { wait: 1000 },
        ],
      },
      // Use premium proxies to bypass difficult-to-scrape websites (10-25 credits/request)
      premium_proxy: false,
      // Execute JavaScript code with a headless browser (5 credits/request)
      render_js: true,
      // Return the original HTML before the JavaScript rendering
      return_page_source: false,
      // Return a page screenshot as a PNG image
      screenshot: false,
      // Take a full-page screenshot without the window limitation
      screenshot_full_page: false,
      // Transparently return the HTTP status code of the requested page
      transparent_status_code: false,
      // Wait, in milliseconds, before returning the response
      wait: 0,
      // Wait for a CSS selector before returning the response, e.g. ".title"
      wait_for: '',
      // Set the browser window width in pixels
      window_width: 1920,
      // Set the browser window height in pixels
      window_height: 1080,
    },
    headers: {
      // Forward custom headers to the target website
      key: 'value',
    },
    cookies: {
      // Forward custom cookies to the target website
      name: 'value',
    },
    // `timeout` specifies the number of milliseconds before the request times out.
    // If the request takes longer than `timeout`, the request will be aborted.
    timeout: 10000, // here 10 seconds; default is `0` (no timeout)
  });

  // Decode the response body into text
  var decoder = new TextDecoder();
  var text = decoder.decode(response.data);
  console.log(text);
}

get('https://httpbin-scrapingbee.cleverapps.io/html').catch((e) => console.log('A problem occurred: ' + e.message));

/* -- output
<!DOCTYPE html><html lang="en"><head>...
*/
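The example above passes `response.data` through `TextDecoder`, which suggests the body arrives as raw bytes rather than a string. As a minimal sketch, assuming the body is UTF-8 and that a request using `extract_rules` or `json_response` returns JSON, a helper could decode the bytes and parse them when possible. `decodeBody` is a hypothetical helper, not part of the SDK, and the sample bodies are made up for illustration.

```javascript
// Hypothetical helper (not part of the ScrapingBee SDK): decode a binary
// response body and parse it as JSON when possible.
function decodeBody(data) {
  // TextDecoder defaults to UTF-8 and accepts a Buffer or typed array.
  const text = new TextDecoder().decode(data);
  try {
    return { text, json: JSON.parse(text) };
  } catch (err) {
    // Not JSON (for example, plain HTML): return the text only.
    return { text, json: null };
  }
}

// Made-up sample bodies standing in for `response.data`.
const htmlBody = Buffer.from('<!DOCTYPE html><html lang="en"></html>', 'utf8');
const jsonBody = Buffer.from('{"title":"Example"}', 'utf8');

console.log(decodeBody(htmlBody).json); // null
console.log(decodeBody(jsonBody).json.title); // Example
```

Returning both `text` and `json` lets the caller fall back to the raw text when the page is plain HTML.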
ScrapingBee takes various parameters to render JavaScript, execute a custom JavaScript script, use a premium proxy from a specific geolocation and more.
You can find all the supported parameters on ScrapingBee's documentation.
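Parameters such as `extract_rules` and `js_scenario` are nested objects, while an HTTP query string carries only flat strings. The sketch below shows one way a client could flatten them, assuming nested values are sent as JSON strings and empty values are skipped. `serializeParams` is a hypothetical helper, and the SDK's actual encoding may differ.

```javascript
// Hypothetical helper: flatten a params object into a URL query string.
// Assumption: nested objects are sent as JSON strings, and empty values
// ('' / null / undefined) are left out of the request.
function serializeParams(params) {
  const query = new URLSearchParams();
  for (const [key, value] of Object.entries(params)) {
    if (value === '' || value === undefined || value === null) continue;
    const isObject = typeof value === 'object';
    query.append(key, isObject ? JSON.stringify(value) : String(value));
  }
  return query.toString();
}

const qs = serializeParams({
  render_js: true,
  country_code: '',
  extract_rules: { title: 'h1' },
});
console.log(qs);
```

`URLSearchParams` handles the percent-encoding, so the JSON braces and quotes are escaped for safe use in a URL.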
You can send custom cookies and headers as you normally would with the axios library.
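To make the forwarding concrete, here is a minimal sketch of how a client might prepare custom headers and cookies before sending them. It assumes forwarded headers are marked with a `Spb-` prefix and that cookies are joined into a single `name=value` string. Both details, and the `prepareForwarding` helper itself, are assumptions for illustration rather than the SDK's confirmed behavior.

```javascript
// Hypothetical helper: prepare headers and cookies for forwarding.
// Assumptions: forwarded headers carry a 'Spb-' prefix, and cookies are
// sent as one 'name=value;name=value' string.
function prepareForwarding(headers = {}, cookies = {}) {
  const forwarded = {};
  for (const [name, value] of Object.entries(headers)) {
    forwarded['Spb-' + name] = String(value);
  }
  const cookieString = Object.entries(cookies)
    .map(([name, value]) => `${name}=${value}`)
    .join(';');
  return { headers: forwarded, cookies: cookieString };
}

const out = prepareForwarding(
  { 'Accept-Language': 'en-US' },
  { session: 'abc', theme: 'dark' }
);
console.log(out.headers); // { 'Spb-Accept-Language': 'en-US' }
console.log(out.cookies); // session=abc;theme=dark
```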
Here is a short example of how to retrieve and store a screenshot of a page at a mobile resolution.
const fs = require('fs');
const scrapingbee = require('scrapingbee');

async function screenshot(url, path) {
  var client = new scrapingbee.ScrapingBeeClient('REPLACE-WITH-YOUR-API-KEY');
  var response = await client.get({
    url: url,
    params: {
      screenshot: true, // Take a screenshot
      screenshot_full_page: true, // Specify that we need the full height
      window_width: 375, // Specify a mobile width in pixels
    },
  });

  // Write the image bytes to disk
  fs.writeFileSync(path, response.data);
}

screenshot('https://httpbin-scrapingbee.cleverapps.io/html', './httpbin.png').catch((e) =>
  console.log('A problem occurred: ' + e.message)
);
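Before trusting a saved screenshot, it can help to confirm that the bytes really are a PNG rather than, say, an error message body. A minimal sketch, assuming `response.data` is a Buffer or Uint8Array: compare the first eight bytes with the fixed PNG signature. `isPng` is a hypothetical helper, not part of the SDK.

```javascript
// The eight-byte signature that starts every PNG file.
const PNG_SIGNATURE = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);

// Hypothetical helper: return true when `data` starts with the PNG signature.
function isPng(data) {
  const bytes = Buffer.from(data);
  return bytes.length >= 8 && bytes.subarray(0, 8).equals(PNG_SIGNATURE);
}

// Made-up sample inputs standing in for `response.data`.
const fakePng = Buffer.concat([PNG_SIGNATURE, Buffer.from('rest-of-image')]);
const errorBody = Buffer.from('{"message":"error"}');

console.log(isPng(fakePng)); // true
console.log(isPng(errorBody)); // false
```

A check like this could run just before `fs.writeFileSync`, so that a failed request does not leave a corrupt `.png` file on disk.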
The client includes a retry mechanism for 5XX responses.
const spb = require('scrapingbee');

async function get(url) {
  let client = new spb.ScrapingBeeClient('REPLACE-WITH-YOUR-API-KEY');
  // Retry up to 5 times when the API answers with a 5XX status
  let resp = await client.get({ url: url, params: { render_js: false }, retries: 5 });

  let decoder = new TextDecoder();
  let text = decoder.decode(resp.data);
  console.log(text);
}

get('https://httpbin-scrapingbee.cleverapps.io/html').catch((e) => console.log('A problem occurred: ' + e.message));
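To illustrate the idea behind `retries`, here is a minimal sketch of a retry loop that repeats a request while it fails with a 5XX status. It is not the SDK's implementation: `withRetries`, the `err.response.status` shape (modeled on axios errors), and the fake request function are all assumptions for illustration.

```javascript
// Hypothetical sketch: call `request` again while it fails with a 5XX
// status, up to `retries` extra attempts. Other errors are rethrown at once.
async function withRetries(request, retries) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await request();
    } catch (err) {
      // Assumed error shape: the HTTP status lives at err.response.status.
      const status = err.response && err.response.status;
      if (!(status >= 500 && status <= 599)) throw err;
      lastError = err;
    }
  }
  throw lastError;
}

// Fake request for the demo: fails with 503 `failures` times, then succeeds.
function makeFlakyRequest(failures) {
  let calls = 0;
  return async () => {
    calls += 1;
    if (calls <= failures) {
      const err = new Error('Server error');
      err.response = { status: 503 };
      throw err;
    }
    return { status: 200, calls };
  };
}

withRetries(makeFlakyRequest(2), 5).then((resp) => console.log(resp.calls)); // 3
```

A production version would usually also wait between attempts, for example with an increasing delay, so that a struggling server is not hit again immediately.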