Retrieve the real HTML (with JavaScript executed) from a URL, ultra fast, with support for loading multiple pages in parallel.
Ski-bi dibby dib yo da dub dub
Yo da dub dub
Ski-bi dibby dib yo da dub dub
Yo da dub dub
I'm the Scrapman!
###THE FASTEST SCRAPER EVER*... AND IT SUPPORTS PARALLEL REQUESTS (*arguably)
Scrapman is a blazingly fast scraper that returns the real HTML of a page (with JavaScript executed). It is built from the ground up to support parallel fetches, so you can get the HTML for 50+ URLs in about 30 seconds.
In Node.js you can easily use `request` to fetch the HTML from a page. But what if the page you are trying to load is NOT a static HTML page and has dynamic content added with JavaScript? What do you do then? Well, you use The Scrapman.
It uses Electron to dynamically load web pages into several `<webview>` elements within a single Chromium instance. This is why it fetches the HTML exactly as you would see it if you inspected the page with DevTools.
This is NOT a browser automation tool (yet). It's a Node module that gives you the processed HTML from a URL, and it focuses on multiple parallel operations and speed.
##USAGE
1.- Install it
npm install scrapman -S
2.- Require it
var scrapman = require("scrapman");
3.- Use it (as many times as you need)
Single URL request
scrapman.load("http://google.com", function(results){
    // results contains the HTML obtained from the URL
    console.log(results);
});
Parallel URL requests
// yes, you can use it within a loop.
for(var i=1; i<=50; i++){
    scrapman.load("https://www.website.com/page/" + i, function(results){
        console.log(results);
    });
}
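The loop above logs each page as it arrives, so the output is not guaranteed to follow the page numbers. A minimal sketch of collecting the results in request order is shown below. `loadAllInOrder` and `fakeLoad` are hypothetical names, not part of scrapman, and the stand-in loader only assumes the `(url, callback)` shape documented for `scrapman.load`.

```javascript
// Hypothetical helper, not part of scrapman: runs `load` for every URL and
// calls `done` once with the results in the same order as `urls`.
// `load` is assumed to have the (url, callback) shape of scrapman.load.
function loadAllInOrder(load, urls, done) {
  const results = new Array(urls.length);
  let pending = urls.length;
  if (pending === 0) {
    done(results);
    return;
  }
  urls.forEach(function (url, index) {
    load(url, function (html) {
      results[index] = html; // store by position, not by arrival order
      pending -= 1;
      if (pending === 0) {
        done(results);
      }
    });
  });
}

// Stand-in loader so the sketch runs without Electron. URLs ending in "1"
// take longer, to mimic out-of-order completion.
function fakeLoad(url, callback) {
  const delay = url.endsWith("1") ? 20 : 5;
  setTimeout(function () {
    callback("<html>" + url + "</html>");
  }, delay);
}

loadAllInOrder(fakeLoad, ["page/1", "page/2"], function (pages) {
  console.log(pages.length + " pages loaded");
});
```

With the real module, the first argument could be `function (url, cb) { scrapman.load(url, cb); }` in place of `fakeLoad`.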
##API
###- scrapman.load(url, callback)
####url
Type: String
The URL from which the HTML code is going to be obtained.
####callback(results)
Type: Function
The callback function to be executed when loading is done. The loaded HTML is passed in the `results` parameter.
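Because the callback documented above receives only `results` and no error argument, Node's `util.promisify` (which expects an error-first callback) is not a direct fit. A small hand-written wrapper is one way to use the call with `async`/`await`. This is a sketch under that assumption: `loadAsync` and `fakeLoad` are hypothetical names, not part of scrapman.

```javascript
// Hypothetical wrapper, not part of scrapman. `load` is assumed to behave
// like scrapman.load: it calls callback(results) with the HTML only.
function loadAsync(load, url) {
  return new Promise(function (resolve) {
    load(url, resolve);
  });
}

// Stand-in loader so the sketch runs without Electron.
function fakeLoad(url, callback) {
  setTimeout(function () {
    callback("<html>" + url + "</html>");
  }, 5);
}

async function main() {
  const html = await loadAsync(fakeLoad, "http://example.com");
  console.log(html.length);
  return html;
}

main();
```

Since no error is reported through the callback, this promise never rejects. A caller that needs a failure path would have to add its own, for example a timeout.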
###- scrapman.configure(config)
####config
The configuration object can set the following values:
- `maxConcurrentOperations`: Integer - How many URLs can be loaded at the same time (the intensity of processing). Default: 50
- `wait`: Integer - The number of milliseconds to wait after a webpage has completely loaded before returning its HTML. Default: 0
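As an illustration of the two options, a configuration object might look like the sketch below. The values 10 and 500 are arbitrary examples, not recommendations from the author.

```javascript
// Illustrative values only, not recommendations from the scrapman author.
const config = {
  maxConcurrentOperations: 10, // load at most 10 URLs at once (default: 50)
  wait: 500 // wait 500 ms after the page has loaded (default: 0)
};

// With the package installed, the object would be passed like this:
// require("scrapman").configure(config);
console.log(config);
```

A lower `maxConcurrentOperations` trades speed for a lighter load on the machine, and a non-zero `wait` may help with pages that keep rendering after the load event.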
Feel free to open issues to ask questions about using this package. PRs are very welcome and encouraged.
SPANISH SPOKEN
MIT © Daniel Nieto