web-crawler-nodejs
Readme
This is a personal project for exploring web crawling and scraping. It includes a few ways to crawl data, mainly using Node.js.
This project requires Node.js to run.
Install the dependencies
$ npm install
This project targets the IMDb website (https://www.imdb.com/). You can crawl either from the CLI or by running a web server that acts as a RESTful API.
To crawl from the CLI, pass arguments after the command node index.js with these options:
Options:
  -V, --version            output the version number
  -p, --project <project>  select a specific project. Example: -p imdb
  -u, --url <url>          a path/URL to the site to crawl
  -i, --id <id>            id of a movie, or a list of movie ids delimited by -
  -l, --list <id>          id of a list, or a list of list ids delimited by -
  -o, --out <name>         output the result as <name>.json
  -h, --help               display help for command
Example:
node index.js -p imdb -u http://www.imdb.com/list/ls066692796
node index.js -p imdb -l ls066692796
node index.js -p imdb -l ls066692796 -o result
node index.js -p imdb -i tt0100234
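The -i and -l options accept several ids joined by -. As a minimal sketch of how such a value could be split and mapped to IMDb pages (splitIds and toImdbUrl are hypothetical names, not functions exported by this package, and the exact URLs the crawler requests are an assumption based on the examples above):

```javascript
// Hypothetical helper: split a "-"-delimited id string such as the value of -i or -l.
function splitIds(value) {
  return value
    .split("-")
    .map((id) => id.trim())
    .filter((id) => id.length > 0);
}

// Hypothetical helper: map an id to an IMDb page URL.
// Assumption: "tt" ids are movie titles and "ls" ids are lists, as in the examples above.
function toImdbUrl(id) {
  if (id.startsWith("tt")) return `https://www.imdb.com/title/${id}/`;
  if (id.startsWith("ls")) return `https://www.imdb.com/list/${id}`;
  throw new Error(`Unrecognized IMDb id: ${id}`);
}

console.log(splitIds("tt6723592-tt9686708").map(toImdbUrl));
```

Rejecting unknown prefixes early keeps a mistyped id from turning into a request for an unrelated page.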
To run the web server, complete the installation, then run npm run start or node index.js.
Go to http://localhost:8000/imdb/:ids, where :ids is a list of ids (delimited by -) of the movies that you want to crawl.
Append the query ?out=true to the end of the URL to also write the result to a file named output.json in the directory.
For example, the movie Avengers: Endgame has the id tt4154796, so go to http://localhost:8000/imdb/tt4154796 to view the result.
To crawl several movies at once, use a URL such as http://localhost:8000/imdb/tt6723592-tt9686708-tt8579674
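A client can assemble these URLs programmatically. The following is a small sketch, assuming the server runs on http://localhost:8000 as described above; movieCrawlUrl is a hypothetical helper, not part of this package:

```javascript
// Hypothetical helper: build the request URL for one or more movie ids.
// The base URL, the /imdb/ path, and the ?out=true query follow the README text above.
function movieCrawlUrl(ids, { out = false, base = "http://localhost:8000" } = {}) {
  const url = new URL("/imdb/" + ids.join("-"), base);
  if (out) url.searchParams.set("out", "true");
  return url.toString();
}

console.log(movieCrawlUrl(["tt6723592", "tt9686708", "tt8579674"]));
console.log(movieCrawlUrl(["tt4154796"], { out: true }));
```

With the server running, the resulting string can be opened in a browser or passed to any HTTP client.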
To crawl lists, start the server the same way: complete the installation, then run npm run start or node index.js.
Go to http://localhost:8000/imdb/l/:ids, where :ids is a list of ids (delimited by -) of the IMDb lists that you want to crawl.
Append the query ?out=true to the end of the URL to also write the result to a file named output.json in the directory.
For example, the list Web series (https://www.imdb.com/list/ls095501479) has the id ls095501479, so go to http://localhost:8000/imdb/l/ls095501479 to view the result.
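The two routes differ only in the l/ path segment. As an illustration of how a server might tell them apart, here is a sketch of a request-path parser. parseCrawlPath is a hypothetical name, and the package's actual routing code may look quite different:

```javascript
// Hypothetical helper: interpret a request URL for the two routes described above.
// Returns null when the path is neither /imdb/:ids nor /imdb/l/:ids.
function parseCrawlPath(requestUrl) {
  // The base is only needed so that relative paths can be parsed.
  const url = new URL(requestUrl, "http://localhost:8000");
  const match = url.pathname.match(/^\/imdb\/(l\/)?([^/]+)\/?$/);
  if (!match) return null;
  return {
    kind: match[1] ? "list" : "movie",
    ids: match[2].split("-").filter(Boolean),
    // Assumption: only the literal value "true" enables file output.
    out: url.searchParams.get("out") === "true",
  };
}

console.log(parseCrawlPath("/imdb/l/ls095501479?out=true"));
```

Keeping the parsing in one pure function makes both routes easy to check without starting a server.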
FAQs
The npm package web-crawler-nodejs receives a total of 7 weekly downloads. As such, web-crawler-nodejs popularity was classified as not popular.
We found that web-crawler-nodejs demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.