
web-crawler-nodejs

Version: 1.0.3

Version published: 2 years ago

Maintainers: 1

Created: 2 years ago

Crawling data from websites using Node.js

This is a personal project on web crawling/scraping. It includes a few ways to crawl data, mainly using [Node.js](https://nodejs.org/en/), such as:

IMDb crawling (Node.js + Cheerio + Request)

Installation

This project requires Node.js to run.
Install the dependencies:

$ npm install

IMDb Crawling

This project targets the IMDb website (https://www.imdb.com/). You can crawl it either from the CLI or by running a web server that acts as a RESTful API.

Crawl using the CLI

To crawl from the CLI, pass arguments after the command node index.js using these options:

Options:
  -V, --version            output the version number
  -p, --project <project>  select a specific project. Example -p imdb
  -u, --url <url>          a path/url to the crawling site
  -i, --id <id>            id of a movie, or a list of movie ids delimited by -
  -l, --list <id>          id of a list, or several list ids delimited by -
  -o, --out <name>         output the result as <name>.json
  -h, --help               display help for command
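As a rough illustration of how these options could be parsed, the sketch below uses Node's built-in util.parseArgs (Node.js 18.3 or later). It is not the project's own parser: parseCliOptions and splitIds are hypothetical names, and -V/-h are left out for brevity (in parseArgs' default strict mode, unknown flags throw).

```javascript
const { parseArgs } = require("node:util");

// Split a "-"-delimited value such as "tt0100234-tt4154796" into an array.
function splitIds(value) {
  return value ? value.split("-").filter(Boolean) : [];
}

// Hypothetical helper: parse argv (without "node index.js") into a plain object.
function parseCliOptions(argv) {
  const { values } = parseArgs({
    args: argv,
    options: {
      project: { type: "string", short: "p" },
      url: { type: "string", short: "u" },
      id: { type: "string", short: "i" },
      list: { type: "string", short: "l" },
      out: { type: "string", short: "o" },
    },
  });
  return {
    project: values.project,
    url: values.url,
    movieIds: splitIds(values.id),
    listIds: splitIds(values.list),
    // "-o result" becomes "result.json"; undefined means no file is written.
    outFile: values.out ? `${values.out}.json` : undefined,
  };
}
```

Inside index.js this would be called as parseCliOptions(process.argv.slice(2)), so -p imdb -l ls066692796 -o result yields project "imdb", listIds ["ls066692796"] and outFile "result.json".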

Examples:

node index.js -p imdb -u http://www.imdb.com/list/ls066692796
node index.js -p imdb -l ls066692796
node index.js -p imdb -l ls066692796 -o result
node index.js -p imdb -i tt0100234
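These examples target either a full URL (-u), movie IDs (-i) or list IDs (-l). A minimal sketch of turning those into pages to fetch, assuming IMDb's public URL scheme (/title/<id>/ for movies, /list/<id>/ for lists); buildTargetUrls is a hypothetical helper, not part of the package:

```javascript
const IMDB_BASE = "https://www.imdb.com";

// Hypothetical helper: collect every page a crawl run should request.
function buildTargetUrls({ url, movieIds = [], listIds = [] }) {
  const urls = [];
  if (url) urls.push(url); // -u: crawl the given page as-is
  for (const id of movieIds) urls.push(`${IMDB_BASE}/title/${id}/`); // -i
  for (const id of listIds) urls.push(`${IMDB_BASE}/list/${id}/`); // -l
  return urls;
}
```

Keeping URL construction in one function means the CLI and the web server can share it.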

Crawl by ID or list of IDs

To run this project, follow the installation steps, then run npm run start or node index.js.
Go to http://localhost:8000/imdb/:ids, where :ids is a list of movie IDs (delimited by -) that you want to crawl. Append the query ?out=true to the URL to write the result to a file named output.json in the project directory.
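A minimal sketch of this route using only Node's node:http module, assuming the server answers with JSON. parseImdbRoute, startServer and the placeholder response are hypothetical; a real handler would crawl each ID instead of echoing the parsed route. The matcher also accepts the /imdb/l/:ids list route described in the next section.

```javascript
const http = require("node:http");

// Hypothetical: map "/imdb/<ids>" or "/imdb/l/<ids>" (ids joined by "-")
// plus an optional "?out=true" to a plain object, or null if nothing matches.
function parseImdbRoute(requestUrl) {
  // req.url is only a path such as "/imdb/tt4154796?out=true", so give URL a base.
  const { pathname, searchParams } = new URL(requestUrl, "http://localhost");
  const out = searchParams.get("out") === "true";
  const splitIds = (segment) => segment.split("-").filter(Boolean);

  let match = pathname.match(/^\/imdb\/l\/([^/]+)\/?$/);
  if (match) return { kind: "lists", ids: splitIds(match[1]), out };
  match = pathname.match(/^\/imdb\/([^/]+)\/?$/);
  if (match) return { kind: "movies", ids: splitIds(match[1]), out };
  return null;
}

function sendJson(res, status, body) {
  res.writeHead(status, { "Content-Type": "application/json" });
  res.end(JSON.stringify(body));
}

// Create and start the server; nothing listens until this is called.
function startServer(port = 8000) {
  const server = http.createServer((req, res) => {
    const route = parseImdbRoute(req.url);
    if (!route) return sendJson(res, 404, { error: "Not found" });
    // Placeholder: crawl route.ids here and, if route.out is true, save the result.
    sendJson(res, 200, route);
  });
  server.listen(port, () => console.log(`Listening on http://localhost:${port}`));
  return server;
}
```

After startServer(), requesting http://localhost:8000/imdb/tt4154796?out=true returns {"kind":"movies","ids":["tt4154796"],"out":true} from this placeholder handler.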

For example, the movie Avengers: Endgame has the ID tt4154796, so go to http://localhost:8000/imdb/tt4154796 to view the result.
To crawl several movies at once, join their IDs with -, for example http://localhost:8000/imdb/tt6723592-tt9686708-tt8579674.
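Because several IDs share one path segment, it can help to validate them before crawling. The sketch below is a hypothetical parseTitleIds helper (the package may not validate IDs at all); it assumes title IDs have the form tt followed by digits and drops duplicates, so a server could reject malformed IDs instead of requesting pages that do not exist.

```javascript
// Assumed shape of an IMDb title ID, e.g. "tt4154796".
const TITLE_ID = /^tt\d+$/;

// Split a "-"-joined segment into unique valid IDs and a list of rejects.
function parseTitleIds(segment) {
  const valid = [];
  const invalid = [];
  for (const id of segment.split("-")) {
    if (!id) continue; // ignore empty parts, e.g. from "tt1--tt2"
    if (TITLE_ID.test(id)) {
      if (!valid.includes(id)) valid.push(id); // keep first occurrence only
    } else {
      invalid.push(id);
    }
  }
  return { valid, invalid };
}
```

A handler could respond with HTTP 400 whenever invalid is non-empty.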
Crawl by user-created movie lists

To run this project, follow the installation steps, then run npm run start or node index.js.
Go to http://localhost:8000/imdb/l/:ids, where :ids is one or more IMDb list IDs (delimited by -) that you want to crawl. Append the query ?out=true to the URL to write the result to a file named output.json in the project directory.
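Both ?out=true and the CLI's -o <name> flag end in a JSON file. A minimal sketch of that step, assuming the result is a JSON-serialisable object and the file goes into the current working directory by default; saveResult is a hypothetical name:

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Write the crawl result as pretty-printed JSON to <dir>/<name>.json
// and return the full path of the written file.
function saveResult(result, name = "output", dir = process.cwd()) {
  const file = path.join(dir, `${name}.json`);
  fs.writeFileSync(file, JSON.stringify(result, null, 2), "utf8");
  return file;
}
```

saveResult(result) produces output.json, while saveResult(result, "result") mirrors -o result.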

For example, the list Web series (https://www.imdb.com/list/ls095501479) has the ID ls095501479, so go to http://localhost:8000/imdb/l/ls095501479 to view the result.
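The list ID is the last path segment of the list URL. A sketch of extracting it, assuming list IDs look like ls followed by digits; listIdFromUrl is hypothetical and could equally serve the CLI's -u option:

```javascript
// Return the list ID from an IMDb list URL, or null if the input is not one.
function listIdFromUrl(input) {
  let url;
  try {
    url = new URL(input);
  } catch {
    return null; // not a valid absolute URL
  }
  // Accept imdb.com and its subdomains such as www.imdb.com.
  if (!/(^|\.)imdb\.com$/.test(url.hostname)) return null;
  const match = url.pathname.match(/^\/list\/(ls\d+)\/?$/);
  return match ? match[1] : null;
}
```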

Package last updated on 22 Mar 2024
