ts-scraper

Scrapes the links found on a website recursively.

Version 1.0.3 (latest)
A web scraper written in TypeScript.

ts-scraper is a web scraper that can be extended to perform multiple tasks on the scraped content. It propagates through the links it finds on each page. It uses the ts-jobrunner library to run everything as jobs.
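The idea of propagating through links as queued jobs can be illustrated with a minimal sketch. It does not use ts-scraper or ts-jobrunner: the in-memory `pages` map and the `extractLinks` and `crawl` helpers are hypothetical names invented for this illustration, and the regex-based link extraction is only adequate for the toy HTML shown.

```typescript
// Hypothetical in-memory "website": URL -> HTML. Not part of ts-scraper.
const pages: Record<string, string> = {
  "/": '<a href="/a">A</a> <a href="/b">B</a>',
  "/a": '<a href="/b">B</a> <a href="/">Home</a>',
  "/b": '<a href="/c">C</a>',
  "/c": "no links here",
};

// Extract href values with a simple regex (adequate only for this sketch).
function extractLinks(html: string): string[] {
  const links: string[] = [];
  const pattern = /href="([^"]+)"/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(html)) !== null) {
    links.push(match[1]);
  }
  return links;
}

// Visit pages breadth-first, treating each pending URL as one queued job.
function crawl(startUrl: string): string[] {
  const visited = new Set<string>([startUrl]);
  const queue: string[] = [startUrl];
  const order: string[] = [];
  while (queue.length > 0) {
    const url = queue.shift() as string;
    order.push(url);
    for (const link of extractLinks(pages[url] ?? "")) {
      // Each link is queued at most once, so cycles terminate.
      if (!visited.has(link)) {
        visited.add(link);
        queue.push(link);
      }
    }
  }
  return order;
}

const crawlOrder = crawl("/");
```

The visited set is what keeps the crawl from looping when pages link back to each other.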

Installation

npm install --save ts-scraper

Core API

CoreScraper (abstract)
    - protected init(): void
    - protected canFetchUrl(url): boolean
    - protected createJob(link): CoreJob
    - protected onFetchComplete(link, response): void
    - public start(): void
PageScraper (abstract)
    - public async start()
    - public abstract parse(jquery: JQuery): any
ScrapeJob (abstract)
    - public run()
    - abstract createPageScraper(url: string): PageScraper
  • There are three components in this library: CoreScraper, PageScraper and ScrapeJob.
  • ScrapeJob extends CoreJob from the ts-jobrunner library. It exposes createPageScraper(url), which creates the PageScraper that actually mines/scrapes the page.
  • PageScraper exposes parse($), which takes a jQuery object. You can mine the page however you wish and return the parsed response.
  • CoreScraper is the main object that runs the scraping process. Its object must provide the functions listed above:
    • init(): all initialization can be put here.
    • canFetchUrl(url): should tell whether the discovered link url should be fetched.
    • createJob(link): should return a job of type CoreJob, which is then queued.
    • onFetchComplete(link, response): triggered when a ScrapeJob completes, i.e. when a PageScraper is done. Code that handles the response returned by the PageScraper can go here.
    • start(): actually starts the scraping process (start() on the JobRunner).
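How the three components might fit together can be sketched as follows. This is not the library's actual code: the `...Sketch` base classes are local stand-ins that only mirror the signatures listed above, `fakeSite` replaces real HTTP requests, `parse` receives raw HTML instead of a jQuery object to keep the sketch dependency-free, and the assumptions that jobs run one at a time and that links are taken from the parsed response are mine.

```typescript
// Stand-ins that mirror the documented signatures. They are NOT the real
// ts-scraper or ts-jobrunner classes; names ending in "Sketch" are hypothetical.
type ParsedPage = { title: string; links: string[] };

// Hypothetical in-memory site, used instead of real HTTP requests.
const fakeSite: Record<string, string> = {
  "/": '<title>Home</title><a href="/docs">Docs</a><a href="/missing">Gone</a>',
  "/docs": '<title>Docs</title><a href="/">Home</a><a href="/api">API</a>',
  "/api": "<title>API</title>",
};

abstract class PageScraperSketch {
  constructor(protected url: string) {}
  // Assumption: the real parse() receives a jQuery object; raw HTML is used here.
  abstract parse(html: string): ParsedPage;
  async start(): Promise<ParsedPage> {
    return this.parse(fakeSite[this.url] ?? "");
  }
}

abstract class ScrapeJobSketch {
  constructor(public url: string) {}
  abstract createPageScraper(url: string): PageScraperSketch;
  run(): Promise<ParsedPage> {
    return this.createPageScraper(this.url).start();
  }
}

abstract class CoreScraperSketch {
  private queue: ScrapeJobSketch[] = [];
  constructor(private startUrl: string) {}
  protected abstract init(): void;
  protected abstract canFetchUrl(url: string): boolean;
  protected abstract createJob(link: string): ScrapeJobSketch;
  protected abstract onFetchComplete(link: string, response: ParsedPage): void;

  // Assumption: jobs run one at a time and links come from the parsed response.
  async start(): Promise<void> {
    this.init();
    if (this.canFetchUrl(this.startUrl)) this.queue.push(this.createJob(this.startUrl));
    while (this.queue.length > 0) {
      const job = this.queue.shift() as ScrapeJobSketch;
      const response = await job.run();
      this.onFetchComplete(job.url, response);
      for (const link of response.links) {
        if (this.canFetchUrl(link)) this.queue.push(this.createJob(link));
      }
    }
  }
}

// What a user of such a library would write: one subclass per component.
class TitleScraper extends PageScraperSketch {
  parse(html: string): ParsedPage {
    const titleMatch = /<title>(.*?)<\/title>/.exec(html);
    const links: string[] = [];
    const hrefPattern = /href="([^"]+)"/g;
    let m: RegExpExecArray | null;
    while ((m = hrefPattern.exec(html)) !== null) links.push(m[1]);
    return { title: titleMatch ? titleMatch[1] : "", links };
  }
}

class TitleJob extends ScrapeJobSketch {
  createPageScraper(url: string): PageScraperSketch {
    return new TitleScraper(url);
  }
}

class SiteScraper extends CoreScraperSketch {
  private seen = new Set<string>();
  public titles: Record<string, string> = {};
  protected init(): void {
    this.seen.clear();
  }
  protected canFetchUrl(url: string): boolean {
    // Skip URLs already queued and URLs the fake site does not contain.
    if (this.seen.has(url) || !(url in fakeSite)) return false;
    this.seen.add(url);
    return true;
  }
  protected createJob(link: string): ScrapeJobSketch {
    return new TitleJob(link);
  }
  protected onFetchComplete(link: string, response: ParsedPage): void {
    this.titles[link] = response.title;
  }
}

const siteScraper = new SiteScraper("/");
const finished: Promise<Record<string, string>> = siteScraper.start().then(() => siteScraper.titles);
```

Here `finished` resolves to a map from URL to page title once the queue drains. With the real library, the subclasses would extend the classes exported by ts-scraper rather than these stand-ins.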

Example

Example usage can be found in the src/test/test-scraper folder.

Suggestions and contributions are welcome. Happy coding :)

Package last updated on 24 Apr 2017
