
@pct-org/pop-api-scraper


The base modules for the popcorn-api scraper

Version: 0.8.2 (latest)

Version published: 5 years ago

Maintainers: 1

Created: 5 years ago

pop-api-scraper

Features

The pop-api-scraper project aims to provide the core modules for the popcorn-api scraper, but can also be used for other purposes by using middleware.

Strategy pattern with providers
Cronjobs
Scraper wrapper class
HttpService with got

Installation

 $ npm install --save pop-api-scraper pop-api

Documentation

Usage

For the basic setup, create a Provider that the PopApiScraper instance can use. The PopApiScraper implements the strategy pattern, and the providers are its strategies.
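To make the strategy pattern concrete, here is a minimal, library-free sketch. It is not how pop-api-scraper is implemented: `MiniScraper` and `UpperCaseProvider` are hypothetical names, and the sketch runs synchronously, whereas a real provider would typically return promises.

```javascript
// Library-free sketch of the strategy pattern. All names here are
// hypothetical and not part of pop-api-scraper.
class MiniScraper {
  constructor() {
    this.providers = []
  }

  // Register a strategy: a provider class plus the options to build it with.
  use(Provider, options) {
    this.providers.push(new Provider(options))
    return this
  }

  // Run every config of every registered provider and collect the results.
  scrape() {
    return this.providers.map(provider => ({
      name: provider.name,
      results: provider.configs.map(config => provider.scrapeConfig(config))
    }))
  }
}

// A strategy only has to implement `scrapeConfig`.
class UpperCaseProvider {
  constructor({ name, configs }) {
    this.name = name
    this.configs = configs
  }

  scrapeConfig(config) {
    return config.text.toUpperCase()
  }
}

const miniScraper = new MiniScraper()
miniScraper.use(UpperCaseProvider, {
  name: 'upper-case',
  configs: [{ text: 'foo' }, { text: 'bar' }]
})

const scraped = miniScraper.scrape()
console.log(JSON.stringify(scraped))
```

The scraper never needs to know what a provider does internally. Swapping in a different provider changes the behaviour without touching the scraper itself.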

The example below makes an HTTP GET request to a web service or website. From there, you are free to decide what data to extract and how.

// ./ExampleProvider.js
import { AbstractProvider, HttpService } from 'pop-api-scraper'

// Extend from the internal AbstractProvider.
export default class ExampleProvider extends AbstractProvider {

  constructor(PopApiScraper, {name, configs, maxWebRequests = 2}) {
    super(PopApiScraper, {name, configs, maxWebRequests})
  }

  // Override the `scrapeConfig` method to get the content from one
  // configuration.
  scrapeConfig(config) {
    // An HTTP service used to send HTTP requests.
    this.httpService = new HttpService({
      baseUrl: config.baseUrl
    })

    // HTTP GET request to: https://jsonplaceholder.typicode.com/posts?foo=bar
    return this.httpService.get('/posts', config.httpOptions)
      .then(res => res.data)
  }

}
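The comment in the provider above says the request goes to `https://jsonplaceholder.typicode.com/posts?foo=bar`. As a rough illustration of how a base URL, an endpoint, and a query object combine into that address, the sketch below uses the standard WHATWG `URL` API. `buildUrl` is a hypothetical helper, and it is only an assumption that `HttpService` does something equivalent internally.

```javascript
// Sketch using the WHATWG URL API built into Node.js and browsers.
// `buildUrl` is a hypothetical helper, not part of pop-api-scraper.
function buildUrl(baseUrl, endpoint, query = {}) {
  // Resolve the endpoint against the base URL.
  const url = new URL(endpoint, baseUrl)

  // Append each query entry as a URL search parameter.
  for (const [key, value] of Object.entries(query)) {
    url.searchParams.set(key, value)
  }

  return url.toString()
}

const exampleUrl = buildUrl(
  'https://jsonplaceholder.typicode.com',
  '/posts',
  { foo: 'bar' }
)
console.log(exampleUrl)
```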

Bundle it all up together with pop-api:

// ./index.js
import os from 'os'
import { PopApi } from 'pop-api'
import { join } from 'path'
import { Cron, PopApiScraper } from 'pop-api-scraper'

import ExampleProvider from './ExampleProvider'

(async () => {
  try {
    // Let the PopApiScraper use the ExampleProvider to scrape data.
    PopApiScraper.use(ExampleProvider, {
      name: 'example-provider',
      configs: [{
        baseUrl: 'https://jsonplaceholder.typicode.com',
        httpOptions: {
          query: {
            foo: 'bar'
          }
        }
      }],
      maxWebRequests: 2
    })

    // Register the PopApiScraper middleware to the pop-api instance.
    PopApi.use(PopApiScraper, {
      statusPath: join(os.tmpdir(), 'status.json'),
      updatedPath: join(os.tmpdir(), 'updated.json')
    })
    // Optionally, you can use the Cron middleware to scrape for content on a
    // regular basis.
    PopApi.use(Cron, {
      cronTime: '0 0 */6 * * *',
      start: false
    })

    // PopApi now has a `scraper` instance.
    const res = await PopApi.scraper.scrape()
    console.info(res[0])
  } catch (err) {
    console.error(err)
  }
})()
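The `cronTime` value `'0 0 */6 * * *'` in the example has six fields. Assuming the common six-field layout with seconds first (second, minute, hour, day of month, month, day of week), it would fire at minute 0 and second 0 of every sixth hour. The sketch below labels the fields and expands the hour step. `describeCron` and `matchingValues` are hypothetical helpers that handle only `*`, `*/n`, and single numbers, not the full cron syntax.

```javascript
// Assumes the six-field cron layout with seconds first. `describeCron` and
// `matchingValues` are hypothetical helpers for illustration only.
const FIELD_NAMES = ['second', 'minute', 'hour', 'dayOfMonth', 'month', 'dayOfWeek']

// Split a cron expression into named fields.
function describeCron(cronTime) {
  const parts = cronTime.trim().split(/\s+/)
  if (parts.length !== FIELD_NAMES.length) {
    throw new Error(`Expected ${FIELD_NAMES.length} fields, got ${parts.length}`)
  }
  return Object.fromEntries(FIELD_NAMES.map((name, i) => [name, parts[i]]))
}

// Expand a field over the inclusive range [min, max].
// Supports only '*', '*/n', and a single number.
function matchingValues(field, min, max) {
  if (field === '*' || field.startsWith('*/')) {
    const step = field === '*' ? 1 : Number(field.slice(2))
    const values = []
    for (let v = min; v <= max; v += step) values.push(v)
    return values
  }
  return [Number(field)]
}

const fields = describeCron('0 0 */6 * * *')
const hours = matchingValues(fields.hour, 0, 23)
console.log(hours) // [ 0, 6, 12, 18 ]
```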

License

MIT License

Package last updated on 22 Apr 2020
