scrapq

Lightweight TypeScript library for scraping HTML

  • 1.0.1
  • npm
ScrapQ

Lightweight TypeScript library for scraping HTML with type inference.

About

There are plenty of scraping libraries out there, but only a few offer full TypeScript support, where TypeScript infers the result type from your query. This is a small library with a single purpose: to provide scraping in a human-readable format with full TypeScript support, including IntelliSense and type inference.
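
To illustrate what "infer type based on your query" can mean in practice, here is a minimal, self-contained sketch of the general technique: conditional and mapped types that turn a query object into a result type. It is not scrapq's implementation. Every name in it (`SketchQ`, `Infer`, `ToyNode`, `run`) is hypothetical, and the toy document model matches on tag names only instead of parsing HTML with CSS selectors.

```typescript
// Hypothetical sketch: none of these names come from scrapq's source.

// Query descriptors: plain objects tagged with a literal `kind`.
interface TextQuery { kind: 'text'; selector: string }
interface ExistsQuery { kind: 'exists'; selector: string }
interface ListQuery<S> { kind: 'list'; selector: string; shape: S }
type AnyQuery = TextQuery | ExistsQuery | ListQuery<Shape>;
interface Shape { [key: string]: AnyQuery }

// Builders that keep the precise descriptor type for the compiler.
const SketchQ = {
    text: (selector: string): TextQuery => ({ kind: 'text', selector }),
    exists: (selector: string): ExistsQuery => ({ kind: 'exists', selector }),
    list: <S extends Shape>(selector: string, shape: S): ListQuery<S> =>
        ({ kind: 'list', selector, shape }),
};

// Type-level mapping from a descriptor to the value it yields.
type Result<Q> =
    Q extends TextQuery ? string :
    Q extends ExistsQuery ? boolean :
    Q extends ListQuery<infer S> ? Array<Infer<S>> :
    never;
type Infer<S> = { [K in keyof S]: Result<S[K]> };

// Toy stand-in for parsed HTML; "selectors" are tag names only.
interface ToyNode { tag: string; text: string; children: ToyNode[] }

// Collect all descendants with the given tag, in document order.
function findAll(node: ToyNode, tag: string): ToyNode[] {
    const found: ToyNode[] = [];
    for (const child of node.children) {
        if (child.tag === tag) found.push(child);
        found.push(...findAll(child, tag));
    }
    return found;
}

// Untyped runtime evaluation of a shape against a node.
function evaluate(node: ToyNode, shape: Shape): Record<string, unknown> {
    const out: Record<string, unknown> = {};
    for (const key of Object.keys(shape)) {
        const q = shape[key];
        const matches = findAll(node, q.selector);
        if (q.kind === 'text') {
            out[key] = matches.length > 0 ? matches[0].text : '';
        } else if (q.kind === 'exists') {
            out[key] = matches.length > 0;
        } else {
            const inner = q.shape;
            out[key] = matches.map((m) => evaluate(m, inner));
        }
    }
    return out;
}

// Public entry point: the assertion ties the runtime value to the inferred type.
function run<S extends Shape>(node: ToyNode, shape: S): Infer<S> {
    return evaluate(node, shape) as unknown as Infer<S>;
}

const page: ToyNode = {
    tag: 'body', text: '', children: [
        { tag: 'h1', text: 'Hello', children: [] },
        { tag: 'ul', text: '', children: [
            { tag: 'li', text: '', children: [{ tag: 'span', text: 'Ciao', children: [] }] },
            { tag: 'li', text: '', children: [{ tag: 'span', text: 'Bonjour', children: [] }] },
        ] },
    ],
};

const result = run(page, {
    title: SketchQ.text('h1'),
    hasTable: SketchQ.exists('table'),
    items: SketchQ.list('li', { text: SketchQ.text('span') }),
});

// Inferred: { title: string; hasTable: boolean; items: { text: string }[] }
const firstItem: string = result.items[0].text;
console.log(result.title, result.hasTable, firstItem); // Hello false Ciao
```

The point of the sketch is the `Infer` type: because each builder returns a descriptor with a literal `kind`, the compiler can compute the result shape from the query alone, so a typo such as `result.itmes` would be rejected at compile time.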

Examples

To see all examples, please visit ./test/basic.test.ts

Hacker News

import { scrap, Q } from 'scrapq';

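// `fetch` is not included in the library; use your own implementation.
// This assumes a standard Fetch API implementation and a context that allows `await`.
const response = await fetch('https://news.ycombinator.com/');
const html = await response.text();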

const data = scrap(html, {
    articles: Q.list('.athing', {
        title: Q.text('.title > a'),
        website: Q.text('.title > span.sitebit'),
        link: Q.attr('.title > a', 'href')
    })
});
console.log(data);
// {
//   articles: [
//       ...,
//       {
//          title: 'The tools humanity will need for living in the year 1 trillion',
//          website: 'phys.org',
//          link: 'https://phys.org/news/2018-06-tools-humanity-year-trillion.html'
//       },
//       ...
//     ]
// }

Custom

import { scrap, Q } from 'scrapq';

const STR_TO_SCRAP = `
    <h1 class="title">Hello</h1>
    <ul>
        <li><span>Guten Tag</span></li>
        <li><span>Ciao</span></li>
        <li><span>Bonjour</span></li>
    </ul>
`;

const result = scrap(STR_TO_SCRAP, {
    title: Q.text('h1.title'),
    items: Q.list('li', {
        text: Q.text('span')
    })
});

console.log(result);
// {
//   title: 'Hello',
//   items: [
//      { text: 'Guten Tag' },
//      { text: 'Ciao' },
//      { text: 'Bonjour' }
//   ]
// }

API

scrap(html: string, query: Query)

Query

Q.text(selector: string)

Get the text of an element.

Q.attr(selector: string, htmlAttribute: string)

Get the value of an attribute of an element.

Q.html(selector: string)

Get the HTML of an element.

Q.exists(selector: string)

Get true if a matching element exists, false otherwise.

Q.list(selector: string, query: Query)

Get a list of items, applying the nested query to each matching element.

Package last updated on 21 Jul 2018
