
table-scraper

Easily scrape any website's HTML table data into an array of JavaScript objects.

Version: 1.0.3 (latest)

A simple utility for scraping data from the HTML tables on a given website into a list of JavaScript objects.

installation

npm install --save table-scraper

methods

get(url)

Returns a promise that resolves to a list of the tables found at the given URL. Each HTML table row is converted to a JavaScript object keyed by column heading.

For example, suppose the website at http://www.some-fake-url.com consisted of the following:

<html>
<head>
</head>
<body>
  <table>
    <thead>
    <tr><th>State</th><th>Capital City</th><th>Pop.</th></tr>
    </thead>
    <tbody>
    <tr><td>Minnesota</td><td>Saint Paul</td><td>3</td></tr>
    <tr><td>New York</td><td>Albany</td><td>Eight Million</td></tr>
    </tbody>
  </table>
</body>
</html>

The following code would result in the array displayed below:

var scraper = require('table-scraper');
scraper
  .get('http://www.some-fake-url.com')
  .then(function(tableData) {
    /*
       tableData === 
        [ 
          [ 
            { State: 'Minnesota', 'Capital City': 'Saint Paul', 'Pop.': '3' },
            { State: 'New York', 'Capital City': 'Albany', 'Pop.': 'Eight Million' } 
          ] 
        ]
    */
  });

Note that the tableData returned is a list of lists. If some-fake-url.com contained three tables, the response would be structured as follows:

[
  [ /* list of data from the first table */ ],
  [ /* list of data from the second table */ ],
  [ /* list of data from the third table */ ]
]
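
Because the result is nested, each table has to be visited before its rows can be read. The following is a minimal sketch of that traversal, assuming tableData has the shape described above. The helper name summarizeTables and the sample data are hypothetical and are not part of table-scraper; the empty second table is included only to exercise the edge case.

```javascript
// Hypothetical helper (not part of table-scraper): reports, for each table,
// its position in the result, its row count, and its column keys.
function summarizeTables(tableData) {
  return tableData.map(function(rows, tableIndex) {
    return {
      table: tableIndex,
      rowCount: rows.length,
      // Keys are read from the first row; an empty table has no columns.
      columns: rows.length > 0 ? Object.keys(rows[0]) : []
    };
  });
}

// Sample data standing in for what scraper.get(url) would resolve to.
var sampleTableData = [
  [
    { State: 'Minnesota', 'Capital City': 'Saint Paul', 'Pop.': '3' },
    { State: 'New York', 'Capital City': 'Albany', 'Pop.': 'Eight Million' }
  ],
  []
];

var summary = summarizeTables(sampleTableData);
console.log(summary);
```

In real use, the same function could be called inside the .then callback with the resolved tableData.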

If a table has NO headings (no <th> elements), the object keys are simply the column index:

[
  {'0': <first column data of first row>, '1': <second column data of first row>, .... }
]
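
When a table has no headings, the index keys can be mapped to readable names after scraping. The following is a minimal sketch, assuming the rows are keyed '0', '1', and so on as described above. The helper name nameColumns and the sample rows are hypothetical and are not part of table-scraper.

```javascript
// Hypothetical helper (not part of table-scraper): replaces index keys
// ('0', '1', ...) with caller-supplied column names.
function nameColumns(rows, columnNames) {
  return rows.map(function(row) {
    var named = {};
    Object.keys(row).forEach(function(key) {
      // Fall back to the original index key when no name is supplied.
      var name = columnNames[key] !== undefined ? columnNames[key] : key;
      named[name] = row[key];
    });
    return named;
  });
}

// Sample rows standing in for one headerless table from the result.
var headerlessRows = [
  { '0': 'Minnesota', '1': 'Saint Paul' },
  { '0': 'New York', '1': 'Albany' }
];

var namedRows = nameColumns(headerlessRows, ['State', 'Capital City']);
console.log(namedRows);
```

Columns without a supplied name keep their index key, so no data is dropped when the name list is shorter than the row.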

Contributing

Feedback/PRs welcome! Please include tests around any new functionality, and make sure existing tests pass:

npm test

Credits

The following Node.js libraries make this utility super easy:

  • tabletojson
  • x-ray
  • request

Package last updated on 20 Jul 2020
