linkscrape

A Node.js module to scrape and normalize links from an HTML string.

Source: npm | Version: 0.0.1 | Maintainers: 1

Node.js - linkscrape

This module scrapes links from an HTML string and normalizes them. It does not perform the HTTP request itself; use superagent or request for that.
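The normalization step can be approximated with the WHATWG URL class that is built into Node.js. The following is a minimal sketch, not linkscrape's actual implementation. The function name normalizeLink is hypothetical. It treats only fragment-only hrefs and non-HTTP(S) schemes (such as javascript:) as non-links, so it does not reproduce the library's documented null result for a string like alert('hello'), which standard URL resolution would turn into a path. Note also that URL serialization adds a trailing slash to a bare origin, so 'http://google.com' comes back as 'http://google.com/'.

```javascript
// Hypothetical helper (not part of linkscrape): resolve an href against the
// page URL, or return null when it does not point to another HTTP(S) resource.
function normalizeLink(pageUrl, href) {
  if (typeof href !== 'string') return null;
  const trimmed = href.trim();

  // Empty hrefs and in-page anchors such as "#wat" are not separate links.
  if (trimmed === '' || trimmed.startsWith('#')) return null;

  let resolved;
  try {
    // Relative hrefs ("/faq/questions", "aboutus") resolve against pageUrl.
    resolved = new URL(trimmed, pageUrl);
  } catch (err) {
    return null; // href could not be parsed as a URL
  }

  // Reject pseudo-links such as "javascript:alert('hi');".
  if (resolved.protocol !== 'http:' && resolved.protocol !== 'https:') {
    return null;
  }
  return resolved.href;
}
```

For example, normalizeLink('http://someserver.com/mypage', 'aboutus') returns 'http://someserver.com/aboutus', because a relative path replaces the last segment of the base URL.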

Installation

npm install linkscrape

Example

HTML string:

<html>
  <head>
    <title>
      Test File
    </title>
  </head>
  <body>
    <p id="wat">
      <a href="http://google.com"><b>Google</b></a>
    </p>
    <p>
      <a href="#wat" class="pretty">Link in page</a>
      <a href="javascript:alert('hi');">hi</a>
      <a href="alert('hello')">hello</a>
      <a href="/faq/questions">Faq</a>
      <a href="aboutus">About Us</a>
    </p>
  </body>
</html>

Extract links with linkscrape:

var scrape = require('linkscrape'); // assumed: the module exports scrape() directly

// htmlString holds the HTML shown above
scrape('http://someserver.com/mypage', htmlString, function($, links){
    console.log(links.length); // is 6

    console.log(links[0].href); // is 'http://google.com'
    console.log(links[0].text); // is 'Google'
    console.log(links[0].html); // is '<b>Google</b>'
    console.log(links[0].element); // object
    console.log(links[0].link); // is 'http://google.com'

    console.log(links[1].href); // is '#wat'
    console.log(links[1].text); // is 'Link in page'
    console.log(links[1].html); // is 'Link in page'
    console.log(links[1].element); // object
    console.log(links[1].link); // is null
    console.log($(links[1].element).attr('class')); // is 'pretty'

    console.log(links[2].href); // is "javascript:alert('hi');"
    console.log(links[2].text); // is 'hi'
    console.log(links[2].html); // is 'hi'
    console.log(links[2].element); // object
    console.log(links[2].link); // is null

    console.log(links[3].href); // is "alert('hello')"
    console.log(links[3].text); // is 'hello'
    console.log(links[3].html); // is 'hello'
    console.log(links[3].element); // object
    console.log(links[3].link); // is null

    console.log(links[4].href); // is "/faq/questions"
    console.log(links[4].text); // is 'Faq'
    console.log(links[4].html); // is 'Faq'
    console.log(links[4].element); // object
    console.log(links[4].link); // is 'http://someserver.com/faq/questions'

    console.log(links[5].href); // is "aboutus"
    console.log(links[5].text); // is 'About Us'
    console.log(links[5].html); // is 'About Us'
    console.log(links[5].element); // object
    console.log(links[5].link); // is 'http://someserver.com/aboutus'
});

It is currently backed by cheerio, so the $ argument passed to the callback supports jQuery-style selectors. See the cheerio docs for more details.
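Because link is null for in-page anchors and script pseudo-links, a callback will often want only the resolved URLs. The sketch below keeps the non-null link values and drops duplicates while preserving document order. The function name resolvedLinks is hypothetical, and sampleLinks is a hand-written array shaped like the documented output for the example page (only href, text, and link are included; html and element are omitted).

```javascript
// Hypothetical helper: collect unique, non-null `link` values in document order.
function resolvedLinks(links) {
  const seen = new Set();
  const result = [];
  for (const item of links) {
    // Skip entries linkscrape could not resolve (link is null).
    if (item.link == null) continue;
    if (!seen.has(item.link)) {
      seen.add(item.link);
      result.push(item.link);
    }
  }
  return result;
}

// Sample data mirroring the documented results for the example HTML.
const sampleLinks = [
  { href: 'http://google.com', text: 'Google', link: 'http://google.com' },
  { href: '#wat', text: 'Link in page', link: null },
  { href: "javascript:alert('hi');", text: 'hi', link: null },
  { href: "alert('hello')", text: 'hello', link: null },
  { href: '/faq/questions', text: 'Faq', link: 'http://someserver.com/faq/questions' },
  { href: 'aboutus', text: 'About Us', link: 'http://someserver.com/aboutus' },
];

// resolvedLinks(sampleLinks) returns
// ['http://google.com', 'http://someserver.com/faq/questions', 'http://someserver.com/aboutus']
```

Inside a real scrape() callback, the same helper would be called on the links argument, for example resolvedLinks(links).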

Test

npm test

or...

mocha test

License

Licensed under MIT. See LICENSE for more details.

Copyright (c) 2012 JP Richardson

Keywords

extract

Package last updated on 22 Jul 2012