Huge News!Announcing our $40M Series B led by Abstract Ventures.Learn More
Socket
Sign inDemoInstall
Socket

wget-parser

Package Overview
Dependencies
Maintainers
1
Versions
5
Alerts
File Explorer

Advanced tools

Socket logo

Install Socket

Detect and block malicious and high-risk dependencies

Install

wget-parser

Parses the wget spider output into an object

  • 2.0.0
  • latest
  • Source
  • npm
  • Socket score

Version published
Weekly downloads
436
increased by21.45%
Maintainers
1
Weekly downloads
 
Created
Source

Table of Contents

  • Spider parser

Spider parser

Build Status npm version Coverage Status.

Parses the spider output from wget into an object structure of links.

This object could then be processed further to create a tree structure of the hierarchy of a website such that sitemap generation could be implemented.

Tested using wget v1.15 on linux.

Usage

var parser = require('wget-parser')
  , buf = new Buffer(0);      // buffer should contain the spider output
console.dir(parser(buf));
  • parser.Parser: The parser class.
  • parser.Link: The class that represents a link.
  • parser.ParseStream: Parse stream class.

Streams support is available, see the test spec for example usage.

wget-parser

A program that reads from stdin and prints the result of the parse as JSON, exits with error code 1 if any broken links are found.

cat test/fixtures/mock.txt | wget-parser
cat test/fixtures/broken.txt | wget-parser; echo $?;

wget-spider

A program that performs a spider with wget and pipes the output to wget-parser:

wget-spider http://google.com

Output

Example output from the parser:

{
  "links": [
    {
      "url": {
        "protocol": "http:",
        "slashes": true,
        "auth": null,
        "host": "google.com",
        "port": null,
        "hostname": "google.com",
        "hash": null,
        "search": null,
        "query": null,
        "pathname": "/",
        "path": "/",
        "href": "http://google.com/"
      },
      "link": "http://google.com/",
      "line": "--2016-02-10 16:11:57--  http://google.com/"
    },
    {
      "url": {
        "protocol": "http:",
        "slashes": true,
        "auth": null,
        "host": "www.google.co.id",
        "port": null,
        "hostname": "www.google.co.id",
        "hash": null,
        "search": "?gws_rd=cr&ei=zfC6Vv6KKYexuATc3pu4DQ",
        "query": "gws_rd=cr&ei=zfC6Vv6KKYexuATc3pu4DQ",
        "pathname": "/",
        "path": "/?gws_rd=cr&ei=zfC6Vv6KKYexuATc3pu4DQ",
        "href": "http://www.google.co.id/?gws_rd=cr&ei=zfC6Vv6KKYexuATc3pu4DQ"
      },
      "link": "http://www.google.co.id/?gws_rd=cr&ei=zfC6Vv6KKYexuATc3pu4DQ",
      "line": "--2016-02-10 16:11:57--  http://www.google.co.id/?gws_rd=cr&ei=zfC6Vv6KKYexuATc3pu4DQ"
    }
  ],
  "broken": []
}

Developer

Test

To run the test suite:

npm test

Cover

To generate code coverage run:

npm run cover

Lint

Run the source tree through jshint and jscs:

npm run lint

Clean

Remove generated files:

npm run clean

Readme

To build the readme file from the partial definitions:

npm run readme

Generated by mdp(1).

Keywords

FAQs

Package last updated on 10 Feb 2016

Did you know?

Socket

Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.

Install

Related posts

SocketSocket SOC 2 Logo

Product

  • Package Alerts
  • Integrations
  • Docs
  • Pricing
  • FAQ
  • Roadmap
  • Changelog

Packages

npm

Stay in touch

Get open source security insights delivered straight into your inbox.


  • Terms
  • Privacy
  • Security

Made with ⚡️ by Socket Inc