wget-parser

Version 2.0.0 (latest on npm)

Spider parser


Parses the spider output from wget into an object structure of links.
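In its default log format, wget starts each request with a line such as `--2016-02-10 16:11:57--  http://google.com/` (the `line` field in the example output further down). The following is a minimal sketch of extracting the timestamp and URL from such lines. It assumes this timestamp-then-URL layout, and `parseRequestLine` and `parseSpiderOutput` are hypothetical helpers, not part of wget-parser's API. The sketch also uses the WHATWG `URL` class, while the field names in the example output (`slashes`, `auth`, `query`) suggest the package itself uses the legacy `url.parse()`.

```javascript
const { URL } = require('url');

// Assumed line shape: "--YYYY-MM-DD HH:MM:SS--", whitespace, then the URL.
const REQUEST_LINE = /^--(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})--\s+(\S+)\s*$/;

// Hypothetical helper: returns { time, link, url, line } for a request line,
// or null for any other line (headers, status messages, blank lines).
function parseRequestLine(line) {
  const match = REQUEST_LINE.exec(line);
  if (!match) return null;
  try {
    return { time: match[1], link: match[2], url: new URL(match[2]), line };
  } catch (err) {
    return null; // matched the shape, but the URL itself is not parseable
  }
}

// Splits a whole chunk of spider output and keeps only the request lines.
function parseSpiderOutput(text) {
  return text
    .split(/\r?\n/)
    .map(parseRequestLine)
    .filter((record) => record !== null);
}
```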

The resulting object can be processed further, for example to build a tree of a website's page hierarchy from which a sitemap could be generated.
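The tree-building step could look like the following minimal sketch. It assumes each link has a `url` object with `hostname` and `pathname`, as in the example output further down; `buildSiteTree` and `childOf` are hypothetical names, not part of wget-parser.

```javascript
// Hypothetical helper: nests links by hostname, then by path segment, e.g.
// http://example.org/docs/api/ -> tree["example.org"].docs.api
function buildSiteTree(links) {
  const tree = Object.create(null);
  for (const { url } of links) {
    let node = childOf(tree, url.hostname);
    // "/docs/api/" -> ["docs", "api"]; the root path "/" adds no levels.
    for (const segment of url.pathname.split('/').filter(Boolean)) {
      node = childOf(node, segment);
    }
  }
  return tree;
}

// Returns node[key], creating an empty prototype-free child if it is missing.
function childOf(node, key) {
  if (!(key in node)) node[key] = Object.create(null);
  return node[key];
}
```

Prototype-free objects are used so that a path segment such as `__proto__` becomes an ordinary key; the tree still serialises with `JSON.stringify(buildSiteTree(result.links))`.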

Tested using wget v1.15 on Linux.

Usage

var parser = require('wget-parser')
  , buf = Buffer.alloc(0);    // buf should hold the wget spider output
console.dir(parser(buf));

The exported function also exposes these classes:

  • parser.Parser: the parser class.
  • parser.Link: the class that represents a link.
  • parser.ParseStream: the parse stream class.

Streams are also supported; see the test spec for example usage.
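As a rough illustration of how a streaming parser can work, here is a sketch built on Node's `stream.Transform`. It is not wget-parser's `ParseStream`, whose interface is documented only in the test spec; `RequestLineStream` is a hypothetical class that buffers partial lines across chunks and emits one object per request line, assuming the `--<date> <time>--  <url>` line format.

```javascript
const { Transform } = require('stream');
const { StringDecoder } = require('string_decoder');

// Hypothetical stream, not wget-parser's ParseStream: turns raw spider output
// into objects, one per "--<date> <time>--  <url>" request line.
class RequestLineStream extends Transform {
  constructor() {
    super({ readableObjectMode: true });
    this.decoder = new StringDecoder('utf8'); // keeps multi-byte chars intact across chunks
    this.pending = ''; // trailing partial line carried into the next chunk
  }

  _transform(chunk, encoding, callback) {
    const lines = (this.pending + this.decoder.write(chunk)).split(/\r?\n/);
    this.pending = lines.pop(); // the last piece may be an incomplete line
    for (const line of lines) this.emitLine(line);
    callback();
  }

  _flush(callback) {
    const rest = this.pending + this.decoder.end();
    if (rest) this.emitLine(rest);
    callback();
  }

  emitLine(line) {
    const match = /^--[\d-]+ [\d:]+--\s+(\S+)/.exec(line);
    if (match) this.push({ link: match[1], line });
  }
}
```

Carrying the unfinished tail of each chunk forward is what lets a URL that straddles two chunks still be recognised as one line.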

wget-parser

A program that reads wget spider output from stdin, prints the parse result as JSON, and exits with code 1 if any broken links are found.

cat test/fixtures/mock.txt | wget-parser
cat test/fixtures/broken.txt | wget-parser; echo $?;
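The stdin-to-exit-code contract described above can be sketched as follows. `run` is a hypothetical helper; the parse function is injected (for example `require('wget-parser')`), and the only assumption about its result is that it has a `broken` array, as in the example output.

```javascript
// Hypothetical CLI core: read all of `input`, parse it, write JSON to `output`,
// and resolve with the exit code (1 if any broken links were found, else 0).
function run(parse, input, output) {
  return new Promise((resolve, reject) => {
    const chunks = [];
    input.on('data', (chunk) => chunks.push(Buffer.from(chunk)));
    input.on('error', reject);
    input.on('end', () => {
      try {
        const result = parse(Buffer.concat(chunks));
        output.write(JSON.stringify(result, null, 2) + '\n');
        resolve(result.broken.length > 0 ? 1 : 0);
      } catch (err) {
        reject(err);
      }
    });
  });
}
```

In a script this might be wired up as `run(require('wget-parser'), process.stdin, process.stdout).then((code) => { process.exitCode = code; })`. Setting `process.exitCode` instead of calling `process.exit()` lets pending stdout writes finish before the process ends.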

wget-spider

A program that runs a wget spider crawl and pipes the output to wget-parser:

wget-spider http://google.com
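The README does not list the exact wget flags that wget-spider uses. The sketch below shows one way to start a comparable crawl from Node with `child_process.spawn`, assuming the standard `--spider` and `--recursive` wget options; `spiderArgs` and `spiderProcess` are hypothetical helpers. wget writes its progress log to stderr, so that is the stream a parser would read.

```javascript
const { spawn } = require('child_process');

// Hypothetical helper: arguments for a recursive spider run (assumed flags,
// not necessarily the ones wget-spider passes).
function spiderArgs(url) {
  return ['--spider', '--recursive', url];
}

// Starts wget; stdin and stdout are unused, stderr is piped so the caller
// can read the spider log from child.stderr.
function spiderProcess(url) {
  return spawn('wget', spiderArgs(url), { stdio: ['ignore', 'ignore', 'pipe'] });
}
```

A caller would attach `child.on('error', ...)` to catch a missing `wget` binary and then pipe `child.stderr` wherever the log should go, for example `child.stderr.pipe(process.stdout)` in a script whose output is piped into `wget-parser`.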

Output

Example output from the parser:

{
  "links": [
    {
      "url": {
        "protocol": "http:",
        "slashes": true,
        "auth": null,
        "host": "google.com",
        "port": null,
        "hostname": "google.com",
        "hash": null,
        "search": null,
        "query": null,
        "pathname": "/",
        "path": "/",
        "href": "http://google.com/"
      },
      "link": "http://google.com/",
      "line": "--2016-02-10 16:11:57--  http://google.com/"
    },
    {
      "url": {
        "protocol": "http:",
        "slashes": true,
        "auth": null,
        "host": "www.google.co.id",
        "port": null,
        "hostname": "www.google.co.id",
        "hash": null,
        "search": "?gws_rd=cr&ei=zfC6Vv6KKYexuATc3pu4DQ",
        "query": "gws_rd=cr&ei=zfC6Vv6KKYexuATc3pu4DQ",
        "pathname": "/",
        "path": "/?gws_rd=cr&ei=zfC6Vv6KKYexuATc3pu4DQ",
        "href": "http://www.google.co.id/?gws_rd=cr&ei=zfC6Vv6KKYexuATc3pu4DQ"
      },
      "link": "http://www.google.co.id/?gws_rd=cr&ei=zfC6Vv6KKYexuATc3pu4DQ",
      "line": "--2016-02-10 16:11:57--  http://www.google.co.id/?gws_rd=cr&ei=zfC6Vv6KKYexuATc3pu4DQ"
    }
  ],
  "broken": []
}
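A consumer of this structure might, for example, count links per host. The minimal sketch below relies only on `links[i].url.hostname` and `broken.length`; the shape of individual `broken` entries is not shown above, so the sketch only counts them. `summarize` is a hypothetical helper.

```javascript
// Hypothetical helper: counts links per hostname and how many broken entries
// the parse result reports.
function summarize(result) {
  const perHost = {};
  for (const { url } of result.links) {
    perHost[url.hostname] = (perHost[url.hostname] || 0) + 1;
  }
  return { perHost, total: result.links.length, broken: result.broken.length };
}
```

Applied to the example above, it would report one link each for `google.com` and `www.google.co.id` and zero broken links.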

Developer

Test

To run the test suite:

npm test

Cover

To generate code coverage run:

npm run cover

Lint

Run the source tree through jshint and jscs:

npm run lint

Clean

Remove generated files:

npm run clean

Readme

To build the readme file from the partial definitions:

npm run readme

Generated by mdp(1).

Package last updated on 10 Feb 2016
