Socket
Socket
Sign inDemoInstall

github.com/henrylee2cn/scrape

Package Overview
Dependencies
0
Maintainers
0
Alerts
File Explorer

Install Socket

Protect your apps from supply chain attacks

Install

github.com/henrylee2cn/scrape

Package scrape provides a searching api on top of golang.org/x/net/html

    v0.0.0-20150524150635-47d070048ae5

Version published
Maintainers
0

Readme

# scrape

A simple, higher level interface for Go web scraping.

When scraping with Go, I find myself redefining tree traversal and other
utility functions.

This package is a place to put some simple tools which build on top of the
[Go HTML parsing library](https://godoc.org/golang.org/x/net/html).

For the full interface check out the godoc
[![GoDoc](https://godoc.org/github.com/yhat/scrape?status.svg)](https://godoc.org/github.com/yhat/scrape)

## Sample

Scrape defines traversal functions like `Find` and `FindAll` while attempting
to be generic. It also defines convenience functions such as `Attr` and `Text`.

```go
// Parse the page
root, err := html.Parse(resp.Body)
if err != nil {
    // handle error
}
// Search for the title
title, ok := scrape.Find(root, scrape.ByTag(atom.Title))
if ok {
    // Print the title
    fmt.Println(scrape.Text(title))
}
```

## A full example: Scraping Hacker News

```go
package main

import (
	"fmt"
	"net/http"

	"github.com/yhat/scrape"
	"golang.org/x/net/html"
	"golang.org/x/net/html/atom"
)

func main() {
	// request and parse the front page
	resp, err := http.Get("https://news.ycombinator.com/")
	if err != nil {
		panic(err)
	}
	root, err := html.Parse(resp.Body)
	if err != nil {
		panic(err)
	}

	// define a matcher
	matcher := func(n *html.Node) bool {
		// must check for nil values
		if n.DataAtom == atom.A && n.Parent != nil && n.Parent.Parent != nil {
			return scrape.Attr(n.Parent.Parent, "class") == "athing"
		}
		return false
	}
	// grab all articles and print them
	articles := scrape.FindAll(root, matcher)
	for i, article := range articles {
		fmt.Printf("%2d %s (%s)\n", i, scrape.Text(article), scrape.Attr(article, "href"))
	}
}
```

FAQs

Last updated on 24 May 2015

Did you know?

Socket installs a GitHub app to automatically flag issues on every pull request and report the health of your dependencies. Find out what is inside your node modules and prevent malicious activity before you update the dependencies.

Install

Related posts

SocketSocket SOC 2 Logo

Product

  • Package Alerts
  • Integrations
  • Docs
  • Pricing
  • FAQ
  • Roadmap

Stay in touch

Get open source security insights delivered straight into your inbox.


  • Terms
  • Privacy
  • Security

Made with ⚡️ by Socket Inc