Security News
38% of CISOs Fear They’re Not Moving Fast Enough on AI
CISOs are racing to adopt AI for cybersecurity, but hurdles in budgets and governance may leave some falling behind in the fight against cyber threats.
github.com/foreversmart/goquery
GoQuery brings a syntax and a set of features similar to jQuery to the Go language. It is based on Go's net/html package and the CSS Selector library cascadia. Since the net/html parser returns tokens (nodes), and not a full-featured DOM object, jQuery's manipulation and modification functions have been left off (no point in modifying data in the parsed tree of the HTML, it has no effect).
Also, because the net/html parser requires UTF-8 encoding, so does goquery: it is the caller's responsibility to ensure that the source document provides UTF-8 encoded HTML.
Supported functions are query-oriented features (hasClass()
, attr()
and the likes), as well as traversing functions that make sense given what we have to work with. This makes GoQuery a great library for scraping web pages.
Syntax-wise, it is as close as possible to jQuery, with the same function names when possible, and that warm and fuzzy chainable interface. jQuery being the ultra-popular library that it is, I felt that writing a similar HTML-manipulating library was better to follow its API than to start anew (in the same spirit as Go's fmt
package), even though some of its methods are less than intuitive (looking at you, index()...).
Please note that because of the net/html dependency, goquery requires Go1.1+.
$ go get github.com/PuerkitoBio/goquery
(optional) To run unit tests:
$ cd $GOPATH/src/github.com/PuerkitoBio/goquery
$ go test
(optional) To run benchmarks (warning: it runs for a few minutes):
$ cd $GOPATH/src/github.com/PuerkitoBio/goquery
$ go test -bench=".*"
Note that goquery's API is now stable, and will not break.
NewDocumentFromReader()
(thanks jweir) which allows creating a goquery document from an io.Reader.NewDocumentFromResponse()
(thanks assassingj) which allows creating a goquery document from an http response.EachWithBreak()
which allows to break out of an Each()
loop by returning false. This function was added instead of changing the existing Each()
to avoid breaking compatibility.Document.Root
is removed, Document
is now a Selection
itself (a selection of one, the root element, just like Document.Root
was before). Add jQuery's Closest() method.GoQuery exposes two classes, Document
and Selection
. Unlike jQuery, which is loaded as part of a DOM document, and thus acts on its containing document, GoQuery doesn't know which HTML document to act upon. So it needs to be told, and that's what the Document
class is for. It holds the root document node as the initial Selection object to manipulate.
jQuery often has many variants for the same function (no argument, a selector string argument, a jQuery object argument, a DOM element argument, ...). Instead of exposing the same features in GoQuery as a single method with variadic empty interface arguments, I use statically-typed signatures following this naming convention:
Prev()
), and the version with a selector string argument is called XxxFiltered()
(e.g.: PrevFiltered()
)Is()
)XxxSelection()
and take a *Selection
object as argument (e.g.: FilterSelection()
)XxxNodes()
and take a variadic argument of type *html.Node
(e.g.: FilterNodes()
)XxxFunction()
and take a function as argument (e.g.: FilterFunction()
)GoQuery's complete godoc reference documentation can be found here.
Please note that Cascadia's selectors do NOT necessarily match all supported selectors of jQuery (Sizzle). See the cascadia project for details.
Taken (and adapted as if executed from outside the goquery package) from example_test.go:
package main
import (
"fmt"
"log"
"github.com/PuerkitoBio/goquery"
)
func ExampleScrape() {
doc, err := goquery.NewDocument("http://metalsucks.net")
if err != nil {
log.Fatal(err)
}
doc.Find(".reviews-wrap article .review-rhs").Each(func(i int, s *goquery.Selection) {
band := s.Find("h3").Text()
title := s.Find("i").Text()
fmt.Printf("Review %d: %s - %s\n", i, band, title)
})
}
func main() {
ExampleScrape()
}
The BSD 3-Clause license, the same as the Go language. Cascadia's license is here.
FAQs
Unknown package
Did you know?
Socket for GitHub automatically highlights issues in each pull request and monitors the health of all your open source dependencies. Discover the contents of your packages and block harmful activity before you install or update your dependencies.
Security News
CISOs are racing to adopt AI for cybersecurity, but hurdles in budgets and governance may leave some falling behind in the fight against cyber threats.
Research
Security News
Socket researchers uncovered a backdoored typosquat of BoltDB in the Go ecosystem, exploiting Go Module Proxy caching to persist undetected for years.
Security News
Company News
Socket is joining TC54 to help develop standards for software supply chain security, contributing to the evolution of SBOMs, CycloneDX, and Package URL specifications.