github.com/PuerkitoBio/goquery
goquery brings a syntax and a set of features similar to jQuery to the Go language. It is based on Go's net/html package and the CSS Selector library cascadia. Since the net/html parser returns nodes, and not a full-featured DOM tree, jQuery's stateful manipulation functions (like height(), css(), detach()) have been left off.
Also, because the net/html parser requires UTF-8 encoding, so does goquery: it is the caller's responsibility to ensure that the source document provides UTF-8 encoded HTML. See the wiki for various options to do this.
Syntax-wise, it is as close as possible to jQuery, with the same function names when possible, and that warm and fuzzy chainable interface. jQuery being the ultra-popular library that it is, I felt that a similar HTML-manipulating library was better off following its API than starting anew (in the same spirit as Go's fmt package), even though some of its methods are less than intuitive (looking at you, index()...).
Required Go version:
- From v1.10.0 of goquery, Go 1.23+ is required due to the use of function-based iterators.
- From v1.9.0 of goquery, Go 1.18+ is required due to the use of generics.
- […] net/html dependency.
Ongoing goquery development is tested on the latest 2 versions of Go.
$ go get github.com/PuerkitoBio/goquery
(optional) To run unit tests:
$ cd $GOPATH/src/github.com/PuerkitoBio/goquery
$ go test
(optional) To run benchmarks (warning: it runs for a few minutes):
$ cd $GOPATH/src/github.com/PuerkitoBio/goquery
$ go test -bench=".*"
Note that goquery's API is now stable, and will not break.
Changelog:
- […] go.mod dependencies.
- […] EachIter which provides an iterator that can be used in for..range loops on the *Selection object. goquery now requires Go version 1.23+ (thanks @amikai).
- […] go.mod dependencies.
- […] go.mod dependencies.
- […] Map function and Selection.Map method, better document the cascadia differences (thanks @jwilsson).
- […] Map function, goquery now requires Go version 1.18+ (thanks @Fesaa).
- […] go.mod dependencies, update CI workflow.
- […] Render function to render a Selection to an io.Writer (thanks @anthonygedeon).
- […] Single and SingleMatcher functions to optimize first-match selection (thanks @gdollardollar).
- […] {Prepend,Append,Set}Html on a Selection that contains non-Element nodes.
- […] AfterHtml, AppendHtml, etc.). Thanks to @thiemok and @davidjwilkins for their work on this.
- […] NewDocumentFromReader examples.
- […] NewDocument(url) and NewDocumentFromResponse(response).
- […] ToEnd constant to Slice until the end of the selection (thanks to @davidjwilkins for raising the issue).
- […] AddBack* and deprecate AndSelf (thanks to @davidjwilkins).
- […] SetHtml and SetText (thanks to @glebtv).
- […] Selection.Text (thanks to @radovskyb).
- […] Matcher implementation that never matches any node (instead of a panic). So for example, doc.Find("~") returns an empty *Selection object.
- […] NodeName utility function similar to the DOM's nodeName property. It returns the tag name of the first element in a selection, and other relevant values of non-element nodes (see doc for details). Add OuterHtml utility function similar to the DOM's outerHTML property (named OuterHtml in small caps for consistency with the existing Html method on the Selection).
- […] AttrOr helper method to return the attribute's value or a default value if absent. Thanks to piotrkowalczuk.
- […] *Matcher functions, that receive compiled cascadia selectors instead of selector strings, thus avoiding potential panics thrown by goquery via cascadia.MustCompile calls. This results in better performance (selectors can be compiled once and reused) and more idiomatic error handling (you can handle cascadia's compilation errors, instead of recovering from panics, which had been bugging me for a long time). Note that the actual type expected is a Matcher interface, that cascadia.Selector implements. Other matcher implementations could be used.
- […] html.Nodes.
- […] NewDocumentFromReader() (thanks jweir) which allows creating a goquery document from an io.Reader.
- […] NewDocumentFromResponse() (thanks assassingj) which allows creating a goquery document from an http response.
- […] EachWithBreak() which allows breaking out of an Each() loop by returning false. This function was added instead of changing the existing Each() to avoid breaking compatibility.
- […] Document.Root is removed, Document is now a Selection itself (a selection of one, the root element, just like Document.Root was before). Add jQuery's Closest() method.
goquery exposes two structs, Document and Selection, and the Matcher interface. Unlike jQuery, which is loaded as part of a DOM document, and thus acts on its containing document, goquery doesn't know which HTML document to act upon. So it needs to be told, and that's what the Document type is for. It holds the root document node as the initial Selection value to manipulate.
jQuery often has many variants for the same function (no argument, a selector string argument, a jQuery object argument, a DOM element argument, ...). Instead of exposing the same features in goquery as a single method with variadic empty interface arguments, statically-typed signatures are used following this naming convention:
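- When the jQuery equivalent can be called with no argument, the no-argument version keeps the jQuery name (e.g.: Prev()), and the version with a selector string argument is called XxxFiltered() (e.g.: PrevFiltered())
- When the jQuery equivalent requires one argument, the selector string version keeps the jQuery name (e.g.: Is())
- Signatures that accept a jQuery object as argument are defined as XxxSelection() and take a *Selection object as argument (e.g.: FilterSelection())
- Signatures that accept a DOM element as argument are defined as XxxNodes() and take a variadic argument of type *html.Node (e.g.: FilterNodes())
- Signatures that accept a function as argument are defined as XxxFunction() and take a function as argument (e.g.: FilterFunction())
- Methods that can be called with a selector string have a corresponding version that takes a Matcher interface and are defined as XxxMatcher() (e.g.: IsMatcher())
Utility functions that are not in jQuery but are useful in Go are implemented as functions (that take a *Selection as parameter), to avoid a potential naming clash on the *Selection's methods (reserved for jQuery-equivalent behaviour).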
The complete package reference documentation can be found here.
Please note that Cascadia's selectors do not necessarily match all supported selectors of jQuery (Sizzle). See the cascadia project for details. Also, the selectors work more like the DOM's querySelectorAll than jQuery's matchers: they have no concept of contextual matching (for some concrete examples of what that means, see this ticket). In practice, it doesn't matter very often but it's something worth mentioning. Invalid selector strings compile to a Matcher that fails to match any node. Behaviour of the various functions that take a selector string as argument follows from that fact, e.g. (where ~ is an invalid selector string):
- Find("~") returns an empty selection because the selector string doesn't match anything.
- Add("~") returns a new selection that holds the same nodes as the original selection, because it didn't add any node (selector string didn't match anything).
- ParentsFiltered("~") returns an empty selection because the selector string doesn't match anything.
- ParentsUntil("~") returns all parents of the selection because the selector string didn't match any element to stop before the top element.
See some tips and tricks in the wiki.
Adapted from example_test.go:
package main

import (
	"fmt"
	"log"
	"net/http"

	"github.com/PuerkitoBio/goquery"
)

func ExampleScrape() {
	// Request the HTML page.
	res, err := http.Get("http://metalsucks.net")
	if err != nil {
		log.Fatal(err)
	}
	defer res.Body.Close()
	if res.StatusCode != 200 {
		log.Fatalf("status code error: %d %s", res.StatusCode, res.Status)
	}

	// Load the HTML document
	doc, err := goquery.NewDocumentFromReader(res.Body)
	if err != nil {
		log.Fatal(err)
	}

	// Find the review items
	doc.Find(".left-content article .post-title").Each(func(i int, s *goquery.Selection) {
		// For each item found, get the title
		title := s.Find("a").Text()
		fmt.Printf("Review %d: %s\n", i, title)
	})
}

func main() {
	ExampleScrape()
}
There are a number of ways you can support the project:
The BSD 3-Clause license, the same as the Go language. Cascadia's license is here.