
Nokogiri (鋸) makes it easy and painless to work with XML and HTML from Ruby. It provides a sensible, easy-to-understand API for reading, writing, modifying, and querying documents. It is fast and standards-compliant by relying on native parsers like libxml2 (C) and xerces (Java).
Some guiding principles Nokogiri tries to follow:
All official documentation is posted at https://nokogiri.org (the source for which is at https://github.com/sparklemotion/nokogiri.org/, and we welcome contributions).
Your first stops for API documentation should be:
If you'd like to talk to a human:
#nokogiri-💎 at https://discord.gg/UyQnKrT
#nokogiri on freenode.

Consider subscribing to Tidelift which provides license assurances and timely security notifications for your open source dependencies, including Nokogiri. Tidelift subscriptions also help the Nokogiri maintainers fund our automated testing which in turn allows us to ship releases, bugfixes, and security updates more often.
Please report vulnerabilities at https://hackerone.com/nokogiri
A full description of our security policy is in SECURITY.md.
Nokogiri follows Semantic Versioning (since 2017 or so).
We bump Major.Minor.Patch versions following this guidance:
Major: (we've never done this)
ROADMAP.md.
Minor:
Patch:
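For applications that depend on Nokogiri, one common way to lean on a Semantic Versioning policy is a pessimistic version constraint. The sketch below uses RubyGems' Gem::Requirement and Gem::Version classes to show which releases such a constraint would accept. The constraint string "~> 1.11" and the version numbers tested are illustrative assumptions, not recommendations from the Nokogiri maintainers.

```ruby
require "rubygems"

# Assumed constraint, for illustration only: accept any 1.x release from
# 1.11 onward, but not a future 2.0 (a Major bump may break the public API).
CONSTRAINT = Gem::Requirement.new("~> 1.11")

# Returns true when the given version string satisfies the constraint.
def acceptable?(version_string, requirement = CONSTRAINT)
  requirement.satisfied_by?(Gem::Version.new(version_string))
end

# The version numbers here are examples chosen to exercise each case.
%w[1.10.10 1.11.0 1.12.3 2.0.0].each do |version|
  puts "#{version}: #{acceptable?(version) ? 'accepted' : 'rejected'}"
end
```

In a Gemfile, the same constraint would be written as gem "nokogiri", "~> 1.11".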
Requirements:
"Native gems" contain pre-compiled libraries for a specific machine architecture. On supported platforms, this removes the need for compiling the C extension and the packaged libraries, or for system dependencies to exist. This results in much faster installation and more reliable installation, which as you probably know are the biggest headaches for Nokogiri users.
As of v1.11.0, Nokogiri ships pre-compiled, "native" gems for the following platforms:
x86-linux and x86_64-linux (req: glibc >= 2.17), including musl platforms like Alpine
x86_64-darwin and arm64-darwin
x86-mingw32 and x64-mingw32
To determine whether your system supports one of these gems, look at the output of bundle platform or ruby -e 'puts Gem::Platform.local.to_s'.
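The platform check can also be scripted. The following is a minimal sketch that compares a platform string against the list of native gem platforms given above, using RubyGems' Gem::Platform matching. The constant NATIVE_PLATFORMS and the helper native_gem_platform? are hypothetical names introduced here. The check is only an approximation: it does not verify the glibc version requirement, and RubyGems' own gem selection logic may differ in detail.

```ruby
require "rubygems"

# Platforms copied from the list above (native gems as of v1.11.0).
NATIVE_PLATFORMS = %w[
  x86-linux x86_64-linux
  x86_64-darwin arm64-darwin
  x86-mingw32 x64-mingw32
].freeze

# Hypothetical helper: true when the platform string matches one of the
# listed native gem platforms according to Gem::Platform#===.
def native_gem_platform?(platform_string, supported = NATIVE_PLATFORMS)
  candidate = Gem::Platform.new(platform_string)
  supported.any? { |name| Gem::Platform.new(name) === candidate }
end

local = Gem::Platform.local.to_s
if native_gem_platform?(local)
  puts "#{local}: a native gem is likely available"
else
  puts "#{local}: expect to compile from source"
end
```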
If you're on a supported platform, either gem install or bundle install should install a native gem without any additional action on your part. This installation should only take a few seconds, and your output should look something like:
$ gem install nokogiri
Fetching nokogiri-1.11.0-x86_64-linux.gem
Successfully installed nokogiri-1.11.0-x86_64-linux
1 gem installed
Because Nokogiri is a C extension, it requires that you have a C compiler toolchain, Ruby development header files, and some system dependencies installed.
The following may work for you if you have an appropriately-configured system:
gem install nokogiri
If you have any issues, please visit Installing Nokogiri for more complete instructions and troubleshooting.
Nokogiri is a large library, and so it's challenging to briefly summarize it. We've tried to provide long, real-world examples at Tutorials.
Here is example usage for parsing and querying a document:
#! /usr/bin/env ruby
require 'nokogiri'
require 'open-uri'

# Fetch and parse HTML document
doc = Nokogiri::HTML(URI.open('https://nokogiri.org/tutorials/installing_nokogiri.html'))

# Search for nodes by css
doc.css('nav ul.menu li a', 'article h2').each do |link|
  puts link.content
end

# Search for nodes by xpath
doc.xpath('//nav//ul//li/a', '//article//h2').each do |link|
  puts link.content
end

# Or mix and match
doc.search('nav ul.menu li a', '//article//h2').each do |link|
  puts link.content
end
Strings are always stored as UTF-8 internally. Methods that return text values will always return UTF-8 encoded strings. Methods that return a string containing markup (like to_xml, to_html and inner_html) will return a string encoded like the source document.
WARNING
Some documents declare one encoding, but actually use a different one. In these cases, which encoding should the parser choose?
Data is just a stream of bytes. Humans add meaning to that stream. Any particular set of bytes could be valid characters in multiple encodings, so detecting encoding with 100% accuracy is not possible. libxml2 does its best, but it can't be right all the time.
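To make the ambiguity concrete, here is a small sketch in plain Ruby (no Nokogiri involved) that labels the same two bytes with several encodings and asks whether each labeling is valid. The byte sequence, the encoding list, and the helper name interpretations are choices made for this illustration.

```ruby
# Two raw bytes with no encoding label attached (binary string).
RAW_BYTES = "\xC3\xA9".b.freeze

# Hypothetical helper: for each encoding name, report whether the bytes
# form valid text in that encoding and how many characters they make up.
def interpretations(bytes, encodings = %w[UTF-8 ISO-8859-1 EUC-JP])
  encodings.to_h do |name|
    labeled = bytes.dup.force_encoding(name)
    valid = labeled.valid_encoding?
    [name, { valid: valid, chars: valid ? labeled.length : nil }]
  end
end

interpretations(RAW_BYTES).each do |name, info|
  puts "#{name}: valid=#{info[:valid]} characters=#{info[:chars].inspect}"
end
```

The bytes are acceptable under more than one labeling, and the number of characters they represent depends on which one is chosen, so nothing in the bytes themselves identifies the intended encoding.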
If you want Nokogiri to handle the document encoding properly, your best bet is to explicitly set the encoding. Here is an example of explicitly setting the encoding to EUC-JP on the parser:
doc = Nokogiri.XML('<foo><bar /></foo>', nil, 'EUC-JP')
As noted above, two guiding principles of the software are:
Notably, despite all parsers being standards-compliant, there are behavioral inconsistencies between the parsers used in the CRuby and JRuby implementations, and Nokogiri does not and should not attempt to remove these inconsistencies. Instead, we surface these differences in the test suite when they are important/semantic; or we intentionally write tests to depend only on the important/semantic bits (omitting whitespace from regex matchers on results, for example).
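As a rough illustration of a test that depends only on the semantic bits, the sketch below matches serialized markup with a regular expression that tolerates whitespace between tags. The two output strings are invented for this example and are not actual output from either parser; the constant and method names are likewise hypothetical.

```ruby
# Invented serializations of the same document, differing only in whitespace.
OUTPUT_WITH_WHITESPACE = "<root>\n  <child>text</child>\n</root>\n"
OUTPUT_COMPACT = "<root><child>text</child></root>"

# Matches the element structure and text while allowing any whitespace
# between tags, so formatting differences do not fail the comparison.
SEMANTIC_PATTERN = %r{<root>\s*<child>text</child>\s*</root>}

# Returns true when the markup has the expected structure and content.
def semantically_matches?(markup, pattern = SEMANTIC_PATTERN)
  pattern.match?(markup)
end

[OUTPUT_WITH_WHITESPACE, OUTPUT_COMPACT].each do |markup|
  puts "#{markup.inspect} matches: #{semantically_matches?(markup)}"
end
```

A strict string comparison of the two outputs would fail, while the whitespace-tolerant pattern accepts both and still rejects markup whose content differs.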
The Ruby (a.k.a., CRuby, MRI, YARV) implementation is a C extension that depends on libxml2 and libxslt (which in turn depend on zlib and possibly libiconv).
These dependencies are met by default by Nokogiri's packaged versions of the libxml2 and libxslt source code, but a configuration option --use-system-libraries is provided to allow specification of alternative library locations. See Installing Nokogiri for full documentation.
We provide native gems by pre-compiling libxml2 and libxslt (and potentially zlib and libiconv) and packaging them into the gem file. In this case, no compilation is necessary at installation time, which leads to faster and more reliable installation.
See LICENSE-DEPENDENCIES.md for more information on which dependencies are provided in which native and source gems.
The Java (a.k.a. JRuby) implementation is a Java extension that depends primarily on Xerces and NekoHTML for parsing, with additional dependencies on isorelax, nekodtd, jing, serializer, xalan-j, and xml-apis.
These dependencies are provided by pre-compiled jar files packaged in the java platform gem.
See LICENSE-DEPENDENCIES.md for more information on which dependencies are provided in which native and source gems.
bundle install
bundle exec rake compile test
We've adopted the Contributor Covenant code of conduct, which you can read in full in CODE_OF_CONDUCT.md.
This project is licensed under the terms of the MIT license.
See this license at LICENSE.md.
Some additional libraries may be distributed with your version of Nokogiri. Please see LICENSE-DEPENDENCIES.md for a discussion of the variations as well as the licenses thereof.