
var Xray = require('x-ray')
var x = Xray()

x('https://blog.ycombinator.com/', '.post', [
  {
    title: 'h1 a',
    link: '.article-title@href'
  }
])
  .paginate('.nav-previous a@href')
  .limit(3)
  .write('results.json')
npm install x-ray
Flexible schema: Supports strings, arrays, arrays of objects, and nested object structures. The schema is not tied to the structure of the page you're scraping, allowing you to pull the data in the structure of your choosing.
Composable: The API is entirely composable, giving you great flexibility in how you scrape each page.
Pagination support: Paginate through websites, scraping each page. X-ray also supports a request delay and a pagination limit. Scraped pages can be streamed to a file, so if there's an error on one page, you won't lose what you've already scraped.
Crawler support: Start on one page and move to the next easily. The flow is predictable, following a breadth-first crawl through each of the pages.
Responsible: X-ray has support for concurrency, throttles, delays, timeouts and limits to help you scrape any page responsibly.
Pluggable drivers: Swap in different scrapers depending on your needs. Currently supports HTTP and PhantomJS drivers. In the future, I'd like to see a Tor driver for requesting pages through the Tor network.
Scrape the url for the following selector, returning an object in the callback fn.
The selector takes an enhanced jQuery-like string that is also able to select on attributes. The syntax for selecting on attributes is selector@attribute. If you do not supply an attribute, the default is selecting the innerText.
Here are a few examples:
// Scrape a single tag
xray('http://google.com', 'title')(function(err, title) {
  console.log(title) // Google
})

// Scrape a single class
xray('http://reddit.com', '.content')(fn)

// Scrape an attribute
xray('http://techcrunch.com', 'img.logo@src')(fn)

// Scrape innerHTML
xray('http://news.ycombinator.com', 'body@html')(fn)
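To make the selector@attribute convention concrete, here is a minimal sketch of how such a string could be split into its two parts. parseSelector is a hypothetical helper written for illustration only. It is not x-ray's internal parser, and it ignores filters and edge cases such as an @ inside an attribute selector.

```javascript
// Hypothetical helper, not part of x-ray: splits "selector@attribute"
// into its two parts. When no attribute is given it falls back to
// "text", mirroring the innerText default described above.
function parseSelector(input) {
  var at = input.lastIndexOf('@')
  if (at === -1) {
    return { selector: input.trim(), attribute: 'text' }
  }
  return {
    selector: input.slice(0, at).trim(),
    attribute: input.slice(at + 1).trim()
  }
}

console.log(parseSelector('img.logo@src'))
console.log(parseSelector('title'))
```

The first call separates the CSS selector (img.logo) from the attribute to read (src); the second has no @, so the sketch reports the default text lookup.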
You can also supply a scope to each selector. In jQuery, this would look something like this: $(scope).find(selector).
Instead of a url, you can also supply raw HTML and all the same semantics apply.
var html = '<body><h2>Pear</h2></body>'
x(html, 'body', 'h2')(function(err, header) {
  header // => Pear
})
Specify a driver to make requests through. Available drivers include:
Returns Readable Stream of the data. This makes it easy to build APIs around x-ray. Here's an example with Express:
var app = require('express')()
var x = require('x-ray')()

app.get('/', function(req, res) {
  var stream = x('http://google.com', 'title').stream()
  stream.pipe(res)
})
Stream the results to a path.
If no path is provided, then the behavior is the same as .stream().
Constructs a Promise object and invokes its then function with a callback cb. Be sure to call then() as the last step of the xray method chain, since the other methods are not promisified.
x('https://dribbble.com', 'li.group', [
  {
    title: '.dribbble-img strong',
    image: '.dribbble-img [data-src]@data-src'
  }
])
  .paginate('.next_page@href')
  .limit(3)
  .then(function(res) {
    console.log(res[0]) // prints first result
  })
  .catch(function(err) {
    console.log(err) // handle error in promise
  })
Select a url from a selector and visit that page.
Limit the amount of pagination to n requests.
Abort pagination if validator function returns true.
The validator function receives two arguments:
result: The scrape result object for the current page.
nextUrl: The URL of the next page to scrape.
Delay the next request between from and to milliseconds.
If only from is specified, delay exactly from milliseconds.
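The delay rule above can be sketched in a few lines. pickDelay and wait are hypothetical helpers, not x-ray's code. The sketch assumes that, when both bounds are given, the wait is drawn uniformly at random from the range, which the description above does not specify.

```javascript
// Hypothetical helper, not x-ray's code: choose how long to wait
// before the next request.
function pickDelay(from, to) {
  // Only `from` given: wait exactly that long.
  if (to === undefined) return from
  // Assumption: pick a whole number of milliseconds uniformly in [from, to].
  return from + Math.floor(Math.random() * (to - from + 1))
}

// Wait for the chosen delay, then resolve with the delay that was used.
function wait(from, to) {
  var ms = pickDelay(from, to)
  return new Promise(function(resolve) {
    setTimeout(function() {
      resolve(ms)
    }, ms)
  })
}

wait(10, 30).then(function(ms) {
  console.log('waited ' + ms + 'ms before the next request')
})
```

A randomized wait spreads requests out less predictably than a fixed interval, which is one common reason to supply both bounds.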
Set the request concurrency to n. Defaults to Infinity.
Throttle the requests to n requests per ms milliseconds.
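As a rough illustration of what "n requests per ms milliseconds" means for a queue of requests, the sketch below computes the earliest start time for each one. startOffsets is a hypothetical helper that assumes a simple fixed-window scheme; x-ray's actual scheduling may differ.

```javascript
// Hypothetical helper, not x-ray's scheduler: returns the earliest start
// offset (in milliseconds) for each of `count` queued requests, releasing
// them in batches of `n`, one batch per `ms`-long window.
function startOffsets(count, n, ms) {
  var offsets = []
  for (var i = 0; i < count; i++) {
    offsets.push(Math.floor(i / n) * ms)
  }
  return offsets
}

console.log(startOffsets(5, 2, 1000)) // [ 0, 0, 1000, 1000, 2000 ]
```

With a limit of 2 requests per 1000 ms, five queued requests start in three batches, one second apart.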
Specify a timeout of ms milliseconds for each request.
X-ray also has support for selecting collections of tags. While x('ul', 'li') will only select the first list item in an unordered list, x('ul', ['li']) will select all of them.
Additionally, X-ray supports "collections of collections" allowing you to smartly select all list items in all lists with a command like this: x(['ul'], ['li']).
X-ray becomes more powerful when you start composing instances together. Here are a few possibilities:
var Xray = require('x-ray')
var x = Xray()

x('http://google.com', {
  main: 'title',
  image: x('#gbar a@href', 'title') // follow link to google images
})(function(err, obj) {
  /*
  {
    main: 'Google',
    image: 'Google Images'
  }
  */
})
var Xray = require('x-ray')
var x = Xray()

x('http://mat.io', {
  title: 'title',
  items: x('.item', [
    {
      title: '.item-content h2',
      description: '.item-content section'
    }
  ])
})(function(err, obj) {
  /*
  {
    title: 'mat.io',
    items: [
      {
        title: 'The 100 Best Children\'s Books of All Time',
        description: 'Relive your childhood with TIME\'s list...'
      }
    ]
  }
  */
})
Filters can be specified when creating a new Xray instance. To apply filters to a value, append them to the selector using |.
var Xray = require('x-ray')
var x = Xray({
  filters: {
    trim: function(value) {
      return typeof value === 'string' ? value.trim() : value
    },
    reverse: function(value) {
      return typeof value === 'string'
        ? value
            .split('')
            .reverse()
            .join('')
        : value
    },
    slice: function(value, start, end) {
      return typeof value === 'string' ? value.slice(start, end) : value
    }
  }
})

x('http://mat.io', {
  title: 'title | trim | reverse | slice:2,3'
})(function(err, obj) {
  /*
  {
    title: 'oi'
  }
  */
})
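To show how a filter chain is read, here is a small sketch that applies the same kind of filters to a plain string, with no network request. applyFilters is a hypothetical helper, not x-ray's parser. It assumes filters run left to right, that arguments follow a colon and are comma-separated, and that every argument is numeric.

```javascript
// Filters in the same shape as the ones passed to Xray({ filters: ... }).
var filters = {
  trim: function(value) {
    return typeof value === 'string' ? value.trim() : value
  },
  reverse: function(value) {
    return typeof value === 'string'
      ? value.split('').reverse().join('')
      : value
  },
  slice: function(value, start, end) {
    return typeof value === 'string' ? value.slice(start, end) : value
  }
}

// Hypothetical helper, not x-ray's parser: applies "name:arg1,arg2" steps
// separated by "|" to a value, from left to right.
function applyFilters(value, chain) {
  return chain.split('|').reduce(function(current, step) {
    var parts = step.trim().split(':')
    var name = parts[0]
    // Assumption: arguments are comma-separated and numeric.
    var args = parts[1] ? parts[1].split(',').map(Number) : []
    if (typeof filters[name] !== 'function') {
      throw new Error('Unknown filter: ' + name)
    }
    return filters[name].apply(null, [current].concat(args))
  }, value)
}

console.log(applyFilters('  Hello World  ', 'trim | reverse | slice:0,5')) // dlroW
```

Each step receives the output of the previous one: trimming gives 'Hello World', reversing gives 'dlroW olleH', and slice:0,5 keeps the first five characters.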
Support us with a monthly donation and help us continue our activities. [Become a backer]
Become a sponsor and get your logo on our website and on our README on Github with a link to your site. [Become a sponsor]
MIT