![npm](https://img.shields.io/npm/dm/html-extract-data.svg?maxAge=2592000)
Extract data from the DOM using a JSON config
Installation
yarn add html-extract-data
npm i -S html-extract-data
Usage
Basic
import extractFromHTML from 'html-extract-data';
extractFromHTML(
html,
{
query: '.grid-item',
data: {
title: 'h2',
description: { query: 'p', html: true },
}
},
);
{
title: 'title',
description: 'description <b>bold</b>'
}
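The basic example leaves `html` undefined. As a minimal sketch, the markup below is a hypothetical input (not taken from the project's fixtures) that matches the selectors in the config and would plausibly yield the result shown:

```javascript
// Hypothetical markup for the `html` argument; assumed for illustration only.
const html = `
<div class="grid-item">
  <h2>title</h2>
  <p>description <b>bold</b></p>
</div>
`;

// The config from the basic example, unchanged.
const config = {
  query: '.grid-item',      // root element to extract from
  data: {
    title: 'h2',            // text content of the <h2>
    description: { query: 'p', html: true }, // inner HTML of the <p>
  },
};
```

Because `description` sets `html: true`, the `<b>` tag is kept in the extracted value, whereas `title` is read as plain text.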
Advanced
import extractFromHTML from 'html-extract-data';
const data = extractFromHTML(
html,
{
query: '.grid-item',
list: true,
self: {
'category': 'data-category',
'id': { attr: 'data-id', convert: 'number' },
},
data: {
title: 'h2',
description: { query: 'p', html: true },
tags: { query: '.tags > .tag', list: true },
price: { query: '.price', convert: parseFloat },
date: { query: '.date', convert: 'date' },
image: (extract, element) => ({
alt: extract({ query: '.js-image', attr: 'alt' }),
src: extract('.js-image', { attr: 'src' }),
}),
image2: (extract) =>
extract('.js-image', {
data: { src: 'src', alt: 'alt' }
}),
link: {
query: 'a',
data: {
href: 'href',
target: { attr: 'target' },
text: true,
value: { html: true },
},
},
},
},
{
visible: false,
tags: ['select a value']
}
);
Will output:
[{
category: 'js',
id: 1,
title: 'title',
description: 'description <b>bold</b>',
tags: ['select a value', 'a', 'b', 'c'],
price: 123.45,
date: Date(2018-20-08 ... ),
image: {
src: 'foo.jpg',
alt: 'foobar',
},
image2: {
src: "foo.jpg",
alt: "foobar",
},
link: {
href: 'http://www.google.com',
target: '_blank',
text: 'google',
value: '<b>google</b>'
},
visible: false
}]
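Judging from the output above, the third argument supplies default values: `visible` is carried over as-is, and the default `tags` entry appears in front of the extracted tags. The sketch below mimics that observed behaviour under those assumptions. `applyDefaults` is a hypothetical helper, not part of the library's API, and the real merge rules may differ:

```javascript
// Illustrative only: reproduces the merge visible in the output above.
// This is not the library's implementation.
function applyDefaults(extracted, defaults) {
  // Extracted values win over defaults for plain keys.
  const result = { ...defaults, ...extracted };
  for (const [key, value] of Object.entries(defaults)) {
    // Assumption: when both sides are arrays, default entries come first.
    if (Array.isArray(value) && Array.isArray(extracted[key])) {
      result[key] = [...value, ...extracted[key]];
    }
  }
  return result;
}

const item = applyDefaults(
  { title: 'title', tags: ['a', 'b', 'c'] },
  { visible: false, tags: ['select a value'] },
);

console.log(item.tags);    // [ 'select a value', 'a', 'b', 'c' ]
console.log(item.visible); // false
```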
Production
This library uses Joi to validate the structure of the input config, but Joi is quite large.
The validation code is therefore wrapped in process.env.NODE_ENV !== 'production'
checks, so your build process can strip it out of production bundles.
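The guard pattern described above can be sketched as follows. `validateConfig` and `extractWithValidation` are hypothetical names standing in for the library's Joi-based check and its entry point; only the `process.env.NODE_ENV` comparison comes from the text:

```javascript
// Hypothetical stand-in for the Joi-based schema check.
function validateConfig(config) {
  if (typeof config.query !== 'string') {
    throw new TypeError('config.query must be a string');
  }
}

function extractWithValidation(config) {
  // Bundlers typically replace process.env.NODE_ENV with a string literal.
  // In a production build the condition becomes 'production' !== 'production',
  // so a minifier can remove the branch and the code it references as dead code.
  if (process.env.NODE_ENV !== 'production') {
    validateConfig(config);
  }
  return config; // placeholder for the actual extraction step
}
```

For the branch to be removed, the build has to define `process.env.NODE_ENV` at compile time (webpack's production mode does this, for example). Without that replacement the check still works at runtime, but the validation code stays in the bundle.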
Documentation
View the unit tests to see all the possible ways this module can be used.
Building
In order to build html-extract-data, ensure that you have Git
and Node.js installed.
Clone a copy of the repo:
git clone https://github.com/ThaNarie/html-extract-data.git
Change to the html-extract-data directory:
cd html-extract-data
Install dev dependencies:
yarn
Use one of the following main scripts:
yarn build
yarn test
yarn test:dev
yarn lint
Contribute
View CONTRIBUTING.md
Changelog
View CHANGELOG.md
Authors
View AUTHORS.md
LICENSE
MIT © Tha Narie