This package is meant to be used in conjunction with @pdftron/pdfnet-node to support IDP data extraction from Apryse. Follow this guide for more info on usage.
https://docs.apryse.com/documentation/core/guides/intelligent-data-extraction/
For further reading, check out our blog post on the project.
https://apryse.com/blog/introducing-automated-data-extraction-pdf-idp
Supported platform, Node.js, and Electron versions
This package depends on unmanaged add-on binaries, which are not cross-platform. At the moment we support:
- OS: Linux (excluding Alpine), Windows (x64)
- Node.js version: 8 - 22
- Electron version: 6 - 30
Installation will fail if your OS, Node.js or Electron version is not supported.
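Because installation fails on unsupported platforms, it can help to check the environment up front. The following is a minimal sketch that encodes the ranges listed above. The function names are hypothetical and not part of this package. Detecting Alpine through /etc/alpine-release is an assumption of this sketch, as is checking the Electron version instead of the Node.js version when running under Electron.

```javascript
const fs = require('fs');

// Ranges copied from the supported-versions list above.
const SUPPORTED = {
  node: { min: 8, max: 22 },
  electron: { min: 6, max: 30 },
};

function inRange(major, range) {
  return Number.isInteger(major) && major >= range.min && major <= range.max;
}

// Hypothetical helper: returns a list of problems; an empty list means none were found.
function findPlatformProblems(env) {
  const problems = [];
  if (env.platform === 'win32') {
    if (env.arch !== 'x64') problems.push('Windows is supported on x64 only');
  } else if (env.platform === 'linux') {
    if (env.isAlpine) problems.push('Alpine Linux is not supported');
  } else {
    problems.push('Unsupported OS: ' + env.platform);
  }
  if (env.electronMajor !== undefined) {
    // Assumption: under Electron, only the Electron range is checked.
    if (!inRange(env.electronMajor, SUPPORTED.electron)) {
      problems.push('Unsupported Electron version: ' + env.electronMajor);
    }
  } else if (!inRange(env.nodeMajor, SUPPORTED.node)) {
    problems.push('Unsupported Node.js version: ' + env.nodeMajor);
  }
  return problems;
}

// Describe the running process in the shape findPlatformProblems expects.
function currentEnvironment() {
  const electron = process.versions.electron;
  return {
    platform: process.platform,
    arch: process.arch,
    nodeMajor: parseInt(process.versions.node.split('.')[0], 10),
    electronMajor: electron ? parseInt(electron.split('.')[0], 10) : undefined,
    // Assumption: Alpine is recognized by its /etc/alpine-release file.
    isAlpine: fs.existsSync('/etc/alpine-release'),
  };
}

console.log(findPlatformProblems(currentEnvironment()));
```

The check is deliberately conservative and may not match what the installer actually tests, so treat a non-empty result as a hint rather than a verdict.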
Usage
Add the @pdftron/data-extraction package as a dependency in your package.json.
In your @pdftron/pdfnet-node code, add the following line after initialization:
await PDFNet.addResourceSearchPath("./node_modules/@pdftron/data-extraction/lib")
The following example calls addResourceSearchPath before running data extraction:
const fs = require('fs');
const { PDFNet } = require('@pdftron/pdfnet-node');

const licenseKey = "Insert license key here";
const inputFile = "Insert input file location here";

async function main() {
  // Tell PDFNet where to find the data extraction add-on binaries.
  await PDFNet.addResourceSearchPath("./node_modules/@pdftron/data-extraction/lib");

  // Variant 1: write the extracted structure directly to a JSON file.
  console.log('Extract document structure as a JSON file');
  let outputFile = 'out/paragraphs_and_tables.json';
  await PDFNet.DataExtractionModule.extractData(inputFile, outputFile, PDFNet.DataExtractionModule.DataExtractionEngine.e_DocStructure);
  console.log('Result saved in ' + outputFile);

  // Variant 2: receive the JSON as a string and save it ourselves.
  console.log('Extract document structure as a JSON string');
  outputFile = 'out/tagged.json';
  const json = await PDFNet.DataExtractionModule.extractDataAsString(inputFile, PDFNet.DataExtractionModule.DataExtractionEngine.e_DocStructure);
  fs.writeFileSync(outputFile, json);
}

// Run main with the license key, log any error, then shut PDFNet down.
PDFNet.runWithCleanup(main, licenseKey).catch(function (error) {
  console.log('Error: ' + JSON.stringify(error));
}).then(function () { return PDFNet.shutdown(); });
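The example writes its results under out/, and fs.writeFileSync does not create missing directories. A minimal sketch of a helper that creates the parent folder first is shown below. The helper name is hypothetical, the `recursive` option of fs.mkdirSync requires Node.js 10.12 or newer, and the demonstration uses a stand-in string in a temporary folder rather than real extraction output.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: write `text` to `filePath`, creating parent folders if needed.
// Note: the `recursive` option needs Node.js 10.12 or newer.
function writeFileEnsuringDir(filePath, text) {
  fs.mkdirSync(path.dirname(filePath), { recursive: true });
  fs.writeFileSync(filePath, text);
}

// Demonstration in a temporary folder, with '{}' standing in for extracted JSON.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), 'extraction-demo-'));
const demoFile = path.join(demoDir, 'out', 'tagged.json');
writeFileEnsuringDir(demoFile, '{}');
console.log('Wrote ' + demoFile);
```

In the example above, the same helper could stand in for the fs.writeFileSync call. Whether extractData creates the output folder on its own is not covered here, so creating out/ before running the script is the safer choice.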
A larger code sample can be found here
To get started, please see the documentation at https://www.pdftron.com/documentation/nodejs/get-started/integration.
Licensing
Please go to https://docs.apryse.com/documentation/core/info/license/ to obtain a demo or production license.