sensible-api
JavaScript SDK for Sensible, the developer-first platform for extracting structured data from documents so that you can build document-automation features into your SaaS products.
Welcome! Sensible is a developer-first platform for extracting structured data from documents, for example, business forms in PDF format. Use Sensible to build document-automation features into your SaaS products. Sensible is highly configurable: you can get simple data in minutes by leveraging GPT-4 and other large language models (LLMs), or you can tackle complex and idiosyncratic document formatting with Sensible's powerful layout-based document primitives.
This open-source Sensible SDK offers convenient access to the Sensible API. Use this SDK to extract data from documents and to classify documents by type.
In an environment with Node installed, open a command prompt and enter the following commands to create a test project:
mkdir sensible-test
cd sensible-test
touch index.mjs
Then install the SDK:
npm install sensible-api
Get an account at sensible.so if you don't have one already.
To initialize the SDK, paste the following code into your index.mjs file and replace *YOUR_API_KEY* with your API key:
import { SensibleSDK } from "sensible-api";
// if you paste in your key, like `SensibleSDK("1ac34b14")`, then secure it in production
const sensible = new SensibleSDK(YOUR_API_KEY);
Note: Secure your API key in production, for example as a GitHub secret.
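One way to keep the key out of source code is to read it from an environment variable, as the longer example later in this README does with `process.env.SENSIBLE_API_KEY`. The following is a minimal sketch of that idea. The helper name `readApiKey` is hypothetical and is not part of the SDK.

```javascript
// Hypothetical helper (not part of the SDK): read the API key from the
// environment and fail early with a clear message if it is missing.
function readApiKey(env = process.env) {
  const key = env.SENSIBLE_API_KEY;
  if (typeof key !== "string" || key.trim() === "") {
    throw new Error("Set the SENSIBLE_API_KEY environment variable");
  }
  return key.trim();
}

// Usage, assuming the SDK is installed and imported:
// const sensible = new SensibleSDK(readApiKey());
```

Failing at startup makes a missing key easier to diagnose than an authentication error on the first request.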
To extract data from a sample document at a URL, paste the following code into your index.mjs file:
import { SensibleSDK } from "sensible-api";
// if you paste in your key, like `SensibleSDK("1ac34b14")` then secure it in production
const sensible = new SensibleSDK(YOUR_API_KEY);
const request = await sensible.extract({
url: "https://github.com/sensible-hq/sensible-docs/raw/main/readme-sync/assets/v0/pdfs/contract.pdf",
documentType: "sensible_instruct_basics",
environment: "development"
});
const results = await sensible.waitFor(request); // polls every 5 seconds. Optional if you configure a webhook
console.log(results);
Replace *YOUR_API_KEY* with your API key. Then, in a command prompt in the same directory as your index.mjs file, run the code with the following command:
node index.mjs
The code extracts data from an example document (contract.pdf) using an example document type (sensible_instruct_basics) and an example extraction configuration.
You should see the following extracted document text in the parsed_document object in the logged response:
{
"purchase_price": {
"source": "$400,000",
"value": 400000,
"unit": "$",
"type": "currency"
},
"street_address": {
"value": "1234 ABC COURT City of SALT LAKE CITY County of Salt Lake -\nState of Utah, Zip 84108",
"type": "address"
}
}
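Each field in the output above is an object with a `value` and a `type`. If you only need the values, a small helper can flatten the object. This is a sketch based only on the sample output shown here. The name `fieldValues` is hypothetical, and fields in other documents may have shapes that this sketch maps to `null` (for example, empty fields or arrays).

```javascript
// Hypothetical helper: map each field name to its `value`.
// Fields that are null, are arrays, or lack a `value` are mapped to null.
function fieldValues(parsedDocument) {
  const out = {};
  for (const [name, field] of Object.entries(parsedDocument)) {
    const hasValue =
      field !== null &&
      typeof field === "object" &&
      !Array.isArray(field) &&
      "value" in field;
    out[name] = hasValue ? field.value : null;
  }
  return out;
}

// The sample output from above
const sample = {
  purchase_price: { source: "$400,000", value: 400000, unit: "$", type: "currency" },
  street_address: {
    value: "1234 ABC COURT City of SALT LAKE CITY County of Salt Lake -\nState of Utah, Zip 84108",
    type: "address",
  },
};

console.log(fieldValues(sample).purchase_price); // 400000
```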
Navigate to the example in the SenseML editor to see how the extraction you just ran works in the Sensible app. You can add more fields to the left pane to extract more data.
You can use this SDK to extract data from a document, as specified by the extraction configurations and document types defined in your Sensible account.
See the following steps for an overview of the SDK's workflow for document data extraction. Every method returns a chainable promise:
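1. Instantiate an SDK object (new SensibleSDK()).
2. Request a document extraction (sensible.extract()). Use the following required parameters:
   - Specify the document to extract data from using the url, path, or file parameter.
   - Specify the document type or types using the documentType or documentTypes parameter.
3. Poll for the result (sensible.waitFor()), or use a webhook.
4. Optionally, convert the extractions to an Excel file (generateExcel()).

You can configure options for document data extraction: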
const request = await sensible.extract({
  path: "./1040_john_doe.pdf",
  documentType: "tax_forms",
  configurationName: "1040_2021",
  environment: "development",
  documentName: "1040_john_doe.pdf",
  webhook: {
    url: "YOUR_WEBHOOK_URL",
    payload: "additional info, for example, a UUID for verification",
  },
});
See the following table for information about configuration options:
key | value | description |
---|---|---|
path | string | The path to the document you want to extract from. For more information about supported file size and types, see Supported file types. |
file | string | The non-encoded bytes of the document you want to extract from. |
url | string | The URL of the document you want to extract from. The URL must: (1) respond to a GET request with the bytes of the document you want to extract data from, and (2) be either publicly accessible or presigned with a security token as part of the URL path. To check if the URL meets these criteria, open the URL with a web browser. The browser must either render the document as a full-page view with no other data, or download the document, without prompting for authentication. |
documentType | string | Type of document to extract from. Create your custom type in the Sensible app (for example, rate_confirmation, certificate_of_insurance, or home_inspection_report), or use Sensible's library of out-of-the-box supported document types. |
documentTypes | array | Types of documents to extract from. Use this parameter to extract from multiple documents that are packaged into one file (a "portfolio"). This parameter specifies the document types contained in the portfolio. Sensible then segments the portfolio into documents using the specified document types (for example, 1099, w2, and bank_statement) and then runs extractions for each document. For more information, see Multi-doc extraction. |
configurationName | string | Sensible uses the specified config to extract data from the document instead of automatically choosing the configuration. If unspecified, Sensible chooses the best-scoring extraction from the configs in the document type. Not applicable for portfolios. |
documentName | string | If you specify the file name of the document using this parameter, then Sensible returns the file name in the extraction response and populates the file name in the Sensible app's list of recent extractions. |
environment | "production" or "development". Default: "production" | If you specify development, Sensible extracts preferentially using config versions published to the development environment in the Sensible app. The extraction runs all configs in the doc type before picking the best fit. For each config, Sensible falls back to the production version if no development version of the config exists. |
webhook | object | Returns extraction results to the specified webhook URL as soon as they're complete, so you don't have to poll for results status. Sensible also calls this webhook on error. The webhook object has the following parameters: url: string. Webhook destination. Sensible posts to this URL when the extraction is complete. payload: string, number, boolean, object, or array. Information additional to the API response, for example a UUID for verification. |
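The rules in the table above can be checked before a request is sent. The following sketch assumes that exactly one of `url`, `path`, or `file` and exactly one of `documentType` or `documentTypes` should be set, and it applies the table's note that `configurationName` is not applicable for portfolios. The function `checkExtractOptions` is hypothetical and is not part of the SDK, which may perform its own validation.

```javascript
// Hypothetical pre-flight check for the options passed to sensible.extract().
// Returns a list of problems; an empty list means no problem was found.
function checkExtractOptions(options) {
  const errors = [];
  const countSet = (keys) => keys.filter((key) => options[key] !== undefined).length;

  // Assumption: the document source parameters are mutually exclusive.
  if (countSet(["url", "path", "file"]) !== 1) {
    errors.push("Specify exactly one of url, path, or file");
  }
  // Assumption: the document type parameters are mutually exclusive.
  if (countSet(["documentType", "documentTypes"]) !== 1) {
    errors.push("Specify exactly one of documentType or documentTypes");
  }
  if (options.documentTypes !== undefined && !Array.isArray(options.documentTypes)) {
    errors.push("documentTypes must be an array");
  }
  if (options.documentTypes !== undefined && options.configurationName !== undefined) {
    errors.push("configurationName is not applicable for portfolios");
  }
  if (
    options.environment !== undefined &&
    !["production", "development"].includes(options.environment)
  ) {
    errors.push('environment must be "production" or "development"');
  }
  return errors;
}

console.log(checkExtractOptions({ path: "./1040_john_doe.pdf", documentType: "tax_forms" })); // []
```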
Get extraction results by using a webhook or calling the Wait For method.
For the extraction results schema, see Extract data from a document and expand the 200 responses in the middle pane and the right pane to see the model and an example, respectively.
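If you use a webhook, the `payload` option gives you a way to match an incoming call to a request you made. The following sketch tracks the tokens you sent and accepts each one once. It rests on an assumption that is not confirmed by this README: that the JSON body Sensible posts to your webhook echoes your payload in a top-level `payload` property. Check the extraction results schema before relying on it. The name `createWebhookTracker` is hypothetical.

```javascript
// Hypothetical tracker for extraction requests that are waiting on a webhook call.
// Assumption: the webhook body is a JSON object whose top-level `payload`
// property echoes the payload sent with the extraction request.
function createWebhookTracker() {
  const pending = new Set();
  return {
    // Call this with the token you pass as webhook.payload.
    expect(token) {
      pending.add(token);
    },
    // Returns true once for each expected token, and false otherwise.
    accept(body) {
      const token = body !== null && typeof body === "object" ? body.payload : undefined;
      return pending.delete(token);
    },
  };
}

const tracker = createWebhookTracker();
tracker.expect("3f1c-example-token");
console.log(tracker.accept({ payload: "3f1c-example-token" })); // true
console.log(tracker.accept({ payload: "3f1c-example-token" })); // false
```

Accepting each token only once means a replayed or duplicated webhook call is ignored.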
See the following code for an example of how to use the SDK for document extraction in your app.
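The example:
1. Filters a directory to find the PDF files.
2. Extracts data from the PDF files using the bank_statements document type.
3. Logs the extraction requests and their results.
4. Converts the extractions to an Excel file and writes it to the same directory.
import { promises as fs } from "fs";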
import { SensibleSDK } from "sensible-api";
import got from "got";
const apiKey = process.env.SENSIBLE_API_KEY;
const sensible = new SensibleSDK(apiKey);
const dir = ABSOLUTE_PATH_TO_DOCUMENTS_DIR;
const files = (await fs.readdir(dir)).filter((file) => file.match(/\.pdf$/));
const extractions = await Promise.all(
files.map(async (filename) => {
const path = `${dir}/${filename}`;
return sensible.extract({
path,
documentType: "bank_statements",
});
})
);
const results = await Promise.all(
extractions.map((extraction) => sensible.waitFor(extraction))
);
console.log(extractions);
console.log(results);
const excel = await sensible.generateExcel(extractions);
console.log("Excel download URL:");
console.log(excel);
const excelFile = await got(excel.url);
await fs.writeFile(`${dir}/output.xlsx`, excelFile.rawBody);
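Note that `Promise.all` rejects as soon as any one promise rejects, so in the example above a single failed extraction discards the results for every file. If that matters for your use case, `Promise.allSettled` collects successes and failures separately. The following is a sketch, not part of the SDK. The names `runAll` and `task` are hypothetical; in practice, `task` might wrap the `sensible.extract()` and `sensible.waitFor()` calls shown above.

```javascript
// Hypothetical helper: run an async task for each item and separate the
// successes from the failures instead of failing on the first rejection.
async function runAll(items, task) {
  // The async wrapper turns a synchronous throw in `task` into a rejection.
  const settled = await Promise.allSettled(items.map(async (item) => task(item)));
  const succeeded = [];
  const failed = [];
  settled.forEach((outcome, index) => {
    if (outcome.status === "fulfilled") {
      succeeded.push({ item: items[index], result: outcome.value });
    } else {
      failed.push({ item: items[index], error: outcome.reason });
    }
  });
  return { succeeded, failed };
}

// Usage, assuming `sensible`, `dir`, and `files` from the example above:
// const { succeeded, failed } = await runAll(files, async (filename) => {
//   const request = await sensible.extract({
//     path: `${dir}/${filename}`,
//     documentType: "bank_statements",
//   });
//   return sensible.waitFor(request);
// });
```

Because each entry keeps its `item`, you can report which files failed and retry only those.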
You can use this SDK to classify a document by type, as specified by the document types defined in your Sensible account. For more information, see Classifying documents by type.
See the following steps for an overview of the SDK's workflow for document classification. Every method returns a chainable promise:
1. Instantiate an SDK object (new SensibleSDK()).
2. Request a document classification (sensible.classify()). Specify the document to classify using the path or file parameter.
3. Poll for the result (sensible.waitFor()).
4. Consume the data.
You can configure options for document classification:
import { SensibleSDK } from "sensible-api";
// if you paste in your key, like `SensibleSDK("1ac34b14")` then secure it in production
const sensible = new SensibleSDK(YOUR_API_KEY);
const request = await sensible.classify({
path:"./boa_sample.pdf"
});
const results = await sensible.waitFor(request);
console.log(results);
See the following table for information about configuration options:
key | value | description |
---|---|---|
path | string | The path to the document you want to classify. For information about supported file size and types, see Supported file types. |
file | string | The non-encoded bytes of the document you want to classify. |
Get results from this method by calling the Wait For method. For the classification results schema, see Classify document by type (sync) and expand the 200 responses in the middle pane and the right pane to see the model and an example, respectively.
FAQs
We found that sensible-api demonstrated a healthy version release cadence and project activity because the last version was released less than a year ago. It has 1 open source maintainer collaborating on the project.