The official TypeScript/JavaScript SDK for the PDF Vector API. It converts PDF and Word documents to clean, structured markdown with optional AI enhancement, searches multiple academic databases through a unified API, and fetches specific publications by DOI, PubMed ID, ArXiv ID, and other identifiers.

Installation

npm install pdfvector
# or
yarn add pdfvector
# or
pnpm add pdfvector
# or
bun add pdfvector

Quick Start

import { PDFVector } from "pdfvector";

const client = new PDFVector({ apiKey: "pdfvector_api_key_here" });

// Parse from document URL or data
const parseResult = await client.parse({
  url: "https://example.com/document.pdf",
  useLLM: "auto",
});

console.log(parseResult.markdown); // Clean markdown output
console.log(
  `Pages: ${parseResult.pageCount}, Credits: ${parseResult.creditCount}`,
);

Authentication

Get your API key from the PDF Vector dashboard. The SDK requires a valid API key for all operations.

const client = new PDFVector({ apiKey: "pdfvector_api_key_here" });

Usage Examples

Parse from URL

import { PDFVector } from "pdfvector";

const client = new PDFVector({ apiKey: "pdfvector_api_key_here" });

const result = await client.parse({
  url: "https://arxiv.org/pdf/2301.00001.pdf",
  useLLM: "auto",
});

console.log(result.markdown);

Parse from data

import { readFile } from "fs/promises";
import { PDFVector } from "pdfvector";

const client = new PDFVector({ apiKey: "pdfvector_api_key_here" });

const result = await client.parse({
  data: await readFile("document.pdf"),
  contentType: "application/pdf",
  useLLM: "auto",
});

console.log(result.markdown);

Search academic publications

import { PDFVector } from "pdfvector";

const client = new PDFVector({ apiKey: "pdfvector_api_key_here" });

const searchResponse = await client.academicSearch({
  query: "quantum computing",
  providers: ["semantic-scholar", "arxiv", "pubmed"], // Search across multiple academic databases
  limit: 20,
  yearFrom: 2021,
  yearTo: 2024,
});

searchResponse.results.forEach((publication) => {
  console.log(`Title: ${publication.title}`);
  console.log(`Authors: ${publication.authors?.map((a) => a.name).join(", ")}`);
  console.log(`Year: ${publication.year}`);
  console.log(`Abstract: ${publication.abstract}`);
  console.log("---");
});

Search with Provider-Specific Data

const searchResponse = await client.academicSearch({
  query: "CRISPR gene editing",
  providers: ["semantic-scholar"],
  fields: ["title", "authors", "year", "providerData"], // providerData holds provider-specific metadata
});

searchResponse.results.forEach((pub) => {
  if (pub.provider === "semantic-scholar" && pub.providerData) {
    const data = pub.providerData;
    console.log(`Influential Citations: ${data.influentialCitationCount}`);
    console.log(`Fields of Study: ${data.fieldsOfStudy?.join(", ")}`);
  }
});

Fetch Academic Publications by ID

const client = new PDFVector({ apiKey: "pdfvector_api_key_here" });

const response = await client.academicFetch({
  ids: [
    "10.1038/nature12373", // DOI
    "12345678", // PubMed ID
    "2301.00001", // ArXiv ID
    "arXiv:2507.16298v1", // ArXiv with prefix
    "ED123456", // ERIC ID
    "0f40b1f08821e22e859c6050916cec3667778613", // Semantic Scholar ID
  ],
  fields: ["title", "authors", "year", "abstract", "doi"], // Optional: specify fields
});

// Handle successful results
response.results.forEach((pub) => {
  console.log(`Title: ${pub.title}`);
  console.log(`Provider: ${pub.detectedProvider}`);
  console.log(`Requested as: ${pub.id}`);
});

// Handle errors for IDs that couldn't be fetched
response.errors?.forEach((error) => {
  console.log(`Failed to fetch ${error.id}: ${error.error}`);
});

Error Handling

import { PDFVector, PDFVectorError } from "pdfvector";

const client = new PDFVector({ apiKey: "pdfvector_api_key_here" });

try {
  const result = await client.parse({
    url: "https://example.com/document.pdf",
  });
  console.log(result.markdown);
} catch (error) {
  if (error instanceof PDFVectorError) {
    console.error(`API Error: ${error.message}`);
    console.error(`Status: ${error.status}`);
    console.error(`Code: ${error.code}`);
  } else {
    console.error("Unexpected Error:", error);
  }
}
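
Some failures, such as a processing timeout, may succeed on a second attempt. The following is a minimal sketch of a retry wrapper, not something the SDK provides: `withRetry`, its parameters, and the backoff schedule are all hypothetical, and the caller decides which errors count as retryable.

```typescript
// Hypothetical helper (not part of the SDK): retry an async call with
// exponential backoff. The caller supplies the retry decision.
async function withRetry<T>(
  fn: () => Promise<T>,
  shouldRetry: (error: unknown) => boolean,
  maxAttempts = 3,
  baseDelayMs = 500,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      // Give up on the last attempt or when the error is not retryable.
      if (attempt === maxAttempts || !shouldRetry(error)) break;
      // Wait baseDelayMs, then 2x, then 4x, ... before the next attempt.
      const delayMs = baseDelayMs * 2 ** (attempt - 1);
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw lastError;
}
```

With the client from the example above, a call might look like `withRetry(() => client.parse({ url }), (e) => e instanceof PDFVectorError && e.code === "timeout-error")`. Treating `timeout-error` as the only retryable code is an assumption, not documented behavior.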

API Reference

PDFVector is the client class for interacting with the PDF Vector API.

Constructor

new PDFVector(config: PDFVectorConfig)

Parameters:

config.apiKey (string): Your PDF Vector API key
config.baseUrl (string, optional): Custom base URL (defaults to https://www.pdfvector.com)

Methods

`parse(request)`

Parse a PDF or Word document and convert it to markdown.

Parameters:

For URL parsing:

{
  url: string;           // Direct URL to PDF/Word document
  useLLM?: 'auto' | 'always' | 'never'; // Default: 'auto'
}

For data parsing:

{
  data: string | Buffer | Uint8Array | ArrayBuffer | Blob | ReadableStream; // Direct data of PDF/Word document
  contentType: string;   // MIME type (e.g., 'application/pdf')
  useLLM?: 'auto' | 'always' | 'never'; // Default: 'auto'
}

Returns:

{
  markdown: string; // Extracted content as markdown
  pageCount: number; // Number of pages processed
  creditCount: number; // Credits consumed (1-2 per page)
  usedLLM: boolean; // Whether AI enhancement was used
}

LLM Usage Options

auto (default): Automatically decide if AI enhancement is needed (1-2 credits per page)
never: Standard parsing without AI (1 credit per page)
always: Force AI enhancement (2 credits per page)

Note: Free plans are limited to useLLM: 'never'. Upgrade to a paid plan for AI enhancement.
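
The per-page rates above make it possible to bound the cost of a parse call before sending it. A minimal sketch, assuming only the rates listed in this section; `estimateParseCredits` and `CreditRange` are hypothetical names, not SDK exports.

```typescript
type UseLLM = "auto" | "always" | "never";

interface CreditRange {
  min: number;
  max: number;
}

// Hypothetical helper (not part of the SDK): lower and upper bound on the
// credits one parse call can consume, from the documented per-page rates.
function estimateParseCredits(pageCount: number, useLLM: UseLLM = "auto"): CreditRange {
  if (!Number.isInteger(pageCount) || pageCount < 0) {
    throw new RangeError("pageCount must be a non-negative integer");
  }
  const perPage: Record<UseLLM, CreditRange> = {
    never: { min: 1, max: 1 }, // standard parsing
    auto: { min: 1, max: 2 }, // AI enhancement decided automatically
    always: { min: 2, max: 2 }, // AI enhancement forced
  };
  const rate = perPage[useLLM];
  return { min: rate.min * pageCount, max: rate.max * pageCount };
}
```

For a 10-page document, `estimateParseCredits(10, "auto")` gives a range of 10 to 20 credits; the actual figure is reported in `creditCount` on the response.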

Supported File Types

PDF Documents

application/pdf
application/x-pdf
application/acrobat
application/vnd.pdf
text/pdf
text/x-pdf

Word Documents

application/msword (.doc)
application/vnd.openxmlformats-officedocument.wordprocessingml.document (.docx)
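
When parsing from data, the `contentType` has to be supplied by the caller. One way to derive it is from the file extension, sketched below. `contentTypeForFile` is a hypothetical helper, and it maps `.pdf` to `application/pdf` only, which is one of the several PDF MIME types listed above.

```typescript
// MIME types taken from the "Supported File Types" lists above.
const CONTENT_TYPES = new Map<string, string>([
  ["pdf", "application/pdf"],
  ["doc", "application/msword"],
  ["docx", "application/vnd.openxmlformats-officedocument.wordprocessingml.document"],
]);

// Hypothetical helper (not part of the SDK): pick a contentType from a file
// name, or return undefined when the extension is not a supported type.
function contentTypeForFile(fileName: string): string | undefined {
  const dot = fileName.lastIndexOf(".");
  if (dot < 0) return undefined;
  const extension = fileName.slice(dot + 1).toLowerCase();
  return CONTENT_TYPES.get(extension);
}
```

Returning `undefined` for unknown extensions lets the caller skip the request rather than receive an `unsupported-content-type` error.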

Usage Limits

Processing timeout: 3 minutes per document
File size: No explicit limit, but larger files usually have more pages and consume more credits

Cost

Credits: Consumed per page (1-2 credits depending on LLM usage)

Common error codes:

url-not-found: Document URL not accessible
unsupported-content-type: File type not supported
timeout-error: Processing timeout (3 minutes max)
payment-required: Usage limit reached
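
The codes above arrive on `error.code`, as in the Error Handling example. A small lookup can turn them into log-friendly messages. This is a sketch with hypothetical helper names; the retry rule in `isWorthRetrying` is an assumption and is not stated by the API.

```typescript
// Descriptions copied from the "Common error codes" list above.
const PARSE_ERROR_DESCRIPTIONS = new Map<string, string>([
  ["url-not-found", "Document URL not accessible"],
  ["unsupported-content-type", "File type not supported"],
  ["timeout-error", "Processing timeout (3 minutes max)"],
  ["payment-required", "Usage limit reached"],
]);

// Hypothetical helper (not part of the SDK): describe an error code.
function describeParseError(code: string | undefined): string {
  if (code === undefined) return "Unknown error (no code provided)";
  return PARSE_ERROR_DESCRIPTIONS.get(code) ?? `Unrecognized error code: ${code}`;
}

// Assumption: only a timeout is worth retrying unchanged. The other three
// codes call for a different URL, a different file type, or more credits.
function isWorthRetrying(code: string | undefined): boolean {
  return code === "timeout-error";
}
```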

`academicSearch(request)`

Search academic publications across multiple databases.

Parameters:

{
  query: string;                              // Search query
  providers?: AcademicSearchProvider[];       // Databases to search (default: ["semantic-scholar"])
  offset?: number;                            // Pagination offset (default: 0)
  limit?: number;                             // Results per page, 1-100 (default: 20)
  yearFrom?: number;                          // Filter by publication year (from) (min: 1900)
  yearTo?: number;                            // Filter by publication year (to) (max: 2050)
  fields?: AcademicSearchPublicationField[];  // Fields to include in response
}

Supported Providers:

"semantic-scholar" - Semantic Scholar
"arxiv" - ArXiv
"pubmed" - PubMed
"google-scholar" - Google Scholar
"eric" - ERIC

Available Fields:

Basic fields: "id", "doi", "title", "url", "providerURL", "authors", "date", "year", "totalCitations", "totalReferences", "abstract", "pdfURL", "provider"
Extended field: "providerData" - Provider-specific metadata
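
The documented ranges can be checked locally before spending credits on a request that would be rejected. A minimal sketch: `checkSearchParams` and the local `SearchParams` interface are hypothetical and only mirror the parameter table above. The inverted-range check is an assumption, since the source does not say how the API treats `yearFrom > yearTo`.

```typescript
// Provider names copied from the "Supported Providers" list above.
const SUPPORTED_PROVIDERS = new Set<string>([
  "semantic-scholar",
  "arxiv",
  "pubmed",
  "google-scholar",
  "eric",
]);

// Local mirror of the documented request shape (the SDK exports its own types).
interface SearchParams {
  query: string;
  providers?: string[];
  offset?: number;
  limit?: number;
  yearFrom?: number;
  yearTo?: number;
}

// Hypothetical helper (not part of the SDK): returns a list of problems,
// which is empty when the parameters satisfy the documented constraints.
function checkSearchParams(params: SearchParams): string[] {
  const problems: string[] = [];
  for (const provider of params.providers ?? []) {
    if (!SUPPORTED_PROVIDERS.has(provider)) problems.push(`unknown provider: ${provider}`);
  }
  if (params.limit !== undefined && (params.limit < 1 || params.limit > 100)) {
    problems.push("limit must be between 1 and 100");
  }
  if (params.yearFrom !== undefined && params.yearFrom < 1900) {
    problems.push("yearFrom must be 1900 or later");
  }
  if (params.yearTo !== undefined && params.yearTo > 2050) {
    problems.push("yearTo must be 2050 or earlier");
  }
  // Assumption: an inverted year range is treated as a caller mistake.
  if (
    params.yearFrom !== undefined &&
    params.yearTo !== undefined &&
    params.yearFrom > params.yearTo
  ) {
    problems.push("yearFrom must not be greater than yearTo");
  }
  return problems;
}
```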

Returns:

{
  estimatedTotalResults: number;              // Total results available
  results: AcademicSearchPublication[];       // Array of publications
  errors?: AcademicSearchProviderError[];     // Any provider errors
}

Cost

Credits: 2 credits per search.

`academicFetch(request)` / `fetch(request)`

Fetch specific academic publications by their IDs with automatic provider detection.

Parameters:

{
  ids: string[];                               // Array of publication IDs to fetch
  fields?: AcademicSearchPublicationField[];   // Fields to include in response
}

Supported ID Types:

DOI: e.g., "10.1038/nature12373"
PubMed ID: e.g., "12345678" (numeric ID)
ArXiv ID: e.g., "2301.00001" or "arXiv:2301.00001" or "math.GT/0309136"
Semantic Scholar ID: e.g., "0f40b1f08821e22e859c6050916cec3667778613"
ERIC ID: e.g., "ED123456"
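
Provider detection is done by the API, but a rough local pre-check can catch malformed IDs before a request is made. The sketch below guesses the ID type from the example formats listed above. `guessIdType` and its patterns are hypothetical and may not match the API's own detection rules; accepting an `EJ` prefix for ERIC is an assumption beyond the `ED` example shown.

```typescript
type GuessedIdType = "doi" | "arxiv" | "eric" | "semantic-scholar" | "pubmed" | "unknown";

// Patterns derived from the example IDs above; they are illustrative only.
const DOI = /^10\.\d{4,9}\/\S+$/;
const ARXIV_NEW = /^(arxiv:)?\d{4}\.\d{4,5}(v\d+)?$/i; // 2301.00001, arXiv:2507.16298v1
const ARXIV_OLD = /^(arxiv:)?[a-z-]+(\.[a-z]{2})?\/\d{7}(v\d+)?$/i; // math.GT/0309136
const ERIC = /^E[DJ]\d+$/i; // assumption: EJ is accepted as well as ED
const SEMANTIC_SCHOLAR = /^[0-9a-f]{40}$/i; // 40-character hex string
const PUBMED = /^\d+$/; // plain numeric ID

// Hypothetical helper (not part of the SDK): guess which ID type a string is.
// More specific patterns are tried first so a numeric ID falls through to PubMed.
function guessIdType(rawId: string): GuessedIdType {
  const id = rawId.trim();
  if (DOI.test(id)) return "doi";
  if (ARXIV_NEW.test(id) || ARXIV_OLD.test(id)) return "arxiv";
  if (ERIC.test(id)) return "eric";
  if (SEMANTIC_SCHOLAR.test(id)) return "semantic-scholar";
  if (PUBMED.test(id)) return "pubmed";
  return "unknown";
}
```

IDs that come back as `"unknown"` can be dropped or flagged before calling `academicFetch`; the `detectedProvider` field on each result remains the authoritative answer.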

Returns:

{
  results: AcademicFetchResult[];    // Successfully fetched publications
  errors?: AcademicFetchError[];     // Errors for IDs that couldn't be fetched
}

Each result includes:

{
  id: string; // The ID that was used to fetch
  detectedProvider: string; // Provider that was used
  // ... all publication fields (title, authors, abstract, etc.)
}

Cost

Credits: 2 credits per fetch.

TypeScript Support

The SDK is written in TypeScript and includes full type definitions:

import type {
  // Core classes
  PDFVector,
  PDFVectorConfig,
  PDFVectorError,
  // Parse API types
  ParseURLRequest,
  ParseDataRequest,
  ParseResponse,
  // Academic Search API types
  SearchRequest,
  AcademicSearchResponse,
  AcademicSearchPublication,
  AcademicSearchProvider,
  AcademicSearchAuthor,
  AcademicSearchPublicationField,
  // Academic Fetch API types
  FetchRequest,
  AcademicFetchResponse,
  AcademicFetchResult,
  AcademicFetchError,
  // Provider-specific data types
  AcademicSearchSemanticScholarData,
  AcademicSearchGoogleScholarData,
  AcademicSearchPubMedData,
  AcademicSearchArxivData,
  AcademicSearchEricData,
} from "pdfvector";

// Constants
import {
  AcademicSearchProviderValues, // Array of valid providers
  AcademicSearchPublicationFieldValues, // Array of valid fields
} from "pdfvector";

Node.js Support

Node.js version: Node.js 20+
ESM: Supports ES modules (CommonJS is not supported)
Dependencies: Uses standard fetch API

Examples

Batch Processing

const client = new PDFVector({ apiKey: "pdfvector_api_key_here" });

const documents = [
  "https://example.com/doc1.pdf",
  "https://example.com/doc2.pdf",
];

const results = await Promise.all(
  documents.map((url) => client.parse({ url, useLLM: "auto" })),
);

results.forEach((result, index) => {
  console.log(`Document ${index + 1}:`);
  console.log(`Pages: ${result.pageCount}`);
  console.log(`Credits: ${result.creditCount}`);
});
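
`Promise.all` rejects as soon as any one parse fails, which discards the results of the documents that succeeded. A variant using the standard `Promise.allSettled` keeps both outcomes. The sketch below shows a hypothetical `partitionSettled` helper (not part of the SDK) that pairs each input URL with its result or failure reason.

```typescript
interface BatchOutcome<T> {
  succeeded: { input: string; value: T }[];
  failed: { input: string; reason: unknown }[];
}

// Hypothetical helper (not part of the SDK): split settled results into
// successes and failures. Promise.allSettled preserves input order, so
// index i of `settled` corresponds to index i of `inputs`.
function partitionSettled<T>(
  inputs: string[],
  settled: PromiseSettledResult<T>[],
): BatchOutcome<T> {
  const outcome: BatchOutcome<T> = { succeeded: [], failed: [] };
  settled.forEach((result, index) => {
    const input = inputs[index];
    if (result.status === "fulfilled") {
      outcome.succeeded.push({ input, value: result.value });
    } else {
      outcome.failed.push({ input, reason: result.reason });
    }
  });
  return outcome;
}

// Usage with the client and documents array from the example above:
// const settled = await Promise.allSettled(
//   documents.map((url) => client.parse({ url, useLLM: "auto" })),
// );
// const { succeeded, failed } = partitionSettled(documents, settled);
```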

Academic Search with Pagination

const client = new PDFVector({ apiKey: "pdfvector_api_key_here" });

let offset = 0;
const limit = 50;
const allResults = [];

// Fetch first page
let response = await client.academicSearch({
  query: "climate change",
  providers: ["semantic-scholar", "arxiv"],
  offset,
  limit,
});

allResults.push(...response.results);

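// Fetch more pages as needed
while (
  allResults.length < response.estimatedTotalResults &&
  allResults.length < 200
) {
  offset += limit;
  response = await client.academicSearch({
    query: "climate change",
    providers: ["semantic-scholar", "arxiv"],
    offset,
    limit,
  });
  // Stop when a page comes back empty: estimatedTotalResults is only an
  // estimate, so without this check the loop might never terminate.
  if (response.results.length === 0) break;
  allResults.push(...response.results);
}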

console.log(`Fetched ${allResults.length} publications`);

Custom Base URL

// For development or custom deployments
const client = new PDFVector({
  apiKey: "pdfvector_api_key_here",
  baseUrl: "https://pdfvector.acme.com",
});

Support

API Reference (Scalar): pdfvector.com/v1/api/scalar
API Reference (Swagger): pdfvector.com/v1/api/swagger
Dashboard: pdfvector.com/dashboard

License

This SDK is licensed under the MIT License.


Package last updated on 26 Jul 2025
