
Official TypeScript/JavaScript SDK for PDF Vector API - Parse PDFs to markdown and search academic publications across multiple databases
The official TypeScript/JavaScript SDK for the PDF Vector API. It converts PDF and Word documents to clean, structured markdown with optional AI enhancement, searches multiple academic databases through a unified API, and fetches specific publications by DOI, PubMed ID, ArXiv ID, and more.
npm install pdfvector
# or
yarn add pdfvector
# or
pnpm add pdfvector
# or
bun add pdfvector
import { PDFVector } from "pdfvector";
const client = new PDFVector({ apiKey: "pdfvector_api_key_here" });
// Parse from document URL or data
const parseResult = await client.parse({
url: "https://example.com/document.pdf",
useLLM: "auto",
});
console.log(parseResult.markdown); // Print the clean markdown output
console.log(
`Pages: ${parseResult.pageCount}, Credits: ${parseResult.creditCount}`,
);
Get your API key from the PDF Vector dashboard. The SDK requires a valid API key for all operations.
const client = new PDFVector({ apiKey: "pdfvector_api_key_here" });
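Hard-coding the key as in the examples is fine for a quick test, but in practice it is usually read from the environment. The following is a minimal sketch of that, not part of the SDK: the variable names `PDFVECTOR_API_KEY` and `PDFVECTOR_BASE_URL` and the helper `configFromEnv` are hypothetical and chosen only for illustration.

```typescript
// Shape mirrors the constructor options documented for PDFVector.
interface ClientConfig {
  apiKey: string;
  baseUrl?: string;
}

// PDFVECTOR_API_KEY and PDFVECTOR_BASE_URL are hypothetical names picked for
// this sketch; they are not taken from the SDK documentation.
function configFromEnv(env: Record<string, string | undefined>): ClientConfig {
  const apiKey = env["PDFVECTOR_API_KEY"]?.trim();
  if (!apiKey) {
    throw new Error("PDFVECTOR_API_KEY is not set");
  }
  const baseUrl = env["PDFVECTOR_BASE_URL"]?.trim();
  // Only include baseUrl when it was actually provided.
  return baseUrl ? { apiKey, baseUrl } : { apiKey };
}

// Usage (Node.js): const client = new PDFVector(configFromEnv(process.env));
```

Failing early with a clear message tends to be easier to debug than an authentication error returned by the first API call.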
import { PDFVector } from "pdfvector";
const client = new PDFVector({ apiKey: "pdfvector_api_key_here" });
const result = await client.parse({
url: "https://arxiv.org/pdf/2301.00001.pdf",
useLLM: "auto",
});
console.log(result.markdown);
import { readFile } from "fs/promises";
import { PDFVector } from "pdfvector";
const client = new PDFVector({ apiKey: "pdfvector_api_key_here" });
const result = await client.parse({
data: await readFile("document.pdf"),
contentType: "application/pdf",
useLLM: "auto",
});
console.log(result.markdown);
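When parsing from data, `contentType` has to be supplied by the caller. One way to avoid hard-coding it is to derive it from the file name. The sketch below assumes the file extension is trustworthy and covers only the three extensions this README mentions; the helper name `contentTypeFor` is hypothetical.

```typescript
// MIME types taken from the supported content types listed in this README.
const CONTENT_TYPES = new Map<string, string>([
  ["pdf", "application/pdf"],
  ["doc", "application/msword"],
  ["docx", "application/vnd.openxmlformats-officedocument.wordprocessingml.document"],
]);

// Returns the MIME type for a file name, or undefined when the extension
// is missing or not one of the supported document types.
function contentTypeFor(fileName: string): string | undefined {
  const dot = fileName.lastIndexOf(".");
  if (dot < 0) return undefined;
  const extension = fileName.slice(dot + 1).toLowerCase();
  return CONTENT_TYPES.get(extension);
}
```

A caller could check for `undefined` before reading the file, so unsupported files are rejected locally instead of consuming a request.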
import { PDFVector } from "pdfvector";
const client = new PDFVector({ apiKey: "pdfvector_api_key_here" });
const searchResponse = await client.academicSearch({
query: "quantum computing",
providers: ["semantic-scholar", "arxiv", "pubmed"], // Search across multiple academic databases
limit: 20,
yearFrom: 2021,
yearTo: 2024,
});
searchResponse.results.forEach((publication) => {
console.log(`Title: ${publication.title}`);
console.log(`Authors: ${publication.authors?.map((a) => a.name).join(", ")}`);
console.log(`Year: ${publication.year}`);
console.log(`Abstract: ${publication.abstract}`);
console.log("---");
});
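Searching several providers at once may return the same paper more than once. Whether the API already removes such duplicates is not stated here, so the following is only a sketch of client-side deduplication. It assumes the `doi` and `title` fields were requested and may be missing on some results; `dedupePublications` is a hypothetical helper.

```typescript
// Minimal structural type: only the fields this sketch reads.
interface PublicationLike {
  doi?: string | null;
  title?: string | null;
}

// Keeps the first publication seen for each DOI (case-insensitive), falling
// back to a whitespace-normalized, lowercased title when no DOI is present.
// Entries with neither field are kept, since there is nothing to compare on.
function dedupePublications<T extends PublicationLike>(results: readonly T[]): T[] {
  const seen = new Set<string>();
  const unique: T[] = [];
  for (const pub of results) {
    const doi = pub.doi?.trim().toLowerCase();
    const title = pub.title?.trim().toLowerCase().replace(/\s+/g, " ");
    const key = doi ? `doi:${doi}` : title ? `title:${title}` : undefined;
    if (key !== undefined) {
      if (seen.has(key)) continue;
      seen.add(key);
    }
    unique.push(pub);
  }
  return unique;
}
```

Note that a result with a DOI and a result without one are never merged by this sketch, even if their titles match.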
const searchResponse = await client.academicSearch({
query: "CRISPR gene editing",
providers: ["semantic-scholar"],
fields: ["title", "authors", "year", "providerData"], // providerData holds provider-specific metadata
});
searchResponse.results.forEach((pub) => {
if (pub.provider === "semantic-scholar" && pub.providerData) {
const data = pub.providerData;
console.log(`Influential Citations: ${data.influentialCitationCount}`);
console.log(`Fields of Study: ${data.fieldsOfStudy?.join(", ")}`);
}
});
const client = new PDFVector({ apiKey: "pdfvector_api_key_here" });
const response = await client.academicFetch({
ids: [
"10.1038/nature12373", // DOI
"12345678", // PubMed ID
"2301.00001", // ArXiv ID
"arXiv:2507.16298v1", // ArXiv with prefix
"ED123456", // ERIC ID
"0f40b1f08821e22e859c6050916cec3667778613", // Semantic Scholar ID
],
fields: ["title", "authors", "year", "abstract", "doi"], // Optional: specify fields
});
// Handle successful results
response.results.forEach((pub) => {
console.log(`Title: ${pub.title}`);
console.log(`Provider: ${pub.detectedProvider}`);
console.log(`Requested as: ${pub.id}`);
});
// Handle errors for IDs that couldn't be fetched
response.errors?.forEach((error) => {
console.log(`Failed to fetch ${error.id}: ${error.error}`);
});
import { PDFVector, PDFVectorError } from "pdfvector";
const client = new PDFVector({ apiKey: "pdfvector_api_key_here" });
try {
const result = await client.parse({
url: "https://example.com/document.pdf",
});
console.log(result.markdown);
} catch (error) {
if (error instanceof PDFVectorError) {
console.error(`API Error: ${error.message}`);
console.error(`Status: ${error.status}`);
console.error(`Code: ${error.code}`);
} else {
console.error("Unexpected Error:", error);
}
}
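The `code` property can be turned into a user-facing message. The sketch below maps the parse error codes listed in this README to their descriptions. It assumes that `error.code` carries exactly those strings, which the source does not state explicitly; `describeParseError` is a hypothetical helper.

```typescript
// Codes and descriptions as listed in this README's parse reference.
const PARSE_ERROR_DESCRIPTIONS = new Map<string, string>([
  ["url-not-found", "Document URL not accessible"],
  ["unsupported-content-type", "File type not supported"],
  ["timeout-error", "Processing timeout (3 minutes max)"],
  ["payment-required", "Usage limit reached"],
]);

// Returns a readable description for a code, with a fallback for codes
// this table does not know about.
function describeParseError(code: string | undefined): string {
  if (code === undefined) return "Unknown error (no code provided)";
  return PARSE_ERROR_DESCRIPTIONS.get(code) ?? `Unrecognized error code: ${code}`;
}
```

Inside the `catch` block above, this could be called as `describeParseError(error.code)` once the `instanceof PDFVectorError` check has passed.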
The client class for interacting with the PDF Vector API.
new PDFVector(config: PDFVectorConfig)
Parameters:
config.apiKey (string): Your PDF Vector API key
config.baseUrl (string, optional): Custom base URL (defaults to https://www.pdfvector.com)
parse(request)
Parse a PDF or Word document and convert it to markdown.
Parameters:
For URL parsing:
{
url: string; // Direct URL to PDF/Word document
useLLM?: 'auto' | 'always' | 'never'; // Default: 'auto'
}
For data parsing:
{
data: string | Buffer | Uint8Array | ArrayBuffer | Blob | ReadableStream; // Direct data of PDF/Word document
contentType: string; // MIME type (e.g., 'application/pdf')
useLLM?: 'auto' | 'always' | 'never'; // Default: 'auto'
}
Returns:
{
markdown: string; // Extracted content as markdown
pageCount: number; // Number of pages processed
creditCount: number; // Credits consumed (1-2 per page)
usedLLM: boolean; // Whether AI enhancement was used
}
useLLM modes:
auto (default): Automatically decide if AI enhancement is needed (1-2 credits per page)
never: Standard parsing without AI (1 credit per page)
always: Force AI enhancement (2 credits per page)
Note: Free plans are limited to useLLM: 'never'. Upgrade to a paid plan for AI enhancement.
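The per-page credit costs above are enough to bound the cost of a document before parsing it. A minimal sketch, assuming the page count is known in advance; `estimateCredits` is a hypothetical helper and not part of the SDK.

```typescript
type UseLLM = "auto" | "always" | "never";

interface CreditRange {
  min: number;
  max: number;
}

// Applies the documented per-page costs:
// never = 1 credit, always = 2 credits, auto = between 1 and 2 credits.
function estimateCredits(pageCount: number, useLLM: UseLLM = "auto"): CreditRange {
  if (!Number.isInteger(pageCount) || pageCount < 0) {
    throw new RangeError("pageCount must be a non-negative integer");
  }
  switch (useLLM) {
    case "never":
      return { min: pageCount, max: pageCount };
    case "always":
      return { min: 2 * pageCount, max: 2 * pageCount };
    case "auto":
      return { min: pageCount, max: 2 * pageCount };
  }
}
```

For a 10-page document in `auto` mode this gives a range of 10 to 20 credits; the actual figure is reported as `creditCount` in the parse response.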
Supported content types:
application/pdf
application/x-pdf
application/acrobat
application/vnd.pdf
text/pdf
text/x-pdf
application/msword (.doc)
application/vnd.openxmlformats-officedocument.wordprocessingml.document (.docx)
Error codes:
url-not-found: Document URL not accessible
unsupported-content-type: File type not supported
timeout-error: Processing timeout (3 minutes max)
payment-required: Usage limit reached
academicSearch(request)
Search academic publications across multiple databases.
Parameters:
{
query: string; // Search query
providers?: AcademicSearchProvider[]; // Databases to search (default: ["semantic-scholar"])
offset?: number; // Pagination offset (default: 0)
limit?: number; // Results per page, 1-100 (default: 20)
yearFrom?: number; // Filter by publication year (from) (min: 1900)
yearTo?: number; // Filter by publication year (to) (max: 2050)
fields?: AcademicSearchPublicationField[]; // Fields to include in response
}
Supported Providers:
"semantic-scholar" - Semantic Scholar
"arxiv" - ArXiv
"pubmed" - PubMed
"google-scholar" - Google Scholar
"eric" - ERIC
Available Fields:
"id", "doi", "title", "url", "providerURL", "authors", "date", "year", "totalCitations", "totalReferences", "abstract", "pdfURL", "provider"
"providerData" - Provider-specific metadata
Returns:
{
estimatedTotalResults: number; // Total results available
results: AcademicSearchPublication[]; // Array of publications
errors?: AcademicSearchProviderError[]; // Any provider errors
}
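The bounds given in the parameter comments (limit 1-100, yearFrom no earlier than 1900, yearTo no later than 2050) can be checked locally before a request is sent. The sketch below does that. The checks for an empty query, a negative offset, and `yearFrom` greater than `yearTo` are assumptions added for illustration and are not documented constraints; `validateSearchParams` is a hypothetical helper.

```typescript
// Subset of the academicSearch request that this sketch validates.
interface SearchParams {
  query: string;
  limit?: number;
  offset?: number;
  yearFrom?: number;
  yearTo?: number;
}

// Returns a list of human-readable problems; an empty list means the
// parameters are within the documented bounds.
function validateSearchParams(params: SearchParams): string[] {
  const problems: string[] = [];
  const { query, limit, offset, yearFrom, yearTo } = params;
  if (query.trim() === "") problems.push("query must not be empty"); // assumption
  if (limit !== undefined && (!Number.isInteger(limit) || limit < 1 || limit > 100)) {
    problems.push("limit must be an integer from 1 to 100");
  }
  if (offset !== undefined && (!Number.isInteger(offset) || offset < 0)) {
    problems.push("offset must be a non-negative integer"); // assumption
  }
  if (yearFrom !== undefined && yearFrom < 1900) problems.push("yearFrom must be 1900 or later");
  if (yearTo !== undefined && yearTo > 2050) problems.push("yearTo must be 2050 or earlier");
  if (yearFrom !== undefined && yearTo !== undefined && yearFrom > yearTo) {
    problems.push("yearFrom must not be after yearTo"); // assumption
  }
  return problems;
}
```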
academicFetch(request) / fetch(request)
Fetch specific academic publications by their IDs with automatic provider detection.
Parameters:
{
ids: string[]; // Array of publication IDs to fetch
fields?: AcademicSearchPublicationField[]; // Fields to include in response
}
Supported ID Types:
DOI: "10.1038/nature12373"
PubMed ID: "12345678" (numeric ID)
ArXiv ID: "2301.00001" or "arXiv:2301.00001" or "math.GT/0309136"
Semantic Scholar ID: "0f40b1f08821e22e859c6050916cec3667778613"
ERIC ID: "ED123456"
Returns:
{
results: AcademicFetchResult[]; // Successfully fetched publications
errors?: AcademicFetchError[]; // Errors for IDs that couldn't be fetched
}
Each result includes:
{
id: string; // The ID that was used to fetch
detectedProvider: string; // Provider that was used
// ... all publication fields (title, authors, abstract, etc.)
}
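Provider detection happens on the API side, and its exact rules are not described here. For local pre-checks, such as warning about an ID that matches none of the supported formats, a rough classifier can be built from the example IDs above. The patterns below are assumptions inferred from those examples and from the general shape of each identifier type; they are not the service's actual detection logic, and `classifyId` is a hypothetical helper.

```typescript
type IdKind = "doi" | "arxiv" | "eric" | "semantic-scholar" | "pubmed" | "unknown";

// Checked in order; the first matching pattern wins.
const ID_PATTERNS: ReadonlyArray<readonly [IdKind, RegExp]> = [
  ["doi", /^10\.\d{4,9}\/\S+$/],                               // 10.1038/nature12373
  ["arxiv", /^(arxiv:)?\d{4}\.\d{4,5}(v\d+)?$/i],              // 2301.00001, arXiv:2507.16298v1
  ["arxiv", /^(arxiv:)?[a-z-]+(\.[a-z]{2})?\/\d{7}(v\d+)?$/i], // math.GT/0309136
  ["eric", /^E[DJ]\d+$/i],                                     // ED123456
  ["semantic-scholar", /^[0-9a-f]{40}$/i],                     // 40-character hex ID
  ["pubmed", /^\d+$/],                                         // purely numeric
];

// Returns the first kind whose pattern matches, or "unknown".
function classifyId(id: string): IdKind {
  const trimmed = id.trim();
  for (const [kind, pattern] of ID_PATTERNS) {
    if (pattern.test(trimmed)) return kind;
  }
  return "unknown";
}
```

IDs classified as "unknown" could be logged before calling `academicFetch`, while still letting the API make the final decision.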
The SDK is written in TypeScript and includes full type definitions:
import type {
// Core classes
PDFVector,
PDFVectorConfig,
PDFVectorError,
// Parse API types
ParseURLRequest,
ParseDataRequest,
ParseResponse,
// Academic Search API types
SearchRequest,
AcademicSearchResponse,
AcademicSearchPublication,
AcademicSearchProvider,
AcademicSearchAuthor,
AcademicSearchPublicationField,
// Academic Fetch API types
FetchRequest,
AcademicFetchResponse,
AcademicFetchResult,
AcademicFetchError,
// Provider-specific data types
AcademicSearchSemanticScholarData,
AcademicSearchGoogleScholarData,
AcademicSearchPubMedData,
AcademicSearchArxivData,
AcademicSearchEricData,
} from "pdfvector";
// Constants
import {
AcademicSearchProviderValues, // Array of valid providers
AcademicSearchPublicationFieldValues, // Array of valid fields
} from "pdfvector";
fetch API
const client = new PDFVector({ apiKey: "pdfvector_api_key_here" });
const documents = [
"https://example.com/doc1.pdf",
"https://example.com/doc2.pdf",
];
const results = await Promise.all(
documents.map((url) => client.parse({ url, useLLM: "auto" })),
);
results.forEach((result, index) => {
console.log(`Document ${index + 1}:`);
console.log(`Pages: ${result.pageCount}`);
console.log(`Credits: ${result.creditCount}`);
});
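`Promise.all` rejects as soon as any single parse call fails, so one bad URL hides the results of the documents that succeeded. A common alternative is `Promise.allSettled`, followed by a split into successes and failures. The sketch below shows only the splitting step; `partitionSettled` and the local `Settled` type (structurally compatible with the built-in `PromiseSettledResult`) are names introduced for illustration.

```typescript
// Same shape as the entries returned by Promise.allSettled.
type Settled<T> =
  | { status: "fulfilled"; value: T }
  | { status: "rejected"; reason: unknown };

// Splits settled outcomes into successful values and failure reasons,
// preserving the original order within each group.
function partitionSettled<T>(settled: readonly Settled<T>[]): { values: T[]; reasons: unknown[] } {
  const values: T[] = [];
  const reasons: unknown[] = [];
  for (const outcome of settled) {
    if (outcome.status === "fulfilled") {
      values.push(outcome.value);
    } else {
      reasons.push(outcome.reason);
    }
  }
  return { values, reasons };
}

// Usage with the batch above:
//   const settled = await Promise.allSettled(
//     documents.map((url) => client.parse({ url, useLLM: "auto" })),
//   );
//   const { values, reasons } = partitionSettled(settled);
```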
const client = new PDFVector({ apiKey: "pdfvector_api_key_here" });
let offset = 0;
const limit = 50;
const allResults = [];
// Fetch first page
let response = await client.academicSearch({
query: "climate change",
providers: ["semantic-scholar", "arxiv"],
offset,
limit,
});
allResults.push(...response.results);
// Fetch more pages as needed
while (
allResults.length < response.estimatedTotalResults &&
allResults.length < 200
) {
offset += limit;
response = await client.academicSearch({
query: "climate change",
providers: ["semantic-scholar", "arxiv"],
offset,
limit,
});
// Stop if a page comes back empty; estimatedTotalResults is only an estimate
if (response.results.length === 0) break;
allResults.push(...response.results);
}
console.log(`Fetched ${allResults.length} publications`);
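The loop above advances `offset` one page at a time. When the offsets are needed up front, for example to log the plan or to issue requests in parallel, they can be computed from the first response. This is a sketch that assumes `estimatedTotalResults` stays stable between requests, which the source does not guarantee; `planOffsets` is a hypothetical helper.

```typescript
// Returns the page offsets needed to cover min(estimatedTotal, maxResults)
// results when each request returns up to `limit` items.
function planOffsets(estimatedTotal: number, limit: number, maxResults: number): number[] {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }
  const target = Math.max(0, Math.min(estimatedTotal, maxResults));
  const offsets: number[] = [];
  for (let offset = 0; offset < target; offset += limit) {
    offsets.push(offset);
  }
  return offsets;
}
```

With an estimated total of 120, a limit of 50, and a cap of 200, the planned offsets are 0, 50, and 100.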
// For development or custom deployments
const client = new PDFVector({
apiKey: "pdfvector_api_key_here",
baseUrl: "https://pdfvector.acme.com",
});
This SDK is licensed under the MIT License.