Latest version: 2.1.2 (npm)

arxiv-api-wrapper

A TypeScript package that provides a convenient wrapper around the arXiv API, enabling easy querying and parsing of arXiv papers.

Installation

npm install arxiv-api-wrapper

Quick Start

import { getArxivEntries, getArxivEntriesById } from 'arxiv-api-wrapper';

// Search for papers
const result = await getArxivEntries({
  search: {
    title: ['quantum computing'],
    author: ['John Doe'],
  },
  maxResults: 10,
  sortBy: 'submittedDate',
  sortOrder: 'descending',
});

console.log(`Found ${result.feed.totalResults} papers`);
result.entries.forEach(entry => {
  console.log(`${entry.arxivId}: ${entry.title}`);
});

// Or fetch specific papers by ID
const papers = await getArxivEntriesById(['2101.01234', '2101.05678']);

Features

  • Type-safe: Full TypeScript support with comprehensive type definitions
  • Flexible Search: Support for complex queries with multiple filters, OR groups, and negation
  • Rate Limiting: Built-in token bucket rate limiter to respect arXiv API guidelines
  • Retry Logic: Automatic retries with exponential backoff for transient failures
  • Pagination: Support for paginated results with configurable page size
  • Sorting: Multiple sort options (relevance, submission date, last updated)
  • OAI-PMH: Support for the arXiv Open Archives Initiative interface (Identify, ListSets, GetRecord, ListRecords, ListIdentifiers, ListMetadataFormats)
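
To make the rate-limiting feature concrete, here is a minimal token bucket sketch. It is not the package's actual implementation: the TokenBucket class and its msUntilAllowed method are hypothetical names, and only the { tokensPerInterval, intervalMs } configuration shape comes from this README. The clock is passed in explicitly so the behaviour is easy to follow.

```typescript
// Illustrative token bucket; NOT the package's internal rate limiter.
interface RateLimitConfig {
  tokensPerInterval: number;
  intervalMs: number;
}

class TokenBucket {
  private readonly config: RateLimitConfig;
  private tokens: number;
  private lastRefill: number;

  constructor(config: RateLimitConfig, now: number = Date.now()) {
    this.config = config;
    this.tokens = config.tokensPerInterval; // start with a full bucket
    this.lastRefill = now;
  }

  // Add tokensPerInterval tokens per whole elapsed interval, capped at capacity.
  private refill(now: number): void {
    const intervals = Math.floor((now - this.lastRefill) / this.config.intervalMs);
    if (intervals > 0) {
      this.tokens = Math.min(
        this.config.tokensPerInterval,
        this.tokens + intervals * this.config.tokensPerInterval,
      );
      this.lastRefill += intervals * this.config.intervalMs;
    }
  }

  // Returns 0 (and consumes a token) if a request may go out now,
  // otherwise the number of milliseconds until the next refill.
  msUntilAllowed(now: number = Date.now()): number {
    this.refill(now);
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return 0;
    }
    return this.lastRefill + this.config.intervalMs - now;
  }
}

const bucket = new TokenBucket({ tokensPerInterval: 1, intervalMs: 3000 }, 0);
console.log(bucket.msUntilAllowed(0));    // 0: the first request passes
console.log(bucket.msUntilAllowed(1000)); // 2000: the second must wait
```

A caller would sleep for the returned number of milliseconds and then ask again before sending the request.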

OAI-PMH interface

The package also supports the arXiv OAI-PMH endpoint (https://oaipmh.arxiv.org/oai), which is useful for metadata harvesting and bulk access. See the arXiv OAI help and the OAI-PMH v2.0 protocol for details.

import {
  oaiIdentify,
  oaiListRecords,
  oaiListRecordsAsyncIterator,
  oaiGetRecord,
  oaiListSets,
  oaiListIdentifiers,
  oaiListMetadataFormats,
} from 'arxiv-api-wrapper';

// Repository info
const identify = await oaiIdentify();
console.log(identify.repositoryName, identify.protocolVersion);

// One page of records (e.g. Dublin Core)
const result = await oaiListRecords('oai_dc', {
  from: '2024-01-01',
  until: '2024-01-31',
  set: 'math:math:LO',  // optional: restrict to a set
  rateLimit: { tokensPerInterval: 1, intervalMs: 1000 },
});
result.records.forEach((rec) => {
  console.log(rec.header.identifier, rec.metadata);
});
if (result.resumptionToken) {
  // Fetch next page with result.resumptionToken.value
}

// Single record by identifier (full or short form)
const record = await oaiGetRecord('cs/0112017', 'oai_dc');

For a middle ground between manual page-by-page pagination and the collect-everything *All helpers (oaiListRecordsAll and friends), use the async iterators:

for await (const rec of oaiListRecordsAsyncIterator('oai_dc', {
  from: '2024-01-01',
  until: '2024-01-02',
  maxRecords: 50,
})) {
  console.log(rec.header.identifier);
}

If you omit maxRecords (or maxHeaders / maxSets on the corresponding iterators), iteration continues until the result list is exhausted.

The oaiListRecordsAll / oaiListIdentifiersAll / oaiListSetsAll helpers are convenience wrappers that collect from the corresponding async iterators.
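
The relationship between the iterators and the *All helpers can be sketched as follows. This is an illustration under stated assumptions, not the package's code: Page, FetchPage, iteratePages, collectAll and the stub fakeFetch are all hypothetical names, and the stub stands in for real network requests.

```typescript
// Hypothetical page shape; the package's real result types may differ.
interface Page<T> {
  items: T[];
  resumptionToken?: string; // absent on the last page
}

type FetchPage<T> = (token?: string) => Promise<Page<T>>;

// Yield items page by page, following resumption tokens, up to maxItems.
async function* iteratePages<T>(
  fetchPage: FetchPage<T>,
  maxItems: number = Infinity,
): AsyncGenerator<T> {
  let token: string | undefined;
  let yielded = 0;
  do {
    const page = await fetchPage(token);
    for (const item of page.items) {
      if (yielded >= maxItems) return;
      yield item;
      yielded += 1;
    }
    token = page.resumptionToken;
  } while (token !== undefined && yielded < maxItems);
}

// An "*All"-style helper simply drains the iterator into an array.
async function collectAll<T>(source: AsyncIterable<T>): Promise<T[]> {
  const out: T[] = [];
  for await (const item of source) out.push(item);
  return out;
}

// Stub "server" with three pages, standing in for real requests.
const fakeFetch: FetchPage<number> = async (token) => {
  if (token === undefined) return { items: [1, 2], resumptionToken: 'p2' };
  if (token === 'p2') return { items: [3, 4], resumptionToken: 'p3' };
  return { items: [5] };
};

collectAll(iteratePages(fakeFetch, 3)).then((items) => console.log(items)); // [ 1, 2, 3 ]
```

Because collection is layered on top of iteration, a cap such as maxRecords only needs to be enforced in one place.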

Async iterators keep continuation token metadata in memory while paging. If a token includes an expirationDate and that time has passed, iterators fail fast locally with OaiError (code: 'badResumptionToken') before attempting another request.
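
A local expiry check of this kind might look like the sketch below. The ResumptionToken interface, TokenExpiredError class and assertTokenUsable function are hypothetical stand-ins for the package's OaiResumptionToken and OaiError; only the value and expirationDate fields and the 'badResumptionToken' code are taken from the text above.

```typescript
// Hypothetical shapes for illustration; the package's own types may carry more fields.
interface ResumptionToken {
  value: string;
  expirationDate?: string; // ISO 8601 UTC timestamp, as in OAI-PMH responses
}

class TokenExpiredError extends Error {
  readonly code = 'badResumptionToken';
}

// Throw before making a request if the stored token has already expired.
function assertTokenUsable(token: ResumptionToken, now: Date = new Date()): void {
  if (token.expirationDate === undefined) return; // no expiry advertised
  const expiresAt = Date.parse(token.expirationDate);
  if (!Number.isNaN(expiresAt) && expiresAt <= now.getTime()) {
    throw new TokenExpiredError(`Resumption token expired at ${token.expirationDate}`);
  }
}

const token: ResumptionToken = { value: 'abc', expirationDate: '2024-01-01T00:00:00Z' };
assertTokenUsable(token, new Date('2023-12-31T23:00:00Z')); // still valid, no error
```

Checking locally saves a round trip that the server would reject anyway.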

All OAI functions accept the optional timeoutMs, retries, userAgent, and rateLimit settings (the same as the Atom API functions). Protocol-level errors such as idDoesNotExist are thrown as OaiError with a code and messageText. The one exception is noRecordsMatch, which is treated as “no results”: the wrapper returns an empty list (empty records or headers) instead of throwing, so oaiListRecords and oaiListIdentifiers always return a normal result shape.

Differences from OAI-PMH: The underlying arXiv OAI server returns an error response when a list request matches no records. This wrapper normalises that to an empty list so callers can assume a consistent result type without handling noRecordsMatch as an exception.
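
The normalisation described above can be sketched like this. The ParsedListResponse type, the SketchOaiError class and normaliseListResponse are hypothetical and only illustrate the idea; the package's internal parsing and its real OaiError may look different.

```typescript
// Hypothetical parsed-response shape; not the package's internal types.
type ParsedListResponse<T> =
  | { kind: 'ok'; records: T[] }
  | { kind: 'error'; code: string; messageText: string };

class SketchOaiError extends Error {
  code: string;
  messageText: string;
  constructor(code: string, messageText: string) {
    super(`${code}: ${messageText}`);
    this.code = code;
    this.messageText = messageText;
  }
}

// Map noRecordsMatch to an empty list; rethrow every other protocol error.
function normaliseListResponse<T>(response: ParsedListResponse<T>): { records: T[] } {
  if (response.kind === 'ok') return { records: response.records };
  if (response.code === 'noRecordsMatch') return { records: [] };
  throw new SketchOaiError(response.code, response.messageText);
}

const empty = normaliseListResponse<string>({
  kind: 'error',
  code: 'noRecordsMatch',
  messageText: 'No records match the request',
});
console.log(empty.records.length); // 0
```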

API Reference

For complete API documentation with detailed type information and examples, see the generated API documentation.

getArxivEntriesById(ids: string[], options?): Promise<ArxivQueryResult>

A simpler function for fetching arXiv papers by ID, using the API's id_list mode.

Parameters:

  • ids: string[] - Array of arXiv paper IDs (e.g., ['2101.01234', '2101.05678'])
  • options?: object - Optional request configuration
    • rateLimit?: { tokensPerInterval: number, intervalMs: number } - Rate limit configuration
    • retries?: number - Number of retry attempts (default: 3)
    • timeoutMs?: number - Request timeout in milliseconds (default: 10000)
    • userAgent?: string - Custom User-Agent header

Returns: The same ArxivQueryResult as getArxivEntries (see the return type below).

getArxivEntries(options: ArxivQueryOptions): Promise<ArxivQueryResult>

Main function to query the arXiv API with search filters or ID lists.

Options:

  • idList?: string[] - List of arXiv IDs to fetch (e.g., ['2101.01234', '2101.05678'])
  • search?: ArxivSearchFilters - Search filters (when used with idList, filters the entries from idList to only return those matching the search query)
  • start?: number - Pagination offset (0-based)
  • maxResults?: number - Maximum number of results (≤ 300)
  • sortBy?: 'relevance' | 'lastUpdatedDate' | 'submittedDate' - Sort field
  • sortOrder?: 'ascending' | 'descending' - Sort direction
  • timeoutMs?: number - Request timeout in milliseconds (default: 10000)
  • retries?: number - Number of retry attempts (default: 3)
  • rateLimit?: { tokensPerInterval: number, intervalMs: number } - Rate limit configuration
  • userAgent?: string - Custom User-Agent header
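
As a rough picture of what the retries option implies, the following sketch retries a failing call with exponentially growing delays. The helper names (backoffDelayMs, withRetries, Sleep) and the 500 ms base and 8000 ms cap are assumptions for illustration; the README does not document the package's actual delay schedule.

```typescript
// Delay before retry number `attempt` (0-based): baseMs * 2^attempt, capped at maxMs.
// The base and cap values are assumptions, not the package's documented settings.
function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 8000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

type Sleep = (ms: number) => Promise<void>;
const realSleep: Sleep = (ms) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Run `fn`, retrying up to `retries` times after a failure.
async function withRetries<T>(
  fn: () => Promise<T>,
  retries = 3,
  sleep: Sleep = realSleep,
): Promise<T> {
  for (let attempt = 0; ; attempt += 1) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= retries) throw err; // out of attempts: surface the error
      await sleep(backoffDelayMs(attempt));
    }
  }
}

console.log([0, 1, 2, 3].map((n) => backoffDelayMs(n))); // [ 500, 1000, 2000, 4000 ]
```

A production version would normally retry only transient failures (timeouts, HTTP 5xx) and add random jitter to the delay; both are left out here to keep the sketch short.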

Search Filters:

  • title?: string[] - Search in titles
  • author?: string[] - Search by author names
  • abstract?: string[] - Search in abstracts
  • category?: string[] - Filter by arXiv categories
  • submittedDateRange?: { from: string, to: string } - Date range filter (YYYYMMDDTTTT format, where TTTT is the 24-hour time to the minute, in GMT)
  • or?: ArxivSearchFilters[] - OR group of filters
  • andNot?: ArxivSearchFilters - Negated filter (ANDNOT)
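
To show how filters like these could translate into an arXiv search_query string, here is a hedged sketch. The field prefixes (ti:, au:, abs:, cat:, submittedDate:) and the AND / OR / ANDNOT operators are part of the arXiv API query syntax, but the SearchFilters interface is a local copy of the list above and buildSearchQuery is a hypothetical function; the package's real query builder may combine or escape terms differently.

```typescript
// Local copy of the filter shape listed above, so the sketch is self-contained.
interface SearchFilters {
  title?: string[];
  author?: string[];
  abstract?: string[];
  category?: string[];
  submittedDateRange?: { from: string; to: string };
  or?: SearchFilters[];
  andNot?: SearchFilters;
}

// Quote multi-word terms so they are matched as phrases.
const quote = (term: string): string => (/\s/.test(term) ? `"${term}"` : term);

// Hypothetical builder: AND-joins the fields, groups OR branches in parentheses.
function buildSearchQuery(f: SearchFilters): string {
  const parts: string[] = [];
  for (const t of f.title ?? []) parts.push(`ti:${quote(t)}`);
  for (const a of f.author ?? []) parts.push(`au:${quote(a)}`);
  for (const s of f.abstract ?? []) parts.push(`abs:${quote(s)}`);
  for (const c of f.category ?? []) parts.push(`cat:${c}`);
  if (f.submittedDateRange) {
    const { from, to } = f.submittedDateRange;
    parts.push(`submittedDate:[${from} TO ${to}]`);
  }
  if (f.or && f.or.length > 0) {
    const branches = f.or.map((sub) => `(${buildSearchQuery(sub)})`);
    parts.push(`(${branches.join(' OR ')})`);
  }
  let query = parts.join(' AND ');
  if (f.andNot) query = `${query} ANDNOT (${buildSearchQuery(f.andNot)})`;
  return query;
}

console.log(buildSearchQuery({ title: ['quantum computing'], author: ['John Doe'] }));
// ti:"quantum computing" AND au:"John Doe"
```

The sketch assumes at least one positive filter accompanies andNot, and the resulting string would still need URL encoding before being sent.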

Returns:

{
  feed: {
    id: string;
    updated: string;
    title: string;
    link: string;
    totalResults: number;
    startIndex: number;
    itemsPerPage: number;
  };
  entries: Array<{
    id: string;
    arxivId: string;
    title: string;
    summary: string;
    published: string;
    updated: string;
    authors: Array<{ name: string; affiliation?: string }>;
    categories: string[];
    primaryCategory?: string;
    links: Array<{ href: string; rel?: string; type?: string; title?: string }>;
    doi?: string;
    journalRef?: string;
    comment?: string;
  }>;
}
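
The feed metadata is enough to drive pagination by hand. The sketch below computes the start offset for the next request; FeedMeta mirrors three of the feed fields above, nextStart is a hypothetical helper that is not part of the package, and it assumes itemsPerPage reflects the page size that was requested.

```typescript
// Subset of the feed fields shown above.
interface FeedMeta {
  totalResults: number;
  startIndex: number;
  itemsPerPage: number;
}

// Offset to pass as `start` for the next page, or null when no results remain.
function nextStart(feed: FeedMeta): number | null {
  const next = feed.startIndex + feed.itemsPerPage;
  return feed.itemsPerPage > 0 && next < feed.totalResults ? next : null;
}

console.log(nextStart({ totalResults: 25, startIndex: 0, itemsPerPage: 10 }));  // 10
console.log(nextStart({ totalResults: 25, startIndex: 20, itemsPerPage: 10 })); // null
```

A paging loop would call getArxivEntries again with start set to the returned value until the helper yields null.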

Examples

Search by title and author

const result = await getArxivEntries({
  search: {
    title: ['machine learning'],
    author: ['Geoffrey Hinton'],
  },
  maxResults: 5,
});

Fetch specific papers by ID

Using the simpler getArxivEntriesById function:

const result = await getArxivEntriesById(['2101.01234', '2101.05678']);

Or using getArxivEntries:

const result = await getArxivEntries({
  idList: ['2101.01234', '2101.05678'],
});

Complex search with OR and date range

const result = await getArxivEntries({
  search: {
    or: [
      { title: ['quantum'] },
      { abstract: ['quantum'] },
    ],
    submittedDateRange: {
      from: '202301010600',
      to: '202401010600',
    },
  },
  sortBy: 'submittedDate',
  sortOrder: 'descending',
});
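
The from and to strings above are easy to get wrong by hand. A small helper can produce them from a Date; toArxivTimestamp is a hypothetical name, not a package export, and the sketch assumes the time component is interpreted as GMT, as the arXiv API manual describes.

```typescript
// Format a Date as YYYYMMDDHHMM using its UTC components.
function toArxivTimestamp(date: Date): string {
  const pad = (n: number, width = 2): string => String(n).padStart(width, '0');
  return (
    pad(date.getUTCFullYear(), 4) +
    pad(date.getUTCMonth() + 1) + // getUTCMonth() is 0-based
    pad(date.getUTCDate()) +
    pad(date.getUTCHours()) +
    pad(date.getUTCMinutes())
  );
}

console.log(toArxivTimestamp(new Date(Date.UTC(2023, 0, 1, 6, 0)))); // 202301010600
```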

Fetch papers by ID with rate limiting

const result = await getArxivEntriesById(
  ['2101.01234', '2101.05678'],
  {
    rateLimit: {
      tokensPerInterval: 1,
      intervalMs: 3000, // 1 request per 3 seconds
    },
    timeoutMs: 15000,
  }
);

Search with rate limiting

const result = await getArxivEntries({
  search: { title: ['neural networks'] },
  rateLimit: {
    tokensPerInterval: 1,
    intervalMs: 3000, // 1 request per 3 seconds
  },
});

Documentation

Generating API Documentation

To generate browsable API documentation from the source code:

npm run docs:generate

This will create HTML documentation in the docs/ directory. You can then view it locally:

npm run docs:serve

The generated documentation includes:

  • Complete API reference for all exported functions and types
  • Detailed parameter descriptions and examples
  • Type information and relationships
  • Search functionality

IDE IntelliSense

All exported functions and types include JSDoc comments for enhanced IDE IntelliSense support. Hover over any exported symbol in your IDE to see inline documentation.

TypeScript Types

All types are exported from the package:

import type {
  ArxivQueryOptions,
  ArxivQueryResult,
  ArxivSearchFilters,
  ArxivEntry,
  ArxivFeedMeta,
  ArxivAuthor,
  ArxivLink,
  ArxivSortBy,
  ArxivSortOrder,
  ArxivRateLimitConfig,
  ArxivDateRange,
  // OAI-PMH types
  OaiIdentifyResponse,
  OaiRecord,
  OaiHeader,
  OaiSet,
  OaiMetadataFormat,
  OaiResumptionToken,
  OaiListRecordsResult,
  OaiListIdentifiersResult,
  OaiListSetsResult,
  OaiRequestOptions,
  OaiListOptions,
  OaiErrorCode,
  OaiError,
} from 'arxiv-api-wrapper';

License

ISC

Author

Vilhelm Agdur

Repository

https://github.com/vagdur/arxiv-api-wrapper

Keywords

arxiv

Package last updated on 21 Mar 2026
