High-performance client for Baseten.co - Node.js Bindings

This library provides a high-performance Node.js client for Baseten.co endpoints, including embeddings, reranking, and classification. It is built for massively concurrent POST requests to any URL, including endpoints outside of baseten.co. The PerformanceClient is implemented in Rust (via napi-rs) on top of reqwest and tokio, and is MIT licensed.

Like the Python version, this client supports more than 1,200 requests per second per client and was benchmarked in our blog.

[Benchmark results chart]

Installation

npm install @basetenlabs/performance-client
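
The examples below use CommonJS require(). If your project uses ES modules, Node's CommonJS interop should also allow an import statement; a minimal sketch (it assumes Node surfaces the named export, as it typically does for napi-rs packages):

// ESM-style import of the same class used throughout the examples below.
import { PerformanceClient } from '@basetenlabs/performance-client';

const client = new PerformanceClient(
    "https://model-yqv4yjjq.api.baseten.co/environments/production/sync",
    process.env.BASETEN_API_KEY
);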

Usage

Basic Setup

Each client is bound to a single base URL, so you'll typically create separate clients for your embedding and reranking deployments.

const { PerformanceClient } = require('@basetenlabs/performance-client');

const apiKey = process.env.BASETEN_API_KEY;
const embedBaseUrl = "https://model-yqv4yjjq.api.baseten.co/environments/production/sync";
const rerankBaseUrl = "https://model-abc123.api.baseten.co/environments/production/sync";

// Create separate clients for different endpoints
const embedClient = new PerformanceClient(embedBaseUrl, apiKey);
const rerankClient = new PerformanceClient(rerankBaseUrl, apiKey);

Embeddings

const texts = ["Hello world", "Example text", "Another sample"];

try {
    const response = embedClient.embed(
        texts,
        "text-embedding-3-small", // model
        null, // encoding_format
        null, // dimensions
        null, // user
        8,    // max_concurrent_requests
        2,    // batch_size
        30    // timeout_s
    );

    console.log(`Model used: ${response.model}`);
    console.log(`Total tokens used: ${response.usage.total_tokens}`);
    console.log(`Total time: ${response.total_time.toFixed(4)}s`);

    if (response.individual_request_times) {
        response.individual_request_times.forEach((time, i) => {
            console.log(`  Time for batch ${i}: ${time.toFixed(4)}s`);
        });
    }

    response.data.forEach((embedding, i) => {
        console.log(`Embedding for text ${i} (original input index ${embedding.index}):`);
        console.log(`  First 3 dimensions: ${embedding.embedding.slice(0, 3)}`);
        console.log(`  Length: ${embedding.embedding.length}`);
    });
} catch (error) {
    console.error('Embedding failed:', error.message);
}

Reranking

const query = "What is the best framework?";
const documents = [
    "Machine learning is a subset of artificial intelligence",
    "JavaScript is a programming language",
    "Deep learning uses neural networks",
    "Python is popular for data science"
];

try {
    const response = rerankClient.rerank(
        query,
        documents,
        false, // raw_scores
        true,  // return_text
        false, // truncate
        "Right", // truncation_direction
        4,     // max_concurrent_requests
        2,     // batch_size
        30     // timeout_s
    );

    console.log(`Reranked ${response.data.length} documents`);
    console.log(`Total time: ${response.total_time.toFixed(4)}s`);

    response.data.forEach((result, i) => {
        console.log(`${i + 1}. Score: ${result.score.toFixed(3)} - ${result.text?.substring(0, 50)}...`);
    });
} catch (error) {
    console.error('Reranking failed:', error.message);
}

Classification

const textsToClassify = [
    "This is great!",
    "I did not like it.",
    "Neutral experience."
];

try {
    const response = rerankClient.classify(
        textsToClassify,
        false, // raw_scores
        false, // truncate
        "Right", // truncation_direction
        4,     // max_concurrent_requests
        2,     // batch_size
        30     // timeout_s
    );

    console.log(`Classified ${response.data.length} texts`);
    console.log(`Total time: ${response.total_time.toFixed(4)}s`);

    response.data.forEach((group, i) => {
        console.log(`Text ${i + 1}:`);
        group.forEach(result => {
            console.log(`  ${result.label}: ${result.score.toFixed(3)}`);
        });
    });
} catch (error) {
    console.error('Classification failed:', error.message);
}

General Batch POST

The batchPost method is generic and can be used to send POST requests to any URL path under the client's base URL, so it is not limited to Baseten endpoints (a sketch that targets a non-Baseten service follows the example below):

const payloads = [
    { "model": "text-embedding-3-small", "input": ["Hello"] },
    { "model": "text-embedding-3-small", "input": ["World"] }
];

try {
    const response = embedClient.batchPost(
        "/v1/embeddings", // URL path
        payloads,
        4,  // max_concurrent_requests
        30  // timeout_s
    );

    console.log(`Processed ${response.data.length} batch requests`);
    console.log(`Total time: ${response.total_time.toFixed(4)}s`);

    response.data.forEach((result, i) => {
        console.log(`Request ${i + 1}: ${JSON.stringify(result).substring(0, 100)}...`);
    });

    // Access response headers and individual request times
    response.response_headers.forEach((headers, i) => {
        console.log(`Response ${i + 1} headers:`, headers);
    });

    response.individual_request_times.forEach((time, i) => {
        console.log(`Request ${i + 1} took: ${time.toFixed(4)}s`);
    });
} catch (error) {
    console.error('Batch POST failed:', error.message);
}
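
Because the client is essentially a concurrent HTTP POST engine, the same pattern works against non-Baseten services. A minimal sketch, using https://httpbin.org/post as a stand-in echo endpoint and a placeholder API key (whether a key is required when no environment variable is set is not specified here):

// Point a client at an arbitrary base URL; httpbin simply echoes the payload back.
const genericClient = new PerformanceClient("https://httpbin.org", "placeholder-key");

const genericPayloads = [
    { "hello": "world" },
    { "foo": "bar" }
];

try {
    const response = genericClient.batchPost(
        "/post",  // URL path on httpbin.org
        genericPayloads,
        2,        // max_concurrent_requests
        30        // timeout_s
    );
    console.log(`Received ${response.data.length} echoed responses`);
} catch (error) {
    console.error('Generic batch POST failed:', error.message);
}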

API Reference

Constructor

new PerformanceClient(baseUrl, apiKey)
  • baseUrl (string): The base URL for the API endpoint
  • apiKey (string, optional): API key. If not provided, the client falls back to the BASETEN_API_KEY or OPENAI_API_KEY environment variable.
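
For example, a client can be constructed without an explicit key and rely on the environment-variable fallback described above (a minimal sketch; it assumes BASETEN_API_KEY is set in the environment):

// No apiKey argument: the client reads BASETEN_API_KEY (or OPENAI_API_KEY) itself.
const envClient = new PerformanceClient(
    "https://model-yqv4yjjq.api.baseten.co/environments/production/sync"
);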

Methods

embed(input, model, encoding_format, dimensions, user, max_concurrent_requests, batch_size, timeout_s)

  • input (Array): List of texts to embed
  • model (string): Model name
  • encoding_format (string, optional): Encoding format
  • dimensions (number, optional): Number of dimensions
  • user (string, optional): User identifier
  • max_concurrent_requests (number, optional): Maximum concurrent requests (default: 32)
  • batch_size (number, optional): Batch size (default: 128)
  • timeout_s (number, optional): Timeout in seconds (default: 3600)
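
With these defaults, the optional arguments can usually be dropped entirely; a minimal sketch (the Error Handling section below also calls embed with only the first two arguments, so omitting trailing optionals is assumed to be supported):

// Only the required arguments; max_concurrent_requests, batch_size and
// timeout_s fall back to the defaults listed above (32, 128 and 3600).
const minimal = embedClient.embed(["Hello world"], "text-embedding-3-small");
console.log(minimal.data[0].embedding.length);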

rerank(query, texts, raw_scores, return_text, truncate, truncation_direction, max_concurrent_requests, batch_size, timeout_s)

  • query (string): Query text
  • texts (Array): List of texts to rerank
  • raw_scores (boolean, optional): Return raw scores (default: false)
  • return_text (boolean, optional): Return text in response (default: false)
  • truncate (boolean, optional): Truncate long texts (default: false)
  • truncation_direction (string, optional): "Left" or "Right" (default: "Right")
  • max_concurrent_requests (number, optional): Maximum concurrent requests (default: 32)
  • batch_size (number, optional): Batch size (default: 128)
  • timeout_s (number, optional): Timeout in seconds (default: 3600)

classify(inputs, raw_scores, truncate, truncation_direction, max_concurrent_requests, batch_size, timeout_s)

  • inputs (Array): List of texts to classify
  • raw_scores (boolean, optional): Return raw scores (default: false)
  • truncate (boolean, optional): Truncate long texts (default: false)
  • truncation_direction (string, optional): "Left" or "Right" (default: "Right")
  • max_concurrent_requests (number, optional): Maximum concurrent requests (default: 32)
  • batch_size (number, optional): Batch size (default: 128)
  • timeout_s (number, optional): Timeout in seconds (default: 3600)

batchPost(url_path, payloads, max_concurrent_requests, timeout_s)

  • url_path (string): URL path for the POST request
  • payloads (Array): List of JSON payloads
  • max_concurrent_requests (number, optional): Maximum concurrent requests (default: 32)
  • timeout_s (number, optional): Timeout in seconds (default: 3600)
Error Handling

The client throws standard JavaScript errors for various failure cases:

try {
    const response = embedClient.embed(texts, "model");
} catch (error) {
    if (error.message.includes('cannot be empty')) {
        console.error('Parameter validation error:', error.message);
    } else if (error.message.includes('HTTP')) {
        console.error('Network error:', error.message);
    } else {
        console.error('Other error:', error.message);
    }
}

Testing

Run the test suite:

npm test

The tests use a simple built-in test framework and validate parameter handling, constructor behavior, and error conditions.
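
A sketch of the kind of check such a test performs, written here with Node's built-in assert module rather than the package's own harness (the exact error message is an assumption based on the Error Handling section above):

const assert = require('node:assert');
const { PerformanceClient } = require('@basetenlabs/performance-client');

// Parameter validation: an empty input list is expected to be rejected
// synchronously, before any HTTP request is made.
const client = new PerformanceClient("https://example.invalid", "test-key");
assert.throws(
    () => client.embed([], "some-model"),
    (err) => err.message.includes('cannot be empty')
);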

Development

To build the native module:

# Install dependencies
npm install

# Build release version
npm run build

# Build debug version
npm run build:debug

Benchmarks

Like the Python version, this Node.js client provides significant performance improvements over standard HTTP clients, especially for high-throughput embedding and reranking workloads.
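
As a rough way to measure throughput against your own deployment, you can time a large embed call using the fields the client already returns; a sketch with illustrative values (the model name and tuning parameters are placeholders):

// Build a synthetic workload and derive texts/second from total_time.
const workload = Array.from({ length: 1000 }, (_, i) => `synthetic document ${i}`);

const result = embedClient.embed(
    workload,
    "text-embedding-3-small", // placeholder model name
    null, null, null,         // encoding_format, dimensions, user
    32,                       // max_concurrent_requests
    128,                      // batch_size
    3600                      // timeout_s
);

console.log(`Embedded ${result.data.length} texts in ${result.total_time.toFixed(2)}s`);
console.log(`Throughput: ${(result.data.length / result.total_time).toFixed(1)} texts/s`);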

License

MIT License

Acknowledgements

Thanks to Venkatesh Narayan (Clay.com) for the prototype of this client (https://github.com/basetenlabs/truss/pull/1778) and to Suren (Baseten) for building a proof of concept and prototyping the release pipeline (https://github.com/suren-atoyan/rust-ts-package).
