
@basetenlabs/performance-client
This library provides a high-performance Node.js client for Baseten.co endpoints, including embeddings, reranking, and classification. It was built for massively concurrent POST requests to any URL, including endpoints outside of baseten.co. The PerformanceClient is built on Rust (via napi-rs), reqwest, and tokio, and is MIT licensed.
Like the Python version, this client sustains more than 1,200 requests per second per client and has been benchmarked in our blog.
npm install @basetenlabs/performance-client
Since different endpoints require different clients, you'll typically need to create separate clients for embeddings and reranking deployments.
const { PerformanceClient } = require('@basetenlabs/performance-client');
const apiKey = process.env.BASETEN_API_KEY;
const embedBaseUrl = "https://model-yqv4yjjq.api.baseten.co/environments/production/sync";
const rerankBaseUrl = "https://model-abc123.api.baseten.co/environments/production/sync";
// Create separate clients for different endpoints
const embedClient = new PerformanceClient(embedBaseUrl, apiKey);
const rerankClient = new PerformanceClient(rerankBaseUrl, apiKey);
const texts = ["Hello world", "Example text", "Another sample"];
try {
  const response = embedClient.embed(
    texts,
    "text-embedding-3-small", // model
    null, // encoding_format
    null, // dimensions
    null, // user
    8, // max_concurrent_requests
    2, // batch_size
    30 // timeout_s
  );
  console.log(`Model used: ${response.model}`);
  console.log(`Total tokens used: ${response.usage.total_tokens}`);
  console.log(`Total time: ${response.total_time.toFixed(4)}s`);
  if (response.individual_request_times) {
    response.individual_request_times.forEach((time, i) => {
      console.log(`  Time for batch ${i}: ${time.toFixed(4)}s`);
    });
  }
  response.data.forEach((embedding, i) => {
    console.log(`Embedding for text ${i} (original input index ${embedding.index}):`);
    console.log(`  First 3 dimensions: ${embedding.embedding.slice(0, 3)}`);
    console.log(`  Length: ${embedding.embedding.length}`);
  });
} catch (error) {
  console.error('Embedding failed:', error.message);
}
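Note: with batch_size set to 2, the three input texts above should be split into two sub-requests (two texts, then one) that are dispatched concurrently, which is why individual_request_times reports one timing per batch rather than one per input.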
const query = "What is the best framework?";
const documents = [
  "Machine learning is a subset of artificial intelligence",
  "JavaScript is a programming language",
  "Deep learning uses neural networks",
  "Python is popular for data science"
];
try {
  const response = rerankClient.rerank(
    query,
    documents,
    false, // raw_scores
    true, // return_text
    false, // truncate
    "Right", // truncation_direction
    4, // max_concurrent_requests
    2, // batch_size
    30 // timeout_s
  );
  console.log(`Reranked ${response.data.length} documents`);
  console.log(`Total time: ${response.total_time.toFixed(4)}s`);
  response.data.forEach((result, i) => {
    console.log(`${i + 1}. Score: ${result.score.toFixed(3)} - ${result.text?.substring(0, 50)}...`);
  });
} catch (error) {
  console.error('Reranking failed:', error.message);
}
const textsToClassify = [
  "This is great!",
  "I did not like it.",
  "Neutral experience."
];
try {
  const response = rerankClient.classify(
    textsToClassify,
    false, // raw_scores
    false, // truncate
    "Right", // truncation_direction
    4, // max_concurrent_requests
    2, // batch_size
    30 // timeout_s
  );
  console.log(`Classified ${response.data.length} texts`);
  console.log(`Total time: ${response.total_time.toFixed(4)}s`);
  response.data.forEach((group, i) => {
    console.log(`Text ${i + 1}:`);
    group.forEach(result => {
      console.log(`  ${result.label}: ${result.score.toFixed(3)}`);
    });
  });
} catch (error) {
  console.error('Classification failed:', error.message);
}
The batchPost method is generic and can be used to send POST requests to any URL path, not limited to Baseten endpoints:
const payloads = [
  { "model": "text-embedding-3-small", "input": ["Hello"] },
  { "model": "text-embedding-3-small", "input": ["World"] }
];
try {
  const response = embedClient.batchPost(
    "/v1/embeddings", // URL path
    payloads,
    4, // max_concurrent_requests
    30 // timeout_s
  );
  console.log(`Processed ${response.data.length} batch requests`);
  console.log(`Total time: ${response.total_time.toFixed(4)}s`);
  response.data.forEach((result, i) => {
    console.log(`Request ${i + 1}: ${JSON.stringify(result).substring(0, 100)}...`);
  });
  // Access response headers and individual request times
  response.response_headers.forEach((headers, i) => {
    console.log(`Response ${i + 1} headers:`, headers);
  });
  response.individual_request_times.forEach((time, i) => {
    console.log(`Request ${i + 1} took: ${time.toFixed(4)}s`);
  });
} catch (error) {
  console.error('Batch POST failed:', error.message);
}
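Because the client is not tied to Baseten URLs, the same batchPost call works against any HTTP service. A minimal sketch, where https://api.example.com and /v1/echo are hypothetical placeholders:

// Hypothetical endpoint: api.example.com and /v1/echo are placeholders
const genericClient = new PerformanceClient("https://api.example.com", apiKey);
try {
  const echoResponse = genericClient.batchPost(
    "/v1/echo", // URL path
    [{ ping: 1 }, { ping: 2 }],
    2, // max_concurrent_requests
    10 // timeout_s
  );
  console.log(`Received ${echoResponse.data.length} responses`);
} catch (error) {
  console.error('Generic batch POST failed:', error.message);
}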
new PerformanceClient(baseUrl, apiKey)
- baseUrl (string): The base URL for the API endpoint
- apiKey (string, optional): API key. If not provided, the BASETEN_API_KEY or OPENAI_API_KEY environment variable is used

embed(input, model, encoding_format, dimensions, user, max_concurrent_requests, batch_size, timeout_s)
- input (Array): List of texts to embed
- model (string): Model name
- encoding_format (string, optional): Encoding format
- dimensions (number, optional): Number of dimensions
- user (string, optional): User identifier
- max_concurrent_requests (number, optional): Maximum concurrent requests (default: 32)
- batch_size (number, optional): Batch size (default: 128)
- timeout_s (number, optional): Timeout in seconds (default: 3600)

rerank(query, texts, raw_scores, return_text, truncate, truncation_direction, max_concurrent_requests, batch_size, timeout_s)
- query (string): Query text
- texts (Array): List of texts to rerank
- raw_scores (boolean, optional): Return raw scores (default: false)
- return_text (boolean, optional): Return text in response (default: false)
- truncate (boolean, optional): Truncate long texts (default: false)
- truncation_direction (string, optional): "Left" or "Right" (default: "Right")
- max_concurrent_requests (number, optional): Maximum concurrent requests (default: 32)
- batch_size (number, optional): Batch size (default: 128)
- timeout_s (number, optional): Timeout in seconds (default: 3600)

classify(inputs, raw_scores, truncate, truncation_direction, max_concurrent_requests, batch_size, timeout_s)
- inputs (Array): List of texts to classify
- raw_scores (boolean, optional): Return raw scores (default: false)
- truncate (boolean, optional): Truncate long texts (default: false)
- truncation_direction (string, optional): "Left" or "Right" (default: "Right")
- max_concurrent_requests (number, optional): Maximum concurrent requests (default: 32)
- batch_size (number, optional): Batch size (default: 128)
- timeout_s (number, optional): Timeout in seconds (default: 3600)

batchPost(url_path, payloads, max_concurrent_requests, timeout_s)
- url_path (string): URL path for the POST request
- payloads (Array): List of JSON payloads
- max_concurrent_requests (number, optional): Maximum concurrent requests (default: 32)
- timeout_s (number, optional): Timeout in seconds (default: 3600)

The client throws standard JavaScript errors for various failure cases:
try {
  const response = embedClient.embed(texts, "model");
} catch (error) {
  if (error.message.includes('cannot be empty')) {
    console.error('Parameter validation error:', error.message);
  } else if (error.message.includes('HTTP')) {
    console.error('Network error:', error.message);
  } else {
    console.error('Other error:', error.message);
  }
}
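For transient HTTP failures, a thin retry wrapper can be layered on top of the client. This is a minimal sketch, not part of the library, and it assumes the error-message conventions shown above:

// Hypothetical helper: retries only errors that look like HTTP failures
function embedWithRetry(client, texts, model, attempts = 3) {
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return client.embed(texts, model);
    } catch (error) {
      if (!error.message.includes('HTTP') || attempt === attempts) throw error;
      console.warn(`Attempt ${attempt} failed, retrying:`, error.message);
    }
  }
}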
Run the test suite:
npm test
The tests use a simple built-in test framework and validate parameter handling, constructor behavior, and error conditions.
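For illustration, a parameter-validation check of the kind the suite covers might look like the following sketch, written against Node's built-in node:assert; the repository's actual harness may differ, and the empty-input behavior is an assumption based on the error messages above:

const assert = require('node:assert');
const { PerformanceClient } = require('@basetenlabs/performance-client');

// Hypothetical check: assumes an empty input list is rejected client-side
const client = new PerformanceClient("https://example.com", "test-key");
assert.throws(() => client.embed([], "some-model"), /cannot be empty/);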
To build the native module:
# Install dependencies
npm install
# Build release version
npm run build
# Build debug version
npm run build:debug
Like the Python version, this Node.js client provides significant performance improvements over standard HTTP clients, especially for high-throughput embedding and reranking workloads.
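To get a rough feel for throughput on your own deployment, a simple timing harness can reuse the embedClient from above (a sketch; absolute numbers depend on the model, hardware, and network):

// Rough throughput check: 1024 texts in micro-batches of 16, up to 32 requests in flight
const samples = Array.from({ length: 1024 }, (_, i) => `sample text ${i}`);
const started = Date.now();
const result = embedClient.embed(samples, "text-embedding-3-small", null, null, null, 32, 16, 120);
const seconds = (Date.now() - started) / 1000;
console.log(`${result.data.length} embeddings in ${seconds.toFixed(2)}s (${(result.data.length / seconds).toFixed(1)} embeddings/s)`);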
MIT License
Thanks to Venkatesh Narayan (Clay.com) for the prototype of this client (https://github.com/basetenlabs/truss/pull/1778) and to Suren (Baseten) for building a PoC and prototyping the release pipeline (https://github.com/suren-atoyan/rust-ts-package).