Cartesia JavaScript Client
This client provides convenient access to Cartesia's text-to-speech (TTS) models. Sonic is the fastest text-to-speech model around: it can generate a second of audio in just 650ms and stream out the first audio chunk in just 135ms. Alongside Sonic, we also offer an extensive prebuilt voice library for a variety of use cases.
The full API documentation can be found at docs.cartesia.ai.
Installation
```sh
# npm
npm install @cartesia/cartesia-js
# Yarn
yarn add @cartesia/cartesia-js
# pnpm
pnpm add @cartesia/cartesia-js
# Bun
bun add @cartesia/cartesia-js
```
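The usage examples pass the API key as a string literal. Below is a minimal sketch of reading it from an environment variable instead, for Node.js code. It assumes a variable named `CARTESIA_API_KEY`. Both that name and the `getApiKey` helper are hypothetical and are not part of the client.

```javascript
// Hypothetical helper (not part of @cartesia/cartesia-js): reads the API key
// from an environment variable so it never has to be hard-coded.
// The default variable name CARTESIA_API_KEY is an assumption.
function getApiKey(env = process.env, name = "CARTESIA_API_KEY") {
  const key = env[name];
  // Fail loudly on a missing or blank key instead of sending an empty one.
  if (typeof key !== "string" || key.trim() === "") {
    throw new Error(`Missing API key: set the ${name} environment variable.`);
  }
  return key.trim();
}
```

You could then write `new Cartesia({ apiKey: getApiKey() })`. Browsers have no `process.env` unless your bundler provides one, so this sketch targets server-side code.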
Usage
CRUD on Voices
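```js
import Cartesia from "@cartesia/cartesia-js";

// These calls use top-level await, so run them inside an ES module.
const cartesia = new Cartesia({
  apiKey: "your-api-key",
});

// List all voices.
const voices = await cartesia.voices.list();
console.log(voices);

// Get a single voice by ID.
const voice = await cartesia.voices.get("<voice-id>");
console.log(voice);

// Create a voice from an embedding.
// Array(192).fill(1.0) is a placeholder; pass a real 192-value voice embedding.
const newVoice = await cartesia.voices.create({
  name: "Tim",
  description: "A deep, resonant voice.",
  embedding: Array(192).fill(1.0),
});
console.log(newVoice);

// Clone a voice from a URL...
const clonedFromUrl = await cartesia.voices.clone({
  mode: "url",
  url: "https://youtu.be/AdtLxlttrHg?si=07OLmDPg__0IN14f&t=6",
});
console.log(clonedFromUrl);

// ...or from an audio clip. `myFile` is assumed to be a File or Blob,
// e.g. one taken from an <input type="file"> element.
const clonedFromClip = await cartesia.voices.clone({
  mode: "clip",
  clip: myFile,
});
console.log(clonedFromClip);
```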
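The two clone calls differ only in `mode` and the field that goes with it. Below is a minimal sketch of a helper that picks the mode from its input. It assumes that strings are http(s) URLs and that anything else is a File or Blob. `buildCloneOptions` is hypothetical and is not part of the client.

```javascript
// Hypothetical helper (not part of @cartesia/cartesia-js): builds the options
// object for cartesia.voices.clone() from either a URL string or an audio clip.
function buildCloneOptions(source) {
  if (typeof source === "string") {
    // new URL() throws a TypeError on malformed input, catching typos early.
    const { protocol } = new URL(source);
    if (protocol !== "http:" && protocol !== "https:") {
      throw new Error(`Unsupported URL protocol: ${protocol}`);
    }
    return { mode: "url", url: source };
  }
  // Anything else is assumed to be a File or Blob containing audio.
  return { mode: "clip", clip: source };
}
```

A single call such as `await cartesia.voices.clone(buildCloneOptions(input))` would then cover both cases.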
TTS over WebSocket
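```js
import Cartesia from "@cartesia/cartesia-js";

const cartesia = new Cartesia({
  apiKey: "your-api-key",
});

// Create a WebSocket for streaming TTS at a 44,100 Hz sample rate.
const websocket = cartesia.tts.websocket({ sampleRate: 44100 });

try {
  await websocket.connect();
} catch (error) {
  console.error(`Failed to connect to Cartesia: ${error}`);
  throw error; // Nothing below can work without a connection.
}

// Send a transcript and get back a streaming response.
const response = await websocket.send({
  model_id: "upbeat-moon",
  voice: {
    mode: "embedding",
    embedding: Array(192).fill(1.0), // Placeholder embedding.
  },
  transcript: "Hello, world!",
});

// Use ONE of the following to consume messages.
// Option 1: an event listener.
response.on("message", (message) => {
  console.log("Received message:", message);
});

// Option 2: an async iterator (same messages as response.on("message")).
for await (const message of response.events("message")) {
  console.log("Received message:", message);
}
```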
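If `websocket.connect()` fails because of a transient network error, the example above only logs the error. Below is a minimal sketch of retrying with exponential backoff. `connectWithRetry` and its `attempts`/`baseDelayMs` options are hypothetical. The helper works with any function that returns a promise, so it does not depend on the client's internals.

```javascript
// Hypothetical helper (not part of @cartesia/cartesia-js): calls an async
// connect function up to `attempts` times, doubling the wait between tries.
async function connectWithRetry(connect, { attempts = 3, baseDelayMs = 250 } = {}) {
  let lastError;
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      return await connect();
    } catch (error) {
      lastError = error;
      // Wait before the next try, but not after the final failure.
      if (attempt < attempts - 1) {
        const delay = baseDelayMs * 2 ** attempt; // 250ms, 500ms, 1000ms, ...
        await new Promise((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  throw new Error(`Failed to connect after ${attempts} attempts: ${lastError}`);
}
```

For example, `await connectWithRetry(() => websocket.connect(), { attempts: 5 })` could replace the bare `await websocket.connect()`. Passing a function instead of a promise lets each retry start a fresh connection attempt.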
Playing audio in the browser
The WebPlayer class only supports playing audio in the browser. The example below plays the response stream from the WebSocket example above.
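```js
import { WebPlayer } from "@cartesia/cartesia-js";

console.log("Playing stream...");

// Create a player and play the streamed audio (top-level await).
// `response` is the object returned by websocket.send() in the WebSocket example.
const player = new WebPlayer();
await player.play(response.source);

console.log("Done playing.");
```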
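Because WebPlayer only works in the browser, code shared between server and client may need to check the environment before creating a player. Below is a minimal sketch that uses a common heuristic: it checks whether the `window` and `document` globals exist. `isBrowser` is a hypothetical helper and is not part of the client.

```javascript
// Hypothetical helper: heuristic browser check. In plain Node.js both globals
// are undefined, so this returns false there (unless a DOM shim such as jsdom
// has defined them).
function isBrowser() {
  return typeof window !== "undefined" && typeof document !== "undefined";
}
```

For example, you would only construct `new WebPlayer()` when `isBrowser()` returns true, and log or skip playback otherwise.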
React
We export a React hook, useTTS, that simplifies using the TTS API. The hook manages the WebSocket connection and provides a simple interface for buffering, playing, pausing, and restarting audio.
```jsx
import { useState } from "react";
import { useTTS } from "@cartesia/cartesia-js/react";

function TextToSpeech() {
  const tts = useTTS({
    apiKey: "your-api-key",
    sampleRate: 44100,
  });
  const [text, setText] = useState("");

  const handlePlay = async () => {
    // Buffer audio for the current text, then play it.
    await tts.buffer({
      model_id: "upbeat-moon",
      voice: {
        mode: "embedding",
        embedding: Array(192).fill(1.0), // Placeholder embedding.
      },
      transcript: text,
    });
    await tts.play();
  };

  return (
    <div>
      <input type="text" value={text} onChange={(event) => setText(event.target.value)} />
      <button onClick={handlePlay}>Play</button>
      <div>
        {/* React renders nothing for booleans, so convert isWaiting to a string. */}
        {tts.playbackStatus} | {tts.bufferStatus} | {String(tts.isWaiting)}
      </div>
    </div>
  );
}
```
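React renders strings and numbers but renders nothing for `true` or `false`, so a raw boolean such as `tts.isWaiting` silently disappears from the status line. Below is a minimal sketch of a formatting helper that turns the three status fields read by the component into one string. `formatTtsStatus` is hypothetical, and it assumes only that those three fields exist on the hook's return value.

```javascript
// Hypothetical helper: joins the hook's status fields into one display string,
// turning the isWaiting boolean into readable text so React will render it.
function formatTtsStatus({ playbackStatus, bufferStatus, isWaiting }) {
  return [playbackStatus, bufferStatus, isWaiting ? "waiting" : "not waiting"].join(" | ");
}
```

Inside the component, `<div>{formatTtsStatus(tts)}</div>` would then stand in for the three separate expressions.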