# bitHuman Avatar Runtime

Real-time avatar engine for visual AI agents, digital humans, and creative characters.

Photorealistic avatars with audio-driven lip sync at 25 FPS. Runs on edge devices: typically 1–2 CPU cores, <200 ms end-to-end latency. Use it for voice agents with faces, video chatbots, tutors, NPCs, and digital humans.
## Which avatar should I use?

The SDK ships one API, `AsyncBithuman.create(model_path=…)`, driving two model types. Start with Essence: it's the default, runs on every supported platform, and is what every new user should reach for.
| | Essence | Expression |
| --- | --- | --- |
| Runs on | Linux / macOS / Windows, any CPU | macOS 14+ on Apple Silicon M3 or later (on-device) |
| Rendering | Pre-built `.imx` bundle, in-process | Bundled Swift daemon via IPC |
| Footprint | 1–2 CPU cores, <200 MB RAM | ~4 GB RAM working set |
| Best for | Voice agents, kiosks, edge devices, everywhere | Custom-face avatars on Mac M3+ |
Loading an Expression `.imx` on an unsupported host raises a typed `ExpressionModelNotSupported` error rather than crashing. For cloud or self-hosted-GPU Expression dispatch (Linux + NVIDIA, or bitHuman's cloud workers), use the LiveKit plugin (`bithuman.AvatarSession`), not `AsyncBithuman`.
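If you ship to mixed hardware, you can catch that error and fall back to an Essence bundle. A minimal sketch, assuming `ExpressionModelNotSupported` is importable from the top-level `bithuman` package (the exact import path may differ):

```python
from bithuman import AsyncBithuman, ExpressionModelNotSupported


async def create_runtime(api_secret: str) -> AsyncBithuman:
    # Prefer the Expression bundle, but degrade gracefully on unsupported hosts.
    try:
        return await AsyncBithuman.create(
            model_path="expression.imx", api_secret=api_secret
        )
    except ExpressionModelNotSupported:
        # Essence runs everywhere, so it is a safe fallback.
        return await AsyncBithuman.create(
            model_path="avatar.imx", api_secret=api_secret
        )
```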
Architecture deep dive + production patterns at docs.bithuman.ai.
## Install

```bash
pip install bithuman --upgrade
```
Pre-built wheels for Python 3.9–3.14 on Linux (x86_64 and ARM64), macOS (Intel and Apple Silicon), and Windows (x86_64). No compile step.
For LiveKit Agent integration (voice agents with faces over WebRTC):
```bash
pip install bithuman[agent]
```
## Quick start: Essence (cross-platform, default)
Grab an `.imx` from your bitHuman dashboard (⋮ → Download), export your API secret, then:
### CLI

```bash
export BITHUMAN_API_SECRET=your_secret
bithuman generate avatar.imx --audio speech.wav --output demo.mp4
```
Don't have a WAV to test with? Grab the 13-second sample bundled in the examples repo:
```bash
curl -O https://raw.githubusercontent.com/bithuman-product/bithuman-examples/main/essence-selfhosted/speech.wav
```
### Python

```python
import asyncio
import os

from bithuman import AsyncBithuman
from bithuman.audio import load_audio, float32_to_int16


async def main():
    runtime = await AsyncBithuman.create(
        model_path="avatar.imx",
        api_secret=os.environ["BITHUMAN_API_SECRET"],
    )
    await runtime.start()

    # Load speech as float32 PCM and convert to the int16 samples the runtime expects.
    pcm, sr = load_audio("speech.wav")
    pcm = float32_to_int16(pcm)

    # Push audio in 40 ms chunks: one chunk per video frame at 25 FPS.
    chunk = sr // 25
    for i in range(0, len(pcm), chunk):
        await runtime.push_audio(pcm[i : i + chunk].tobytes(), sr)
    await runtime.flush()

    async for frame in runtime.run():
        if frame.has_image:
            image = frame.bgr_image   # BGR ndarray, ready for OpenCV
            audio = frame.audio_chunk # audio aligned to this frame
        if frame.end_of_speech:
            break

    await runtime.stop()


asyncio.run(main())
```
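If you want to persist frames yourself rather than via `bithuman generate`, any BGR-capable sink works. A sketch of a replacement consumption loop using OpenCV's `VideoWriter`, assuming 25 FPS output and that the first frame fixes the resolution (`runtime` as in the quick start above):

```python
import cv2

writer = None
async for frame in runtime.run():
    if frame.has_image:
        image = frame.bgr_image
        if writer is None:
            # Initialize the writer lazily once the frame size is known.
            h, w = image.shape[:2]
            writer = cv2.VideoWriter(
                "out.mp4", cv2.VideoWriter_fourcc(*"mp4v"), 25, (w, h)
            )
        writer.write(image)
    if frame.end_of_speech:
        break
if writer is not None:
    writer.release()
```

This writes video only; muxing `frame.audio_chunk` back in would need an extra step (e.g., ffmpeg).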
`Bithuman` (no `Async`) is the threaded sync equivalent: same surface, no `await`. Usage examples live in the bithuman-examples repo.
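A minimal sketch of the same loop on the sync class, assuming its methods mirror the async surface one-to-one as described above:

```python
import os

from bithuman import Bithuman
from bithuman.audio import load_audio, float32_to_int16

# Hypothetical mirror of the async quick start; names assumed identical minus `await`.
runtime = Bithuman.create(
    model_path="avatar.imx",
    api_secret=os.environ["BITHUMAN_API_SECRET"],
)
runtime.start()

pcm, sr = load_audio("speech.wav")
pcm = float32_to_int16(pcm)
runtime.push_audio(pcm.tobytes(), sr)  # single push; chunking works as before
runtime.flush()

for frame in runtime.run():
    if frame.end_of_speech:
        break

runtime.stop()
```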
## Quick start: Expression on macOS M3+ (on-device, optional)

Requires macOS 14+ on Apple Silicon (M3 or later). On M1 / M2, Intel, Linux, or Windows, use Essence above or the LiveKit cloud plugin for Expression dispatch.
Expression bundles a diffusion-based animator that renders any face image in real time. Same API; just point at an Expression `.imx`.

```bash
bithuman demo --model expression.imx --audio speech.wav
```
```python
runtime = await AsyncBithuman.create(
    model_path="expression.imx",
    api_secret=os.environ["BITHUMAN_API_SECRET"],
    identity="alice.jpg",  # face image to animate (optional)
    quality="medium",
)

# Swap the face at runtime:
await runtime.set_identity("bob.jpg")         # encoded on the fly
await runtime.set_identity("bob_cached.npy")  # pre-encoded identity, loads instantly
```
Identity inputs and their encode cost:

| `identity` | At `create()` | At `set_identity()` |
| --- | --- | --- |
| `None` | 0 (bundle's baked-in face) | n/a |
| `.jpg` / `.png` | ~300 ms | ~300 ms |
| `.npy` (pre-encoded) | instant | instant |
"medium" (default) | 1.84ร | 1.14ร |
"high" | 1.05ร | 0.67ร โ sub-realtime โ offline only |
On a supported Mac, `AsyncBithuman` transparently spawns the bundled `bithuman-expression-daemon` subprocess when it sees an Expression manifest; there's nothing to configure.
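If you ship one build to mixed fleets, a cheap host check can decide which bundle to load before `create()` is even called. A heuristic sketch (Apple Silicon detection only; it cannot distinguish M3 from M1/M2, so keep the `ExpressionModelNotSupported` fallback shown earlier):

```python
import platform


def maybe_expression_capable() -> bool:
    # True on any Apple Silicon Mac; the runtime itself enforces the macOS 14+
    # and M3+ requirements and raises ExpressionModelNotSupported otherwise.
    return platform.system() == "Darwin" and platform.machine() == "arm64"


model = "expression.imx" if maybe_expression_capable() else "avatar.imx"
```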
## API surface

```python
from bithuman import (
    AsyncBithuman, Bithuman,
    AudioChunk, VideoFrame, VideoControl,
    Emotion, EmotionPrediction,
    BithumanError, TokenExpiredError, ...
)
```
```python
await runtime.push_audio(pcm_bytes, sample_rate)  # feed int16 PCM
await runtime.flush()                             # mark the end of an utterance
runtime.interrupt()                               # cut off current speech (barge-in)

async for frame in runtime.run():                 # stream output frames
    frame.bgr_image                               # BGR ndarray (when has_image)
    frame.audio_chunk                             # audio synced to this frame
    frame.end_of_speech                           # True on an utterance's last frame
    frame.frame_index

await runtime.push(VideoControl(action="wave"))        # trigger a gesture
await runtime.push(VideoControl(target_video="idle"))  # switch the base video
await runtime.set_identity("face.jpg")                 # Expression only: swap face
```
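These pieces compose naturally into barge-in handling for a voice agent. A sketch using only the calls above; the `on_user_speech` hook is a placeholder for whatever VAD callback your stack provides:

```python
from bithuman import AsyncBithuman, VideoControl


async def on_user_speech(runtime: AsyncBithuman) -> None:
    # The user started talking over the avatar: cut the current speech
    # immediately and return to the idle loop until the next response.
    runtime.interrupt()
    await runtime.push(VideoControl(target_video="idle"))
```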
Full reference: docs.bithuman.ai.
## CLI
Every command reads $BITHUMAN_API_SECRET by default.
| Command | Models | Purpose |
| --- | --- | --- |
| `bithuman generate <model> --audio <file>` | Essence + Expression | Render a lip-synced MP4 |
| `bithuman stream <model>` | Essence + Expression | Live streaming server at localhost:3001 |
| `bithuman speak <audio>` | – | Send audio to a running stream server |
| `bithuman demo --model <imx> [--audio <file>]` | Expression (macOS M3+) | Zero-friction Expression demo with a bundled sample clip |
| `bithuman convert <model>` | Essence | Convert legacy TAR `.imx` to IMX v2 (smaller, 1000× faster load) |
| `bithuman pack …` | Expression | Pack an Expression bundle from raw animator + encoder + renderer weights |
| `bithuman info <model>` | Essence + Expression | Show model metadata |
| `bithuman validate <path>` | Essence + Expression | Sanity-check a model file |
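For example, `stream` and `speak` pair up for quick local testing (commands as listed above, run in two terminals):

```bash
# Terminal 1: serve the avatar as a live stream on localhost:3001
bithuman stream avatar.imx

# Terminal 2: push a clip to the running server
bithuman speak speech.wav
```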
## Environment variables

| Variable | Purpose |
| --- | --- |
| `BITHUMAN_API_SECRET` | API secret (recommended) |
| `BITHUMAN_API_KEY` | Legacy alias read by `generate` / `stream`; use `BITHUMAN_API_SECRET` in new code |
| `BITHUMAN_RUNTIME_TOKEN` | Pre-minted JWT (alternative to API secret) |
| `BITHUMAN_VERBOSE` | Enable debug logging |
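If your own launcher needs to honor the legacy alias too, a small resolution helper; the precedence here is an assumption, and the SDK's internal lookup may differ:

```python
import os

# Prefer the current variable, fall back to the legacy alias (assumed precedence).
api_secret = os.environ.get("BITHUMAN_API_SECRET") or os.environ.get("BITHUMAN_API_KEY")
if api_secret is None and "BITHUMAN_RUNTIME_TOKEN" not in os.environ:
    raise RuntimeError("Set BITHUMAN_API_SECRET (or a runtime token) first.")
```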
## LiveKit agents
Build full voice agents with faces:
```python
from bithuman import AsyncBithuman
from bithuman.utils.agent import LocalAvatarRunner, LocalVideoPlayer, LocalAudioIO

runtime = await AsyncBithuman.create(model_path="avatar.imx", api_secret="…")

# `session` and `agent_output` come from your LiveKit agent setup.
avatar = LocalAvatarRunner(
    bithuman_runtime=runtime,
    audio_input=session.audio,
    audio_output=LocalAudioIO(session, agent_output),
    video_output=LocalVideoPlayer(window_size=(1280, 720)),
)
await avatar.start()
```
For end-to-end Docker Compose stacks (LiveKit + OpenAI + bitHuman) see the bithuman-examples repo.
## Troubleshooting

```
objc: Class AVFFrameReceiver is implemented in both …/cv2/… and …/av/…
```
Both OpenCV and PyAV (`av`) ship their own FFmpeg dylibs. The warning is printed once, as soon as both are imported, and is usually harmless. If you have the full `opencv-python` installed (rather than the `opencv-python-headless` variant that bithuman depends on), the collision is much larger; switch variants explicitly:

```bash
pip uninstall -y opencv-python && pip install opencv-python-headless
```
## Links

- Docs: docs.bithuman.ai
- Examples: github.com/bithuman-product/bithuman-examples
## License
Commercial license required. See bithuman.ai for pricing.