
bithuman
Real-time avatar engine — 100+ FPS on CPU. Generate lip-synced video, stream live avatars to browsers. 1-2 CPU cores, <200ms latency. ARM, x86, macOS.

Real-time avatar engine for visual AI agents, digital humans, and creative characters.
bitHuman powers visual AI agents and conversational AI with photorealistic avatars and real-time lip-sync. Build voice agents with faces, video chatbots, AI assistants, and interactive digital humans — all running on edge devices with just 1-2 CPU cores and <200ms latency. Raw generation speed is 100+ FPS on CPU alone, enabling real-time streaming applications.
pip install bithuman --upgrade
Pre-built wheels for all major platforms — no compilation required:
| | Linux | macOS | Windows |
|---|---|---|---|
| x86_64 | yes | yes | yes |
| ARM64 | yes | yes (Apple Silicon) | — |
| Python | 3.9 — 3.14 | 3.9 — 3.14 | 3.9 — 3.14 |
For LiveKit agent integration:
pip install bithuman[agent]
# Generate a lip-synced MP4 from a model and an audio file
bithuman generate avatar.imx --audio speech.wav --key YOUR_API_KEY
# Terminal 1: Start the streaming server
bithuman stream avatar.imx --key YOUR_API_KEY
# Terminal 2: Send audio to trigger lip-sync
bithuman speak speech.wav
Open http://localhost:3001 to see the avatar streaming live.
import asyncio
from bithuman import AsyncBithuman
from bithuman.audio import load_audio, float32_to_int16

async def main():
    runtime = await AsyncBithuman.create(
        model_path="avatar.imx",
        api_secret="YOUR_API_KEY",
    )
    await runtime.start()

    # Load and stream audio
    audio, sr = load_audio("speech.wav")
    audio_int16 = float32_to_int16(audio)

    async def stream_audio():
        chunk_size = sr // 25  # match video FPS
        for i in range(0, len(audio_int16), chunk_size):
            await runtime.push_audio(
                audio_int16[i:i + chunk_size].tobytes(), sr
            )
        await runtime.flush()

    asyncio.create_task(stream_audio())

    # Receive lip-synced video frames
    async for frame in runtime.run():
        if frame.has_image:
            image = frame.bgr_image   # numpy (H, W, 3), uint8
            audio = frame.audio_chunk  # synchronized audio
        if frame.end_of_speech:
            break

    await runtime.stop()

asyncio.run(main())
from bithuman import Bithuman
from bithuman.audio import load_audio, float32_to_int16

runtime = Bithuman.create(model_path="avatar.imx", api_secret="YOUR_API_KEY")

audio, sr = load_audio("speech.wav")
audio_int16 = float32_to_int16(audio)

chunk_size = sr // 100
for i in range(0, len(audio_int16), chunk_size):
    runtime.push_audio(audio_int16[i:i+chunk_size].tobytes(), sr)
runtime.flush()

for frame in runtime.run():
    if frame.has_image:
        image = frame.bgr_image
    if frame.end_of_speech:
        break
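Both examples size their chunks from the sample rate: the async example pushes one chunk per video frame (sr // 25 at 25 FPS), while the sync example pushes 10 ms chunks (sr // 100). A quick sanity check of that arithmetic at the runtime's 16 kHz working rate:

```python
# Chunk sizing at the runtime's 16 kHz working rate
# (input audio at other rates is auto-resampled to 16 kHz).
sr = 16_000

frame_chunk = sr // 25    # one chunk per 25 FPS video frame
small_chunk = sr // 100   # 10 ms chunks, as in the sync example

print(frame_chunk, frame_chunk * 2)  # 640 samples = 1280 bytes of int16 PCM
print(small_chunk, small_chunk * 2)  # 160 samples = 320 bytes
```

Either granularity works; smaller chunks trade a little overhead for lower buffering latency.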
How it works:
- The .imx file contains the avatar's appearance, animations, and lip-sync data
- Push int16 PCM audio with push_audio(), then call flush() when done
- Iterate runtime.run() to receive lip-synced video frames with synchronized audio

The runtime handles the full motion graph internally: idle animations, talking with lip-sync, head movements, blinking, and smooth transitions between states.
| Metric | Value |
|---|---|
| Raw FPS | 100+ on CPU (Intel i5-12400, Apple M2) |
| CPU cores | 1-2 cores at 25 FPS |
| End-to-end latency | <200ms |
| Memory (IMX v2) | ~200 MB per session |
| Model load time | <10ms (IMX v2) |
| Audio formats | WAV, MP3, FLAC, OGG, M4A |
AsyncBithuman / Bithuman — the main runtime for avatar animation. Both expose the same API: use Bithuman for thread-based code and AsyncBithuman for async/await.
# Create and initialize
runtime = await AsyncBithuman.create(
    model_path="avatar.imx",   # Path to .imx model
    api_secret="API_KEY",      # API secret (recommended)
    # token="JWT_TOKEN",       # Or JWT token directly
)
await runtime.start()
# Push audio (int16 PCM, any sample rate — auto-resampled to 16kHz)
await runtime.push_audio(audio_bytes, sample_rate)
await runtime.flush() # Signal end of speech
runtime.interrupt() # Cancel current playback
# Receive frames
async for frame in runtime.run():
    frame.bgr_image          # np.ndarray (H, W, 3) uint8 BGR
    frame.rgb_image          # np.ndarray (H, W, 3) uint8 RGB
    frame.audio_chunk        # AudioChunk — synchronized audio
    frame.end_of_speech      # True when all audio processed
    frame.has_image          # True if image available
    frame.frame_index        # Frame number
    frame.source_message_id  # Correlates to input
# Controls
await runtime.push(VideoControl(action="wave")) # Trigger action
await runtime.push(VideoControl(target_video="idle")) # Switch state
runtime.set_muted(True) # Mute processing
# Info
runtime.get_frame_size() # (width, height)
runtime.get_first_frame() # First idle frame as np.ndarray
runtime.get_expiration_time() # Token expiry (unix timestamp)
runtime.is_token_validated() # Auth status
await runtime.stop()
from bithuman import AudioChunk, VideoControl, VideoFrame, Emotion, EmotionPrediction
# AudioChunk — container for audio data
chunk = AudioChunk(data=np.array([...], dtype=np.int16), sample_rate=16000)
chunk.duration # float — length in seconds
chunk.bytes # bytes — raw PCM bytes
# VideoControl — input to the runtime
ctrl = VideoControl(
    audio=chunk,             # Audio to lip-sync
    action="wave",           # Trigger action (wave, nod, etc.)
    target_video="talking",  # Switch video state
    end_of_speech=True,      # Mark end of speech
    force_action=False,      # Override action deduplication
    emotion_preds=[          # Set emotion state
        EmotionPrediction(emotion=Emotion.JOY, score=0.9),
    ],
)
# VideoFrame — output from runtime.run()
frame.bgr_image # np.ndarray (H, W, 3) uint8 — BGR
frame.rgb_image # np.ndarray (H, W, 3) uint8 — RGB
frame.audio_chunk # AudioChunk — synchronized audio
frame.end_of_speech # bool — True when done
frame.has_image # bool — True if image available
frame.frame_index # int — frame number
frame.source_message_id # Hashable — correlates to VideoControl
# Emotion enum
Emotion.ANGER | Emotion.DISGUST | Emotion.FEAR | Emotion.JOY
Emotion.NEUTRAL | Emotion.SADNESS | Emotion.SURPRISE
from bithuman.audio import (
    load_audio,              # Load WAV/MP3/FLAC/OGG/M4A -> (float32, sr)
    float32_to_int16,        # float32 -> int16
    int16_to_float32,        # int16 -> float32
    resample,                # Resample to target rate
    write_video_with_audio,  # Save MP4 with audio track
    AudioStreamBatcher,      # Real-time audio buffer
)
audio, sr = load_audio("speech.mp3") # Any format
audio_int16 = float32_to_int16(audio) # Ready for push_audio
audio_16k = resample(audio, sr, 16000) # Resample
write_video_with_audio("out.mp4", frames, audio, sr, fps=25)
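The float32_to_int16 helper presumably maps [-1.0, 1.0] float samples onto the int16 range; a pure-NumPy sketch of that conversion (an illustration, not the library's exact implementation):

```python
import numpy as np

def float32_to_int16_sketch(x: np.ndarray) -> np.ndarray:
    # Scale [-1.0, 1.0] float PCM to the int16 range, clipping
    # out-of-range samples; astype truncates toward zero.
    return np.clip(x * 32767.0, -32768, 32767).astype(np.int16)

samples = np.array([0.0, 0.5, -1.0], dtype=np.float32)
print(float32_to_int16_sketch(samples))  # 0.0 -> 0, 0.5 -> 16383, -1.0 -> -32767
```

Prefer the library helper in real code; the sketch just shows why the int16 bytes fed to push_audio() are 2 bytes per sample.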
All exceptions inherit from BithumanError:
| Exception | When |
|---|---|
| TokenExpiredError | JWT has expired |
| TokenValidationError | Invalid signature or claims |
| TokenRequestError | Auth server unreachable |
| AccountStatusError | Billing or access issue (HTTP 402/403) |
| ModelNotFoundError | Model file doesn't exist |
| ModelLoadError | Corrupt or incompatible model |
| ModelSecurityError | Security restriction triggered |
| RuntimeNotReadyError | Operation called before initialization |
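One way to branch on this hierarchy at startup; the exception classes and the create_runtime stand-in are stubbed locally here so the snippet runs standalone (in real code, import the exceptions from the bithuman package and call AsyncBithuman.create):

```python
# Standalone sketch: exception classes stubbed so this runs without the SDK.
class BithumanError(Exception): ...
class TokenExpiredError(BithumanError): ...
class ModelNotFoundError(BithumanError): ...

def create_runtime(model_path: str):
    # Stand-in for runtime creation; raises like the table above.
    if not model_path.endswith(".imx"):
        raise ModelNotFoundError(f"not an .imx model: {model_path}")
    return "runtime"

def safe_create(model_path: str) -> str:
    try:
        create_runtime(model_path)
        return "ok"
    except TokenExpiredError:
        return "refresh-token"     # re-authenticate, then retry
    except ModelNotFoundError:
        return "check-model-path"  # verify the .imx file exists
    except BithumanError:
        return "sdk-error"         # base class catches everything else

print(safe_create("avatar.tar"))   # check-model-path
print(safe_create("avatar.imx"))   # ok
```

Because everything inherits from BithumanError, a trailing `except BithumanError` is a safe catch-all for SDK failures without swallowing unrelated exceptions.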
Build conversational AI agents with avatar faces using LiveKit Agents:
from bithuman import AsyncBithuman
from bithuman.utils.agent import LocalAvatarRunner, LocalVideoPlayer, LocalAudioIO
# Initialize bitHuman runtime
runtime = await AsyncBithuman.create(
    model_path="avatar.imx",
    api_secret="YOUR_API_KEY",
)

# Connect to LiveKit agent session
avatar = LocalAvatarRunner(
    bithuman_runtime=runtime,
    audio_input=session.audio,
    audio_output=LocalAudioIO(session, agent_output),
    video_output=LocalVideoPlayer(window_size=(1280, 720)),
)
await avatar.start()
See examples/livekit_agent/ for a complete working example with OpenAI Realtime voice.
Convert existing .imx models to IMX v2 for dramatically better performance:
bithuman convert avatar.imx
| Metric | Legacy (TAR) | IMX v2 | Improvement |
|---|---|---|---|
| Model size | 100 MB | 50-70 MB | 30-50% smaller |
| Load time | ~10s | <10ms | 1000x faster |
| Runtime speed | ~30 FPS | 100+ FPS | 3-10x faster |
| Peak memory | ~10 GB | ~200 MB | 98% less |
Conversion is automatic on first load, but pre-converting saves startup time.
| Command | Description |
|---|---|
| `bithuman generate <model> --audio <file>` | Generate lip-synced MP4 from model + audio |
| `bithuman stream <model>` | Start live streaming server at localhost:3001 |
| `bithuman speak <audio>` | Send audio to running stream server |
| `bithuman action <name>` | Trigger avatar action (wave, nod, etc.) |
| `bithuman info <model>` | Show model metadata |
| `bithuman list-videos <model>` | List all videos in a model |
| `bithuman convert <model>` | Convert legacy to optimized IMX v2 |
| `bithuman validate <path>` | Validate model files load correctly |
| Variable | Description |
|---|---|
| BITHUMAN_API_SECRET | API secret for authentication |
| BITHUMAN_RUNTIME_TOKEN | JWT token (alternative to API secret) |
| BITHUMAN_VERBOSE | Enable debug logging |
| CONVERT_THREADS | Number of threads for model conversion (0 or unset = auto-detect) |
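For example, configuring the runtime through the environment before running any `bithuman` command in the same shell (the values here are placeholders):

```shell
# Placeholder values — substitute your own credentials.
export BITHUMAN_API_SECRET="your-api-secret"
export BITHUMAN_VERBOSE=1     # enable debug logging
export CONVERT_THREADS=4      # explicit thread count for model conversion
```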
| Setting | Default | Description |
|---|---|---|
| FPS | 25 | Target frames per second |
| OUTPUT_WIDTH | 1280 | Output frame width (0 = native resolution) |
| PRELOAD_TO_MEMORY | False | Cache model in RAM for faster decode |
| PROCESS_IDLE_VIDEO | True | Run inference during silence (natural idle) |
| Example | Description |
|---|---|
| example.py | Async runtime with live video + audio playback |
| example_sync.py | Synchronous runtime with threading |
| livekit_agent/ | LiveKit Agent with OpenAI Realtime voice |
| livekit_webrtc/ | WebRTC streaming server |
objc: Class AVFFrameReceiver is implemented in both .../cv2/.dylibs/libavdevice...
and .../av/.dylibs/libavdevice...
This happens when opencv-python (full) is installed alongside av (PyAV) — both bundle FFmpeg dylibs. Fix by switching to the headless variant:
pip install opencv-python-headless
This replaces opencv-python and removes the duplicate dylibs. The bithuman package already depends on opencv-python-headless, so this only occurs when another package has pulled in the full opencv-python.
If you see TypeError: an integer is required during conversion, upgrade to the latest version:
pip install bithuman --upgrade
This was fixed in v1.6.2. The issue affected models in legacy TAR format during auto-conversion.
Creating your own avatar model (.imx file) requires a commercial license. See bithuman.ai for pricing.