
bithuman
Real-time avatar engine — 100+ FPS on CPU. Generate lip-synced video, stream live avatars to browsers. 1-2 CPU cores, <200ms latency. ARM, x86, macOS.

Real-time avatar engine for visual AI agents, digital humans, and creative characters.
bitHuman powers visual AI agents and conversational AI with photorealistic avatars and real-time lip-sync. Build voice agents with faces, video chatbots, AI assistants, and interactive digital humans — all running on edge devices with just 1-2 CPU cores and <200ms latency. Raw generation speed is 100+ FPS on CPU alone, enabling real-time streaming applications.
pip install bithuman --upgrade
Pre-built wheels for all major platforms — no compilation required:
| | Linux | macOS | Windows |
|---|---|---|---|
| x86_64 | yes | yes | yes |
| ARM64 | yes | yes (Apple Silicon) | — |
| Python | 3.9 — 3.14 | 3.9 — 3.14 | 3.9 — 3.14 |
For LiveKit agent integration:
pip install bithuman[agent]
# Generate a lip-synced MP4 from a model and an audio file
bithuman generate avatar.imx --audio speech.wav --key YOUR_API_KEY
# Terminal 1: Start the streaming server
bithuman stream avatar.imx --key YOUR_API_KEY
# Terminal 2: Send audio to trigger lip-sync
bithuman speak speech.wav
Open http://localhost:3001 to see the avatar streaming live.
import asyncio
from bithuman import AsyncBithuman
from bithuman.audio import load_audio, float32_to_int16

async def main():
    runtime = await AsyncBithuman.create(
        model_path="avatar.imx",
        api_secret="YOUR_API_KEY",
    )
    await runtime.start()

    # Load and stream audio
    audio, sr = load_audio("speech.wav")
    audio_int16 = float32_to_int16(audio)

    async def stream_audio():
        chunk_size = sr // 25  # match video FPS
        for i in range(0, len(audio_int16), chunk_size):
            await runtime.push_audio(
                audio_int16[i:i + chunk_size].tobytes(), sr
            )
        await runtime.flush()

    asyncio.create_task(stream_audio())

    # Receive lip-synced video frames
    async for frame in runtime.run():
        if frame.has_image:
            image = frame.bgr_image   # numpy (H, W, 3), uint8
            audio = frame.audio_chunk  # synchronized audio
        if frame.end_of_speech:
            break

    await runtime.stop()

asyncio.run(main())
from bithuman import Bithuman
from bithuman.audio import load_audio, float32_to_int16

runtime = Bithuman.create(model_path="avatar.imx", api_secret="YOUR_API_KEY")

audio, sr = load_audio("speech.wav")
audio_int16 = float32_to_int16(audio)

chunk_size = sr // 100
for i in range(0, len(audio_int16), chunk_size):
    runtime.push_audio(audio_int16[i:i+chunk_size].tobytes(), sr)
runtime.flush()

for frame in runtime.run():
    if frame.has_image:
        image = frame.bgr_image
    if frame.end_of_speech:
        break
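Both examples size their chunks from the sample rate: the async example pushes one chunk per video frame (sr // 25 at 25 FPS), while the sync example pushes 10 ms chunks (sr // 100). A quick sanity check of that arithmetic at the runtime's 16 kHz working rate:

```python
# Chunk sizing at the runtime's 16 kHz working rate
# (input audio at other rates is auto-resampled to 16 kHz).
sr = 16_000

frame_chunk = sr // 25    # one chunk per 25 FPS video frame
small_chunk = sr // 100   # 10 ms chunks, as in the sync example

print(frame_chunk, frame_chunk * 2)  # 640 samples = 1280 bytes of int16 PCM
print(small_chunk, small_chunk * 2)  # 160 samples = 320 bytes
```

Either granularity works; smaller chunks trade a little overhead for lower buffering latency.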
How it works:
- The .imx file contains the avatar's appearance, animations, and lip-sync data
- Push int16 PCM audio with push_audio(), then call flush() when done
- Iterate runtime.run() to receive lip-synced video frames with synchronized audio

The runtime handles the full motion graph internally: idle animations, talking with lip-sync, head movements, blinking, and smooth transitions between states.
| Metric | Value |
|---|---|
| Raw FPS | 100+ on CPU (Intel i5-12400, Apple M2) |
| CPU cores | 1-2 cores at 25 FPS |
| End-to-end latency | <200ms |
| Memory (IMX v2) | ~200 MB per session |
| Model load time | <10ms (IMX v2) |
| Audio formats | WAV, MP3, FLAC, OGG, M4A |
AsyncBithuman / Bithuman — the main runtime for avatar animation. Both expose the same API: use Bithuman for thread-based code and AsyncBithuman for async/await.
# Create and initialize
runtime = await AsyncBithuman.create(
    model_path="avatar.imx",   # Path to .imx model
    api_secret="API_KEY",      # API secret (recommended)
    # token="JWT_TOKEN",       # Or JWT token directly
)
await runtime.start()
# Push audio (int16 PCM, any sample rate — auto-resampled to 16kHz)
await runtime.push_audio(audio_bytes, sample_rate)
await runtime.flush() # Signal end of speech
runtime.interrupt() # Cancel current playback
# Receive frames
async for frame in runtime.run():
    frame.bgr_image          # np.ndarray (H, W, 3) uint8 BGR
    frame.rgb_image          # np.ndarray (H, W, 3) uint8 RGB
    frame.audio_chunk        # AudioChunk — synchronized audio
    frame.end_of_speech      # True when all audio processed
    frame.has_image          # True if image available
    frame.frame_index        # Frame number
    frame.source_message_id  # Correlates to input
# Controls
await runtime.push(VideoControl(action="wave")) # Trigger action
await runtime.push(VideoControl(target_video="idle")) # Switch state
runtime.set_muted(True) # Mute processing
# Info
runtime.get_frame_size() # (width, height)
runtime.get_first_frame() # First idle frame as np.ndarray
runtime.get_expiration_time() # Token expiry (unix timestamp)
runtime.is_token_validated() # Auth status
await runtime.stop()
from bithuman import AudioChunk, VideoControl, VideoFrame, Emotion, EmotionPrediction
# AudioChunk — container for audio data
chunk = AudioChunk(data=np.array([...], dtype=np.int16), sample_rate=16000)
chunk.duration # float — length in seconds
chunk.bytes # bytes — raw PCM bytes
# VideoControl — input to the runtime
ctrl = VideoControl(
    audio=chunk,             # Audio to lip-sync
    action="wave",           # Trigger action (wave, nod, etc.)
    target_video="talking",  # Switch video state
    end_of_speech=True,      # Mark end of speech
    force_action=False,      # Override action deduplication
    emotion_preds=[          # Set emotion state
        EmotionPrediction(emotion=Emotion.JOY, score=0.9),
    ],
)
# VideoFrame — output from runtime.run()
frame.bgr_image # np.ndarray (H, W, 3) uint8 — BGR
frame.rgb_image # np.ndarray (H, W, 3) uint8 — RGB
frame.audio_chunk # AudioChunk — synchronized audio
frame.end_of_speech # bool — True when done
frame.has_image # bool — True if image available
frame.frame_index # int — frame number
frame.source_message_id # Hashable — correlates to VideoControl
# Emotion enum
Emotion.ANGER | Emotion.DISGUST | Emotion.FEAR | Emotion.JOY
Emotion.NEUTRAL | Emotion.SADNESS | Emotion.SURPRISE
from bithuman.audio import (
    load_audio,              # Load WAV/MP3/FLAC/OGG/M4A -> (float32, sr)
    float32_to_int16,        # float32 -> int16
    int16_to_float32,        # int16 -> float32
    resample,                # Resample to target rate
    write_video_with_audio,  # Save MP4 with audio track
    AudioStreamBatcher,      # Real-time audio buffer
)
audio, sr = load_audio("speech.mp3") # Any format
audio_int16 = float32_to_int16(audio) # Ready for push_audio
audio_16k = resample(audio, sr, 16000) # Resample
write_video_with_audio("out.mp4", frames, audio, sr, fps=25)
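The float32_to_int16 helper presumably maps [-1.0, 1.0] float samples onto the int16 range; a pure-NumPy sketch of that conversion (an illustration, not the library's exact implementation):

```python
import numpy as np

def float32_to_int16_sketch(x: np.ndarray) -> np.ndarray:
    # Scale [-1.0, 1.0] float PCM to the int16 range, clipping
    # out-of-range samples; astype truncates toward zero.
    return np.clip(x * 32767.0, -32768, 32767).astype(np.int16)

samples = np.array([0.0, 0.5, -1.0], dtype=np.float32)
print(float32_to_int16_sketch(samples))  # 0.0 -> 0, 0.5 -> 16383, -1.0 -> -32767
```

Prefer the library helper in real code; the sketch just shows why the int16 bytes fed to push_audio() are 2 bytes per sample.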
All exceptions inherit from BithumanError:
| Exception | When |
|---|---|
| TokenExpiredError | JWT has expired |
| TokenValidationError | Invalid signature or claims |
| TokenRequestError | Auth server unreachable |
| AccountStatusError | Billing or access issue (HTTP 402/403) |
| ModelNotFoundError | Model file doesn't exist |
| ModelLoadError | Corrupt or incompatible model |
| ModelSecurityError | Security restriction triggered |
| RuntimeNotReadyError | Operation called before initialization |
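One way to branch on this hierarchy at startup; the exception classes and the create_runtime stand-in are stubbed locally here so the snippet runs standalone (in real code, import the exceptions from the bithuman package and call AsyncBithuman.create):

```python
# Standalone sketch: exception classes stubbed so this runs without the SDK.
class BithumanError(Exception): ...
class TokenExpiredError(BithumanError): ...
class ModelNotFoundError(BithumanError): ...

def create_runtime(model_path: str):
    # Stand-in for runtime creation; raises like the table above.
    if not model_path.endswith(".imx"):
        raise ModelNotFoundError(f"not an .imx model: {model_path}")
    return "runtime"

def safe_create(model_path: str) -> str:
    try:
        create_runtime(model_path)
        return "ok"
    except TokenExpiredError:
        return "refresh-token"     # re-authenticate, then retry
    except ModelNotFoundError:
        return "check-model-path"  # verify the .imx file exists
    except BithumanError:
        return "sdk-error"         # base class catches everything else

print(safe_create("avatar.tar"))   # check-model-path
print(safe_create("avatar.imx"))   # ok
```

Because everything inherits from BithumanError, a trailing `except BithumanError` is a safe catch-all for SDK failures without swallowing unrelated exceptions.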
Build conversational AI agents with avatar faces using LiveKit Agents:
from bithuman import AsyncBithuman
from bithuman.utils.agent import LocalAvatarRunner, LocalVideoPlayer, LocalAudioIO
# Initialize bitHuman runtime
runtime = await AsyncBithuman.create(
    model_path="avatar.imx",
    api_secret="YOUR_API_KEY",
)

# Connect to LiveKit agent session
avatar = LocalAvatarRunner(
    bithuman_runtime=runtime,
    audio_input=session.audio,
    audio_output=LocalAudioIO(session, agent_output),
    video_output=LocalVideoPlayer(window_size=(1280, 720)),
)
await avatar.start()
See examples/livekit_agent/ for a complete working example with OpenAI Realtime voice.
Convert existing .imx models to IMX v2 for dramatically better performance:
bithuman convert avatar.imx
| Metric | Legacy (TAR) | IMX v2 | Improvement |
|---|---|---|---|
| Model size | 100 MB | 50-70 MB | 30-50% smaller |
| Load time | ~10s | <10ms | 1000x faster |
| Runtime speed | ~30 FPS | 100+ FPS | 3-10x faster |
| Peak memory | ~10 GB | ~200 MB | 98% less |
Conversion is automatic on first load, but pre-converting saves startup time.
| Command | Description |
|---|---|
| `bithuman generate <model> --audio <file>` | Generate lip-synced MP4 from model + audio |
| `bithuman stream <model>` | Start live streaming server at localhost:3001 |
| `bithuman speak <audio>` | Send audio to running stream server |
| `bithuman action <name>` | Trigger avatar action (wave, nod, etc.) |
| `bithuman info <model>` | Show model metadata |
| `bithuman list-videos <model>` | List all videos in a model |
| `bithuman convert <model>` | Convert legacy to optimized IMX v2 |
| `bithuman validate <path>` | Validate model files load correctly |
| Variable | Description |
|---|---|
| BITHUMAN_API_SECRET | API secret for authentication |
| BITHUMAN_RUNTIME_TOKEN | JWT token (alternative to API secret) |
| BITHUMAN_VERBOSE | Enable debug logging |
| CONVERT_THREADS | Number of threads for model conversion (0 or unset = auto-detect) |
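For example, configuring the runtime through the environment before running any `bithuman` command in the same shell (the values here are placeholders):

```shell
# Placeholder values — substitute your own credentials.
export BITHUMAN_API_SECRET="your-api-secret"
export BITHUMAN_VERBOSE=1     # enable debug logging
export CONVERT_THREADS=4      # explicit thread count for model conversion
```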
| Setting | Default | Description |
|---|---|---|
| FPS | 25 | Target frames per second |
| OUTPUT_WIDTH | 1280 | Output frame width (0 = native resolution) |
| PRELOAD_TO_MEMORY | False | Cache model in RAM for faster decode |
| PROCESS_IDLE_VIDEO | True | Run inference during silence (natural idle) |
| Example | Description |
|---|---|
| example.py | Async runtime with live video + audio playback |
| example_sync.py | Synchronous runtime with threading |
| livekit_agent/ | LiveKit Agent with OpenAI Realtime voice |
| livekit_webrtc/ | WebRTC streaming server |
objc: Class AVFFrameReceiver is implemented in both .../cv2/.dylibs/libavdevice...
and .../av/.dylibs/libavdevice...
This happens when opencv-python (full) is installed alongside av (PyAV) — both bundle FFmpeg dylibs. Fix by switching to the headless variant:
pip install opencv-python-headless
This replaces opencv-python and removes the duplicate dylibs. The bithuman package already depends on opencv-python-headless, so this only occurs when another package has pulled in the full opencv-python.
If you see TypeError: an integer is required during conversion, upgrade to the latest version:
pip install bithuman --upgrade
This was fixed in v1.6.2. The issue affected models in legacy TAR format during auto-conversion.
Creating your own avatar model (.imx file) requires a commercial license. See bithuman.ai for pricing.