achatbot
achatbot factory: create chat bots with llm (with tools), asr, tts, vad, ocr, object detection, etc.
:evergreen_tree: Project Structure
:herb: Features
demo
podcast: AI Podcast: https://podcast-997.pages.dev/ :)
# need GOOGLE_API_KEY in environment variables
# the default language is English
# website
python -m demo.content_parser_tts instruct-content-tts \
"https://en.wikipedia.org/wiki/Large_language_model"
python -m demo.content_parser_tts instruct-content-tts \
--role-tts-voices zh-CN-YunjianNeural \
--role-tts-voices zh-CN-XiaoxiaoNeural \
--language zh \
"https://en.wikipedia.org/wiki/Large_language_model"
# pdf
# https://web.stanford.edu/~jurafsky/slp3/ed3bookaug20_2024.pdf (600 pages is ok~ :)
python -m demo.content_parser_tts instruct-content-tts \
"/Users/wuyong/Desktop/Speech and Language Processing.pdf"
python -m demo.content_parser_tts instruct-content-tts \
--role-tts-voices zh-CN-YunjianNeural \
--role-tts-voices zh-CN-XiaoxiaoNeural \
--language zh \
"/Users/wuyong/Desktop/Speech and Language Processing.pdf"
cmd chat bots:
supported transport connectors:
chat bot processors:
aggregators (llm user/assistant messages),
ai_frameworks
realtime voice inference (RTVI),
transport:
ai processors: llm, tts, asr, etc.
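A purely illustrative sketch of the processor-pipeline idea above (frames flowing through chained ai processors such as asr -> llm -> tts behind a transport). All class names here are hypothetical placeholders, not achatbot's real processor API:

```python
# Hypothetical sketch only: not achatbot's actual processor classes.
import asyncio
from dataclasses import dataclass


@dataclass
class Frame:
    kind: str       # e.g. "audio" or "text"
    payload: object


class Processor:
    """One pipeline stage: consume a frame, yield zero or more frames."""
    async def process(self, frame: Frame):
        yield frame


class FakeASR(Processor):
    async def process(self, frame: Frame):
        if frame.kind == "audio":
            yield Frame("text", f"transcript of {frame.payload}")


class FakeLLM(Processor):
    async def process(self, frame: Frame):
        if frame.kind == "text":
            yield Frame("text", f"answer to: {frame.payload}")


class FakeTTS(Processor):
    async def process(self, frame: Frame):
        if frame.kind == "text":
            yield Frame("audio", f"<pcm for '{frame.payload}'>")


async def run_pipeline(frames, stages):
    # Push each input frame through every stage in order.
    for frame in frames:
        current = [frame]
        for stage in stages:
            nxt = []
            for f in current:
                async for out in stage.process(f):
                    nxt.append(out)
            current = nxt
        for out in current:
            print(out)


asyncio.run(run_pipeline([Frame("audio", "mic_chunk.wav")],
                         [FakeASR(), FakeLLM(), FakeTTS()]))
```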
core module:
local llm:
llama-cpp (support text,vision with function-call model)
fastdeploy:
tensorrt_llm:
sglang:
vllm:
transformers(manual, pipeline) (support text; vision,vision+image; speech,voice; vision+voice)
llm_transformers_manual_vision_llama
llm_transformers_manual_vision_molmo
llm_transformers_manual_vision_qwen
llm_transformers_manual_vision_deepseek
llm_transformers_manual_vision_janus_flow
llm_transformers_manual_vision_janus
llm_transformers_manual_vision_smolvlm
llm_transformers_manual_vision_gemma
llm_transformers_manual_vision_fastvlm
llm_transformers_manual_vision_kimi
llm_transformers_manual_vision_mimo
llm_transformers_manual_vision_keye
llm_transformers_manual_vision_glm4v
llm_transformers_manual_vision_skyworkr1v
llm_transformers_manual_image_janus_flow
llm_transformers_manual_image_janus
llm_transformers_manual_speech_llasa
llm_transformers_manual_speech_step
llm_transformers_manual_voice_glm
llm_transformers_manual_vision_voice_minicpmo, llm_transformers_manual_voice_minicpmo,llm_transformers_manual_audio_minicpmo,llm_transformers_manual_text_speech_minicpmo,llm_transformers_manual_instruct_speech_minicpmo,llm_transformers_manual_vision_minicpmo
llm_transformers_manual_qwen2_5omni, llm_transformers_manual_qwen2_5omni_audio_asr,llm_transformers_manual_qwen2_5omni_vision,llm_transformers_manual_qwen2_5omni_speech,llm_transformers_manual_qwen2_5omni_vision_voice,llm_transformers_manual_qwen2_5omni_text_voice,llm_transformers_manual_qwen2_5omni_audio_voice
llm_transformers_manual_kimi_voice,llm_transformers_manual_kimi_audio_asr,llm_transformers_manual_kimi_text_voice
llm_transformers_manual_vita_text llm_transformers_manual_vita_audio_asr llm_transformers_manual_vita_tts llm_transformers_manual_vita_text_voice llm_transformers_manual_vita_voice
llm_transformers_manual_phi4_vision_speech,llm_transformers_manual_phi4_audio_asr,llm_transformers_manual_phi4_audio_translation,llm_transformers_manual_phi4_vision,llm_transformers_manual_phi4_audio_chat
llm_transformers_manual_vision_speech_gemma3n,llm_transformers_manual_vision_gemma3n,llm_transformers_manual_gemma3n_audio_asr,llm_transformers_manual_gemma3n_audio_translation
llm_transformers_manual_voice_step2
remote api llm: personal-ai (OpenAI-compatible API, other ai providers)
AI modules:
functions:
speech:
vision
gen modules config (*.yaml, local/test/prod) from env vars in the .env file
you can also use HfArgumentParser to parse a module's args from the local command line
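For example, a minimal sketch of wiring a module's args to the command line with HfArgumentParser; the dataclass fields below are made-up examples, not achatbot's real module args:

```python
# Hypothetical example args; achatbot's real modules define their own fields.
from dataclasses import dataclass, field
from transformers import HfArgumentParser


@dataclass
class ASRArgs:
    model_name_or_path: str = field(default="FunAudioLLM/SenseVoiceSmall")
    language: str = field(default="zh")
    sample_rate: int = field(default=16000)


if __name__ == "__main__":
    parser = HfArgumentParser(ASRArgs)
    (asr_args,) = parser.parse_args_into_dataclasses()
    print(asr_args)
```

Run it as `python my_module.py --language en` (script name is just an example) to override defaults from the command line.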
deploy to cloud ☁️ serverless:
:sunflower: Service Deployment Architecture
UI (easy to deploy with GitHub Pages or similar)
Server Deploy (CD)
Install
[!NOTE]
python --version >= 3.10 (with asyncio tasks)
if you install achatbot[tts_openvoicev2], you also need to install melo-tts: pip install git+https://github.com/myshell-ai/MeloTTS.git
if other code runs a nested event loop alongside the achatbot lib, you need to add the following code (PS: cmd/bots/base.py already does this):
import nest_asyncio
nest_asyncio.apply()
[!TIP]
use uv + pip to install the required dependencies quickly, e.g.:
uv pip install achatbot
uv pip install "achatbot[fastapi_bot_server]"
pypi
python3 -m venv .venv_achatbot
source .venv_achatbot/bin/activate
pip install achatbot
pip install "achatbot[fastapi_bot_server]"
local
git clone --recursive https://github.com/ai-bot-pro/chat-bot.git
cd chat-bot
python3 -m venv .venv_achatbot
source .venv_achatbot/bin/activate
bash scripts/pypi_achatbot.sh dev
pip install "dist/achatbot-{$version }-py3-none-any.whl[fastapi_bot_server]"
run local lite avatar chat bot
# install dependencies (replace $version) (if using the CPU (default), install lite_avatar)
pip install "dist/achatbot-{$version}-py3-none-any.whl[fastapi_bot_server,livekit,livekit-api,daily,agora,silero_vad_analyzer,sense_voice_asr,openai_llm_processor,google_llm_processor,litellm_processor,together_ai,tts_edge,lite_avatar]"
# install dependencies (replace $version) (if using a GPU (CUDA), install lite_avatar_gpu)
pip install "dist/achatbot-{$version}-py3-none-any.whl[fastapi_bot_server,livekit,livekit-api,daily,agora,silero_vad_analyzer,sense_voice_asr,openai_llm_processor,google_llm_processor,litellm_processor,together_ai,tts_edge,lite_avatar_gpu]"
# download model weights
huggingface-cli download weege007/liteavatar --local-dir ./models/weege007/liteavatar
huggingface-cli download FunAudioLLM/SenseVoiceSmall --local-dir ./models/FunAudioLLM/SenseVoiceSmall
# run local lite-avatar chat bot
python -m src.cmd.bots.main -f config/bots/daily_liteavatar_echo_bot.json
python -m src.cmd.bots.main -f config/bots/daily_liteavatar_chat_bot.json
More details: https://github.com/ai-bot-pro/achatbot/pull/161
run local lam_audio2expression avatar chat bot
# install dependencies (replace $version)
pip install "dist/achatbot-{$version}-py3-none-any.whl[fastapi_bot_server,silero_vad_analyzer,sense_voice_asr,openai_llm_processor,google_llm_processor,litellm_processor,together_ai,tts_edge,lam_audio2expression_avatar]"
pip install spleeter==2.4.2
pip install typing_extensions==4.14.0 aiortc==1.13.0 transformers==4.36.2 protobuf==5.29.4
# download model weights
wget https://virutalbuy-public.oss-cn-hangzhou.aliyuncs.com/share/aigc3d/data/LAM/LAM_audio2exp_streaming.tar -P ./models/LAM_audio2exp/
tar -xzvf ./models/LAM_audio2exp/LAM_audio2exp_streaming.tar -C ./models/LAM_audio2exp && rm ./models/LAM_audio2exp/LAM_audio2exp_streaming.tar
git clone --depth 1 https://www.modelscope.cn/AI-ModelScope/wav2vec2-base-960h.git ./models/facebook/wav2vec2-base-960h
huggingface-cli download FunAudioLLM/SenseVoiceSmall --local-dir ./models/FunAudioLLM/SenseVoiceSmall
# run http signaling service + webrtc + websocket local lam_audio2expression-avatar chat bot
python -m src.cmd.webrtc_websocket.fastapi_ws_signaling_bot_serve -f config/bots/small_webrtc_fastapi_websocket_avatar_echo_bot.json
python -m src.cmd.webrtc_websocket.fastapi_ws_signaling_bot_serve -f config/bots/small_webrtc_fastapi_websocket_avatar_chat_bot.json
# run http signaling service + webrtc + websocket voice avatar agent web ui
cd ui/webrtc_websocket/lam_audio2expression_avatar_ts && npm install && npm run dev
# run websocket signaling service + webrtc + websocket local lam_audio2expression-avatar chat bot
python -m src.cmd.webrtc_websocket.fastapi_ws_signaling_bot_serve_v2 -f config/bots/small_webrtc_fastapi_websocket_avatar_echo_bot.json
python -m src.cmd.webrtc_websocket.fastapi_ws_signaling_bot_serve_v2 -f config/bots/small_webrtc_fastapi_websocket_avatar_chat_bot.json
# run websocket signaling service + webrtc + websocket voice avatar agent web ui
cd ui/webrtc_websocket/lam_audio2expression_avatar_ts_v2 && npm install && npm run dev
More details: https://github.com/ai-bot-pro/achatbot/pull/164 | online lam_audio2expression avatar: https://avatar-2lm.pages.dev/
HTTP signaling service + webrtc + websocket transports I/O bridge:
Websocket signaling service + webrtc + websocket transports I/O bridge:
Websocket signaling service + websocket + webrtc-queue transports I/O bridge:
Local/Global Scheduler + webrtc-queue bots:
Run chat bots
:memo: Run chat bots with colab notebook
- daily_bot / livekit_bot / agora_bot
  - modules: agora_channel_audio_stream | daily_room_audio_stream | livekit_room_audio_stream, sense_voice_asr, groq | together api llm(text), tts_edge
  - hardware: CPU (free, 2 cores)
  - pipeline: daily | livekit room in stream -> silero (vad) -> sense_voice (asr) -> groq | together (llm) -> edge (tts) -> daily | livekit room out stream
- generate_audio2audio (remote_queue_chat_bot_be_worker)
  - hardware: T4 (free)
  - pipeline: pyaudio in stream -> silero (vad) -> sense_voice (asr) -> qwen (llm) -> cosy_voice (tts) -> pyaudio out stream
- daily_describe_vision_tools_bot / livekit_describe_vision_tools_bot / agora_describe_vision_tools_bot
  - modules: daily_room_audio_stream | livekit_room_audio_stream, deepgram_asr, google_gemini, tts_edge
  - hardware: CPU (free, 2 cores)
  - pipeline: daily | livekit room in stream -> silero (vad) -> deepgram (asr) -> google gemini -> edge (tts) -> daily | livekit room out stream
- daily_describe_vision_bot / livekit_describe_vision_bot / agora_describe_vision_bot
  - modules: daily_room_audio_stream | livekit_room_audio_stream, sense_voice_asr, llm_transformers_manual_vision_qwen, tts_edge
  - notebooks: achatbot_vision_qwen_vl.ipynb, achatbot_vision_janus.ipynb, achatbot_vision_minicpmo.ipynb, achatbot_kimivl.ipynb, achatbot_phi4_multimodal.ipynb
  - hardware: Qwen2-VL-2B-Instruct T4 (free); Qwen2-VL-7B-Instruct L4; Llama-3.2-11B-Vision-Instruct L4; allenai/Molmo-7B-D-0924 A100
  - pipeline: daily | livekit room in stream -> silero (vad) -> sense_voice (asr) -> qwen-vl (llm) -> edge (tts) -> daily | livekit room out stream
- daily_chat_vision_bot / livekit_chat_vision_bot / agora_chat_vision_bot
  - modules: daily_room_audio_stream | livekit_room_audio_stream, sense_voice_asr, llm_transformers_manual_vision_qwen, tts_edge
  - hardware: Qwen2-VL-2B-Instruct T4 (free); Qwen2-VL-7B-Instruct L4; Llama-3.2-11B-Vision-Instruct L4; allenai/Molmo-7B-D-0924 A100
  - pipeline: daily | livekit room in stream -> silero (vad) -> sense_voice (asr) -> llm answer guide qwen-vl (llm) -> edge (tts) -> daily | livekit room out stream
- daily_chat_tools_vision_bot / livekit_chat_tools_vision_bot / agora_chat_tools_vision_bot
  - modules: daily_room_audio_stream | livekit_room_audio_stream, sense_voice_asr, groq api llm(text), tools: llm_transformers_manual_vision_qwen, tts_edge
  - hardware: Qwen2-VL-2B-Instruct T4 (free); Qwen2-VL-7B-Instruct L4; Llama-3.2-11B-Vision-Instruct L4; allenai/Molmo-7B-D-0924 A100
  - pipeline: daily | livekit room in stream -> silero (vad) -> sense_voice (asr) -> llm with tools qwen-vl -> edge (tts) -> daily | livekit room out stream
- daily_annotate_vision_bot / livekit_annotate_vision_bot / agora_annotate_vision_bot
  - modules: daily_room_audio_stream | livekit_room_audio_stream, vision_yolo_detector, tts_edge
  - hardware: T4 (free)
  - pipeline: daily | livekit room in stream -> vision_yolo_detector -> edge (tts) -> daily | livekit room out stream
- daily_detect_vision_bot / livekit_detect_vision_bot / agora_detect_vision_bot
  - modules: daily_room_audio_stream | livekit_room_audio_stream, vision_yolo_detector, tts_edge
  - hardware: T4 (free)
  - pipeline: daily | livekit room in stream -> vision_yolo_detector -> edge (tts) -> daily | livekit room out stream
- daily_ocr_vision_bot / livekit_ocr_vision_bot / agora_ocr_vision_bot
  - modules: daily_room_audio_stream | livekit_room_audio_stream, sense_voice_asr, vision_transformers_got_ocr, tts_edge
  - hardware: T4 (free)
  - pipeline: daily | livekit room in stream -> silero (vad) -> sense_voice (asr) -> vision_transformers_got_ocr -> edge (tts) -> daily | livekit room out stream
- daily_month_narration_bot
  - modules: daily_room_audio_stream, groq | together api llm(text), hf_sd | together api (image), tts_edge
  - hardware: when using an sd model with diffusers: T4 (free) cpu+cuda (slow), L4 cpu+cuda, A100 all cuda
  - pipeline: daily room in stream -> together (llm) -> hf sd gen image model -> edge (tts) -> daily room out stream
- daily_storytelling_bot
  - modules: daily_room_audio_stream, groq | together api llm(text), hf_sd | together api (image), tts_edge
  - hardware: cpu (2 cores); when using an sd model with diffusers: T4 (free) cpu+cuda (slow), L4 cpu+cuda, A100 all cuda
  - pipeline: daily room in stream -> together (llm) -> hf sd gen image model -> edge (tts) -> daily room out stream
- websocket_server_bot / fastapi_websocket_server_bot
  - modules: websocket_server, sense_voice_asr, groq | together api llm(text), tts_edge
  - hardware: cpu (2 cores)
  - pipeline: websocket protocol in stream -> silero (vad) -> sense_voice (asr) -> together (llm) -> edge (tts) -> websocket protocol out stream
- daily_natural_conversation_bot
  - modules: daily_room_audio_stream, sense_voice_asr, groq | together api llm(NLP task), gemini-1.5-flash (chat), tts_edge
  - hardware: cpu (2 cores)
  - pipeline: daily room in stream -> together (llm NLP task) -> gemini-1.5-flash model (chat) -> edge (tts) -> daily room out stream
- fastapi_websocket_moshi_bot
  - modules: websocket_server, moshi opus stream voice llm
  - hardware: L4/A100
  - pipeline: websocket protocol in stream -> silero (vad) -> moshi opus stream voice llm -> websocket protocol out stream
- daily_asr_glm_voice_bot / daily_glm_voice_bot
  - modules: daily_room_audio_stream, glm voice llm
  - hardware: T4/L4/A100
  - pipeline: daily room in stream -> glm4-voice -> daily room out stream
- daily_freeze_omni_voice_bot
  - modules: daily_room_audio_stream, freezeOmni voice llm
  - hardware: L4/A100
  - pipeline: daily room in stream -> freezeOmni-voice -> daily room out stream
- daily_asr_minicpmo_voice_bot / daily_minicpmo_voice_bot / daily_minicpmo_vision_voice_bot
  - modules: daily_room_audio_stream, minicpmo llm
  - hardware: T4: MiniCPM-o-2_6-int4; L4/A100: MiniCPM-o-2_6
  - pipeline: daily room in stream -> minicpmo -> daily room out stream
- livekit_asr_qwen2_5omni_voice_bot / livekit_qwen2_5omni_voice_bot / livekit_qwen2_5omni_vision_voice_bot
  - modules: livekit_room_audio_stream, qwen2.5omni llm
  - hardware: A100
  - pipeline: livekit room in stream -> qwen2.5omni -> livekit room out stream
- livekit_asr_kimi_voice_bot / livekit_kimi_voice_bot
  - modules: livekit_room_audio_stream, kimi audio llm
  - hardware: A100
  - pipeline: livekit room in stream -> Kimi-Audio -> livekit room out stream
- livekit_asr_vita_voice_bot / livekit_vita_voice_bot
  - modules: livekit_room_audio_stream, vita audio llm
  - hardware: L4/A100
  - pipeline: livekit room in stream -> VITA-Audio -> livekit room out stream
- daily_phi4_voice_bot / daily_phi4_vision_speech_bot
  - modules: daily_room_audio_stream, phi4-multimodal llm
  - hardware: L4/A100
  - pipeline: daily room in stream -> phi4-multimodal -> edge (tts) -> daily room out stream
- daliy_multi_mcp_bot / livekit_multi_mcp_bot / agora_multi_mcp_bot
  - modules: agora_channel_audio_stream | daily_room_audio_stream | livekit_room_audio_stream, sense_voice_asr, groq | together api llm(text), mcp, tts_edge
  - hardware: CPU (free, 2 cores)
  - pipeline: agora | daily | livekit room in stream -> silero (vad) -> sense_voice (asr) -> groq | together (llm) -> mcp server tools -> edge (tts) -> daily | livekit room out stream
- daily_liteavatar_chat_bot / daily_liteavatar_echo_bot / livekit_musetalk_chat_bot / livekit_musetalk_echo_bot
  - modules: agora_channel_audio_stream | daily_room_audio_stream | livekit_room_audio_stream, sense_voice_asr, groq | together api llm(text), tts_edge, avatar
  - notebooks: achatbot_avatar_musetalk.ipynb
  - hardware: CPU/T4/L4
  - pipeline: agora | daily | livekit room in stream -> silero (vad) -> sense_voice (asr) -> groq | together (llm) -> edge (tts) -> avatar -> daily | livekit room out stream
:new_moon: Run local chat bots
[!NOTE]
run pip install "achatbot[local_terminal_chat_bot]"
to install dependencies to run local terminal chat bot;
create achatbot data dir in $HOME
dir mkdir -p ~/.achatbot/{log,config,models,records,videos}
;
cp .env.example .env
, and check .env
, add key/value env params;
select a model ckpt to download:
vad model ckpt (the default vad ckpt model is silero vad)
# vad pyannote segmentation ckpt
huggingface-cli download pyannote/segmentation-3.0 --local-dir ~/.achatbot/models/pyannote/segmentation-3.0 --local-dir-use-symlinks False
asr model ckpt (the default whisper ckpt model uses the base size)
# asr openai whisper ckpt
wget https://openaipublic.azureedge.net/main/whisper/models/ed3a0b6b1c0edf879ad9b11b1af5a0e6ab5db9205f891f668f8b0e6c6326e34e/base.pt -O ~/.achatbot/models/base.pt
# asr hf openai whisper ckpt for transformers pipeline to load
huggingface-cli download openai/whisper-base --local-dir ~/.achatbot/models/openai/whisper-base --local-dir-use-symlinks False
# asr hf faster whisper (CTranslate2)
huggingface-cli download Systran/faster-whisper-base --local-dir ~/.achatbot/models/Systran/faster-whisper-base --local-dir-use-symlinks False
# asr SenseVoice ckpt
huggingface-cli download FunAudioLLM/SenseVoiceSmall --local-dir ~/.achatbot/models/FunAudioLLM/SenseVoiceSmall --local-dir-use-symlinks False
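Optionally, a quick sanity check that the default whisper ckpt downloaded above loads (assumes pip install openai-whisper; the audio path below is only an example, any local wav works):

```python
# Load the base.pt checkpoint downloaded above and transcribe a local wav.
import os
import whisper  # pip install openai-whisper

model = whisper.load_model(os.path.expanduser("~/.achatbot/models/base.pt"))
result = model.transcribe("test/audio_files/asr_example_zh.wav")  # any local wav
print(result["text"])
```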
llm model ckpt (the default llamacpp ckpt (gguf) model is Qwen2-Instruct 1.5B)
# llm llamacpp Qwen2-Instruct
huggingface-cli download Qwen/Qwen2-1.5B-Instruct-GGUF qwen2-1_5b-instruct-q8_0.gguf --local-dir ~/.achatbot/models --local-dir-use-symlinks False
# llm llamacpp Qwen1.5-chat
huggingface-cli download Qwen/Qwen1.5-7B-Chat-GGUF qwen1_5-7b-chat-q8_0.gguf --local-dir ~/.achatbot/models --local-dir-use-symlinks False
# llm llamacpp phi-3-mini-4k-instruct
huggingface-cli download microsoft/Phi-3-mini-4k-instruct-gguf Phi-3-mini-4k-instruct-q4.gguf --local-dir ~/.achatbot/models --local-dir-use-symlinks False
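To verify the default llamacpp ckpt locally, a minimal llama-cpp-python sketch (assumes pip install llama-cpp-python; raise n_gpu_layers only if you built it with CUDA):

```python
# Load the Qwen2 1.5B instruct GGUF downloaded above and run one chat turn.
import os
from llama_cpp import Llama

llm = Llama(
    model_path=os.path.expanduser("~/.achatbot/models/qwen2-1_5b-instruct-q8_0.gguf"),
    n_ctx=2048,
    n_gpu_layers=0,  # CPU by default; e.g. 33 with CUDA (see N_GPU_LAYERS below)
)
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "你好，请用一句话介绍你自己"}]
)
print(out["choices"][0]["message"]["content"])
```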
tts model ckpt (download the ckpt for the tts engine you plan to use)
# tts chatTTS
huggingface-cli download 2Noise/ChatTTS --local-dir ~/.achatbot/models/2Noise/ChatTTS --local-dir-use-symlinks False
# tts coquiTTS
huggingface-cli download coqui/XTTS-v2 --local-dir ~/.achatbot/models/coqui/XTTS-v2 --local-dir-use-symlinks False
# tts cosy voice
git lfs install
git clone https://www.modelscope.cn/iic/CosyVoice-300M.git ~/.achatbot/models/CosyVoice-300M
git clone https://www.modelscope.cn/iic/CosyVoice-300M-SFT.git ~/.achatbot/models/CosyVoice-300M-SFT
git clone https://www.modelscope.cn/iic/CosyVoice-300M-Instruct.git ~/.achatbot/models/CosyVoice-300M-Instruct
#git clone https://www.modelscope.cn/iic/CosyVoice-ttsfrd.git ~/.achatbot/models/CosyVoice-ttsfrd
run the local terminal chat bot with env params; e.g.
use default env params to run the local chat bot
ACHATBOT_PKG=1 TQDM_DISABLE=True \
python -m achatbot.cmd.local-terminal-chat.generate_audio2audio > ~/.achatbot/log/std_out.log
:waxing_crescent_moon: Run remote http fastapi daily chat bots
run pip install "achatbot[fastapi_daily_bot_server]"
to install dependencies to run http fastapi daily chat bot;
run the cmd below to start the http server; see the api docs: http://0.0.0.0:4321/docs
ACHATBOT_PKG=1 python -m achatbot.cmd.http.server.fastapi_daily_bot_serve
run chat bot processor, e.g.
run a daily langchain rag bot api, with ui/educator-client
[!NOTE]
you need to process youtube audio and save it to a local file with pytube; run pip install "achatbot[pytube,deep_translator]" to install the dependencies,
then transcribe/translate the audio to text, chunk it into the vector store, and run the langchain rag bot api;
run data process:
ACHATBOT_PKG=1 python -m achatbot.cmd.bots.rag.data_process.youtube_audio_transcribe_to_tidb
or download the processed data from the hf dataset weege007/youtube_videos, then chunk it into the vector store.
curl -XPOST "http://0.0.0.0:4321/bot_join/chat-bot/DailyLangchainRAGBot" \
-H "Content-Type: application/json" \
-d $'{"config":{"llm":{"model":"llama-3.1-70b-versatile","messages":[{"role":"system","content":""}],"language":"zh"},"tts":{"tag":"cartesia_tts_processor","args":{"voice_id":"eda5bbff-1ff1-4886-8ef1-4e69a77640a0","language":"zh"}},"asr":{"tag":"deepgram_asr_processor","args":{"language":"zh","model":"nova-2"}}}}' | jq .
run a simple daily chat bot api, with ui/web-client-ui (default language: zh)
curl -XPOST "http://0.0.0.0:4321/bot_join/DailyBot" \
-H "Content-Type: application/json" \
-d '{}' | jq .
:first_quarter_moon: Run remote rpc chat bot worker
run pip install "achatbot[remote_rpc_chat_bot_be_worker]"
to install the dependencies to run the rpc chat bot BE worker; e.g.:
use default env params to run the rpc chat bot BE worker
ACHATBOT_PKG=1 RUN_OP=be TQDM_DISABLE=True \
TTS_TAG=tts_edge \
python -m achatbot.cmd.grpc.terminal-chat.generate_audio2audio > ~/.achatbot/log/be_std_out.log
run pip install "achatbot[remote_rpc_chat_bot_fe]"
to install dependencies to run rpc chat bot FE;
ACHATBOT_PKG=1 RUN_OP=fe \
TTS_TAG=tts_edge \
python -m achatbot.cmd.grpc.terminal-chat.generate_audio2audio > ~/.achatbot/log/fe_std_out.log
:waxing_gibbous_moon: Run remote queue chat bot worker
run pip install "achatbot[remote_queue_chat_bot_be_worker]"
to install dependencies to run queue chat bot worker; e.g.:
use default env params to run
ACHATBOT_PKG=1 REDIS_PASSWORD=$redis_pwd RUN_OP=be TQDM_DISABLE=True \
python -m achatbot.cmd.remote-queue-chat.generate_audio2audio > ~/.achatbot/log/be_std_out.log
sense_voice(asr) -> qwen (llm) -> cosy_voice (tts)
you can log in to redislabs and create a free 30MB database; set REDIS_HOST, REDIS_PORT and REDIS_PASSWORD to run, e.g.:
ACHATBOT_PKG=1 RUN_OP=be \
TQDM_DISABLE=True \
REDIS_PASSWORD=$redis_pwd \
REDIS_HOST=redis-14241.c256.us-east-1-2.ec2.redns.redis-cloud.com \
REDIS_PORT=14241 \
ASR_TAG=sense_voice_asr \
ASR_LANG=zn \
ASR_MODEL_NAME_OR_PATH=~/.achatbot/models/FunAudioLLM/SenseVoiceSmall \
N_GPU_LAYERS=33 FLASH_ATTN=1 \
LLM_MODEL_NAME=qwen \
LLM_MODEL_PATH=~/.achatbot/models/qwen1_5-7b-chat-q8_0.gguf \
TTS_TAG=tts_cosy_voice \
python -m achatbot.cmd.remote-queue-chat.generate_audio2audio > ~/.achatbot/log/be_std_out.log
run pip install "achatbot[remote_queue_chat_bot_fe]"
to install the required packages to run the queue chat bot frontend; e.g.:
use default env params to run (default vad_recorder)
ACHATBOT_PKG=1 RUN_OP=fe \
REDIS_PASSWORD=$redis_pwd \
REDIS_HOST=redis-14241.c256.us-east-1-2.ec2.redns.redis-cloud.com \
REDIS_PORT=14241 \
python -m achatbot.cmd.remote-queue-chat.generate_audio2audio > ~/.achatbot/log/fe_std_out.log
ACHATBOT_PKG=1 RUN_OP=fe \
REDIS_PASSWORD=$redis_pwd \
REDIS_HOST=redis-14241.c256.us-east-1-2.ec2.redns.redis-cloud.com \
REDIS_PORT=14241 \
RECORDER_TAG=wakeword_rms_recorder \
python -m achatbot.cmd.remote-queue-chat.generate_audio2audio > ~/.achatbot/log/fe_std_out.log
the default pyaudio player stream uses the tts tag's output sample info (rate, channels, ...), e.g. (the BE uses tts_cosy_voice's out stream info; see the sketch after these commands):
ACHATBOT_PKG=1 RUN_OP=fe \
REDIS_PASSWORD=$redis_pwd \
REDIS_HOST=redis-14241.c256.us-east-1-2.ec2.redns.redis-cloud.com \
REDIS_PORT=14241 \
RUN_OP=fe \
TTS_TAG=tts_cosy_voice \
python -m achatbot.cmd.remote-queue-chat.generate_audio2audio > ~/.achatbot/log/fe_std_out.log
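A rough sketch of what that default pyaudio player does: open an output stream whose rate/channels come from the TTS engine's stream info. The 22050 Hz mono int16 values here are placeholder assumptions, not necessarily tts_cosy_voice's real output format:

```python
# Open a pyaudio output stream matching the (assumed) TTS output format
# and play one second of silence as a stand-in for TTS audio chunks.
import pyaudio

RATE, CHANNELS, FMT = 22050, 1, pyaudio.paInt16  # assumed values for illustration

p = pyaudio.PyAudio()
stream = p.open(format=FMT, channels=CHANNELS, rate=RATE, output=True)
stream.write(b"\x00\x00" * RATE * CHANNELS)  # silent int16 frames
stream.stop_stream()
stream.close()
p.terminate()
```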
remote_queue_chat_bot_be_worker in colab examples:
sense_voice(asr) -> qwen (llm) -> cosy_voice (tts)
:full_moon: Run remote grpc tts speaker bot
run pip install "achatbot[remote_grpc_tts_server]"
to install dependencies to run grpc tts speaker bot server;
ACHATBOT_PKG=1 python -m achatbot.cmd.grpc.speaker.server.serve
run pip install "achatbot[remote_grpc_tts_client]"
to install dependencies to run grpc tts speaker bot client;
ACHATBOT_PKG=1 TTS_TAG=tts_edge IS_RELOAD=1 python -m src.cmd.grpc.speaker.client
ACHATBOT_PKG=1 TTS_TAG=tts_g IS_RELOAD=1 python -m src.cmd.grpc.speaker.client
ACHATBOT_PKG=1 TTS_TAG=tts_coqui IS_RELOAD=1 python -m src.cmd.grpc.speaker.client
ACHATBOT_PKG=1 TTS_TAG=tts_chat IS_RELOAD=1 python -m src.cmd.grpc.speaker.client
ACHATBOT_PKG=1 TTS_TAG=tts_cosy_voice IS_RELOAD=1 python -m src.cmd.grpc.speaker.client
ACHATBOT_PKG=1 TTS_TAG=tts_fishspeech IS_RELOAD=1 python -m src.cmd.grpc.speaker.client
ACHATBOT_PKG=1 TTS_TAG=tts_f5 IS_RELOAD=1 python -m src.cmd.grpc.speaker.client
ACHATBOT_PKG=1 TTS_TAG=tts_openvoicev2 IS_RELOAD=1 python -m src.cmd.grpc.speaker.client
ACHATBOT_PKG=1 TTS_TAG=tts_kokoro IS_RELOAD=1 python -m src.cmd.grpc.speaker.client
ACHATBOT_PKG=1 TTS_TAG=tts_onnx_kokoro IS_RELOAD=1 KOKORO_ESPEAK_NG_LIB_PATH=/usr/local/lib/libespeak-ng.1.dylib KOKORO_LANGUAGE=cmn python -m src.cmd.grpc.speaker.client
ACHATBOT_PKG=1 TTS_TAG=tts_cosy_voice2 \
COSY_VOICE_MODELS_DIR=./models/FunAudioLLM/CosyVoice2-0.5B \
COSY_VOICE_REFERENCE_AUDIO_PATH=./test/audio_files/asr_example_zh.wav \
IS_RELOAD=1 python -m src.cmd.grpc.speaker.client
:video_camera: Multimodal Interaction
audio (voice)
vision (CV)
stream-ocr (realtime-object-detection)
more
Embodied Intelligence: Robots that touch the world, perceive and move
License
achatbot is released under the BSD 3-Clause license. (Additional code in this distribution is covered by the MIT and Apache open source licenses.) However, you may have other legal obligations that govern your use of content, such as the terms of service for third-party models.