|
3 days ago | |
---|---|---|
.. | ||
docs | 4 days ago | |
main | 3 days ago | |
.dockerignore | 4 days ago | |
.gitignore | 4 days ago | |
.python-version | 4 days ago | |
Dockerfile-server | 4 days ago | |
Dockerfile-web | 4 days ago | |
LICENSE | 4 days ago | |
README.md | 4 days ago | |
README_en.md | 4 days ago | |
docker-setup.sh | 4 days ago | |
main.py | 4 days ago | |
opus.dll | 4 days ago |
README_en.md
Xiaozhi Backend Service xiaozhi-esp32-server
This project provides backend services for the open-source smart hardware project
xiaozhi-esp32
Implemented using Python, Java, and Vue according to the Xiaozhi Communication Protocol
Helps you quickly set up your Xiaozhi server
中文 · FAQ · Report Issues · Deployment Guide · Release Notes
Target Users 👥
This project requires ESP32 hardware devices. If you have purchased ESP32-related hardware, successfully connected to Brother Xia's backend service, and want to set up your own xiaozhi-esp32
backend service, then this project is perfect for you.
Want to see it in action? Check out these videos 🎥
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
![]() |
Warning ⚠️
-
This project is open-source software. This software has no commercial relationship with any third-party API service providers (including but not limited to speech recognition, large models, speech synthesis, and other platforms) and does not provide any form of guarantee for their service quality or financial security. It is recommended that users prioritize service providers with relevant business licenses and carefully read their service agreements and privacy policies. This software does not host any account keys, does not participate in fund transfers, and does not bear the risk of recharge fund losses.
-
This project's functionality is not complete and has not passed network security testing. Please do not use it in production environments. If you deploy this project for learning in a public network environment, please ensure necessary protection measures are in place.
Deployment Documentation
This project provides two deployment methods. Please choose according to your specific needs:
🚀 Deployment Method Selection
Deployment Method | Features | Suitable Scenarios | Deployment Guide | Requirements | Video Tutorial |
---|---|---|---|---|---|
Simplified Installation | Smart dialogue, IOT functionality, data stored in configuration files | Low-configuration environment, no database needed | Docker Version / Source Code Deployment | 2 cores 4G if using FunASR , 2 cores 2G if using all APIs |
- |
Full Module Installation | Smart dialogue, IOT, OTA, Control Panel, data stored in database | Complete functionality experience | Docker Version / Source Code Deployment | 4 cores 8G if using FunASR , 2 cores 4G if using all APIs |
Local Source Code Startup Video Tutorial / Local Source Code Auto-Update Tutorial |
💡 Note: Below are the test platforms deployed with the latest code. You can flash and test if needed. Concurrent users: 6, data will be cleared daily
Control Panel Address: https://2662r3426b.vicp.fun
Service Test Tool: https://2662r3426b.vicp.fun/test/
OTA Interface Address: https://2662r3426b.vicp.fun/xiaozhi/ota/
Websocket Interface Address: wss://2662r3426b.vicp.fun/xiaozhi/v1/
🚩 Configuration Description and Recommendations
[!Note] The default configuration of this project is
Entry Level Free
settings. For better results, we recommend usingFull Streaming Configuration
.Since version
0.5.2
, this project supports full streaming throughout the entire lifecycle. Compared to versions before0.5
, response speed has improved by approximately2.5 seconds
Module Name | Entry Level Free Settings | Full Streaming Configuration |
---|---|---|
ASR(Speech Recognition) | FunASR(Local) | ✅DoubaoASR(Volcano Streaming Speech Recognition) |
LLM(Large Language Model) | ChatGLMLLM(Zhipu glm-4-flash) | ✅DoubaoLLM(Volcano doubao-1-5-pro-32k-250115) |
VLLM(Vision Large Model) | ChatGLMVLLM(Zhipu glm-4v-flash) | ✅ChatGLMVLLM(Zhipu glm-4v-flash) |
TTS(Speech Synthesis) | EdgeTTS(Microsoft Speech) | ✅HuoshanDoubleStreamTTS(Volcano Double Streaming Speech Synthesis) |
Intent(Intent Recognition) | function_call(Function Call) | ✅function_call(Function Call) |
Memory(Memory Function) | mem_local_short(Local Short-term Memory) | ✅mem_local_short(Local Short-term Memory) |
Feature List ✨
Implemented ✅
Feature Module | Description |
---|---|
Communication Protocol | Based on xiaozhi-esp32 protocol, implements data interaction through WebSocket |
Dialogue Interaction | Supports wake-up dialogue, manual dialogue, and real-time interruption. Auto-sleep after long periods of no dialogue |
Intent Recognition | Supports LLM intent recognition, function call, reducing hard-coded intent judgment |
Multi-language Recognition | Supports Mandarin, Cantonese, English, Japanese, Korean (default using FunASR) |
LLM Module | Supports flexible LLM module switching, default using ChatGLMLLM, can also use Ali Bailian, DeepSeek, Ollama, etc. |
TTS Module | Supports EdgeTTS (default), Volcano Engine Doubao TTS, and other TTS interfaces |
Memory Function | Supports ultra-long memory, local summary memory, and no memory modes |
IOT Function | Supports managing registered device IOT functionality, supports smart IoT control based on dialogue context |
Control Panel | Provides Web management interface, supports agent management, user management, system configuration, etc. |
In Development 🚧
To learn about specific development progress, click here
If you are a software developer, here is an Open Letter to Developers. Welcome to join!
Product Ecosystem 👬
Xiaozhi is an ecosystem. When using this product, you might also want to check out other excellent projects in this ecosystem
Project Name | Project Address | Project Description |
---|---|---|
Xiaozhi Android Client | xiaozhi-android-client | A Flutter-based Android and iOS voice dialogue application supporting real-time voice interaction and text dialogue. |
Xiaozhi PC Client | py-xiaozhi | This project provides a Python-based Xiaozhi AI client, allowing you to experience Xiaozhi AI's functionality through code even without physical hardware. |
Xiaozhi Java Server | xiaozhi-esp32-server-java | The Java version of Xiaozhi open-source backend service is a Java-based open-source project. It includes both frontend and backend services, aiming to provide users with a complete backend service solution. |
Supported Platforms/Components List 📋
LLM Language Models
Usage Method | Supported Platforms | Free Platforms |
---|---|---|
openai interface call | Ali Bailian, Volcano Engine Doubao, DeepSeek, Zhipu ChatGLM, Gemini | Zhipu ChatGLM, Gemini |
ollama interface call | Ollama | - |
dify interface call | Dify | - |
fastgpt interface call | Fastgpt | - |
coze interface call | Coze | - |
In fact, any LLM that supports openai interface calls can be integrated and used.
TTS Speech Synthesis
Usage Method | Supported Platforms | Free Platforms |
---|---|---|
API Call | EdgeTTS, Volcano Engine Doubao TTS, Tencent Cloud, Alibaba Cloud TTS, CosyVoiceSiliconflow, TTS302AI, CozeCnTTS, GizwitsTTS, ACGNTTS, OpenAITTS | EdgeTTS, CosyVoiceSiliconflow(partial) |
Local Service | FishSpeech, GPT_SOVITS_V2, GPT_SOVITS_V3, MinimaxTTS | FishSpeech, GPT_SOVITS_V2, GPT_SOVITS_V3, MinimaxTTS |
VAD Voice Activity Detection
Type | Platform Name | Usage Method | Pricing Model | Notes |
---|---|---|---|---|
VAD | SileroVAD | Local Usage | Free |
ASR Speech Recognition
Usage Method | Supported Platforms | Free Platforms |
---|---|---|
Local Usage | FunASR, SherpaASR | FunASR, SherpaASR |
API Call | DoubaoASR, FunASRServer, TencentASR, AliyunASR | FunASRServer |
Memory Storage
Type | Platform Name | Usage Method | Pricing Model | Notes |
---|---|---|---|---|
Memory | mem0ai | API Call | 1000 calls/month quota | |
Memory | mem_local_short | Local Summary | Free |
Intent Recognition
Type | Platform Name | Usage Method | Pricing Model | Notes |
---|---|---|---|---|
Intent | intent_llm | API Call | Based on LLM pricing | Uses large model for intent recognition, highly versatile |
Intent | function_call | API Call | Based on LLM pricing | Uses large model function calls for intent, fast and effective |
Acknowledgments 🙏
Logo | Project/Company | Description |
---|---|---|
![]() |
Bailing Voice Dialogue Robot | This project was inspired by Bailing Voice Dialogue Robot and implemented based on it |
![]() |
Tenclass | Thanks to Tenclass for establishing standard communication protocols, multi-device compatibility solutions, and high-concurrency scenario practices for the Xiaozhi ecosystem; providing full-chain technical documentation support for this project |
![]() |
Xuanfeng Technology | Thanks to Xuanfeng Technology for contributing the function call framework, MCP communication protocol, and plugin call mechanism implementation code, significantly improving front-end device (IoT) interaction efficiency and functional extensibility through standardized instruction scheduling system and dynamic expansion capabilities |
![]() |
Huiyuan Design | Thanks to Huiyuan Design for providing professional visual solutions for this project, empowering the product user experience with their design experience serving over a thousand enterprises |
![]() |
Xi'an Qinren Information Technology | Thanks to Xi'an Qinren Information Technology for deepening the visual system of this project, ensuring consistency and extensibility of the overall design style in multi-scenario applications |