You can not select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
HuangHai 5f6c6ecce7
'commit'
3 days ago
..
docs 'commit' 4 days ago
main 'commit' 3 days ago
.dockerignore 'commit' 4 days ago
.gitignore 'commit' 4 days ago
.python-version 'commit' 4 days ago
Dockerfile-server 'commit' 4 days ago
Dockerfile-web 'commit' 4 days ago
LICENSE 'commit' 4 days ago
README.md 'commit' 4 days ago
README_en.md 'commit' 4 days ago
docker-setup.sh 'commit' 4 days ago
main.py 'commit' 4 days ago
opus.dll 'commit' 4 days ago

README_en.md

Banners

Xiaozhi Backend Service xiaozhi-esp32-server

This project provides backend services for the open-source smart hardware project xiaozhi-esp32
Implemented using Python, Java, and Vue according to the Xiaozhi Communication Protocol
Helps you quickly set up your Xiaozhi server

中文 · FAQ · Report Issues · Deployment Guide · Release Notes

GitHub Contributors GitHub Contributors Issues GitHub pull requests GitHub pull requests stars


Target Users 👥

This project requires ESP32 hardware devices. If you have purchased ESP32-related hardware, successfully connected to Brother Xia's backend service, and want to set up your own xiaozhi-esp32 backend service, then this project is perfect for you.

Want to see it in action? Check out these videos 🎥

Xiaozhi esp32 connecting to own backend model Custom voice Using Cantonese Control home appliances Lowest cost configuration
Custom voice Play music Weather plugin IOT command control News broadcast
Real-time interruption Photo recognition Multi-command tasks

Warning ⚠️

  1. This project is open-source software. This software has no commercial relationship with any third-party API service providers (including but not limited to speech recognition, large models, speech synthesis, and other platforms) and does not provide any form of guarantee for their service quality or financial security. It is recommended that users prioritize service providers with relevant business licenses and carefully read their service agreements and privacy policies. This software does not host any account keys, does not participate in fund transfers, and does not bear the risk of recharge fund losses.

  2. This project's functionality is not complete and has not passed network security testing. Please do not use it in production environments. If you deploy this project for learning in a public network environment, please ensure necessary protection measures are in place.


Deployment Documentation

Banners

This project provides two deployment methods. Please choose according to your specific needs:

🚀 Deployment Method Selection

Deployment Method Features Suitable Scenarios Deployment Guide Requirements Video Tutorial
Simplified Installation Smart dialogue, IOT functionality, data stored in configuration files Low-configuration environment, no database needed Docker Version / Source Code Deployment 2 cores 4G if using FunASR, 2 cores 2G if using all APIs -
Full Module Installation Smart dialogue, IOT, OTA, Control Panel, data stored in database Complete functionality experience Docker Version / Source Code Deployment 4 cores 8G if using FunASR, 2 cores 4G if using all APIs Local Source Code Startup Video Tutorial / Local Source Code Auto-Update Tutorial

💡 Note: Below are the test platforms deployed with the latest code. You can flash and test if needed. Concurrent users: 6, data will be cleared daily

Control Panel Address: https://2662r3426b.vicp.fun

Service Test Tool: https://2662r3426b.vicp.fun/test/
OTA Interface Address: https://2662r3426b.vicp.fun/xiaozhi/ota/
Websocket Interface Address: wss://2662r3426b.vicp.fun/xiaozhi/v1/

🚩 Configuration Description and Recommendations

[!Note] The default configuration of this project is Entry Level Free settings. For better results, we recommend using Full Streaming Configuration.

Since version 0.5.2, this project supports full streaming throughout the entire lifecycle. Compared to versions before 0.5, response speed has improved by approximately 2.5 seconds

Module Name Entry Level Free Settings Full Streaming Configuration
ASR(Speech Recognition) FunASR(Local) DoubaoASR(Volcano Streaming Speech Recognition)
LLM(Large Language Model) ChatGLMLLM(Zhipu glm-4-flash) DoubaoLLM(Volcano doubao-1-5-pro-32k-250115)
VLLM(Vision Large Model) ChatGLMVLLM(Zhipu glm-4v-flash) ChatGLMVLLM(Zhipu glm-4v-flash)
TTS(Speech Synthesis) EdgeTTS(Microsoft Speech) HuoshanDoubleStreamTTS(Volcano Double Streaming Speech Synthesis)
Intent(Intent Recognition) function_call(Function Call) function_call(Function Call)
Memory(Memory Function) mem_local_short(Local Short-term Memory) mem_local_short(Local Short-term Memory)

Feature List

Implemented

Feature Module Description
Communication Protocol Based on xiaozhi-esp32 protocol, implements data interaction through WebSocket
Dialogue Interaction Supports wake-up dialogue, manual dialogue, and real-time interruption. Auto-sleep after long periods of no dialogue
Intent Recognition Supports LLM intent recognition, function call, reducing hard-coded intent judgment
Multi-language Recognition Supports Mandarin, Cantonese, English, Japanese, Korean (default using FunASR)
LLM Module Supports flexible LLM module switching, default using ChatGLMLLM, can also use Ali Bailian, DeepSeek, Ollama, etc.
TTS Module Supports EdgeTTS (default), Volcano Engine Doubao TTS, and other TTS interfaces
Memory Function Supports ultra-long memory, local summary memory, and no memory modes
IOT Function Supports managing registered device IOT functionality, supports smart IoT control based on dialogue context
Control Panel Provides Web management interface, supports agent management, user management, system configuration, etc.

In Development 🚧

To learn about specific development progress, click here

If you are a software developer, here is an Open Letter to Developers. Welcome to join!


Product Ecosystem 👬

Xiaozhi is an ecosystem. When using this product, you might also want to check out other excellent projects in this ecosystem

Project Name Project Address Project Description
Xiaozhi Android Client xiaozhi-android-client A Flutter-based Android and iOS voice dialogue application supporting real-time voice interaction and text dialogue.
Xiaozhi PC Client py-xiaozhi This project provides a Python-based Xiaozhi AI client, allowing you to experience Xiaozhi AI's functionality through code even without physical hardware.
Xiaozhi Java Server xiaozhi-esp32-server-java The Java version of Xiaozhi open-source backend service is a Java-based open-source project.
It includes both frontend and backend services, aiming to provide users with a complete backend service solution.

Supported Platforms/Components List 📋

LLM Language Models

Usage Method Supported Platforms Free Platforms
openai interface call Ali Bailian, Volcano Engine Doubao, DeepSeek, Zhipu ChatGLM, Gemini Zhipu ChatGLM, Gemini
ollama interface call Ollama -
dify interface call Dify -
fastgpt interface call Fastgpt -
coze interface call Coze -

In fact, any LLM that supports openai interface calls can be integrated and used.

TTS Speech Synthesis

Usage Method Supported Platforms Free Platforms
API Call EdgeTTS, Volcano Engine Doubao TTS, Tencent Cloud, Alibaba Cloud TTS, CosyVoiceSiliconflow, TTS302AI, CozeCnTTS, GizwitsTTS, ACGNTTS, OpenAITTS EdgeTTS, CosyVoiceSiliconflow(partial)
Local Service FishSpeech, GPT_SOVITS_V2, GPT_SOVITS_V3, MinimaxTTS FishSpeech, GPT_SOVITS_V2, GPT_SOVITS_V3, MinimaxTTS

VAD Voice Activity Detection

Type Platform Name Usage Method Pricing Model Notes
VAD SileroVAD Local Usage Free

ASR Speech Recognition

Usage Method Supported Platforms Free Platforms
Local Usage FunASR, SherpaASR FunASR, SherpaASR
API Call DoubaoASR, FunASRServer, TencentASR, AliyunASR FunASRServer

Memory Storage

Type Platform Name Usage Method Pricing Model Notes
Memory mem0ai API Call 1000 calls/month quota
Memory mem_local_short Local Summary Free

Intent Recognition

Type Platform Name Usage Method Pricing Model Notes
Intent intent_llm API Call Based on LLM pricing Uses large model for intent recognition, highly versatile
Intent function_call API Call Based on LLM pricing Uses large model function calls for intent, fast and effective

Acknowledgments 🙏

Logo Project/Company Description
Bailing Voice Dialogue Robot This project was inspired by Bailing Voice Dialogue Robot and implemented based on it
Tenclass Thanks to Tenclass for establishing standard communication protocols, multi-device compatibility solutions, and high-concurrency scenario practices for the Xiaozhi ecosystem; providing full-chain technical documentation support for this project
Xuanfeng Technology Thanks to Xuanfeng Technology for contributing the function call framework, MCP communication protocol, and plugin call mechanism implementation code, significantly improving front-end device (IoT) interaction efficiency and functional extensibility through standardized instruction scheduling system and dynamic expansion capabilities
Huiyuan Design Thanks to Huiyuan Design for providing professional visual solutions for this project, empowering the product user experience with their design experience serving over a thousand enterprises
Xi'an Qinren Information Technology Thanks to Xi'an Qinren Information Technology for deepening the visual system of this project, ensuring consistency and extensibility of the overall design style in multi-scenario applications
Star History Chart