History

HuangHai 5f6c6ecce7 'commit'		3 days ago
..
docs	'commit'	4 days ago
main	'commit'	3 days ago
.dockerignore	'commit'	4 days ago
.gitignore	'commit'	4 days ago
.python-version	'commit'	4 days ago
Dockerfile-server	'commit'	4 days ago
Dockerfile-web	'commit'	4 days ago
LICENSE	'commit'	4 days ago
README.md	'commit'	4 days ago
README_en.md	'commit'	4 days ago
docker-setup.sh	'commit'	4 days ago
main.py	'commit'	4 days ago
opus.dll	'commit'	4 days ago

README_en.md

Xiaozhi Backend Service xiaozhi-esp32-server

This project provides backend services for the open-source smart hardware project xiaozhi-esp32
Implemented using Python, Java, and Vue according to the Xiaozhi Communication Protocol
Helps you quickly set up your Xiaozhi server

中文 · FAQ · Report Issues · Deployment Guide · Release Notes

Target Users 👥

This project requires ESP32 hardware devices. If you have purchased ESP32-related hardware, successfully connected to Brother Xia's backend service, and want to set up your own xiaozhi-esp32 backend service, then this project is perfect for you.

Want to see it in action? Check out these videos 🎥

Warning ⚠️

This project is open-source software. This software has no commercial relationship with any third-party API service providers (including but not limited to speech recognition, large models, speech synthesis, and other platforms) and does not provide any form of guarantee for their service quality or financial security. It is recommended that users prioritize service providers with relevant business licenses and carefully read their service agreements and privacy policies. This software does not host any account keys, does not participate in fund transfers, and does not bear the risk of recharge fund losses.
This project's functionality is not complete and has not passed network security testing. Please do not use it in production environments. If you deploy this project for learning in a public network environment, please ensure necessary protection measures are in place.

Deployment Documentation

This project provides two deployment methods. Please choose according to your specific needs:

🚀 Deployment Method Selection

Deployment Method	Features	Suitable Scenarios	Deployment Guide	Requirements	Video Tutorial
Simplified Installation	Smart dialogue, IOT functionality, data stored in configuration files	Low-configuration environment, no database needed	Docker Version / Source Code Deployment	2 cores 4G if using `FunASR`, 2 cores 2G if using all APIs	-
Full Module Installation	Smart dialogue, IOT, OTA, Control Panel, data stored in database	Complete functionality experience	Docker Version / Source Code Deployment	4 cores 8G if using `FunASR`, 2 cores 4G if using all APIs	Local Source Code Startup Video Tutorial / Local Source Code Auto-Update Tutorial

💡 Note: Below are the test platforms deployed with the latest code. You can flash and test if needed. Concurrent users: 6, data will be cleared daily

Control Panel Address: https://2662r3426b.vicp.fun

Service Test Tool: https://2662r3426b.vicp.fun/test/
OTA Interface Address: https://2662r3426b.vicp.fun/xiaozhi/ota/
Websocket Interface Address: wss://2662r3426b.vicp.fun/xiaozhi/v1/

🚩 Configuration Description and Recommendations

[!Note] The default configuration of this project is Entry Level Free settings. For better results, we recommend using Full Streaming Configuration.

Since version 0.5.2, this project supports full streaming throughout the entire lifecycle. Compared to versions before 0.5, response speed has improved by approximately 2.5 seconds

Module Name	Entry Level Free Settings	Full Streaming Configuration
ASR(Speech Recognition)	FunASR(Local)	✅DoubaoASR(Volcano Streaming Speech Recognition)
LLM(Large Language Model)	ChatGLMLLM(Zhipu glm-4-flash)	✅DoubaoLLM(Volcano doubao-1-5-pro-32k-250115)
VLLM(Vision Large Model)	ChatGLMVLLM(Zhipu glm-4v-flash)	✅ChatGLMVLLM(Zhipu glm-4v-flash)
TTS(Speech Synthesis)	EdgeTTS(Microsoft Speech)	✅HuoshanDoubleStreamTTS(Volcano Double Streaming Speech Synthesis)
Intent(Intent Recognition)	function_call(Function Call)	✅function_call(Function Call)
Memory(Memory Function)	mem_local_short(Local Short-term Memory)	✅mem_local_short(Local Short-term Memory)

Feature List ✨

Implemented ✅

Feature Module	Description
Communication Protocol	Based on `xiaozhi-esp32` protocol, implements data interaction through WebSocket
Dialogue Interaction	Supports wake-up dialogue, manual dialogue, and real-time interruption. Auto-sleep after long periods of no dialogue
Intent Recognition	Supports LLM intent recognition, function call, reducing hard-coded intent judgment
Multi-language Recognition	Supports Mandarin, Cantonese, English, Japanese, Korean (default using FunASR)
LLM Module	Supports flexible LLM module switching, default using ChatGLMLLM, can also use Ali Bailian, DeepSeek, Ollama, etc.
TTS Module	Supports EdgeTTS (default), Volcano Engine Doubao TTS, and other TTS interfaces
Memory Function	Supports ultra-long memory, local summary memory, and no memory modes
IOT Function	Supports managing registered device IOT functionality, supports smart IoT control based on dialogue context
Control Panel	Provides Web management interface, supports agent management, user management, system configuration, etc.

In Development 🚧

To learn about specific development progress, click here

If you are a software developer, here is an Open Letter to Developers. Welcome to join!

Product Ecosystem 👬

Xiaozhi is an ecosystem. When using this product, you might also want to check out other excellent projects in this ecosystem

Project Name	Project Address	Project Description
Xiaozhi Android Client	xiaozhi-android-client	A Flutter-based Android and iOS voice dialogue application supporting real-time voice interaction and text dialogue.
Xiaozhi PC Client	py-xiaozhi	This project provides a Python-based Xiaozhi AI client, allowing you to experience Xiaozhi AI's functionality through code even without physical hardware.
Xiaozhi Java Server	xiaozhi-esp32-server-java	The Java version of Xiaozhi open-source backend service is a Java-based open-source project. It includes both frontend and backend services, aiming to provide users with a complete backend service solution.

Supported Platforms/Components List 📋

LLM Language Models

Usage Method	Supported Platforms	Free Platforms
openai interface call	Ali Bailian, Volcano Engine Doubao, DeepSeek, Zhipu ChatGLM, Gemini	Zhipu ChatGLM, Gemini
ollama interface call	Ollama	-
dify interface call	Dify	-
fastgpt interface call	Fastgpt	-
coze interface call	Coze	-

In fact, any LLM that supports openai interface calls can be integrated and used.

TTS Speech Synthesis

Usage Method	Supported Platforms	Free Platforms
API Call	EdgeTTS, Volcano Engine Doubao TTS, Tencent Cloud, Alibaba Cloud TTS, CosyVoiceSiliconflow, TTS302AI, CozeCnTTS, GizwitsTTS, ACGNTTS, OpenAITTS	EdgeTTS, CosyVoiceSiliconflow(partial)
Local Service	FishSpeech, GPT_SOVITS_V2, GPT_SOVITS_V3, MinimaxTTS	FishSpeech, GPT_SOVITS_V2, GPT_SOVITS_V3, MinimaxTTS

VAD Voice Activity Detection

Type	Platform Name	Usage Method	Pricing Model	Notes
VAD	SileroVAD	Local Usage	Free

ASR Speech Recognition

Usage Method	Supported Platforms	Free Platforms
Local Usage	FunASR, SherpaASR	FunASR, SherpaASR
API Call	DoubaoASR, FunASRServer, TencentASR, AliyunASR	FunASRServer

Memory Storage

Type	Platform Name	Usage Method	Pricing Model	Notes
Memory	mem0ai	API Call	1000 calls/month quota
Memory	mem_local_short	Local Summary	Free

Intent Recognition

Type	Platform Name	Usage Method	Pricing Model	Notes
Intent	intent_llm	API Call	Based on LLM pricing	Uses large model for intent recognition, highly versatile
Intent	function_call	API Call	Based on LLM pricing	Uses large model function calls for intent, fast and effective

Acknowledgments 🙏

Logo	Project/Company	Description
	Bailing Voice Dialogue Robot	This project was inspired by Bailing Voice Dialogue Robot and implemented based on it
	Tenclass	Thanks to Tenclass for establishing standard communication protocols, multi-device compatibility solutions, and high-concurrency scenario practices for the Xiaozhi ecosystem; providing full-chain technical documentation support for this project
	Xuanfeng Technology	Thanks to Xuanfeng Technology for contributing the function call framework, MCP communication protocol, and plugin call mechanism implementation code, significantly improving front-end device (IoT) interaction efficiency and functional extensibility through standardized instruction scheduling system and dynamic expansion capabilities
	Huiyuan Design	Thanks to Huiyuan Design for providing professional visual solutions for this project, empowering the product user experience with their design experience serving over a thousand enterprises
	Xi'an Qinren Information Technology	Thanks to Xi'an Qinren Information Technology for deepening the visual system of this project, ensuring consistency and extensibility of the overall design style in multi-scenario applications