|
3 days ago | |
---|---|---|
.. | ||
manager-api | 4 days ago | |
manager-web | 4 days ago | |
xiaozhi-server | 3 days ago | |
README.md | 4 days ago | |
README_en.md | 4 days ago |
README_en.md
Technical Documentation: xiaozhi-esp32-server
Table of Contents:
- Introduction
- Overall Architecture
- Component Deep Dive
- Data Flow and Interaction Mechanisms
- Key Features Summary
- Deployment and Configuration Overview
1. Introduction
The xiaozhi-esp32-server
project is a comprehensive backend system designed to support intelligent hardware based on ESP32. Its core goal is to enable developers to quickly build a robust server infrastructure that can understand natural language commands, interact efficiently with various AI services (for speech recognition, natural language understanding, and speech synthesis), manage IoT devices, and provide a web-based user interface for system configuration and management. By integrating multiple cutting-edge technologies into a cohesive and extensible platform, this project aims to simplify and accelerate the development process of customizable voice assistants and intelligent control systems. It is not just a simple server, but a bridge connecting hardware, AI capabilities, and user management.
2. Overall Architecture
The xiaozhi-esp32-server
system adopts a distributed, multi-component collaborative architectural design, ensuring modularity, maintainability, and scalability. Each core component has its specific role and works in coordination. The main components include:
-
ESP32 Hardware (Client Device): This is the physical smart hardware device that end-users directly interact with. Its main responsibilities include:
- Capturing user voice commands.
- Securely sending captured raw audio data to
xiaozhi-server
for processing. - Receiving synthesized voice responses from
xiaozhi-server
and playing them through speakers. - Controlling other connected peripherals or IoT devices (such as smart bulbs, sensors, etc.) based on instructions received from
xiaozhi-server
.
-
xiaozhi-server
(Core AI Engine - Python Implementation): This Python-based server is the "brain" of the entire system, responsible for handling all voice-related logic and AI interactions. Its key responsibilities are detailed as follows:- Establishing stable, low-latency real-time bidirectional communication links with ESP32 devices through the WebSocket protocol.
- Receiving audio streams from ESP32 and using Voice Activity Detection (VAD) technology to precisely segment valid speech segments.
- Integrating and calling Automatic Speech Recognition (ASR) services (configurable for local or cloud), converting speech segments to text.
- Interacting with Large Language Models (LLMs) to parse user intent, generate intelligent responses, and support complex natural language understanding tasks.
- Managing context information and user memory in multi-turn dialogues to provide coherent interaction experiences.
- Calling Text-to-Speech (TTS) services to synthesize natural and fluent speech from LLM-generated text responses.
- Executing custom commands through a flexible plugin system, including IoT device control logic.
- Obtaining its detailed runtime operation configuration from the
manager-api
service.
-
manager-api
(Management Backend - Java Spring Boot Implementation): This is an application built using the Java Spring Boot framework, providing a secure RESTful API for system management and configuration. It serves not only as the backend support for themanager-web
console but also as the configuration data source forxiaozhi-server
. Its core functions include:- Providing user authentication (login, permission verification) and user account management functions for the Web console.
- Registration, information management of ESP32 devices, and maintenance of device-specific configurations.
- Persistently storing system configurations in the MySQL database, such as user-selected AI service providers, API keys, device parameters, plugin settings, etc.
- Providing specific API endpoints for
xiaozhi-server
to pull its required latest configuration. - Managing TTS voice options, handling OTA (Over-The-Air) firmware update processes, and related metadata.
- Utilizing Redis as a high-speed cache to store hotspot data (such as session information, frequently accessed configurations) to improve API response speed and overall system performance.
-
manager-web
(Web Control Panel - Vue.js Implementation): This is a Single Page Application (SPA) built with Vue.js, providing system administrators with a graphical, user-friendly operation interface. Its main capabilities include:- Conveniently configuring various AI services used by
xiaozhi-server
(such as ASR, LLM, TTS provider switching, parameter adjustment). - Managing platform user accounts, role assignment, and permission control.
- Managing registered ESP32 devices and their related settings.
- (Potential functionality) Monitoring system operation status, viewing logs, troubleshooting, etc.
- Comprehensive interaction with all backend management functions provided by
manager-api
.
- Conveniently configuring various AI services used by
High-Level Interaction Flow Overview:
- Voice Interaction Main Line: After the ESP32 device captures user voice, it transmits audio data in real-time to
xiaozhi-server
through WebSocket. Afterxiaozhi-server
completes a series of AI processing (VAD, ASR, LLM interaction, TTS), it sends the synthesized voice response back to the ESP32 device for playback through WebSocket. All real-time interactions directly related to voice are completed in this link. - Management Configuration Main Line: Administrators access the
manager-web
console through a browser.manager-web
executes various management operations (such as modifying configurations, managing users or devices) by calling RESTful HTTP interfaces provided bymanager-api
. Data is passed between them in JSON format. - Configuration Synchronization:
xiaozhi-server
actively pulls its latest operation configuration frommanager-api
through HTTP requests when starting or when specific update mechanisms are triggered. This ensures that configuration changes made by administrators in the Web interface can be effectively applied to the operation of the core AI engine in a timely manner.
This frontend-backend separation, core service and management service separation architectural design allows xiaozhi-server
to focus on efficient real-time AI processing tasks, while manager-api
and manager-web
together provide a powerful and easy-to-use management and configuration platform. Each component has clear responsibilities, facilitating independent development, testing, deployment, and expansion.
xiaozhi-esp32-server
├─ xiaozhi-server Port 8000 Python development Responsible for ESP32 communication
├─ manager-web Port 8001 Node.js+Vue development Responsible for providing web interface for console
├─ manager-api Port 8002 Java development Responsible for providing console API
3. Component Deep Dive
3.1. xiaozhi-server
(Core AI Engine - Python Implementation)
The xiaozhi-server
is the intelligent core of the system, responsible for processing voice interactions, interfacing with AI services, and managing communication with ESP32 devices.
-
Purpose:
- To provide real-time processing of voice commands from ESP32 devices.
- To integrate with various AI services for Speech-to-Text (ASR), Natural Language Understanding (via Large Language Models - LLMs), Text-to-Speech (TTS), Voice Activity Detection (VAD), Intent Recognition, and Memory.
- To manage dialogue flow and context with users.
- To execute custom functions and control IoT devices based on user commands.
- To be dynamically configurable through the
manager-api
.
-
Core Technologies:
- Python 3: The primary programming language.
- Asyncio: Python's asynchronous programming framework, crucial for handling concurrent WebSocket connections and non-blocking I/O for AI service API calls.
websockets
Library: For WebSocket server implementation.- HTTP Client (e.g.,
aiohttp
,httpx
): For asynchronous HTTP requests tomanager-api
and external AI services. - YAML (PyYAML): For local configuration file parsing.
-
Key Implementation Aspects:
-
AI Service Provider Pattern (
core/providers/
):- Concept: A flexible design for integrating AI services. Each service type (ASR, TTS, LLM, etc.) has an abstract base class defining a common interface. Concrete classes implement this interface for specific vendors or local models.
- Benefit: Allows easy switching of AI service backends via configuration and simplifies adding new service integrations.
- Initialization:
core/utils/modules_initialize.py
acts as a factory to load and instantiate configured providers.
-
WebSocket Communication & Connection Handling (
core/websocket_server.py
,core/connection.py
):- Server Setup: Manages WebSocket connections from ESP32 devices.
- Connection Isolation: Each ESP32 client gets a dedicated
ConnectionHandler
instance, isolating its session state and dialogue. - Dynamic Configuration Updates: Can fetch updated configurations from
manager-api
and re-initialize AI service modules live, without a full server restart.
-
Message Handling & Dialogue Flow (
core/handle/
):- Employs a modular handler pattern. The
ConnectionHandler
dispatches message processing to specialized modules based on message type or dialogue phase (e.g.,receiveAudioHandle.py
for audio input,intentHandler.py
for NLU,functionHandler.py
for plugin execution,sendAudioHandle.py
for TTS output).
- Employs a modular handler pattern. The
-
Plugin System for Extensible Functions (
plugins_func/
):- Purpose: Allows adding custom "skills" (e.g., weather, news, Home Assistant control).
- Mechanism: Plugins define functions and schemas. The LLM can request execution of these functions (function calling).
loadplugins.py
andregister.py
manage plugin discovery and registration.
-
Configuration Management (
config/
):- Loads settings from a local
config.yaml
and merges them with configurations fetched frommanager-api
(viamanage_api_client.py
), enabling remote dynamic configuration. logger.py
sets up structured application logging.config/assets/
stores predefined audio files for system notifications.
- Loads settings from a local
-
Auxiliary HTTP Server (
core/http_server.py
):- Handles specific HTTP requests, notably for OTA firmware updates (
/xiaozhi/ota/
) and other utility endpoints.
- Handles specific HTTP requests, notably for OTA firmware updates (
-
3.2. manager-api
(Management Backend - Java Spring Boot Implementation)
The manager-api
component is a backend server built using Java and the Spring Boot framework, serving as the administrative hub.
-
Purpose:
- Provide a secure RESTful API for the
manager-web
frontend. - Act as a centralized configuration provider for
xiaozhi-server
. - Manage persistent data (users, devices, AI configurations, voice timbres, OTA firmware).
- Provide a secure RESTful API for the
-
Core Technologies:
- Java 21 & Spring Boot 3: Core language and framework.
- Spring MVC: For building REST controllers.
- MyBatis-Plus: ORM for database interaction with MySQL.
- MySQL: Relational database.
- Druid: JDBC connection pool.
- Redis (Spring Data Redis): For caching.
- Apache Shiro: Security framework for authentication and authorization.
- Liquibase: Database schema migration.
- Knife4j: OpenAPI (Swagger) API documentation.
- Maven: Build and dependency management.
-
Key Implementation Aspects:
-
Modular Architecture (
modules/
package):- Business logic is organized into distinct modules (e.g.,
sys
for users/roles,agent
for assistant configs,device
for ESP32s,config
forxiaozhi-server
settings,security
,timbre
,ota
). - Each module typically follows a layered pattern: Controller, Service, DAO (Mapper), Entity, DTO.
- Business logic is organized into distinct modules (e.g.,
-
Layered Architecture:
- Controller Layer (
@RestController
): Defines API endpoints, handles HTTP request/response. - Service Layer (
@Service
): Contains business logic, transaction management. - Data Access Layer (MyBatis-Plus Mappers): Interacts with the MySQL database.
- Controller Layer (
-
Common Functionalities (
common/
package):- Provides shared code: base classes, global configurations (Spring, MyBatis, Redis, Knife4j), custom annotations (e.g.,
@LogOperation
), AOP aspects, global exception handling, utility classes, and XSS protection.
- Provides shared code: base classes, global configurations (Spring, MyBatis, Redis, Knife4j), custom annotations (e.g.,
-
Security (Apache Shiro):
- Manages user authentication and permissions for accessing API endpoints. Configured with Shiro Realms and security filters.
-
Database Schema Management (Liquibase):
- Ensures consistent database structure across environments through versioned schema changes.
-
3.3. manager-web
(Web Control Panel - Vue.js Implementation)
The manager-web
is a Single Page Application (SPA) providing the administrative user interface.
-
Purpose:
- Offer a web-based control panel for system configuration and management.
- Enable administrators to configure
xiaozhi-server
's AI services, manage users and devices, customize voice timbres, and handle OTA updates.
-
Core Technologies:
- Vue.js 2 & Vue CLI: Core JavaScript framework and build tools.
- Vue Router: For client-side routing within the SPA.
- Vuex: For centralized state management.
- Element UI: UI component library for a consistent look and feel.
- SCSS: CSS preprocessor.
- HTTP Client (Flyio or Axios): For API calls to
manager-api
. - Workbox: For PWA features (caching, service worker).
- Opus Libraries: For potential in-browser audio recording/playback.
-
Key Implementation Aspects:
- SPA Structure: Single HTML page with dynamic view updates.
- Component-Based Architecture: UI built from reusable Vue components (
.vue
files insrc/views/
for pages andsrc/components/
for smaller elements). - Client-Side Routing (
src/router/index.js
): Maps browser URLs to view components, with route guards for authentication. - State Management (
src/store/index.js
): Vuex manages global state (user info, device lists, etc.) via state, getters, mutations, and actions (often involving API calls). - API Communication (
src/apis/
): Modularized API service files make asynchronous calls tomanager-api
. - Build Process & PWA Features: Vue CLI (Webpack) bundles assets. Workbox enables PWA features like caching.
- Environment Configuration (
.env
files): Manages settings like themanager-api
base URL for different environments.
4. Data Flow and Interaction Mechanisms
The xiaozhi-esp32-server
system coordinates work through well-defined data flows and interaction protocols between components. The main communication methods rely on WebSocket protocol optimized for real-time interaction and RESTful API suitable for client-server requests.
4.1. Core Voice Interaction Flow (ESP32 Device <-> xiaozhi-server
)
This flow is real-time, primarily using WebSocket for low-latency, bidirectional data exchange.
-
Communication Protocol Documentation:
- Detailed communication protocol documentation can be accessed at: https://ccnphfhqs21z.feishu.cn/wiki/M0XiwldO9iJwHikpXD5cEx71nKh
- This document details the WebSocket communication protocol between ESP32 devices and
xiaozhi-server
, including:- Connection establishment and handshake process
- Audio data transmission format
- Control command format
- Status report format
- Error handling mechanism
-
Connection Establishment and Handshake:
- The ESP32 device, as a client, actively initiates a WebSocket connection request to the specified endpoint of
xiaozhi-server
(e.g.,ws://<server-IP>:<WebSocket-port>/xiaozhi/v1/
). xiaozhi-server
(core/websocket_server.py
) receives the connection and instantiates an independentConnectionHandler
object for each successfully connected ESP32 device to manage the entire lifecycle of that session.- After the connection is established, an initial handshake process may be executed (handled by
core/handle/helloHandle.py
) to exchange device identification, authentication information, protocol version, or basic status.
- The ESP32 device, as a client, actively initiates a WebSocket connection request to the specified endpoint of
-
Audio Uplink Transmission (ESP32 ->
xiaozhi-server
):- After a user speaks to the ESP32 device, the device's microphone captures raw audio data (usually in PCM or compressed formats like Opus).
- The ESP32 pushes these audio data chunks as WebSocket binary messages in real-time to the corresponding
ConnectionHandler
inxiaozhi-server
. - The server-side
core/handle/receiveAudioHandle.py
module is responsible for receiving, buffering, and processing these audio data.
-
AI Core Processing (within
xiaozhi-server
):- VAD (Voice Activity Detection):
receiveAudioHandle.py
uses the configured VAD provider (such as SileroVAD) to analyze the audio stream, accurately identifying the start and end points of speech, filtering out silent or noise segments. - ASR (Automatic Speech Recognition): Detected valid speech segments are sent to the configured ASR provider (local such as FunASR, or cloud services). The ASR engine converts audio signals into text strings.
- NLU/LLM (Natural Language Understanding/Large Language Model): The ASR output text, along with the current dialogue context history obtained from the Memory provider, and the description schemas of available functions (tools) loaded from
plugins_func/
, are passed to the configured LLM provider. - Function Call Execution (if LLM decides needed): If the LLM analysis determines that an external function needs to be called (e.g., querying weather, controlling home appliances), it generates a structured function call request.
core/handle/functionHandler.py
receives this request, finds and executes the corresponding Python function defined inplugins_func/
, and returns the function's execution result to the LLM. The LLM then generates the final natural language response based on this result. - Response Generation: The LLM synthesizes all information (user input, context, function call results, etc.) to generate the final text response.
- Memory Update: The current round of interaction (user question, LLM response, possible function calls) is processed by the Memory provider to update the dialogue history for subsequent interactions.
- TTS (Text-to-Speech): The final text response generated by the LLM is sent to the configured TTS provider, which synthesizes the text into a speech data stream (e.g., MP3 or WAV format).
- VAD (Voice Activity Detection):
-
Audio Downlink Response (
xiaozhi-server
-> ESP32):- The speech data stream synthesized by the TTS provider is sent in real-time as WebSocket binary messages back to the ESP32 device through the
core/handle/sendAudioHandle.py
module. - The ESP32 device receives these audio data chunks and immediately plays them to the user through the speaker.
- The speech data stream synthesized by the TTS provider is sent in real-time as WebSocket binary messages back to the ESP32 device through the
-
Control and Status Messages (Bidirectional):
- In addition to audio streams, ESP32 and
xiaozhi-server
also exchange text messages through WebSocket, these messages are usually encapsulated in JSON format. - ESP32 -> Server: The device may send status reports (such as network conditions, microphone status), error codes, or specific control commands (e.g., "stop TTS playback" triggered by user button press).
- Server -> ESP32: The server may send control instructions to the device (such as "start listening", "stop listening", adjust sensitivity, send specific configuration parameters).
- Modules like
core/handle/abortHandle.py
(handling interrupt requests),core/handle/reportHandle.py
(handling device reports) are responsible for parsing and responding to these control/status messages.
- In addition to audio streams, ESP32 and
4.2. Management and Configuration Flow (manager-web
<-> manager-api
<-> xiaozhi-server
)
This flow primarily relies on HTTP/HTTPS-based RESTful API for request-response interactions.
-
Administrator UI Backend Interaction (
manager-web
->manager-api
):- When administrators perform operations in the
manager-web
interface (e.g., saving a configuration, adding a new user, registering an ESP32 device):- The Vue.js frontend application (
manager-web
) will initiate asynchronous HTTP requests (usually GET, POST, PUT, DELETE) to the corresponding REST API endpoints ofmanager-api
through its API encapsulation module (located insrc/apis/module/
). - Request and response bodies typically use JSON format.
- The
@RestController
classes inmanager-api
receive these requests. The Apache Shiro framework will first perform authentication and authorization checks on the requests. - After verification, the Controller distributes the request to the corresponding Service layer to handle business logic. The Service layer may interact with the MySQL database (through MyBatis-Plus) and may utilize Redis for data caching.
- After processing,
manager-api
returns an HTTP response in JSON format tomanager-web
. manager-web
updates its Vuex state store and user interface display based on the response results.
- The Vue.js frontend application (
- When administrators perform operations in the
-
Configuration Synchronization (
manager-api
->xiaozhi-server
):- The operation of
xiaozhi-server
depends on dynamic configurations obtained frommanager-api
(such as currently selected AI service providers and their API keys). - Pull Mechanism: The
config/manage_api_client.py
module withinxiaozhi-server
, when the server starts or through specific update triggers (e.g., whenWebSocketServer.update_config()
is called), will initiate an HTTP GET request to a specified endpoint ofmanager-api
(e.g., provided by a Controller inmodules/config/controller/
). manager-api
responds to this request, returning the configuration data required byxiaozhi-server
(in JSON format).- After receiving the configuration,
xiaozhi-server
will update its internal state and may reinitialize relevant AI service modules to make the new configuration effective.
- The operation of
-
OTA Firmware Update Flow (Conceptual Description):
- Administrators upload new ESP32 firmware packages to specific endpoints of
manager-api
through themanager-web
interface. manager-api
stores the firmware files and records related metadata (version number, applicable device models, etc.).- When administrators trigger OTA updates for specific devices:
manager-api
may notifyxiaozhi-server
(the specific notification mechanism may be a polling checkpoint, orxiaozhi-server
exposes an API to receive update notifications, or more loosely coupled like message queues).xiaozhi-server
can then send an instruction message containing the firmware download URL to the target ESP32 device through WebSocket.- After receiving the instruction, the ESP32 device downloads the firmware through an HTTP GET request from that URL. This URL may point to a path served by the
SimpleHttpServer
running onxiaozhi-server
itself (such as/xiaozhi/ota/
), or in some architectures, it may directly point tomanager-api
or a dedicated file server.
- Administrators upload new ESP32 firmware packages to specific endpoints of
4.3. Main Protocol Summary:
- WebSocket: Selected for the communication link between ESP32 and
xiaozhi-server
because it is very suitable for real-time, low-latency, bidirectional data stream transmission (especially audio), as well as asynchronous control message delivery. - RESTful APIs (based on HTTP/HTTPS, usually using JSON as the data exchange format): This is the standard way for web service communication. Used for request-response interactions between
manager-web
(client) andmanager-api
(server), and also forxiaozhi-server
(as client) to pull configuration information frommanager-api
(as server). Its stateless nature, wide library support, and easy-to-understand semantics make it an ideal choice for such interactions.
This multi-protocol communication strategy ensures that different types of interaction requirements within the system can be handled efficiently and appropriately, balancing real-time performance and standardized request-response patterns.
5. Key Features Summary
The xiaozhi-esp32-server
system provides a series of rich features aimed at supporting developers in building advanced voice control applications:
- Comprehensive Voice Interaction Backend: Provides an end-to-end solution from voice capture guidance to response generation and action execution.
- Modular and Pluggable AI Services:
- Supports a wide range of ASR (Automatic Speech Recognition), LLM (Large Language Model), TTS (Text-to-Speech), VAD (Voice Activity Detection), Intent Recognition, and Memory providers.
- Allows dynamic selection and configuration of these services (including cloud-based APIs and local models) to balance cost, performance, privacy, and language requirements.
- Advanced Dialogue Management:
- Supports natural interaction, with wake word to start dialogue, manual (push-to-talk) dialogue, and real-time interruption of system responses.
- Includes contextual memory to maintain coherence in multi-turn dialogues.
- Has automatic sleep mode after a period of inactivity.
- Multi-language Capabilities:
- Supports recognition and synthesis in multiple languages, including Mandarin, Cantonese, English, Japanese, and Korean (specific capabilities depend on the selected ASR/LLM/TTS providers).
- Extensible Functions through Plugins:
- Powerful plugin system allows developers to add custom "skills" or functions (e.g., getting weather, controlling smart home devices, accessing news).
- These functions can be triggered by the LLM using its function calling capability, based on provided schemas.
- Built-in support for Home Assistant integration.
- IoT Device Control:
- Designed to manage and control smart home devices and other IoT hardware through voice commands, utilizing the plugin system.
- Web-based Management Console (
manager-web
&manager-api
):- Provides a comprehensive graphical interface for:
- System configuration (AI service selection, API keys, operation parameters).
- Role-based access control user management.
- ESP32 device registration and management.
- Voice timbre/TTS voice customization.
- OTA (Over-The-Air) firmware update management for ESP32 devices.
- System parameter and dictionary management.
- Provides a comprehensive graphical interface for:
- Flexible Deployment Options:
- Supports deployment through Docker containers (for simplified server-only or full-stack setup) and directly from source code, adapting to various environments and user expertise.
- Dynamic Remote Configuration:
xiaozhi-server
can obtain its configuration frommanager-api
, allowing real-time updates of AI providers and settings without restarting the server.
- Open Source and Community-Driven:
- Licensed under MIT License, encouraging transparency, collaboration, and community contribution.
- Cost-Effective Solution:
- Provides an "Entry Level Free Settings" path, utilizing free tiers of AI services or local models, making it easy to experiment and for personal projects.
- Progressive Web Application (PWA) Features:
- The
manager-web
control panel includes Service Worker integration to enhance caching and potential offline access capabilities.
- The
- Detailed API Documentation:
manager-api
provides OpenAPI (Swagger) documentation through Knife4j for clear understanding and testing of its RESTful endpoints.
These features together make xiaozhi-esp32-server
a powerful, adaptable, and user-friendly platform for building complex voice interaction applications.
6. Deployment and Configuration Overview
The xiaozhi-esp32-server
system is designed with flexibility in mind, providing multiple deployment methods and comprehensive configuration options to adapt to different usage scenarios and requirements.
Deployment Options:
The project can be deployed in multiple ways, mainly including using Docker to simplify the installation process, or deploying directly from source code for greater control and development.
-
Docker-based Deployment:
- Simplified Installation (Only
xiaozhi-server
): This option only deploys the core Python-basedxiaozhi-server
. It is suitable for users who mainly need voice AI processing capabilities and IoT control, without requiring the complete Web management interface and database support functions (such as OTA). In this mode, configuration is typically managed through local files (config.yaml
), but if needed, it can still point to an existingmanager-api
instance. - Full Module Installation (All Components): This scheme deploys all core components:
xiaozhi-server
, Java-basedmanager-api
, and Vue.js-basedmanager-web
, along with required database services (MySQL and Redis). This provides a complete system experience, including a Web control panel for comprehensive configuration and management. - The project provides
Dockerfile
definitions for each service and usesdocker-compose.yml
files (e.g.,docker-compose.yml
for basic version,docker-compose_all.yml
for full-featured version) to orchestrate and manage multi-container deployment. Additionally, adocker-setup.sh
script may be provided to assist in automating part of the Docker environment setup work.
- Simplified Installation (Only
-
Source Code Deployment:
- This method requires manual setup of the corresponding development environment for each component: Python environment for
xiaozhi-server
, Java/Maven environment formanager-api
, Node.js/Vue CLI environment formanager-web
. - For full module installation, MySQL and Redis database services also need to be manually installed and configured.
- This approach is typically used for project development, deep customization, debugging, or in production scenarios with special environmental requirements.
- This method requires manual setup of the corresponding development environment for each component: Python environment for
Configuration Management:
Configuration is key to customizing system behavior, especially in selecting AI service providers and managing API keys.
-
xiaozhi-server
Configuration:- Local
config.yaml
: A main YAML format configuration file located in thexiaozhi-server
root directory. It defines server ports, selected AI service providers (ASR, LLM, TTS, VAD, Intent Recognition, Memory modules, etc.), their respective API keys or model paths, plugin configurations, and log levels. - Remote Configuration through
manager-api
:xiaozhi-server
is designed to obtain its operation configuration frommanager-api
. Settings obtained frommanager-api
typically override settings with the same name in the localconfig.yaml
. This brings two major benefits:- Centralized Management: All configurations can be managed uniformly through the
manager-web
interface. - Dynamic Updates:
xiaozhi-server
can refresh its configuration and reinitialize AI modules without completely restarting the service.
- Centralized Management: All configurations can be managed uniformly through the
config/config_loader.py
andconfig/manage_api_client.py
inxiaozhi-server
are responsible for handling configuration loading, merging, and pulling logic frommanager-api
.
- Local
-
manager-api
Configuration:- As a Spring Boot application, its configuration is mainly managed through the
application.properties
orapplication.yml
file located in thesrc/main/resources
directory. - Key configuration items include: database connection information (MySQL URL, username, password), Redis server address and port, application service port (default 8002), Apache Shiro security-related settings, and configuration parameters for any integrated third-party services (such as Aliyun SMS).
- As a Spring Boot application, its configuration is mainly managed through the
-
manager-web
Configuration:- Environment-specific settings for the Vue.js frontend application are managed through
.env
series files (e.g.,.env
,.env.development
,.env.production
) in the project root directory. - The most critical configuration here is usually the API base URL address of the
manager-api
backend (e.g.,VUE_APP_API_BASE_URL
), to which the frontend application will send all API requests.
- Environment-specific settings for the Vue.js frontend application are managed through
-
Predefined Configuration Schemes:
- The project documentation (usually README) will recommend some common configuration combinations, for example:
- "Entry Level Free Settings": This scheme aims to utilize free tier quotas of cloud AI services or completely free local models to minimize users' initial usage costs and operating expenses.
- "Full Streaming Configuration": This scheme prioritizes system response speed and interaction fluency, typically choosing AI services that support streaming processing (possibly paid).
- These predefined schemes provide guidance for users to configure AI service providers in
xiaozhi-server
(through themanager-web
interface or directly modifyingconfig.yaml
).
- The project documentation (usually README) will recommend some common configuration combinations, for example:
In the case of full module deployment, it is recommended to use the manager-web
control panel as the main operation interface for most configuration tasks, as it provides a user-friendly way to manage various settings that are persisted by manager-api
and ultimately used by xiaozhi-server
.