You can not select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
HuangHai 9ffb8fb6ac
'commit'
2 days ago
..
manager-api 'commit' 3 days ago
manager-web 'commit' 3 days ago
xiaozhi-server 'commit' 2 days ago
README.md 'commit' 3 days ago
README_en.md 'commit' 3 days ago

README_en.md

Technical Documentation: xiaozhi-esp32-server

Table of Contents:

  1. Introduction
  2. Overall Architecture
  3. Component Deep Dive
  4. Data Flow and Interaction Mechanisms
  5. Key Features Summary
  6. Deployment and Configuration Overview

1. Introduction

The xiaozhi-esp32-server project is a comprehensive backend system designed to support intelligent hardware based on ESP32. Its core goal is to enable developers to quickly build a robust server infrastructure that can understand natural language commands, interact efficiently with various AI services (for speech recognition, natural language understanding, and speech synthesis), manage IoT devices, and provide a web-based user interface for system configuration and management. By integrating multiple cutting-edge technologies into a cohesive and extensible platform, this project aims to simplify and accelerate the development process of customizable voice assistants and intelligent control systems. It is not just a simple server, but a bridge connecting hardware, AI capabilities, and user management.


2. Overall Architecture

The xiaozhi-esp32-server system adopts a distributed, multi-component collaborative architectural design, ensuring modularity, maintainability, and scalability. Each core component has its specific role and works in coordination. The main components include:

  1. ESP32 Hardware (Client Device): This is the physical smart hardware device that end-users directly interact with. Its main responsibilities include:

    • Capturing user voice commands.
    • Securely sending captured raw audio data to xiaozhi-server for processing.
    • Receiving synthesized voice responses from xiaozhi-server and playing them through speakers.
    • Controlling other connected peripherals or IoT devices (such as smart bulbs, sensors, etc.) based on instructions received from xiaozhi-server.
  2. xiaozhi-server (Core AI Engine - Python Implementation): This Python-based server is the "brain" of the entire system, responsible for handling all voice-related logic and AI interactions. Its key responsibilities are detailed as follows:

    • Establishing stable, low-latency real-time bidirectional communication links with ESP32 devices through the WebSocket protocol.
    • Receiving audio streams from ESP32 and using Voice Activity Detection (VAD) technology to precisely segment valid speech segments.
    • Integrating and calling Automatic Speech Recognition (ASR) services (configurable for local or cloud), converting speech segments to text.
    • Interacting with Large Language Models (LLMs) to parse user intent, generate intelligent responses, and support complex natural language understanding tasks.
    • Managing context information and user memory in multi-turn dialogues to provide coherent interaction experiences.
    • Calling Text-to-Speech (TTS) services to synthesize natural and fluent speech from LLM-generated text responses.
    • Executing custom commands through a flexible plugin system, including IoT device control logic.
    • Obtaining its detailed runtime operation configuration from the manager-api service.
  3. manager-api (Management Backend - Java Spring Boot Implementation): This is an application built using the Java Spring Boot framework, providing a secure RESTful API for system management and configuration. It serves not only as the backend support for the manager-web console but also as the configuration data source for xiaozhi-server. Its core functions include:

    • Providing user authentication (login, permission verification) and user account management functions for the Web console.
    • Registration, information management of ESP32 devices, and maintenance of device-specific configurations.
    • Persistently storing system configurations in the MySQL database, such as user-selected AI service providers, API keys, device parameters, plugin settings, etc.
    • Providing specific API endpoints for xiaozhi-server to pull its required latest configuration.
    • Managing TTS voice options, handling OTA (Over-The-Air) firmware update processes, and related metadata.
    • Utilizing Redis as a high-speed cache to store hotspot data (such as session information, frequently accessed configurations) to improve API response speed and overall system performance.
  4. manager-web (Web Control Panel - Vue.js Implementation): This is a Single Page Application (SPA) built with Vue.js, providing system administrators with a graphical, user-friendly operation interface. Its main capabilities include:

    • Conveniently configuring various AI services used by xiaozhi-server (such as ASR, LLM, TTS provider switching, parameter adjustment).
    • Managing platform user accounts, role assignment, and permission control.
    • Managing registered ESP32 devices and their related settings.
    • (Potential functionality) Monitoring system operation status, viewing logs, troubleshooting, etc.
    • Comprehensive interaction with all backend management functions provided by manager-api.

High-Level Interaction Flow Overview:

  • Voice Interaction Main Line: After the ESP32 device captures user voice, it transmits audio data in real-time to xiaozhi-server through WebSocket. After xiaozhi-server completes a series of AI processing (VAD, ASR, LLM interaction, TTS), it sends the synthesized voice response back to the ESP32 device for playback through WebSocket. All real-time interactions directly related to voice are completed in this link.
  • Management Configuration Main Line: Administrators access the manager-web console through a browser. manager-web executes various management operations (such as modifying configurations, managing users or devices) by calling RESTful HTTP interfaces provided by manager-api. Data is passed between them in JSON format.
  • Configuration Synchronization: xiaozhi-server actively pulls its latest operation configuration from manager-api through HTTP requests when starting or when specific update mechanisms are triggered. This ensures that configuration changes made by administrators in the Web interface can be effectively applied to the operation of the core AI engine in a timely manner.

This frontend-backend separation, core service and management service separation architectural design allows xiaozhi-server to focus on efficient real-time AI processing tasks, while manager-api and manager-web together provide a powerful and easy-to-use management and configuration platform. Each component has clear responsibilities, facilitating independent development, testing, deployment, and expansion.

xiaozhi-esp32-server
  ├─ xiaozhi-server Port 8000 Python development Responsible for ESP32 communication
  ├─ manager-web Port 8001 Node.js+Vue development Responsible for providing web interface for console
  ├─ manager-api Port 8002 Java development Responsible for providing console API

3. Component Deep Dive

3.1. xiaozhi-server (Core AI Engine - Python Implementation)

The xiaozhi-server is the intelligent core of the system, responsible for processing voice interactions, interfacing with AI services, and managing communication with ESP32 devices.

  • Purpose:

    • To provide real-time processing of voice commands from ESP32 devices.
    • To integrate with various AI services for Speech-to-Text (ASR), Natural Language Understanding (via Large Language Models - LLMs), Text-to-Speech (TTS), Voice Activity Detection (VAD), Intent Recognition, and Memory.
    • To manage dialogue flow and context with users.
    • To execute custom functions and control IoT devices based on user commands.
    • To be dynamically configurable through the manager-api.
  • Core Technologies:

    • Python 3: The primary programming language.
    • Asyncio: Python's asynchronous programming framework, crucial for handling concurrent WebSocket connections and non-blocking I/O for AI service API calls.
    • websockets Library: For WebSocket server implementation.
    • HTTP Client (e.g., aiohttp, httpx): For asynchronous HTTP requests to manager-api and external AI services.
    • YAML (PyYAML): For local configuration file parsing.
  • Key Implementation Aspects:

    1. AI Service Provider Pattern (core/providers/):

      • Concept: A flexible design for integrating AI services. Each service type (ASR, TTS, LLM, etc.) has an abstract base class defining a common interface. Concrete classes implement this interface for specific vendors or local models.
      • Benefit: Allows easy switching of AI service backends via configuration and simplifies adding new service integrations.
      • Initialization: core/utils/modules_initialize.py acts as a factory to load and instantiate configured providers.
    2. WebSocket Communication & Connection Handling (core/websocket_server.py, core/connection.py):

      • Server Setup: Manages WebSocket connections from ESP32 devices.
      • Connection Isolation: Each ESP32 client gets a dedicated ConnectionHandler instance, isolating its session state and dialogue.
      • Dynamic Configuration Updates: Can fetch updated configurations from manager-api and re-initialize AI service modules live, without a full server restart.
    3. Message Handling & Dialogue Flow (core/handle/):

      • Employs a modular handler pattern. The ConnectionHandler dispatches message processing to specialized modules based on message type or dialogue phase (e.g., receiveAudioHandle.py for audio input, intentHandler.py for NLU, functionHandler.py for plugin execution, sendAudioHandle.py for TTS output).
    4. Plugin System for Extensible Functions (plugins_func/):

      • Purpose: Allows adding custom "skills" (e.g., weather, news, Home Assistant control).
      • Mechanism: Plugins define functions and schemas. The LLM can request execution of these functions (function calling). loadplugins.py and register.py manage plugin discovery and registration.
    5. Configuration Management (config/):

      • Loads settings from a local config.yaml and merges them with configurations fetched from manager-api (via manage_api_client.py), enabling remote dynamic configuration.
      • logger.py sets up structured application logging.
      • config/assets/ stores predefined audio files for system notifications.
    6. Auxiliary HTTP Server (core/http_server.py):

      • Handles specific HTTP requests, notably for OTA firmware updates (/xiaozhi/ota/) and other utility endpoints.

3.2. manager-api (Management Backend - Java Spring Boot Implementation)

The manager-api component is a backend server built using Java and the Spring Boot framework, serving as the administrative hub.

  • Purpose:

    • Provide a secure RESTful API for the manager-web frontend.
    • Act as a centralized configuration provider for xiaozhi-server.
    • Manage persistent data (users, devices, AI configurations, voice timbres, OTA firmware).
  • Core Technologies:

    • Java 21 & Spring Boot 3: Core language and framework.
    • Spring MVC: For building REST controllers.
    • MyBatis-Plus: ORM for database interaction with MySQL.
    • MySQL: Relational database.
    • Druid: JDBC connection pool.
    • Redis (Spring Data Redis): For caching.
    • Apache Shiro: Security framework for authentication and authorization.
    • Liquibase: Database schema migration.
    • Knife4j: OpenAPI (Swagger) API documentation.
    • Maven: Build and dependency management.
  • Key Implementation Aspects:

    1. Modular Architecture (modules/ package):

      • Business logic is organized into distinct modules (e.g., sys for users/roles, agent for assistant configs, device for ESP32s, config for xiaozhi-server settings, security, timbre, ota).
      • Each module typically follows a layered pattern: Controller, Service, DAO (Mapper), Entity, DTO.
    2. Layered Architecture:

      • Controller Layer (@RestController): Defines API endpoints, handles HTTP request/response.
      • Service Layer (@Service): Contains business logic, transaction management.
      • Data Access Layer (MyBatis-Plus Mappers): Interacts with the MySQL database.
    3. Common Functionalities (common/ package):

      • Provides shared code: base classes, global configurations (Spring, MyBatis, Redis, Knife4j), custom annotations (e.g., @LogOperation), AOP aspects, global exception handling, utility classes, and XSS protection.
    4. Security (Apache Shiro):

      • Manages user authentication and permissions for accessing API endpoints. Configured with Shiro Realms and security filters.
    5. Database Schema Management (Liquibase):

      • Ensures consistent database structure across environments through versioned schema changes.

3.3. manager-web (Web Control Panel - Vue.js Implementation)

The manager-web is a Single Page Application (SPA) providing the administrative user interface.

  • Purpose:

    • Offer a web-based control panel for system configuration and management.
    • Enable administrators to configure xiaozhi-server's AI services, manage users and devices, customize voice timbres, and handle OTA updates.
  • Core Technologies:

    • Vue.js 2 & Vue CLI: Core JavaScript framework and build tools.
    • Vue Router: For client-side routing within the SPA.
    • Vuex: For centralized state management.
    • Element UI: UI component library for a consistent look and feel.
    • SCSS: CSS preprocessor.
    • HTTP Client (Flyio or Axios): For API calls to manager-api.
    • Workbox: For PWA features (caching, service worker).
    • Opus Libraries: For potential in-browser audio recording/playback.
  • Key Implementation Aspects:

    1. SPA Structure: Single HTML page with dynamic view updates.
    2. Component-Based Architecture: UI built from reusable Vue components (.vue files in src/views/ for pages and src/components/ for smaller elements).
    3. Client-Side Routing (src/router/index.js): Maps browser URLs to view components, with route guards for authentication.
    4. State Management (src/store/index.js): Vuex manages global state (user info, device lists, etc.) via state, getters, mutations, and actions (often involving API calls).
    5. API Communication (src/apis/): Modularized API service files make asynchronous calls to manager-api.
    6. Build Process & PWA Features: Vue CLI (Webpack) bundles assets. Workbox enables PWA features like caching.
    7. Environment Configuration (.env files): Manages settings like the manager-api base URL for different environments.

4. Data Flow and Interaction Mechanisms

The xiaozhi-esp32-server system coordinates work through well-defined data flows and interaction protocols between components. The main communication methods rely on WebSocket protocol optimized for real-time interaction and RESTful API suitable for client-server requests.

4.1. Core Voice Interaction Flow (ESP32 Device <-> xiaozhi-server)

This flow is real-time, primarily using WebSocket for low-latency, bidirectional data exchange.

  • Communication Protocol Documentation:

    • Detailed communication protocol documentation can be accessed at: https://ccnphfhqs21z.feishu.cn/wiki/M0XiwldO9iJwHikpXD5cEx71nKh
    • This document details the WebSocket communication protocol between ESP32 devices and xiaozhi-server, including:
      • Connection establishment and handshake process
      • Audio data transmission format
      • Control command format
      • Status report format
      • Error handling mechanism
  • Connection Establishment and Handshake:

    • The ESP32 device, as a client, actively initiates a WebSocket connection request to the specified endpoint of xiaozhi-server (e.g., ws://<server-IP>:<WebSocket-port>/xiaozhi/v1/).
    • xiaozhi-server (core/websocket_server.py) receives the connection and instantiates an independent ConnectionHandler object for each successfully connected ESP32 device to manage the entire lifecycle of that session.
    • After the connection is established, an initial handshake process may be executed (handled by core/handle/helloHandle.py) to exchange device identification, authentication information, protocol version, or basic status.
  • Audio Uplink Transmission (ESP32 -> xiaozhi-server):

    • After a user speaks to the ESP32 device, the device's microphone captures raw audio data (usually in PCM or compressed formats like Opus).
    • The ESP32 pushes these audio data chunks as WebSocket binary messages in real-time to the corresponding ConnectionHandler in xiaozhi-server.
    • The server-side core/handle/receiveAudioHandle.py module is responsible for receiving, buffering, and processing these audio data.
  • AI Core Processing (within xiaozhi-server):

    • VAD (Voice Activity Detection): receiveAudioHandle.py uses the configured VAD provider (such as SileroVAD) to analyze the audio stream, accurately identifying the start and end points of speech, filtering out silent or noise segments.
    • ASR (Automatic Speech Recognition): Detected valid speech segments are sent to the configured ASR provider (local such as FunASR, or cloud services). The ASR engine converts audio signals into text strings.
    • NLU/LLM (Natural Language Understanding/Large Language Model): The ASR output text, along with the current dialogue context history obtained from the Memory provider, and the description schemas of available functions (tools) loaded from plugins_func/, are passed to the configured LLM provider.
    • Function Call Execution (if LLM decides needed): If the LLM analysis determines that an external function needs to be called (e.g., querying weather, controlling home appliances), it generates a structured function call request. core/handle/functionHandler.py receives this request, finds and executes the corresponding Python function defined in plugins_func/, and returns the function's execution result to the LLM. The LLM then generates the final natural language response based on this result.
    • Response Generation: The LLM synthesizes all information (user input, context, function call results, etc.) to generate the final text response.
    • Memory Update: The current round of interaction (user question, LLM response, possible function calls) is processed by the Memory provider to update the dialogue history for subsequent interactions.
    • TTS (Text-to-Speech): The final text response generated by the LLM is sent to the configured TTS provider, which synthesizes the text into a speech data stream (e.g., MP3 or WAV format).
  • Audio Downlink Response (xiaozhi-server -> ESP32):

    • The speech data stream synthesized by the TTS provider is sent in real-time as WebSocket binary messages back to the ESP32 device through the core/handle/sendAudioHandle.py module.
    • The ESP32 device receives these audio data chunks and immediately plays them to the user through the speaker.
  • Control and Status Messages (Bidirectional):

    • In addition to audio streams, ESP32 and xiaozhi-server also exchange text messages through WebSocket, these messages are usually encapsulated in JSON format.
    • ESP32 -> Server: The device may send status reports (such as network conditions, microphone status), error codes, or specific control commands (e.g., "stop TTS playback" triggered by user button press).
    • Server -> ESP32: The server may send control instructions to the device (such as "start listening", "stop listening", adjust sensitivity, send specific configuration parameters).
    • Modules like core/handle/abortHandle.py (handling interrupt requests), core/handle/reportHandle.py (handling device reports) are responsible for parsing and responding to these control/status messages.

4.2. Management and Configuration Flow (manager-web <-> manager-api <-> xiaozhi-server)

This flow primarily relies on HTTP/HTTPS-based RESTful API for request-response interactions.

  • Administrator UI Backend Interaction (manager-web -> manager-api):

    • When administrators perform operations in the manager-web interface (e.g., saving a configuration, adding a new user, registering an ESP32 device):
      • The Vue.js frontend application (manager-web) will initiate asynchronous HTTP requests (usually GET, POST, PUT, DELETE) to the corresponding REST API endpoints of manager-api through its API encapsulation module (located in src/apis/module/).
      • Request and response bodies typically use JSON format.
      • The @RestController classes in manager-api receive these requests. The Apache Shiro framework will first perform authentication and authorization checks on the requests.
      • After verification, the Controller distributes the request to the corresponding Service layer to handle business logic. The Service layer may interact with the MySQL database (through MyBatis-Plus) and may utilize Redis for data caching.
      • After processing, manager-api returns an HTTP response in JSON format to manager-web.
      • manager-web updates its Vuex state store and user interface display based on the response results.
  • Configuration Synchronization (manager-api -> xiaozhi-server):

    • The operation of xiaozhi-server depends on dynamic configurations obtained from manager-api (such as currently selected AI service providers and their API keys).
    • Pull Mechanism: The config/manage_api_client.py module within xiaozhi-server, when the server starts or through specific update triggers (e.g., when WebSocketServer.update_config() is called), will initiate an HTTP GET request to a specified endpoint of manager-api (e.g., provided by a Controller in modules/config/controller/).
    • manager-api responds to this request, returning the configuration data required by xiaozhi-server (in JSON format).
    • After receiving the configuration, xiaozhi-server will update its internal state and may reinitialize relevant AI service modules to make the new configuration effective.
  • OTA Firmware Update Flow (Conceptual Description):

    • Administrators upload new ESP32 firmware packages to specific endpoints of manager-api through the manager-web interface.
    • manager-api stores the firmware files and records related metadata (version number, applicable device models, etc.).
    • When administrators trigger OTA updates for specific devices:
      • manager-api may notify xiaozhi-server (the specific notification mechanism may be a polling checkpoint, or xiaozhi-server exposes an API to receive update notifications, or more loosely coupled like message queues).
      • xiaozhi-server can then send an instruction message containing the firmware download URL to the target ESP32 device through WebSocket.
      • After receiving the instruction, the ESP32 device downloads the firmware through an HTTP GET request from that URL. This URL may point to a path served by the SimpleHttpServer running on xiaozhi-server itself (such as /xiaozhi/ota/), or in some architectures, it may directly point to manager-api or a dedicated file server.

4.3. Main Protocol Summary:

  • WebSocket: Selected for the communication link between ESP32 and xiaozhi-server because it is very suitable for real-time, low-latency, bidirectional data stream transmission (especially audio), as well as asynchronous control message delivery.
  • RESTful APIs (based on HTTP/HTTPS, usually using JSON as the data exchange format): This is the standard way for web service communication. Used for request-response interactions between manager-web (client) and manager-api (server), and also for xiaozhi-server (as client) to pull configuration information from manager-api (as server). Its stateless nature, wide library support, and easy-to-understand semantics make it an ideal choice for such interactions.

This multi-protocol communication strategy ensures that different types of interaction requirements within the system can be handled efficiently and appropriately, balancing real-time performance and standardized request-response patterns.


5. Key Features Summary

The xiaozhi-esp32-server system provides a series of rich features aimed at supporting developers in building advanced voice control applications:

  1. Comprehensive Voice Interaction Backend: Provides an end-to-end solution from voice capture guidance to response generation and action execution.
  2. Modular and Pluggable AI Services:
    • Supports a wide range of ASR (Automatic Speech Recognition), LLM (Large Language Model), TTS (Text-to-Speech), VAD (Voice Activity Detection), Intent Recognition, and Memory providers.
    • Allows dynamic selection and configuration of these services (including cloud-based APIs and local models) to balance cost, performance, privacy, and language requirements.
  3. Advanced Dialogue Management:
    • Supports natural interaction, with wake word to start dialogue, manual (push-to-talk) dialogue, and real-time interruption of system responses.
    • Includes contextual memory to maintain coherence in multi-turn dialogues.
    • Has automatic sleep mode after a period of inactivity.
  4. Multi-language Capabilities:
    • Supports recognition and synthesis in multiple languages, including Mandarin, Cantonese, English, Japanese, and Korean (specific capabilities depend on the selected ASR/LLM/TTS providers).
  5. Extensible Functions through Plugins:
    • Powerful plugin system allows developers to add custom "skills" or functions (e.g., getting weather, controlling smart home devices, accessing news).
    • These functions can be triggered by the LLM using its function calling capability, based on provided schemas.
    • Built-in support for Home Assistant integration.
  6. IoT Device Control:
    • Designed to manage and control smart home devices and other IoT hardware through voice commands, utilizing the plugin system.
  7. Web-based Management Console (manager-web & manager-api):
    • Provides a comprehensive graphical interface for:
      • System configuration (AI service selection, API keys, operation parameters).
      • Role-based access control user management.
      • ESP32 device registration and management.
      • Voice timbre/TTS voice customization.
      • OTA (Over-The-Air) firmware update management for ESP32 devices.
      • System parameter and dictionary management.
  8. Flexible Deployment Options:
    • Supports deployment through Docker containers (for simplified server-only or full-stack setup) and directly from source code, adapting to various environments and user expertise.
  9. Dynamic Remote Configuration:
    • xiaozhi-server can obtain its configuration from manager-api, allowing real-time updates of AI providers and settings without restarting the server.
  10. Open Source and Community-Driven:
    • Licensed under MIT License, encouraging transparency, collaboration, and community contribution.
  11. Cost-Effective Solution:
    • Provides an "Entry Level Free Settings" path, utilizing free tiers of AI services or local models, making it easy to experiment and for personal projects.
  12. Progressive Web Application (PWA) Features:
    • The manager-web control panel includes Service Worker integration to enhance caching and potential offline access capabilities.
  13. Detailed API Documentation:
    • manager-api provides OpenAPI (Swagger) documentation through Knife4j for clear understanding and testing of its RESTful endpoints.

These features together make xiaozhi-esp32-server a powerful, adaptable, and user-friendly platform for building complex voice interaction applications.


6. Deployment and Configuration Overview

The xiaozhi-esp32-server system is designed with flexibility in mind, providing multiple deployment methods and comprehensive configuration options to adapt to different usage scenarios and requirements.

Deployment Options:

The project can be deployed in multiple ways, mainly including using Docker to simplify the installation process, or deploying directly from source code for greater control and development.

  1. Docker-based Deployment:

    • Simplified Installation (Only xiaozhi-server): This option only deploys the core Python-based xiaozhi-server. It is suitable for users who mainly need voice AI processing capabilities and IoT control, without requiring the complete Web management interface and database support functions (such as OTA). In this mode, configuration is typically managed through local files (config.yaml), but if needed, it can still point to an existing manager-api instance.
    • Full Module Installation (All Components): This scheme deploys all core components: xiaozhi-server, Java-based manager-api, and Vue.js-based manager-web, along with required database services (MySQL and Redis). This provides a complete system experience, including a Web control panel for comprehensive configuration and management.
    • The project provides Dockerfile definitions for each service and uses docker-compose.yml files (e.g., docker-compose.yml for basic version, docker-compose_all.yml for full-featured version) to orchestrate and manage multi-container deployment. Additionally, a docker-setup.sh script may be provided to assist in automating part of the Docker environment setup work.
  2. Source Code Deployment:

    • This method requires manual setup of the corresponding development environment for each component: Python environment for xiaozhi-server, Java/Maven environment for manager-api, Node.js/Vue CLI environment for manager-web.
    • For full module installation, MySQL and Redis database services also need to be manually installed and configured.
    • This approach is typically used for project development, deep customization, debugging, or in production scenarios with special environmental requirements.

Configuration Management:

Configuration is key to customizing system behavior, especially in selecting AI service providers and managing API keys.

  1. xiaozhi-server Configuration:

    • Local config.yaml: A main YAML format configuration file located in the xiaozhi-server root directory. It defines server ports, selected AI service providers (ASR, LLM, TTS, VAD, Intent Recognition, Memory modules, etc.), their respective API keys or model paths, plugin configurations, and log levels.
    • Remote Configuration through manager-api: xiaozhi-server is designed to obtain its operation configuration from manager-api. Settings obtained from manager-api typically override settings with the same name in the local config.yaml. This brings two major benefits:
      • Centralized Management: All configurations can be managed uniformly through the manager-web interface.
      • Dynamic Updates: xiaozhi-server can refresh its configuration and reinitialize AI modules without completely restarting the service.
    • config/config_loader.py and config/manage_api_client.py in xiaozhi-server are responsible for handling configuration loading, merging, and pulling logic from manager-api.
  2. manager-api Configuration:

    • As a Spring Boot application, its configuration is mainly managed through the application.properties or application.yml file located in the src/main/resources directory.
    • Key configuration items include: database connection information (MySQL URL, username, password), Redis server address and port, application service port (default 8002), Apache Shiro security-related settings, and configuration parameters for any integrated third-party services (such as Aliyun SMS).
  3. manager-web Configuration:

    • Environment-specific settings for the Vue.js frontend application are managed through .env series files (e.g., .env, .env.development, .env.production) in the project root directory.
    • The most critical configuration here is usually the API base URL address of the manager-api backend (e.g., VUE_APP_API_BASE_URL), to which the frontend application will send all API requests.
  4. Predefined Configuration Schemes:

    • The project documentation (usually README) will recommend some common configuration combinations, for example:
      • "Entry Level Free Settings": This scheme aims to utilize free tier quotas of cloud AI services or completely free local models to minimize users' initial usage costs and operating expenses.
      • "Full Streaming Configuration": This scheme prioritizes system response speed and interaction fluency, typically choosing AI services that support streaming processing (possibly paid).
    • These predefined schemes provide guidance for users to configure AI service providers in xiaozhi-server (through the manager-web interface or directly modifying config.yaml).

In the case of full module deployment, it is recommended to use the manager-web control panel as the main operation interface for most configuration tasks, as it provides a user-friendly way to manage various settings that are persisted by manager-api and ultimately used by xiaozhi-server.