diff --git a/dsRagAnything/Doc/文档.txt b/dsRagAnything/Doc/文档.txt
new file mode 100644
index 00000000..a582e27d
--- /dev/null
+++ b/dsRagAnything/Doc/文档.txt
@@ -0,0 +1,7 @@
+https://github.com/HKUDS/RAG-Anything
+
+# Create the virtual environment
+conda create -n raganything python=3.10
+
+# Activate the virtual environment
+conda activate raganything
\ No newline at end of file
diff --git a/dsRagAnything/README.md b/dsRagAnything/README.md
deleted file mode 100644
index 3b0a3e87..00000000
--- a/dsRagAnything/README.md
+++ /dev/null
@@ -1,842 +0,0 @@
-[RAG-Anything Logo]
-
-# 🚀 RAG-Anything: All-in-One RAG System
-
----
-
-## 🎉 News
-- [X] [2025.07.04]🎯📢 RAGAnything now supports querying with multimodal content, enabling enhanced retrieval-augmented generation with integrated processing of text, images, tables, and equations.
-- [X] [2025.07.03]🎯📢 RAGAnything has reached 1K🌟 stars on GitHub! Thank you for your support and contributions.
-
----
-
-## 🌟 System Overview
-
-*Next-Generation Multimodal Intelligence*
-
-Modern documents increasingly contain diverse multimodal content—text, images, tables, equations, charts, and multimedia—that traditional text-focused RAG systems cannot effectively process. **RAG-Anything** addresses this challenge as a comprehensive **All-in-One Multimodal Document Processing RAG system** built on [LightRAG](https://github.com/HKUDS/LightRAG).
-
-As a unified solution, RAG-Anything **eliminates the need for multiple specialized tools**. It provides **seamless processing and querying across all content modalities** within a single integrated framework. Unlike conventional RAG approaches that struggle with non-textual elements, our all-in-one system delivers **comprehensive multimodal retrieval capabilities**.
-
-Users can query documents containing **interleaved text**, **visual diagrams**, **structured tables**, and **mathematical formulations** through **one cohesive interface**. This consolidated approach makes RAG-Anything particularly valuable for academic research, technical documentation, financial reports, and enterprise knowledge management where rich, mixed-content documents demand a **unified processing framework**.
-
-### 🎯 Key Features
-
-- **🔄 End-to-End Multimodal Pipeline** - Complete workflow from document ingestion and parsing to intelligent multimodal query answering
-- **📄 Universal Document Support** - Seamless processing of PDFs, Office documents, images, and diverse file formats
-- **🧠 Specialized Content Analysis** - Dedicated processors for images, tables, mathematical equations, and heterogeneous content types
-- **🔗 Multimodal Knowledge Graph** - Automatic entity extraction and cross-modal relationship discovery for enhanced understanding
-- **⚡ Adaptive Processing Modes** - Flexible MinerU-based parsing or direct multimodal content injection workflows
-- **🎯 Hybrid Intelligent Retrieval** - Advanced search capabilities spanning textual and multimodal content with contextual understanding
-
----
-
-## 🏗️ Algorithm & Architecture
-
-### Core Algorithm
-
-**RAG-Anything** implements an effective **multi-stage multimodal pipeline** that fundamentally extends traditional RAG architectures to seamlessly handle diverse content modalities through intelligent orchestration and cross-modal understanding.
-
-📄 Document Parsing → 🧠 Content Analysis → 🔍 Knowledge Graph → 🎯 Intelligent Retrieval
-
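-Read as code, the pipeline is a straightforward composition of these four stages. The sketch below uses stub functions purely to illustrate the data flow between stages; none of these function names belong to the RAG-Anything API (the real entry points are shown in the Quick Start section):
-
-```python
-import asyncio
-
-async def parse_document(path: str) -> list:
-    """Stage 1 stub: decompose a file into typed content blocks."""
-    return [{"type": "text", "text": f"contents of {path}"}]
-
-async def analyze(block: dict) -> dict:
-    """Stage 2 stub: attach a modality-aware description to a block."""
-    return {**block, "description": f"analyzed {block['type']} block"}
-
-def build_graph(blocks: list) -> dict:
-    """Stage 3 stub: turn analyzed blocks into a tiny entity/relation store."""
-    return {"entities": blocks, "relations": []}
-
-def retrieve(graph: dict, query: str) -> str:
-    """Stage 4 stub: answer a query against the indexed content."""
-    return f"answer to {query!r} using {len(graph['entities'])} indexed blocks"
-
-async def pipeline(path: str, query: str) -> str:
-    blocks = await parse_document(path)                              # 1. Document Parsing
-    analyzed = await asyncio.gather(*(analyze(b) for b in blocks))   # 2. Content Analysis
-    graph = build_graph(list(analyzed))                              # 3. Knowledge Graph
-    return retrieve(graph, query)                                    # 4. Intelligent Retrieval
-
-print(asyncio.run(pipeline("paper.pdf", "What are the main findings?")))
-```
-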
-### 1. Document Parsing Stage
-
-The system provides high-fidelity document extraction through adaptive content decomposition. It intelligently segments heterogeneous elements while preserving contextual relationships. Universal format compatibility is achieved via specialized optimized parsers.
-
-**Key Components:**
-
-- **⚙️ MinerU Integration**: Leverages [MinerU](https://github.com/opendatalab/MinerU) for high-fidelity document structure extraction and semantic preservation across complex layouts.
-
-- **🧩 Adaptive Content Decomposition**: Automatically segments documents into coherent text blocks, visual elements, structured tables, mathematical equations, and specialized content types while preserving contextual relationships.
-
-- **📁 Universal Format Support**: Provides comprehensive handling of PDFs, Office documents (DOC/DOCX/PPT/PPTX/XLS/XLSX), images, and emerging formats through specialized parsers with format-specific optimization.
-
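-To make the decomposition concrete, here is a minimal sketch of the kind of typed content list this stage yields. The dictionary keys (`img_path`, `img_caption`, `table_body`, `table_caption`, `latex`) mirror the multimodal content dictionaries used later in this README; the `ParsedBlock` wrapper and the `page_idx` field are illustrative assumptions, not RAG-Anything's internal representation:
-
-```python
-from dataclasses import dataclass
-from typing import Any, Dict, List
-
-@dataclass
-class ParsedBlock:
-    """One decomposed element of a document, tagged with its modality."""
-    type: str               # "text", "image", "table", or "equation"
-    content: Dict[str, Any]
-    page_idx: int           # position info kept so hierarchy can be rebuilt later
-
-# What a parsed research paper might decompose into:
-parsed_blocks: List[ParsedBlock] = [
-    ParsedBlock("text", {"text": "1. Introduction ..."}, page_idx=0),
-    ParsedBlock("image", {"img_path": "images/fig1.jpg",
-                          "img_caption": ["Figure 1: Experimental results"]}, page_idx=2),
-    ParsedBlock("table", {"table_body": "| Method | Accuracy |\n|---|---|\n| Ours | 95.2% |",
-                          "table_caption": ["Performance Comparison"]}, page_idx=3),
-    ParsedBlock("equation", {"latex": "P(d|q) = \\frac{P(q|d) \\cdot P(d)}{P(q)}"}, page_idx=4),
-]
-
-print(len(parsed_blocks), "blocks extracted")
-```
-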
-### 2. Multi-Modal Content Understanding & Processing
-
-The system automatically categorizes and routes content through optimized channels. It uses concurrent pipelines for parallel text and multimodal processing. Document hierarchy and relationships are preserved during transformation.
-
-**Key Components:**
-
-- **🎯 Autonomous Content Categorization and Routing**: Automatically identifies, categorizes, and routes different content types through optimized execution channels.
-
-- **⚡ Concurrent Multi-Pipeline Architecture**: Implements concurrent execution of textual and multimodal content through dedicated processing pipelines. This approach maximizes throughput efficiency while preserving content integrity.
-
-- **🏗️ Document Hierarchy Extraction**: Extracts and preserves the original document hierarchy and inter-element relationships during content transformation.
-
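-The routing and concurrency described above can be pictured with a few lines of `asyncio`. This is a schematic sketch only; the dispatch table and handler names are hypothetical and not RAG-Anything internals:
-
-```python
-import asyncio
-from typing import Any, Dict, List
-
-async def handle_text(block: Dict[str, Any]) -> str:
-    return f"text chunk: {block['text'][:30]}"
-
-async def handle_image(block: Dict[str, Any]) -> str:
-    return f"image description for {block['img_path']}"
-
-# Dispatch table: content type -> dedicated processing pipeline
-HANDLERS = {"text": handle_text, "image": handle_image}
-
-async def route_blocks(blocks: List[Dict[str, Any]]) -> List[str]:
-    # All blocks are processed concurrently; results keep the original
-    # document order so hierarchy can be reconstructed afterwards.
-    tasks = [HANDLERS[b["type"]](b) for b in blocks]
-    return await asyncio.gather(*tasks)
-
-blocks = [
-    {"type": "text", "text": "Introduction ..."},
-    {"type": "image", "img_path": "images/fig1.jpg"},
-]
-print(asyncio.run(route_blocks(blocks)))
-```
-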
-### 3. Multimodal Analysis Engine
-
-The system deploys modality-aware processing units for heterogeneous data modalities:
-
-**Specialized Analyzers:**
-
-- **🔍 Visual Content Analyzer**:
-  - Integrates vision models for image analysis.
-  - Generates context-aware descriptive captions based on visual semantics.
-  - Extracts spatial relationships and hierarchical structures between visual elements.
-
-- **📊 Structured Data Interpreter**:
-  - Performs systematic interpretation of tabular and structured data formats.
-  - Implements statistical pattern recognition algorithms for data trend analysis.
-  - Identifies semantic relationships and dependencies across multiple tabular datasets.
-
-- **📐 Mathematical Expression Parser**:
-  - Parses complex mathematical expressions and formulas with high accuracy.
-  - Provides native LaTeX format support for seamless integration with academic workflows.
-  - Establishes conceptual mappings between mathematical equations and domain-specific knowledge bases.
-
-- **🔧 Extensible Modality Handler**:
-  - Provides a configurable processing framework for custom and emerging content types.
-  - Enables dynamic integration of new modality processors through a plugin architecture.
-  - Supports runtime configuration of processing pipelines for specialized use cases.
-
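-The plugin idea behind the extensible handler is easiest to picture as a registry keyed by content type. The sketch below is a generic illustration of that pattern, not RAG-Anything's own registration mechanism (the concrete extension point, `GenericModalProcessor`, appears in the Quick Start examples):
-
-```python
-from typing import Callable, Dict
-
-# Registry mapping a modality name to its analyzer function
-ANALYZERS: Dict[str, Callable[[dict], str]] = {}
-
-def register_analyzer(modality: str):
-    """Decorator that plugs a new analyzer into the registry at import time."""
-    def wrapper(fn: Callable[[dict], str]) -> Callable[[dict], str]:
-        ANALYZERS[modality] = fn
-        return fn
-    return wrapper
-
-@register_analyzer("equation")
-def analyze_equation(content: dict) -> str:
-    return f"Equation in LaTeX: {content['latex']}"
-
-@register_analyzer("audio")  # a custom, emerging modality
-def analyze_audio(content: dict) -> str:
-    return f"Transcript placeholder for {content['path']}"
-
-print(ANALYZERS["equation"]({"latex": r"E = mc^2"}))
-```
-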
-### 4. Multimodal Knowledge Graph Index
-
-The multi-modal knowledge graph construction module transforms document content into structured semantic representations. It extracts multimodal entities, establishes cross-modal relationships, and preserves hierarchical organization. The system applies weighted relevance scoring for optimized knowledge retrieval.
-
-**Core Functions:**
-
-- **🔍 Multi-Modal Entity Extraction**: Transforms significant multimodal elements into structured knowledge graph entities. The process includes semantic annotations and metadata preservation.
-
-- **🔗 Cross-Modal Relationship Mapping**: Establishes semantic connections and dependencies between textual entities and multimodal components. This is achieved through automated relationship inference algorithms.
-
-- **🏗️ Hierarchical Structure Preservation**: Maintains original document organization through "belongs_to" relationship chains. These chains preserve logical content hierarchy and sectional dependencies.
-
-- **⚖️ Weighted Relationship Scoring**: Assigns quantitative relevance scores to relationship types. Scoring is based on semantic proximity and contextual significance within the document structure.
-
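-A minimal data-model sketch of what such an index holds is given below. The field names, the relation types other than "belongs_to", and the weights are illustrative assumptions rather than the schema LightRAG actually stores:
-
-```python
-from dataclasses import dataclass
-
-@dataclass
-class Entity:
-    name: str
-    entity_type: str      # e.g. "section", "image", "table", "equation"
-    description: str
-
-@dataclass
-class Relation:
-    source: str
-    target: str
-    relation_type: str    # e.g. "belongs_to", "illustrates"
-    weight: float         # relevance score from semantic proximity / context
-
-entities = [
-    Entity("Section 3: Results", "section", "Experimental results of the paper"),
-    Entity("Figure 1", "image", "Accuracy versus baseline across datasets"),
-]
-
-relations = [
-    # Hierarchy is preserved with belongs_to chains ...
-    Relation("Figure 1", "Section 3: Results", "belongs_to", weight=1.0),
-    # ... while cross-modal links carry graded relevance scores.
-    Relation("Figure 1", "Section 3: Results", "illustrates", weight=0.8),
-]
-
-print(len(entities), "entities,", len(relations), "relations")
-```
-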
-### 5. Modality-Aware Retrieval
-
-The hybrid retrieval system combines vector similarity search with graph traversal algorithms for comprehensive content retrieval. It implements modality-aware ranking mechanisms and maintains relational coherence between retrieved elements to ensure contextually integrated information delivery.
-
-**Retrieval Mechanisms:**
-
-- **🔀 Vector-Graph Fusion**: Integrates vector similarity search with graph traversal algorithms. This approach leverages both semantic embeddings and structural relationships for comprehensive content retrieval.
-
-- **📊 Modality-Aware Ranking**: Implements adaptive scoring mechanisms that weight retrieval results based on content type relevance. The system adjusts rankings according to query-specific modality preferences.
-
-- **🔗 Relational Coherence Maintenance**: Maintains semantic and structural relationships between retrieved elements. This ensures coherent information delivery and contextual integrity.
-
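-In code, fusion and ranking reduce to a weighted combination of scores per candidate. The formula, weights, and field names below are an illustrative assumption of how such a ranker could look, not RAG-Anything's actual scoring implementation:
-
-```python
-from typing import Dict, List
-
-def fused_score(candidate: Dict, modality_prefs: Dict[str, float],
-                alpha: float = 0.6) -> float:
-    """Blend embedding similarity with graph proximity, then apply a
-    modality weight derived from the query (e.g. boost tables for a
-    'compare the numbers' question)."""
-    base = alpha * candidate["vector_sim"] + (1 - alpha) * candidate["graph_score"]
-    return base * modality_prefs.get(candidate["modality"], 1.0)
-
-candidates: List[Dict] = [
-    {"id": "chunk-12", "modality": "text",  "vector_sim": 0.82, "graph_score": 0.40},
-    {"id": "table-3",  "modality": "table", "vector_sim": 0.74, "graph_score": 0.65},
-]
-
-prefs = {"table": 1.3, "text": 1.0}   # the query asks about numeric comparisons
-ranked = sorted(candidates, key=lambda c: fused_score(c, prefs), reverse=True)
-print([c["id"] for c in ranked])
-```
-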
----
-
-## 🚀 Quick Start
-
-*Initialize Your AI Journey*
-
- -### Installation - -#### Option 1: Install from PyPI (Recommended) - -```bash -# Basic installation -pip install raganything - -# With optional dependencies for extended format support: -pip install 'raganything[all]' # All optional features -pip install 'raganything[image]' # Image format conversion (BMP, TIFF, GIF, WebP) -pip install 'raganything[text]' # Text file processing (TXT, MD) -pip install 'raganything[image,text]' # Multiple features -``` - -#### Option 2: Install from Source - -```bash -git clone https://github.com/HKUDS/RAG-Anything.git -cd RAG-Anything -pip install -e . - -# With optional dependencies -pip install -e '.[all]' -``` - -#### Optional Dependencies - -- **`[image]`** - Enables processing of BMP, TIFF, GIF, WebP image formats (requires Pillow) -- **`[text]`** - Enables processing of TXT and MD files (requires ReportLab) -- **`[all]`** - Includes all Python optional dependencies - -> **⚠️ Office Document Processing Requirements:** -> - Office documents (.doc, .docx, .ppt, .pptx, .xls, .xlsx) require **LibreOffice** installation -> - Download from [LibreOffice official website](https://www.libreoffice.org/download/download/) -> - **Windows**: Download installer from official website -> - **macOS**: `brew install --cask libreoffice` -> - **Ubuntu/Debian**: `sudo apt-get install libreoffice` -> - **CentOS/RHEL**: `sudo yum install libreoffice` - -**Check MinerU installation:** - -```bash -# Verify installation -mineru --version - -# Check if properly configured -python -c "from raganything import RAGAnything; rag = RAGAnything(); print('✅ MinerU installed properly' if rag.check_mineru_installation() else '❌ MinerU installation issue')" -``` - -Models are downloaded automatically on first use. For manual download, refer to [MinerU Model Source Configuration](https://github.com/opendatalab/MinerU/blob/master/README.md#22-model-source-configuration). - -### Usage Examples - -#### 1. 
End-to-End Document Processing - -```python -import asyncio -from raganything import RAGAnything -from lightrag.llm.openai import openai_complete_if_cache, openai_embed -from lightrag.utils import EmbeddingFunc - -async def main(): - # Initialize RAGAnything - rag = RAGAnything( - working_dir="./rag_storage", - llm_model_func=lambda prompt, system_prompt=None, history_messages=[], **kwargs: openai_complete_if_cache( - "gpt-4o-mini", - prompt, - system_prompt=system_prompt, - history_messages=history_messages, - api_key="your-api-key", - **kwargs, - ), - vision_model_func=lambda prompt, system_prompt=None, history_messages=[], image_data=None, **kwargs: openai_complete_if_cache( - "gpt-4o", - "", - system_prompt=None, - history_messages=[], - messages=[ - {"role": "system", "content": system_prompt} if system_prompt else None, - {"role": "user", "content": [ - {"type": "text", "text": prompt}, - {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{image_data}"}} - ]} if image_data else {"role": "user", "content": prompt} - ], - api_key="your-api-key", - **kwargs, - ) if image_data else openai_complete_if_cache( - "gpt-4o-mini", - prompt, - system_prompt=system_prompt, - history_messages=history_messages, - api_key="your-api-key", - **kwargs, - ), - embedding_func=EmbeddingFunc( - embedding_dim=3072, - max_token_size=8192, - func=lambda texts: openai_embed( - texts, - model="text-embedding-3-large", - api_key=api_key, - base_url=base_url, - ), - ), - ) - - # Process a document - await rag.process_document_complete( - file_path="path/to/your/document.pdf", - output_dir="./output", - parse_method="auto" - ) - - # Query the processed content - # Pure text query - for basic knowledge base search - text_result = await rag.aquery( - "What are the main findings shown in the figures and tables?", - mode="hybrid" - ) - print("Text query result:", text_result) - - # Multimodal query with specific multimodal content - multimodal_result = await rag.aquery_with_multimodal( - "Explain this formula and its relevance to the document content", - multimodal_content=[{ - "type": "equation", - "latex": "P(d|q) = \\frac{P(q|d) \\cdot P(d)}{P(q)}", - "equation_caption": "Document relevance probability" - }], - mode="hybrid" -) - print("Multimodal query result:", multimodal_result) - -if __name__ == "__main__": - asyncio.run(main()) -``` - -#### 2. Direct Multimodal Content Processing - -```python -import asyncio -from lightrag import LightRAG -from raganything.modalprocessors import ImageModalProcessor, TableModalProcessor - -async def process_multimodal_content(): - # Initialize LightRAG - rag = LightRAG( - working_dir="./rag_storage", - # ... 
your LLM and embedding configurations - ) - await rag.initialize_storages() - - # Process an image - image_processor = ImageModalProcessor( - lightrag=rag, - modal_caption_func=your_vision_model_func - ) - - image_content = { - "img_path": "path/to/image.jpg", - "img_caption": ["Figure 1: Experimental results"], - "img_footnote": ["Data collected in 2024"] - } - - description, entity_info = await image_processor.process_multimodal_content( - modal_content=image_content, - content_type="image", - file_path="research_paper.pdf", - entity_name="Experimental Results Figure" - ) - - # Process a table - table_processor = TableModalProcessor( - lightrag=rag, - modal_caption_func=your_llm_model_func - ) - - table_content = { - "table_body": """ - | Method | Accuracy | F1-Score | - |--------|----------|----------| - | RAGAnything | 95.2% | 0.94 | - | Baseline | 87.3% | 0.85 | - """, - "table_caption": ["Performance Comparison"], - "table_footnote": ["Results on test dataset"] - } - - description, entity_info = await table_processor.process_multimodal_content( - modal_content=table_content, - content_type="table", - file_path="research_paper.pdf", - entity_name="Performance Results Table" - ) - -if __name__ == "__main__": - asyncio.run(process_multimodal_content()) -``` - -#### 3. Batch Processing - -```python -# Process multiple documents -await rag.process_folder_complete( - folder_path="./documents", - output_dir="./output", - file_extensions=[".pdf", ".docx", ".pptx"], - recursive=True, - max_workers=4 -) -``` - -#### 4. Custom Modal Processors - -```python -from raganything.modalprocessors import GenericModalProcessor - -class CustomModalProcessor(GenericModalProcessor): - async def process_multimodal_content(self, modal_content, content_type, file_path, entity_name): - # Your custom processing logic - enhanced_description = await self.analyze_custom_content(modal_content) - entity_info = self.create_custom_entity(enhanced_description, entity_name) - return await self._create_entity_and_chunk(enhanced_description, entity_info, file_path) -``` - -#### 5. Query Options - -RAG-Anything provides two types of query methods: - -**Pure Text Queries** - Direct knowledge base search using LightRAG: -```python -# Different query modes for text queries -text_result_hybrid = await rag.aquery("Your question", mode="hybrid") -text_result_local = await rag.aquery("Your question", mode="local") -text_result_global = await rag.aquery("Your question", mode="global") -text_result_naive = await rag.aquery("Your question", mode="naive") - -# Synchronous version -sync_text_result = rag.query("Your question", mode="hybrid") -``` - -**Multimodal Queries** - Enhanced queries with multimodal content analysis: -```python -# Query with table data -table_result = await rag.aquery_with_multimodal( - "Compare these performance metrics with the document content", - multimodal_content=[{ - "type": "table", - "table_data": """Method,Accuracy,Speed - RAGAnything,95.2%,120ms - Traditional,87.3%,180ms""", - "table_caption": "Performance comparison" - }], - mode="hybrid" -) - -# Query with equation content -equation_result = await rag.aquery_with_multimodal( - "Explain this formula and its relevance to the document content", - multimodal_content=[{ - "type": "equation", - "latex": "P(d|q) = \\frac{P(q|d) \\cdot P(d)}{P(q)}", - "equation_caption": "Document relevance probability" - }], - mode="hybrid" -) -``` - -#### 6. 
Loading Existing LightRAG Instance - -```python -import asyncio -from raganything import RAGAnything -from lightrag import LightRAG -from lightrag.llm.openai import openai_complete_if_cache, openai_embed -from lightrag.utils import EmbeddingFunc -import os - -async def load_existing_lightrag(): - # First, create or load an existing LightRAG instance - lightrag_working_dir = "./existing_lightrag_storage" - - # Check if previous LightRAG instance exists - if os.path.exists(lightrag_working_dir) and os.listdir(lightrag_working_dir): - print("✅ Found existing LightRAG instance, loading...") - else: - print("❌ No existing LightRAG instance found, will create new one") - - # Create/Load LightRAG instance with your configurations - lightrag_instance = LightRAG( - working_dir=lightrag_working_dir, - llm_model_func=lambda prompt, system_prompt=None, history_messages=[], **kwargs: openai_complete_if_cache( - "gpt-4o-mini", - prompt, - system_prompt=system_prompt, - history_messages=history_messages, - api_key="your-api-key", - **kwargs, - ), - embedding_func=EmbeddingFunc( - embedding_dim=3072, - max_token_size=8192, - func=lambda texts: openai_embed( - texts, - model="text-embedding-3-large", - api_key=api_key, - base_url=base_url, - ), - ) - ) - - # Initialize storage (this will load existing data if available) - await lightrag_instance.initialize_storages() - - # Now initialize RAGAnything with the existing LightRAG instance - rag = RAGAnything( - lightrag=lightrag_instance, # Pass the existing LightRAG instance - # Only need vision model for multimodal processing - vision_model_func=lambda prompt, system_prompt=None, history_messages=[], image_data=None, **kwargs: openai_complete_if_cache( - "gpt-4o", - "", - system_prompt=None, - history_messages=[], - messages=[ - {"role": "system", "content": system_prompt} if system_prompt else None, - {"role": "user", "content": [ - {"type": "text", "text": prompt}, - {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{image_data}"}} - ]} if image_data else {"role": "user", "content": prompt} - ], - api_key="your-api-key", - **kwargs, - ) if image_data else openai_complete_if_cache( - "gpt-4o-mini", - prompt, - system_prompt=system_prompt, - history_messages=history_messages, - api_key="your-api-key", - **kwargs, - ) - # Note: working_dir, llm_model_func, embedding_func, etc. are inherited from lightrag_instance - ) - - # Query the existing knowledge base - result = await rag.query_with_multimodal( - "What data has been processed in this LightRAG instance?", - mode="hybrid" - ) - print("Query result:", result) - - # Add new multimodal documents to the existing LightRAG instance - await rag.process_document_complete( - file_path="path/to/new/multimodal_document.pdf", - output_dir="./output" - ) - -if __name__ == "__main__": - asyncio.run(load_existing_lightrag()) -``` - ---- - -## 🛠️ Examples - -*Practical Implementation Demos* - -
- -
- -The `examples/` directory contains comprehensive usage examples: - -- **`raganything_example.py`**: End-to-end document processing with MinerU -- **`modalprocessors_example.py`**: Direct multimodal content processing -- **`office_document_test.py`**: Office document parsing test with MinerU (no API key required) -- **`image_format_test.py`**: Image format parsing test with MinerU (no API key required) -- **`text_format_test.py`**: Text format parsing test with MinerU (no API key required) - -**Run examples:** - -```bash -# End-to-end processing -python examples/raganything_example.py path/to/document.pdf --api-key YOUR_API_KEY - -# Direct modal processing -python examples/modalprocessors_example.py --api-key YOUR_API_KEY - -# Office document parsing test (MinerU only) -python examples/office_document_test.py --file path/to/document.docx - -# Image format parsing test (MinerU only) -python examples/image_format_test.py --file path/to/image.bmp - -# Text format parsing test (MinerU only) -python examples/text_format_test.py --file path/to/document.md - -# Check LibreOffice installation -python examples/office_document_test.py --check-libreoffice --file dummy - -# Check PIL/Pillow installation -python examples/image_format_test.py --check-pillow --file dummy - -# Check ReportLab installation -python examples/text_format_test.py --check-reportlab --file dummy -``` - ---- - -## 🔧 Configuration - -*System Optimization Parameters* - -### Environment Variables - -Create a `.env` file (refer to `.env.example`): - -```bash -OPENAI_API_KEY=your_openai_api_key -OPENAI_BASE_URL=your_base_url # Optional -``` - -> **Note**: API keys are only required for full RAG processing with LLM integration. The parsing test files (`office_document_test.py` and `image_format_test.py`) only test MinerU functionality and do not require API keys. - -### MinerU Configuration - -MinerU 2.0 uses a simplified configuration approach: - -```bash -# MinerU 2.0 uses command-line parameters instead of config files -# Check available options: -mineru --help - -# Common configurations: -mineru -p input.pdf -o output_dir -m auto # Automatic parsing mode -mineru -p input.pdf -o output_dir -m ocr # OCR-focused parsing -mineru -p input.pdf -o output_dir -b pipeline --device cuda # GPU acceleration -``` - -You can also configure MinerU through RAGAnything parameters: - -```python -# Configure parsing behavior -await rag.process_document_complete( - file_path="document.pdf", - parse_method="auto", # or "ocr", "txt" - device="cuda", # GPU acceleration - backend="pipeline", # parsing backend - lang="en" # language optimization -) -``` - -> **Note**: MinerU 2.0 no longer uses the `magic-pdf.json` configuration file. All settings are now passed as command-line parameters or function arguments. 
- -### Processing Requirements - -Different content types require specific optional dependencies: - -- **Office Documents** (.doc, .docx, .ppt, .pptx, .xls, .xlsx): Install [LibreOffice](https://www.libreoffice.org/download/download/) -- **Extended Image Formats** (.bmp, .tiff, .gif, .webp): Install with `pip install raganything[image]` -- **Text Files** (.txt, .md): Install with `pip install raganything[text]` - -> **📋 Quick Install**: Use `pip install raganything[all]` to enable all format support (Python dependencies only - LibreOffice still needs separate installation) - ---- - -## 🧪 Supported Content Types - -### Document Formats - -- **PDFs** - Research papers, reports, presentations -- **Office Documents** - DOC, DOCX, PPT, PPTX, XLS, XLSX -- **Images** - JPG, PNG, BMP, TIFF, GIF, WebP -- **Text Files** - TXT, MD - -### Multimodal Elements - -- **Images** - Photographs, diagrams, charts, screenshots -- **Tables** - Data tables, comparison charts, statistical summaries -- **Equations** - Mathematical formulas in LaTeX format -- **Generic Content** - Custom content types via extensible processors - -*For installation of format-specific dependencies, see the [Configuration](#-configuration) section.* - ---- - -## 📖 Citation - -*Academic Reference* - -
-If you find RAG-Anything useful in your research, please cite our paper:
-
-```bibtex
-@article{guo2024lightrag,
-    title={LightRAG: Simple and Fast Retrieval-Augmented Generation},
-    author={Zirui Guo and Lianghao Xia and Yanhua Yu and Tu Ao and Chao Huang},
-    year={2024},
-    eprint={2410.05779},
-    archivePrefix={arXiv},
-    primaryClass={cs.IR}
-}
-```
-
----
-
-## 🔗 Related Projects
-
-*Ecosystem & Extensions*
-
-- **LightRAG**: Simple and Fast RAG
-- **🎥 VideoRAG**: Extreme Long-Context Video RAG
-- **MiniRAG**: Extremely Simple RAG
-
----
-
-## ⭐ Star History
-
-*Community Growth Trajectory*
-
-[Star History Chart]
-
----
-
-## 🤝 Contribution
-
-*Join the Innovation*
-
-We thank all our contributors for their valuable contributions.
-
----
-
-Thank you for visiting RAG-Anything!
-
-*Building the Future of Multimodal AI*