---
title: Smart Document Parser
emoji: 💻
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 5.13.0
app_file: app.py
pinned: false
---
# Smart Document Parser
A document parsing application that extracts structured information (content, metadata, sections, and named entities) from PDF, DOCX, TXT, HTML, and Markdown files.
## Features
- **Multiple Format Support**: PDF, DOCX, TXT, HTML, and Markdown
- **Rich Information Extraction**:
  - Document content with preserved formatting
  - Comprehensive metadata
  - Section breakdown
  - Named entity recognition
- **Smart Processing**:
  - Automatic format detection
  - Confidence scoring
  - Error handling
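
The features above suggest a result that bundles content, metadata, sections, entities, and a confidence score. The sketch below shows one way such a result could be typed. It uses standard-library dataclasses as a stand-in so it runs without extra dependencies; the names `ParsedDocument` and `Entity`, the field layout, and the 0.0 to 1.0 confidence range are all assumptions for illustration, not the application's actual schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Entity:
    """Hypothetical named-entity record."""
    text: str   # surface form as it appears in the document
    label: str  # entity type, e.g. "ORG" or "DATE"


@dataclass
class ParsedDocument:
    """Hypothetical container for everything extracted from one document."""
    content: str
    metadata: Dict[str, str] = field(default_factory=dict)
    sections: List[str] = field(default_factory=list)
    entities: List[Entity] = field(default_factory=list)
    confidence: float = 1.0  # assumed range: 0.0 (lowest) to 1.0 (highest)

    def __post_init__(self) -> None:
        # Reject scores outside the assumed range as early as possible.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0.0 and 1.0")
```

A Pydantic model with the same fields would add validation of the field types as well; the dataclass version only checks the confidence range.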
## How to Use
1. **Upload Document**: Click the upload button or drag & drop your document
2. **Process**: Click "Process Document"
3. **View Results**: Explore the extracted information in different tabs:
   - Content: Main document text
   - Metadata: Document properties
   - Sections: Document structure
   - Entities: Named entities
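
To give a sense of what the Sections tab represents, here is a minimal sketch that splits Markdown text into titled sections by its headings. It is not the application's implementation: the function name `split_sections` is hypothetical, and the heading-based approach is an assumption about how a section breakdown could be derived.

```python
import re
from typing import List, Tuple

# Matches ATX-style Markdown headings such as "# Title" or "### Subsection".
HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def split_sections(markdown: str) -> List[Tuple[str, str]]:
    """Return (title, body) pairs; text before the first heading gets title ''.

    Simplification: '#' lines inside fenced code blocks are treated as headings.
    """
    sections: List[Tuple[str, str]] = []
    title: str = ""
    body: List[str] = []

    def flush() -> None:
        # Skip an empty untitled preamble, keep everything else.
        if title or any(line.strip() for line in body):
            sections.append((title, "\n".join(body).strip()))

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            flush()
            title, body = match.group(2), []
        else:
            body.append(line)
    flush()
    return sections
```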
## Supported Formats
- PDF Documents (*.pdf)
- Word Documents (*.docx)
- Text Files (*.txt)
- HTML Files (*.html)
- Markdown Files (*.md)
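
As an illustration of format detection over the extensions listed above, the following sketch maps a file name to a format label. The function `detect_format`, the `SUPPORTED_FORMATS` table, and the label strings are hypothetical; the application may detect formats differently, for example by inspecting file contents.

```python
from pathlib import Path

# Extension-to-format table built from the supported formats list.
# The label strings on the right are illustrative assumptions.
SUPPORTED_FORMATS = {
    ".pdf": "pdf",
    ".docx": "docx",
    ".txt": "text",
    ".html": "html",
    ".md": "markdown",
}


def detect_format(filename: str) -> str:
    """Return a format label for a file name, or raise ValueError if unsupported."""
    suffix = Path(filename).suffix.lower()  # case-insensitive match on the extension
    try:
        return SUPPORTED_FORMATS[suffix]
    except KeyError:
        raise ValueError(f"Unsupported file type: {suffix or '(none)'}") from None
```

Calling `detect_format("Report.PDF")` yields `"pdf"`, while a name such as `"image.png"` raises `ValueError` so the caller can show an error message instead of attempting to parse the file.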
## Technical Details
Built with:
- Docling: Advanced document processing
- Gradio: Interactive web interface
- Pydantic: Type-safe data handling
- Hugging Face Spaces: Cloud deployment
## License
MIT License