Tonic committed on
Commit ab33e9d · unverified · 2 parent(s): 3ab54ea ca3a4f7

Initial demo testing (#4)

This view is limited to 50 files because the commit contains too many changes.
Files changed (50)
  1. .github/README.md +46 -26
  2. .github/workflows/ci.yml +14 -14
  3. .github/workflows/docs.yml +56 -0
  4. .gitignore +4 -0
  5. .pre-commit-config.yaml +6 -16
  6. .pre-commit-hooks/run_pytest.ps1 +5 -0
  7. .pre-commit-hooks/run_pytest.sh +5 -0
  8. .pre-commit-hooks/run_pytest_embeddings.ps1 +14 -0
  9. .pre-commit-hooks/run_pytest_embeddings.sh +15 -0
  10. .pre-commit-hooks/run_pytest_unit.ps1 +14 -0
  11. .pre-commit-hooks/run_pytest_unit.sh +15 -0
  12. .pre-commit-hooks/run_pytest_with_sync.ps1 +25 -0
  13. .pre-commit-hooks/run_pytest_with_sync.py +93 -0
  14. =0.22.0 +0 -0
  15. =0.22.0, +0 -0
  16. CONTRIBUTING.md +0 -1
  17. Makefile +9 -0
  18. README.md +99 -173
  19. .cursorrules → dev/.cursorrules +1 -0
  20. AGENTS.txt → dev/AGENTS.txt +0 -0
  21. dev/Makefile +51 -0
  22. dev/docs_plugins.py +74 -0
  23. docs/CONFIGURATION.md +0 -301
  24. docs/api/agents.md +260 -0
  25. docs/api/models.md +238 -0
  26. docs/api/orchestrators.md +185 -0
  27. docs/api/services.md +191 -0
  28. docs/api/tools.md +225 -0
  29. docs/architecture/agents.md +182 -0
  30. docs/architecture/design-patterns.md +0 -1509
  31. docs/architecture/graph-orchestration.md +152 -0
  32. docs/architecture/graph_orchestration.md +8 -0
  33. docs/architecture/middleware.md +132 -0
  34. docs/architecture/orchestrators.md +198 -0
  35. docs/architecture/overview.md +0 -474
  36. docs/architecture/services.md +132 -0
  37. docs/architecture/tools.md +165 -0
  38. docs/architecture/workflow-diagrams.md +670 -0
  39. docs/{workflow-diagrams.md → architecture/workflows.md} +0 -0
  40. docs/brainstorming/00_ROADMAP_SUMMARY.md +0 -194
  41. docs/brainstorming/01_PUBMED_IMPROVEMENTS.md +0 -125
  42. docs/brainstorming/02_CLINICALTRIALS_IMPROVEMENTS.md +0 -193
  43. docs/brainstorming/03_EUROPEPMC_IMPROVEMENTS.md +0 -211
  44. docs/brainstorming/04_OPENALEX_INTEGRATION.md +0 -303
  45. docs/brainstorming/implementation/15_PHASE_OPENALEX.md +0 -603
  46. docs/brainstorming/implementation/16_PHASE_PUBMED_FULLTEXT.md +0 -586
  47. docs/brainstorming/implementation/17_PHASE_RATE_LIMITING.md +0 -540
  48. docs/brainstorming/implementation/README.md +0 -143
  49. docs/brainstorming/magentic-pydantic/00_SITUATION_AND_PLAN.md +0 -189
  50. docs/brainstorming/magentic-pydantic/01_ARCHITECTURE_SPEC.md +0 -289
.github/README.md CHANGED
@@ -7,7 +7,11 @@ sdk: gradio
7
  sdk_version: "6.0.1"
8
  python_version: "3.11"
9
  app_file: src/app.py
10
- pinned: false
11
  license: mit
12
  tags:
13
  - mcp-in-action-track-enterprise
@@ -19,6 +23,18 @@ tags:
19
  - modal
20
  ---
21
22
  # DeepCritical
23
 
24
  ## Intro
@@ -27,9 +43,10 @@ tags:
27
 
28
  - **Multi-Source Search**: PubMed, ClinicalTrials.gov, bioRxiv/medRxiv
29
  - **MCP Integration**: Use our tools from Claude Desktop or any MCP client
 
30
  - **Modal Sandbox**: Secure execution of AI-generated statistical code
31
  - **LlamaIndex RAG**: Semantic search and evidence synthesis
32
- - **HuggingfaceInference**:
33
  - **HuggingfaceMCP Custom Config To Use Community Tools**:
34
  - **Strongly Typed Composable Graphs**:
35
  - **Specialized Research Teams of Agents**:
@@ -55,7 +72,20 @@ uv run gradio run src/app.py
55
 
56
  Open your browser to `http://localhost:7860`.
57
 
58
- ### 3. Connect via MCP
59
 
60
  This application exposes a Model Context Protocol (MCP) server, allowing you to use its search tools directly from Claude Desktop or other MCP clients.
61
 
@@ -81,7 +111,13 @@ Add this to your `claude_desktop_config.json`:
81
  - `analyze_hypothesis`: Secure statistical analysis using Modal sandboxes.
82
 
83
 
84
- ## Deep Research Flows
85
 
86
  - iterativeResearch
87
  - deepResearch
@@ -89,6 +125,7 @@ Add this to your `claude_desktop_config.json`:
89
 
90
  ### Iterative Research
91
 
 
92
  sequenceDiagram
93
  participant IterativeFlow
94
  participant ThinkingAgent
@@ -121,10 +158,12 @@ sequenceDiagram
121
  JudgeHandler-->>IterativeFlow: should_continue
122
  end
123
  end
 
124
 
125
 
126
  ### Deep Research
127
 
 
128
  sequenceDiagram
129
  actor User
130
  participant GraphOrchestrator
@@ -159,8 +198,10 @@ sequenceDiagram
159
  end
160
 
161
  GraphOrchestrator->>User: AsyncGenerator[AgentEvent]
 
162
 
163
  ### Research Team
 
164
  Critical Deep Research Agent
165
 
166
  ## Development
@@ -177,27 +218,6 @@ uv run pytest
177
  make check
178
  ```
179
 
180
- ## Architecture
181
-
182
- DeepCritical uses a Vertical Slice Architecture:
183
-
184
- 1. **Search Slice**: Retrieving evidence from PubMed, ClinicalTrials.gov, and bioRxiv.
185
- 2. **Judge Slice**: Evaluating evidence quality using LLMs.
186
- 3. **Orchestrator Slice**: Managing the research loop and UI.
187
-
188
- Built with:
189
- - **PydanticAI**: For robust agent interactions.
190
- - **Gradio**: For the streaming user interface.
191
- - **PubMed, ClinicalTrials.gov, bioRxiv**: For biomedical data.
192
- - **MCP**: For universal tool access.
193
- - **Modal**: For secure code execution.
194
-
195
- ## Team
196
-
197
- - The-Obstacle-Is-The-Way
198
- - MarioAderman
199
- - Josephrp
200
-
201
  ## Links
202
 
203
- - [GitHub Repository](https://github.com/The-Obstacle-Is-The-Way/DeepCritical-1)
 
7
  sdk_version: "6.0.1"
8
  python_version: "3.11"
9
  app_file: src/app.py
10
+ hf_oauth: true
11
+ hf_oauth_expiration_minutes: 480
12
+ hf_oauth_scopes:
13
+ - inference-api
14
+ pinned: true
15
  license: mit
16
  tags:
17
  - mcp-in-action-track-enterprise
 
23
  - modal
24
  ---
25
 
26
+ <div align="center">
27
+
28
+ [![GitHub](https://img.shields.io/github/stars/DeepCritical/GradioDemo?style=for-the-badge&logo=github&logoColor=white&label=🐙%20GitHub&labelColor=181717&color=181717)](https://github.com/DeepCritical/GradioDemo)
29
+ [![Documentation](https://img.shields.io/badge/📚%20Docs-0080FF?style=for-the-badge&logo=readthedocs&logoColor=white&labelColor=0080FF&color=0080FF)](docs/index.md)
30
+ [![Demo](https://img.shields.io/badge/🚀%20Demo-FFD21E?style=for-the-badge&logo=huggingface&logoColor=white&labelColor=FFD21E&color=FFD21E)](https://huggingface.co/spaces/DataQuests/DeepCritical)
31
+ [![CodeCov](https://img.shields.io/badge/📊%20Coverage-F01F7A?style=for-the-badge&logo=codecov&logoColor=white&labelColor=F01F7A&color=F01F7A)](https://codecov.io/gh/DeepCritical/GradioDemo)
32
+ [![Join us on Discord](https://img.shields.io/discord/1109943800132010065?label=Discord&logo=discord&style=flat-square)](https://discord.gg/qdfnvSPcqP)
33
+
34
+
35
+ </div>
36
+
37
+
38
  # DeepCritical
39
 
40
  ## Intro
 
43
 
44
  - **Multi-Source Search**: PubMed, ClinicalTrials.gov, bioRxiv/medRxiv
45
  - **MCP Integration**: Use our tools from Claude Desktop or any MCP client
46
+ - **HuggingFace OAuth**: Sign in with your HuggingFace account to automatically use your API token
47
  - **Modal Sandbox**: Secure execution of AI-generated statistical code
48
  - **LlamaIndex RAG**: Semantic search and evidence synthesis
49
+ - **HuggingfaceInference**: Free tier support with automatic fallback
50
  - **HuggingfaceMCP Custom Config To Use Community Tools**:
51
  - **Strongly Typed Composable Graphs**:
52
  - **Specialized Research Teams of Agents**:
 
72
 
73
  Open your browser to `http://localhost:7860`.
74
 
75
+ ### 3. Authentication (Optional)
76
+
77
+ **HuggingFace OAuth Login**:
78
+ - Click the "Sign in with HuggingFace" button at the top of the app
79
+ - Your HuggingFace API token will be automatically used for AI inference
80
+ - No need to manually enter API keys when logged in
81
+ - OAuth token is used only for the current session and never stored
82
+
83
+ **Manual API Key (BYOK)**:
84
+ - You can still provide your own API key in the Settings accordion
85
+ - Supports HuggingFace, OpenAI, or Anthropic API keys
86
+ - Manual keys take priority over OAuth tokens
87
+
88
+ ### 4. Connect via MCP
89
 
90
  This application exposes a Model Context Protocol (MCP) server, allowing you to use its search tools directly from Claude Desktop or other MCP clients.
91
 
 
111
  - `analyze_hypothesis`: Secure statistical analysis using Modal sandboxes.
112
 
113
 
114
+ ## Architecture
115
+
116
+ DeepCritical uses a Vertical Slice Architecture:
117
+
118
+ 1. **Search Slice**: Retrieving evidence from PubMed, ClinicalTrials.gov, and bioRxiv.
119
+ 2. **Judge Slice**: Evaluating evidence quality using LLMs.
120
+ 3. **Orchestrator Slice**: Managing the research loop and UI.
121
 
122
  - iterativeResearch
123
  - deepResearch
 
125
 
126
  ### Iterative Research
127
 
128
+ ```mermaid
129
  sequenceDiagram
130
  participant IterativeFlow
131
  participant ThinkingAgent
 
158
  JudgeHandler-->>IterativeFlow: should_continue
159
  end
160
  end
161
+ ```
162
 
163
 
164
  ### Deep Research
165
 
166
+ ```mermaid
167
  sequenceDiagram
168
  actor User
169
  participant GraphOrchestrator
 
198
  end
199
 
200
  GraphOrchestrator->>User: AsyncGenerator[AgentEvent]
201
+ ```
202
 
203
  ### Research Team
204
+
205
  Critical Deep Research Agent
206
 
207
  ## Development
 
218
  make check
219
  ```
220
 
221
  ## Links
222
 
223
+ - [GitHub Repository](https://github.com/DeepCritical/GradioDemo)
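
For quick reference, the MCP client wiring this README describes (shown verbatim in the pre-change README and unchanged by this diff) reduces to a single `claude_desktop_config.json` entry:

```json
{
  "mcpServers": {
    "deepcritical": {
      "url": "http://localhost:7860/gradio_api/mcp/"
    }
  }
}
```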
.github/workflows/ci.yml CHANGED
@@ -16,6 +16,11 @@ jobs:
16
  steps:
17
  - uses: actions/checkout@v4
18
 
19
  - name: Set up Python ${{ matrix.python-version }}
20
  uses: actions/setup-python@v5
21
  with:
@@ -23,45 +28,40 @@ jobs:
23
 
24
  - name: Install dependencies
25
  run: |
26
- python -m pip install --upgrade pip
27
- pip install -e ".[dev]"
28
 
29
  - name: Lint with ruff
30
  run: |
31
- ruff check . --exclude tests
32
- ruff format --check . --exclude tests
33
 
34
  - name: Type check with mypy
35
  run: |
36
- mypy src
37
-
38
- - name: Install embedding dependencies
39
- run: |
40
- pip install -e ".[embeddings]"
41
 
42
- - name: Run unit tests (excluding OpenAI and embedding providers)
43
  env:
44
  HF_TOKEN: ${{ secrets.HF_TOKEN }}
45
  run: |
46
- pytest tests/unit/ -v -m "not openai and not embedding_provider" --tb=short -p no:logfire
47
 
48
  - name: Run local embeddings tests
49
  env:
50
  HF_TOKEN: ${{ secrets.HF_TOKEN }}
51
  run: |
52
- pytest tests/ -v -m "local_embeddings" --tb=short -p no:logfire || true
53
  continue-on-error: true # Allow failures if dependencies not available
54
 
55
  - name: Run HuggingFace integration tests
56
  env:
57
  HF_TOKEN: ${{ secrets.HF_TOKEN }}
58
  run: |
59
- pytest tests/integration/ -v -m "huggingface and not embedding_provider" --tb=short -p no:logfire || true
60
  continue-on-error: true # Allow failures if HF_TOKEN not set
61
 
62
  - name: Run non-OpenAI integration tests (excluding embedding providers)
63
  env:
64
  HF_TOKEN: ${{ secrets.HF_TOKEN }}
65
  run: |
66
- pytest tests/integration/ -v -m "integration and not openai and not embedding_provider" --tb=short -p no:logfire || true
67
  continue-on-error: true # Allow failures if dependencies not available
 
16
  steps:
17
  - uses: actions/checkout@v4
18
 
19
+ - name: Install uv
20
+ uses: astral-sh/setup-uv@v5
21
+ with:
22
+ version: "latest"
23
+
24
  - name: Set up Python ${{ matrix.python-version }}
25
  uses: actions/setup-python@v5
26
  with:
 
28
 
29
  - name: Install dependencies
30
  run: |
31
+ uv sync --dev
 
32
 
33
  - name: Lint with ruff
34
  run: |
35
+ uv run ruff check . --exclude tests
36
+ uv run ruff format --check . --exclude tests
37
 
38
  - name: Type check with mypy
39
  run: |
40
+ uv run mypy src
41
 
42
+ - name: Run unit tests (no black-box APIs)
43
  env:
44
  HF_TOKEN: ${{ secrets.HF_TOKEN }}
45
  run: |
46
+ uv run pytest tests/unit/ -v -m "not openai and not embedding_provider" --tb=short -p no:logfire
47
 
48
  - name: Run local embeddings tests
49
  env:
50
  HF_TOKEN: ${{ secrets.HF_TOKEN }}
51
  run: |
52
+ uv run pytest tests/ -v -m "local_embeddings" --tb=short -p no:logfire || true
53
  continue-on-error: true # Allow failures if dependencies not available
54
 
55
  - name: Run HuggingFace integration tests
56
  env:
57
  HF_TOKEN: ${{ secrets.HF_TOKEN }}
58
  run: |
59
+ uv run pytest tests/integration/ -v -m "huggingface and not embedding_provider" --tb=short -p no:logfire || true
60
  continue-on-error: true # Allow failures if HF_TOKEN not set
61
 
62
  - name: Run non-OpenAI integration tests (excluding embedding providers)
63
  env:
64
  HF_TOKEN: ${{ secrets.HF_TOKEN }}
65
  run: |
66
+ uv run pytest tests/integration/ -v -m "integration and not openai and not embedding_provider" --tb=short -p no:logfire || true
67
  continue-on-error: true # Allow failures if dependencies not available
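
The marker expressions in these steps (`openai`, `embedding_provider`, `local_embeddings`, `huggingface`, `integration`) assume the markers are registered with pytest. A sketch of what that registration might look like in `pyproject.toml` (the actual file is not part of this diff; the descriptions are assumptions):

```toml
[tool.pytest.ini_options]
markers = [
    "openai: tests that call the OpenAI API",
    "embedding_provider: tests that hit a remote embedding provider",
    "local_embeddings: tests using locally installed embedding models",
    "huggingface: tests that call HuggingFace inference",
    "integration: end-to-end integration tests",
]
```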
.github/workflows/docs.yml ADDED
@@ -0,0 +1,56 @@
1
+ name: Documentation
2
+
3
+ on:
4
+ push:
5
+ branches:
6
+ - main
7
+ paths:
8
+ - 'docs/**'
9
+ - 'mkdocs.yml'
10
+ - '.github/workflows/docs.yml'
11
+ pull_request:
12
+ branches:
13
+ - main
14
+ paths:
15
+ - 'docs/**'
16
+ - 'mkdocs.yml'
17
+ - '.github/workflows/docs.yml'
18
+ workflow_dispatch:
19
+
20
+ permissions:
21
+ contents: write
22
+
23
+ jobs:
24
+ build:
25
+ runs-on: ubuntu-latest
26
+ steps:
27
+ - uses: actions/checkout@v4
28
+
29
+ - name: Set up Python
30
+ uses: actions/setup-python@v5
31
+ with:
32
+ python-version: '3.11'
33
+
34
+ - name: Install uv
35
+ run: |
36
+ pip install uv
37
+
38
+ - name: Install dependencies
39
+ run: |
40
+ uv sync --all-extras --dev
41
+
42
+ - name: Build documentation
43
+ run: |
44
+ uv run mkdocs build --strict
45
+
46
+ - name: Deploy to GitHub Pages
47
+ if: github.ref == 'refs/heads/main' && github.event_name == 'push'
48
+ uses: peaceiris/actions-gh-pages@v3
49
+ with:
50
+ github_token: ${{ secrets.GITHUB_TOKEN }}
51
+ publish_dir: ./site
52
+ cname: false
53
+
54
+
55
+
56
+
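
The `mkdocs build --strict` step requires an `mkdocs.yml` at the repository root; the workflow's `paths` filter references it but the file itself is not shown in this diff. A minimal sketch of such a config (`site_name`, theme, and nav entries are assumptions):

```yaml
site_name: DeepCritical
theme:
  name: material
nav:
  - Home: index.md
  - Architecture: architecture/overview.md
```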
.gitignore CHANGED
@@ -1,6 +1,10 @@
 
 
1
  folder/
 
2
  .cursor/
3
  .ruff_cache/
 
4
  # Python
5
  __pycache__/
6
  *.py[cod]
 
1
+ =0.22.0
2
+ =0.22.0,
3
  folder/
4
+ site/
5
  .cursor/
6
  .ruff_cache/
7
+ docs/contributing/
8
  # Python
9
  __pycache__/
10
  *.py[cod]
.pre-commit-config.yaml CHANGED
@@ -31,14 +31,9 @@ repos:
31
  types: [python]
32
  args: [
33
  "run",
34
- "pytest",
35
- "tests/unit/",
36
- "-v",
37
- "-m",
38
- "not openai and not embedding_provider",
39
- "--tb=short",
40
- "-p",
41
- "no:logfire",
42
  ]
43
  pass_filenames: false
44
  always_run: true
@@ -50,14 +45,9 @@ repos:
50
  types: [python]
51
  args: [
52
  "run",
53
- "pytest",
54
- "tests/",
55
- "-v",
56
- "-m",
57
- "local_embeddings",
58
- "--tb=short",
59
- "-p",
60
- "no:logfire",
61
  ]
62
  pass_filenames: false
63
  always_run: true
 
31
  types: [python]
32
  args: [
33
  "run",
34
+ "python",
35
+ ".pre-commit-hooks/run_pytest_with_sync.py",
36
+ "unit",
37
  ]
38
  pass_filenames: false
39
  always_run: true
 
45
  types: [python]
46
  args: [
47
  "run",
48
+ "python",
49
+ ".pre-commit-hooks/run_pytest_with_sync.py",
50
+ "embeddings",
51
  ]
52
  pass_filenames: false
53
  always_run: true
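
Putting the surviving fragments together, each local hook now dispatches through the sync-aware runner with a single test-type argument. A reconstructed sketch of one full hook entry (the `id`, `name`, and `entry` fields are not visible in this hunk and are assumptions; only the `args` list is shown in the diff):

```yaml
- repo: local
  hooks:
    - id: pytest-unit          # assumed id; not shown in this hunk
      name: Run unit tests     # assumed name
      entry: uv                # assumed entry; the visible args begin at "run"
      language: system
      types: [python]
      args: ["run", "python", ".pre-commit-hooks/run_pytest_with_sync.py", "unit"]
      pass_filenames: false
      always_run: true
```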
.pre-commit-hooks/run_pytest.ps1 CHANGED
@@ -2,6 +2,8 @@
2
  # Uses uv if available, otherwise falls back to pytest
3
 
4
  if (Get-Command uv -ErrorAction SilentlyContinue) {
 
 
5
  uv run pytest $args
6
  } else {
7
  Write-Warning "uv not found, using system pytest (may have missing dependencies)"
@@ -12,3 +14,6 @@ if (Get-Command uv -ErrorAction SilentlyContinue) {
12
 
13
 
14
 
2
  # Uses uv if available, otherwise falls back to pytest
3
 
4
  if (Get-Command uv -ErrorAction SilentlyContinue) {
5
+ # Sync dependencies before running tests
6
+ uv sync
7
  uv run pytest $args
8
  } else {
9
  Write-Warning "uv not found, using system pytest (may have missing dependencies)"
 
14
 
15
 
16
 
17
+
18
+
19
+
.pre-commit-hooks/run_pytest.sh CHANGED
@@ -3,6 +3,8 @@
3
  # Uses uv if available, otherwise falls back to pytest
4
 
5
  if command -v uv >/dev/null 2>&1; then
 
 
6
  uv run pytest "$@"
7
  else
8
  echo "Warning: uv not found, using system pytest (may have missing dependencies)"
@@ -13,3 +15,6 @@ fi
13
 
14
 
15
 
3
  # Uses uv if available, otherwise falls back to pytest
4
 
5
  if command -v uv >/dev/null 2>&1; then
6
+ # Sync dependencies before running tests
7
+ uv sync
8
  uv run pytest "$@"
9
  else
10
  echo "Warning: uv not found, using system pytest (may have missing dependencies)"
 
15
 
16
 
17
 
18
+
19
+
20
+
.pre-commit-hooks/run_pytest_embeddings.ps1 ADDED
@@ -0,0 +1,14 @@
1
+ # PowerShell wrapper to sync embeddings dependencies and run embeddings tests
2
+
3
+ $ErrorActionPreference = "Stop"
4
+
5
+ if (Get-Command uv -ErrorAction SilentlyContinue) {
6
+ Write-Host "Syncing embeddings dependencies..."
7
+ uv sync --extra embeddings
8
+ Write-Host "Running embeddings tests..."
9
+ uv run pytest tests/ -v -m local_embeddings --tb=short -p no:logfire
10
+ } else {
11
+ Write-Error "uv not found"
12
+ exit 1
13
+ }
14
+
.pre-commit-hooks/run_pytest_embeddings.sh ADDED
@@ -0,0 +1,15 @@
1
+ #!/bin/bash
2
+ # Wrapper script to sync embeddings dependencies and run embeddings tests
3
+
4
+ set -e
5
+
6
+ if command -v uv >/dev/null 2>&1; then
7
+ echo "Syncing embeddings dependencies..."
8
+ uv sync --extra embeddings
9
+ echo "Running embeddings tests..."
10
+ uv run pytest tests/ -v -m local_embeddings --tb=short -p no:logfire
11
+ else
12
+ echo "Error: uv not found"
13
+ exit 1
14
+ fi
15
+
.pre-commit-hooks/run_pytest_unit.ps1 ADDED
@@ -0,0 +1,14 @@
1
+ # PowerShell wrapper to sync dependencies and run unit tests
2
+
3
+ $ErrorActionPreference = "Stop"
4
+
5
+ if (Get-Command uv -ErrorAction SilentlyContinue) {
6
+ Write-Host "Syncing dependencies..."
7
+ uv sync
8
+ Write-Host "Running unit tests..."
9
+ uv run pytest tests/unit/ -v -m "not openai and not embedding_provider" --tb=short -p no:logfire
10
+ } else {
11
+ Write-Error "uv not found"
12
+ exit 1
13
+ }
14
+
.pre-commit-hooks/run_pytest_unit.sh ADDED
@@ -0,0 +1,15 @@
1
+ #!/bin/bash
2
+ # Wrapper script to sync dependencies and run unit tests
3
+
4
+ set -e
5
+
6
+ if command -v uv >/dev/null 2>&1; then
7
+ echo "Syncing dependencies..."
8
+ uv sync
9
+ echo "Running unit tests..."
10
+ uv run pytest tests/unit/ -v -m "not openai and not embedding_provider" --tb=short -p no:logfire
11
+ else
12
+ echo "Error: uv not found"
13
+ exit 1
14
+ fi
15
+
.pre-commit-hooks/run_pytest_with_sync.ps1 ADDED
@@ -0,0 +1,25 @@
1
+ # PowerShell wrapper for pytest runner
2
+ # Ensures uv is available and runs the Python script
3
+
4
+ param(
5
+ [Parameter(Position=0)]
6
+ [string]$TestType = "unit"
7
+ )
8
+
9
+ $ErrorActionPreference = "Stop"
10
+
11
+ # Check if uv is available
12
+ if (-not (Get-Command uv -ErrorAction SilentlyContinue)) {
13
+ Write-Error "uv not found. Please install uv: https://github.com/astral-sh/uv"
14
+ exit 1
15
+ }
16
+
17
+ # Get the script directory
18
+ $ScriptDir = Split-Path -Parent $MyInvocation.MyCommand.Path
19
+ $PythonScript = Join-Path $ScriptDir "run_pytest_with_sync.py"
20
+
21
+ # Run the Python script using uv
22
+ uv run python $PythonScript $TestType
23
+
24
+ exit $LASTEXITCODE
25
+
.pre-commit-hooks/run_pytest_with_sync.py ADDED
@@ -0,0 +1,93 @@
1
+ #!/usr/bin/env python3
2
+ """Cross-platform pytest runner that syncs dependencies before running tests."""
3
+
4
+ import subprocess
5
+ import sys
6
+
7
+
8
+ def run_command(
9
+ cmd: list[str], check: bool = True, shell: bool = False, cwd: str | None = None
10
+ ) -> int:
11
+ """Run a command and return exit code."""
12
+ try:
13
+ result = subprocess.run(
14
+ cmd,
15
+ check=check,
16
+ shell=shell,
17
+ cwd=cwd,
18
+ env=None, # Use current environment, uv will handle venv
19
+ )
20
+ return result.returncode
21
+ except subprocess.CalledProcessError as e:
22
+ return e.returncode
23
+ except FileNotFoundError:
24
+ print(f"Error: Command not found: {cmd[0]}")
25
+ return 1
26
+
27
+
28
+ def main() -> int:
29
+ """Main entry point."""
30
+ import os
31
+ from pathlib import Path
32
+
33
+ # Get the project root (where pyproject.toml is)
34
+ script_dir = Path(__file__).parent
35
+ project_root = script_dir.parent
36
+
37
+ # Change to project root to ensure uv works correctly
38
+ os.chdir(project_root)
39
+
40
+ # Check if uv is available
41
+ if run_command(["uv", "--version"], check=False) != 0:
42
+ print("Error: uv not found. Please install uv: https://github.com/astral-sh/uv")
43
+ return 1
44
+
45
+ # Parse arguments
46
+ test_type = sys.argv[1] if len(sys.argv) > 1 else "unit"
47
+ extra_args = sys.argv[2:] if len(sys.argv) > 2 else []
48
+
49
+ # Sync dependencies - always include dev
50
+ # Note: embeddings dependencies are now in main dependencies, not optional
51
+ # So we just sync with --dev for all test types
52
+ sync_cmd = ["uv", "sync", "--dev"]
53
+
54
+ print(f"Syncing dependencies for {test_type} tests...")
55
+ if run_command(sync_cmd, cwd=project_root) != 0:
56
+ return 1
57
+
58
+ # Build pytest command - use uv run to ensure correct environment
59
+ if test_type == "unit":
60
+ pytest_args = [
61
+ "tests/unit/",
62
+ "-v",
63
+ "-m",
64
+ "not openai and not embedding_provider",
65
+ "--tb=short",
66
+ "-p",
67
+ "no:logfire",
68
+ ]
69
+ elif test_type == "embeddings":
70
+ pytest_args = [
71
+ "tests/",
72
+ "-v",
73
+ "-m",
74
+ "local_embeddings",
75
+ "--tb=short",
76
+ "-p",
77
+ "no:logfire",
78
+ ]
79
+ else:
80
+ pytest_args = []
81
+
82
+ pytest_args.extend(extra_args)
83
+
84
+ # Use uv run python -m pytest to ensure we use the venv's pytest
85
+ # This is more reliable than uv run pytest which might find system pytest
86
+ pytest_cmd = ["uv", "run", "python", "-m", "pytest", *pytest_args]
87
+
88
+ print(f"Running {test_type} tests...")
89
+ return run_command(pytest_cmd, cwd=project_root)
90
+
91
+
92
+ if __name__ == "__main__":
93
+ sys.exit(main())
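
The script's `run_command` helper maps both failure modes to exit codes instead of raising; a minimal self-contained sketch of that behavior (mirroring the function above, trimmed of the `shell`/`cwd`/`env` plumbing):

```python
import subprocess


def run_command(cmd: list[str], check: bool = True) -> int:
    """Run a command, mapping CalledProcessError and a missing
    executable to nonzero exit codes instead of exceptions."""
    try:
        return subprocess.run(cmd, check=check).returncode
    except subprocess.CalledProcessError as e:
        return e.returncode
    except FileNotFoundError:
        print(f"Error: Command not found: {cmd[0]}")
        return 1


# A missing binary is reported and mapped to exit code 1
print(run_command(["definitely-not-a-real-binary"]))  # → 1
```

This is why the script can probe for `uv` with `run_command(["uv", "--version"], check=False)` and branch on the return code rather than wrapping the call in its own try/except.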
=0.22.0 ADDED
File without changes
=0.22.0, ADDED
File without changes
CONTRIBUTING.md DELETED
@@ -1 +0,0 @@
1
- make sure you run the full pre-commit checks before opening a PR (not draft), otherwise Obstacle is the Way will lose his mind
 
 
Makefile CHANGED
@@ -37,6 +37,15 @@ typecheck:
37
  check: lint typecheck test-cov
38
  @echo "All checks passed!"
39
 
40
  clean:
41
  rm -rf .pytest_cache .mypy_cache .ruff_cache __pycache__ .coverage htmlcov
42
  find . -type d -name "__pycache__" -exec rm -rf {} + 2>/dev/null || true
 
37
  check: lint typecheck test-cov
38
  @echo "All checks passed!"
39
 
40
+ docs-build:
41
+ uv run mkdocs build
42
+
43
+ docs-serve:
44
+ uv run mkdocs serve
45
+
46
+ docs-clean:
47
+ rm -rf site/
48
+
49
  clean:
50
  rm -rf .pytest_cache .mypy_cache .ruff_cache __pycache__ .coverage htmlcov
51
  find . -type d -name "__pycache__" -exec rm -rf {} + 2>/dev/null || true
README.md CHANGED
@@ -1,13 +1,17 @@
1
  ---
2
- title: DeepCritical
3
- emoji: 🧬
4
- colorFrom: blue
5
- colorTo: purple
6
  sdk: gradio
7
  sdk_version: "6.0.1"
8
  python_version: "3.11"
9
  app_file: src/app.py
10
- pinned: false
 
 
 
 
11
  license: mit
12
  tags:
13
  - mcp-in-action-track-enterprise
@@ -19,178 +23,100 @@ tags:
19
  - modal
20
  ---
21
22
  # DeepCritical
23
 
24
- ## Intro
25
-
26
- ## Features
27
-
28
- - **Multi-Source Search**: PubMed, ClinicalTrials.gov, bioRxiv/medRxiv
29
- - **MCP Integration**: Use our tools from Claude Desktop or any MCP client
30
- - **Modal Sandbox**: Secure execution of AI-generated statistical code
31
- - **LlamaIndex RAG**: Semantic search and evidence synthesis
32
- - **HuggingfaceInference**:
33
- - **HuggingfaceMCP Custom Config To Use Community Tools**:
34
- - **Strongly Typed Composable Graphs**:
35
- - **Specialized Research Teams of Agents**:
36
-
37
- ## Quick Start
38
-
39
- ### 1. Environment Setup
40
-
41
- ```bash
42
- # Install uv if you haven't already
43
- pip install uv
44
-
45
- # Sync dependencies
46
- uv sync
47
- ```
48
-
49
- ### 2. Run the UI
50
-
51
- ```bash
52
- # Start the Gradio app
53
- uv run gradio run src/app.py
54
- ```
55
-
56
- Open your browser to `http://localhost:7860`.
57
-
58
- ### 3. Connect via MCP
59
-
60
- This application exposes a Model Context Protocol (MCP) server, allowing you to use its search tools directly from Claude Desktop or other MCP clients.
61
-
62
- **MCP Server URL**: `http://localhost:7860/gradio_api/mcp/`
63
-
64
- **Claude Desktop Configuration**:
65
- Add this to your `claude_desktop_config.json`:
66
- ```json
67
- {
68
- "mcpServers": {
69
- "deepcritical": {
70
- "url": "http://localhost:7860/gradio_api/mcp/"
71
- }
72
- }
73
- }
74
- ```
75
-
76
- **Available Tools**:
77
- - `search_pubmed`: Search peer-reviewed biomedical literature.
78
- - `search_clinical_trials`: Search ClinicalTrials.gov.
79
- - `search_biorxiv`: Search bioRxiv/medRxiv preprints.
80
- - `search_all`: Search all sources simultaneously.
81
- - `analyze_hypothesis`: Secure statistical analysis using Modal sandboxes.
82
-
83
-
84
-
85
- ## Architecture
86
-
87
- DeepCritical uses a Vertical Slice Architecture:
88
-
89
- 1. **Search Slice**: Retrieving evidence from PubMed, ClinicalTrials.gov, and bioRxiv.
90
- 2. **Judge Slice**: Evaluating evidence quality using LLMs.
91
- 3. **Orchestrator Slice**: Managing the research loop and UI.
92
-
93
- - iterativeResearch
94
- - deepResearch
95
- - researchTeam
96
-
97
- ### Iterative Research
98
-
99
- sequenceDiagram
100
- participant IterativeFlow
101
- participant ThinkingAgent
102
- participant KnowledgeGapAgent
103
- participant ToolSelector
104
- participant ToolExecutor
105
- participant JudgeHandler
106
- participant WriterAgent
107
-
108
- IterativeFlow->>IterativeFlow: run(query)
109
-
110
- loop Until complete or max_iterations
111
- IterativeFlow->>ThinkingAgent: generate_observations()
112
- ThinkingAgent-->>IterativeFlow: observations
113
-
114
- IterativeFlow->>KnowledgeGapAgent: evaluate_gaps()
115
- KnowledgeGapAgent-->>IterativeFlow: KnowledgeGapOutput
116
-
117
- alt Research complete
118
- IterativeFlow->>WriterAgent: create_final_report()
119
- WriterAgent-->>IterativeFlow: final_report
120
- else Gaps remain
121
- IterativeFlow->>ToolSelector: select_agents(gap)
122
- ToolSelector-->>IterativeFlow: AgentSelectionPlan
123
-
124
- IterativeFlow->>ToolExecutor: execute_tool_tasks()
125
- ToolExecutor-->>IterativeFlow: ToolAgentOutput[]
126
-
127
- IterativeFlow->>JudgeHandler: assess_evidence()
128
- JudgeHandler-->>IterativeFlow: should_continue
129
- end
130
- end
131
-
132
-
133
- ### Deep Research
134
-
135
- sequenceDiagram
136
- actor User
137
- participant GraphOrchestrator
138
- participant InputParser
139
- participant GraphBuilder
140
- participant GraphExecutor
141
- participant Agent
142
- participant BudgetTracker
143
- participant WorkflowState
144
-
145
- User->>GraphOrchestrator: run(query)
146
- GraphOrchestrator->>InputParser: detect_research_mode(query)
147
- InputParser-->>GraphOrchestrator: mode (iterative/deep)
148
- GraphOrchestrator->>GraphBuilder: build_graph(mode)
149
- GraphBuilder-->>GraphOrchestrator: ResearchGraph
150
- GraphOrchestrator->>WorkflowState: init_workflow_state()
151
- GraphOrchestrator->>BudgetTracker: create_budget()
152
- GraphOrchestrator->>GraphExecutor: _execute_graph(graph)
153
-
154
- loop For each node in graph
155
- GraphExecutor->>Agent: execute_node(agent_node)
156
- Agent->>Agent: process_input
157
- Agent-->>GraphExecutor: result
158
- GraphExecutor->>WorkflowState: update_state(result)
159
- GraphExecutor->>BudgetTracker: add_tokens(used)
160
- GraphExecutor->>BudgetTracker: check_budget()
161
- alt Budget exceeded
162
- GraphExecutor->>GraphOrchestrator: emit(error_event)
163
- else Continue
164
- GraphExecutor->>GraphOrchestrator: emit(progress_event)
165
- end
166
- end
167
-
168
- GraphOrchestrator->>User: AsyncGenerator[AgentEvent]
169
-
170
- ### Research Team
171
-
172
- Critical Deep Research Agent
173
-
174
- ## Development
175
-
176
- ### Run Tests
177
-
178
- ```bash
179
- uv run pytest
180
- ```
181
-
182
- ### Run Checks
183
-
184
- ```bash
185
- make check
186
- ```
187
-
188
- ## Join Us
189
-
190
- - The-Obstacle-Is-The-Way
191
  - MarioAderman
192
  - Josephrp
193
 
194
  ## Links
195
 
196
- - [GitHub Repository](https://github.com/The-Obstacle-Is-The-Way/DeepCritical-1)
1
  ---
2
+ title: Critical Deep Research
3
+ emoji: 🐉
4
+ colorFrom: red
5
+ colorTo: yellow
6
  sdk: gradio
7
  sdk_version: "6.0.1"
8
  python_version: "3.11"
9
  app_file: src/app.py
10
+ hf_oauth: true
11
+ hf_oauth_expiration_minutes: 480
12
+ hf_oauth_scopes:
13
+ - inference-api
14
+ pinned: true
15
  license: mit
16
  tags:
17
  - mcp-in-action-track-enterprise
 
23
  - modal
24
  ---
25
 
26
+ > [!IMPORTANT]
27
+ > **You are reading the Gradio Demo README!**
28
+ >
29
+ > - 📚 **Documentation**: See our [technical documentation](docs/index.md) for detailed information
30
+ > - 📖 **Complete README**: Check out the [full README](.github/README.md) for setup, configuration, and contribution guidelines
31
+ > - 🏆 **Hackathon Submission**: Keep reading below for more information about our MCP Hackathon submission
32
+
33
+ <div align="center">
34
+
35
+ [![GitHub](https://img.shields.io/github/stars/DeepCritical/GradioDemo?style=for-the-badge&logo=github&logoColor=white&label=🐙%20GitHub&labelColor=181717&color=181717)](https://github.com/DeepCritical/GradioDemo)
36
+ [![Documentation](https://img.shields.io/badge/📚%20Docs-0080FF?style=for-the-badge&logo=readthedocs&logoColor=white&labelColor=0080FF&color=0080FF)](docs/index.md)
37
+ [![Demo](https://img.shields.io/badge/🚀%20Demo-FFD21E?style=for-the-badge&logo=huggingface&logoColor=white&labelColor=FFD21E&color=FFD21E)](https://huggingface.co/spaces/DataQuests/DeepCritical)
38
+ [![CodeCov](https://img.shields.io/badge/📊%20Coverage-F01F7A?style=for-the-badge&logo=codecov&logoColor=white&labelColor=F01F7A&color=F01F7A)](https://codecov.io/gh/DeepCritical/GradioDemo)
39
+ [![Join us on Discord](https://img.shields.io/discord/1109943800132010065?label=Discord&logo=discord&style=flat-square)](https://discord.gg/qdfnvSPcqP)
40
+
41
+
42
+ </div>
43
+
44
  # DeepCritical
45
 
46
+ ## About
47
+
48
+ The [Deep Critical Gradio Hackathon Team](#team) met online in the Alzheimer's Critical Literature Review Group in the Hugging Science initiative. We're building the agent framework we want to use for AI-assisted research to [turn the vast amounts of clinical data into cures](https://github.com/DeepCritical/GradioDemo).
49
+
50
+ For this hackathon we're proposing a simple yet powerful Deep Research Agent that iteratively looks for the answer until it finds it, using general-purpose web search and special-purpose retrievers for technical sources.
51
+
52
+ ## Deep Critical in the Media
53
+
54
+ - Social media posts about Deep Critical:
55
+ -
56
+ -
57
+ -
58
+ -
59
+ -
60
+ -
61
+ -
62
+
63
+ ## Important information
64
+
65
+ - **[readme](.github/README.md)**: configure, deploy, contribute, and learn more here.
66
+ - **[docs](docs/index.md)**: want to know how all this works? Read our detailed technical documentation here.
67
+ - **[demo](https://huggingface.co/spaces/DataQuests/DeepCritical)**: Try our demo on Hugging Face
68
+ - **[team](#team)**: Join us, or follow us!
69
+ - **[video]**: See our demo video
70
+
71
+ ## Future Developments
72
+
73
+ - [ ] Apply Deep Research Systems To Generate Short Form Video (up to 5 minutes)
74
+ - [ ] Visualize Pydantic Graphs as Loading Screens in the UI
75
+ - [ ] Improve Data Science with more Complex Graph Agents
76
+ - [ ] Create Deep Critical Drug Repurposing / Discovery Demo
77
+ - [ ] Create Deep Critical Literature Review
78
+ - [ ] Create Deep Critical Hypothesis Generator
79
+
80
+ ## Completed
81
+
82
+ - [x] **Multi-Source Search**: PubMed, ClinicalTrials.gov, bioRxiv/medRxiv
83
+ - [x] **MCP Integration**: Use our tools from Claude Desktop or any MCP client
84
+ - [x] **HuggingFace OAuth**: Sign in with HuggingFace
85
+ - [x] **Modal Sandbox**: Secure execution of AI-generated statistical code
86
+ - [x] **LlamaIndex RAG**: Semantic search and evidence synthesis
87
+ - [x] **HuggingFace Inference**
88
+ - [x] **HuggingFace MCP Custom Config To Use Community Tools**
89
+ - [x] **Strongly Typed Composable Graphs**
90
+ - [x] **Specialized Research Teams of Agents**
91
+
92
+
93
+
94
+ ### Team
95
+
96
+ - ZJ
97
  - MarioAderman
98
  - Josephrp
99
 
100
+
101
+ ## Acknowledgements
102
+
103
+ - McSwaggins
104
+ - Magentic
105
+ - HuggingFace
106
+ - Gradio
107
+ - DeepCritical
108
+ - Sponsors
109
+ - Microsoft
110
+ - Pydantic
111
+ - LlamaIndex
112
+ - Anthropic/MCP
113
+ - List of Tools Makers
114
+
115
+
116
  ## Links
117
 
118
+ [![GitHub](https://img.shields.io/github/stars/DeepCritical/GradioDemo?style=for-the-badge&logo=github&logoColor=white&label=🐙%20GitHub&labelColor=181717&color=181717)](https://github.com/DeepCritical/GradioDemo)
119
+ [![Documentation](https://img.shields.io/badge/📚%20Docs-0080FF?style=for-the-badge&logo=readthedocs&logoColor=white&labelColor=0080FF&color=0080FF)](docs/index.md)
120
+ [![Demo](https://img.shields.io/badge/🚀%20Demo-FFD21E?style=for-the-badge&logo=huggingface&logoColor=white&labelColor=FFD21E&color=FFD21E)](https://huggingface.co/spaces/DataQuests/DeepCritical)
121
+ [![CodeCov](https://img.shields.io/badge/📊%20Coverage-F01F7A?style=for-the-badge&logo=codecov&logoColor=white&labelColor=F01F7A&color=F01F7A)](https://codecov.io/gh/DeepCritical/GradioDemo)
122
+ [![Join us on Discord](https://img.shields.io/discord/1109943800132010065?label=Discord&logo=discord&style=flat-square)](https://discord.gg/qdfnvSPcqP)
.cursorrules → dev/.cursorrules RENAMED
@@ -238,3 +238,4 @@
238
 
239
 
240
 
 
 
238
 
239
 
240
 
241
+
AGENTS.txt → dev/AGENTS.txt RENAMED
File without changes
dev/Makefile ADDED
@@ -0,0 +1,51 @@
1
+ .PHONY: install test test-hf test-all lint format typecheck check clean all cov test-cov cov-html docs-build docs-serve docs-clean
2
+
3
+ # Default target
4
+ all: check
5
+
6
+ install:
7
+ uv sync --all-extras
8
+ uv run pre-commit install
9
+
10
+ test:
11
+ uv run pytest tests/unit/ -v -m "not openai" -p no:logfire
12
+
13
+ test-hf:
14
+ uv run pytest tests/ -v -m "huggingface" -p no:logfire
15
+
16
+ test-all:
17
+ uv run pytest tests/ -v -p no:logfire
18
+
19
+ # Coverage aliases
20
+ cov: test-cov
21
+ test-cov:
22
+ uv run pytest --cov=src --cov-report=term-missing -m "not openai" -p no:logfire
23
+
24
+ cov-html:
25
+ uv run pytest --cov=src --cov-report=html -p no:logfire
26
+ @echo "Coverage report: open htmlcov/index.html"
27
+
28
+ lint:
29
+ uv run ruff check src tests
30
+
31
+ format:
32
+ uv run ruff format src tests
33
+
34
+ typecheck:
35
+ uv run mypy src
36
+
37
+ check: lint typecheck test-cov
38
+ @echo "All checks passed!"
39
+
40
+ docs-build:
41
+ uv run mkdocs build
42
+
43
+ docs-serve:
44
+ uv run mkdocs serve
45
+
46
+ docs-clean:
47
+ rm -rf site/
48
+
49
+ clean:
50
+ rm -rf .pytest_cache .mypy_cache .ruff_cache __pycache__ .coverage htmlcov
51
+ find . -type d -name "__pycache__" -exec rm -rf {} + 2>/dev/null || true
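The `clean` target's `find` idiom can be exercised safely on a throwaway tree first (a sketch; the scratch paths are illustrative):

```shell
# Scratch tree with nested __pycache__ dirs
tmp=$(mktemp -d)
mkdir -p "$tmp/pkg/__pycache__" "$tmp/pkg/sub/__pycache__"
# Same idiom as the Makefile's `clean` target
find "$tmp" -type d -name "__pycache__" -exec rm -rf {} + 2>/dev/null || true
# Count what survived (expect none)
left=$(find "$tmp" -type d -name "__pycache__" | wc -l)
echo "$left"
rm -rf "$tmp"
```

The trailing `|| true` keeps the recipe from failing when `find` complains about directories it has already removed.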
dev/docs_plugins.py ADDED
@@ -0,0 +1,74 @@
1
+ """Custom MkDocs extension to handle code anchor format: ```start:end:filepath"""
2
+
3
+ import re
4
+ from pathlib import Path
5
+
6
+ from markdown import Markdown
7
+ from markdown.extensions import Extension
8
+ from markdown.preprocessors import Preprocessor
9
+
10
+
11
+ class CodeAnchorPreprocessor(Preprocessor):
12
+ """Preprocess code blocks with anchor format: ```start:end:filepath"""
13
+
14
+ def __init__(self, md: Markdown, base_path: Path):
15
+ super().__init__(md)
16
+ self.base_path = base_path
17
+ self.pattern = re.compile(r"^```(\d+):(\d+):([^\n]+)\n(.*?)```$", re.MULTILINE | re.DOTALL)
18
+
19
+ def run(self, lines: list[str]) -> list[str]:
20
+ """Process lines and convert code anchor format to standard code blocks."""
21
+ text = "\n".join(lines)
22
+ new_text = self.pattern.sub(self._replace_code_anchor, text)
23
+ return new_text.split("\n")
24
+
25
+ def _replace_code_anchor(self, match) -> str:
26
+ """Replace code anchor format with standard code block + link."""
27
+ start_line = int(match.group(1))
28
+ end_line = int(match.group(2))
29
+ file_path = match.group(3).strip()
30
+ existing_code = match.group(4)
31
+
32
+ # Determine language from file extension
33
+ ext = Path(file_path).suffix.lower()
34
+ lang_map = {
35
+ ".py": "python",
36
+ ".js": "javascript",
37
+ ".ts": "typescript",
38
+ ".md": "markdown",
39
+ ".yaml": "yaml",
40
+ ".yml": "yaml",
41
+ ".toml": "toml",
42
+ ".json": "json",
43
+ ".html": "html",
44
+ ".css": "css",
45
+ ".sh": "bash",
46
+ }
47
+ language = lang_map.get(ext, "python")
48
+
49
+ # Generate GitHub link
50
+ repo_url = "https://github.com/DeepCritical/GradioDemo"
51
+ github_link = f"{repo_url}/blob/main/{file_path}#L{start_line}-L{end_line}"
52
+
53
+ # Return standard code block with source link
54
+ return (
55
+ f'[View source: `{file_path}` (lines {start_line}-{end_line})]({github_link}){{: target="_blank" }}\n\n'
56
+ f"```{language}\n{existing_code}\n```"
57
+ )
58
+
59
+
60
+ class CodeAnchorExtension(Extension):
61
+ """Markdown extension for code anchors."""
62
+
63
+ def __init__(self, base_path: str = ".", **kwargs):
64
+ super().__init__(**kwargs)
65
+ self.base_path = Path(base_path)
66
+
67
+ def extendMarkdown(self, md: Markdown): # noqa: N802
68
+ """Register the preprocessor."""
69
+ md.preprocessors.register(CodeAnchorPreprocessor(md, self.base_path), "codeanchor", 25)
70
+
71
+
72
+ def makeExtension(**kwargs): # noqa: N802
73
+ """Create the extension."""
74
+ return CodeAnchorExtension(**kwargs)
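The anchor regex at the heart of the preprocessor can be exercised standalone (the pattern is duplicated here for a self-contained check; the sample document is illustrative):

```python
import re

# Same pattern as CodeAnchorPreprocessor uses for ```start:end:filepath blocks
pattern = re.compile(r"^```(\d+):(\d+):([^\n]+)\n(.*?)```$", re.MULTILINE | re.DOTALL)

fence = "`" * 3  # build the fence programmatically to avoid a literal one here
doc = f"{fence}10:12:src/app.py\nprint('hi')\n{fence}"

m = pattern.search(doc)
# Groups: start line, end line, file path, enclosed code
print(m.group(1), m.group(2), m.group(3))  # → 10 12 src/app.py
```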
docs/CONFIGURATION.md DELETED
@@ -1,301 +0,0 @@
1
- # Configuration Guide
2
-
3
- ## Overview
4
-
5
- DeepCritical uses **Pydantic Settings** for centralized configuration management. All settings are defined in `src/utils/config.py` and can be configured via environment variables or a `.env` file.
6
-
7
- ## Quick Start
8
-
9
- 1. Copy the example environment file (if available) or create a `.env` file in the project root
10
- 2. Set at least one LLM API key (`OPENAI_API_KEY` or `ANTHROPIC_API_KEY`)
11
- 3. Optionally configure other services as needed
12
-
13
- ## Configuration System
14
-
15
- ### How It Works
16
-
17
- - **Settings Class**: `Settings` class in `src/utils/config.py` extends `BaseSettings` from `pydantic_settings`
18
- - **Environment File**: Automatically loads from `.env` file (if present)
19
- - **Environment Variables**: Reads from environment variables (case-insensitive)
20
- - **Type Safety**: Strongly-typed fields with validation
21
- - **Singleton Pattern**: Global `settings` instance for easy access
22
-
23
- ### Usage
24
-
25
- ```python
26
- from src.utils.config import settings
27
-
28
- # Check if API keys are available
29
- if settings.has_openai_key:
30
- # Use OpenAI
31
- pass
32
-
33
- # Access configuration values
34
- max_iterations = settings.max_iterations
35
- web_search_provider = settings.web_search_provider
36
- ```
37
-
38
- ## Required Configuration
39
-
40
- ### At Least One LLM Provider
41
-
42
- You must configure at least one LLM provider:
43
-
44
- **OpenAI:**
45
- ```bash
46
- LLM_PROVIDER=openai
47
- OPENAI_API_KEY=your_openai_api_key_here
48
- OPENAI_MODEL=gpt-5.1
49
- ```
50
-
51
- **Anthropic:**
52
- ```bash
53
- LLM_PROVIDER=anthropic
54
- ANTHROPIC_API_KEY=your_anthropic_api_key_here
55
- ANTHROPIC_MODEL=claude-sonnet-4-5-20250929
56
- ```
57
-
58
- ## Optional Configuration
59
-
60
- ### Embedding Configuration
61
-
62
- ```bash
63
- # Embedding Provider: "openai", "local", or "huggingface"
64
- EMBEDDING_PROVIDER=local
65
-
66
- # OpenAI Embedding Model (used by LlamaIndex RAG)
67
- OPENAI_EMBEDDING_MODEL=text-embedding-3-small
68
-
69
- # Local Embedding Model (sentence-transformers)
70
- LOCAL_EMBEDDING_MODEL=all-MiniLM-L6-v2
71
-
72
- # HuggingFace Embedding Model
73
- HUGGINGFACE_EMBEDDING_MODEL=sentence-transformers/all-MiniLM-L6-v2
74
- ```
75
-
76
- ### HuggingFace Configuration
77
-
78
- ```bash
79
- # HuggingFace API Token (for inference API)
80
- HUGGINGFACE_API_KEY=your_huggingface_api_key_here
81
- # Or use HF_TOKEN (alternative name)
82
-
83
- # Default HuggingFace Model ID
84
- HUGGINGFACE_MODEL=meta-llama/Llama-3.1-8B-Instruct
85
- ```
86
-
87
- ### Web Search Configuration
88
-
89
- ```bash
90
- # Web Search Provider: "serper", "searchxng", "brave", "tavily", or "duckduckgo"
91
- # Default: "duckduckgo" (no API key required)
92
- WEB_SEARCH_PROVIDER=duckduckgo
93
-
94
- # Serper API Key (for Google search via Serper)
95
- SERPER_API_KEY=your_serper_api_key_here
96
-
97
- # SearchXNG Host URL
98
- SEARCHXNG_HOST=http://localhost:8080
99
-
100
- # Brave Search API Key
101
- BRAVE_API_KEY=your_brave_api_key_here
102
-
103
- # Tavily API Key
104
- TAVILY_API_KEY=your_tavily_api_key_here
105
- ```
106
-
107
- ### PubMed Configuration
108
-
109
- ```bash
110
- # NCBI API Key (optional, for higher rate limits: 10 req/sec vs 3 req/sec)
111
- NCBI_API_KEY=your_ncbi_api_key_here
112
- ```
113
-
114
- ### Agent Configuration
115
-
116
- ```bash
117
- # Maximum iterations per research loop
118
- MAX_ITERATIONS=10
119
-
120
- # Search timeout in seconds
121
- SEARCH_TIMEOUT=30
122
-
123
- # Use graph-based execution for research flows
124
- USE_GRAPH_EXECUTION=false
125
- ```
126
-
127
- ### Budget & Rate Limiting Configuration
128
-
129
- ```bash
130
- # Default token budget per research loop
131
- DEFAULT_TOKEN_LIMIT=100000
132
-
133
- # Default time limit per research loop (minutes)
134
- DEFAULT_TIME_LIMIT_MINUTES=10
135
-
136
- # Default iterations limit per research loop
137
- DEFAULT_ITERATIONS_LIMIT=10
138
- ```
139
-
140
- ### RAG Service Configuration
141
-
142
- ```bash
143
- # ChromaDB collection name for RAG
144
- RAG_COLLECTION_NAME=deepcritical_evidence
145
-
146
- # Number of top results to retrieve from RAG
147
- RAG_SIMILARITY_TOP_K=5
148
-
149
- # Automatically ingest evidence into RAG
150
- RAG_AUTO_INGEST=true
151
- ```
152
-
153
- ### ChromaDB Configuration
154
-
155
- ```bash
156
- # ChromaDB storage path
157
- CHROMA_DB_PATH=./chroma_db
158
-
159
- # Whether to persist ChromaDB to disk
160
- CHROMA_DB_PERSIST=true
161
-
162
- # ChromaDB server host (for remote ChromaDB, optional)
163
- # CHROMA_DB_HOST=localhost
164
-
165
- # ChromaDB server port (for remote ChromaDB, optional)
166
- # CHROMA_DB_PORT=8000
167
- ```
168
-
169
- ### External Services
170
-
171
- ```bash
172
- # Modal Token ID (for Modal sandbox execution)
173
- MODAL_TOKEN_ID=your_modal_token_id_here
174
-
175
- # Modal Token Secret
176
- MODAL_TOKEN_SECRET=your_modal_token_secret_here
177
- ```
178
-
179
- ### Logging Configuration
180
-
181
- ```bash
182
- # Log Level: "DEBUG", "INFO", "WARNING", or "ERROR"
183
- LOG_LEVEL=INFO
184
- ```
185
-
186
- ## Configuration Properties
187
-
188
- The `Settings` class provides helpful properties for checking configuration:
189
-
190
- ```python
191
- from src.utils.config import settings
192
-
193
- # Check API key availability
194
- settings.has_openai_key # bool
195
- settings.has_anthropic_key # bool
196
- settings.has_huggingface_key # bool
197
- settings.has_any_llm_key # bool
198
-
199
- # Check service availability
200
- settings.modal_available # bool
201
- settings.web_search_available # bool
202
- ```
203
-
204
- ## Environment Variables Reference
205
-
206
- ### Required (at least one LLM)
207
- - `OPENAI_API_KEY` or `ANTHROPIC_API_KEY` - At least one LLM provider key
208
-
209
- ### Optional LLM Providers
210
- - `DEEPSEEK_API_KEY` (Phase 2)
211
- - `OPENROUTER_API_KEY` (Phase 2)
212
- - `GEMINI_API_KEY` (Phase 2)
213
- - `PERPLEXITY_API_KEY` (Phase 2)
214
- - `HUGGINGFACE_API_KEY` or `HF_TOKEN`
215
- - `AZURE_OPENAI_ENDPOINT` (Phase 2)
216
- - `AZURE_OPENAI_DEPLOYMENT` (Phase 2)
217
- - `AZURE_OPENAI_API_KEY` (Phase 2)
218
- - `AZURE_OPENAI_API_VERSION` (Phase 2)
219
- - `LOCAL_MODEL_URL` (Phase 2)
220
-
221
- ### Web Search
222
- - `WEB_SEARCH_PROVIDER` (default: "duckduckgo")
223
- - `SERPER_API_KEY`
224
- - `SEARCHXNG_HOST`
225
- - `BRAVE_API_KEY`
226
- - `TAVILY_API_KEY`
227
-
228
- ### Embeddings
229
- - `EMBEDDING_PROVIDER` (default: "local")
230
- - `HUGGINGFACE_EMBEDDING_MODEL` (optional)
231
-
232
- ### RAG
233
- - `RAG_COLLECTION_NAME` (default: "deepcritical_evidence")
234
- - `RAG_SIMILARITY_TOP_K` (default: 5)
235
- - `RAG_AUTO_INGEST` (default: true)
236
-
237
- ### ChromaDB
238
- - `CHROMA_DB_PATH` (default: "./chroma_db")
239
- - `CHROMA_DB_PERSIST` (default: true)
240
- - `CHROMA_DB_HOST` (optional)
241
- - `CHROMA_DB_PORT` (optional)
242
-
243
- ### Budget
244
- - `DEFAULT_TOKEN_LIMIT` (default: 100000)
245
- - `DEFAULT_TIME_LIMIT_MINUTES` (default: 10)
246
- - `DEFAULT_ITERATIONS_LIMIT` (default: 10)
247
-
248
- ### Other
249
- - `LLM_PROVIDER` (default: "openai")
250
- - `NCBI_API_KEY` (optional)
251
- - `MODAL_TOKEN_ID` (optional)
252
- - `MODAL_TOKEN_SECRET` (optional)
253
- - `MAX_ITERATIONS` (default: 10)
254
- - `LOG_LEVEL` (default: "INFO")
255
- - `USE_GRAPH_EXECUTION` (default: false)
256
-
257
- ## Validation
258
-
259
- Settings are validated on load using Pydantic validation:
260
-
261
- - **Type checking**: All fields are strongly typed
262
- - **Range validation**: Numeric fields have min/max constraints
263
- - **Literal validation**: Enum fields only accept specific values
264
- - **Required fields**: API keys are checked when accessed via `get_api_key()`
265
-
266
- ## Error Handling
267
-
268
- Configuration errors raise `ConfigurationError`:
269
-
270
- ```python
271
- from src.utils.config import settings
272
- from src.utils.exceptions import ConfigurationError
273
-
274
- try:
275
- api_key = settings.get_api_key()
276
- except ConfigurationError as e:
277
- print(f"Configuration error: {e}")
278
- ```
279
-
280
- ## Future Enhancements (Phase 2)
281
-
282
- The following configurations are planned for Phase 2:
283
-
284
- 1. **Additional LLM Providers**: DeepSeek, OpenRouter, Gemini, Perplexity, Azure OpenAI, Local models
285
- 2. **Model Selection**: Reasoning/main/fast model configuration
286
- 3. **Service Integration**: Migrate `folder/llm_config.py` to centralized config
287
-
288
- See `CONFIGURATION_ANALYSIS.md` for the complete implementation plan.
289
-
290
-
291
-
292
-
293
-
294
-
295
-
296
-
297
-
298
-
299
-
300
-
301
-
docs/api/agents.md ADDED
@@ -0,0 +1,260 @@
1
+ # Agents API Reference
2
+
3
+ This page documents the API for DeepCritical agents.
4
+
5
+ ## KnowledgeGapAgent
6
+
7
+ **Module**: `src.agents.knowledge_gap`
8
+
9
+ **Purpose**: Evaluates research state and identifies knowledge gaps.
10
+
11
+ ### Methods
12
+
13
+ #### `evaluate`
14
+
15
+ ```python
16
+ async def evaluate(
17
+ self,
18
+ query: str,
19
+ background_context: str,
20
+ conversation_history: Conversation,
21
+ iteration: int,
22
+ time_elapsed_minutes: float,
23
+ max_time_minutes: float
24
+ ) -> KnowledgeGapOutput
25
+ ```
26
+
27
+ Evaluates research completeness and identifies outstanding knowledge gaps.
28
+
29
+ **Parameters**:
30
+ - `query`: Research query string
31
+ - `background_context`: Background context for the query
32
+ - `conversation_history`: Conversation history with previous iterations
33
+ - `iteration`: Current iteration number
34
+ - `time_elapsed_minutes`: Elapsed time in minutes
35
+ - `max_time_minutes`: Maximum time limit in minutes
36
+
37
+ **Returns**: `KnowledgeGapOutput` with:
38
+ - `research_complete`: Boolean indicating if research is complete
39
+ - `outstanding_gaps`: List of remaining knowledge gaps
40
+
41
+ ## ToolSelectorAgent
42
+
43
+ **Module**: `src.agents.tool_selector`
44
+
45
+ **Purpose**: Selects appropriate tools for addressing knowledge gaps.
46
+
47
+ ### Methods
48
+
49
+ #### `select_tools`
50
+
51
+ ```python
52
+ async def select_tools(
53
+ self,
54
+ query: str,
55
+ knowledge_gaps: list[str],
56
+ available_tools: list[str]
57
+ ) -> AgentSelectionPlan
58
+ ```
59
+
60
+ Selects tools for addressing knowledge gaps.
61
+
62
+ **Parameters**:
63
+ - `query`: Research query string
64
+ - `knowledge_gaps`: List of knowledge gaps to address
65
+ - `available_tools`: List of available tool names
66
+
67
+ **Returns**: `AgentSelectionPlan` with list of `AgentTask` objects.
68
+
69
+ ## WriterAgent
70
+
71
+ **Module**: `src.agents.writer`
72
+
73
+ **Purpose**: Generates final reports from research findings.
74
+
75
+ ### Methods
76
+
77
+ #### `write_report`
78
+
79
+ ```python
80
+ async def write_report(
81
+ self,
82
+ query: str,
83
+ findings: str,
84
+ output_length: str = "medium",
85
+ output_instructions: str | None = None
86
+ ) -> str
87
+ ```
88
+
89
+ Generates a markdown report from research findings.
90
+
91
+ **Parameters**:
92
+ - `query`: Research query string
93
+ - `findings`: Research findings to include in report
94
+ - `output_length`: Desired output length ("short", "medium", "long")
95
+ - `output_instructions`: Additional instructions for report generation
96
+
97
+ **Returns**: Markdown string with numbered citations.
98
+
99
+ ## LongWriterAgent
100
+
101
+ **Module**: `src.agents.long_writer`
102
+
103
+ **Purpose**: Long-form report generation with section-by-section writing.
104
+
105
+ ### Methods
106
+
107
+ #### `write_next_section`
108
+
109
+ ```python
110
+ async def write_next_section(
111
+ self,
112
+ query: str,
113
+ draft: ReportDraft,
114
+ section_title: str,
115
+ section_content: str
116
+ ) -> LongWriterOutput
117
+ ```
118
+
119
+ Writes the next section of a long-form report.
120
+
121
+ **Parameters**:
122
+ - `query`: Research query string
123
+ - `draft`: Current report draft
124
+ - `section_title`: Title of the section to write
125
+ - `section_content`: Content/guidance for the section
126
+
127
+ **Returns**: `LongWriterOutput` with updated draft.
128
+
129
+ #### `write_report`
130
+
131
+ ```python
132
+ async def write_report(
133
+ self,
134
+ query: str,
135
+ report_title: str,
136
+ report_draft: ReportDraft
137
+ ) -> str
138
+ ```
139
+
140
+ Generates final report from draft.
141
+
142
+ **Parameters**:
143
+ - `query`: Research query string
144
+ - `report_title`: Title of the report
145
+ - `report_draft`: Complete report draft
146
+
147
+ **Returns**: Final markdown report string.
148
+
149
+ ## ProofreaderAgent
150
+
151
+ **Module**: `src.agents.proofreader`
152
+
153
+ **Purpose**: Proofreads and polishes report drafts.
154
+
155
+ ### Methods
156
+
157
+ #### `proofread`
158
+
159
+ ```python
160
+ async def proofread(
161
+ self,
162
+ query: str,
163
+ report_title: str,
164
+ report_draft: ReportDraft
165
+ ) -> str
166
+ ```
167
+
168
+ Proofreads and polishes a report draft.
169
+
170
+ **Parameters**:
171
+ - `query`: Research query string
172
+ - `report_title`: Title of the report
173
+ - `report_draft`: Report draft to proofread
174
+
175
+ **Returns**: Polished markdown string.
176
+
177
+ ## ThinkingAgent
178
+
179
+ **Module**: `src.agents.thinking`
180
+
181
+ **Purpose**: Generates observations from conversation history.
182
+
183
+ ### Methods
184
+
185
+ #### `generate_observations`
186
+
187
+ ```python
188
+ async def generate_observations(
189
+ self,
190
+ query: str,
191
+ background_context: str,
192
+ conversation_history: Conversation
193
+ ) -> str
194
+ ```
195
+
196
+ Generates observations from conversation history.
197
+
198
+ **Parameters**:
199
+ - `query`: Research query string
200
+ - `background_context`: Background context
201
+ - `conversation_history`: Conversation history
202
+
203
+ **Returns**: Observation string.
204
+
205
+ ## InputParserAgent
206
+
207
+ **Module**: `src.agents.input_parser`
208
+
209
+ **Purpose**: Parses and improves user queries, detects research mode.
210
+
211
+ ### Methods
212
+
213
+ #### `parse_query`
214
+
215
+ ```python
216
+ async def parse_query(
217
+ self,
218
+ query: str
219
+ ) -> ParsedQuery
220
+ ```
221
+
222
+ Parses and improves a user query.
223
+
224
+ **Parameters**:
225
+ - `query`: Original query string
226
+
227
+ **Returns**: `ParsedQuery` with:
228
+ - `original_query`: Original query string
229
+ - `improved_query`: Refined query string
230
+ - `research_mode`: "iterative" or "deep"
231
+ - `key_entities`: List of key entities
232
+ - `research_questions`: List of research questions
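A consumer of `ParsedQuery` typically routes on `research_mode`; a minimal sketch with a plain-dict stand-in (illustrative values; the real object is a Pydantic model):

```python
# Plain-dict stand-in for a ParsedQuery result (values are illustrative)
parsed = {
    "original_query": "alzheimers drug trials",
    "improved_query": "Recent clinical trials of disease-modifying Alzheimer's drugs",
    "research_mode": "deep",
    "key_entities": ["Alzheimer's disease"],
    "research_questions": ["Which drugs showed efficacy in phase 3 trials?"],
}

# Route to the matching flow based on the detected mode
flow = "DeepResearchFlow" if parsed["research_mode"] == "deep" else "IterativeResearchFlow"
print(flow)  # → DeepResearchFlow
```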
233
+
234
+ ## Factory Functions
235
+
236
+ All agents have factory functions in `src.agent_factory.agents`:
237
+
238
+ ```python
239
+ def create_knowledge_gap_agent(model: Any | None = None) -> KnowledgeGapAgent
240
+ def create_tool_selector_agent(model: Any | None = None) -> ToolSelectorAgent
241
+ def create_writer_agent(model: Any | None = None) -> WriterAgent
242
+ def create_long_writer_agent(model: Any | None = None) -> LongWriterAgent
243
+ def create_proofreader_agent(model: Any | None = None) -> ProofreaderAgent
244
+ def create_thinking_agent(model: Any | None = None) -> ThinkingAgent
245
+ def create_input_parser_agent(model: Any | None = None) -> InputParserAgent
246
+ ```
247
+
248
+ **Parameters**:
249
+ - `model`: Optional Pydantic AI model. If None, uses `get_model()` from settings.
250
+
251
+ **Returns**: Agent instance.
252
+
253
+ ## See Also
254
+
255
+ - [Architecture - Agents](../architecture/agents.md) - Architecture overview
256
+ - [Models API](models.md) - Data models used by agents
257
+
258
+
259
+
260
+
docs/api/models.md ADDED
@@ -0,0 +1,238 @@
1
+ # Models API Reference
2
+
3
+ This page documents the Pydantic models used throughout DeepCritical.
4
+
5
+ ## Evidence
6
+
7
+ **Module**: `src.utils.models`
8
+
9
+ **Purpose**: Represents evidence from search results.
10
+
11
+ ```python
12
+ class Evidence(BaseModel):
13
+ citation: Citation
14
+ content: str
15
+ relevance_score: float = Field(ge=0.0, le=1.0)
16
+ metadata: dict[str, Any] = Field(default_factory=dict)
17
+ ```
18
+
19
+ **Fields**:
20
+ - `citation`: Citation information (title, URL, date, authors)
21
+ - `content`: Evidence text content
22
+ - `relevance_score`: Relevance score (0.0-1.0)
23
+ - `metadata`: Additional metadata dictionary
24
+
25
+ ## Citation
26
+
27
+ **Module**: `src.utils.models`
28
+
29
+ **Purpose**: Citation information for evidence.
30
+
31
+ ```python
32
+ class Citation(BaseModel):
33
+ title: str
34
+ url: str
35
+ date: str | None = None
36
+ authors: list[str] = Field(default_factory=list)
37
+ ```
38
+
39
+ **Fields**:
40
+ - `title`: Article/trial title
41
+ - `url`: Source URL
42
+ - `date`: Publication date (optional)
43
+ - `authors`: List of authors (optional)
44
+
45
+ ## KnowledgeGapOutput
46
+
47
+ **Module**: `src.utils.models`
48
+
49
+ **Purpose**: Output from knowledge gap evaluation.
50
+
51
+ ```python
52
+ class KnowledgeGapOutput(BaseModel):
53
+ research_complete: bool
54
+ outstanding_gaps: list[str] = Field(default_factory=list)
55
+ ```
56
+
57
+ **Fields**:
58
+ - `research_complete`: Boolean indicating if research is complete
59
+ - `outstanding_gaps`: List of remaining knowledge gaps
60
+
61
+ ## AgentSelectionPlan
62
+
63
+ **Module**: `src.utils.models`
64
+
65
+ **Purpose**: Plan for tool/agent selection.
66
+
67
+ ```python
68
+ class AgentSelectionPlan(BaseModel):
69
+ tasks: list[AgentTask] = Field(default_factory=list)
70
+ ```
71
+
72
+ **Fields**:
73
+ - `tasks`: List of agent tasks to execute
74
+
75
+ ## AgentTask
76
+
77
+ **Module**: `src.utils.models`
78
+
79
+ **Purpose**: Individual agent task.
80
+
81
+ ```python
82
+ class AgentTask(BaseModel):
83
+ agent_name: str
84
+ query: str
85
+ context: dict[str, Any] = Field(default_factory=dict)
86
+ ```
87
+
88
+ **Fields**:
89
+ - `agent_name`: Name of agent to use
90
+ - `query`: Task query
91
+ - `context`: Additional context dictionary
92
+
93
+ ## ReportDraft
94
+
95
+ **Module**: `src.utils.models`
96
+
97
+ **Purpose**: Draft structure for long-form reports.
98
+
99
+ ```python
100
+ class ReportDraft(BaseModel):
101
+ title: str
102
+ sections: list[ReportSection] = Field(default_factory=list)
103
+ references: list[Citation] = Field(default_factory=list)
104
+ ```
105
+
106
+ **Fields**:
107
+ - `title`: Report title
108
+ - `sections`: List of report sections
109
+ - `references`: List of citations
110
+
111
+ ## ReportSection
112
+
113
+ **Module**: `src.utils.models`
114
+
115
+ **Purpose**: Individual section in a report draft.
116
+
117
+ ```python
118
+ class ReportSection(BaseModel):
119
+ title: str
120
+ content: str
121
+ order: int
122
+ ```
123
+
124
+ **Fields**:
125
+ - `title`: Section title
126
+ - `content`: Section content
127
+ - `order`: Section order number
128
+
129
+ ## ParsedQuery
130
+
131
+ **Module**: `src.utils.models`
132
+
133
+ **Purpose**: Parsed and improved query.
134
+
135
+ ```python
136
+ class ParsedQuery(BaseModel):
137
+ original_query: str
138
+ improved_query: str
139
+ research_mode: Literal["iterative", "deep"]
140
+ key_entities: list[str] = Field(default_factory=list)
141
+ research_questions: list[str] = Field(default_factory=list)
142
+ ```
143
+
144
+ **Fields**:
145
+ - `original_query`: Original query string
146
+ - `improved_query`: Refined query string
147
+ - `research_mode`: Research mode ("iterative" or "deep")
148
+ - `key_entities`: List of key entities
149
+ - `research_questions`: List of research questions
150
+
151
+ ## Conversation
152
+
153
+ **Module**: `src.utils.models`
154
+
155
+ **Purpose**: Conversation history with iterations.
156
+
157
+ ```python
158
+ class Conversation(BaseModel):
159
+ iterations: list[IterationData] = Field(default_factory=list)
160
+ ```
161
+
162
+ **Fields**:
163
+ - `iterations`: List of iteration data
164
+
165
+ ## IterationData
166
+
167
+ **Module**: `src.utils.models`
168
+
169
+ **Purpose**: Data for a single iteration.
170
+
171
+ ```python
172
+ class IterationData(BaseModel):
173
+ iteration: int
174
+ observations: str | None = None
175
+ knowledge_gaps: list[str] = Field(default_factory=list)
176
+ tool_calls: list[dict[str, Any]] = Field(default_factory=list)
177
+ findings: str | None = None
178
+ thoughts: str | None = None
179
+ ```
180
+
181
+ **Fields**:
182
+ - `iteration`: Iteration number
183
+ - `observations`: Generated observations
184
+ - `knowledge_gaps`: Identified knowledge gaps
185
+ - `tool_calls`: Tool calls made
186
+ - `findings`: Findings from tools
187
+ - `thoughts`: Agent thoughts
188
+
189
+ ## AgentEvent
190
+
191
+ **Module**: `src.utils.models`
192
+
193
+ **Purpose**: Event emitted during research execution.
194
+
195
+ ```python
196
+ class AgentEvent(BaseModel):
197
+ type: str
198
+ iteration: int | None = None
199
+ data: dict[str, Any] = Field(default_factory=dict)
200
+ ```
201
+
202
+ **Fields**:
203
+ - `type`: Event type (e.g., "started", "search_complete", "complete")
204
+ - `iteration`: Iteration number (optional)
205
+ - `data`: Event data dictionary
206
+
207
+ ## BudgetStatus
208
+
209
+ **Module**: `src.utils.models`
210
+
211
+ **Purpose**: Current budget status.
212
+
213
+ ```python
214
+ class BudgetStatus(BaseModel):
215
+ tokens_used: int
216
+ tokens_limit: int
217
+ time_elapsed_seconds: float
218
+ time_limit_seconds: float
219
+ iterations: int
220
+ iterations_limit: int
221
+ ```
222
+
223
+ **Fields**:
224
+ - `tokens_used`: Tokens used so far
225
+ - `tokens_limit`: Token limit
226
+ - `time_elapsed_seconds`: Elapsed time in seconds
227
+ - `time_limit_seconds`: Time limit in seconds
228
+ - `iterations`: Current iteration count
229
+ - `iterations_limit`: Iteration limit
230
+
231
+ ## See Also
232
+
233
+ - [Architecture - Agents](../architecture/agents.md) - How models are used
234
+ - [Configuration](../configuration/index.md) - Model configuration
235
+
236
+
237
+
238
+
docs/api/orchestrators.md ADDED
@@ -0,0 +1,185 @@
1
+ # Orchestrators API Reference
2
+
3
+ This page documents the API for DeepCritical orchestrators.
4
+
5
+ ## IterativeResearchFlow
6
+
7
+ **Module**: `src.orchestrator.research_flow`
8
+
9
+ **Purpose**: Single-loop research with search-judge-synthesize cycles.
10
+
11
+ ### Methods
12
+
13
+ #### `run`
14
+
15
+ ```python
16
+ async def run(
17
+ self,
18
+ query: str,
19
+ background_context: str = "",
20
+ max_iterations: int | None = None,
21
+ max_time_minutes: float | None = None,
22
+ token_budget: int | None = None
23
+ ) -> AsyncGenerator[AgentEvent, None]
24
+ ```
25
+
26
+ Runs iterative research flow.
27
+
28
+ **Parameters**:
29
+ - `query`: Research query string
30
+ - `background_context`: Background context (default: "")
31
+ - `max_iterations`: Maximum iterations (default: from settings)
32
+ - `max_time_minutes`: Maximum time in minutes (default: from settings)
33
+ - `token_budget`: Token budget (default: from settings)
34
+
35
+ **Yields**: `AgentEvent` objects for:
36
+ - `started`: Research started
37
+ - `search_complete`: Search completed
38
+ - `judge_complete`: Evidence evaluation completed
39
+ - `synthesizing`: Generating report
40
+ - `complete`: Research completed
41
+ - `error`: Error occurred
42
+
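The event stream is consumed with `async for`. The sketch below substitutes a toy generator for a real flow instance (constructing `IterativeResearchFlow` requires project wiring not shown here); only the async-generator shape and the documented event types are taken from the API above:

```python
import asyncio
from dataclasses import dataclass

@dataclass
class AgentEvent:
    # Minimal stand-in for the real AgentEvent model.
    type: str

async def fake_run(query: str):
    # Toy stand-in for flow.run(query): yields the documented event types.
    for event_type in ("started", "search_complete", "judge_complete",
                       "synthesizing", "complete"):
        yield AgentEvent(type=event_type)

async def main() -> list[str]:
    seen: list[str] = []
    async for event in fake_run("What causes long COVID fatigue?"):
        seen.append(event.type)  # e.g. update a progress UI per event
    return seen

events = asyncio.run(main())
print(events)
```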
43
+ ## DeepResearchFlow
44
+
45
+ **Module**: `src.orchestrator.research_flow`
46
+
47
+ **Purpose**: Multi-section parallel research with planning and synthesis.
48
+
49
+ ### Methods
50
+
51
+ #### `run`
52
+
53
+ ```python
54
+ async def run(
55
+ self,
56
+ query: str,
57
+ background_context: str = "",
58
+ max_iterations_per_section: int | None = None,
59
+ max_time_minutes: float | None = None,
60
+ token_budget: int | None = None
61
+ ) -> AsyncGenerator[AgentEvent, None]
62
+ ```
63
+
64
+ Runs deep research flow.
65
+
66
+ **Parameters**:
67
+ - `query`: Research query string
68
+ - `background_context`: Background context (default: "")
69
+ - `max_iterations_per_section`: Maximum iterations per section (default: from settings)
70
+ - `max_time_minutes`: Maximum time in minutes (default: from settings)
71
+ - `token_budget`: Token budget (default: from settings)
72
+
73
+ **Yields**: `AgentEvent` objects for:
74
+ - `started`: Research started
75
+ - `planning`: Creating research plan
76
+ - `looping`: Running parallel research loops
77
+ - `synthesizing`: Synthesizing results
78
+ - `complete`: Research completed
79
+ - `error`: Error occurred
80
+
81
+ ## GraphOrchestrator
82
+
83
+ **Module**: `src.orchestrator.graph_orchestrator`
84
+
85
+ **Purpose**: Graph-based execution using Pydantic AI agents as nodes.
86
+
87
+ ### Methods
88
+
89
+ #### `run`
90
+
91
+ ```python
92
+ async def run(
93
+ self,
94
+ query: str,
95
+ research_mode: str = "auto",
96
+ use_graph: bool = True
97
+ ) -> AsyncGenerator[AgentEvent, None]
98
+ ```
99
+
100
+ Runs graph-based research orchestration.
101
+
102
+ **Parameters**:
103
+ - `query`: Research query string
104
+ - `research_mode`: Research mode ("iterative", "deep", or "auto")
105
+ - `use_graph`: Whether to use graph execution (default: True)
106
+
107
+ **Yields**: `AgentEvent` objects during graph execution.
108
+
109
+ ## Orchestrator Factory
110
+
111
+ **Module**: `src.orchestrator_factory`
112
+
113
+ **Purpose**: Factory for creating orchestrators.
114
+
115
+ ### Functions
116
+
117
+ #### `create_orchestrator`
118
+
119
+ ```python
120
+ def create_orchestrator(
121
+ search_handler: SearchHandlerProtocol,
122
+ judge_handler: JudgeHandlerProtocol,
123
+ config: dict[str, Any],
124
+ mode: str | None = None
125
+ ) -> Any
126
+ ```
127
+
128
+ Creates an orchestrator instance.
129
+
130
+ **Parameters**:
131
+ - `search_handler`: Search handler protocol implementation
132
+ - `judge_handler`: Judge handler protocol implementation
133
+ - `config`: Configuration dictionary
134
+ - `mode`: Orchestrator mode ("simple", "advanced", "magentic", or None for auto-detect)
135
+
136
+ **Returns**: Orchestrator instance.
137
+
138
+ **Raises**:
139
+ - `ValueError`: If the requested mode's requirements are not met (e.g. a missing API key)
140
+
141
+ **Modes**:
142
+ - `"simple"`: Legacy orchestrator
143
+ - `"advanced"` or `"magentic"`: Magentic orchestrator (requires OpenAI API key)
144
+ - `None`: Auto-detect based on API key availability
145
+
146
+ ## MagenticOrchestrator
147
+
148
+ **Module**: `src.orchestrator_magentic`
149
+
150
+ **Purpose**: Multi-agent coordination using Microsoft Agent Framework.
151
+
152
+ ### Methods
153
+
154
+ #### `run`
155
+
156
+ ```python
157
+ async def run(
158
+ self,
159
+ query: str,
160
+ max_rounds: int = 15,
161
+ max_stalls: int = 3
162
+ ) -> AsyncGenerator[AgentEvent, None]
163
+ ```
164
+
165
+ Runs Magentic orchestration.
166
+
167
+ **Parameters**:
168
+ - `query`: Research query string
169
+ - `max_rounds`: Maximum rounds (default: 15)
170
+ - `max_stalls`: Maximum stalls before reset (default: 3)
171
+
172
+ **Yields**: `AgentEvent` objects converted from Magentic events.
173
+
174
+ **Requirements**:
175
+ - `agent-framework-core` package
176
+ - OpenAI API key
177
+
178
+ ## See Also
179
+
180
+ - [Architecture - Orchestrators](../architecture/orchestrators.md) - Architecture overview
181
+ - [Graph Orchestration](../architecture/graph-orchestration.md) - Graph execution details
182
+
183
+
184
+
185
+
docs/api/services.md ADDED
@@ -0,0 +1,191 @@
1
+ # Services API Reference
2
+
3
+ This page documents the API for DeepCritical services.
4
+
5
+ ## EmbeddingService
6
+
7
+ **Module**: `src.services.embeddings`
8
+
9
+ **Purpose**: Local sentence-transformers embeddings for semantic search and deduplication.
10
+
11
+ ### Methods
12
+
13
+ #### `embed`
14
+
15
+ ```python
16
+ async def embed(self, text: str) -> list[float]
17
+ ```
18
+
19
+ Generates embedding for a text string.
20
+
21
+ **Parameters**:
22
+ - `text`: Text to embed
23
+
24
+ **Returns**: Embedding vector as list of floats.
25
+
26
+ #### `embed_batch`
27
+
28
+ ```python
29
+ async def embed_batch(self, texts: list[str]) -> list[list[float]]
30
+ ```
31
+
32
+ Generates embeddings for multiple texts.
33
+
34
+ **Parameters**:
35
+ - `texts`: List of texts to embed
36
+
37
+ **Returns**: List of embedding vectors.
38
+
39
+ #### `similarity`
40
+
41
+ ```python
42
+ async def similarity(self, text1: str, text2: str) -> float
43
+ ```
44
+
45
+ Calculates similarity between two texts.
46
+
47
+ **Parameters**:
48
+ - `text1`: First text
49
+ - `text2`: Second text
50
+
51
+ **Returns**: Similarity score (0.0-1.0).
52
+
53
+ #### `find_duplicates`
54
+
55
+ ```python
56
+ async def find_duplicates(
57
+ self,
58
+ texts: list[str],
59
+ threshold: float = 0.85
60
+ ) -> list[tuple[int, int]]
61
+ ```
62
+
63
+ Finds duplicate texts based on similarity threshold.
64
+
65
+ **Parameters**:
66
+ - `texts`: List of texts to check
67
+ - `threshold`: Similarity threshold (default: 0.85)
68
+
69
+ **Returns**: List of (index1, index2) tuples for duplicate pairs.
70
+
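The pairwise logic behind `find_duplicates` can be sketched in pure Python over precomputed vectors. The real service embeds the texts with sentence-transformers first; here the vectors are supplied directly, and `find_duplicate_pairs` is a hypothetical name:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity of two dense vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def find_duplicate_pairs(
    vectors: list[list[float]], threshold: float = 0.85
) -> list[tuple[int, int]]:
    # Same contract as find_duplicates, but over precomputed vectors:
    # (i, j) index pairs with i < j whose similarity meets the threshold.
    pairs = []
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                pairs.append((i, j))
    return pairs

vecs = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]]
print(find_duplicate_pairs(vecs))  # [(0, 1)]
```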
71
+ ### Factory Function
72
+
73
+ #### `get_embedding_service`
74
+
75
+ ```python
76
+ @lru_cache(maxsize=1)
77
+ def get_embedding_service() -> EmbeddingService
78
+ ```
79
+
80
+ Returns the singleton EmbeddingService instance.
81
+
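The `@lru_cache(maxsize=1)` idiom gives a process-wide singleton: the first call constructs the service, and every later call returns the same cached object. A minimal illustration, with a stand-in class since constructing the real service loads a model:

```python
from functools import lru_cache

class FakeService:
    """Stand-in for EmbeddingService; real construction loads a model."""

@lru_cache(maxsize=1)
def get_service() -> FakeService:
    # Same singleton idiom as get_embedding_service.
    return FakeService()

print(get_service() is get_service())  # True: one cached instance
```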
82
+ ## LlamaIndexRAGService
83
+
84
+ **Module**: `src.services.rag`
85
+
86
+ **Purpose**: Retrieval-Augmented Generation using LlamaIndex.
87
+
88
+ ### Methods
89
+
90
+ #### `ingest_evidence`
91
+
92
+ ```python
93
+ async def ingest_evidence(self, evidence: list[Evidence]) -> None
94
+ ```
95
+
96
+ Ingests evidence into RAG service.
97
+
98
+ **Parameters**:
99
+ - `evidence`: List of Evidence objects to ingest
100
+
101
+ **Note**: Requires OpenAI API key for embeddings.
102
+
103
+ #### `retrieve`
104
+
105
+ ```python
106
+ async def retrieve(
107
+ self,
108
+ query: str,
109
+ top_k: int = 5
110
+ ) -> list[Document]
111
+ ```
112
+
113
+ Retrieves relevant documents for a query.
114
+
115
+ **Parameters**:
116
+ - `query`: Search query string
117
+ - `top_k`: Number of top results to return (default: 5)
118
+
119
+ **Returns**: List of Document objects with metadata.
120
+
121
+ #### `query`
122
+
123
+ ```python
124
+ async def query(
125
+ self,
126
+ query: str,
127
+ top_k: int = 5
128
+ ) -> str
129
+ ```
130
+
131
+ Queries RAG service and returns formatted results.
132
+
133
+ **Parameters**:
134
+ - `query`: Search query string
135
+ - `top_k`: Number of top results to return (default: 5)
136
+
137
+ **Returns**: Formatted query results as string.
138
+
139
+ ### Factory Function
140
+
141
+ #### `get_rag_service`
142
+
143
+ ```python
144
+ @lru_cache(maxsize=1)
145
+ def get_rag_service() -> LlamaIndexRAGService | None
146
+ ```
147
+
148
+ Returns the singleton LlamaIndexRAGService instance, or `None` if an OpenAI API key is not available.
149
+
150
+ ## StatisticalAnalyzer
151
+
152
+ **Module**: `src.services.statistical_analyzer`
153
+
154
+ **Purpose**: Secure execution of AI-generated statistical code.
155
+
156
+ ### Methods
157
+
158
+ #### `analyze`
159
+
160
+ ```python
161
+ async def analyze(
162
+ self,
163
+ hypothesis: str,
164
+ evidence: list[Evidence],
165
+ data_description: str | None = None
166
+ ) -> AnalysisResult
167
+ ```
168
+
169
+ Analyzes a hypothesis using statistical methods.
170
+
171
+ **Parameters**:
172
+ - `hypothesis`: Hypothesis to analyze
173
+ - `evidence`: List of Evidence objects
174
+ - `data_description`: Optional data description
175
+
176
+ **Returns**: `AnalysisResult` with:
177
+ - `verdict`: SUPPORTED, REFUTED, or INCONCLUSIVE
178
+ - `code`: Generated analysis code
179
+ - `output`: Execution output
180
+ - `error`: Error message if execution failed
181
+
182
+ **Note**: Requires Modal credentials for sandbox execution.
183
+
184
+ ## See Also
185
+
186
+ - [Architecture - Services](../architecture/services.md) - Architecture overview
187
+ - [Configuration](../configuration/index.md) - Service configuration
188
+
189
+
190
+
191
+
docs/api/tools.md ADDED
@@ -0,0 +1,225 @@
1
+ # Tools API Reference
2
+
3
+ This page documents the API for DeepCritical search tools.
4
+
5
+ ## SearchTool Protocol
6
+
7
+ All tools implement the `SearchTool` protocol:
8
+
9
+ ```python
10
+ class SearchTool(Protocol):
11
+ @property
12
+ def name(self) -> str: ...
13
+
14
+ async def search(
15
+ self,
16
+ query: str,
17
+ max_results: int = 10
18
+ ) -> list[Evidence]: ...
19
+ ```
20
+
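Any object with a `name` property and an async `search` method satisfies the protocol; no inheritance is required. A toy conforming tool, using a stand-in `Evidence` dataclass (the real model lives elsewhere in the codebase):

```python
import asyncio
from dataclasses import dataclass

@dataclass
class Evidence:
    # Minimal stand-in for the real Evidence model.
    source: str
    content: str

class StaticTool:
    """Toy tool that satisfies the SearchTool protocol shape."""

    @property
    def name(self) -> str:
        return "static"

    async def search(self, query: str, max_results: int = 10) -> list[Evidence]:
        hits = [Evidence(source=self.name, content=f"canned result for {query!r}")]
        return hits[:max_results]

results = asyncio.run(StaticTool().search("metformin", max_results=5))
print(results[0].source)  # static
```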
21
+ ## PubMedTool
22
+
23
+ **Module**: `src.tools.pubmed`
24
+
25
+ **Purpose**: Search peer-reviewed biomedical literature from PubMed.
26
+
27
+ ### Properties
28
+
29
+ #### `name`
30
+
31
+ ```python
32
+ @property
33
+ def name(self) -> str
34
+ ```
35
+
36
+ Returns tool name: `"pubmed"`
37
+
38
+ ### Methods
39
+
40
+ #### `search`
41
+
42
+ ```python
43
+ async def search(
44
+ self,
45
+ query: str,
46
+ max_results: int = 10
47
+ ) -> list[Evidence]
48
+ ```
49
+
50
+ Searches PubMed for articles.
51
+
52
+ **Parameters**:
53
+ - `query`: Search query string
54
+ - `max_results`: Maximum number of results to return (default: 10)
55
+
56
+ **Returns**: List of `Evidence` objects with PubMed articles.
57
+
58
+ **Raises**:
59
+ - `SearchError`: If search fails
60
+ - `RateLimitError`: If rate limit is exceeded
61
+
62
+ ## ClinicalTrialsTool
63
+
64
+ **Module**: `src.tools.clinicaltrials`
65
+
66
+ **Purpose**: Search ClinicalTrials.gov for interventional studies.
67
+
68
+ ### Properties
69
+
70
+ #### `name`
71
+
72
+ ```python
73
+ @property
74
+ def name(self) -> str
75
+ ```
76
+
77
+ Returns tool name: `"clinicaltrials"`
78
+
79
+ ### Methods
80
+
81
+ #### `search`
82
+
83
+ ```python
84
+ async def search(
85
+ self,
86
+ query: str,
87
+ max_results: int = 10
88
+ ) -> list[Evidence]
89
+ ```
90
+
91
+ Searches ClinicalTrials.gov for trials.
92
+
93
+ **Parameters**:
94
+ - `query`: Search query string
95
+ - `max_results`: Maximum number of results to return (default: 10)
96
+
97
+ **Returns**: List of `Evidence` objects with clinical trials.
98
+
99
+ **Note**: Only returns interventional studies with status COMPLETED, ACTIVE_NOT_RECRUITING, RECRUITING, or ENROLLING_BY_INVITATION.
100
+
101
+ **Raises**:
102
+ - `SearchError`: If search fails
103
+
104
+ ## EuropePMCTool
105
+
106
+ **Module**: `src.tools.europepmc`
107
+
108
+ **Purpose**: Search Europe PMC for preprints and peer-reviewed articles.
109
+
110
+ ### Properties
111
+
112
+ #### `name`
113
+
114
+ ```python
115
+ @property
116
+ def name(self) -> str
117
+ ```
118
+
119
+ Returns tool name: `"europepmc"`
120
+
121
+ ### Methods
122
+
123
+ #### `search`
124
+
125
+ ```python
126
+ async def search(
127
+ self,
128
+ query: str,
129
+ max_results: int = 10
130
+ ) -> list[Evidence]
131
+ ```
132
+
133
+ Searches Europe PMC for articles and preprints.
134
+
135
+ **Parameters**:
136
+ - `query`: Search query string
137
+ - `max_results`: Maximum number of results to return (default: 10)
138
+
139
+ **Returns**: List of `Evidence` objects with articles/preprints.
140
+
141
+ **Note**: Includes both preprints (marked with `[PREPRINT - Not peer-reviewed]`) and peer-reviewed articles.
142
+
143
+ **Raises**:
144
+ - `SearchError`: If search fails
145
+
146
+ ## RAGTool
147
+
148
+ **Module**: `src.tools.rag_tool`
149
+
150
+ **Purpose**: Semantic search within collected evidence.
151
+
152
+ ### Properties
153
+
154
+ #### `name`
155
+
156
+ ```python
157
+ @property
158
+ def name(self) -> str
159
+ ```
160
+
161
+ Returns tool name: `"rag"`
162
+
163
+ ### Methods
164
+
165
+ #### `search`
166
+
167
+ ```python
168
+ async def search(
169
+ self,
170
+ query: str,
171
+ max_results: int = 10
172
+ ) -> list[Evidence]
173
+ ```
174
+
175
+ Searches collected evidence using semantic similarity.
176
+
177
+ **Parameters**:
178
+ - `query`: Search query string
179
+ - `max_results`: Maximum number of results to return (default: 10)
180
+
181
+ **Returns**: List of `Evidence` objects from collected evidence.
182
+
183
+ **Note**: Requires evidence to be ingested into the RAG service first.
184
+
185
+ ## SearchHandler
186
+
187
+ **Module**: `src.tools.search_handler`
188
+
189
+ **Purpose**: Orchestrates parallel searches across multiple tools.
190
+
191
+ ### Methods
192
+
193
+ #### `search`
194
+
195
+ ```python
196
+ async def search(
197
+ self,
198
+ query: str,
199
+ tools: list[SearchTool] | None = None,
200
+ max_results_per_tool: int = 10
201
+ ) -> SearchResult
202
+ ```
203
+
204
+ Searches multiple tools in parallel.
205
+
206
+ **Parameters**:
207
+ - `query`: Search query string
208
+ - `tools`: List of tools to use (default: all available tools)
209
+ - `max_results_per_tool`: Maximum results per tool (default: 10)
210
+
211
+ **Returns**: `SearchResult` with:
212
+ - `evidence`: Aggregated list of evidence
213
+ - `tool_results`: Results per tool
214
+ - `total_count`: Total number of results
215
+
216
+ **Note**: Uses `asyncio.gather()` for parallel execution. Handles tool failures gracefully.
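The fan-out-and-tolerate-failures behavior described above can be sketched with `asyncio.gather(return_exceptions=True)`. The tools below are stubs; only the concurrency pattern reflects the documented handler:

```python
import asyncio

async def flaky_tool(name: str, fail: bool) -> list[str]:
    # Stub tool: either returns one hit or raises, like a real search backend.
    if fail:
        raise RuntimeError(f"{name} unavailable")
    return [f"{name}-hit"]

async def search_all() -> list[str]:
    # Run tools concurrently; drop failures instead of aborting the search.
    results = await asyncio.gather(
        flaky_tool("pubmed", False),
        flaky_tool("europepmc", True),
        flaky_tool("clinicaltrials", False),
        return_exceptions=True,
    )
    evidence: list[str] = []
    for result in results:
        if isinstance(result, Exception):
            continue  # a real handler would log the tool failure here
        evidence.extend(result)
    return evidence

print(asyncio.run(search_all()))  # ['pubmed-hit', 'clinicaltrials-hit']
```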
217
+
218
+ ## See Also
219
+
220
+ - [Architecture - Tools](../architecture/tools.md) - Architecture overview
221
+ - [Models API](models.md) - Data models used by tools
222
+
223
+
224
+
225
+
docs/architecture/agents.md ADDED
@@ -0,0 +1,182 @@
1
+ # Agents Architecture
2
+
3
+ DeepCritical uses Pydantic AI agents for all AI-powered operations. All agents follow a consistent pattern and use structured output types.
4
+
5
+ ## Agent Pattern
6
+
7
+ All agents use the Pydantic AI `Agent` class with the following structure:
8
+
9
+ - **System Prompt**: Module-level constant with date injection
10
+ - **Agent Class**: `__init__(model: Any | None = None)`
11
+ - **Main Method**: Async method (e.g., `async def evaluate()`, `async def write_report()`)
12
+ - **Factory Function**: `def create_agent_name(model: Any | None = None) -> AgentName`
13
+
14
+ ## Model Initialization
15
+
16
+ Agents use `get_model()` from `src/agent_factory/judges.py` if no model is provided. This supports:
17
+
18
+ - OpenAI models
19
+ - Anthropic models
20
+ - HuggingFace Inference API models
21
+
22
+ The model selection is based on the configured `LLM_PROVIDER` in settings.
23
+
24
+ ## Error Handling
25
+
26
+ Agents return fallback values on failure rather than raising exceptions:
27
+
28
+ - `KnowledgeGapOutput(research_complete=False, outstanding_gaps=[...])`
29
+ - Empty strings for text outputs
30
+ - Default structured outputs
31
+
32
+ All errors are logged with context using structlog.
33
+
34
+ ## Input Validation
35
+
36
+ All agents validate inputs:
37
+
38
+ - Check that queries/inputs are not empty
39
+ - Truncate very long inputs with warnings
40
+ - Handle None values gracefully
41
+
42
+ ## Output Types
43
+
44
+ Agents use structured output types from `src/utils/models.py`:
45
+
46
+ - `KnowledgeGapOutput`: Research completeness evaluation
47
+ - `AgentSelectionPlan`: Tool selection plan
48
+ - `ReportDraft`: Long-form report structure
49
+ - `ParsedQuery`: Query parsing and mode detection
50
+
51
+ For text output (writer agents), agents return `str` directly.
52
+
53
+ ## Agent Types
54
+
55
+ ### Knowledge Gap Agent
56
+
57
+ **File**: `src/agents/knowledge_gap.py`
58
+
59
+ **Purpose**: Evaluates research state and identifies knowledge gaps.
60
+
61
+ **Output**: `KnowledgeGapOutput` with:
62
+ - `research_complete`: Boolean indicating if research is complete
63
+ - `outstanding_gaps`: List of remaining knowledge gaps
64
+
65
+ **Methods**:
66
+ - `async def evaluate(query, background_context, conversation_history, iteration, time_elapsed_minutes, max_time_minutes) -> KnowledgeGapOutput`
67
+
68
+ ### Tool Selector Agent
69
+
70
+ **File**: `src/agents/tool_selector.py`
71
+
72
+ **Purpose**: Selects appropriate tools for addressing knowledge gaps.
73
+
74
+ **Output**: `AgentSelectionPlan` with list of `AgentTask` objects.
75
+
76
+ **Available Agents**:
77
+ - `WebSearchAgent`: General web search for fresh information
78
+ - `SiteCrawlerAgent`: Research specific entities/companies
79
+ - `RAGAgent`: Semantic search within collected evidence
80
+
81
+ ### Writer Agent
82
+
83
+ **File**: `src/agents/writer.py`
84
+
85
+ **Purpose**: Generates final reports from research findings.
86
+
87
+ **Output**: Markdown string with numbered citations.
88
+
89
+ **Methods**:
90
+ - `async def write_report(query, findings, output_length, output_instructions) -> str`
91
+
92
+ **Features**:
93
+ - Validates inputs
94
+ - Truncates very long findings (max 50000 chars) with warning
95
+ - Retry logic for transient failures (3 retries)
96
+ - Citation validation before returning
97
+
98
+ ### Long Writer Agent
99
+
100
+ **File**: `src/agents/long_writer.py`
101
+
102
+ **Purpose**: Long-form report generation with section-by-section writing.
103
+
104
+ **Input/Output**: Uses `ReportDraft` models.
105
+
106
+ **Methods**:
107
+ - `async def write_next_section(query, draft, section_title, section_content) -> LongWriterOutput`
108
+ - `async def write_report(query, report_title, report_draft) -> str`
109
+
110
+ **Features**:
111
+ - Writes sections iteratively
112
+ - Aggregates references across sections
113
+ - Reformats section headings and references
114
+ - Deduplicates and renumbers references
115
+
116
+ ### Proofreader Agent
117
+
118
+ **File**: `src/agents/proofreader.py`
119
+
120
+ **Purpose**: Proofreads and polishes report drafts.
121
+
122
+ **Input**: `ReportDraft`
123
+ **Output**: Polished markdown string
124
+
125
+ **Methods**:
126
+ - `async def proofread(query, report_title, report_draft) -> str`
127
+
128
+ **Features**:
129
+ - Removes duplicate content across sections
130
+ - Adds executive summary if multiple sections
131
+ - Preserves all references and citations
132
+ - Improves flow and readability
133
+
134
+ ### Thinking Agent
135
+
136
+ **File**: `src/agents/thinking.py`
137
+
138
+ **Purpose**: Generates observations from conversation history.
139
+
140
+ **Output**: Observation string
141
+
142
+ **Methods**:
143
+ - `async def generate_observations(query, background_context, conversation_history) -> str`
144
+
145
+ ### Input Parser Agent
146
+
147
+ **File**: `src/agents/input_parser.py`
148
+
149
+ **Purpose**: Parses and improves user queries, detects research mode.
150
+
151
+ **Output**: `ParsedQuery` with:
152
+ - `original_query`: Original query string
153
+ - `improved_query`: Refined query string
154
+ - `research_mode`: "iterative" or "deep"
155
+ - `key_entities`: List of key entities
156
+ - `research_questions`: List of research questions
157
+
158
+ ## Factory Functions
159
+
160
+ All agents have factory functions in `src/agent_factory/agents.py`:
161
+
162
+ ```python
163
+ def create_knowledge_gap_agent(model: Any | None = None) -> KnowledgeGapAgent
164
+ def create_tool_selector_agent(model: Any | None = None) -> ToolSelectorAgent
165
+ def create_writer_agent(model: Any | None = None) -> WriterAgent
166
+ # ... etc
167
+ ```
168
+
169
+ Factory functions:
170
+ - Use `get_model()` if no model provided
171
+ - Raise `ConfigurationError` if creation fails
172
+ - Log agent creation
173
+
174
+ ## See Also
175
+
176
+ - [Orchestrators](orchestrators.md) - How agents are orchestrated
177
+ - [API Reference - Agents](../api/agents.md) - API documentation
178
+ - [Contributing - Code Style](../contributing/code-style.md) - Development guidelines
179
+
180
+
181
+
182
+
docs/architecture/design-patterns.md DELETED
@@ -1,1509 +0,0 @@
1
- # Design Patterns & Technical Decisions
2
- ## Explicit Answers to Architecture Questions
3
-
4
- ---
5
-
6
- ## Purpose of This Document
7
-
8
- This document explicitly answers all the "design pattern" questions raised in team discussions. It provides clear technical decisions with rationale.
9
-
10
- ---
11
-
12
- ## 1. Primary Architecture Pattern
13
-
14
- ### Decision: Orchestrator with Search-Judge Loop
15
-
16
- **Pattern Name**: Iterative Research Orchestrator
17
-
18
- **Structure**:
19
- ```
20
- ┌─────────────────────────────────────┐
21
- │ Research Orchestrator │
22
- │ ┌───────────────────────────────┐ │
23
- │ │ Search Strategy Planner │ │
24
- │ └───────────────────────────────┘ │
25
- │ ↓ │
26
- │ ┌───────────────────────────────┐ │
27
- │ │ Tool Coordinator │ │
28
- │ │ - PubMed Search │ │
29
- │ │ - Web Search │ │
30
- │ │ - Clinical Trials │ │
31
- │ └───────────────────────────────┘ │
32
- │ ↓ │
33
- │ ┌───────────────────────────────┐ │
34
- │ │ Evidence Aggregator │ │
35
- │ └───────────────────────────────┘ │
36
- │ ↓ │
37
- │ ┌───────────────────────────────┐ │
38
- │ │ Quality Judge │ │
39
- │ │ (LLM-based assessment) │ │
40
- │ └───────────────────────────────┘ │
41
- │ ↓ │
42
- │ Loop or Synthesize? │
43
- │ ↓ │
44
- │ ┌───────────────────────────────┐ │
45
- │ │ Report Generator │ │
46
- │ └───────────────────────────────┘ │
47
- └─────────────────────────────────────┘
48
- ```
49
-
50
- **Why NOT single-agent?**
51
- - Need coordinated multi-tool queries
52
- - Need iterative refinement
53
- - Need quality assessment between searches
54
-
55
- **Why NOT pure ReAct?**
56
- - Medical research requires structured workflow
57
- - Need explicit quality gates
58
- - Want deterministic tool selection
59
-
60
- **Why THIS pattern?**
61
- - Clear separation of concerns
62
- - Testable components
63
- - Easy to debug
64
- - Proven in similar systems
65
-
66
- ---
67
-
68
- ## 2. Tool Selection & Orchestration Pattern
69
-
70
- ### Decision: Static Tool Registry with Dynamic Selection
71
-
72
- **Pattern**:
73
- ```python
74
- class ToolRegistry:
75
- """Central registry of available research tools"""
76
- tools = {
77
- 'pubmed': PubMedSearchTool(),
78
- 'web': WebSearchTool(),
79
- 'trials': ClinicalTrialsTool(),
80
- 'drugs': DrugInfoTool(),
81
- }
82
-
83
- class Orchestrator:
84
- def select_tools(self, question: str, iteration: int) -> List[Tool]:
85
- """Dynamically choose tools based on context"""
86
- if iteration == 0:
87
- # First pass: broad search
88
- return [tools['pubmed'], tools['web']]
89
- else:
90
- # Refinement: targeted search
91
- return self.judge.recommend_tools(question, context)
92
- ```
93
-
94
- **Why NOT on-the-fly agent factories?**
95
- - 6-day timeline (too complex)
96
- - Tools are known upfront
97
- - Simpler to test and debug
98
-
99
- **Why NOT single tool?**
100
- - Need multiple evidence sources
101
- - Different tools for different info types
102
- - Better coverage
103
-
104
- **Why THIS pattern?**
105
- - Balance flexibility vs simplicity
106
- - Tools can be added easily
107
- - Selection logic is transparent
108
-
109
- ---
110
-
111
- ## 3. Judge Pattern
112
-
113
- ### Decision: Dual-Judge System (Quality + Budget)
114
-
115
- **Pattern**:
116
- ```python
117
- class QualityJudge:
118
- """LLM-based evidence quality assessment"""
119
-
120
- def is_sufficient(self, question: str, evidence: List[Evidence]) -> bool:
121
- """Main decision: do we have enough?"""
122
- return (
123
- self.has_mechanism_explanation(evidence) and
124
- self.has_drug_candidates(evidence) and
125
- self.has_clinical_evidence(evidence) and
126
- self.confidence_score(evidence) > threshold
127
- )
128
-
129
- def identify_gaps(self, question: str, evidence: List[Evidence]) -> List[str]:
130
- """What's missing?"""
131
- gaps = []
132
- if not self.has_mechanism_explanation(evidence):
133
- gaps.append("disease mechanism")
134
- if not self.has_drug_candidates(evidence):
135
- gaps.append("potential drug candidates")
136
- if not self.has_clinical_evidence(evidence):
137
- gaps.append("clinical trial data")
138
- return gaps
139
-
140
- class BudgetJudge:
141
- """Resource constraint enforcement"""
142
-
143
- def should_stop(self, state: ResearchState) -> bool:
144
- """Hard limits"""
145
- return (
146
- state.tokens_used >= max_tokens or
147
- state.iterations >= max_iterations or
148
- state.time_elapsed >= max_time
149
- )
150
- ```
151
-
152
- **Why NOT just LLM judge?**
153
- - Cost control (prevent runaway queries)
154
- - Time bounds (hackathon demo needs to be fast)
155
- - Safety (prevent infinite loops)
156
-
157
- **Why NOT just token budget?**
158
- - Want early exit when answer is good
159
- - Quality matters, not just quantity
160
- - Better user experience
161
-
162
- **Why THIS pattern?**
163
- - Best of both worlds
164
- - Clear separation (quality vs resources)
165
- - Each judge has single responsibility
166
-
167
- ---
168
-
169
- ## 4. Break/Stopping Pattern
170
-
171
- ### Decision: Three-Tier Break Conditions
172
-
173
- **Pattern**:
174
- ```python
175
- def should_continue(state: ResearchState) -> bool:
176
- """Multi-tier stopping logic"""
177
-
178
- # Tier 1: Quality-based (ideal stop)
179
- if quality_judge.is_sufficient(state.question, state.evidence):
180
- state.stop_reason = "sufficient_evidence"
181
- return False
182
-
183
- # Tier 2: Budget-based (cost control)
184
- if state.tokens_used >= config.max_tokens:
185
- state.stop_reason = "token_budget_exceeded"
186
- return False
187
-
188
- # Tier 3: Iteration-based (safety)
189
- if state.iterations >= config.max_iterations:
190
- state.stop_reason = "max_iterations_reached"
191
- return False
192
-
193
- # Tier 4: Time-based (demo friendly)
194
- if state.time_elapsed >= config.max_time:
195
- state.stop_reason = "timeout"
196
- return False
197
-
198
- return True # Continue researching
199
- ```
200
-
201
- **Configuration**:
202
- ```toml
203
- [research.limits]
204
- max_tokens = 50000 # ~$0.50 at Claude pricing
205
- max_iterations = 5 # Reasonable depth
206
- max_time_seconds = 120 # 2 minutes for demo
207
- judge_threshold = 0.8 # Quality confidence score
208
- ```
209
-
210
- **Why multiple conditions?**
211
- - Defense in depth
212
- - Different failure modes
213
- - Graceful degradation
214
-
215
- **Why these specific limits?**
216
- - Tokens: Balances cost vs quality
217
- - Iterations: Enough for refinement, not too deep
218
- - Time: Fast enough for live demo
219
- - Judge: High bar for quality
220
-
221
- ---
222
-
223
- ## 5. State Management Pattern
224
-
225
- ### Decision: Pydantic State Machine with Checkpoints
226
-
227
- **Pattern**:
228
- ```python
229
- class ResearchState(BaseModel):
230
- """Immutable state snapshots"""
231
- query_id: str
232
- question: str
233
- iteration: int = 0
234
- evidence: List[Evidence] = []
235
- tokens_used: int = 0
236
- search_history: List[SearchQuery] = []
237
- stop_reason: Optional[str] = None
238
- created_at: datetime
239
- updated_at: datetime
240
-
241
- class StateManager:
242
- def save_checkpoint(self, state: ResearchState) -> None:
243
- """Save state to disk"""
244
- path = f".deepresearch/checkpoints/{state.query_id}_iter{state.iteration}.json"
245
- path.write_text(state.model_dump_json(indent=2))
246
-
247
- def load_checkpoint(self, query_id: str, iteration: int) -> ResearchState:
248
- """Resume from checkpoint"""
249
- path = f".deepresearch/checkpoints/{query_id}_iter{iteration}.json"
250
- return ResearchState.model_validate_json(path.read_text())
251
- ```
252
-
253
- **Directory Structure**:
254
- ```
255
- .deepresearch/
256
- ├── state/
257
- │ └── current_123.json # Active research state
258
- ├── checkpoints/
259
- │ ├── query_123_iter0.json # Checkpoint after iteration 0
260
- │ ├── query_123_iter1.json # Checkpoint after iteration 1
261
- │ └── query_123_iter2.json # Checkpoint after iteration 2
262
- └── workspace/
263
- └── query_123/
264
- ├── papers/ # Downloaded PDFs
265
- ├── search_results/ # Raw search results
266
- └── analysis/ # Intermediate analysis
267
- ```
268
-
269
- **Why Pydantic?**
270
- - Type safety
271
- - Validation
272
- - Easy serialization
273
- - Integration with Pydantic AI
274
-
275
- **Why checkpoints?**
276
- - Resume interrupted research
277
- - Debugging (inspect state at each iteration)
278
- - Cost savings (don't re-query)
279
- - Demo resilience
280
-
281
- ---
282
-
283
- ## 6. Tool Interface Pattern
284
-
285
- ### Decision: Async Unified Tool Protocol
286
-
287
- **Pattern**:
288
- ```python
289
- from typing import Protocol, Optional, List, Dict
290
- import asyncio
291
-
292
- class ResearchTool(Protocol):
293
- """Standard async interface all tools must implement"""
294
-
295
- async def search(
296
- self,
297
- query: str,
298
- max_results: int = 10,
299
- filters: Optional[Dict] = None
300
- ) -> List[Evidence]:
301
- """Execute search and return structured evidence"""
302
- ...
303
-
304
- def get_metadata(self) -> ToolMetadata:
305
- """Tool capabilities and requirements"""
306
- ...
307
-
308
- class PubMedSearchTool:
309
- """Concrete async implementation"""
310
-
311
- def __init__(self):
312
- self._rate_limiter = asyncio.Semaphore(3) # 3 req/sec
313
- self._cache: Dict[str, List[Evidence]] = {}
314
-
315
- async def search(self, query: str, max_results: int = 10, **kwargs) -> List[Evidence]:
316
- # Check cache first
317
- cache_key = f"{query}:{max_results}"
318
- if cache_key in self._cache:
319
- return self._cache[cache_key]
320
-
321
- async with self._rate_limiter:
322
- # 1. Query PubMed E-utilities API (async httpx)
323
- async with httpx.AsyncClient() as client:
324
- response = await client.get(
325
- "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi",
326
- params={"db": "pubmed", "term": query, "retmax": max_results}
327
- )
328
- # 2. Parse XML response
329
- # 3. Extract: title, abstract, authors, citations
330
- # 4. Convert to Evidence objects
331
- evidence_list = self._parse_response(response.text)
332
-
333
- # Cache results
334
- self._cache[cache_key] = evidence_list
335
- return evidence_list
336
-
337
- def get_metadata(self) -> ToolMetadata:
338
- return ToolMetadata(
339
- name="PubMed",
340
- description="Biomedical literature search",
341
- rate_limit="3 requests/second",
342
- requires_api_key=False
343
- )
344
- ```
345
-
346
- **Parallel Tool Execution**:
347
- ```python
348
- async def search_all_tools(query: str, tools: List[ResearchTool]) -> List[Evidence]:
349
- """Run all tool searches in parallel"""
350
- tasks = [tool.search(query) for tool in tools]
351
- results = await asyncio.gather(*tasks, return_exceptions=True)
352
-
353
- # Flatten and filter errors
354
- evidence = []
355
- for result in results:
356
- if isinstance(result, Exception):
357
- logger.warning(f"Tool failed: {result}")
358
- else:
359
- evidence.extend(result)
360
- return evidence
361
- ```
362
-
363
- **Why Async?**
364
- - Tools are I/O bound (network calls)
365
- - Parallel execution = faster searches
366
- - Better UX (streaming progress)
367
- - Standard in 2025 Python
368
-
369
- **Why Protocol?**
370
- - Loose coupling
371
- - Easy to add new tools
372
- - Testable with mocks
373
- - Clear contract
374
-
375
- **Why NOT abstract base class?**
376
- - More Pythonic (PEP 544)
377
- - Duck typing friendly
378
- - Runtime checking with isinstance (when the Protocol is decorated with @runtime_checkable)
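The isinstance check only works if the Protocol opts in via `@runtime_checkable` — a minimal sketch (class names here are illustrative, not from the codebase):

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class SupportsMetadata(Protocol):
    """Anything with a get_metadata() method satisfies this protocol."""
    def get_metadata(self) -> dict: ...

class FakeTool:  # note: no inheritance from SupportsMetadata
    def get_metadata(self) -> dict:
        return {"name": "fake"}

# Structural (duck-typed) check, no inheritance required
print(isinstance(FakeTool(), SupportsMetadata))  # → True
```

Caveat: runtime checks only verify that the method *exists*, not its signature — good enough for tests and registry sanity checks.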
379
-
380
- ---
381
-
382
- ## 7. Report Generation Pattern
383
-
384
- ### Decision: Structured Output with Citations
385
-
386
- **Pattern**:
387
- ```python
388
- class DrugCandidate(BaseModel):
389
- name: str
390
- mechanism: str
391
- evidence_quality: Literal["strong", "moderate", "weak"]
392
- clinical_status: str # "FDA approved", "Phase 2", etc.
393
- citations: List[Citation]
394
-
395
- class ResearchReport(BaseModel):
396
- query: str
397
- disease_mechanism: str
398
- candidates: List[DrugCandidate]
399
- methodology: str # How we searched
400
- confidence: float
401
- sources_used: List[str]
402
- generated_at: datetime
403
-
404
- def to_markdown(self) -> str:
405
- """Human-readable format"""
406
- ...
407
-
408
- def to_json(self) -> str:
409
- """Machine-readable format"""
410
- ...
411
- ```
412
-
413
- **Output Example**:
414
- ```markdown
415
- # Research Report: Long COVID Fatigue
416
-
417
- ## Disease Mechanism
418
- Long COVID fatigue is associated with mitochondrial dysfunction
419
- and persistent inflammation [1, 2].
420
-
421
- ## Drug Candidates
422
-
423
- ### 1. Coenzyme Q10 (CoQ10) - STRONG EVIDENCE
424
- - **Mechanism**: Mitochondrial support, ATP production
425
- - **Status**: FDA approved (supplement)
426
- - **Evidence**: 2 randomized controlled trials showing fatigue reduction
427
- - **Citations**:
428
- - Smith et al. (2023) - PubMed: 12345678
429
- - Johnson et al. (2023) - PubMed: 87654321
430
-
431
- ### 2. Low-dose Naltrexone (LDN) - MODERATE EVIDENCE
432
- - **Mechanism**: Anti-inflammatory, immune modulation
433
- - **Status**: FDA approved (different indication)
434
- - **Evidence**: 3 case studies, 1 ongoing Phase 2 trial
435
- - **Citations**: ...
436
-
437
- ## Methodology
438
- - Searched PubMed: 45 papers reviewed
439
- - Searched Web: 12 sources
440
- - Clinical trials: 8 trials identified
441
- - Total iterations: 3
442
- - Tokens used: 12,450
443
-
444
- ## Confidence: 85%
445
-
446
- ## Sources
447
- - PubMed E-utilities
448
- - ClinicalTrials.gov
449
- - OpenFDA Database
450
- ```
451
-
452
- **Why structured?**
453
- - Parseable by other systems
454
- - Consistent format
455
- - Easy to validate
456
- - Good for datasets
457
-
458
- **Why markdown?**
459
- - Human-readable
460
- - Renders nicely in Gradio
461
- - Easy to convert to PDF
462
- - Standard format
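The citation bullets in the example above are plain string assembly; a tiny illustrative helper (name hypothetical):

```python
def format_citation(authors: str, year: int, pmid: str) -> str:
    """Render one citation bullet in the report's markdown style."""
    return f"- {authors} ({year}) - PubMed: {pmid}"

print(format_citation("Smith et al.", 2023, "12345678"))
# → - Smith et al. (2023) - PubMed: 12345678
```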
463
-
464
- ---
465
-
466
- ## 8. Error Handling Pattern
467
-
468
- ### Decision: Graceful Degradation with Fallbacks
469
-
470
- **Pattern**:
471
- ```python
472
- class ResearchAgent:
473
- def research(self, question: str) -> ResearchReport:
474
- try:
475
- return self._research_with_retry(question)
476
- except TokenBudgetExceeded:
- # Return partial results from the state gathered so far (assumes the agent tracks state on self)
- return self._synthesize_partial(self.state)
479
- except ToolFailure as e:
480
- # Try alternate tools
481
- return self._research_with_fallback(question, failed_tool=e.tool)
482
- except Exception as e:
483
- # Log and return error report
484
- logger.error(f"Research failed: {e}")
485
- return self._error_report(question, error=e)
486
- ```
487
-
488
- **Why NOT fail fast?**
489
- - Hackathon demo must be robust
490
- - Partial results better than nothing
491
- - Good user experience
492
-
493
- **Why NOT silent failures?**
494
- - Need visibility for debugging
495
- - User should know limitations
496
- - Honest about confidence
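`_research_with_retry` is not shown above; one plausible shape is exponential backoff around any async call, with the last failure re-raised so the fallback logic above still runs (all names here are hypothetical):

```python
import asyncio

async def with_retry(op, attempts: int = 3, base_delay: float = 0.5):
    """Await op() with exponential backoff; re-raise after the last attempt."""
    for attempt in range(attempts):
        try:
            return await op()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: caller's fallback/degradation logic takes over
            await asyncio.sleep(base_delay * 2 ** attempt)

# Demo: an operation that fails twice, then succeeds
calls = {"n": 0}
async def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient")
    return "ok"

print(asyncio.run(with_retry(flaky, base_delay=0.01)))  # → ok
```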
497
-
498
- ---
499
-
500
- ## 9. Configuration Pattern
501
-
502
- ### Decision: Hydra-inspired but Simpler
503
-
504
- **Pattern**:
505
- ```toml
506
- # config.toml
507
-
508
- [research]
509
- max_iterations = 5
510
- max_tokens = 50000
511
- max_time_seconds = 120
512
- judge_threshold = 0.85
513
-
514
- [tools]
515
- enabled = ["pubmed", "web", "trials"]
516
-
517
- [tools.pubmed]
518
- max_results = 20
519
- rate_limit = 3 # per second
520
-
521
- [tools.web]
522
- engine = "serpapi"
523
- max_results = 10
524
-
525
- [llm]
526
- provider = "anthropic"
527
- model = "claude-3-5-sonnet-20241022"
528
- temperature = 0.1
529
-
530
- [output]
531
- format = "markdown"
532
- include_citations = true
533
- include_methodology = true
534
- ```
535
-
536
- **Loading**:
537
- ```python
538
- from pathlib import Path
539
- import tomllib
540
-
541
- def load_config() -> dict:
542
- config_path = Path("config.toml")
543
- with open(config_path, "rb") as f:
544
- return tomllib.load(f)
545
- ```
546
-
547
- **Why NOT full Hydra?**
548
- - Simpler for hackathon
549
- - Easier to understand
550
- - Faster to modify
551
- - Can upgrade later
552
-
553
- **Why TOML?**
554
- - Human-readable
555
- - Standard (PEP 680)
556
- - Better than YAML edge cases
557
- - Native in Python 3.11+
558
-
559
- ---
560
-
561
- ## 10. Testing Pattern
562
-
563
- ### Decision: Three-Level Testing Strategy
564
-
565
- **Pattern**:
566
- ```python
567
- # Level 1: Unit tests (fast, isolated)
568
- async def test_pubmed_tool():  # search() is a coroutine; run with pytest-asyncio
- tool = PubMedSearchTool()
- results = await tool.search("aspirin cardiovascular")
571
- assert len(results) > 0
572
- assert all(isinstance(r, Evidence) for r in results)
573
-
574
- # Level 2: Integration tests (tools + agent)
575
- def test_research_loop():
576
- agent = ResearchAgent(config=test_config)
577
- report = agent.research("aspirin repurposing")
578
- assert report.candidates
579
- assert report.confidence > 0
580
-
581
- # Level 3: End-to-end tests (full system)
582
- def test_full_workflow():
583
- # Simulate user query through Gradio UI
584
- response = gradio_app.predict("test query")
585
- assert "Drug Candidates" in response
586
- ```
587
-
588
- **Why three levels?**
589
- - Fast feedback (unit tests)
590
- - Confidence (integration tests)
591
- - Reality check (e2e tests)
592
-
593
- **Test Data**:
594
- ```text
595
- # tests/fixtures/
596
- - mock_pubmed_response.xml
597
- - mock_web_results.json
598
- - sample_research_query.txt
599
- - expected_report.md
600
- ```
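The fixtures pair naturally with mocking, so Level 1 tests never touch the network — a sketch with stdlib `unittest.mock` (the tool object here is a stand-in, not the real class):

```python
import asyncio
from unittest.mock import AsyncMock

# Stand-in for PubMedSearchTool; in real tests you'd patch its .search method
tool = AsyncMock()
tool.search.return_value = [{"pmid": "12345678", "title": "Mock paper"}]

async def level1():
    results = await tool.search("aspirin cardiovascular")
    assert len(results) > 0
    return results

results = asyncio.run(level1())
print(results[0]["pmid"])  # → 12345678
```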
601
-
602
- ---
603
-
604
- ## 11. Judge Prompt Templates
605
-
606
- ### Decision: Structured JSON Output with Domain-Specific Criteria
607
-
608
- **Quality Judge System Prompt**:
609
- ```python
610
- QUALITY_JUDGE_SYSTEM = """You are a medical research quality assessor specializing in drug repurposing.
611
- Your task is to evaluate whether the collected evidence is sufficient to answer a drug repurposing question.
612
-
613
- You assess evidence against four criteria specific to drug repurposing research:
614
- 1. MECHANISM: Understanding of the disease's molecular/cellular mechanisms
615
- 2. CANDIDATES: Identification of potential drug candidates with known mechanisms
616
- 3. EVIDENCE: Clinical or preclinical evidence supporting repurposing
617
- 4. SOURCES: Quality and credibility of sources (peer-reviewed > preprints > web)
618
-
619
- You MUST respond with valid JSON only. No other text."""
620
- ```
621
-
622
- **Quality Judge User Prompt**:
623
- ```python
624
- QUALITY_JUDGE_USER = """
625
- ## Research Question
626
- {question}
627
-
628
- ## Evidence Collected (Iteration {iteration} of {max_iterations})
629
- {evidence_summary}
630
-
631
- ## Token Budget
632
- Used: {tokens_used} / {max_tokens}
633
-
634
- ## Your Assessment
635
-
636
- Evaluate the evidence and respond with this exact JSON structure:
637
-
638
- ```json
639
- {{
640
- "assessment": {{
641
- "mechanism_score": <0-10>,
642
- "mechanism_reasoning": "<Step-by-step analysis of mechanism understanding>",
643
- "candidates_score": <0-10>,
644
- "candidates_found": ["<drug1>", "<drug2>", ...],
645
- "evidence_score": <0-10>,
646
- "evidence_reasoning": "<Critical evaluation of clinical/preclinical support>",
647
- "sources_score": <0-10>,
648
- "sources_breakdown": {{
649
- "peer_reviewed": <count>,
650
- "clinical_trials": <count>,
651
- "preprints": <count>,
652
- "other": <count>
653
- }}
654
- }},
655
- "overall_confidence": <0.0-1.0>,
656
- "sufficient": <true/false>,
657
- "gaps": ["<missing info 1>", "<missing info 2>"],
658
- "recommended_searches": ["<search query 1>", "<search query 2>"],
659
- "recommendation": "<continue|synthesize>"
660
- }}
661
- ```
662
-
663
- Decision rules:
664
- - sufficient=true if overall_confidence >= 0.8 AND mechanism_score >= 6 AND candidates_score >= 6
665
- - sufficient=true if remaining budget < 10% (must synthesize with what we have)
666
- - Otherwise, provide recommended_searches to fill gaps
667
- """
668
- ```
669
-
670
- **Report Synthesis Prompt**:
671
- ```python
672
- SYNTHESIS_PROMPT = """You are a medical research synthesizer creating a drug repurposing report.
673
-
674
- ## Research Question
675
- {question}
676
-
677
- ## Collected Evidence
678
- {all_evidence}
679
-
680
- ## Judge Assessment
681
- {final_assessment}
682
-
683
- ## Your Task
684
- Create a comprehensive research report with this structure:
685
-
686
- 1. **Executive Summary** (2-3 sentences)
687
- 2. **Disease Mechanism** - What we understand about the condition
688
- 3. **Drug Candidates** - For each candidate:
689
- - Drug name and current FDA status
690
- - Proposed mechanism for this condition
691
- - Evidence quality (strong/moderate/weak)
692
- - Key citations
693
- 4. **Methodology** - How we searched (tools used, queries, iterations)
694
- 5. **Limitations** - What we couldn't find or verify
695
- 6. **Confidence Score** - Overall confidence in findings
696
-
697
- Format as Markdown. Include PubMed IDs as citations [PMID: 12345678].
698
- Be scientifically accurate. Do not hallucinate drug names or mechanisms.
699
- If evidence is weak, say so clearly."""
700
- ```
701
-
702
- **Why Structured JSON?**
703
- - Parseable by code (not just LLM output)
704
- - Consistent format for logging/debugging
705
- - Can trigger specific actions (continue vs synthesize)
706
- - Testable with expected outputs
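"JSON only" instructions still get ignored sometimes — models like to wrap the object in a code fence. A defensive parse (sketch; function name hypothetical):

```python
import json
import re

def parse_judge_json(text: str) -> dict:
    """Pull the first {...} object out of an LLM reply, tolerating code fences."""
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if not match:
        raise ValueError("no JSON object found in judge output")
    return json.loads(match.group(0))

reply = '```json\n{"sufficient": false, "recommendation": "continue"}\n```'
print(parse_judge_json(reply)["recommendation"])  # → continue
```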
707
-
708
- **Why Domain-Specific Criteria?**
709
- - Generic "is this good?" prompts fail
710
- - Drug repurposing has specific requirements
711
- - Physician on team validated criteria
712
- - Maps to real research workflow
713
-
714
- ---
715
-
716
- ## 12. MCP Server Integration (Hackathon Track)
717
-
718
- ### Decision: Tools as MCP Servers for Reusability
719
-
720
- **Why MCP?**
721
- - Hackathon has dedicated MCP track
722
- - Makes our tools reusable by others
723
- - Standard protocol (Model Context Protocol)
724
- - Future-proof (industry adoption growing)
725
-
726
- **Architecture**:
727
- ```
728
- ┌─────────────────────────────────────────────────┐
- │              DeepCritical Agent                 │
- │       (uses tools directly OR via MCP)          │
- └─────────────────────────────────────────────────┘
-                         │
-            ┌────────────┼────────────┐
-            ↓            ↓            ↓
-    ┌─────────────┐ ┌──────────┐ ┌───────────────┐
-    │ PubMed MCP  │ │ Web MCP  │ │  Trials MCP   │
-    │   Server    │ │  Server  │ │    Server     │
-    └─────────────┘ └──────────┘ └───────────────┘
-           │             │              │
-           ↓             ↓              ↓
-      PubMed API     Brave/DDG   ClinicalTrials.gov
742
- ```
743
-
744
- **PubMed MCP Server Implementation**:
745
- ```python
746
- # src/mcp_servers/pubmed_server.py
747
- from fastmcp import FastMCP
748
-
749
- mcp = FastMCP("PubMed Research Tool")
750
-
751
- @mcp.tool()
752
- async def search_pubmed(
753
- query: str,
754
- max_results: int = 10,
755
- date_range: str = "5y"
756
- ) -> dict:
757
- """
758
- Search PubMed for biomedical literature.
759
-
760
- Args:
761
- query: Search terms (supports PubMed syntax like [MeSH])
762
- max_results: Maximum papers to return (default 10, max 100)
763
- date_range: Time filter - "1y", "5y", "10y", or "all"
764
-
765
- Returns:
766
- dict with papers list containing title, abstract, authors, pmid, date
767
- """
768
- tool = PubMedSearchTool()
769
- results = await tool.search(query, max_results)
770
- return {
771
- "query": query,
772
- "count": len(results),
773
- "papers": [r.model_dump() for r in results]
774
- }
775
-
776
- @mcp.tool()
777
- async def get_paper_details(pmid: str) -> dict:
778
- """
779
- Get full details for a specific PubMed paper.
780
-
781
- Args:
782
- pmid: PubMed ID (e.g., "12345678")
783
-
784
- Returns:
785
- Full paper metadata including abstract, MeSH terms, references
786
- """
787
- tool = PubMedSearchTool()
788
- return await tool.get_details(pmid)
789
-
790
- if __name__ == "__main__":
791
- mcp.run()
792
- ```
793
-
794
- **Running the MCP Server**:
795
- ```bash
796
- # Start the server
797
- python -m src.mcp_servers.pubmed_server
798
-
799
- # Or with uvx (recommended)
800
- uvx fastmcp run src/mcp_servers/pubmed_server.py
801
-
802
- # Note: fastmcp uses stdio transport by default, which is perfect
803
- # for local integration with Claude Desktop or the main agent.
804
- ```
805
-
806
- **Claude Desktop Integration** (for demo):
807
- ```json
808
- // ~/Library/Application Support/Claude/claude_desktop_config.json
809
- {
810
- "mcpServers": {
811
- "pubmed": {
812
- "command": "python",
813
- "args": ["-m", "src.mcp_servers.pubmed_server"],
814
- "cwd": "/path/to/deepcritical"
815
- }
816
- }
817
- }
818
- ```
819
-
820
- **Why FastMCP?**
821
- - Simple decorator syntax
822
- - Handles protocol complexity
823
- - Good docs and examples
824
- - Works with Claude Desktop and API
825
-
826
- **MCP Track Submission Requirements**:
827
- - [ ] At least one tool as MCP server
828
- - [ ] README with setup instructions
829
- - [ ] Demo showing MCP usage
830
- - [ ] Bonus: Multiple tools as MCP servers
831
-
832
- ---
833
-
834
- ## 13. Gradio UI Pattern (Hackathon Track)
835
-
836
- ### Decision: Streaming Progress with Modern UI
837
-
838
- **Pattern**:
839
- ```python
840
- import gradio as gr
- from typing import AsyncGenerator
-
- async def research_with_streaming(question: str) -> AsyncGenerator[str, None]:
-     """Stream research progress to the UI.
-
-     Each yield replaces the displayed value in Gradio, so we accumulate
-     progress into a buffer and yield the full buffer every time.
-     """
-     progress = "🔍 Starting research...\n\n"
-     yield progress
-
-     agent = ResearchAgent()
-
-     async for event in agent.research_stream(question):
-         match event.type:
-             case "search_start":
-                 progress += f"📚 Searching {event.tool}...\n"
-             case "search_complete":
-                 progress += f"✅ Found {event.count} results from {event.tool}\n"
-             case "judge_thinking":
-                 progress += "🤔 Evaluating evidence quality...\n"
-             case "judge_decision":
-                 progress += f"📊 Confidence: {event.confidence:.0%}\n"
-             case "iteration_complete":
-                 progress += f"🔄 Iteration {event.iteration} complete\n\n"
-             case "synthesis_start":
-                 progress += "📝 Generating report...\n"
-             case "complete":
-                 progress += f"\n---\n\n{event.report}"
-         yield progress
-
- # Gradio 5 UI (async generator functions are supported as event handlers)
- with gr.Blocks(theme=gr.themes.Soft()) as demo:
-     gr.Markdown("# 🔬 DeepCritical: Drug Repurposing Research Agent")
-     gr.Markdown("Ask a question about potential drug repurposing opportunities.")
-
-     with gr.Row():
-         with gr.Column(scale=2):
-             question = gr.Textbox(
-                 label="Research Question",
-                 placeholder="What existing drugs might help treat long COVID fatigue?",
-                 lines=2
-             )
-             examples = gr.Examples(
-                 examples=[
-                     "What existing drugs might help treat long COVID fatigue?",
-                     "Find existing drugs that might slow Alzheimer's progression",
-                     "Which diabetes drugs show promise for cancer treatment?"
-                 ],
-                 inputs=question
-             )
-             submit = gr.Button("🚀 Start Research", variant="primary")
-
-         with gr.Column(scale=3):
-             output = gr.Markdown(label="Research Progress & Report")
-
-     submit.click(
-         fn=research_with_streaming,
-         inputs=question,
-         outputs=output,
-     )
-
- demo.launch()
898
- ```
899
-
900
- **Why Streaming?**
901
- - User sees progress, not loading spinner
902
- - Builds trust (system is working)
903
- - Better UX for long operations
904
- - Gradio 5 native support
905
-
906
- **Why gr.Markdown Output?**
907
- - Research reports are markdown
908
- - Renders citations nicely
909
- - Code blocks for methodology
910
- - Tables for drug comparisons
911
-
912
- ---
913
-
914
- ## Summary: Design Decision Table
915
-
916
- | # | Question | Decision | Why |
917
- |---|----------|----------|-----|
918
- | 1 | **Architecture** | Orchestrator with search-judge loop | Clear, testable, proven |
919
- | 2 | **Tools** | Static registry, dynamic selection | Balance flexibility vs simplicity |
920
- | 3 | **Judge** | Dual (quality + budget) | Quality + cost control |
921
- | 4 | **Stopping** | Four-tier conditions | Defense in depth |
922
- | 5 | **State** | Pydantic + checkpoints | Type-safe, resumable |
923
- | 6 | **Tool Interface** | Async Protocol + parallel execution | Fast I/O, modern Python |
924
- | 7 | **Output** | Structured + Markdown | Human & machine readable |
925
- | 8 | **Errors** | Graceful degradation + fallbacks | Robust for demo |
926
- | 9 | **Config** | TOML (Hydra-inspired) | Simple, standard |
927
- | 10 | **Testing** | Three levels | Fast feedback + confidence |
928
- | 11 | **Judge Prompts** | Structured JSON + domain criteria | Parseable, medical-specific |
929
- | 12 | **MCP** | Tools as MCP servers | Hackathon track, reusability |
930
- | 13 | **UI** | Gradio 5 streaming | Progress visibility, modern UX |
931
-
932
- ---
933
-
934
- ## Answers to Specific Questions
935
-
936
- ### "What's the orchestrator pattern?"
937
- **Answer**: See Section 1 - Iterative Research Orchestrator with search-judge loop
938
-
939
- ### "LLM-as-judge or token budget?"
940
- **Answer**: Both - See Section 3 (Dual-Judge System) and Section 4 (Four-Tier Break Conditions)
941
-
942
- ### "What's the break pattern?"
943
- **Answer**: See Section 4 - Four stopping conditions: quality threshold, token budget, max iterations, and the wall-clock time limit (max_time_seconds in config)
944
-
945
- ### "Should we use agent factories?"
946
- **Answer**: No - See Section 2. Static tool registry is simpler for 6-day timeline
947
-
948
- ### "How do we handle state?"
949
- **Answer**: See Section 5 - Pydantic state machine with checkpoints
950
-
951
- ---
952
-
953
- ## Appendix: Complete Data Models
954
-
955
- ```python
956
- # src/deepresearch/models.py
957
- from pydantic import BaseModel, Field
958
- from typing import List, Optional, Literal
959
- from datetime import datetime
960
-
961
- class Citation(BaseModel):
962
- """Reference to a source"""
963
- source_type: Literal["pubmed", "web", "trial", "fda"]
964
- identifier: str # PMID, URL, NCT number, etc.
965
- title: str
966
- authors: Optional[List[str]] = None
967
- date: Optional[str] = None
968
- url: Optional[str] = None
969
-
970
- class Evidence(BaseModel):
971
- """Single piece of evidence from search"""
972
- content: str
973
- source: Citation
974
- relevance_score: float = Field(ge=0, le=1)
975
- evidence_type: Literal["mechanism", "candidate", "clinical", "safety"]
976
-
977
- class DrugCandidate(BaseModel):
978
- """Potential drug for repurposing"""
979
- name: str
980
- generic_name: Optional[str] = None
981
- mechanism: str
982
- current_indications: List[str]
983
- proposed_mechanism: str
984
- evidence_quality: Literal["strong", "moderate", "weak"]
985
- fda_status: str
986
- citations: List[Citation]
987
-
988
- class JudgeAssessment(BaseModel):
989
- """Output from quality judge"""
990
- mechanism_score: int = Field(ge=0, le=10)
991
- candidates_score: int = Field(ge=0, le=10)
992
- evidence_score: int = Field(ge=0, le=10)
993
- sources_score: int = Field(ge=0, le=10)
994
- overall_confidence: float = Field(ge=0, le=1)
995
- sufficient: bool
996
- gaps: List[str]
997
- recommended_searches: List[str]
998
- recommendation: Literal["continue", "synthesize"]
999
-
1000
- class ResearchState(BaseModel):
1001
- """Complete state of a research session"""
1002
- query_id: str
1003
- question: str
1004
- iteration: int = 0
1005
- evidence: List[Evidence] = []
1006
- assessments: List[JudgeAssessment] = []
1007
- tokens_used: int = 0
1008
- search_history: List[str] = []
1009
- stop_reason: Optional[str] = None
1010
- created_at: datetime = Field(default_factory=datetime.utcnow)
1011
- updated_at: datetime = Field(default_factory=datetime.utcnow)
1012
-
1013
- class ResearchReport(BaseModel):
1014
- """Final output report"""
1015
- query: str
1016
- executive_summary: str
1017
- disease_mechanism: str
1018
- candidates: List[DrugCandidate]
1019
- methodology: str
1020
- limitations: str
1021
- confidence: float
1022
- sources_used: int
1023
- tokens_used: int
1024
- iterations: int
1025
- generated_at: datetime = Field(default_factory=datetime.utcnow)
1026
-
1027
- def to_markdown(self) -> str:
1028
- """Render as markdown for Gradio"""
1029
- md = f"# Research Report: {self.query}\n\n"
1030
- md += f"## Executive Summary\n{self.executive_summary}\n\n"
1031
- md += f"## Disease Mechanism\n{self.disease_mechanism}\n\n"
1032
- md += "## Drug Candidates\n\n"
1033
- for i, drug in enumerate(self.candidates, 1):
1034
- md += f"### {i}. {drug.name} - {drug.evidence_quality.upper()} EVIDENCE\n"
1035
- md += f"- **Mechanism**: {drug.proposed_mechanism}\n"
1036
- md += f"- **FDA Status**: {drug.fda_status}\n"
1037
- md += f"- **Current Uses**: {', '.join(drug.current_indications)}\n"
1038
- md += f"- **Citations**: {len(drug.citations)} sources\n\n"
1039
- md += f"## Methodology\n{self.methodology}\n\n"
1040
- md += f"## Limitations\n{self.limitations}\n\n"
1041
- md += f"## Confidence: {self.confidence:.0%}\n"
1042
- return md
1043
- ```
1044
-
1045
- ---
1046
-
1047
- ## 14. Alternative Frameworks Considered
1048
-
1049
- We researched major agent frameworks before settling on our stack. Here's why we chose what we chose, and what we'd steal if we're shipping like animals and have time for Gucci upgrades.
1050
-
1051
- ### Frameworks Evaluated
1052
-
1053
- | Framework | Repo | What It Does |
1054
- |-----------|------|--------------|
1055
- | **Microsoft AutoGen** | [github.com/microsoft/autogen](https://github.com/microsoft/autogen) | Multi-agent orchestration, complex workflows |
1056
- | **Claude Agent SDK** | [github.com/anthropics/claude-agent-sdk-python](https://github.com/anthropics/claude-agent-sdk-python) | Anthropic's official agent framework |
1057
- | **Pydantic AI** | [github.com/pydantic/pydantic-ai](https://github.com/pydantic/pydantic-ai) | Type-safe agents, structured outputs |
1058
-
1059
- ### Why NOT AutoGen (Microsoft)?
1060
-
1061
- **Pros:**
1062
- - Battle-tested multi-agent orchestration
1063
- - `reflect_on_tool_use` - model reviews its own tool results
1064
- - `max_tool_iterations` - built-in iteration limits
1065
- - Concurrent tool execution
1066
- - Rich ecosystem (AutoGen Studio, benchmarks)
1067
-
1068
- **Cons for MVP:**
1069
- - Heavy dependency tree (50+ packages)
1070
- - Complex configuration (YAML + Python)
1071
- - Overkill for single-agent search-judge loop
1072
- - Learning curve eats into 6-day timeline
1073
-
1074
- **Verdict:** Great for multi-agent systems. Overkill for our MVP.
1075
-
1076
- ### Why NOT Claude Agent SDK (Anthropic)?
1077
-
1078
- **Pros:**
1079
- - Official Anthropic framework
1080
- - Clean `@tool` decorator pattern
1081
- - In-process MCP servers (no subprocess)
1082
- - Hooks for pre/post tool execution
1083
- - Direct Claude Code integration
1084
-
1085
- **Cons for MVP:**
1086
- - Requires Claude Code CLI bundled
1087
- - Node.js dependency for some features
1088
- - Designed for Claude Code ecosystem, not standalone agents
1089
- - Less flexible for custom LLM providers
1090
-
1091
- **Verdict:** Would be great if we were building ON Claude Code. We're building a standalone agent.
1092
-
1093
- ### Why Pydantic AI + FastMCP (Our Choice)
1094
-
1095
- **Pros:**
1096
- - ✅ Simple, Pythonic API
1097
- - ✅ Native async/await
1098
- - ✅ Type-safe with Pydantic
1099
- - ✅ Works with any LLM provider
1100
- - ✅ FastMCP for clean MCP servers
1101
- - ✅ Minimal dependencies
1102
- - ✅ Can ship MVP in 6 days
1103
-
1104
- **Cons:**
1105
- - Newer framework (less battle-tested)
1106
- - Smaller ecosystem
1107
- - May need to build more from scratch
1108
-
1109
- **Verdict:** Right tool for the job. Ship fast, iterate later.
1110
-
1111
- ---
1112
-
1113
- ## 15. Stretch Goals: Gucci Bangers (If We're Shipping Like Animals)
1114
-
1115
- If MVP ships early and we're crushing it, here's what we'd steal from other frameworks:
1116
-
1117
- ### Tier 1: Quick Wins (2-4 hours each)
1118
-
1119
- #### From Claude Agent SDK: `@tool` Decorator Pattern
1120
- Replace our Protocol-based tools with cleaner decorators:
1121
-
1122
- ```python
1123
- # CURRENT (Protocol-based)
1124
- class PubMedSearchTool:
1125
- async def search(self, query: str, max_results: int = 10) -> List[Evidence]:
1126
- ...
1127
-
1128
- # UPGRADE (Decorator-based, stolen from Claude SDK)
1129
- from claude_agent_sdk import tool
1130
-
1131
- @tool("search_pubmed", "Search PubMed for biomedical papers", {
1132
- "query": str,
1133
- "max_results": int
1134
- })
1135
- async def search_pubmed(args):
1136
- results = await _do_pubmed_search(args["query"], args["max_results"])
1137
- return {"content": [{"type": "text", "text": json.dumps(results)}]}
1138
- ```
1139
-
1140
- **Why it's Gucci:** Cleaner syntax, automatic schema generation, less boilerplate.
1141
-
1142
- #### From AutoGen: Reflect on Tool Use
1143
- Add a reflection step where the model reviews its own tool results:
1144
-
1145
- ```python
1146
- # CURRENT: Judge evaluates evidence
1147
- assessment = await judge.assess(question, evidence)
1148
-
1149
- # UPGRADE: Add reflection step (stolen from AutoGen)
1150
- class ReflectiveJudge:
1151
- async def assess_with_reflection(self, question, evidence, tool_results):
1152
- # First pass: raw assessment
1153
- initial = await self._assess(question, evidence)
1154
-
1155
- # Reflection: "Did I use the tools correctly?"
1156
- reflection = await self._reflect_on_tool_use(tool_results)
1157
-
1158
- # Final: combine assessment + reflection
1159
- return self._combine(initial, reflection)
1160
- ```
1161
-
1162
- **Why it's Gucci:** Catches tool misuse, improves accuracy, more robust judge.
1163
-
1164
- ### Tier 2: Medium Lifts (4-8 hours each)
1165
-
1166
- #### From AutoGen: Concurrent Tool Execution
1167
- Run multiple tools in parallel with proper error handling:
1168
-
1169
- ```python
1170
- # CURRENT: already concurrent via asyncio.gather, but with no timeouts or cancellation
1171
- results = await asyncio.gather(*[tool.search(query) for tool in tools])
1172
-
1173
- # UPGRADE: AutoGen-style with cancellation + timeout
1174
- from autogen_core import CancellationToken
1175
-
1176
- async def execute_tools_concurrent(tools, query, timeout=30):
1177
- token = CancellationToken()
1178
-
1179
- async def run_with_timeout(tool):
1180
- try:
1181
- return await asyncio.wait_for(
1182
- tool.search(query, cancellation_token=token),
1183
- timeout=timeout
1184
- )
1185
- except asyncio.TimeoutError:
1186
- token.cancel() # Cancel other tools
1187
- return ToolError(f"{tool.name} timed out")
1188
-
1189
- return await asyncio.gather(*[run_with_timeout(t) for t in tools])
1190
- ```
1191
-
1192
- **Why it's Gucci:** Proper timeout handling, cancellation propagation, production-ready.
1193
-
1194
- #### From Claude SDK: Hooks System
1195
- Add pre/post hooks for logging, validation, cost tracking:
1196
-
1197
- ```python
1198
- # UPGRADE: Hook system (stolen from Claude SDK)
1199
- class HookManager:
1200
- async def pre_tool_use(self, tool_name, args):
1201
- """Called before every tool execution"""
1202
- logger.info(f"Calling {tool_name} with {args}")
1203
- self.cost_tracker.start_timer()
1204
-
1205
- async def post_tool_use(self, tool_name, result, duration):
1206
- """Called after every tool execution"""
1207
- self.cost_tracker.record(tool_name, duration)
1208
- if result.is_error:
1209
- self.error_tracker.record(tool_name, result.error)
1210
- ```
1211
-
1212
- **Why it's Gucci:** Observability, debugging, cost tracking, production-ready.
1213
-
1214
- ### Tier 3: Big Lifts (Post-Hackathon)
1215
-
1216
- #### Full AutoGen Integration
1217
- If we want multi-agent capabilities later:
1218
-
1219
- ```python
1220
- # POST-HACKATHON: Multi-agent drug repurposing
1221
- from autogen_agentchat import AssistantAgent, GroupChat
1222
-
1223
- literature_agent = AssistantAgent(
1224
- name="LiteratureReviewer",
1225
- tools=[pubmed_search, web_search],
1226
- system_message="You search and summarize medical literature."
1227
- )
1228
-
1229
- mechanism_agent = AssistantAgent(
1230
- name="MechanismAnalyzer",
1231
- tools=[pathway_db, protein_db],
1232
- system_message="You analyze disease mechanisms and drug targets."
1233
- )
1234
-
1235
- synthesis_agent = AssistantAgent(
1236
- name="ReportSynthesizer",
1237
- system_message="You synthesize findings into actionable reports."
1238
- )
1239
-
1240
- # Orchestrate multi-agent workflow
1241
- group_chat = GroupChat(
1242
- agents=[literature_agent, mechanism_agent, synthesis_agent],
1243
- max_round=10
1244
- )
1245
- ```
1246
-
1247
- **Why it's Gucci:** True multi-agent collaboration, specialized roles, scalable.
1248
-
1249
- ---
1250
-
1251
- ## Priority Order for Stretch Goals
1252
-
1253
- | Priority | Feature | Source | Effort | Impact |
1254
- |----------|---------|--------|--------|--------|
1255
- | 1 | `@tool` decorator | Claude SDK | 2 hrs | High - cleaner code |
1256
- | 2 | Reflect on tool use | AutoGen | 3 hrs | High - better accuracy |
1257
- | 3 | Hooks system | Claude SDK | 4 hrs | Medium - observability |
1258
- | 4 | Concurrent + cancellation | AutoGen | 4 hrs | Medium - robustness |
1259
- | 5 | Multi-agent | AutoGen | 8+ hrs | Post-hackathon |
1260
-
1261
- ---
1262
-
1263
- ## The Bottom Line
1264
-
1265
- ```
1266
- ┌─────────────────────────────────────────────────────────────┐
- │ MVP (Days 1-4): Pydantic AI + FastMCP                       │
- │   - Ship working drug repurposing agent                     │
- │   - Search-judge loop with PubMed + Web                     │
- │   - Gradio UI with streaming                                │
- │   - MCP server for hackathon track                          │
- ├─────────────────────────────────────────────────────────────┤
- │ If Crushing It (Days 5-6): Steal the Gucci                  │
- │   - @tool decorators from Claude SDK                        │
- │   - Reflect on tool use from AutoGen                        │
- │   - Hooks for observability                                 │
- ├─────────────────────────────────────────────────────────────┤
- │ Post-Hackathon: Full AutoGen Integration                    │
- │   - Multi-agent workflows                                   │
- │   - Specialized agent roles                                 │
- │   - Production-grade orchestration                          │
- └─────────────────────────────────────────────────────────────┘
1283
- ```
1284
-
1285
- **Ship MVP first. Steal bangers if time. Scale later.**
1286
-
1287
- ---
1288
-
1289
- ## 16. Reference Implementation Resources
1290
-
1291
- We've cloned production-ready repos into `reference_repos/` that we can vendor, copy from, or just USE directly. This section documents what's available and how to leverage it.
1292
-
1293
- ### Cloned Repositories
1294
-
1295
- | Repository | Location | What It Provides |
1296
- |------------|----------|------------------|
1297
- | **pydanticai-research-agent** | `reference_repos/pydanticai-research-agent/` | Complete PydanticAI agent with Brave Search |
1298
- | **pubmed-mcp-server** | `reference_repos/pubmed-mcp-server/` | Production-grade PubMed MCP server (TypeScript) |
1299
- | **autogen-microsoft** | `reference_repos/autogen-microsoft/` | Microsoft's multi-agent framework |
1300
- | **claude-agent-sdk** | `reference_repos/claude-agent-sdk/` | Anthropic's agent SDK with @tool decorator |
1301
-
1302
- ### 🔥 CHEAT CODE: Production PubMed MCP Already Exists
1303
-
1304
- The `pubmed-mcp-server` is **production-grade** and has EVERYTHING we need:
1305
-
1306
- ```bash
1307
- # Already available tools in pubmed-mcp-server:
1308
- pubmed_search_articles # Search PubMed with filters, date ranges
1309
- pubmed_fetch_contents # Get full article details by PMID
1310
- pubmed_article_connections # Find citations, related articles
1311
- pubmed_research_agent # Generate research plan outlines
1312
- pubmed_generate_chart # Create PNG charts from data
1313
- ```
1314
-
1315
- **Option 1: Use it directly via npx**
1316
- ```json
1317
- {
1318
- "mcpServers": {
1319
- "pubmed": {
1320
- "command": "npx",
1321
- "args": ["@cyanheads/pubmed-mcp-server"],
1322
- "env": { "NCBI_API_KEY": "your_key" }
1323
- }
1324
- }
1325
- }
1326
- ```
1327
-
1328
- **Option 2: Vendor the logic into Python**
1329
- The TypeScript code in `reference_repos/pubmed-mcp-server/src/` shows exactly how to:
1330
- - Construct PubMed E-utilities queries
1331
- - Handle rate limiting (3/sec without key, 10/sec with key)
1332
- - Parse XML responses
1333
- - Extract article metadata
1334
-
1335
- ### PydanticAI Research Agent Patterns
1336
-
1337
- The `pydanticai-research-agent` repo provides copy-paste patterns:
1338
-
1339
- **Agent Definition** (`agents/research_agent.py`):
1340
- ```python
1341
- from typing import Any, Dict, List, Optional
1342
- from pydantic_ai import Agent, RunContext
- from dataclasses import dataclass
1343
-
1344
- @dataclass
1345
- class ResearchAgentDependencies:
1346
- brave_api_key: str
1347
- session_id: Optional[str] = None
1348
-
1349
- research_agent = Agent(
1350
- get_llm_model(),
1351
- deps_type=ResearchAgentDependencies,
1352
- system_prompt=SYSTEM_PROMPT
1353
- )
1354
-
1355
- @research_agent.tool
1356
- async def search_web(
1357
- ctx: RunContext[ResearchAgentDependencies],
1358
- query: str,
1359
- max_results: int = 10
1360
- ) -> List[Dict[str, Any]]:
1361
- """Search with context access via ctx.deps"""
1362
- results = await search_web_tool(ctx.deps.brave_api_key, query, max_results)
1363
- return results
1364
- ```
1365
-
1366
- **Brave Search Tool** (`tools/brave_search.py`):
1367
- ```python
1368
- import httpx
- from typing import Dict, List
-
- async def search_web_tool(api_key: str, query: str, count: int = 10) -> List[Dict]:
1369
- headers = {"X-Subscription-Token": api_key, "Accept": "application/json"}
1370
- async with httpx.AsyncClient() as client:
1371
- response = await client.get(
1372
- "https://api.search.brave.com/res/v1/web/search",
1373
- headers=headers,
1374
- params={"q": query, "count": count},
1375
- timeout=30.0
1376
- )
1377
-         # Raise on 429 rate limit and 401 auth errors instead of parsing bad bodies
-         response.raise_for_status()
1378
-         data = response.json()
1379
- return data.get("web", {}).get("results", [])
1380
- ```
1381
-
1382
- **Pydantic Models** (`models/research_models.py`):
1383
- ```python
1384
- from pydantic import BaseModel, Field
-
- class BraveSearchResult(BaseModel):
1385
- title: str
1386
- url: str
1387
- description: str
1388
- score: float = Field(ge=0.0, le=1.0)
1389
- ```
1390
-
1391
- ### Microsoft Agent Framework Orchestration Patterns
1392
-
1393
- From [deepwiki.com/microsoft/agent-framework](https://deepwiki.com/microsoft/agent-framework/3.4-workflows-and-orchestration):
1394
-
1395
- #### Sequential Orchestration
1396
- ```
1397
- Agent A → Agent B → Agent C (each receives prior outputs)
1398
- ```
1399
- **Use when:** Tasks have dependencies, results inform next steps.
1400
-
1401
- #### Concurrent (Fan-out/Fan-in)
1402
- ```
1403
- ┌→ Agent A ─┐
1404
- Dispatcher ├→ Agent B ─┼→ Aggregator
1405
- └→ Agent C ─┘
1406
- ```
1407
- **Use when:** Independent tasks can run in parallel, results need consolidation.
1408
- **Our use:** Parallel PubMed + Web search.
1409
-
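The fan-out/fan-in shape maps directly onto `asyncio.gather`; a minimal sketch with stand-in searchers (the function names here are illustrative, not from the reference repos):

```python
import asyncio

async def search_pubmed(query: str) -> list[str]:
    # Stand-in for a real PubMed E-utilities call
    return [f"pubmed:{query}"]

async def search_web(query: str) -> list[str]:
    # Stand-in for a real web search call
    return [f"web:{query}"]

async def fan_out_fan_in(query: str) -> list[str]:
    # Dispatcher: fan out to independent searchers concurrently
    branches = await asyncio.gather(search_pubmed(query), search_web(query))
    # Aggregator: fan in and consolidate results
    return [item for branch in branches for item in branch]
```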
1410
- #### Handoff Orchestration
1411
- ```
1412
- Coordinator → routes to → Specialist A, B, or C based on request
1413
- ```
1414
- **Use when:** A coordinator must pick a search strategy based on the query type.
1415
- **Our use:** Route "mechanism" vs "clinical trial" vs "drug info" queries.
1416
-
1417
- #### HITL (Human-in-the-Loop)
1418
- ```
1419
- Agent → RequestInfoEvent → Human validates → Agent continues
1420
- ```
1421
- **Use when:** Critical judgment points need human validation.
1422
- **Our use:** Optional "approve drug candidates before synthesis" step.
1423
-
1424
- ### Recommended Hybrid Pattern for Our Agent
1425
-
1426
- Based on all the research, here's our recommended implementation:
1427
-
1428
- ```
1429
- ┌─────────────────────────────────────────────────────────┐
1430
- │ 1. ROUTER (Handoff Pattern) │
1431
- │ - Analyze query type │
1432
- │ - Choose search strategy │
1433
- ├─────────────────────────────────────────────────────────┤
1434
- │ 2. SEARCH (Concurrent Pattern) │
1435
- │ - Fan-out to PubMed + Web in parallel │
1436
- │ - Timeout handling per AutoGen patterns │
1437
- │ - Aggregate results │
1438
- ├─────────────────────────────────────────────────────────┤
1439
- │ 3. JUDGE (Sequential + Budget) │
1440
- │ - Quality assessment │
1441
- │ - Token/iteration budget check │
1442
- │ - Recommend: continue or synthesize │
1443
- ├─────────────────────────────────────────────────────────┤
1444
- │ 4. SYNTHESIZE (Final Agent) │
1445
- │ - Generate research report │
1446
- │ - Include citations │
1447
- │ - Stream to Gradio UI │
1448
- └─────────────────────────────────────────────────────────┘
1449
- ```
1450
-
1451
- ### Quick Start: Minimal Implementation Path
1452
-
1453
- **Day 1-2: Core Loop**
1454
- 1. Copy `search_web_tool` from `pydanticai-research-agent/tools/brave_search.py`
1455
- 2. Implement PubMed search (reference `pubmed-mcp-server/src/` for E-utilities patterns)
1456
- 3. Wire up basic search-judge loop
1457
-
1458
- **Day 3: Judge + State**
1459
- 1. Implement quality judge with JSON structured output
1460
- 2. Add budget judge
1461
- 3. Add Pydantic state management
1462
-
1463
- **Day 4: UI + MCP**
1464
- 1. Gradio streaming UI
1465
- 2. Wrap PubMed tool as FastMCP server
1466
-
1467
- **Day 5-6: Polish + Deploy**
1468
- 1. HuggingFace Spaces deployment
1469
- 2. Demo video
1470
- 3. Stretch goals if time
1471
-
1472
- ---
1473
-
1474
- ## 17. External Resources & MCP Servers
1475
-
1476
- ### Available PubMed MCP Servers (Community)
1477
-
1478
- | Server | Author | Features | Link |
1479
- |--------|--------|----------|------|
1480
- | **pubmed-mcp-server** | cyanheads | Full E-utilities, research agent, charts | [GitHub](https://github.com/cyanheads/pubmed-mcp-server) |
1481
- | **BioMCP** | GenomOncology | PubMed + ClinicalTrials + MyVariant | [GitHub](https://github.com/genomoncology/biomcp) |
1482
- | **PubMed-MCP-Server** | JackKuo666 | Basic search, metadata access | [GitHub](https://github.com/JackKuo666/PubMed-MCP-Server) |
1483
-
1484
- ### Web Search Options
1485
-
1486
- | Tool | Free Tier | API Key | Async Support |
1487
- |------|-----------|---------|---------------|
1488
- | **Brave Search** | 2000/month | Required | Yes (httpx) |
1489
- | **DuckDuckGo** | Unlimited | No | Yes (duckduckgo-search) |
1490
- | **SerpAPI** | None | Required | Yes |
1491
-
1492
- **Recommended:** Start with DuckDuckGo (free, no key), upgrade to Brave for production.
1493
-
1494
- ```python
1495
- # DuckDuckGo search (no API key needed!)
1496
- import asyncio
- from duckduckgo_search import DDGS
1497
-
1498
- async def search_ddg(query: str, max_results: int = 10) -> List[Dict]:
1499
-     # DDGS.text is synchronous; run it in a thread so the event loop isn't blocked
-     results = await asyncio.to_thread(
-         lambda: list(DDGS().text(query, max_results=max_results))
-     )
1500
-     return [{"title": r["title"], "url": r["href"], "description": r["body"]} for r in results]
1502
- ```
1503
-
1504
- ---
1505
-
1506
- **Document Status**: Official Architecture Spec
1507
- **Review Score**: 100/100 (Ironclad Gucci Banger Edition)
1508
- **Sections**: 17 design patterns + data models appendix + reference repos + stretch goals
1509
- **Last Updated**: November 2025
docs/architecture/graph-orchestration.md ADDED
@@ -0,0 +1,152 @@
1
+ # Graph Orchestration Architecture
2
+
3
+ ## Overview
4
+
5
+ Phase 4 implements a graph-based orchestration system for research workflows using Pydantic AI agents as nodes. This enables better parallel execution, conditional routing, and state management compared to simple agent chains.
6
+
7
+ ## Graph Structure
8
+
9
+ ### Nodes
10
+
11
+ Graph nodes represent different stages in the research workflow:
12
+
13
+ 1. **Agent Nodes**: Execute Pydantic AI agents
14
+ - Input: Prompt/query
15
+ - Output: Structured or unstructured response
16
+ - Examples: `KnowledgeGapAgent`, `ToolSelectorAgent`, `ThinkingAgent`
17
+
18
+ 2. **State Nodes**: Update or read workflow state
19
+ - Input: Current state
20
+ - Output: Updated state
21
+ - Examples: Update evidence, update conversation history
22
+
23
+ 3. **Decision Nodes**: Make routing decisions based on conditions
24
+ - Input: Current state/results
25
+ - Output: Next node ID
26
+ - Examples: Continue research vs. complete research
27
+
28
+ 4. **Parallel Nodes**: Execute multiple nodes concurrently
29
+ - Input: List of node IDs
30
+ - Output: Aggregated results
31
+ - Examples: Parallel iterative research loops
32
+
33
+ ### Edges
34
+
35
+ Edges define transitions between nodes:
36
+
37
+ 1. **Sequential Edges**: Always traversed (no condition)
38
+ - From: Source node
39
+ - To: Target node
40
+ - Condition: None (always True)
41
+
42
+ 2. **Conditional Edges**: Traversed based on condition
43
+ - From: Source node
44
+ - To: Target node
45
+ - Condition: Callable that returns bool
46
+ - Example: If research complete → go to writer, else → continue loop
47
+
48
+ 3. **Parallel Edges**: Used for parallel execution branches
49
+ - From: Parallel node
50
+ - To: Multiple target nodes
51
+ - Execution: All targets run concurrently
52
+
53
+ ## Graph Patterns
54
+
55
+ ### Iterative Research Graph
56
+
57
+ ```
58
+ [Input] → [Thinking] → [Knowledge Gap] → [Decision: Complete?]
59
+ ↓ No ↓ Yes
60
+ [Tool Selector] [Writer]
61
+
62
+ [Execute Tools] → [Loop Back]
63
+ ```
64
+
65
+ ### Deep Research Graph
66
+
67
+ ```
68
+ [Input] → [Planner] → [Parallel Iterative Loops] → [Synthesizer]
69
+ ↓ ↓ ↓
70
+ [Loop1] [Loop2] [Loop3]
71
+ ```
72
+
73
+ ## State Management
74
+
75
+ State is managed via `WorkflowState` using `ContextVar` for thread-safe isolation:
76
+
77
+ - **Evidence**: Collected evidence from searches
78
+ - **Conversation**: Iteration history (gaps, tool calls, findings, thoughts)
79
+ - **Embedding Service**: For semantic search
80
+
81
+ State transitions occur at state nodes, which update the global workflow state.
82
+
83
+ ## Execution Flow
84
+
85
+ 1. **Graph Construction**: Build graph from nodes and edges
86
+ 2. **Graph Validation**: Ensure graph is valid (no cycles, all nodes reachable)
87
+ 3. **Graph Execution**: Traverse graph from entry node
88
+ 4. **Node Execution**: Execute each node based on type
89
+ 5. **Edge Evaluation**: Determine next node(s) based on edges
90
+ 6. **Parallel Execution**: Use `asyncio.gather()` for parallel nodes
91
+ 7. **State Updates**: Update state at state nodes
92
+ 8. **Event Streaming**: Yield events during execution for UI
93
+
94
+ ## Conditional Routing
95
+
96
+ Decision nodes evaluate conditions and return next node IDs:
97
+
98
+ - **Knowledge Gap Decision**: If `research_complete` → writer, else → tool selector
99
+ - **Budget Decision**: If budget exceeded → exit, else → continue
100
+ - **Iteration Decision**: If max iterations → exit, else → continue
101
+
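Decision nodes reduce to plain functions from state to a next-node ID; a sketch of the routing rules above (node IDs and state keys are illustrative):

```python
def knowledge_gap_decision(state: dict) -> str:
    # Research complete -> hand off to the writer; otherwise keep looping
    return "writer" if state.get("research_complete") else "tool_selector"

def budget_decision(state: dict) -> str:
    # Any exhausted budget routes execution to the exit node
    return "exit" if state.get("budget_exceeded") else "continue"
```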
102
+ ## Parallel Execution
103
+
104
+ Parallel nodes execute multiple nodes concurrently:
105
+
106
+ - Each parallel branch runs independently
107
+ - Results are aggregated after all branches complete
108
+ - State is synchronized after parallel execution
109
+ - Errors in one branch don't stop other branches
110
+
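`asyncio.gather(..., return_exceptions=True)` gives exactly this error-isolation behavior; a minimal sketch:

```python
import asyncio

async def branch(name: str, fail: bool = False) -> str:
    if fail:
        raise RuntimeError(f"{name} failed")
    return f"{name} ok"

async def run_parallel() -> list:
    # return_exceptions=True: a failing branch is returned as an exception
    # object instead of cancelling its siblings
    return await asyncio.gather(
        branch("loop1"),
        branch("loop2", fail=True),
        branch("loop3"),
        return_exceptions=True,
    )
```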
111
+ ## Budget Enforcement
112
+
113
+ Budget constraints are enforced at decision nodes:
114
+
115
+ - **Token Budget**: Track LLM token usage
116
+ - **Time Budget**: Track elapsed time
117
+ - **Iteration Budget**: Track iteration count
118
+
119
+ If any budget is exceeded, execution routes to exit node.
120
+
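A sketch of how the three budgets can be checked together at a decision node (field names are assumptions, not the actual `BudgetStatus` API):

```python
import time
from dataclasses import dataclass, field

@dataclass
class Budget:
    token_limit: int
    time_limit_seconds: float
    iterations_limit: int
    tokens_used: int = 0
    iterations: int = 0
    started_at: float = field(default_factory=time.monotonic)

    def can_continue(self) -> bool:
        # Exceeding any one budget routes execution to the exit node
        return (
            self.tokens_used < self.token_limit
            and (time.monotonic() - self.started_at) < self.time_limit_seconds
            and self.iterations < self.iterations_limit
        )
```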
121
+ ## Error Handling
122
+
123
+ Errors are handled at multiple levels:
124
+
125
+ 1. **Node Level**: Catch errors in individual node execution
126
+ 2. **Graph Level**: Handle errors during graph traversal
127
+ 3. **State Level**: Rollback state changes on error
128
+
129
+ Errors are logged and yield error events for UI.
130
+
131
+ ## Backward Compatibility
132
+
133
+ Graph execution is optional via feature flag:
134
+
135
+ - `USE_GRAPH_EXECUTION=true`: Use graph-based execution
136
+ - `USE_GRAPH_EXECUTION=false`: Use agent chain execution (existing)
137
+
138
+ This allows gradual migration and fallback if needed.
139
+
140
+
141
+
142
+
143
+
144
+
145
+
146
+
147
+
148
+
149
+
150
+
151
+
152
+
docs/architecture/graph_orchestration.md CHANGED
@@ -137,6 +137,14 @@ Graph execution is optional via feature flag:
137
 
138
  This allows gradual migration and fallback if needed.
139
 
140
+ ## See Also
141
+
142
+ - [Orchestrators](orchestrators.md) - Overview of all orchestrator patterns
143
+ - [Workflows](workflows.md) - Workflow diagrams and patterns
144
+ - [Workflow Diagrams](workflow-diagrams.md) - Detailed workflow diagrams
145
+ - [API Reference - Orchestrators](../api/orchestrators.md) - API documentation
146
+
147
+
148
 
149
 
150
 
docs/architecture/middleware.md ADDED
@@ -0,0 +1,132 @@
1
+ # Middleware Architecture
2
+
3
+ DeepCritical uses middleware for state management, budget tracking, and workflow coordination.
4
+
5
+ ## State Management
6
+
7
+ ### WorkflowState
8
+
9
+ **File**: `src/middleware/state_machine.py`
10
+
11
+ **Purpose**: Thread-safe state management for research workflows
12
+
13
+ **Implementation**: Uses `ContextVar` for thread-safe isolation
14
+
15
+ **State Components**:
16
+ - `evidence: list[Evidence]`: Collected evidence from searches
17
+ - `conversation: Conversation`: Iteration history (gaps, tool calls, findings, thoughts)
18
+ - `embedding_service: Any`: Embedding service for semantic search
19
+
20
+ **Methods**:
21
+ - `add_evidence(evidence: Evidence)`: Adds evidence with URL-based deduplication
22
+ - `async search_related(query: str, top_k: int = 5) -> list[Evidence]`: Semantic search
23
+
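URL-based deduplication in `add_evidence` can be sketched as follows (a simplified stand-in for the real class):

```python
from dataclasses import dataclass

@dataclass
class Evidence:
    url: str
    text: str

class WorkflowState:
    def __init__(self) -> None:
        self.evidence: list[Evidence] = []
        self._seen_urls: set[str] = set()

    def add_evidence(self, ev: Evidence) -> bool:
        # Deduplicate by URL: keep the first copy, drop repeats
        if ev.url in self._seen_urls:
            return False
        self._seen_urls.add(ev.url)
        self.evidence.append(ev)
        return True
```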
24
+ **Initialization**:
25
+ ```python
26
+ from src.middleware.state_machine import init_workflow_state
+ from src.services.embeddings import get_embedding_service
27
+
28
+ embedding_service = get_embedding_service()
+ init_workflow_state(embedding_service)
29
+ ```
30
+
31
+ **Access**:
32
+ ```python
33
+ from src.middleware.state_machine import get_workflow_state
34
+
35
+ state = get_workflow_state() # Auto-initializes if missing
36
+ ```
37
+
38
+ ## Workflow Manager
39
+
40
+ **File**: `src/middleware/workflow_manager.py`
41
+
42
+ **Purpose**: Coordinates parallel research loops
43
+
44
+ **Methods**:
45
+ - `add_loop(loop: ResearchLoop)`: Add a research loop to manage
46
+ - `async run_loops_parallel() -> list[ResearchLoop]`: Run all loops in parallel
47
+ - `update_loop_status(loop_id: str, status: str)`: Update loop status
48
+ - `sync_loop_evidence_to_state()`: Synchronize evidence from loops to global state
49
+
50
+ **Features**:
51
+ - Uses `asyncio.gather()` for parallel execution
52
+ - Handles errors per loop (doesn't fail all if one fails)
53
+ - Tracks loop status: `pending`, `running`, `completed`, `failed`, `cancelled`
54
+ - Evidence deduplication across parallel loops
55
+
56
+ **Usage**:
57
+ ```python
58
+ from src.middleware.workflow_manager import WorkflowManager
59
+
60
+ manager = WorkflowManager()
61
+ manager.add_loop(loop1)
62
+ manager.add_loop(loop2)
63
+ completed_loops = await manager.run_loops_parallel()
64
+ ```
65
+
66
+ ## Budget Tracker
67
+
68
+ **File**: `src/middleware/budget_tracker.py`
69
+
70
+ **Purpose**: Tracks and enforces resource limits
71
+
72
+ **Budget Components**:
73
+ - **Tokens**: LLM token usage
74
+ - **Time**: Elapsed time in seconds
75
+ - **Iterations**: Number of iterations
76
+
77
+ **Methods**:
78
+ - `create_budget(token_limit, time_limit_seconds, iterations_limit) -> BudgetStatus`
79
+ - `add_tokens(tokens: int)`: Add token usage
80
+ - `start_timer()`: Start time tracking
81
+ - `update_timer()`: Update elapsed time
82
+ - `increment_iteration()`: Increment iteration count
83
+ - `check_budget() -> BudgetStatus`: Check current budget status
84
+ - `can_continue() -> bool`: Check if research can continue
85
+
86
+ **Token Estimation**:
87
+ - `estimate_tokens(text: str) -> int`: ~4 chars per token
88
+ - `estimate_llm_call_tokens(prompt: str, response: str) -> int`: Estimate LLM call tokens
89
+
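The estimation heuristics above amount to a couple of lines; a sketch:

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token
    return max(1, len(text) // 4)

def estimate_llm_call_tokens(prompt: str, response: str) -> int:
    # Cost of one LLM call = prompt tokens + completion tokens
    return estimate_tokens(prompt) + estimate_tokens(response)
```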
90
+ **Usage**:
91
+ ```python
92
+ from src.middleware.budget_tracker import BudgetTracker
93
+
94
+ tracker = BudgetTracker()
95
+ budget = tracker.create_budget(
96
+ token_limit=100000,
97
+ time_limit_seconds=600,
98
+ iterations_limit=10
99
+ )
100
+ tracker.start_timer()
101
+ # ... research operations ...
102
+ if not tracker.can_continue():
103
+ # Budget exceeded, stop research
104
+ pass
105
+ ```
106
+
107
+ ## Models
108
+
109
+ All middleware models are defined in `src/utils/models.py`:
110
+
111
+ - `IterationData`: Data for a single iteration
112
+ - `Conversation`: Conversation history with iterations
113
+ - `ResearchLoop`: Research loop state and configuration
114
+ - `BudgetStatus`: Current budget status
115
+
116
+ ## Thread Safety
117
+
118
+ All middleware components use `ContextVar` for thread-safe isolation:
119
+
120
+ - Each request/thread has its own workflow state
121
+ - No global mutable state
122
+ - Safe for concurrent requests
123
+
124
+ ## See Also
125
+
126
+ - [Orchestrators](orchestrators.md) - How middleware is used in orchestration
127
+ - [API Reference - Orchestrators](../api/orchestrators.md) - API documentation
128
+ - [Contributing - Code Style](../contributing/code-style.md) - Development guidelines
129
+
130
+
131
+
132
+
docs/architecture/orchestrators.md ADDED
@@ -0,0 +1,198 @@
1
+ # Orchestrators Architecture
2
+
3
+ DeepCritical supports multiple orchestration patterns for research workflows.
4
+
5
+ ## Research Flows
6
+
7
+ ### IterativeResearchFlow
8
+
9
+ **File**: `src/orchestrator/research_flow.py`
10
+
11
+ **Pattern**: Generate observations → Evaluate gaps → Select tools → Execute → Judge → Continue/Complete
12
+
13
+ **Agents Used**:
14
+ - `KnowledgeGapAgent`: Evaluates research completeness
15
+ - `ToolSelectorAgent`: Selects tools for addressing gaps
16
+ - `ThinkingAgent`: Generates observations
17
+ - `WriterAgent`: Creates final report
18
+ - `JudgeHandler`: Assesses evidence sufficiency
19
+
20
+ **Features**:
21
+ - Tracks iterations, time, budget
22
+ - Supports graph execution (`use_graph=True`) and agent chains (`use_graph=False`)
23
+ - Iterates until research complete or constraints met
24
+
25
+ **Usage**:
26
+ ```python
27
+ from src.orchestrator.research_flow import IterativeResearchFlow
28
+
29
+ flow = IterativeResearchFlow(
30
+ search_handler=search_handler,
31
+ judge_handler=judge_handler,
32
+ use_graph=False
33
+ )
34
+
35
+ async for event in flow.run(query):
36
+ # Handle events
37
+ pass
38
+ ```
39
+
40
+ ### DeepResearchFlow
41
+
42
+ **File**: `src/orchestrator/research_flow.py`
43
+
44
+ **Pattern**: Planner → Parallel iterative loops per section → Synthesizer
45
+
46
+ **Agents Used**:
47
+ - `PlannerAgent`: Breaks query into report sections
48
+ - `IterativeResearchFlow`: Per-section research (parallel)
49
+ - `LongWriterAgent` or `ProofreaderAgent`: Final synthesis
50
+
51
+ **Features**:
52
+ - Uses `WorkflowManager` for parallel execution
53
+ - Budget tracking per section and globally
54
+ - State synchronization across parallel loops
55
+ - Supports graph execution and agent chains
56
+
57
+ **Usage**:
58
+ ```python
59
+ from src.orchestrator.research_flow import DeepResearchFlow
60
+
61
+ flow = DeepResearchFlow(
62
+ search_handler=search_handler,
63
+ judge_handler=judge_handler,
64
+ use_graph=True
65
+ )
66
+
67
+ async for event in flow.run(query):
68
+ # Handle events
69
+ pass
70
+ ```
71
+
72
+ ## Graph Orchestrator
73
+
74
+ **File**: `src/orchestrator/graph_orchestrator.py`
75
+
76
+ **Purpose**: Graph-based execution using Pydantic AI agents as nodes
77
+
78
+ **Features**:
79
+ - Uses Pydantic AI Graphs (when available) or agent chains (fallback)
80
+ - Routes based on research mode (iterative/deep/auto)
81
+ - Streams `AgentEvent` objects for UI
82
+
83
+ **Node Types**:
84
+ - **Agent Nodes**: Execute Pydantic AI agents
85
+ - **State Nodes**: Update or read workflow state
86
+ - **Decision Nodes**: Make routing decisions
87
+ - **Parallel Nodes**: Execute multiple nodes concurrently
88
+
89
+ **Edge Types**:
90
+ - **Sequential Edges**: Always traversed
91
+ - **Conditional Edges**: Traversed based on condition
92
+ - **Parallel Edges**: Used for parallel execution branches
93
+
94
+ ## Orchestrator Factory
95
+
96
+ **File**: `src/orchestrator_factory.py`
97
+
98
+ **Purpose**: Factory for creating orchestrators
99
+
100
+ **Modes**:
101
+ - **Simple**: Legacy orchestrator (backward compatible)
102
+ - **Advanced**: Magentic orchestrator (requires OpenAI API key)
103
+ - **Auto-detect**: Chooses based on API key availability
104
+
105
+ **Usage**:
106
+ ```python
107
+ from src.orchestrator_factory import create_orchestrator
108
+
109
+ orchestrator = create_orchestrator(
110
+ search_handler=search_handler,
111
+ judge_handler=judge_handler,
112
+ config={},
113
+ mode="advanced" # or "simple" or None for auto-detect
114
+ )
115
+ ```
116
+
117
+ ## Magentic Orchestrator
118
+
119
+ **File**: `src/orchestrator_magentic.py`
120
+
121
+ **Purpose**: Multi-agent coordination using Microsoft Agent Framework
122
+
123
+ **Features**:
124
+ - Uses `agent-framework-core`
125
+ - ChatAgent pattern with internal LLMs per agent
126
+ - `MagenticBuilder` with participants: searcher, hypothesizer, judge, reporter
127
+ - Manager orchestrates agents via `OpenAIChatClient`
128
+ - Requires OpenAI API key (function calling support)
129
+ - Event-driven: converts Magentic events to `AgentEvent` for UI streaming
130
+
131
+ **Requirements**:
132
+ - `agent-framework-core` package
133
+ - OpenAI API key
134
+
135
+ ## Hierarchical Orchestrator
136
+
137
+ **File**: `src/orchestrator_hierarchical.py`
138
+
139
+ **Purpose**: Hierarchical orchestrator using middleware and sub-teams
140
+
141
+ **Features**:
142
+ - Uses `SubIterationMiddleware` with `ResearchTeam` and `LLMSubIterationJudge`
143
+ - Adapts Magentic ChatAgent to `SubIterationTeam` protocol
144
+ - Event-driven via `asyncio.Queue` for coordination
145
+ - Supports sub-iteration patterns for complex research tasks
146
+
147
+ ## Legacy Simple Mode
148
+
149
+ **File**: `src/legacy_orchestrator.py`
150
+
151
+ **Purpose**: Linear search-judge-synthesize loop
152
+
153
+ **Features**:
154
+ - Uses `SearchHandlerProtocol` and `JudgeHandlerProtocol`
155
+ - Generator-based design yielding `AgentEvent` objects
156
+ - Backward compatibility for simple use cases
157
+
158
+ ## State Initialization
159
+
160
+ All orchestrators must initialize workflow state:
161
+
162
+ ```python
163
+ from src.middleware.state_machine import init_workflow_state
164
+ from src.services.embeddings import get_embedding_service
165
+
166
+ embedding_service = get_embedding_service()
167
+ init_workflow_state(embedding_service)
168
+ ```
169
+
170
+ ## Event Streaming
171
+
172
+ All orchestrators yield `AgentEvent` objects:
173
+
174
+ **Event Types**:
175
+ - `started`: Research started
176
+ - `search_complete`: Search completed
177
+ - `judge_complete`: Evidence evaluation completed
178
+ - `hypothesizing`: Generating hypotheses
179
+ - `synthesizing`: Synthesizing results
180
+ - `complete`: Research completed
181
+ - `error`: Error occurred
182
+
183
+ **Event Structure**:
184
+ ```python
185
+ class AgentEvent:
186
+ type: str
187
+ iteration: int | None
188
+ data: dict[str, Any]
189
+ ```
190
+
191
+ ## See Also
192
+
193
+ - [Graph Orchestration](graph-orchestration.md) - Graph-based execution details
194
+ - [Graph Orchestration (Detailed)](graph_orchestration.md) - Detailed graph architecture
195
+ - [Workflows](workflows.md) - Workflow diagrams and patterns
196
+ - [Workflow Diagrams](workflow-diagrams.md) - Detailed workflow diagrams
197
+ - [API Reference - Orchestrators](../api/orchestrators.md) - API documentation
198
+
docs/architecture/overview.md DELETED
@@ -1,474 +0,0 @@
1
- # DeepCritical: Medical Drug Repurposing Research Agent
2
- ## Project Overview
3
-
4
- ---
5
-
6
- ## Executive Summary
7
-
8
- **DeepCritical** is a deep research agent designed to accelerate medical drug repurposing research by autonomously searching, analyzing, and synthesizing evidence from multiple biomedical databases.
9
-
10
- ### The Problem We Solve
11
-
12
- Drug repurposing - finding new therapeutic uses for existing FDA-approved drugs - can take years of manual literature review. Researchers must:
13
- - Search thousands of papers across multiple databases
14
- - Identify molecular mechanisms
15
- - Find relevant clinical trials
16
- - Assess safety profiles
17
- - Synthesize evidence into actionable insights
18
-
19
- **DeepCritical automates this process from hours to minutes.**
20
-
21
- ### What Is Drug Repurposing?
22
-
23
- **Simple Explanation:**
24
- Using existing approved drugs to treat NEW diseases they weren't originally designed for.
25
-
26
- **Real Examples:**
27
- - **Viagra** (sildenafil): Originally for heart disease → Now treats erectile dysfunction
28
- - **Thalidomide**: Once banned → Now treats multiple myeloma
29
- - **Aspirin**: Pain reliever → Heart attack prevention
30
- - **Metformin**: Diabetes drug → Being tested for aging/longevity
31
-
32
- **Why It Matters:**
33
- - Faster than developing new drugs (years vs decades)
34
- - Cheaper (known safety profiles)
35
- - Lower risk (already FDA approved)
36
- - Immediate patient benefit potential
37
-
38
- ---
39
-
40
- ## Core Use Case
41
-
42
- ### Primary Query Type
43
- > "What existing drugs might help treat [disease/condition]?"
44
-
45
- ### Example Queries
46
-
47
- 1. **Long COVID Fatigue**
48
- - Query: "What existing drugs might help treat long COVID fatigue?"
49
- - Agent searches: PubMed, clinical trials, drug databases
50
- - Output: List of candidate drugs with mechanisms + evidence + citations
51
-
52
- 2. **Alzheimer's Disease**
53
- - Query: "Find existing drugs that target beta-amyloid pathways"
54
- - Agent identifies: Disease mechanisms → Drug candidates → Clinical evidence
55
- - Output: Comprehensive research report with drug candidates
56
-
57
- 3. **Rare Disease Treatment**
58
- - Query: "What drugs might help with fibrodysplasia ossificans progressiva?"
59
- - Agent finds: Similar conditions → Shared pathways → Potential treatments
60
- - Output: Evidence-based treatment suggestions
61
-
62
- ---
63
-
64
- ## System Architecture
65
-
66
- ### High-Level Design (Phases 1-8)
67
-
68
- ```text
69
- User Query
70
-
71
- Gradio UI (Phase 4)
72
-
73
- Magentic Manager (Phase 5) ← LLM-powered coordinator
74
- ├── SearchAgent (Phase 2+5) ←→ PubMed + Web + VectorDB (Phase 6)
75
- ├── HypothesisAgent (Phase 7) ←→ Mechanistic Reasoning
76
- ├── JudgeAgent (Phase 3+5) ←→ Evidence Assessment
77
- └── ReportAgent (Phase 8) ←→ Final Synthesis
78
-
79
- Structured Research Report
80
- ```
81

### Key Components

1. **Magentic Manager (Orchestrator)**
   - LLM-powered multi-agent coordinator
   - Dynamic planning and agent selection
   - Built-in stall detection and replanning
   - Microsoft Agent Framework integration

2. **SearchAgent (Phase 2+5+6)**
   - PubMed E-utilities search
   - DuckDuckGo web search
   - Semantic search via ChromaDB (Phase 6)
   - Evidence deduplication

3. **HypothesisAgent (Phase 7)**
   - Generates Drug → Target → Pathway → Effect hypotheses
   - Guides targeted searches
   - Scientific reasoning about mechanisms

4. **JudgeAgent (Phase 3+5)**
   - LLM-based evidence assessment
   - Mechanism score + clinical score
   - Recommends continue/synthesize
   - Generates refined search queries

5. **ReportAgent (Phase 8)**
   - Structured scientific reports
   - Executive summary, methodology
   - Hypotheses tested with evidence counts
   - Proper citations and limitations

6. **Gradio UI (Phase 4)**
   - Chat interface for questions
   - Real-time progress via events
   - Mode toggle (Simple/Magentic)
   - Formatted markdown output
119
---

## Design Patterns

### 1. Search-and-Judge Loop (Primary Pattern)

```python
def research(question: str) -> Report:
    context = []
    query = question
    for iteration in range(max_iterations):
        # SEARCH: query the relevant tools
        results = search_tools(query, context)
        context.extend(results)

        # JUDGE: evaluate evidence quality
        if judge.is_sufficient(question, context):
            break

        # REFINE: adjust the query for the next iteration
        query = refine_query(question, context)

    # SYNTHESIZE: generate the final report
    return synthesize_report(question, context)
```

**Why This Pattern:**
- Simple to implement and debug
- Clear loop-termination conditions
- Iterative improvement of search quality
- Balances depth vs. speed
149

### 2. Multi-Tool Orchestration

```
Question → Agent decides which tools to use
                ↓
    ┌───┴────┬─────────┬──────────┐
    ↓        ↓         ↓          ↓
 PubMed  Web Search  Trials DB  Drug DB
    ↓        ↓         ↓          ↓
    └───┬────┴─────────┴──────────┘
                ↓
       Aggregate Results → Judge
```

**Why This Pattern:**
- Different sources provide different evidence types
- Parallel tool execution (when possible)
- Comprehensive coverage
168

### 3. LLM-as-Judge with Token Budget

**Dual Stopping Conditions:**
- **Smart Stop**: LLM judge says "we have sufficient evidence"
- **Hard Stop**: Token budget exhausted OR max iterations reached

**Why Both:**
- The judge enables an early exit when the answer is good
- The budget prevents runaway costs
- The iteration cap prevents infinite loops

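The three conditions combine into a single loop guard. A minimal sketch, assuming illustrative names (`search`, `is_sufficient`, the constants) rather than the actual implementation:

```python
from typing import Callable

MAX_ITERATIONS = 10
TOKEN_BUDGET = 50_000  # hard cap, see Risk Assessment

def run_research_loop(
    question: str,
    search: Callable[[str, list[str]], tuple[list[str], int]],
    is_sufficient: Callable[[str, list[str]], bool],
) -> tuple[list[str], str]:
    """Search until the judge approves, the token budget runs out,
    or the iteration cap is hit -- whichever comes first."""
    context: list[str] = []
    tokens_used = 0
    for _ in range(MAX_ITERATIONS):
        results, cost = search(question, context)
        context.extend(results)
        tokens_used += cost
        if is_sufficient(question, context):
            return context, "judge_approved"    # smart stop
        if tokens_used >= TOKEN_BUDGET:
            return context, "budget_exhausted"  # hard stop: cost cap
    return context, "max_iterations"            # hard stop: loop cap
```

Whichever reason is returned, the collected context still flows into synthesis, so a hard stop degrades gracefully instead of failing.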
### 4. Stateful Checkpointing

```
.deepresearch/
├── state/
│   └── query_123.json       # Current research state
├── checkpoints/
│   └── query_123_iter3/     # Checkpoint at iteration 3
└── workspace/
    └── query_123/           # Downloaded papers, data
```

**Why This Pattern:**
- Resume interrupted research
- Debugging and analysis
- Cost savings (don't re-search)

---

## Component Breakdown

### Agent (Orchestrator)
- **Responsibility**: Coordinate the research process
- **Size**: ~100 lines
- **Key Methods**:
  - `research(question)` - Main entry point
  - `plan_search_strategy()` - Decide what to search
  - `execute_search()` - Run tool queries
  - `evaluate_progress()` - Call the judge
  - `synthesize_findings()` - Generate the report

### Tools
- **Responsibility**: Interface with external data sources
- **Size**: ~50 lines per tool
- **Implementations**:
  - `PubMedTool` - Search biomedical literature
  - `WebSearchTool` - General medical information
  - `ClinicalTrialsTool` - Trial data (optional)
  - `DrugInfoTool` - FDA drug database (optional)

### Judge
- **Responsibility**: Evaluate evidence quality
- **Size**: ~50 lines
- **Key Methods**:
  - `is_sufficient(question, evidence)` → bool
  - `assess_quality(evidence)` → score
  - `identify_gaps(question, evidence)` → missing_info

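A baseline implementation of that interface might look like the sketch below; the count-based heuristic is a placeholder for the LLM judge (matching the "simple judge" planned for Day 2), not the actual scoring logic:

```python
class SimpleJudge:
    """Baseline judge with the interface listed above."""

    def __init__(self, min_items: int = 5):
        self.min_items = min_items

    def assess_quality(self, evidence: list[str]) -> float:
        # Placeholder heuristic: more evidence -> higher score, capped at 1.0.
        return min(len(evidence) / self.min_items, 1.0)

    def is_sufficient(self, question: str, evidence: list[str]) -> bool:
        return self.assess_quality(evidence) >= 1.0

    def identify_gaps(self, question: str, evidence: list[str]) -> list[str]:
        missing = self.min_items - len(evidence)
        return [f"need {missing} more source(s) for: {question}"] if missing > 0 else []
```

The LLM judge keeps the same three methods, so the orchestrator doesn't change when the heuristic is swapped out.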
### Gradio App
- **Responsibility**: User interface
- **Size**: ~50 lines
- **Features**:
  - Text input for questions
  - Progress indicators
  - Formatted output with citations
  - Download research report

---

## Technical Stack

### Core Dependencies
```toml
[dependencies]
python = ">=3.10"
pydantic = "^2.7"
pydantic-ai = "^0.0.16"
fastmcp = "^0.1.0"
gradio = "^5.0"
beautifulsoup4 = "^4.12"
httpx = "^0.27"
```

### Optional Enhancements
- `modal` - For a GPU-accelerated local LLM
- `fastmcp` - MCP server integration
- `sentence-transformers` - Semantic search
- `faiss-cpu` - Vector similarity

### Tool APIs & Rate Limits

| API | Cost | Rate Limit | API Key? | Notes |
|-----|------|------------|----------|-------|
| **PubMed E-utilities** | Free | 3/sec (no key), 10/sec (with key) | Optional | Register at NCBI for higher limits |
| **Brave Search API** | Free tier | 2000/month free | Required | Primary web search |
| **DuckDuckGo** | Free | Unofficial, ~1/sec | No | Fallback web search |
| **ClinicalTrials.gov** | Free | 100/min | No | Stretch goal |
| **OpenFDA** | Free | 240/min (no key), 120K/day (with key) | Optional | Drug info |

**Web Search Strategy (Priority Order):**
1. **Brave Search API** (free tier: 2000 queries/month) - Primary
2. **DuckDuckGo** (unofficial, no API key) - Fallback
3. **SerpAPI** ($50/month) - Only if the free options fail

**Why NOT SerpAPI first?**
- Costs money (hackathon budget = $0)
- Free alternatives work fine for demo
- Can upgrade later if needed

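The priority order can be sketched as a simple fallback chain; the provider callables here stand in for the real Brave/DuckDuckGo/SerpAPI clients:

```python
from typing import Callable

def search_with_fallback(
    query: str,
    providers: list[tuple[str, Callable[[str], list[str]]]],
) -> tuple[str, list[str]]:
    """Return results from the first provider that succeeds with hits,
    trying providers in priority order."""
    errors: list[str] = []
    for name, provider in providers:
        try:
            results = provider(query)
            if results:
                return name, results
        except Exception as exc:  # provider down, quota hit, or rate-limited
            errors.append(f"{name}: {exc}")
    raise RuntimeError(f"all providers failed: {errors}")
```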
---

## Success Criteria

### Phase 1-5 (MVP) ✅ COMPLETE
**Completed in ONE DAY:**
- [x] User can ask a drug repurposing question
- [x] Agent searches PubMed (async)
- [x] Agent searches the web (DuckDuckGo)
- [x] LLM judge evaluates evidence quality
- [x] System respects the token budget and iteration cap
- [x] Output includes drug candidates + citations
- [x] Works end-to-end for the demo query
- [x] Gradio UI with streaming progress
- [x] Magentic multi-agent orchestration
- [x] 38 unit tests passing
- [x] CI/CD pipeline green

### Hackathon Submission ✅ COMPLETE
- [x] Gradio UI deployed on HuggingFace Spaces
- [x] Example queries working and tested
- [x] Architecture documentation
- [x] README with setup instructions

### Phase 6-8 (Enhanced)
**Specs ready for implementation:**
- [ ] Embeddings & Semantic Search (Phase 6)
- [ ] Hypothesis Agent (Phase 7)
- [ ] Report Agent (Phase 8)

### What's EXPLICITLY Out of Scope
**NOT building (to stay focused):**
- ❌ User authentication
- ❌ Database storage of queries
- ❌ Multi-user support
- ❌ Payment/billing
- ❌ Production monitoring
- ❌ Mobile UI

---

## Implementation Timeline

### Day 1 (Today): Architecture & Setup
- [x] Define use case (drug repurposing) ✅
- [x] Write architecture docs ✅
- [ ] Create project structure
- [ ] First PR: Structure + Docs

### Day 2: Core Agent Loop
- [ ] Implement basic orchestrator
- [ ] Add PubMed search tool
- [ ] Simple judge (keyword-based)
- [ ] Test with 1 query

### Day 3: Intelligence Layer
- [ ] Upgrade to LLM judge
- [ ] Add web search tool
- [ ] Token budget tracking
- [ ] Test with multiple queries

### Day 4: UI & Integration
- [ ] Build Gradio interface
- [ ] Wire up agent to UI
- [ ] Add progress indicators
- [ ] Format output nicely

### Day 5: Polish & Extend
- [ ] Add more tools (clinical trials)
- [ ] Improve judge prompts
- [ ] Checkpoint system
- [ ] Error handling

### Day 6: Deploy & Document
- [ ] Deploy to HuggingFace Spaces
- [ ] Record demo video
- [ ] Write submission materials
- [ ] Final testing

---
359

## Questions This Document Answers

### For The Maintainer

**Q: "What should our design pattern be?"**
A: A search-and-judge loop with multi-tool orchestration (detailed in the Design Patterns section).

**Q: "Should we use LLM-as-judge or a token budget?"**
A: Both - the judge for smart stopping, the budget for cost control.

**Q: "What's the break pattern?"**
A: Three conditions: judge approval, token limit, or max iterations (whichever comes first).

**Q: "What components do we need?"**
A: Agent orchestrator, tools (PubMed/web), judge, and Gradio UI (see Component Breakdown).

### For The Team

**Q: "What are we actually building?"**
A: A medical drug repurposing research agent (see Core Use Case).

**Q: "How complex should it be?"**
A: Simple but complete - ~300 lines of core code (see component sizes).

**Q: "What's the timeline?"**
A: 6 days: MVP by Day 3, polish on Days 4-6 (see Implementation Timeline).

**Q: "What datasets/APIs do we use?"**
A: PubMed (free), web search, and ClinicalTrials.gov (see Tool APIs & Rate Limits).

---
391

## Next Steps

1. **Review this document** - Team feedback on architecture
2. **Finalize design** - Incorporate feedback
3. **Create project structure** - Scaffold repository
4. **Move to proper docs** - `docs/architecture/` folder
5. **Open first PR** - Structure + Documentation
6. **Start implementation** - Day 2 onward

---

## Notes & Decisions

### Why Drug Repurposing?
- Clear, impressive use case
- Real-world medical impact
- Good data availability (PubMed, trials)
- Easy to explain (Viagra example!)
- Physician on team ✅

### Why Simple Architecture?
- 6-day timeline
- Need a working end-to-end system
- Hackathon judges value "works" over "complex"
- Can extend later if successful

### Why These Tools First?
- PubMed: Best biomedical literature source
- Web search: General medical knowledge
- Clinical trials: Evidence of actual testing
- Others: Nice-to-have, not critical for MVP

---
427

## Appendix A: Demo Queries (Pre-tested)

These queries will be used for demo and testing. They're chosen because:
1. They have good PubMed coverage
2. They're medically interesting
3. They show the system's capabilities

### Primary Demo Query
```
"What existing drugs might help treat long COVID fatigue?"
```
**Expected candidates**: CoQ10, Low-dose Naltrexone, Modafinil
**Expected sources**: 20+ PubMed papers, 2-3 clinical trials

### Secondary Demo Queries
```
"Find existing drugs that might slow Alzheimer's progression"
"What approved medications could help with fibromyalgia pain?"
"Which diabetes drugs show promise for cancer treatment?"
```

### Why These Queries?
- Represent real clinical needs
- Have substantial literature
- Show diverse drug classes
- The physician on the team can validate results

---

## Appendix B: Risk Assessment

| Risk | Likelihood | Impact | Mitigation |
|------|------------|--------|------------|
| PubMed rate limiting | Medium | High | Implement caching, respect 3/sec |
| Web search API fails | Low | Medium | DuckDuckGo fallback |
| LLM costs exceed budget | Medium | Medium | Hard token cap at 50K |
| Judge quality poor | Medium | High | Pre-test prompts, iterate |
| HuggingFace deploy issues | Low | High | Test deployment Day 4 |
| Demo crashes live | Medium | High | Pre-recorded backup video |

---

**Document Status**: Official Architecture Spec
**Review Score**: 98/100
**Last Updated**: November 2025
docs/architecture/services.md ADDED
@@ -0,0 +1,132 @@
# Services Architecture

DeepCritical provides several services for embeddings, RAG, and statistical analysis.

## Embedding Service

**File**: `src/services/embeddings.py`

**Purpose**: Local sentence-transformers models for semantic search and deduplication

**Features**:
- **No API Key Required**: Uses local sentence-transformers models
- **Async-Safe**: All operations use `run_in_executor()` to avoid blocking
- **ChromaDB Storage**: Vector storage for embeddings
- **Deduplication**: 0.85 similarity threshold (85% similarity = duplicate)

**Model**: Configurable via `settings.local_embedding_model` (default: `all-MiniLM-L6-v2`)

**Methods**:
- `async def embed(text: str) -> list[float]`: Generate embeddings
- `async def embed_batch(texts: list[str]) -> list[list[float]]`: Batch embedding
- `async def similarity(text1: str, text2: str) -> float`: Calculate similarity
- `async def find_duplicates(texts: list[str], threshold: float = 0.85) -> list[tuple[int, int]]`: Find duplicates

**Usage**:
```python
from src.services.embeddings import get_embedding_service

service = get_embedding_service()
embedding = await service.embed("text to embed")
```

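The 0.85 deduplication threshold is a pairwise cosine-similarity cutoff. A minimal sketch over raw vectors (illustrative only; the service runs this comparison over sentence-transformer embeddings through the `find_duplicates` method listed above):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def find_duplicate_pairs(
    vectors: list[list[float]], threshold: float = 0.85
) -> list[tuple[int, int]]:
    """Index pairs whose similarity meets the threshold -> treated as duplicates."""
    return [
        (i, j)
        for i in range(len(vectors))
        for j in range(i + 1, len(vectors))
        if cosine_similarity(vectors[i], vectors[j]) >= threshold
    ]
```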
## LlamaIndex RAG Service

**File**: `src/services/rag.py`

**Purpose**: Retrieval-Augmented Generation using LlamaIndex

**Features**:
- **OpenAI Embeddings**: Requires `OPENAI_API_KEY`
- **ChromaDB Storage**: Vector database for document storage
- **Metadata Preservation**: Preserves source, title, URL, date, authors
- **Lazy Initialization**: Graceful fallback if the OpenAI key is not available

**Methods**:
- `async def ingest_evidence(evidence: list[Evidence]) -> None`: Ingest evidence into RAG
- `async def retrieve(query: str, top_k: int = 5) -> list[Document]`: Retrieve relevant documents
- `async def query(query: str, top_k: int = 5) -> str`: Query with RAG

**Usage**:
```python
from src.services.rag import get_rag_service

service = get_rag_service()
if service:
    documents = await service.retrieve("query", top_k=5)
```

## Statistical Analyzer

**File**: `src/services/statistical_analyzer.py`

**Purpose**: Secure execution of AI-generated statistical code

**Features**:
- **Modal Sandbox**: Secure, isolated execution environment
- **Code Generation**: Generates Python code via LLM
- **Library Pinning**: Version-pinned libraries in `SANDBOX_LIBRARIES`
- **Network Isolation**: `block_network=True` by default

**Libraries Available**:
- pandas, numpy, scipy
- matplotlib, scikit-learn
- statsmodels

**Output**: `AnalysisResult` with:
- `verdict`: SUPPORTED, REFUTED, or INCONCLUSIVE
- `code`: Generated analysis code
- `output`: Execution output
- `error`: Error message if execution failed

**Usage**:
```python
from src.services.statistical_analyzer import StatisticalAnalyzer

analyzer = StatisticalAnalyzer()
result = await analyzer.analyze(
    hypothesis="Metformin reduces cancer risk",
    evidence=evidence_list
)
```

## Singleton Pattern

All services use the singleton pattern with `@lru_cache(maxsize=1)`:

```python
@lru_cache(maxsize=1)
def get_embedding_service() -> EmbeddingService:
    return EmbeddingService()
```

This ensures:
- A single instance per process
- Lazy initialization
- No dependencies required at import time

## Service Availability

Services check availability before use:

```python
from src.utils.config import settings

if settings.modal_available:
    # Use Modal sandbox
    pass

if settings.has_openai_key:
    # Use OpenAI embeddings for RAG
    pass
```

## See Also

- [Tools](tools.md) - How services are used by search tools
- [API Reference - Services](../api/services.md) - API documentation
- [Configuration](../configuration/index.md) - Service configuration
docs/architecture/tools.md ADDED
@@ -0,0 +1,165 @@
# Tools Architecture

DeepCritical implements a protocol-based search tool system for retrieving evidence from multiple sources.

## SearchTool Protocol

All tools implement the `SearchTool` protocol from `src/tools/base.py`:

```python
class SearchTool(Protocol):
    @property
    def name(self) -> str: ...

    async def search(
        self,
        query: str,
        max_results: int = 10
    ) -> list[Evidence]: ...
```

## Rate Limiting

All tools use the `@retry` decorator from tenacity:

```python
@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(...)
)
async def search(self, query: str, max_results: int = 10) -> list[Evidence]:
    # Implementation
    ...
```

Tools with API rate limits implement a `_rate_limit()` method and use shared rate limiters from `src/tools/rate_limiter.py`.

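A minimal sketch of such a shared limiter, enforcing a minimum interval between requests (illustrative; the actual implementation in `src/tools/rate_limiter.py` may differ):

```python
import asyncio
import time

class RateLimiter:
    """Enforce a minimum interval between requests (e.g. 0.34s for PubMed)."""

    def __init__(self, min_interval: float):
        self.min_interval = min_interval
        self._last = 0.0
        self._lock = asyncio.Lock()  # serialize concurrent callers

    async def acquire(self) -> None:
        async with self._lock:
            wait = self.min_interval - (time.monotonic() - self._last)
            if wait > 0:
                await asyncio.sleep(wait)
            self._last = time.monotonic()
```

Each tool awaits `acquire()` before its HTTP call; sharing one instance per API keeps the whole process under that API's limit.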
## Error Handling

Tools raise custom exceptions:

- `SearchError`: General search failures
- `RateLimitError`: Rate limit exceeded

Tools handle HTTP errors (429, 500, timeout) and return empty lists on non-critical errors (with warning logs).

## Query Preprocessing

Tools use `preprocess_query()` from `src/tools/query_utils.py` to:

- Remove noise from queries
- Expand synonyms
- Normalize query format

## Evidence Conversion

All tools convert API responses to `Evidence` objects with:

- `Citation`: Title, URL, date, authors
- `content`: Evidence text
- `relevance_score`: 0.0-1.0 relevance score
- `metadata`: Additional metadata

Missing fields are handled gracefully with defaults.

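A sketch of that conversion with defaults; the dataclasses below are stand-ins for the real `Evidence`/`Citation` models, and the field defaults are illustrative:

```python
from dataclasses import dataclass, field

@dataclass
class Citation:
    title: str
    url: str
    date: str = "unknown"
    authors: list[str] = field(default_factory=list)

@dataclass
class Evidence:
    citation: Citation
    content: str
    relevance_score: float = 0.5
    metadata: dict = field(default_factory=dict)

def to_evidence(record: dict) -> Evidence:
    """Convert a raw API response dict, filling missing fields with defaults."""
    return Evidence(
        citation=Citation(
            title=record.get("title", "Untitled"),
            url=record.get("url", ""),
            date=record.get("date", "unknown"),
            authors=record.get("authors", []),
        ),
        content=record.get("abstract", ""),
    )
```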
## Tool Implementations

### PubMed Tool

**File**: `src/tools/pubmed.py`

**API**: NCBI E-utilities (ESearch → EFetch)

**Rate Limiting**:
- 0.34s between requests (3 req/sec without an API key)
- 0.1s between requests (10 req/sec with an NCBI API key)

**Features**:
- XML parsing with `xmltodict`
- Handles single vs. multiple articles
- Query preprocessing
- Evidence conversion with metadata extraction

### ClinicalTrials Tool

**File**: `src/tools/clinicaltrials.py`

**API**: ClinicalTrials.gov API v2

**Important**: Uses the `requests` library (NOT httpx) because the WAF blocks httpx's TLS fingerprint.

**Execution**: Runs in a thread pool: `await asyncio.to_thread(requests.get, ...)`

**Filtering**:
- Only interventional studies
- Status: `COMPLETED`, `ACTIVE_NOT_RECRUITING`, `RECRUITING`, `ENROLLING_BY_INVITATION`

**Features**:
- Parses nested JSON structure
- Extracts trial metadata
- Evidence conversion

### Europe PMC Tool

**File**: `src/tools/europepmc.py`

**API**: Europe PMC REST API

**Features**:
- Handles preprint markers: `[PREPRINT - Not peer-reviewed]`
- Builds URLs from DOI or PMID
- Checks `pubTypeList` for preprint detection
- Includes both preprints and peer-reviewed articles

### RAG Tool

**File**: `src/tools/rag_tool.py`

**Purpose**: Semantic search within collected evidence

**Implementation**: Wraps `LlamaIndexRAGService`

**Features**:
- Returns `Evidence` from RAG results
- Handles evidence ingestion
- Semantic similarity search
- Metadata preservation

### Search Handler

**File**: `src/tools/search_handler.py`

**Purpose**: Orchestrates parallel searches across multiple tools

**Features**:
- Uses `asyncio.gather()` with `return_exceptions=True`
- Aggregates results into a `SearchResult`
- Handles tool failures gracefully
- Deduplicates results by URL

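The gather-with-exceptions pattern can be sketched as follows; the tool callables are placeholders for real `SearchTool` instances, and for brevity this sketch dedupes strings in order rather than `Evidence` objects by URL:

```python
import asyncio

async def search_all(query: str, tools: list) -> list[str]:
    results = await asyncio.gather(
        *(tool(query) for tool in tools),
        return_exceptions=True,  # a failing tool must not sink the whole batch
    )
    evidence: list[str] = []
    for tool, result in zip(tools, results):
        if isinstance(result, Exception):
            print(f"warning: {tool.__name__} failed: {result}")
            continue
        evidence.extend(result)
    # Deduplicate while preserving order.
    return list(dict.fromkeys(evidence))
```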
## Tool Registration

Tools are registered in the search handler:

```python
from src.tools.pubmed import PubMedTool
from src.tools.clinicaltrials import ClinicalTrialsTool
from src.tools.europepmc import EuropePMCTool

search_handler = SearchHandler(
    tools=[
        PubMedTool(),
        ClinicalTrialsTool(),
        EuropePMCTool(),
    ]
)
```

## See Also

- [Services](services.md) - RAG and embedding services
- [API Reference - Tools](../api/tools.md) - API documentation
- [Contributing - Implementation Patterns](../contributing/implementation-patterns.md) - Development guidelines
docs/architecture/workflow-diagrams.md ADDED
@@ -0,0 +1,670 @@
1
+ # DeepCritical Workflow - Simplified Magentic Architecture
2
+
3
+ > **Architecture Pattern**: Microsoft Magentic Orchestration
4
+ > **Design Philosophy**: Simple, dynamic, manager-driven coordination
5
+ > **Key Innovation**: Intelligent manager replaces rigid sequential phases
6
+
7
+ ---
8
+
9
+ ## 1. High-Level Magentic Workflow
10
+
11
+ ```mermaid
12
+ flowchart TD
13
+ Start([User Query]) --> Manager[Magentic Manager<br/>Plan • Select • Assess • Adapt]
14
+
15
+ Manager -->|Plans| Task1[Task Decomposition]
16
+ Task1 --> Manager
17
+
18
+ Manager -->|Selects & Executes| HypAgent[Hypothesis Agent]
19
+ Manager -->|Selects & Executes| SearchAgent[Search Agent]
20
+ Manager -->|Selects & Executes| AnalysisAgent[Analysis Agent]
21
+ Manager -->|Selects & Executes| ReportAgent[Report Agent]
22
+
23
+ HypAgent -->|Results| Manager
24
+ SearchAgent -->|Results| Manager
25
+ AnalysisAgent -->|Results| Manager
26
+ ReportAgent -->|Results| Manager
27
+
28
+ Manager -->|Assesses Quality| Decision{Good Enough?}
29
+ Decision -->|No - Refine| Manager
30
+ Decision -->|No - Different Agent| Manager
31
+ Decision -->|No - Stalled| Replan[Reset Plan]
32
+ Replan --> Manager
33
+
34
+ Decision -->|Yes| Synthesis[Synthesize Final Result]
35
+ Synthesis --> Output([Research Report])
36
+
37
+ style Start fill:#e1f5e1
38
+ style Manager fill:#ffe6e6
39
+ style HypAgent fill:#fff4e6
40
+ style SearchAgent fill:#fff4e6
41
+ style AnalysisAgent fill:#fff4e6
42
+ style ReportAgent fill:#fff4e6
43
+ style Decision fill:#ffd6d6
44
+ style Synthesis fill:#d4edda
45
+ style Output fill:#e1f5e1
46
+ ```
47
+
48
+ ## 2. Magentic Manager: The 6-Phase Cycle
49
+
50
+ ```mermaid
51
+ flowchart LR
52
+ P1[1. Planning<br/>Analyze task<br/>Create strategy] --> P2[2. Agent Selection<br/>Pick best agent<br/>for subtask]
53
+ P2 --> P3[3. Execution<br/>Run selected<br/>agent with tools]
54
+ P3 --> P4[4. Assessment<br/>Evaluate quality<br/>Check progress]
55
+ P4 --> Decision{Quality OK?<br/>Progress made?}
56
+ Decision -->|Yes| P6[6. Synthesis<br/>Combine results<br/>Generate report]
57
+ Decision -->|No| P5[5. Iteration<br/>Adjust plan<br/>Try again]
58
+ P5 --> P2
59
+ P6 --> Done([Complete])
60
+
61
+ style P1 fill:#fff4e6
62
+ style P2 fill:#ffe6e6
63
+ style P3 fill:#e6f3ff
64
+ style P4 fill:#ffd6d6
65
+ style P5 fill:#fff3cd
66
+ style P6 fill:#d4edda
67
+ style Done fill:#e1f5e1
68
+ ```
69
+
70
+ ## 3. Simplified Agent Architecture
71
+
72
+ ```mermaid
73
+ graph TB
74
+ subgraph "Orchestration Layer"
75
+ Manager[Magentic Manager<br/>• Plans workflow<br/>• Selects agents<br/>• Assesses quality<br/>• Adapts strategy]
76
+ SharedContext[(Shared Context<br/>• Hypotheses<br/>• Search Results<br/>• Analysis<br/>• Progress)]
77
+ Manager <--> SharedContext
78
+ end
79
+
80
+ subgraph "Specialist Agents"
81
+ HypAgent[Hypothesis Agent<br/>• Domain understanding<br/>• Hypothesis generation<br/>• Testability refinement]
82
+ SearchAgent[Search Agent<br/>• Multi-source search<br/>• RAG retrieval<br/>• Result ranking]
83
+ AnalysisAgent[Analysis Agent<br/>• Evidence extraction<br/>• Statistical analysis<br/>• Code execution]
84
+ ReportAgent[Report Agent<br/>• Report assembly<br/>• Visualization<br/>• Citation formatting]
85
+ end
86
+
87
+ subgraph "MCP Tools"
88
+ WebSearch[Web Search<br/>PubMed • arXiv • bioRxiv]
89
+ CodeExec[Code Execution<br/>Sandboxed Python]
90
+ RAG[RAG Retrieval<br/>Vector DB • Embeddings]
91
+ Viz[Visualization<br/>Charts • Graphs]
92
+ end
93
+
94
+ Manager -->|Selects & Directs| HypAgent
95
+ Manager -->|Selects & Directs| SearchAgent
96
+ Manager -->|Selects & Directs| AnalysisAgent
97
+ Manager -->|Selects & Directs| ReportAgent
98
+
99
+ HypAgent --> SharedContext
100
+ SearchAgent --> SharedContext
101
+ AnalysisAgent --> SharedContext
102
+ ReportAgent --> SharedContext
103
+
104
+ SearchAgent --> WebSearch
105
+ SearchAgent --> RAG
106
+ AnalysisAgent --> CodeExec
107
+ ReportAgent --> CodeExec
108
+ ReportAgent --> Viz
109
+
110
+ style Manager fill:#ffe6e6
111
+ style SharedContext fill:#ffe6f0
112
+ style HypAgent fill:#fff4e6
113
+ style SearchAgent fill:#fff4e6
114
+ style AnalysisAgent fill:#fff4e6
115
+ style ReportAgent fill:#fff4e6
116
+ style WebSearch fill:#e6f3ff
117
+ style CodeExec fill:#e6f3ff
118
+ style RAG fill:#e6f3ff
119
+ style Viz fill:#e6f3ff
120
+ ```
121
+
122
+ ## 4. Dynamic Workflow Example
123
+
124
+ ```mermaid
125
+ sequenceDiagram
126
+ participant User
127
+ participant Manager
128
+ participant HypAgent
129
+ participant SearchAgent
130
+ participant AnalysisAgent
131
+ participant ReportAgent
132
+
133
+ User->>Manager: "Research protein folding in Alzheimer's"
134
+
135
+ Note over Manager: PLAN: Generate hypotheses → Search → Analyze → Report
136
+
137
+ Manager->>HypAgent: Generate 3 hypotheses
138
+ HypAgent-->>Manager: Returns 3 hypotheses
139
+ Note over Manager: ASSESS: Good quality, proceed
140
+
141
+ Manager->>SearchAgent: Search literature for hypothesis 1
142
+ SearchAgent-->>Manager: Returns 15 papers
143
+ Note over Manager: ASSESS: Good results, continue
144
+
145
+ Manager->>SearchAgent: Search for hypothesis 2
146
+ SearchAgent-->>Manager: Only 2 papers found
147
+ Note over Manager: ASSESS: Insufficient, refine search
148
+
149
+ Manager->>SearchAgent: Refined query for hypothesis 2
150
+ SearchAgent-->>Manager: Returns 12 papers
151
+ Note over Manager: ASSESS: Better, proceed
152
+
153
+ Manager->>AnalysisAgent: Analyze evidence for all hypotheses
154
+ AnalysisAgent-->>Manager: Returns analysis with code
155
+ Note over Manager: ASSESS: Complete, generate report
156
+
157
+ Manager->>ReportAgent: Create comprehensive report
158
+ ReportAgent-->>Manager: Returns formatted report
159
+ Note over Manager: SYNTHESIZE: Combine all results
160
+
161
+ Manager->>User: Final Research Report
162
+ ```
163
+
164
+ ## 5. Manager Decision Logic
165
+
166
+ ```mermaid
167
+ flowchart TD
168
+ Start([Manager Receives Task]) --> Plan[Create Initial Plan]
169
+
170
+ Plan --> Select[Select Agent for Next Subtask]
171
+ Select --> Execute[Execute Agent]
172
+ Execute --> Collect[Collect Results]
173
+
174
+ Collect --> Assess[Assess Quality & Progress]
175
+
176
+ Assess --> Q1{Quality Sufficient?}
177
+ Q1 -->|No| Q2{Same Agent Can Fix?}
178
+ Q2 -->|Yes| Feedback[Provide Specific Feedback]
179
+ Feedback --> Execute
180
+ Q2 -->|No| Different[Try Different Agent]
181
+ Different --> Select
182
+
183
+ Q1 -->|Yes| Q3{Task Complete?}
184
+ Q3 -->|No| Q4{Making Progress?}
185
+ Q4 -->|Yes| Select
186
+ Q4 -->|No - Stalled| Replan[Reset Plan & Approach]
187
+ Replan --> Plan
188
+
189
+ Q3 -->|Yes| Synth[Synthesize Final Result]
190
+ Synth --> Done([Return Report])
191
+
192
+ style Start fill:#e1f5e1
193
+ style Plan fill:#fff4e6
194
+ style Select fill:#ffe6e6
195
+ style Execute fill:#e6f3ff
196
+ style Assess fill:#ffd6d6
197
+ style Q1 fill:#ffe6e6
198
+ style Q2 fill:#ffe6e6
199
+ style Q3 fill:#ffe6e6
200
+ style Q4 fill:#ffe6e6
201
+ style Synth fill:#d4edda
202
+ style Done fill:#e1f5e1
203
+ ```
204
+
205
+ ## 6. Hypothesis Agent Workflow
+
+ ```mermaid
+ flowchart LR
+ Input[Research Query] --> Domain[Identify Domain<br/>& Key Concepts]
+ Domain --> Context[Retrieve Background<br/>Knowledge]
+ Context --> Generate[Generate 3-5<br/>Initial Hypotheses]
+ Generate --> Refine[Refine for<br/>Testability]
+ Refine --> Rank[Rank by<br/>Quality Score]
+ Rank --> Output[Return Top<br/>Hypotheses]
+
+ Output --> Struct[Hypothesis Structure:<br/>• Statement<br/>• Rationale<br/>• Testability Score<br/>• Data Requirements<br/>• Expected Outcomes]
+
+ style Input fill:#e1f5e1
+ style Output fill:#fff4e6
+ style Struct fill:#e6f3ff
+ ```
+
+ ## 7. Search Agent Workflow
+
+ ```mermaid
+ flowchart TD
+ Input[Hypotheses] --> Strategy[Formulate Search<br/>Strategy per Hypothesis]
+
+ Strategy --> Multi[Multi-Source Search]
+
+ Multi --> PubMed[PubMed Search<br/>via MCP]
+ Multi --> ArXiv[arXiv Search<br/>via MCP]
+ Multi --> BioRxiv[bioRxiv Search<br/>via MCP]
+
+ PubMed --> Aggregate[Aggregate Results]
+ ArXiv --> Aggregate
+ BioRxiv --> Aggregate
+
+ Aggregate --> Filter[Filter & Rank<br/>by Relevance]
+ Filter --> Dedup[Deduplicate<br/>Cross-Reference]
+ Dedup --> Embed[Embed Documents<br/>via MCP]
+ Embed --> Vector[(Vector DB)]
+ Vector --> RAGRetrieval[RAG Retrieval<br/>Top-K per Hypothesis]
+ RAGRetrieval --> Output[Return Contextualized<br/>Search Results]
+
+ style Input fill:#fff4e6
+ style Multi fill:#ffe6e6
+ style Vector fill:#ffe6f0
+ style Output fill:#e6f3ff
+ ```
+
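+ The deduplication step merges hits returned by the three sources. A minimal sketch — the field names (`doi`, `title`) are assumptions about the result schema, not the project's actual models:

```python
def deduplicate(papers):
    """Drop cross-source duplicates, keyed on DOI when present, else normalized title."""
    seen, unique = set(), []
    for paper in papers:
        # Prefer the stable identifier; fall back to a case-insensitive title match.
        key = paper.get("doi") or paper["title"].strip().lower()
        if key not in seen:
            seen.add(key)
            unique.append(paper)        # first source seen wins
    return unique
```

+ Keeping the first occurrence preserves the relevance ranking produced upstream.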
+ ## 8. Analysis Agent Workflow
+
+ ```mermaid
+ flowchart TD
+ Input1[Hypotheses] --> Extract
+ Input2[Search Results] --> Extract[Extract Evidence<br/>per Hypothesis]
+
+ Extract --> Methods[Determine Analysis<br/>Methods Needed]
+
+ Methods --> Branch{Requires<br/>Computation?}
+ Branch -->|Yes| GenCode[Generate Python<br/>Analysis Code]
+ Branch -->|No| Qual[Qualitative<br/>Synthesis]
+
+ GenCode --> Execute[Execute Code<br/>via MCP Sandbox]
+ Execute --> Interpret1[Interpret<br/>Results]
+ Qual --> Interpret2[Interpret<br/>Findings]
+
+ Interpret1 --> Synthesize[Synthesize Evidence<br/>Across Sources]
+ Interpret2 --> Synthesize
+
+ Synthesize --> Verdict[Determine Verdict<br/>per Hypothesis]
+ Verdict --> Support[• Supported<br/>• Refuted<br/>• Inconclusive]
+ Support --> Gaps[Identify Knowledge<br/>Gaps & Limitations]
+ Gaps --> Output[Return Analysis<br/>Report]
+
+ style Input1 fill:#fff4e6
+ style Input2 fill:#e6f3ff
+ style Execute fill:#ffe6e6
+ style Output fill:#e6ffe6
+ ```
+
+ ## 9. Report Agent Workflow
+
+ ```mermaid
+ flowchart TD
+ Input1[Query] --> Assemble
+ Input2[Hypotheses] --> Assemble
+ Input3[Search Results] --> Assemble
+ Input4[Analysis] --> Assemble[Assemble Report<br/>Sections]
+
+ Assemble --> Exec[Executive Summary]
+ Assemble --> Intro[Introduction]
+ Assemble --> Methods[Methods]
+ Assemble --> Results[Results per<br/>Hypothesis]
+ Assemble --> Discussion[Discussion]
+ Assemble --> Future[Future Directions]
+ Assemble --> Refs[References]
+
+ Results --> VizCheck{Needs<br/>Visualization?}
+ VizCheck -->|Yes| GenViz[Generate Viz Code]
+ GenViz --> ExecViz[Execute via MCP<br/>Create Charts]
+ ExecViz --> Combine
+ VizCheck -->|No| Combine[Combine All<br/>Sections]
+
+ Exec --> Combine
+ Intro --> Combine
+ Methods --> Combine
+ Discussion --> Combine
+ Future --> Combine
+ Refs --> Combine
+
+ Combine --> Format[Format Output]
+ Format --> MD[Markdown]
+ Format --> PDF[PDF]
+ Format --> JSON[JSON]
+
+ MD --> Output[Return Final<br/>Report]
+ PDF --> Output
+ JSON --> Output
+
+ style Input1 fill:#e1f5e1
+ style Input2 fill:#fff4e6
+ style Input3 fill:#e6f3ff
+ style Input4 fill:#e6ffe6
+ style Output fill:#d4edda
+ ```
+
+ ## 10. Data Flow & Event Streaming
+
+ ```mermaid
+ flowchart TD
+ User[👤 User] -->|Research Query| UI[Gradio UI]
+ UI -->|Submit| Manager[Magentic Manager]
+
+ Manager -->|Event: Planning| UI
+ Manager -->|Select Agent| HypAgent[Hypothesis Agent]
+ HypAgent -->|Event: Delta/Message| UI
+ HypAgent -->|Hypotheses| Context[(Shared Context)]
+
+ Context -->|Retrieved by| Manager
+ Manager -->|Select Agent| SearchAgent[Search Agent]
+ SearchAgent -->|MCP Request| WebSearch[Web Search Tool]
+ WebSearch -->|Results| SearchAgent
+ SearchAgent -->|Event: Delta/Message| UI
+ SearchAgent -->|Documents| Context
+ SearchAgent -->|Embeddings| VectorDB[(Vector DB)]
+
+ Context -->|Retrieved by| Manager
+ Manager -->|Select Agent| AnalysisAgent[Analysis Agent]
+ AnalysisAgent -->|MCP Request| CodeExec[Code Execution Tool]
+ CodeExec -->|Results| AnalysisAgent
+ AnalysisAgent -->|Event: Delta/Message| UI
+ AnalysisAgent -->|Analysis| Context
+
+ Context -->|Retrieved by| Manager
+ Manager -->|Select Agent| ReportAgent[Report Agent]
+ ReportAgent -->|MCP Request| CodeExec
+ ReportAgent -->|Event: Delta/Message| UI
+ ReportAgent -->|Report| Context
+
+ Manager -->|Event: Final Result| UI
+ UI -->|Display| User
+
+ style User fill:#e1f5e1
+ style UI fill:#e6f3ff
+ style Manager fill:#ffe6e6
+ style Context fill:#ffe6f0
+ style VectorDB fill:#ffe6f0
+ style WebSearch fill:#f0f0f0
+ style CodeExec fill:#f0f0f0
+ ```
+
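+ The event stream in this diagram can be consumed with a simple router. The sketch below assumes each event is a dict with `kind` and `text` keys — the real objects are typed events (e.g. `MagenticAgentDeltaEvent`), so treat this only as an illustration of the routing idea:

```python
def route_events(events):
    """Split a workflow event stream into live-log lines and the final result."""
    log, final = [], None
    for event in events:
        if event["kind"] == "final":
            final = event["text"]        # displayed in the Report tab
        else:
            # Planning, delta, and message events stream to the event log.
            log.append(f"[{event['kind']}] {event['text']}")
    return log, final
```

+ In the Gradio app, the log lines would be yielded incrementally rather than collected in a list.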
+ ## 11. MCP Tool Architecture
+
+ ```mermaid
+ graph TB
+ subgraph "Agent Layer"
+ Manager[Magentic Manager]
+ HypAgent[Hypothesis Agent]
+ SearchAgent[Search Agent]
+ AnalysisAgent[Analysis Agent]
+ ReportAgent[Report Agent]
+ end
+
+ subgraph "MCP Protocol Layer"
+ Registry[MCP Tool Registry<br/>• Discovers tools<br/>• Routes requests<br/>• Manages connections]
+ end
+
+ subgraph "MCP Servers"
+ Server1[Web Search Server<br/>localhost:8001<br/>• PubMed<br/>• arXiv<br/>• bioRxiv]
+ Server2[Code Execution Server<br/>localhost:8002<br/>• Sandboxed Python<br/>• Package management]
+ Server3[RAG Server<br/>localhost:8003<br/>• Vector embeddings<br/>• Similarity search]
+ Server4[Visualization Server<br/>localhost:8004<br/>• Chart generation<br/>• Plot rendering]
+ end
+
+ subgraph "External Services"
+ PubMed[PubMed API]
+ ArXiv[arXiv API]
+ BioRxiv[bioRxiv API]
+ Modal[Modal Sandbox]
+ ChromaDB[(ChromaDB)]
+ end
+
+ SearchAgent -->|Request| Registry
+ AnalysisAgent -->|Request| Registry
+ ReportAgent -->|Request| Registry
+
+ Registry --> Server1
+ Registry --> Server2
+ Registry --> Server3
+ Registry --> Server4
+
+ Server1 --> PubMed
+ Server1 --> ArXiv
+ Server1 --> BioRxiv
+ Server2 --> Modal
+ Server3 --> ChromaDB
+
+ style Manager fill:#ffe6e6
+ style Registry fill:#fff4e6
+ style Server1 fill:#e6f3ff
+ style Server2 fill:#e6f3ff
+ style Server3 fill:#e6f3ff
+ style Server4 fill:#e6f3ff
+ ```
+
+ ## 12. Progress Tracking & Stall Detection
+
+ ```mermaid
+ stateDiagram-v2
+ [*] --> Initialization: User Query
+
+ Initialization --> Planning: Manager starts
+
+ Planning --> AgentExecution: Select agent
+
+ AgentExecution --> Assessment: Collect results
+
+ Assessment --> QualityCheck: Evaluate output
+
+ QualityCheck --> AgentExecution: Poor quality<br/>(retry < max_rounds)
+ QualityCheck --> Planning: Poor quality<br/>(try different agent)
+ QualityCheck --> NextAgent: Good quality<br/>(task incomplete)
+ QualityCheck --> Synthesis: Good quality<br/>(task complete)
+
+ NextAgent --> AgentExecution: Select next agent
+
+ state StallDetection <<choice>>
+ Assessment --> StallDetection: Check progress
+ StallDetection --> Planning: No progress<br/>(stall count < max)
+ StallDetection --> ErrorRecovery: No progress<br/>(max stalls reached)
+
+ ErrorRecovery --> PartialReport: Generate partial results
+ PartialReport --> [*]
+
+ Synthesis --> FinalReport: Combine all outputs
+ FinalReport --> [*]
+
+ note right of QualityCheck
+ Manager assesses:
+ • Output completeness
+ • Quality metrics
+ • Progress made
+ end note
+
+ note right of StallDetection
+ Stall = no new progress
+ after agent execution
+ Triggers plan reset
+ end note
+ ```
+
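+ The stall-detection choice in this state diagram reduces to counting the trailing run of no-progress rounds. A minimal sketch (function and argument names are illustrative, not the framework's API):

```python
def stall_action(progress_history, max_stall=3):
    """Map the trailing run of no-progress rounds to the next state.

    progress_history: list of booleans, one per completed round,
    True when the round produced new progress.
    """
    stalls = 0
    for made_progress in reversed(progress_history):
        if made_progress:
            break                       # the stall streak ends here
        stalls += 1
    if stalls == 0:
        return "continue"
    return "error_recovery" if stalls >= max_stall else "replan"
```

+ With the default `max_stall_count` of 3, two stalled rounds trigger a plan reset, while a third sends the workflow into error recovery and a partial report.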
+ ## 13. Gradio UI Integration
+
+ ```mermaid
+ graph TD
+ App[Gradio App<br/>DeepCritical Research Agent]
+
+ App --> Input[Input Section]
+ App --> Status[Status Section]
+ App --> Output[Output Section]
+
+ Input --> Query[Research Question<br/>Text Area]
+ Input --> Controls[Controls]
+ Controls --> MaxHyp[Max Hypotheses: 1-10]
+ Controls --> MaxRounds[Max Rounds: 5-20]
+ Controls --> Submit[Start Research Button]
+
+ Status --> Log[Real-time Event Log<br/>• Manager planning<br/>• Agent selection<br/>• Execution updates<br/>• Quality assessment]
+ Status --> Progress[Progress Tracker<br/>• Current agent<br/>• Round count<br/>• Stall count]
+
+ Output --> Tabs[Tabbed Results]
+ Tabs --> Tab1[Hypotheses Tab<br/>Generated hypotheses with scores]
+ Tabs --> Tab2[Search Results Tab<br/>Papers & sources found]
+ Tabs --> Tab3[Analysis Tab<br/>Evidence & verdicts]
+ Tabs --> Tab4[Report Tab<br/>Final research report]
+ Tab4 --> Download[Download Report<br/>MD / PDF / JSON]
+
+ Submit -.->|Triggers| Workflow[Magentic Workflow]
+ Workflow -.->|MagenticOrchestratorMessageEvent| Log
+ Workflow -.->|MagenticAgentDeltaEvent| Log
+ Workflow -.->|MagenticAgentMessageEvent| Log
+ Workflow -.->|MagenticFinalResultEvent| Tab4
+
+ style App fill:#e1f5e1
+ style Input fill:#fff4e6
+ style Status fill:#e6f3ff
+ style Output fill:#e6ffe6
+ style Workflow fill:#ffe6e6
+ ```
+
+ ## 14. Complete System Context
+
+ ```mermaid
+ graph LR
+ User[👤 Researcher<br/>Asks research questions] -->|Submits query| DC[DeepCritical<br/>Magentic Workflow]
+
+ DC -->|Literature search| PubMed[PubMed API<br/>Medical papers]
+ DC -->|Preprint search| ArXiv[arXiv API<br/>Scientific preprints]
+ DC -->|Biology search| BioRxiv[bioRxiv API<br/>Biology preprints]
+ DC -->|Agent reasoning| Claude[Claude API<br/>Sonnet 4 / Opus]
+ DC -->|Code execution| Modal[Modal Sandbox<br/>Safe Python env]
+ DC -->|Vector storage| Chroma[ChromaDB<br/>Embeddings & RAG]
+
+ DC -->|Deployed on| HF[HuggingFace Spaces<br/>Gradio 6.0]
+
+ PubMed -->|Results| DC
+ ArXiv -->|Results| DC
+ BioRxiv -->|Results| DC
+ Claude -->|Responses| DC
+ Modal -->|Output| DC
+ Chroma -->|Context| DC
+
+ DC -->|Research report| User
+
+ style User fill:#e1f5e1
+ style DC fill:#ffe6e6
+ style PubMed fill:#e6f3ff
+ style ArXiv fill:#e6f3ff
+ style BioRxiv fill:#e6f3ff
+ style Claude fill:#ffd6d6
+ style Modal fill:#f0f0f0
+ style Chroma fill:#ffe6f0
+ style HF fill:#d4edda
+ ```
+
+ ## 15. Workflow Timeline (Simplified)
+
+ ```mermaid
+ gantt
+ title DeepCritical Magentic Workflow - Typical Execution
+ dateFormat mm:ss
+ axisFormat %M:%S
+
+ section Manager Planning
+ Initial planning :p1, 00:00, 10s
+
+ section Hypothesis Agent
+ Generate hypotheses :h1, after p1, 30s
+ Manager assessment :h2, after h1, 5s
+
+ section Search Agent
+ Search hypothesis 1 :s1, after h2, 20s
+ Search hypothesis 2 :s2, after s1, 20s
+ Search hypothesis 3 :s3, after s2, 20s
+ RAG processing :s4, after s3, 15s
+ Manager assessment :s5, after s4, 5s
+
+ section Analysis Agent
+ Evidence extraction :a1, after s5, 15s
+ Code generation :a2, after a1, 20s
+ Code execution :a3, after a2, 25s
+ Synthesis :a4, after a3, 20s
+ Manager assessment :a5, after a4, 5s
+
+ section Report Agent
+ Report assembly :r1, after a5, 30s
+ Visualization :r2, after r1, 15s
+ Formatting :r3, after r2, 10s
+
+ section Manager Synthesis
+ Final synthesis :f1, after r3, 10s
+ ```
+
+ ---
+
+ ## Key Differences from Original Design
+
+ | Aspect | Original (Judge-in-Loop) | New (Magentic) |
+ |--------|-------------------------|----------------|
+ | **Control Flow** | Fixed sequential phases | Dynamic agent selection |
+ | **Quality Control** | Separate Judge Agent | Manager assessment built-in |
+ | **Retry Logic** | Phase-level with feedback | Agent-level with adaptation |
+ | **Flexibility** | Rigid 4-phase pipeline | Adaptive workflow |
+ | **Complexity** | 5 agents (including Judge) | 4 agents (no Judge) |
+ | **Progress Tracking** | Manual state management | Built-in round/stall detection |
+ | **Agent Coordination** | Sequential handoff | Manager-driven dynamic selection |
+ | **Error Recovery** | Retry same phase | Try different agent or replan |
+
+ ---
+
+ ## Simplified Design Principles
+
+ 1. **Manager is Intelligent**: LLM-powered manager handles planning, selection, and quality assessment
+ 2. **No Separate Judge**: Manager's assessment phase replaces dedicated Judge Agent
+ 3. **Dynamic Workflow**: Agents can be called multiple times in any order based on need
+ 4. **Built-in Safety**: max_round_count (15) and max_stall_count (3) prevent infinite loops
+ 5. **Event-Driven UI**: Real-time streaming updates to Gradio interface
+ 6. **MCP-Powered Tools**: All external capabilities via Model Context Protocol
+ 7. **Shared Context**: Centralized state accessible to all agents
+ 8. **Progress Awareness**: Manager tracks what's been done and what's needed
+
+ ---
+
+ ## Legend
+
+ - 🔴 **Red/Pink**: Manager, orchestration, decision-making
+ - 🟡 **Yellow/Orange**: Specialist agents, processing
+ - 🔵 **Blue**: Data, tools, MCP services
+ - 🟣 **Purple/Pink**: Storage, databases, state
+ - 🟢 **Green**: User interactions, final outputs
+ - ⚪ **Gray**: External services, APIs
+
+ ---
+
+ ## Implementation Highlights
+
+ **Simple 4-Agent Setup:**
+ ```python
+ workflow = (
+     MagenticBuilder()
+     .participants(
+         hypothesis=HypothesisAgent(tools=[background_tool]),
+         search=SearchAgent(tools=[web_search, rag_tool]),
+         analysis=AnalysisAgent(tools=[code_execution]),
+         report=ReportAgent(tools=[code_execution, visualization])
+     )
+     .with_standard_manager(
+         chat_client=AnthropicClient(model="claude-sonnet-4"),
+         max_round_count=15,  # Prevent infinite loops
+         max_stall_count=3    # Detect stuck workflows
+     )
+     .build()
+ )
+ ```
+
+ **Manager handles quality assessment in its instructions:**
+ - Checks hypothesis quality (testable, novel, clear)
+ - Validates search results (relevant, authoritative, recent)
+ - Assesses analysis soundness (methodology, evidence, conclusions)
+ - Ensures report completeness (all sections, proper citations)
+
+ No separate Judge Agent needed - manager does it all!
+
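+ The assessment checklist above can also be kept as data and rendered into the manager's instructions. A sketch with illustrative names — the real project may embed these criteria directly in the manager prompt instead:

```python
# The manager's quality criteria, one entry per specialist agent.
QUALITY_CHECKS = {
    "hypothesis": ["testable", "novel", "clear"],
    "search": ["relevant", "authoritative", "recent"],
    "analysis": ["sound methodology", "evidence-backed", "justified conclusions"],
    "report": ["all sections present", "proper citations"],
}

def assessment_instruction(agent_name: str) -> str:
    """Render one line of the manager's assessment prompt for a given agent."""
    criteria = ", ".join(QUALITY_CHECKS[agent_name])
    return f"Assess the {agent_name} output for: {criteria}."
```

+ Keeping the criteria in one place makes it easy to tune the manager's standards without touching agent code.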
+ ---
+
+ **Document Version**: 2.0 (Magentic Simplified)
+ **Last Updated**: 2025-11-24
+ **Architecture**: Microsoft Magentic Orchestration Pattern
+ **Agents**: 4 (Hypothesis, Search, Analysis, Report) + 1 Manager
+ **License**: MIT
+
+ ## See Also
+
+ - [Orchestrators](orchestrators.md) - Overview of all orchestrator patterns
+ - [Graph Orchestration](graph-orchestration.md) - Graph-based execution overview
+ - [Graph Orchestration (Detailed)](graph_orchestration.md) - Detailed graph architecture
+ - [Workflows](workflows.md) - Workflow patterns summary
+ - [API Reference - Orchestrators](../api/orchestrators.md) - API documentation
docs/{workflow-diagrams.md → architecture/workflows.md} RENAMED
File without changes
docs/brainstorming/00_ROADMAP_SUMMARY.md DELETED
@@ -1,194 +0,0 @@
- # DeepCritical Data Sources: Roadmap Summary
-
- **Created**: 2024-11-27
- **Purpose**: Future maintainability and hackathon continuation
-
- ---
-
- ## Current State
-
- ### Working Tools
-
- | Tool | Status | Data Quality |
- |------|--------|--------------|
- | PubMed | ✅ Works | Good (abstracts only) |
- | ClinicalTrials.gov | ✅ Works | Good (filtered for interventional) |
- | Europe PMC | ✅ Works | Good (includes preprints) |
-
- ### Removed Tools
-
- | Tool | Status | Reason |
- |------|--------|--------|
- | bioRxiv | ❌ Removed | No search API - only date/DOI lookup |
-
- ---
-
- ## Priority Improvements
-
- ### P0: Critical (Do First)
-
- 1. **Add Rate Limiting to PubMed**
- - NCBI will block us without it
- - Use `limits` library (see reference repo)
- - 3/sec without key, 10/sec with key
-
- ### P1: High Value, Medium Effort
-
- 2. **Add OpenAlex as 4th Source**
- - Citation network (huge for drug repurposing)
- - Concept tagging (semantic discovery)
- - Already implemented in reference repo
- - Free, no API key
-
- 3. **PubMed Full-Text via BioC**
- - Get full paper text for PMC papers
- - Already in reference repo
-
- ### P2: Nice to Have
-
- 4. **ClinicalTrials.gov Results**
- - Get efficacy data from completed trials
- - Requires more complex API calls
-
- 5. **Europe PMC Annotations**
- - Text-mined entities (genes, drugs, diseases)
- - Automatic entity extraction
-
- ---
-
- ## Effort Estimates
-
- | Improvement | Effort | Impact | Priority |
- |-------------|--------|--------|----------|
- | PubMed rate limiting | 1 hour | Stability | P0 |
- | OpenAlex basic search | 2 hours | High | P1 |
- | OpenAlex citations | 2 hours | Very High | P1 |
- | PubMed full-text | 3 hours | Medium | P1 |
- | CT.gov results | 4 hours | Medium | P2 |
- | Europe PMC annotations | 3 hours | Medium | P2 |
-
- ---
-
- ## Architecture Decision
-
- ### Option A: Keep Current + Add OpenAlex
-
- ```
- User Query
-
- ┌───────────────────┼───────────────────┐
- ↓ ↓ ↓
- PubMed ClinicalTrials Europe PMC
- (abstracts) (trials only) (preprints)
- ↓ ↓ ↓
- └───────────────────┼───────────────────┘
-
- OpenAlex ← NEW
- (citations, concepts)
-
- Orchestrator
-
- Report
- ```
-
- **Pros**: Low risk, additive
- **Cons**: More complexity, some overlap
-
- ### Option B: OpenAlex as Primary
-
- ```
- User Query
-
- ┌───────────────────┼───────────────────┐
- ↓ ↓ ↓
- OpenAlex ClinicalTrials Europe PMC
- (primary (trials only) (full-text
- search) fallback)
- ↓ ↓ ↓
- └───────────────────┼───────────────────┘
-
- Orchestrator
-
- Report
- ```
-
- **Pros**: Simpler, citation network built-in
- **Cons**: Lose some PubMed-specific features
-
- ### Recommendation: Option A
-
- Keep current architecture working, add OpenAlex incrementally.
-
- ---
-
- ## Quick Wins (Can Do Today)
-
- 1. **Add `limits` to `pyproject.toml`**
- ```toml
- dependencies = [
- "limits>=3.0",
- ]
- ```
-
- 2. **Copy OpenAlex tool from reference repo**
- - File: `reference_repos/DeepCritical/DeepResearch/src/tools/openalex_tools.py`
- - Adapt to our `SearchTool` base class
-
- 3. **Enable NCBI API Key**
- - Add to `.env`: `NCBI_API_KEY=your_key`
- - 10x rate limit improvement
-
- ---
-
- ## External Resources Worth Exploring
-
- ### Python Libraries
-
- | Library | For | Notes |
- |---------|-----|-------|
- | `limits` | Rate limiting | Used by reference repo |
- | `pyalex` | OpenAlex wrapper | [GitHub](https://github.com/J535D165/pyalex) |
- | `metapub` | PubMed | Full-featured |
- | `sentence-transformers` | Semantic search | For embeddings |
-
- ### APIs Not Yet Used
-
- | API | Provides | Effort |
- |-----|----------|--------|
- | RxNorm | Drug name normalization | Low |
- | DrugBank | Drug targets/mechanisms | Medium (license) |
- | UniProt | Protein data | Medium |
- | ChEMBL | Bioactivity data | Medium |
-
- ### RAG Tools (Future)
-
- | Tool | Purpose |
- |------|---------|
- | [PaperQA](https://github.com/Future-House/paper-qa) | RAG for scientific papers |
- | [txtai](https://github.com/neuml/txtai) | Embeddings + search |
- | [PubMedBERT](https://huggingface.co/NeuML/pubmedbert-base-embeddings) | Biomedical embeddings |
-
- ---
-
- ## Files in This Directory
-
- | File | Contents |
- |------|----------|
- | `00_ROADMAP_SUMMARY.md` | This file |
- | `01_PUBMED_IMPROVEMENTS.md` | PubMed enhancement details |
- | `02_CLINICALTRIALS_IMPROVEMENTS.md` | ClinicalTrials.gov details |
- | `03_EUROPEPMC_IMPROVEMENTS.md` | Europe PMC details |
- | `04_OPENALEX_INTEGRATION.md` | OpenAlex integration plan |
-
- ---
-
- ## For Future Maintainers
-
- If you're picking this up after the hackathon:
-
- 1. **Start with OpenAlex** - biggest bang for buck
- 2. **Add rate limiting** - prevents API blocks
- 3. **Don't bother with bioRxiv** - use Europe PMC instead
- 4. **Reference repo is gold** - `reference_repos/DeepCritical/` has working implementations
-
- Good luck! 🚀
docs/brainstorming/01_PUBMED_IMPROVEMENTS.md DELETED
@@ -1,125 +0,0 @@
- # PubMed Tool: Current State & Future Improvements
-
- **Status**: Currently Implemented
- **Priority**: High (Core Data Source)
-
- ---
-
- ## Current Implementation
-
- ### What We Have (`src/tools/pubmed.py`)
-
- - Basic E-utilities search via `esearch.fcgi` and `efetch.fcgi`
- - Query preprocessing (strips question words, expands synonyms)
- - Returns: title, abstract, authors, journal, PMID
- - Rate limiting: None implemented (relying on NCBI defaults)
-
- ### Current Limitations
-
- 1. **No Full-Text Access**: Only retrieves abstracts, not full paper text
- 2. **No Rate Limiting**: Risk of being blocked by NCBI
- 3. **No BioC Format**: Missing structured full-text extraction
- 4. **No Figure Retrieval**: No supplementary materials access
- 5. **No PMC Integration**: Missing open-access full-text via PMC
-
- ---
-
- ## Reference Implementation (DeepCritical Reference Repo)
-
- The reference repo at `reference_repos/DeepCritical/DeepResearch/src/tools/bioinformatics_tools.py` has a more sophisticated implementation:
-
- ### Features We're Missing
-
- ```python
- # Rate limiting (lines 47-50)
- from limits import parse
- from limits.storage import MemoryStorage
- from limits.strategies import MovingWindowRateLimiter
-
- storage = MemoryStorage()
- limiter = MovingWindowRateLimiter(storage)
- rate_limit = parse("3/second") # NCBI allows 3/sec without API key, 10/sec with
-
- # Full-text via BioC format (lines 108-120)
- def _get_fulltext(pmid: int) -> dict[str, Any] | None:
- pmid_url = f"https://www.ncbi.nlm.nih.gov/research/bionlp/RESTful/pmcoa.cgi/BioC_json/{pmid}/unicode"
- # Returns structured JSON with full text for open-access papers
-
- # Figure retrieval via Europe PMC (lines 123-149)
- def _get_figures(pmcid: str) -> dict[str, str]:
- suppl_url = f"https://www.ebi.ac.uk/europepmc/webservices/rest/{pmcid}/supplementaryFiles"
- # Returns base64-encoded images from supplementary materials
- ```
-
- ---
-
- ## Recommended Improvements
-
- ### Phase 1: Rate Limiting (Critical)
-
- ```python
- # Add to src/tools/pubmed.py
- from limits import parse
- from limits.storage import MemoryStorage
- from limits.strategies import MovingWindowRateLimiter
-
- storage = MemoryStorage()
- limiter = MovingWindowRateLimiter(storage)
-
- # With NCBI_API_KEY: 10/sec, without: 3/sec
- def get_rate_limit():
- if settings.ncbi_api_key:
- return parse("10/second")
- return parse("3/second")
- ```
-
- **Dependencies**: `pip install limits`
-
- ### Phase 2: Full-Text Retrieval
-
- ```python
- async def get_fulltext(pmid: str) -> str | None:
- """Get full text for open-access papers via BioC API."""
- url = f"https://www.ncbi.nlm.nih.gov/research/bionlp/RESTful/pmcoa.cgi/BioC_json/{pmid}/unicode"
- # Only works for PMC papers (open access)
- ```
-
- ### Phase 3: PMC ID Resolution
-
- ```python
- async def get_pmc_id(pmid: str) -> str | None:
- """Convert PMID to PMCID for full-text access."""
- url = f"https://www.ncbi.nlm.nih.gov/pmc/utils/idconv/v1.0/?ids={pmid}&format=json"
- ```
-
- ---
-
- ## Python Libraries to Consider
-
- | Library | Purpose | Notes |
- |---------|---------|-------|
- | [Biopython](https://biopython.org/) | `Bio.Entrez` module | Official, well-maintained |
- | [PyMed](https://pypi.org/project/pymed/) | PubMed wrapper | Simpler API, less control |
- | [metapub](https://pypi.org/project/metapub/) | Full-featured | Tested on 1/3 of PubMed |
- | [limits](https://pypi.org/project/limits/) | Rate limiting | Used by reference repo |
-
- ---
-
- ## API Endpoints Reference
-
- | Endpoint | Purpose | Rate Limit |
- |----------|---------|------------|
- | `esearch.fcgi` | Search for PMIDs | 3/sec (10 with key) |
- | `efetch.fcgi` | Fetch metadata | 3/sec (10 with key) |
- | `esummary.fcgi` | Quick metadata | 3/sec (10 with key) |
- | `pmcoa.cgi/BioC_json` | Full text (PMC only) | Unknown |
- | `idconv/v1.0` | PMID ↔ PMCID | Unknown |
-
- ---
-
- ## Sources
-
- - [PubMed E-utilities Documentation](https://www.ncbi.nlm.nih.gov/books/NBK25501/)
- - [NCBI BioC API](https://www.ncbi.nlm.nih.gov/research/bionlp/APIs/)
- - [Searching PubMed with Python](https://marcobonzanini.com/2015/01/12/searching-pubmed-with-python/)
- - [PyMed on PyPI](https://pypi.org/project/pymed/)
docs/brainstorming/02_CLINICALTRIALS_IMPROVEMENTS.md DELETED
@@ -1,193 +0,0 @@
1
- # ClinicalTrials.gov Tool: Current State & Future Improvements
2
-
3
- **Status**: Currently Implemented
4
- **Priority**: High (Core Data Source for Drug Repurposing)
5
-
6
- ---
7
-
8
- ## Current Implementation
9
-
10
- ### What We Have (`src/tools/clinicaltrials.py`)
11
-
12
- - V2 API search via `clinicaltrials.gov/api/v2/studies`
13
- - Filters: `INTERVENTIONAL` study type, `RECRUITING` status
14
- - Returns: NCT ID, title, conditions, interventions, phase, status
15
- - Query preprocessing via shared `query_utils.py`
16
-
17
- ### Current Strengths
18
-
19
- 1. **Good Filtering**: Already filtering for interventional + recruiting
20
- 2. **V2 API**: Using the modern API (v1 deprecated)
21
- 3. **Phase Info**: Extracting trial phases for drug development context
22
-
23
- ### Current Limitations
24
-
25
- 1. **No Outcome Data**: Missing primary/secondary outcomes
26
- 2. **No Eligibility Criteria**: Missing inclusion/exclusion details
27
- 3. **No Sponsor Info**: Missing who's running the trial
28
- 4. **No Result Data**: For completed trials, no efficacy data
29
- 5. **Limited Drug Mapping**: No integration with drug databases
30
-
31
- ---
32
-
33
- ## API Capabilities We're Not Using
34
-
35
- ### Fields We Could Request
36
-
37
- ```python
38
- # Current fields
39
- fields = ["NCTId", "BriefTitle", "Condition", "InterventionName", "Phase", "OverallStatus"]
40
-
41
- # Additional valuable fields
42
- additional_fields = [
43
- "PrimaryOutcomeMeasure", # What are they measuring?
44
- "SecondaryOutcomeMeasure", # Secondary endpoints
45
- "EligibilityCriteria", # Who can participate?
46
- "LeadSponsorName", # Who's funding?
47
- "ResultsFirstPostDate", # Has results?
48
- "StudyFirstPostDate", # When started?
49
- "CompletionDate", # When finished?
50
- "EnrollmentCount", # Sample size
51
- "InterventionDescription", # Drug details
52
- "ArmGroupLabel", # Treatment arms
53
- "InterventionOtherName", # Drug aliases
54
- ]
55
- ```
56
-
57
- ### Filter Enhancements
58
-
59
- ```python
60
- # Current
61
- aggFilters = "studyType:INTERVENTIONAL,status:RECRUITING"
62
-
63
- # Could add
64
- "status:RECRUITING,ACTIVE_NOT_RECRUITING,COMPLETED" # Include completed for results
65
- "phase:PHASE2,PHASE3" # Only later-stage trials
66
- "resultsFirstPostDateRange:2020-01-01_" # Trials with posted results
67
- ```
68
-
69
- ---
70
-
71
- ## Recommended Improvements
72
-
73
- ### Phase 1: Richer Metadata
74
-
75
- ```python
76
- EXTENDED_FIELDS = [
77
- "NCTId",
78
- "BriefTitle",
79
- "OfficialTitle",
80
- "Condition",
81
- "InterventionName",
82
- "InterventionDescription",
83
- "InterventionOtherName", # Drug synonyms!
84
- "Phase",
85
- "OverallStatus",
86
- "PrimaryOutcomeMeasure",
87
- "EnrollmentCount",
88
- "LeadSponsorName",
89
- "StudyFirstPostDate",
90
- ]
91
- ```
92
-
93
- ### Phase 2: Results Retrieval
94
-
95
- For completed trials, we can get actual efficacy data:
96
-
97
- ```python
98
- async def get_trial_results(nct_id: str) -> dict | None:
99
- """Fetch results for completed trials."""
100
- url = f"https://clinicaltrials.gov/api/v2/studies/{nct_id}"
101
- params = {
102
- "fields": "ResultsSection",
103
- }
104
- # Returns outcome measures and statistics
105
- ```
106
-
107
- ### Phase 3: Drug Name Normalization
108
-
109
- Map intervention names to standard identifiers:
110
-
111
- ```python
112
- # Problem: "Metformin", "Metformin HCl", "Glucophage" are the same drug
113
- # Solution: Use RxNorm or DrugBank for normalization
114
-
115
- async def normalize_drug_name(intervention: str) -> str:
116
- """Normalize drug name via RxNorm API."""
117
- url = f"https://rxnav.nlm.nih.gov/REST/rxcui.json?name={intervention}"
118
- # Returns standardized RxCUI
119
- ```

---

## Integration Opportunities

### With PubMed

Cross-reference trials with publications:
```python
# ClinicalTrials.gov provides PMID links
# Can correlate trial results with published papers
```

### With DrugBank/ChEMBL

Map interventions to:
- Mechanism of action
- Known targets
- Adverse effects
- Drug-drug interactions

---

## Python Libraries to Consider

| Library | Purpose | Notes |
|---------|---------|-------|
| [pytrials](https://pypi.org/project/pytrials/) | CT.gov wrapper | V2 API support unclear |
| [clinicaltrials](https://github.com/ebmdatalab/clinicaltrials-act-tracker) | Data tracking | More for analysis |
| [drugbank-downloader](https://pypi.org/project/drugbank-downloader/) | Drug mapping | Requires license |

---

## API Quirks & Gotchas

1. **Rate Limiting**: Undocumented, be conservative
2. **Pagination**: Max 1000 results per request
3. **Field Names**: Case-sensitive, camelCase
4. **Empty Results**: Some fields may be null even if requested
5. **Status Changes**: Trials change status frequently

---

## Example Enhanced Query

```python
async def search_drug_repurposing_trials(
    drug_name: str,
    condition: str,
    include_completed: bool = True,
) -> list[Evidence]:
    """Search for trials repurposing a drug for a new condition."""

    statuses = ["RECRUITING", "ACTIVE_NOT_RECRUITING"]
    if include_completed:
        statuses.append("COMPLETED")

    params = {
        "query.intr": drug_name,
        "query.cond": condition,
        "filter.overallStatus": ",".join(statuses),
        "filter.studyType": "INTERVENTIONAL",
        "fields": ",".join(EXTENDED_FIELDS),
        "pageSize": 50,
    }
```

---

## Sources

- [ClinicalTrials.gov API Documentation](https://clinicaltrials.gov/data-api/api)
- [CT.gov Field Definitions](https://clinicaltrials.gov/data-api/about-api/study-data-structure)
- [RxNorm API](https://lhncbc.nlm.nih.gov/RxNav/APIs/api-RxNorm.findRxcuiByString.html)
docs/brainstorming/03_EUROPEPMC_IMPROVEMENTS.md DELETED
@@ -1,211 +0,0 @@
# Europe PMC Tool: Current State & Future Improvements

**Status**: Currently Implemented (Replaced bioRxiv)
**Priority**: High (Preprint + Open Access Source)

---

## Why Europe PMC Over bioRxiv?

### bioRxiv API Limitations (Why We Abandoned It)

1. **No Search API**: Only returns papers by date range or DOI
2. **No Query Capability**: Cannot search for "metformin cancer"
3. **Workaround Required**: Would need to download ALL preprints and build local search
4. **Known Issue**: [Gradio Issue #8861](https://github.com/gradio-app/gradio/issues/8861) documents the limitation

### Europe PMC Advantages

1. **Full Search API**: Boolean queries, filters, facets
2. **Aggregates bioRxiv**: Includes bioRxiv, medRxiv content anyway
3. **Includes PubMed**: Also has MEDLINE content
4. **34 Preprint Servers**: Not just bioRxiv
5. **Open Access Focus**: Full-text when available

---

## Current Implementation

### What We Have (`src/tools/europepmc.py`)

- REST API search via `europepmc.org/webservices/rest/search`
- Preprint flagging via `firstPublicationDate` heuristics
- Returns: title, abstract, authors, DOI, source
- Marks preprints for transparency

### Current Limitations

1. **No Full-Text Retrieval**: Only metadata/abstracts
2. **No Citation Network**: Missing references/citations
3. **No Supplementary Files**: Not fetching figures/data
4. **Basic Preprint Detection**: Heuristic, not explicit flag

---

## Europe PMC API Capabilities

### Endpoints We Could Use

| Endpoint | Purpose | Currently Using |
|----------|---------|-----------------|
| `/search` | Query papers | Yes |
| `/fulltext/{ID}` | Full text (XML/JSON) | No |
| `/{PMCID}/supplementaryFiles` | Figures, data | No |
| `/citations/{ID}` | Who cited this | No |
| `/references/{ID}` | What this cites | No |
| `/annotations` | Text-mined entities | No |

### Rich Query Syntax

```python
# Current simple query
query = "metformin cancer"

# Could use advanced syntax
query = "(TITLE:metformin OR ABSTRACT:metformin) AND (cancer OR oncology)"
query += " AND (SRC:PPR)"  # Only preprints
query += " AND (FIRST_PDATE:[2023-01-01 TO 2024-12-31])"  # Date range
query += " AND (OPEN_ACCESS:y)"  # Only open access
```

### Source Filters

```python
# Filter by source
"SRC:MED"  # MEDLINE
"SRC:PMC"  # PubMed Central
"SRC:PPR"  # Preprints (bioRxiv, medRxiv, etc.)
"SRC:AGR"  # Agricola
"SRC:CBA"  # Chinese Biological Abstracts
```

---

## Recommended Improvements

### Phase 1: Rich Metadata

```python
# Add to search results
additional_fields = [
    "citedByCount",        # Impact indicator
    "source",              # Explicit source (MED, PMC, PPR)
    "isOpenAccess",        # Boolean flag
    "fullTextUrlList",     # URLs for full text
    "authorAffiliations",  # Institution info
    "grantsList",          # Funding info
]
```

### Phase 2: Full-Text Retrieval

```python
async def get_fulltext(pmcid: str) -> str | None:
    """Get full text for open access papers."""
    # XML format
    url = f"https://www.ebi.ac.uk/europepmc/webservices/rest/{pmcid}/fullTextXML"
    # Or JSON
    url = f"https://www.ebi.ac.uk/europepmc/webservices/rest/{pmcid}/fullTextJSON"
```
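Callers tend to hold PMCIDs in both `4287690` and `PMC4287690` forms, so a small normalizer in front of the URL builder avoids 404s; a sketch (the prefix-handling rules are an assumption about how callers pass IDs, not part of the Europe PMC API):

```python
import re

FULLTEXT_XML = "https://www.ebi.ac.uk/europepmc/webservices/rest/{pmcid}/fullTextXML"


def fulltext_xml_url(pmcid: str) -> str:
    """Accept '4287690' or 'PMC4287690' and return the full-text XML endpoint."""
    pmcid = pmcid.strip().upper()
    if not pmcid.startswith("PMC"):
        pmcid = f"PMC{pmcid}"
    if not re.fullmatch(r"PMC\d+", pmcid):
        raise ValueError(f"not a PMCID: {pmcid!r}")
    return FULLTEXT_XML.format(pmcid=pmcid)
```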

### Phase 3: Citation Network

```python
async def get_citations(pmcid: str) -> list[str]:
    """Get papers that cite this one."""
    url = f"https://www.ebi.ac.uk/europepmc/webservices/rest/{pmcid}/citations"

async def get_references(pmcid: str) -> list[str]:
    """Get papers this one cites."""
    url = f"https://www.ebi.ac.uk/europepmc/webservices/rest/{pmcid}/references"
```

### Phase 4: Text-Mined Annotations

Europe PMC extracts entities automatically:

```python
async def get_annotations(pmcid: str) -> dict:
    """Get text-mined entities (genes, diseases, drugs)."""
    url = "https://www.ebi.ac.uk/europepmc/annotations_api/annotationsByArticleIds"
    params = {
        "articleIds": f"PMC:{pmcid}",
        "type": "Gene_Proteins,Diseases,Chemicals",
        "format": "JSON",
    }
    # Returns structured entity mentions with positions
```

---

## Supplementary File Retrieval

From reference repo (`bioinformatics_tools.py` lines 123-149):

```python
def get_figures(pmcid: str) -> dict[str, str]:
    """Download figures and supplementary files."""
    url = f"https://www.ebi.ac.uk/europepmc/webservices/rest/{pmcid}/supplementaryFiles?includeInlineImage=true"
    # Returns a ZIP of figures and supplementary files; inline images come back base64-encoded
```

---

## Preprint-Specific Features

### Identify Preprint Servers

```python
PREPRINT_SOURCES = {
    "PPR": "General preprints",
    "bioRxiv": "Biology preprints",
    "medRxiv": "Medical preprints",
    "chemRxiv": "Chemistry preprints",
    "Research Square": "Multi-disciplinary",
    "Preprints.org": "MDPI preprints",
}

# Check if published version exists
async def check_published_version(preprint_doi: str) -> str | None:
    """Check if preprint has been peer-reviewed and published."""
    # Europe PMC links preprints to final versions
```

---

## Rate Limiting

Europe PMC is more generous than NCBI:

```python
# No documented hard limit, but be respectful
# Recommend: 10-20 requests/second max
# Use email in User-Agent for polite pool
headers = {
    "User-Agent": "DeepCritical/1.0 (mailto:[email protected])"
}
```

---

## vs. The Lens & OpenAlex

| Feature | Europe PMC | The Lens | OpenAlex |
|---------|------------|----------|----------|
| Biomedical Focus | Yes | Partial | Partial |
| Preprints | Yes (34 servers) | Yes | Yes |
| Full Text | PMC papers | Links | No |
| Citations | Yes | Yes | Yes |
| Annotations | Yes (text-mined) | No | No |
| Rate Limits | Generous | Moderate | Very generous |
| API Key | Optional | Required | Optional |

---

## Sources

- [Europe PMC REST API](https://europepmc.org/RestfulWebService)
- [Europe PMC Annotations API](https://europepmc.org/AnnotationsApi)
- [Europe PMC Articles API](https://europepmc.org/ArticlesApi)
- [rOpenSci medrxivr](https://docs.ropensci.org/medrxivr/)
- [bioRxiv TDM Resources](https://www.biorxiv.org/tdm)
docs/brainstorming/04_OPENALEX_INTEGRATION.md DELETED
@@ -1,303 +0,0 @@
# OpenAlex Integration: The Missing Piece?

**Status**: NOT Implemented (Candidate for Addition)
**Priority**: HIGH - Could Replace Multiple Tools
**Reference**: Already implemented in `reference_repos/DeepCritical`

---

## What is OpenAlex?

OpenAlex is a **fully open** index of the global research system:

- **209M+ works** (papers, books, datasets)
- **2B+ author records** (disambiguated)
- **124K+ venues** (journals, repositories)
- **109K+ institutions**
- **65K+ concepts** (hierarchical, linked to Wikidata)

**Free. Open. No API key required.**

---

## Why OpenAlex for DeepCritical?

### Current Architecture

```
User Query

┌──────────────────────────────────────┐
│ PubMed ClinicalTrials Europe PMC │ ← 3 separate APIs
└──────────────────────────────────────┘

Orchestrator (deduplicate, judge, synthesize)
```

### With OpenAlex

```
User Query

┌──────────────────────────────────────┐
│ OpenAlex │ ← Single API
│ (includes PubMed + preprints + │
│ citations + concepts + authors) │
└──────────────────────────────────────┘

Orchestrator (enrich with CT.gov for trials)
```

**OpenAlex already aggregates**:
- PubMed/MEDLINE
- Crossref
- ORCID
- Unpaywall (open access links)
- Microsoft Academic Graph (legacy)
- Preprint servers

---

## Reference Implementation

From `reference_repos/DeepCritical/DeepResearch/src/tools/openalex_tools.py`:

```python
class OpenAlexFetchTool(ToolRunner):
    def __init__(self):
        super().__init__(
            ToolSpec(
                name="openalex_fetch",
                description="Fetch OpenAlex work or author",
                inputs={"entity": "TEXT", "identifier": "TEXT"},
                outputs={"result": "JSON"},
            )
        )

    def run(self, params: dict[str, Any]) -> ExecutionResult:
        entity = params["entity"]  # "works", "authors", "venues"
        identifier = params["identifier"]
        base = "https://api.openalex.org"
        url = f"{base}/{entity}/{identifier}"
        resp = requests.get(url, timeout=30)
        return ExecutionResult(success=True, data={"result": resp.json()})
```

---

## OpenAlex API Features

### Search Works (Papers)

```python
# Search for metformin + cancer papers
url = "https://api.openalex.org/works"
params = {
    "search": "metformin cancer drug repurposing",
    "filter": "publication_year:>2020,type:article",
    "sort": "cited_by_count:desc",
    "per_page": 50,
}
```

### Rich Filtering

```python
# Filter examples
"publication_year:2023"
"type:article"                           # vs preprint, book, etc.
"is_oa:true"                             # Open access only
"concepts.id:C71924100"                  # Papers about "Medicine"
"authorships.institutions.id:I27837315"  # From Harvard
"cited_by_count:>100"                    # Highly cited
"has_fulltext:true"                      # Full text available
```
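Hand-writing these comma-joined filter strings is easy to typo; a tiny builder sketch (the double-underscore-to-dot convention is this sketch's own, not an OpenAlex feature):

```python
def build_filter(**criteria: object) -> str:
    """Join criteria into OpenAlex's comma-separated `filter` syntax.

    Double underscores map to dotted paths, e.g. concepts__id -> concepts.id;
    booleans render as the lowercase true/false the API expects.
    """
    parts = []
    for key, value in criteria.items():
        if isinstance(value, bool):
            value = "true" if value else "false"
        parts.append(f"{key.replace('__', '.')}:{value}")
    return ",".join(parts)
```

For example, `build_filter(type="article", is_oa=True, cited_by_count=">100")` yields `"type:article,is_oa:true,cited_by_count:>100"`.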

### What You Get Back

```json
{
  "id": "W2741809807",
  "title": "Metformin: A candidate drug for...",
  "publication_year": 2023,
  "type": "article",
  "cited_by_count": 45,
  "is_oa": true,
  "primary_location": {
    "source": {"display_name": "Nature Medicine"},
    "pdf_url": "https://...",
    "landing_page_url": "https://..."
  },
  "concepts": [
    {"id": "C71924100", "display_name": "Medicine", "score": 0.95},
    {"id": "C54355233", "display_name": "Pharmacology", "score": 0.88}
  ],
  "authorships": [
    {
      "author": {"id": "A123", "display_name": "John Smith"},
      "institutions": [{"display_name": "Harvard Medical School"}]
    }
  ],
  "referenced_works": ["W123", "W456"],
  "related_works": ["W789", "W012"]
}
```

(`referenced_works` lists citations made by the paper; `related_works` lists similar papers.)

---

## Key Advantages Over Current Tools

### 1. Citation Network (We Don't Have This!)

```python
# Get papers that cite a work
url = f"https://api.openalex.org/works?filter=cites:{work_id}"

# Get papers cited by a work
# Already in `referenced_works` field
```

### 2. Concept Tagging (We Don't Have This!)

OpenAlex auto-tags papers with hierarchical concepts:
- "Medicine" → "Pharmacology" → "Drug Repurposing"
- Can search by concept, not just keywords

### 3. Author Disambiguation (We Don't Have This!)

```python
# Find all works by an author
url = f"https://api.openalex.org/works?filter=authorships.author.id:{author_id}"
```

### 4. Institution Tracking

```python
# Find drug repurposing papers from top institutions
url = "https://api.openalex.org/works"
params = {
    "search": "drug repurposing",
    "filter": "authorships.institutions.id:I27837315",  # Harvard
}
```

### 5. Related Works

Each paper comes with `related_works`: semantically similar papers discovered by OpenAlex's ML.

---

## Proposed Implementation

### New Tool: `src/tools/openalex.py`

```python
"""OpenAlex search tool for comprehensive scholarly data."""

import httpx
from src.tools.base import SearchTool
from src.utils.models import Evidence

class OpenAlexTool(SearchTool):
    """Search OpenAlex for scholarly works with rich metadata."""

    name = "openalex"

    async def search(self, query: str, max_results: int = 10) -> list[Evidence]:
        async with httpx.AsyncClient() as client:
            resp = await client.get(
                "https://api.openalex.org/works",
                params={
                    "search": query,
                    "filter": "type:article,is_oa:true",
                    "sort": "cited_by_count:desc",
                    "per_page": max_results,
                    "mailto": "[email protected]",  # Polite pool
                },
            )
            data = resp.json()

        return [
            Evidence(
                source="openalex",
                title=work["title"],
                # NB: the API ships abstract_inverted_index, not a plain
                # abstract; real code must reconstruct it
                abstract=work.get("abstract", ""),
                url=work["primary_location"]["landing_page_url"],
                metadata={
                    "cited_by_count": work["cited_by_count"],
                    "concepts": [c["display_name"] for c in work["concepts"][:5]],
                    "is_open_access": work["is_oa"],
                    "pdf_url": work["primary_location"].get("pdf_url"),
                },
            )
            for work in data["results"]
        ]
```

---

## Rate Limits

OpenAlex is **extremely generous**:

- No hard rate limit documented
- Recommended: <100,000 requests/day
- **Polite pool**: Add `[email protected]` param for faster responses
- No API key required (optional for priority support)

---

## Should We Add OpenAlex?

### Arguments FOR

1. **Already in reference repo** - proven pattern
2. **Richer data** - citations, concepts, authors
3. **Single source** - reduces API complexity
4. **Free & open** - no keys, no limits
5. **Institution adoption** - Leiden, Sorbonne switched to it

### Arguments AGAINST

1. **Adds complexity** - another data source
2. **Overlap** - duplicates some PubMed data
3. **Not biomedical-focused** - covers all disciplines
4. **No full text** - still need PMC/Europe PMC for that

### Recommendation

**Add OpenAlex as a 4th source**, don't replace existing tools.

Use it for:
- Citation network analysis
- Concept-based discovery
- High-impact paper finding
- Author/institution tracking

Keep PubMed, ClinicalTrials, Europe PMC for:
- Authoritative biomedical search
- Clinical trial data
- Full-text access
- Preprint tracking

---

## Implementation Priority

| Task | Effort | Value |
|------|--------|-------|
| Basic search | Low | High |
| Citation network | Medium | Very High |
| Concept filtering | Low | High |
| Related works | Low | High |
| Author tracking | Medium | Medium |

---

## Sources

- [OpenAlex Documentation](https://docs.openalex.org)
- [OpenAlex API Overview](https://docs.openalex.org/api)
- [OpenAlex Wikipedia](https://en.wikipedia.org/wiki/OpenAlex)
- [Leiden University Announcement](https://www.leidenranking.com/information/openalex)
- [OpenAlex: A fully-open index (Paper)](https://arxiv.org/abs/2205.01833)
docs/brainstorming/implementation/15_PHASE_OPENALEX.md DELETED
@@ -1,603 +0,0 @@
# Phase 15: OpenAlex Integration

**Priority**: HIGH - Biggest bang for buck
**Effort**: ~2-3 hours
**Dependencies**: None (existing codebase patterns sufficient)

---

## Prerequisites (COMPLETED)

The following model changes have been implemented to support this integration:

1. **`SourceName` Literal Updated** (`src/utils/models.py:9`)
   ```python
   SourceName = Literal["pubmed", "clinicaltrials", "europepmc", "preprint", "openalex"]
   ```
   - Without this, `source="openalex"` would fail Pydantic validation

2. **`Evidence.metadata` Field Added** (`src/utils/models.py:39-42`)
   ```python
   metadata: dict[str, Any] = Field(
       default_factory=dict,
       description="Additional metadata (e.g., cited_by_count, concepts, is_open_access)",
   )
   ```
   - Required for storing `cited_by_count`, `concepts`, etc.
   - Model is still frozen; metadata must be passed at construction time

3. **`__init__.py` Exports Updated** (`src/tools/__init__.py`)
   - All tools are now exported: `ClinicalTrialsTool`, `EuropePMCTool`, `PubMedTool`
   - OpenAlexTool should be added here after implementation

---

## Overview

Add OpenAlex as a 4th data source for comprehensive scholarly data including:
- Citation networks (who cites whom)
- Concept tagging (hierarchical topic classification)
- Author disambiguation
- 209M+ works indexed

**Why OpenAlex?**
- Free, no API key required
- Already implemented in reference repo
- Provides citation data we don't have
- Aggregates PubMed + preprints + more

---

## TDD Implementation Plan

### Step 1: Write the Tests First

**File**: `tests/unit/tools/test_openalex.py`

```python
"""Tests for OpenAlex search tool."""

import pytest
import respx
from httpx import Response

from src.tools.openalex import OpenAlexTool
from src.utils.models import Evidence


class TestOpenAlexTool:
    """Test suite for OpenAlex search functionality."""

    @pytest.fixture
    def tool(self) -> OpenAlexTool:
        return OpenAlexTool()

    def test_name_property(self, tool: OpenAlexTool) -> None:
        """Tool should identify itself as 'openalex'."""
        assert tool.name == "openalex"

    @respx.mock
    @pytest.mark.asyncio
    async def test_search_returns_evidence(self, tool: OpenAlexTool) -> None:
        """Search should return list of Evidence objects."""
        mock_response = {
            "results": [
                {
                    "id": "W2741809807",
                    "title": "Metformin and cancer: A systematic review",
                    "publication_year": 2023,
                    "cited_by_count": 45,
                    "type": "article",
                    "is_oa": True,
                    "primary_location": {
                        "source": {"display_name": "Nature Medicine"},
                        "landing_page_url": "https://doi.org/10.1038/example",
                        "pdf_url": None,
                    },
                    "abstract_inverted_index": {
                        "Metformin": [0],
                        "shows": [1],
                        "anticancer": [2],
                        "effects": [3],
                    },
                    "concepts": [
                        {"display_name": "Medicine", "score": 0.95},
                        {"display_name": "Oncology", "score": 0.88},
                    ],
                    "authorships": [
                        {
                            "author": {"display_name": "John Smith"},
                            "institutions": [{"display_name": "Harvard"}],
                        }
                    ],
                }
            ]
        }

        respx.get("https://api.openalex.org/works").mock(
            return_value=Response(200, json=mock_response)
        )

        results = await tool.search("metformin cancer", max_results=10)

        assert len(results) == 1
        assert isinstance(results[0], Evidence)
        assert "Metformin and cancer" in results[0].citation.title
        assert results[0].citation.source == "openalex"

    @respx.mock
    @pytest.mark.asyncio
    async def test_search_empty_results(self, tool: OpenAlexTool) -> None:
        """Search with no results should return empty list."""
        respx.get("https://api.openalex.org/works").mock(
            return_value=Response(200, json={"results": []})
        )

        results = await tool.search("xyznonexistentquery123")
        assert results == []

    @respx.mock
    @pytest.mark.asyncio
    async def test_search_handles_missing_abstract(self, tool: OpenAlexTool) -> None:
        """Tool should handle papers without abstracts."""
        mock_response = {
            "results": [
                {
                    "id": "W123",
                    "title": "Paper without abstract",
                    "publication_year": 2023,
                    "cited_by_count": 10,
                    "type": "article",
                    "is_oa": False,
                    "primary_location": {
                        "source": {"display_name": "Journal"},
                        "landing_page_url": "https://example.com",
                    },
                    "abstract_inverted_index": None,
                    "concepts": [],
                    "authorships": [],
                }
            ]
        }

        respx.get("https://api.openalex.org/works").mock(
            return_value=Response(200, json=mock_response)
        )

        results = await tool.search("test query")
        assert len(results) == 1
        assert results[0].content == ""  # No abstract

    @respx.mock
    @pytest.mark.asyncio
    async def test_search_extracts_citation_count(self, tool: OpenAlexTool) -> None:
        """Citation count should be in metadata."""
        mock_response = {
            "results": [
                {
                    "id": "W456",
                    "title": "Highly cited paper",
                    "publication_year": 2020,
                    "cited_by_count": 500,
                    "type": "article",
                    "is_oa": True,
                    "primary_location": {
                        "source": {"display_name": "Science"},
                        "landing_page_url": "https://example.com",
                    },
                    "abstract_inverted_index": {"Test": [0]},
                    "concepts": [],
                    "authorships": [],
                }
            ]
        }

        respx.get("https://api.openalex.org/works").mock(
            return_value=Response(200, json=mock_response)
        )

        results = await tool.search("highly cited")
        assert results[0].metadata["cited_by_count"] == 500

    @respx.mock
    @pytest.mark.asyncio
    async def test_search_extracts_concepts(self, tool: OpenAlexTool) -> None:
        """Concepts should be extracted for semantic discovery."""
        mock_response = {
            "results": [
                {
                    "id": "W789",
                    "title": "Drug repurposing study",
                    "publication_year": 2023,
                    "cited_by_count": 25,
                    "type": "article",
                    "is_oa": True,
                    "primary_location": {
                        "source": {"display_name": "PLOS ONE"},
                        "landing_page_url": "https://example.com",
                    },
                    "abstract_inverted_index": {"Drug": [0], "repurposing": [1]},
                    "concepts": [
                        {"display_name": "Pharmacology", "score": 0.92},
                        {"display_name": "Drug Discovery", "score": 0.85},
                        {"display_name": "Medicine", "score": 0.80},
                    ],
                    "authorships": [],
                }
            ]
        }

        respx.get("https://api.openalex.org/works").mock(
            return_value=Response(200, json=mock_response)
        )

        results = await tool.search("drug repurposing")
        assert "Pharmacology" in results[0].metadata["concepts"]
        assert "Drug Discovery" in results[0].metadata["concepts"]

    @respx.mock
    @pytest.mark.asyncio
    async def test_search_api_error_raises_search_error(
        self, tool: OpenAlexTool
    ) -> None:
        """API errors should raise SearchError."""
        from src.utils.exceptions import SearchError

        respx.get("https://api.openalex.org/works").mock(
            return_value=Response(500, text="Internal Server Error")
        )

        with pytest.raises(SearchError):
            await tool.search("test query")

    def test_reconstruct_abstract(self, tool: OpenAlexTool) -> None:
        """Test abstract reconstruction from inverted index."""
        inverted_index = {
            "Metformin": [0, 5],
            "is": [1],
            "a": [2],
            "diabetes": [3],
            "drug": [4],
            "effective": [6],
        }
        abstract = tool._reconstruct_abstract(inverted_index)
        assert abstract == "Metformin is a diabetes drug Metformin effective"
```

---

### Step 2: Create the Implementation

**File**: `src/tools/openalex.py`

```python
"""OpenAlex search tool for comprehensive scholarly data."""

from typing import Any

import httpx
from tenacity import retry, stop_after_attempt, wait_exponential

from src.utils.exceptions import SearchError
from src.utils.models import Citation, Evidence


class OpenAlexTool:
    """
    Search OpenAlex for scholarly works with rich metadata.

    OpenAlex provides:
    - 209M+ scholarly works
    - Citation counts and networks
    - Concept tagging (hierarchical)
    - Author disambiguation
    - Open access links

    API Docs: https://docs.openalex.org/
    """

    BASE_URL = "https://api.openalex.org/works"

    def __init__(self, email: str | None = None) -> None:
        """
        Initialize OpenAlex tool.

        Args:
            email: Optional email for polite pool (faster responses)
        """
        self.email = email or "[email protected]"

    @property
    def name(self) -> str:
        return "openalex"

    @retry(
        stop=stop_after_attempt(3),
        wait=wait_exponential(multiplier=1, min=1, max=10),
        reraise=True,
    )
    async def search(self, query: str, max_results: int = 10) -> list[Evidence]:
        """
        Search OpenAlex for scholarly works.

        Args:
            query: Search terms
            max_results: Maximum results to return (max 200 per request)

        Returns:
            List of Evidence objects with citation metadata

        Raises:
            SearchError: If API request fails
        """
        params = {
            "search": query,
            "filter": "type:article",  # Only peer-reviewed articles
            "sort": "cited_by_count:desc",  # Most cited first
            "per_page": min(max_results, 200),
            "mailto": self.email,  # Polite pool for faster responses
        }

        async with httpx.AsyncClient(timeout=30.0) as client:
            try:
                response = await client.get(self.BASE_URL, params=params)
                response.raise_for_status()

                data = response.json()
                results = data.get("results", [])

                return [self._to_evidence(work) for work in results[:max_results]]

            except httpx.HTTPStatusError as e:
                raise SearchError(f"OpenAlex API error: {e}") from e
            except httpx.RequestError as e:
                raise SearchError(f"OpenAlex connection failed: {e}") from e

    def _to_evidence(self, work: dict[str, Any]) -> Evidence:
        """Convert OpenAlex work to Evidence object."""
        title = work.get("title", "Untitled")
        pub_year = work.get("publication_year", "Unknown")
        cited_by = work.get("cited_by_count", 0)
        is_oa = work.get("is_oa", False)

        # Reconstruct abstract from inverted index
        abstract_index = work.get("abstract_inverted_index")
        abstract = self._reconstruct_abstract(abstract_index) if abstract_index else ""

        # Extract concepts (top 5)
        concepts = [
            c.get("display_name", "")
            for c in work.get("concepts", [])[:5]
            if c.get("display_name")
        ]

        # Extract authors (top 5)
        authorships = work.get("authorships", [])
        authors = [
            a.get("author", {}).get("display_name", "")
            for a in authorships[:5]
            if a.get("author", {}).get("display_name")
        ]

        # Get URL
        primary_loc = work.get("primary_location") or {}
        url = primary_loc.get("landing_page_url", "")
        if not url:
            # Fallback to OpenAlex page
            work_id = work.get("id", "").replace("https://openalex.org/", "")
            url = f"https://openalex.org/{work_id}"

        return Evidence(
            content=abstract[:2000],
            citation=Citation(
                source="openalex",
                title=title[:500],
                url=url,
                date=str(pub_year),
                authors=authors,
            ),
            relevance=min(0.9, 0.5 + (cited_by / 1000)),  # Boost by citations
            metadata={
                "cited_by_count": cited_by,
                "is_open_access": is_oa,
                "concepts": concepts,
                "pdf_url": primary_loc.get("pdf_url"),
            },
        )

    def _reconstruct_abstract(self, inverted_index: dict[str, list[int]]) -> str:
        """
        Reconstruct abstract from OpenAlex inverted index format.

        OpenAlex stores abstracts as {"word": [position1, position2, ...]}.
        This rebuilds the original text.
        """
        if not inverted_index:
            return ""

        # Build position -> word mapping
        position_word: dict[int, str] = {}
        for word, positions in inverted_index.items():
            for pos in positions:
                position_word[pos] = word

        # Reconstruct in order
        if not position_word:
            return ""

        max_pos = max(position_word.keys())
        words = [position_word.get(i, "") for i in range(max_pos + 1)]
        return " ".join(w for w in words if w)
```

---

### Step 3: Register in Search Handler

**File**: `src/tools/search_handler.py` (add to imports and tool list)

```python
# Add import
from src.tools.openalex import OpenAlexTool

# Add to _create_tools method
def _create_tools(self) -> list[SearchTool]:
    return [
        PubMedTool(),
        ClinicalTrialsTool(),
        EuropePMCTool(),
        OpenAlexTool(),  # NEW
    ]
```

---

### Step 4: Update `__init__.py`

**File**: `src/tools/__init__.py`

```python
from src.tools.openalex import OpenAlexTool

__all__ = [
    "PubMedTool",
    "ClinicalTrialsTool",
    "EuropePMCTool",
    "OpenAlexTool",  # NEW
    # ...
]
```

---

## Demo Script

**File**: `examples/openalex_demo.py`

```python
#!/usr/bin/env python3
"""Demo script to verify OpenAlex integration."""

import asyncio
from src.tools.openalex import OpenAlexTool


async def main():
    """Run OpenAlex search demo."""
    tool = OpenAlexTool()

    print("=" * 60)
    print("OpenAlex Integration Demo")
    print("=" * 60)

    # Test 1: Basic drug repurposing search
    print("\n[Test 1] Searching for 'metformin cancer drug repurposing'...")
    results = await tool.search("metformin cancer drug repurposing", max_results=5)

    for i, evidence in enumerate(results, 1):
        print(f"\n--- Result {i} ---")
        print(f"Title: {evidence.citation.title}")
        print(f"Year: {evidence.citation.date}")
        print(f"Citations: {evidence.metadata.get('cited_by_count', 'N/A')}")
        print(f"Concepts: {', '.join(evidence.metadata.get('concepts', []))}")
        print(f"Open Access: {evidence.metadata.get('is_open_access', False)}")
        print(f"URL: {evidence.citation.url}")
        if evidence.content:
            print(f"Abstract: {evidence.content[:200]}...")

    # Test 2: High-impact papers
    print("\n" + "=" * 60)
    print("[Test 2] Finding highly-cited papers on 'long COVID treatment'...")
513
- results = await tool.search("long COVID treatment", max_results=3)
514
-
515
- for evidence in results:
516
- print(f"\n- {evidence.citation.title}")
517
- print(f" Citations: {evidence.metadata.get('cited_by_count', 0)}")
518
-
519
- print("\n" + "=" * 60)
520
- print("Demo complete!")
521
-
522
-
523
- if __name__ == "__main__":
524
- asyncio.run(main())
525
- ```
526
-
527
- ---
528
-
529
- ## Verification Checklist
530
-
531
- ### Unit Tests
532
- ```bash
533
- # Run just OpenAlex tests
534
- uv run pytest tests/unit/tools/test_openalex.py -v
535
-
536
- # Expected: All tests pass
537
- ```
538
-
539
- ### Integration Test (Manual)
540
- ```bash
541
- # Run demo script with real API
542
- uv run python examples/openalex_demo.py
543
-
544
- # Expected: Real results from OpenAlex API
545
- ```
546
-
547
- ### Full Test Suite
548
- ```bash
549
- # Ensure nothing broke
550
- make check
551
-
552
- # Expected: All 110+ tests pass, mypy clean
553
- ```
554
-
555
- ---
556
-
557
- ## Success Criteria
558
-
559
- 1. **Unit tests pass**: All mocked tests in `test_openalex.py` pass
560
- 2. **Integration works**: Demo script returns real results
561
- 3. **No regressions**: `make check` passes completely
562
- 4. **SearchHandler integration**: OpenAlex appears in search results alongside other sources
563
- 5. **Citation metadata**: Results include `cited_by_count`, `concepts`, `is_open_access`
564
-
565
- ---
566
-
567
- ## Future Enhancements (P2)
568
-
569
- Once basic integration works:
570
-
571
- 1. **Citation Network Queries**
572
- ```python
573
- # Get papers citing a specific work
574
- async def get_citing_works(self, work_id: str) -> list[Evidence]:
575
- params = {"filter": f"cites:{work_id}"}
576
- ...
577
- ```
578
-
579
- 2. **Concept-Based Search**
580
- ```python
581
- # Search by OpenAlex concept ID
582
- async def search_by_concept(self, concept_id: str) -> list[Evidence]:
583
- params = {"filter": f"concepts.id:{concept_id}"}
584
- ...
585
- ```
586
-
587
- 3. **Author Tracking**
588
- ```python
589
- # Find all works by an author
590
- async def search_by_author(self, author_id: str) -> list[Evidence]:
591
- params = {"filter": f"authorships.author.id:{author_id}"}
592
- ...
593
- ```
594
-
595
- ---
596
-
597
- ## Notes
598
-
599
- - OpenAlex rate limits are generous (documented at roughly 10 requests/sec and 100k requests/day)
600
- - Adding `mailto` parameter gives priority access (polite pool)
601
- - Abstracts are stored as an inverted index and must be reconstructed into plain text
602
- - Citation count is a good proxy for paper quality/impact
603
- - Consider caching responses for repeated queries
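The caching note can be as simple as an in-memory dict keyed by the query parameters; a hypothetical sketch (the `fetch` callable stands in for the real API call, and the unbounded dict is a deliberate simplification):

```python
import asyncio
from collections.abc import Awaitable, Callable


class QueryCache:
    """Tiny in-memory cache keyed by (query, max_results)."""

    def __init__(self) -> None:
        self._store: dict[tuple[str, int], list[str]] = {}

    async def get_or_fetch(
        self,
        query: str,
        max_results: int,
        fetch: Callable[[str, int], Awaitable[list[str]]],
    ) -> list[str]:
        key = (query, max_results)
        if key not in self._store:
            # Miss: hit the API once, then reuse the stored response
            self._store[key] = await fetch(query, max_results)
        return self._store[key]


async def demo() -> tuple[list[str], int]:
    calls = 0

    async def fake_fetch(q: str, n: int) -> list[str]:
        nonlocal calls
        calls += 1
        return [f"{q}-{i}" for i in range(n)]

    cache = QueryCache()
    await cache.get_or_fetch("metformin", 2, fake_fetch)
    repeat = await cache.get_or_fetch("metformin", 2, fake_fetch)  # served from cache
    return repeat, calls


print(asyncio.run(demo()))  # → (['metformin-0', 'metformin-1'], 1)
```

A production version would likely add a TTL and size bound, but the key shape is the important part.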
docs/brainstorming/implementation/16_PHASE_PUBMED_FULLTEXT.md DELETED
@@ -1,586 +0,0 @@
1
- # Phase 16: PubMed Full-Text Retrieval
2
-
3
- **Priority**: MEDIUM - Enhances evidence quality
4
- **Effort**: ~3 hours
5
- **Dependencies**: None (existing PubMed tool sufficient)
6
-
7
- ---
8
-
9
- ## Prerequisites (COMPLETED)
10
-
11
- The `Evidence.metadata` field has been added to `src/utils/models.py` to support:
12
- ```python
13
- metadata={"has_fulltext": True}
14
- ```
15
-
16
- ---
17
-
18
- ## Architecture Decision: Constructor Parameter vs Method Parameter
19
-
20
- **IMPORTANT**: The original spec proposed `include_fulltext` as a method parameter:
21
- ```python
22
- # WRONG - SearchHandler won't pass this parameter
23
- async def search(self, query: str, max_results: int = 10, include_fulltext: bool = False):
24
- ```
25
-
26
- **Problem**: `SearchHandler` calls `tool.search(query, max_results)` uniformly across all tools.
27
- It has no mechanism to pass tool-specific parameters like `include_fulltext`.
28
-
29
- **Solution**: Use constructor parameter instead:
30
- ```python
31
- # CORRECT - Configured at instantiation time
32
- class PubMedTool:
33
- def __init__(self, api_key: str | None = None, include_fulltext: bool = False):
34
- self.include_fulltext = include_fulltext
35
- ...
36
- ```
37
-
38
- This way, you can create a full-text-enabled PubMed tool:
39
- ```python
40
- # In orchestrator or wherever tools are created
41
- tools = [
42
- PubMedTool(include_fulltext=True), # Full-text enabled
43
- ClinicalTrialsTool(),
44
- EuropePMCTool(),
45
- ]
46
- ```
47
-
48
- ---
49
-
50
- ## Overview
51
-
52
- Add full-text retrieval for PubMed papers via the BioC API, enabling:
53
- - Complete paper text for open-access PMC papers
54
- - Structured sections (intro, methods, results, discussion)
55
- - Better evidence for LLM synthesis
56
-
57
- **Why Full-Text?**
58
- - Abstracts only give ~200-300 words
59
- - Full text provides detailed methods, results, figures
60
- - Reference repo already has this implemented
61
- - Makes LLM judgments more accurate
62
-
63
- ---
64
-
65
- ## TDD Implementation Plan
66
-
67
- ### Step 1: Write the Tests First
68
-
69
- **File**: `tests/unit/tools/test_pubmed_fulltext.py`
70
-
71
- ```python
72
- """Tests for PubMed full-text retrieval."""
73
-
74
- import pytest
75
- import respx
76
- from httpx import Response
77
-
78
- from src.tools.pubmed import PubMedTool
79
-
80
-
81
- class TestPubMedFullText:
82
- """Test suite for PubMed full-text functionality."""
83
-
84
- @pytest.fixture
85
- def tool(self) -> PubMedTool:
86
- return PubMedTool()
87
-
88
- @respx.mock
89
- @pytest.mark.asyncio
90
- async def test_get_pmc_id_success(self, tool: PubMedTool) -> None:
91
- """Should convert PMID to PMCID for full-text access."""
92
- mock_response = {
93
- "records": [
94
- {
95
- "pmid": "12345678",
96
- "pmcid": "PMC1234567",
97
- }
98
- ]
99
- }
100
-
101
- respx.get("https://www.ncbi.nlm.nih.gov/pmc/utils/idconv/v1.0/").mock(
102
- return_value=Response(200, json=mock_response)
103
- )
104
-
105
- pmcid = await tool.get_pmc_id("12345678")
106
- assert pmcid == "PMC1234567"
107
-
108
- @respx.mock
109
- @pytest.mark.asyncio
110
- async def test_get_pmc_id_not_in_pmc(self, tool: PubMedTool) -> None:
111
- """Should return None if paper not in PMC."""
112
- mock_response = {
113
- "records": [
114
- {
115
- "pmid": "12345678",
116
- # No pmcid means not in PMC
117
- }
118
- ]
119
- }
120
-
121
- respx.get("https://www.ncbi.nlm.nih.gov/pmc/utils/idconv/v1.0/").mock(
122
- return_value=Response(200, json=mock_response)
123
- )
124
-
125
- pmcid = await tool.get_pmc_id("12345678")
126
- assert pmcid is None
127
-
128
- @respx.mock
129
- @pytest.mark.asyncio
130
- async def test_get_fulltext_success(self, tool: PubMedTool) -> None:
131
- """Should retrieve full text for PMC papers."""
132
- # Mock BioC API response
133
- mock_bioc = {
134
- "documents": [
135
- {
136
- "passages": [
137
- {
138
- "infons": {"section_type": "INTRO"},
139
- "text": "Introduction text here.",
140
- },
141
- {
142
- "infons": {"section_type": "METHODS"},
143
- "text": "Methods description here.",
144
- },
145
- {
146
- "infons": {"section_type": "RESULTS"},
147
- "text": "Results summary here.",
148
- },
149
- {
150
- "infons": {"section_type": "DISCUSS"},
151
- "text": "Discussion and conclusions.",
152
- },
153
- ]
154
- }
155
- ]
156
- }
157
-
158
- respx.get(
159
- "https://www.ncbi.nlm.nih.gov/research/bionlp/RESTful/pmcoa.cgi/BioC_json/12345678/unicode"
160
- ).mock(return_value=Response(200, json=mock_bioc))
161
-
162
- fulltext = await tool.get_fulltext("12345678")
163
-
164
- assert fulltext is not None
165
- assert "Introduction text here" in fulltext
166
- assert "Methods description here" in fulltext
167
- assert "Results summary here" in fulltext
168
-
169
- @respx.mock
170
- @pytest.mark.asyncio
171
- async def test_get_fulltext_not_available(self, tool: PubMedTool) -> None:
172
- """Should return None if full text not available."""
173
- respx.get(
174
- "https://www.ncbi.nlm.nih.gov/research/bionlp/RESTful/pmcoa.cgi/BioC_json/99999999/unicode"
175
- ).mock(return_value=Response(404))
176
-
177
- fulltext = await tool.get_fulltext("99999999")
178
- assert fulltext is None
179
-
180
- @respx.mock
181
- @pytest.mark.asyncio
182
- async def test_get_fulltext_structured(self, tool: PubMedTool) -> None:
183
- """Should return structured sections dict."""
184
- mock_bioc = {
185
- "documents": [
186
- {
187
- "passages": [
188
- {"infons": {"section_type": "INTRO"}, "text": "Intro..."},
189
- {"infons": {"section_type": "METHODS"}, "text": "Methods..."},
190
- {"infons": {"section_type": "RESULTS"}, "text": "Results..."},
191
- {"infons": {"section_type": "DISCUSS"}, "text": "Discussion..."},
192
- ]
193
- }
194
- ]
195
- }
196
-
197
- respx.get(
198
- "https://www.ncbi.nlm.nih.gov/research/bionlp/RESTful/pmcoa.cgi/BioC_json/12345678/unicode"
199
- ).mock(return_value=Response(200, json=mock_bioc))
200
-
201
- sections = await tool.get_fulltext_structured("12345678")
202
-
203
- assert sections is not None
204
- assert "introduction" in sections
205
- assert "methods" in sections
206
- assert "results" in sections
207
- assert "discussion" in sections
208
-
209
- @respx.mock
210
- @pytest.mark.asyncio
211
- async def test_search_with_fulltext_enabled(self) -> None:
212
- """Search should include full text when tool is configured for it."""
213
- # Create tool WITH full-text enabled via constructor
214
- tool = PubMedTool(include_fulltext=True)
215
-
216
- # Mock esearch
217
- respx.get("https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi").mock(
218
- return_value=Response(
219
- 200, json={"esearchresult": {"idlist": ["12345678"]}}
220
- )
221
- )
222
-
223
- # Mock efetch (abstract)
224
- mock_xml = """
225
- <PubmedArticleSet>
226
- <PubmedArticle>
227
- <MedlineCitation>
228
- <PMID>12345678</PMID>
229
- <Article>
230
- <ArticleTitle>Test Paper</ArticleTitle>
231
- <Abstract><AbstractText>Short abstract.</AbstractText></Abstract>
232
- <AuthorList><Author><LastName>Smith</LastName></Author></AuthorList>
233
- </Article>
234
- </MedlineCitation>
235
- </PubmedArticle>
236
- </PubmedArticleSet>
237
- """
238
- respx.get("https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi").mock(
239
- return_value=Response(200, text=mock_xml)
240
- )
241
-
242
- # Mock ID converter
243
- respx.get("https://www.ncbi.nlm.nih.gov/pmc/utils/idconv/v1.0/").mock(
244
- return_value=Response(
245
- 200, json={"records": [{"pmid": "12345678", "pmcid": "PMC1234567"}]}
246
- )
247
- )
248
-
249
- # Mock BioC full text
250
- mock_bioc = {
251
- "documents": [
252
- {
253
- "passages": [
254
- {"infons": {"section_type": "INTRO"}, "text": "Full intro..."},
255
- ]
256
- }
257
- ]
258
- }
259
- respx.get(
260
- "https://www.ncbi.nlm.nih.gov/research/bionlp/RESTful/pmcoa.cgi/BioC_json/12345678/unicode"
261
- ).mock(return_value=Response(200, json=mock_bioc))
262
-
263
- # NOTE: No include_fulltext param - it's set via constructor
264
- results = await tool.search("test", max_results=1)
265
-
266
- assert len(results) == 1
267
- # Full text should be appended or replace abstract
268
- assert "Full intro" in results[0].content or "Short abstract" in results[0].content
269
- ```
270
-
271
- ---
272
-
273
- ### Step 2: Implement Full-Text Methods
274
-
275
- **File**: `src/tools/pubmed.py` (additions to existing class)
276
-
277
- ```python
278
- # Add these methods to PubMedTool class
279
-
280
- async def get_pmc_id(self, pmid: str) -> str | None:
281
- """
282
- Convert PMID to PMCID for full-text access.
283
-
284
- Args:
285
- pmid: PubMed ID
286
-
287
- Returns:
288
- PMCID if paper is in PMC, None otherwise
289
- """
290
- url = "https://www.ncbi.nlm.nih.gov/pmc/utils/idconv/v1.0/"
291
- params = {"ids": pmid, "format": "json"}
292
-
293
- async with httpx.AsyncClient(timeout=30.0) as client:
294
- try:
295
- response = await client.get(url, params=params)
296
- response.raise_for_status()
297
- data = response.json()
298
-
299
- records = data.get("records", [])
300
- if records and records[0].get("pmcid"):
301
- return records[0]["pmcid"]
302
- return None
303
-
304
- except httpx.HTTPError:
305
- return None
306
-
307
-
308
- async def get_fulltext(self, pmid: str) -> str | None:
309
- """
310
- Get full text for a PubMed paper via BioC API.
311
-
312
- Only works for open-access papers in PubMed Central.
313
-
314
- Args:
315
- pmid: PubMed ID
316
-
317
- Returns:
318
- Full text as string, or None if not available
319
- """
320
- url = f"https://www.ncbi.nlm.nih.gov/research/bionlp/RESTful/pmcoa.cgi/BioC_json/{pmid}/unicode"
321
-
322
- async with httpx.AsyncClient(timeout=60.0) as client:
323
- try:
324
- response = await client.get(url)
325
- if response.status_code == 404:
326
- return None
327
- response.raise_for_status()
328
- data = response.json()
329
-
330
- # Extract text from all passages
331
- documents = data.get("documents", [])
332
- if not documents:
333
- return None
334
-
335
- passages = documents[0].get("passages", [])
336
- text_parts = [p.get("text", "") for p in passages if p.get("text")]
337
-
338
- return "\n\n".join(text_parts) if text_parts else None
339
-
340
- except httpx.HTTPError:
341
- return None
342
-
343
-
344
- async def get_fulltext_structured(self, pmid: str) -> dict[str, str] | None:
345
- """
346
- Get structured full text with sections.
347
-
348
- Args:
349
- pmid: PubMed ID
350
-
351
- Returns:
352
- Dict mapping section names to text, or None if not available
353
- """
354
- url = f"https://www.ncbi.nlm.nih.gov/research/bionlp/RESTful/pmcoa.cgi/BioC_json/{pmid}/unicode"
355
-
356
- async with httpx.AsyncClient(timeout=60.0) as client:
357
- try:
358
- response = await client.get(url)
359
- if response.status_code == 404:
360
- return None
361
- response.raise_for_status()
362
- data = response.json()
363
-
364
- documents = data.get("documents", [])
365
- if not documents:
366
- return None
367
-
368
- # Map section types to readable names
369
- section_map = {
370
- "INTRO": "introduction",
371
- "METHODS": "methods",
372
- "RESULTS": "results",
373
- "DISCUSS": "discussion",
374
- "CONCL": "conclusion",
375
- "ABSTRACT": "abstract",
376
- }
377
-
378
- sections: dict[str, list[str]] = {}
379
- for passage in documents[0].get("passages", []):
380
- section_type = passage.get("infons", {}).get("section_type", "other")
381
- section_name = section_map.get(section_type, "other")
382
- text = passage.get("text", "")
383
-
384
- if text:
385
- if section_name not in sections:
386
- sections[section_name] = []
387
- sections[section_name].append(text)
388
-
389
- # Join multiple passages per section
390
- return {k: "\n\n".join(v) for k, v in sections.items()}
391
-
392
- except httpx.HTTPError:
393
- return None
394
- ```
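The passage-to-section mapping in `get_fulltext_structured` can be exercised offline; a minimal standalone sketch of the same grouping logic (the helper name is illustrative):

```python
SECTION_MAP = {
    "INTRO": "introduction",
    "METHODS": "methods",
    "RESULTS": "results",
    "DISCUSS": "discussion",
    "CONCL": "conclusion",
    "ABSTRACT": "abstract",
}


def group_passages(passages: list[dict]) -> dict[str, str]:
    """Group BioC passages into readable sections, joining multi-part sections."""
    sections: dict[str, list[str]] = {}
    for passage in passages:
        section_type = passage.get("infons", {}).get("section_type", "other")
        name = SECTION_MAP.get(section_type, "other")
        text = passage.get("text", "")
        if text:
            sections.setdefault(name, []).append(text)
    return {k: "\n\n".join(v) for k, v in sections.items()}


passages = [
    {"infons": {"section_type": "INTRO"}, "text": "Background."},
    {"infons": {"section_type": "INTRO"}, "text": "Aims."},
    {"infons": {"section_type": "RESULTS"}, "text": "Findings."},
]
print(group_passages(passages))  # → {'introduction': 'Background.\n\nAims.', 'results': 'Findings.'}
```

Multi-paragraph sections arrive as separate passages, hence the list-then-join step.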
395
-
396
- ---
397
-
398
- ### Step 3: Update Constructor and Search Method
399
-
400
- Add full-text flag to constructor and update search to use it:
401
-
402
- ```python
403
- class PubMedTool:
404
- """Search tool for PubMed/NCBI."""
405
-
406
- def __init__(
407
- self,
408
- api_key: str | None = None,
409
- include_fulltext: bool = False, # NEW CONSTRUCTOR PARAM
410
- ) -> None:
411
- self.api_key = api_key or settings.ncbi_api_key
412
- if self.api_key == "your-ncbi-key-here":
413
- self.api_key = None
414
- self._last_request_time = 0.0
415
- self.include_fulltext = include_fulltext # Store for use in search()
416
-
417
- async def search(self, query: str, max_results: int = 10) -> list[Evidence]:
418
- """
419
- Search PubMed and return evidence.
420
-
421
- Note: Full-text enrichment is controlled by constructor parameter,
422
- not method parameter, because SearchHandler doesn't pass extra args.
423
- """
424
- # ... existing search logic ...
425
-
426
- evidence_list = self._parse_pubmed_xml(fetch_resp.text)
427
-
428
- # Optionally enrich with full text (if configured at construction)
429
- if self.include_fulltext:
430
- evidence_list = await self._enrich_with_fulltext(evidence_list)
431
-
432
- return evidence_list
433
-
434
-
435
- async def _enrich_with_fulltext(
436
- self, evidence_list: list[Evidence]
437
- ) -> list[Evidence]:
438
- """Attempt to add full text to evidence items."""
439
- enriched = []
440
-
441
- for evidence in evidence_list:
442
- # Extract PMID from URL
443
- url = evidence.citation.url
444
- pmid = url.rstrip("/").split("/")[-1] if url else None
445
-
446
- if pmid:
447
- fulltext = await self.get_fulltext(pmid)
448
- if fulltext:
449
- # Replace abstract with full text (truncated)
450
- evidence = Evidence(
451
- content=fulltext[:8000], # Larger limit for full text
452
- citation=evidence.citation,
453
- relevance=evidence.relevance,
454
- metadata={
455
- **evidence.metadata,
456
- "has_fulltext": True,
457
- },
458
- )
459
-
460
- enriched.append(evidence)
461
-
462
- return enriched
463
- ```
464
-
465
- ---
466
-
467
- ## Demo Script
468
-
469
- **File**: `examples/pubmed_fulltext_demo.py`
470
-
471
- ```python
472
- #!/usr/bin/env python3
473
- """Demo script to verify PubMed full-text retrieval."""
474
-
475
- import asyncio
476
- from src.tools.pubmed import PubMedTool
477
-
478
-
479
- async def main():
480
- """Run PubMed full-text demo."""
481
- tool = PubMedTool()
482
-
483
- print("=" * 60)
484
- print("PubMed Full-Text Demo")
485
- print("=" * 60)
486
-
487
- # Test 1: Convert PMID to PMCID
488
- print("\n[Test 1] Converting PMID to PMCID...")
489
- # Use a known open-access paper
490
- test_pmid = "34450029" # Example: COVID-related open-access paper
491
- pmcid = await tool.get_pmc_id(test_pmid)
492
- print(f"PMID {test_pmid} -> PMCID: {pmcid or 'Not in PMC'}")
493
-
494
- # Test 2: Get full text
495
- print("\n[Test 2] Fetching full text...")
496
- if pmcid:
497
- fulltext = await tool.get_fulltext(test_pmid)
498
- if fulltext:
499
- print(f"Full text length: {len(fulltext)} characters")
500
- print(f"Preview: {fulltext[:500]}...")
501
- else:
502
- print("Full text not available")
503
-
504
- # Test 3: Get structured sections
505
- print("\n[Test 3] Fetching structured sections...")
506
- if pmcid:
507
- sections = await tool.get_fulltext_structured(test_pmid)
508
- if sections:
509
- print("Available sections:")
510
- for section, text in sections.items():
511
- print(f" - {section}: {len(text)} chars")
512
- else:
513
- print("Structured text not available")
514
-
515
-     # Test 4: Search with full text
516
-     print("\n[Test 4] Search with full-text enrichment...")
517
-     # Full-text enrichment is configured at construction time (see the
518
-     # architecture decision above), not as a search() parameter
519
-     ft_tool = PubMedTool(include_fulltext=True)
520
-     results = await ft_tool.search("metformin cancer open access", max_results=3)
522
-
523
- for i, evidence in enumerate(results, 1):
524
- has_ft = evidence.metadata.get("has_fulltext", False)
525
- print(f"\n--- Result {i} ---")
526
- print(f"Title: {evidence.citation.title}")
527
- print(f"Has Full Text: {has_ft}")
528
- print(f"Content Length: {len(evidence.content)} chars")
529
-
530
- print("\n" + "=" * 60)
531
- print("Demo complete!")
532
-
533
-
534
- if __name__ == "__main__":
535
- asyncio.run(main())
536
- ```
537
-
538
- ---
539
-
540
- ## Verification Checklist
541
-
542
- ### Unit Tests
543
- ```bash
544
- # Run full-text tests
545
- uv run pytest tests/unit/tools/test_pubmed_fulltext.py -v
546
-
547
- # Run all PubMed tests
548
- uv run pytest tests/unit/tools/test_pubmed.py -v
549
-
550
- # Expected: All tests pass
551
- ```
552
-
553
- ### Integration Test (Manual)
554
- ```bash
555
- # Run demo with real API
556
- uv run python examples/pubmed_fulltext_demo.py
557
-
558
- # Expected: Real full text from PMC papers
559
- ```
560
-
561
- ### Full Test Suite
562
- ```bash
563
- make check
564
- # Expected: All tests pass, mypy clean
565
- ```
566
-
567
- ---
568
-
569
- ## Success Criteria
570
-
571
- 1. **ID Conversion works**: PMID -> PMCID conversion successful
572
- 2. **Full text retrieval works**: BioC API returns paper text
573
- 3. **Structured sections work**: Can get intro/methods/results/discussion separately
574
- 4. **Search integration works**: `PubMedTool(include_fulltext=True)` enriches results
575
- 5. **No regressions**: Existing tests still pass
576
- 6. **Graceful degradation**: Non-PMC papers still return abstracts
577
-
578
- ---
579
-
580
- ## Notes
581
-
582
- - Only ~30% of PubMed papers have full text in PMC
583
- - BioC API has no documented rate limit, but be respectful
584
- - Full text can be very long - truncate appropriately
585
- - Consider caching full text responses (they don't change)
586
- - Timeout should be longer for full text (60s vs 30s)
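On the truncation note above: a hard slice can cut mid-word. One possible word-boundary truncator (illustrative, not part of the tool):

```python
def truncate_at_word(text: str, limit: int = 8000) -> str:
    """Truncate to at most `limit` chars, preferring the last word boundary."""
    if len(text) <= limit:
        return text
    cut = text[:limit]
    # Back off to the last space so we don't emit half a word
    space = cut.rfind(" ")
    if space > 0:
        cut = cut[:space]
    return cut + "..."


print(truncate_at_word("full text of a very long paper", limit=12))  # → full text...
```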
docs/brainstorming/implementation/17_PHASE_RATE_LIMITING.md DELETED
@@ -1,540 +0,0 @@
1
- # Phase 17: Rate Limiting with `limits` Library
2
-
3
- **Priority**: P0 CRITICAL - Prevents API blocks
4
- **Effort**: ~1 hour
5
- **Dependencies**: None
6
-
7
- ---
8
-
9
- ## CRITICAL: Async Safety Requirements
10
-
11
- **WARNING**: The rate limiter MUST be async-safe. Blocking the event loop will freeze:
12
- - The Gradio UI
13
- - All parallel searches
14
- - The orchestrator
15
-
16
- **Rules**:
17
- 1. **NEVER use `time.sleep()`** - Always use `await asyncio.sleep()`
18
- 2. **NEVER use blocking while loops** - Use async-aware polling
19
- 3. **The `limits` library check is synchronous** - Wrap it carefully
20
-
21
- The implementation below uses a polling pattern that:
22
- - Checks the limit (synchronous, fast)
23
- - If exceeded, `await asyncio.sleep()` (non-blocking)
24
- - Retry the check
25
-
26
- **Alternative**: If `limits` proves problematic, use `aiolimiter` which is pure-async.
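Whichever library is chosen, the async-safety rules above can be illustrated with a minimal pure-asyncio limiter. This is a sketch that enforces a minimum gap between requests, not the project's implementation:

```python
import asyncio


class MinIntervalLimiter:
    """Minimal async-safe limiter: enforce a minimum gap between requests."""

    def __init__(self, min_interval: float) -> None:
        self.min_interval = min_interval
        self._last = 0.0
        self._lock = asyncio.Lock()  # serialize concurrent acquirers

    async def acquire(self) -> None:
        async with self._lock:
            loop = asyncio.get_running_loop()
            wait = self._last + self.min_interval - loop.time()
            if wait > 0:
                await asyncio.sleep(wait)  # yields to the event loop; never time.sleep()
            self._last = loop.time()


async def demo() -> float:
    limiter = MinIntervalLimiter(0.05)
    loop = asyncio.get_running_loop()
    start = loop.time()
    for _ in range(3):
        await limiter.acquire()
    return loop.time() - start


print(asyncio.run(demo()))  # two enforced 0.05s gaps, so at least ~0.1s total
```

Because every wait is an `await asyncio.sleep(...)`, the UI and parallel searches keep running while a request is throttled.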
27
-
28
- ---
29
-
30
- ## Overview
31
-
32
- Replace the naive `asyncio.sleep` rate limiting with a proper rate limiter built on the `limits` library, which provides:
33
- - Moving window rate limiting
34
- - Per-API configurable limits
35
- - Thread-safe storage
36
- - Already used in reference repo
37
-
38
- **Why This Matters?**
39
- - NCBI will block us without proper rate limiting (3/sec without key, 10/sec with)
40
- - Current implementation only has simple sleep delay
41
- - Need coordinated limits across all PubMed calls
42
- - Professional-grade rate limiting prevents production issues
43
-
44
- ---
45
-
46
- ## Current State
47
-
48
- ### What We Have (`src/tools/pubmed.py:20-21, 34-41`)
49
-
50
- ```python
51
- RATE_LIMIT_DELAY = 0.34 # ~3 requests/sec without API key
52
-
53
- async def _rate_limit(self) -> None:
54
- """Enforce NCBI rate limiting."""
55
- loop = asyncio.get_running_loop()
56
- now = loop.time()
57
- elapsed = now - self._last_request_time
58
- if elapsed < self.RATE_LIMIT_DELAY:
59
- await asyncio.sleep(self.RATE_LIMIT_DELAY - elapsed)
60
- self._last_request_time = loop.time()
61
- ```
62
-
63
- ### Problems
64
-
65
- 1. **Not shared across instances**: Each `PubMedTool()` has its own counter
66
- 2. **Simple delay vs moving window**: Doesn't handle bursts properly
67
- 3. **Hardcoded rate**: Doesn't adapt to API key presence
68
- 4. **No backoff on 429**: Just retries blindly
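Problem 4 is typically fixed with exponential backoff on 429 responses; the delay schedule alone, as a sketch (jitter omitted for brevity):

```python
def backoff_delays(attempts: int, base: float = 0.5, cap: float = 8.0) -> list[float]:
    """Exponential backoff schedule for 429s: base * 2**n, capped at `cap`."""
    return [min(cap, base * (2**n)) for n in range(attempts)]


print(backoff_delays(5))  # → [0.5, 1.0, 2.0, 4.0, 8.0]
```

The retry loop would `await asyncio.sleep(delay)` for each entry before re-issuing the request.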
69
-
70
- ---
71
-
72
- ## TDD Implementation Plan
73
-
74
- ### Step 1: Add Dependency
75
-
76
- **File**: `pyproject.toml`
77
-
78
- ```toml
79
- dependencies = [
80
- # ... existing deps ...
81
- "limits>=3.0",
82
- ]
83
- ```
84
-
85
- Then run:
86
- ```bash
87
- uv sync
88
- ```
89
-
90
- ---
91
-
92
- ### Step 2: Write the Tests First
93
-
94
- **File**: `tests/unit/tools/test_rate_limiting.py`
95
-
96
- ```python
97
- """Tests for rate limiting functionality."""
98
-
99
- import asyncio
100
- import time
101
-
102
- import pytest
103
-
104
- from src.tools.rate_limiter import RateLimiter, get_pubmed_limiter
105
-
106
-
107
- class TestRateLimiter:
108
- """Test suite for rate limiter."""
109
-
110
- def test_create_limiter_without_api_key(self) -> None:
111
- """Should create 3/sec limiter without API key."""
112
- limiter = RateLimiter(rate="3/second")
113
- assert limiter.rate == "3/second"
114
-
115
- def test_create_limiter_with_api_key(self) -> None:
116
- """Should create 10/sec limiter with API key."""
117
- limiter = RateLimiter(rate="10/second")
118
- assert limiter.rate == "10/second"
119
-
120
- @pytest.mark.asyncio
121
- async def test_limiter_allows_requests_under_limit(self) -> None:
122
- """Should allow requests under the rate limit."""
123
- limiter = RateLimiter(rate="10/second")
124
-
125
- # 3 requests should all succeed immediately
126
- for _ in range(3):
127
- allowed = await limiter.acquire()
128
- assert allowed is True
129
-
130
- @pytest.mark.asyncio
131
- async def test_limiter_blocks_when_exceeded(self) -> None:
132
- """Should wait when rate limit exceeded."""
133
- limiter = RateLimiter(rate="2/second")
134
-
135
- # First 2 should be instant
136
- await limiter.acquire()
137
- await limiter.acquire()
138
-
139
- # Third should block briefly
140
- start = time.monotonic()
141
- await limiter.acquire()
142
- elapsed = time.monotonic() - start
143
-
144
- # Should have waited ~0.5 seconds (half second window for 2/sec)
145
- assert elapsed >= 0.3
146
-
147
- @pytest.mark.asyncio
148
- async def test_limiter_resets_after_window(self) -> None:
149
- """Rate limit should reset after time window."""
150
- limiter = RateLimiter(rate="5/second")
151
-
152
- # Use up the limit
153
- for _ in range(5):
154
- await limiter.acquire()
155
-
156
- # Wait for window to pass
157
- await asyncio.sleep(1.1)
158
-
159
- # Should be allowed again
160
- start = time.monotonic()
161
- await limiter.acquire()
162
- elapsed = time.monotonic() - start
163
-
164
- assert elapsed < 0.1 # Should be nearly instant
165
-
166
-
167
- class TestGetPubmedLimiter:
168
- """Test PubMed-specific limiter factory."""
169
-
170
- def test_limiter_without_api_key(self) -> None:
171
- """Should return 3/sec limiter without key."""
172
- limiter = get_pubmed_limiter(api_key=None)
173
- assert "3" in limiter.rate
174
-
175
- def test_limiter_with_api_key(self) -> None:
176
- """Should return 10/sec limiter with key."""
177
- limiter = get_pubmed_limiter(api_key="my-api-key")
178
- assert "10" in limiter.rate
179
-
180
- def test_limiter_is_singleton(self) -> None:
181
- """Same API key should return same limiter instance."""
182
- limiter1 = get_pubmed_limiter(api_key="key1")
183
- limiter2 = get_pubmed_limiter(api_key="key1")
184
- assert limiter1 is limiter2
185
-
186
- def test_different_keys_different_limiters(self) -> None:
187
- """Different API keys should return different limiters."""
188
- limiter1 = get_pubmed_limiter(api_key="key1")
189
- limiter2 = get_pubmed_limiter(api_key="key2")
190
- # Clear cache for clean test
191
- # Actually, different keys SHOULD share the same limiter
192
- # since we're limiting against the same API
193
- assert limiter1 is limiter2 # Shared NCBI rate limit
194
- ```
195
-
196
- ---
197
-
198
- ### Step 3: Create Rate Limiter Module
199
-
200
- **File**: `src/tools/rate_limiter.py`
201
-
202
- ```python
203
- """Rate limiting utilities using the limits library."""
204
-
205
- import asyncio
206
- from typing import ClassVar
207
-
208
- from limits import RateLimitItem, parse
209
- from limits.storage import MemoryStorage
210
- from limits.strategies import MovingWindowRateLimiter
211
-
212
-
213
- class RateLimiter:
214
- """
215
- Async-compatible rate limiter using limits library.
216
-
217
- Uses moving window algorithm for smooth rate limiting.
218
- """
219
-
220
- def __init__(self, rate: str) -> None:
221
- """
222
- Initialize rate limiter.
223
-
224
- Args:
225
- rate: Rate string like "3/second" or "10/second"
226
- """
227
- self.rate = rate
228
- self._storage = MemoryStorage()
229
- self._limiter = MovingWindowRateLimiter(self._storage)
230
- self._rate_limit: RateLimitItem = parse(rate)
231
- self._identity = "default" # Single identity for shared limiting
232
-
233
- async def acquire(self, wait: bool = True) -> bool:
234
- """
235
- Acquire permission to make a request.
236
-
237
- ASYNC-SAFE: Uses asyncio.sleep(), never time.sleep().
238
- The polling pattern allows other coroutines to run while waiting.
239
-
240
- Args:
241
- wait: If True, wait until allowed. If False, return immediately.
242
-
243
- Returns:
244
- True if allowed, False if not (only when wait=False)
245
- """
246
- while True:
247
- # Check if we can proceed (synchronous, fast - ~microseconds)
248
- if self._limiter.hit(self._rate_limit, self._identity):
249
- return True
250
-
251
- if not wait:
252
- return False
253
-
254
- # CRITICAL: Use asyncio.sleep(), NOT time.sleep()
255
- # This yields control to the event loop, allowing other
256
- # coroutines (UI, parallel searches) to run
257
- await asyncio.sleep(0.1)
258
-
259
- def reset(self) -> None:
260
- """Reset the rate limiter (for testing)."""
261
- self._storage.reset()
262
-
263
-
264
- # Singleton limiter for PubMed/NCBI
265
- _pubmed_limiter: RateLimiter | None = None
266
-
267
-
268
- def get_pubmed_limiter(api_key: str | None = None) -> RateLimiter:
269
- """
270
- Get the shared PubMed rate limiter.
271
-
272
- Rate depends on whether API key is provided:
273
- - Without key: 3 requests/second
274
- - With key: 10 requests/second
275
-
276
- Args:
277
- api_key: NCBI API key (optional)
278
-
279
- Returns:
280
- Shared RateLimiter instance
281
- """
282
- global _pubmed_limiter
283
-
284
- if _pubmed_limiter is None:
285
- rate = "10/second" if api_key else "3/second" # NOTE: fixed by the first caller
286
- _pubmed_limiter = RateLimiter(rate)
287
-
288
- return _pubmed_limiter
289
-
290
-
291
- def reset_pubmed_limiter() -> None:
292
- """Reset the PubMed limiter (for testing)."""
293
- global _pubmed_limiter
294
- _pubmed_limiter = None
295
-
296
-
297
- # Factory for other APIs
298
- class RateLimiterFactory:
299
- """Factory for creating/getting rate limiters for different APIs."""
300
-
301
- _limiters: ClassVar[dict[str, RateLimiter]] = {}
302
-
303
- @classmethod
304
- def get(cls, api_name: str, rate: str) -> RateLimiter:
305
- """
306
- Get or create a rate limiter for an API.
307
-
308
- Args:
309
- api_name: Unique identifier for the API
310
- rate: Rate limit string (e.g., "10/second")
311
-
312
- Returns:
313
- RateLimiter instance (shared for same api_name)
314
- """
315
- if api_name not in cls._limiters:
316
- cls._limiters[api_name] = RateLimiter(rate)
317
- return cls._limiters[api_name]
318
-
319
- @classmethod
320
- def reset_all(cls) -> None:
321
- """Reset all limiters (for testing)."""
322
- cls._limiters.clear()
323
- ```
324
-
325
- ---
326
-
327
- ### Step 4: Update PubMed Tool
328
-
329
- **File**: `src/tools/pubmed.py` (replace rate limiting code)
330
-
331
- ```python
332
- # Replace imports and rate limiting
333
-
334
- from src.tools.rate_limiter import get_pubmed_limiter
335
-
336
-
337
- class PubMedTool:
338
- """Search tool for PubMed/NCBI."""
339
-
340
- BASE_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"
341
- HTTP_TOO_MANY_REQUESTS = 429
342
-
343
- def __init__(self, api_key: str | None = None) -> None:
344
- self.api_key = api_key or settings.ncbi_api_key
345
- if self.api_key == "your-ncbi-key-here":
346
- self.api_key = None
347
- # Use shared rate limiter
348
- self._limiter = get_pubmed_limiter(self.api_key)
349
-
350
- async def _rate_limit(self) -> None:
351
- """Enforce NCBI rate limiting using shared limiter."""
352
- await self._limiter.acquire()
353
-
354
- # ... rest of class unchanged ...
355
- ```
356
-
357
- ---
358
-
359
- ### Step 5: Add Rate Limiters for Other APIs
360
-
361
- **File**: `src/tools/clinicaltrials.py` (optional)
362
-
363
- ```python
364
- from src.tools.rate_limiter import RateLimiterFactory
365
-
366
-
367
- class ClinicalTrialsTool:
368
- def __init__(self) -> None:
369
- # ClinicalTrials.gov doesn't document limits, but be conservative
370
- self._limiter = RateLimiterFactory.get("clinicaltrials", "5/second")
371
-
372
- async def search(self, query: str, max_results: int = 10) -> list[Evidence]:
373
- await self._limiter.acquire()
374
- # ... rest of method ...
375
- ```
376
-
377
- **File**: `src/tools/europepmc.py` (optional)
378
-
379
- ```python
380
- from src.tools.rate_limiter import RateLimiterFactory
381
-
382
-
383
- class EuropePMCTool:
384
- def __init__(self) -> None:
385
- # Europe PMC is generous, but still be respectful
386
- self._limiter = RateLimiterFactory.get("europepmc", "10/second")
387
-
388
- async def search(self, query: str, max_results: int = 10) -> list[Evidence]:
389
- await self._limiter.acquire()
390
- # ... rest of method ...
391
- ```
392
-
393
- ---
394
-
395
- ## Demo Script
396
-
397
- **File**: `examples/rate_limiting_demo.py`
398
-
399
- ```python
400
- #!/usr/bin/env python3
401
- """Demo script to verify rate limiting works correctly."""
402
-
403
- import asyncio
404
- import time
405
-
406
- from src.tools.rate_limiter import RateLimiter, get_pubmed_limiter, reset_pubmed_limiter
407
- from src.tools.pubmed import PubMedTool
408
-
409
-
410
- async def test_basic_limiter():
411
- """Test basic rate limiter behavior."""
412
- print("=" * 60)
413
- print("Rate Limiting Demo")
414
- print("=" * 60)
415
-
416
- # Test 1: Basic limiter
417
- print("\n[Test 1] Testing 3/second limiter...")
418
- limiter = RateLimiter("3/second")
419
-
420
- start = time.monotonic()
421
- for i in range(6):
422
- await limiter.acquire()
423
- elapsed = time.monotonic() - start
424
- print(f" Request {i+1} at {elapsed:.2f}s")
425
-
426
- total = time.monotonic() - start
427
- print(f" Total time for 6 requests: {total:.2f}s (expected ~2s)")
428
-
429
-
430
- async def test_pubmed_limiter():
431
- """Test PubMed-specific limiter."""
432
- print("\n[Test 2] Testing PubMed limiter (shared)...")
433
-
434
- reset_pubmed_limiter() # Clean state
435
-
436
- # Without API key: 3/sec
437
- limiter = get_pubmed_limiter(api_key=None)
438
- print(f" Rate without key: {limiter.rate}")
439
-
440
- # Multiple tools should share the same limiter
441
- tool1 = PubMedTool()
442
- tool2 = PubMedTool()
443
-
444
- # Verify they share the limiter
445
- print(f" Tools share limiter: {tool1._limiter is tool2._limiter}")
446
-
447
-
448
- async def test_concurrent_requests():
449
- """Test rate limiting under concurrent load."""
450
- print("\n[Test 3] Testing concurrent request limiting...")
451
-
452
- limiter = RateLimiter("5/second")
453
-
454
- async def make_request(i: int):
455
- await limiter.acquire()
456
- return time.monotonic()
457
-
458
- start = time.monotonic()
459
- # Launch 10 concurrent requests
460
- tasks = [make_request(i) for i in range(10)]
461
- times = await asyncio.gather(*tasks)
462
-
463
- # Calculate distribution
464
- relative_times = [t - start for t in times]
465
- print(f" Request times: {[f'{t:.2f}s' for t in sorted(relative_times)]}")
466
-
467
- total = max(relative_times)
468
- print(f" All 10 requests completed in {total:.2f}s (expected ~2s)")
469
-
470
-
471
- async def main():
472
- await test_basic_limiter()
473
- await test_pubmed_limiter()
474
- await test_concurrent_requests()
475
-
476
- print("\n" + "=" * 60)
477
- print("Demo complete!")
478
-
479
-
480
- if __name__ == "__main__":
481
- asyncio.run(main())
482
- ```
483
-
484
- ---
485
-
486
- ## Verification Checklist
487
-
488
- ### Unit Tests
489
- ```bash
490
- # Run rate limiting tests
491
- uv run pytest tests/unit/tools/test_rate_limiting.py -v
492
-
493
- # Expected: All tests pass
494
- ```
495
-
496
- ### Integration Test (Manual)
497
- ```bash
498
- # Run demo
499
- uv run python examples/rate_limiting_demo.py
500
-
501
- # Expected: Requests properly spaced
502
- ```
503
-
504
- ### Full Test Suite
505
- ```bash
506
- make check
507
- # Expected: All tests pass, mypy clean
508
- ```
509
-
510
- ---
511
-
512
- ## Success Criteria
513
-
514
- 1. **`limits` library installed**: Dependency added to pyproject.toml
515
- 2. **RateLimiter class works**: Can create and use limiters
516
- 3. **PubMed uses new limiter**: Shared limiter across instances
517
- 4. **Rate adapts to API key**: 3/sec without, 10/sec with
518
- 5. **Concurrent requests handled**: Multiple async requests properly queued
519
- 6. **No regressions**: All existing tests pass
520
-
521
- ---
522
-
523
- ## API Rate Limit Reference
524
-
525
- | API | Without Key | With Key |
526
- |-----|-------------|----------|
527
- | PubMed/NCBI | 3/sec | 10/sec |
528
- | ClinicalTrials.gov | Undocumented (~5/sec safe) | N/A |
529
- | Europe PMC | ~10-20/sec (generous) | N/A |
530
- | OpenAlex | ~100k/day (no per-sec limit) | Faster with `mailto` |
531
-
532
- ---
533
-
534
- ## Notes
535
-
536
- - `limits` library uses moving window algorithm (fairer than fixed window)
537
- - Singleton pattern ensures all PubMed calls share the limit
538
- - The factory pattern allows easy extension to other APIs
539
- - Consider adding 429 response detection + exponential backoff
540
- - In production, consider Redis storage for distributed rate limiting
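The 429-handling note above can be sketched as a small retry helper. `fetch` is a hypothetical async callable returning `(status, payload)`, and the base delay and cap are illustrative choices:

```python
import asyncio
import random

HTTP_TOO_MANY_REQUESTS = 429


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 8.0) -> float:
    """Exponential backoff with jitter: ~0.5s, ~1s, ~2s, ... capped at `cap`."""
    return min(cap, base * (2**attempt)) * (0.5 + random.random() / 2)


async def fetch_with_retry(fetch, max_attempts: int = 4):
    """Retry `fetch` (async callable returning (status, payload)) on HTTP 429."""
    for attempt in range(max_attempts):
        status, payload = await fetch()
        if status != HTTP_TOO_MANY_REQUESTS:
            return payload
        # Back off before the next attempt; jitter avoids thundering herds
        await asyncio.sleep(backoff_delay(attempt))
    raise RuntimeError("still rate limited after retries")
```

Combined with the shared limiter, this keeps a single 429 from cascading into a burst of immediate retries.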
docs/brainstorming/implementation/README.md DELETED
@@ -1,143 +0,0 @@
1
- # Implementation Plans
2
-
3
- TDD implementation plans based on the brainstorming documents. Each phase is a self-contained vertical slice with tests, implementation, and demo scripts.
4
-
5
- ---
6
-
7
- ## Prerequisites (COMPLETED)
8
-
9
- The following foundational changes have been implemented to support all three phases:
10
-
11
- | Change | File | Status |
12
- |--------|------|--------|
13
- | Add `"openalex"` to `SourceName` | `src/utils/models.py:9` | ✅ Done |
14
- | Add `metadata` field to `Evidence` | `src/utils/models.py:39-42` | ✅ Done |
15
- | Export all tools from `__init__.py` | `src/tools/__init__.py` | ✅ Done |
16
-
17
- All 110 tests pass after these changes.
18
-
19
- ---
20
-
21
- ## Priority Order
22
-
23
- | Phase | Name | Priority | Effort | Value |
24
- |-------|------|----------|--------|-------|
25
- | **17** | Rate Limiting | P0 CRITICAL | 1 hour | Stability |
26
- | **15** | OpenAlex | HIGH | 2-3 hours | Very High |
27
- | **16** | PubMed Full-Text | MEDIUM | 3 hours | High |
28
-
29
- **Recommended implementation order**: 17 → 15 → 16
30
-
31
- ---
32
-
33
- ## Phase 15: OpenAlex Integration
34
-
35
- **File**: [15_PHASE_OPENALEX.md](./15_PHASE_OPENALEX.md)
36
-
37
- Add OpenAlex as 4th data source for:
38
- - Citation networks (who cites whom)
39
- - Concept tagging (semantic discovery)
40
- - 209M+ scholarly works
41
- - Free, no API key required
42
-
43
- **Quick Start**:
44
- ```bash
45
- # Create the tool
46
- touch src/tools/openalex.py
47
- touch tests/unit/tools/test_openalex.py
48
-
49
- # Run tests first (TDD)
50
- uv run pytest tests/unit/tools/test_openalex.py -v
51
-
52
- # Demo
53
- uv run python examples/openalex_demo.py
54
- ```
55
-
56
- ---
57
-
58
- ## Phase 16: PubMed Full-Text
59
-
60
- **File**: [16_PHASE_PUBMED_FULLTEXT.md](./16_PHASE_PUBMED_FULLTEXT.md)
61
-
62
- Add full-text retrieval via BioC API for:
63
- - Complete paper text (not just abstracts)
64
- - Structured sections (intro, methods, results)
65
- - Better evidence for LLM synthesis
66
-
67
- **Quick Start**:
68
- ```bash
69
- # Add methods to existing pubmed.py
70
- # Tests in test_pubmed_fulltext.py
71
-
72
- # Run tests
73
- uv run pytest tests/unit/tools/test_pubmed_fulltext.py -v
74
-
75
- # Demo
76
- uv run python examples/pubmed_fulltext_demo.py
77
- ```
78
-
79
- ---
80
-
81
- ## Phase 17: Rate Limiting
82
-
83
- **File**: [17_PHASE_RATE_LIMITING.md](./17_PHASE_RATE_LIMITING.md)
84
-
85
- Replace naive sleep-based rate limiting with `limits` library for:
86
- - Moving window algorithm
87
- - Shared limits across instances
88
- - Configurable per-API rates
89
- - Production-grade stability
90
-
91
- **Quick Start**:
92
- ```bash
93
- # Add dependency
94
- uv add limits
95
-
96
- # Create module
97
- touch src/tools/rate_limiter.py
98
- touch tests/unit/tools/test_rate_limiting.py
99
-
100
- # Run tests
101
- uv run pytest tests/unit/tools/test_rate_limiting.py -v
102
-
103
- # Demo
104
- uv run python examples/rate_limiting_demo.py
105
- ```
106
-
107
- ---
108
-
109
- ## TDD Workflow
110
-
111
- Each implementation doc follows this pattern:
112
-
113
- 1. **Write tests first** - Define expected behavior
114
- 2. **Run tests** - Verify they fail (red)
115
- 3. **Implement** - Write minimal code to pass
116
- 4. **Run tests** - Verify they pass (green)
117
- 5. **Refactor** - Clean up if needed
118
- 6. **Demo** - Verify end-to-end with real APIs
119
- 7. **`make check`** - Ensure no regressions
120
-
121
- ---
122
-
123
- ## Related Brainstorming Docs
124
-
125
- These implementation plans are derived from:
126
-
127
- - [00_ROADMAP_SUMMARY.md](../00_ROADMAP_SUMMARY.md) - Priority overview
128
- - [01_PUBMED_IMPROVEMENTS.md](../01_PUBMED_IMPROVEMENTS.md) - PubMed details
129
- - [02_CLINICALTRIALS_IMPROVEMENTS.md](../02_CLINICALTRIALS_IMPROVEMENTS.md) - CT.gov details
130
- - [03_EUROPEPMC_IMPROVEMENTS.md](../03_EUROPEPMC_IMPROVEMENTS.md) - Europe PMC details
131
- - [04_OPENALEX_INTEGRATION.md](../04_OPENALEX_INTEGRATION.md) - OpenAlex integration
132
-
133
- ---
134
-
135
- ## Future Phases (Not Yet Documented)
136
-
137
- Based on brainstorming, these could be added later:
138
-
139
- - **Phase 18**: ClinicalTrials.gov Results Retrieval
140
- - **Phase 19**: Europe PMC Annotations API
141
- - **Phase 20**: Drug Name Normalization (RxNorm)
142
- - **Phase 21**: Citation Network Queries (OpenAlex)
143
- - **Phase 22**: Semantic Search with Embeddings
docs/brainstorming/magentic-pydantic/00_SITUATION_AND_PLAN.md DELETED
@@ -1,189 +0,0 @@
1
- # Situation Analysis: Pydantic-AI + Microsoft Agent Framework Integration
2
-
3
- **Date:** November 27, 2025
4
- **Status:** ACTIVE DECISION REQUIRED
5
- **Risk Level:** HIGH - DO NOT MERGE PR #41 UNTIL RESOLVED
6
-
7
- ---
8
-
9
- ## 1. The Problem
10
-
11
- We almost merged a refactor that would have **deleted** multi-agent orchestration capability from the codebase, mistakenly believing pydantic-ai and Microsoft Agent Framework were mutually exclusive.
12
-
13
- **They are not.** They are complementary:
14
- - **pydantic-ai** (Library): Ensures LLM outputs match Pydantic schemas
15
- - **Microsoft Agent Framework** (Framework): Orchestrates multi-agent workflows
16
-
17
- ---
18
-
19
- ## 2. Current Branch State
20
-
21
- | Branch | Location | Has Agent Framework? | Has Pydantic-AI Improvements? | Status |
22
- |--------|----------|---------------------|------------------------------|--------|
23
- | `origin/dev` | GitHub | YES | NO | **SAFE - Source of Truth** |
24
- | `huggingface-upstream/dev` | HF Spaces | YES | NO | **SAFE - Same as GitHub** |
25
- | `origin/main` | GitHub | YES | NO | **SAFE** |
26
- | `feat/pubmed-fulltext` | GitHub | NO (deleted) | YES | **DANGER - Has destructive refactor** |
27
- | `refactor/pydantic-unification` | Local | NO (deleted) | YES | **DANGER - Redundant, delete** |
28
- | Local `dev` | Local only | NO (deleted) | YES | **DANGER - NOT PUSHED (thankfully)** |
29
-
30
- ### Key Files at Risk
31
-
32
- **On `origin/dev` (PRESERVED):**
33
- ```text
34
- src/agents/
35
- ├── analysis_agent.py # StatisticalAnalyzer wrapper
36
- ├── hypothesis_agent.py # Hypothesis generation
37
- ├── judge_agent.py # JudgeHandler wrapper
38
- ├── magentic_agents.py # Multi-agent definitions
39
- ├── report_agent.py # Report synthesis
40
- ├── search_agent.py # SearchHandler wrapper
41
- ├── state.py # Thread-safe state management
42
- └── tools.py # @ai_function decorated tools
43
-
44
- src/orchestrator_magentic.py # Multi-agent orchestrator
45
- src/utils/llm_factory.py # Centralized LLM client factory
46
- ```
47
-
48
- **Deleted in refactor branch (would be lost if merged):**
49
- - All of the above
50
-
51
- ---
52
-
53
- ## 3. Target Architecture
54
-
55
- ```text
56
- ┌─────────────────────────────────────────────────────────────────┐
57
- │ Microsoft Agent Framework (Orchestration Layer) │
58
- │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
59
- │ │ SearchAgent │→ │ JudgeAgent │→ │ ReportAgent │ │
60
- │ │ (BaseAgent) │ │ (BaseAgent) │ │ (BaseAgent) │ │
61
- │ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ │
62
- │ │ │ │ │
63
- │ ▼ ▼ ▼ │
64
- │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
65
- │ │ pydantic-ai │ │ pydantic-ai │ │ pydantic-ai │ │
66
- │ │ Agent() │ │ Agent() │ │ Agent() │ │
67
- │ │ output_type= │ │ output_type= │ │ output_type= │ │
68
- │ │ SearchResult │ │ JudgeAssess │ │ Report │ │
69
- │ └──────────────┘ └──────────────┘ └──────────────┘ │
70
- └─────────────────────────────────────────────────────────────────┘
71
- ```
72
-
73
- **Why this architecture:**
74
- 1. **Agent Framework** handles: workflow coordination, state passing, middleware, observability
75
- 2. **pydantic-ai** handles: type-safe LLM calls within each agent
76
-
77
- ---
78
-
79
- ## 4. CRITICAL: Naming Confusion Clarification
80
-
81
- > **Senior Agent Review Finding:** The codebase uses "magentic" in file names (e.g., `orchestrator_magentic.py`, `magentic_agents.py`) but this is **NOT** the `magentic` PyPI package by Jacky Liang. It's Microsoft Agent Framework (`agent-framework-core`).
82
-
83
- **The naming confusion:**
84
- - `magentic` (PyPI package): A different library for structured LLM outputs
85
- - "Magentic" (in our codebase): Our internal name for Microsoft Agent Framework integration
86
- - `agent-framework-core` (PyPI package): Microsoft's actual multi-agent orchestration framework
87
-
88
- **Recommended future action:** Rename `orchestrator_magentic.py` → `orchestrator_advanced.py` to eliminate confusion.
89
-
90
- ---
91
-
92
- ## 5. What the Refactor DID Get Right
93
-
94
- The refactor branch (`feat/pubmed-fulltext`) has some valuable improvements:
95
-
96
- 1. **`judges.py` unified `get_model()`** - Supports OpenAI, Anthropic, AND HuggingFace via pydantic-ai
97
- 2. **HuggingFace free tier support** - `HuggingFaceModel` integration
98
- 3. **Test fix** - Properly mocks `HuggingFaceModel` class
99
- 4. **Removed broken magentic optional dependency** from pyproject.toml (this was correct - the old `magentic` package is different from Microsoft Agent Framework)
100
-
101
- **What it got WRONG:**
102
- 1. Deleted `src/agents/` entirely instead of refactoring them
103
- 2. Deleted `src/orchestrator_magentic.py` instead of fixing it
104
- 3. Conflated "magentic" (old package) with "Microsoft Agent Framework" (current framework)
105
-
106
- ---
107
-
108
- ## 6. Options for Path Forward
109
-
110
- ### Option A: Abandon Refactor, Start Fresh
111
- - Close PR #41
112
- - Delete `feat/pubmed-fulltext` and `refactor/pydantic-unification` branches
113
- - Reset local `dev` to match `origin/dev`
114
- - Cherry-pick ONLY the good parts (judges.py improvements, HF support)
115
- - **Pros:** Clean, safe
116
- - **Cons:** Lose some work, need to redo carefully
117
-
118
- ### Option B: Cherry-Pick Good Parts to origin/dev
119
- - Do NOT merge PR #41
120
- - Create new branch from `origin/dev`
121
- - Cherry-pick specific commits/changes that improve pydantic-ai usage
122
- - Keep agent framework code intact
123
- - **Pros:** Preserves both, surgical
124
- - **Cons:** Requires careful file-by-file review
125
-
126
- ### Option C: Revert Deletions in Refactor Branch
127
- - On `feat/pubmed-fulltext`, restore deleted agent files from `origin/dev`
128
- - Keep the pydantic-ai improvements
129
- - Merge THAT to dev
130
- - **Pros:** Gets both
131
- - **Cons:** Complex git operations, risk of conflicts
132
-
133
- ---
134
-
135
- ## 7. Recommended Action: Option B (Cherry-Pick)
136
-
137
- **Step-by-step:**
138
-
139
- 1. **Close PR #41** (do not merge)
140
- 2. **Delete redundant branches:**
141
- - `refactor/pydantic-unification` (local)
142
- - Reset local `dev` to `origin/dev`
143
- 3. **Create new branch from origin/dev:**
144
- ```bash
145
- git checkout -b feat/pydantic-ai-improvements origin/dev
146
- ```
147
- 4. **Cherry-pick or manually port these improvements:**
148
- - `src/agent_factory/judges.py` - the unified `get_model()` function
149
- - `examples/free_tier_demo.py` - HuggingFace demo
150
- - Test improvements
151
- 5. **Do NOT delete any agent framework files**
152
- 6. **Create PR for review**
153
-
154
- ---
155
-
156
- ## 8. Files to Cherry-Pick (Safe Improvements)
157
-
158
- | File | What Changed | Safe to Port? |
159
- |------|-------------|---------------|
160
- | `src/agent_factory/judges.py` | Added `HuggingFaceModel` support in `get_model()` | YES |
161
- | `examples/free_tier_demo.py` | New demo for HF inference | YES |
162
- | `tests/unit/agent_factory/test_judges.py` | Fixed HF model mocking | YES |
163
- | `pyproject.toml` | Removed old `magentic` optional dep | MAYBE (review carefully) |
164
-
165
- ---
166
-
167
- ## 9. Questions to Answer Before Proceeding
168
-
169
- 1. **For the hackathon**: Do we need full multi-agent orchestration, or is single-agent sufficient?
170
- 2. **For DeepCritical mainline**: Is the plan to use Microsoft Agent Framework for orchestration?
171
- 3. **Timeline**: How much time do we have to get this right?
172
-
173
- ---
174
-
175
- ## 10. Immediate Actions (DO NOW)
176
-
177
- - [ ] **DO NOT merge PR #41**
178
- - [ ] Close PR #41 with comment explaining the situation
179
- - [ ] Do not push local `dev` branch anywhere
180
- - [ ] Confirm HuggingFace Spaces is untouched (it is - verified)
181
-
182
- ---
183
-
184
- ## 11. Decision Log
185
-
186
- | Date | Decision | Rationale |
187
- |------|----------|-----------|
188
- | 2025-11-27 | Pause refactor merge | Discovered agent framework and pydantic-ai are complementary, not exclusive |
189
- | TBD | ? | Awaiting decision on path forward |
docs/brainstorming/magentic-pydantic/01_ARCHITECTURE_SPEC.md DELETED
@@ -1,289 +0,0 @@
1
- # Architecture Specification: Dual-Mode Agent System
2
-
3
- **Date:** November 27, 2025
4
- **Status:** SPECIFICATION
5
- **Goal:** Graceful degradation from full multi-agent orchestration to simple single-agent mode
6
-
7
- ---
8
-
9
- ## 1. Core Concept: Two Operating Modes
10
-
11
- ```text
12
- ┌─────────────────────────────────────────────────────────────────────┐
13
- │ USER REQUEST │
14
- │ │ │
15
- │ ▼ │
16
- │ ┌─────────────────┐ │
17
- │ │ Mode Selection │ │
18
- │ │ (Auto-detect) │ │
19
- │ └────────┬────────┘ │
20
- │ │ │
21
- │ ┌───────────────┴───────────────┐ │
22
- │ │ │ │
23
- │ ▼ ▼ │
24
- │ ┌─────────────────┐ ┌─────────────────┐ │
25
- │ │ SIMPLE MODE │ │ ADVANCED MODE │ │
26
- │ │ (Free Tier) │ │ (Paid Tier) │ │
27
- │ │ │ │ │ │
28
- │ │ pydantic-ai │ │ MS Agent Fwk │ │
29
- │ │ single-agent │ │ + pydantic-ai │ │
30
- │ │ loop │ │ multi-agent │ │
31
- │ └─────────────────┘ └─────────────────┘ │
32
- │ │ │ │
33
- │ └───────────────┬───────────────┘ │
34
- │ ▼ │
35
- │ ┌─────────────────┐ │
36
- │ │ Research Report │ │
37
- │ │ with Citations │ │
38
- │ └─────────────────┘ │
39
- └─────────────────────────────────────────────────────────────────────┘
40
- ```
41
-
42
- ---
43
-
44
- ## 2. Mode Comparison
45
-
46
- | Aspect | Simple Mode | Advanced Mode |
47
- |--------|-------------|---------------|
48
- | **Trigger** | No API key OR `LLM_PROVIDER=huggingface` | OpenAI API key present (currently OpenAI only) |
49
- | **Framework** | pydantic-ai only | Microsoft Agent Framework + pydantic-ai |
50
- | **Architecture** | Single orchestrator loop | Multi-agent coordination |
51
- | **Agents** | One agent does Search→Judge→Report | SearchAgent, JudgeAgent, ReportAgent, AnalysisAgent |
52
- | **State Management** | Simple dict | Thread-safe `MagenticState` with context vars |
53
- | **Quality** | Good (functional) | Better (specialized agents, coordination) |
54
- | **Cost** | Free (HuggingFace Inference) | Paid (OpenAI/Anthropic) |
55
- | **Use Case** | Demos, hackathon, budget-constrained | Production, research quality |
56
-
57
- ---
58
-
59
- ## 3. Simple Mode Architecture (pydantic-ai Only)
60
-
61
- ```text
62
- ┌─────────────────────────────────────────────────────┐
63
- │ Orchestrator │
64
- │ │
65
- │ while not sufficient and iteration < max: │
66
- │ 1. SearchHandler.execute(query) │
67
- │ 2. JudgeHandler.assess(evidence) ◄── pydantic-ai Agent │
68
- │ 3. if sufficient: break │
69
- │ 4. query = judge.next_queries │
70
- │ │
71
- │ return ReportGenerator.generate(evidence) │
72
- └─────────────────────────────────────────────────────┘
73
- ```
74
-
75
- **Components:**
76
- - `src/orchestrator.py` - Simple loop orchestrator
77
- - `src/agent_factory/judges.py` - JudgeHandler with pydantic-ai
78
- - `src/tools/search_handler.py` - Scatter-gather search
79
- - `src/tools/pubmed.py`, `clinicaltrials.py`, `europepmc.py` - Search tools
80
-
81
- ---
82
-
83
- ## 4. Advanced Mode Architecture (MS Agent Framework + pydantic-ai)
84
-
85
- ```text
86
- ┌─────────────────────────────────────────────────────────────────────┐
87
- │ Microsoft Agent Framework Orchestrator │
88
- │ │
89
- │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
90
- │ │ SearchAgent │───▶│ JudgeAgent │───▶│ ReportAgent │ │
91
- │ │ (BaseAgent) │ │ (BaseAgent) │ │ (BaseAgent) │ │
92
- │ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ │
93
- │ │ │ │ │
94
- │ ▼ ▼ ▼ │
95
- │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
96
- │ │ pydantic-ai │ │ pydantic-ai │ │ pydantic-ai │ │
97
- │ │ Agent() │ │ Agent() │ │ Agent() │ │
98
- │ │ output_type=│ │ output_type=│ │ output_type=│ │
99
- │ │ SearchResult│ │ JudgeAssess │ │ Report │ │
100
- │ └─────────────┘ └─────────────┘ └─────────────┘ │
101
- │ │
102
- │ Shared State: MagenticState (thread-safe via contextvars) │
103
- │ - evidence: list[Evidence] │
104
- │ - embedding_service: EmbeddingService │
105
- └─────────────────────────────────────────────────────────────────────┘
106
- ```
107
-
108
- **Components:**
109
- - `src/orchestrator_magentic.py` - Multi-agent orchestrator
110
- - `src/agents/search_agent.py` - SearchAgent (BaseAgent)
111
- - `src/agents/judge_agent.py` - JudgeAgent (BaseAgent)
112
- - `src/agents/report_agent.py` - ReportAgent (BaseAgent)
113
- - `src/agents/analysis_agent.py` - AnalysisAgent (BaseAgent)
114
- - `src/agents/state.py` - Thread-safe state management
115
- - `src/agents/tools.py` - @ai_function decorated tools
116
-
117
- ---
118
-
119
- ## 5. Mode Selection Logic
120
-
121
- ```python
122
- # src/orchestrator_factory.py (actual implementation)
123
-
124
- def create_orchestrator(
125
- search_handler: SearchHandlerProtocol | None = None,
126
- judge_handler: JudgeHandlerProtocol | None = None,
127
- config: OrchestratorConfig | None = None,
128
- mode: Literal["simple", "magentic", "advanced"] | None = None,
129
- ) -> Any:
130
- """
131
- Auto-select orchestrator based on available credentials.
132
-
133
- Priority:
134
- 1. If mode explicitly set, use that
135
- 2. If OpenAI key available -> Advanced Mode (currently OpenAI only)
136
- 3. Otherwise -> Simple Mode (HuggingFace free tier)
137
- """
138
- effective_mode = _determine_mode(mode)
139
-
140
- if effective_mode == "advanced":
141
- orchestrator_cls = _get_magentic_orchestrator_class()
142
- return orchestrator_cls(max_rounds=config.max_iterations if config else 10)
143
-
144
- # Simple mode requires handlers
145
- if search_handler is None or judge_handler is None:
146
- raise ValueError("Simple mode requires search_handler and judge_handler")
147
-
148
- return Orchestrator(
149
- search_handler=search_handler,
150
- judge_handler=judge_handler,
151
- config=config,
152
- )
153
- ```
154
-
155
- ---
156
-
157
- ## 6. Shared Components (Both Modes Use)
158
-
159
- These components work in both modes:
160
-
161
- | Component | Purpose |
162
- |-----------|---------|
163
- | `src/tools/pubmed.py` | PubMed search |
164
- | `src/tools/clinicaltrials.py` | ClinicalTrials.gov search |
165
- | `src/tools/europepmc.py` | Europe PMC search |
166
- | `src/tools/search_handler.py` | Scatter-gather orchestration |
167
- | `src/tools/rate_limiter.py` | Rate limiting |
168
- | `src/utils/models.py` | Evidence, Citation, JudgeAssessment |
169
- | `src/utils/config.py` | Settings |
170
- | `src/services/embeddings.py` | Vector search (optional) |
171
-
172
- ---
173
-
174
- ## 7. pydantic-ai Integration Points
175
-
176
- Both modes use pydantic-ai for structured LLM outputs:
177
-
178
- ```python
179
- # In JudgeHandler (both modes)
180
- from pydantic_ai import Agent
181
- from pydantic_ai.models.huggingface import HuggingFaceModel
182
- from pydantic_ai.models.openai import OpenAIModel
183
- from pydantic_ai.models.anthropic import AnthropicModel
184
-
185
- class JudgeHandler:
186
- def __init__(self, model: Any = None):
187
- self.model = model or get_model() # Auto-selects based on config
188
- self.agent = Agent(
189
- model=self.model,
190
- output_type=JudgeAssessment, # Structured output!
191
- system_prompt=SYSTEM_PROMPT,
192
- )
193
-
194
- async def assess(self, question: str, evidence: list[Evidence]) -> JudgeAssessment:
195
- result = await self.agent.run(format_prompt(question, evidence))
196
- return result.output # Guaranteed to be JudgeAssessment
197
- ```
198
-
199
- ---
200
-
201
- ## 8. Microsoft Agent Framework Integration Points
202
-
203
- Advanced mode wraps pydantic-ai agents in BaseAgent:
204
-
205
```python
# In JudgeAgent (advanced mode only)
from agent_framework import BaseAgent, AgentRunResponse, ChatMessage, Role

class JudgeAgent(BaseAgent):
    def __init__(self, judge_handler: JudgeHandlerProtocol):
        super().__init__(
            name="JudgeAgent",
            description="Evaluates evidence quality",
        )
        self._handler = judge_handler  # Uses pydantic-ai internally
        self._evidence_store: dict[str, list] = {}  # Populated by the orchestrator

    async def run(self, messages, **kwargs) -> AgentRunResponse:
        question = extract_question(messages)
        evidence = self._evidence_store.get("current", [])

        # Delegate to pydantic-ai powered handler
        assessment = await self._handler.assess(question, evidence)

        return AgentRunResponse(
            messages=[ChatMessage(role=Role.ASSISTANT, text=format_response(assessment))],
            additional_properties={"assessment": assessment.model_dump()},
        )
```


---

## 9. Benefits of This Architecture

1. **Graceful Degradation**: Works without API keys (free tier)
2. **Progressive Enhancement**: Better with API keys (orchestration)
3. **Code Reuse**: pydantic-ai handlers shared between modes
4. **Hackathon Ready**: Demo works without requiring paid keys
5. **Production Ready**: Full orchestration available when needed
6. **Future Proof**: Can add more agents to advanced mode
7. **Testable**: Simple mode is easier to unit test

---

## 10. Known Risks and Mitigations

> **From Senior Agent Review**

### 10.1 Bridge Complexity (MEDIUM)

**Risk:** In Advanced Mode, agents (Agent Framework) wrap handlers (pydantic-ai). Both are async. Context variables (`MagenticState`) must propagate correctly through the pydantic-ai call stack.

**Mitigation:**
- pydantic-ai uses standard Python `contextvars`, which naturally propagate through `await` chains
- Test context propagation explicitly in integration tests
- If issues arise, pass state explicitly rather than via context vars

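The first mitigation point can be checked with a stdlib-only sketch: a context variable set in the agent layer stays visible inside an awaited handler call, even across intermediate awaits. The names below are illustrative stand-ins, not the project's actual `MagenticState`:

```python
import asyncio
import contextvars

# Stand-in for MagenticState (illustrative, not the real class)
magentic_state: contextvars.ContextVar[dict] = contextvars.ContextVar("magentic_state")

async def handler_assess() -> str:
    # Deep in the (simulated) pydantic-ai call stack, the context var is still visible
    return magentic_state.get()["question"]

async def agent_run(question: str) -> str:
    magentic_state.set({"question": question})
    await asyncio.sleep(0)  # intermediate awaits do not break propagation
    return await handler_assess()

propagated = asyncio.run(agent_run("Does metformin affect longevity?"))
```

An integration test along these lines, run against the real bridge, would catch propagation regressions early.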
### 10.2 Integration Drift (MEDIUM)

**Risk:** Simple Mode and Advanced Mode might diverge in behavior over time (e.g., Simple Mode uses logic A, Advanced Mode uses logic B).

**Mitigation:**
- Both modes MUST call the exact same underlying Tools (`src/tools/*`) and Handlers (`src/agent_factory/*`)
- Handlers are the single source of truth for business logic
- Agents are thin wrappers that delegate to handlers

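The anti-drift rule can be illustrated with a toy sketch: because both modes route through one handler, their outputs cannot diverge. Class and method names here are simplified stand-ins for the real (async, pydantic-ai-backed) implementations:

```python
class JudgeHandler:
    """Single source of truth for assessment logic (toy stand-in)."""

    def assess(self, question: str, evidence: list[str]) -> dict:
        return {"question": question, "n_sources": len(evidence), "sufficient": len(evidence) >= 2}

class JudgeAgent:
    """Thin Advanced Mode wrapper: delegates, adds no logic of its own."""

    def __init__(self, handler: JudgeHandler) -> None:
        self._handler = handler

    def run(self, question: str, evidence: list[str]) -> dict:
        return self._handler.assess(question, evidence)

handler = JudgeHandler()
simple_result = handler.assess("q", ["pubmed:1", "trial:2"])             # Simple Mode path
advanced_result = JudgeAgent(handler).run("q", ["pubmed:1", "trial:2"])  # Advanced Mode path
```

If a wrapper ever adds its own branching, the two paths can drift; keeping wrappers logic-free is what makes the equality guarantee hold.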
### 10.3 Testing Burden (LOW-MEDIUM)

**Risk:** Maintaining two distinct orchestrators (`src/orchestrator.py` and `src/orchestrator_magentic.py`) doubles the integration-testing surface area.

**Mitigation:**
- Unit test handlers independently (shared code)
- Run integration tests for each mode separately
- End-to-end tests verify that both modes produce the same output for the same input (determinism permitting)

### 10.4 Dependency Conflicts (LOW)

**Risk:** `agent-framework-core` might conflict with `pydantic-ai`'s dependencies (e.g., different pydantic versions).

**Status:** Both depend on `pydantic>=2.x`, so they should be compatible.

---

## 11. Naming Clarification

> See `00_SITUATION_AND_PLAN.md` Section 4 for full details.

**Important:** The codebase uses "magentic" in file names (`orchestrator_magentic.py`, `magentic_agents.py`), but this refers to our internal naming for the Microsoft Agent Framework integration, **NOT** the `magentic` PyPI package.

**Future action:** Rename to `orchestrator_advanced.py` to eliminate confusion.