Tonic committed on
Commit ab33e9d · unverified · 2 parent(s): 3ab54ea ca3a4f7

Initial demo testing (#4)

This view is limited to 50 files because the commit contains too many changes.
Files changed (50)
  1. .github/README.md +46 -26
  2. .github/workflows/ci.yml +14 -14
  3. .github/workflows/docs.yml +56 -0
  4. .gitignore +4 -0
  5. .pre-commit-config.yaml +6 -16
  6. .pre-commit-hooks/run_pytest.ps1 +5 -0
  7. .pre-commit-hooks/run_pytest.sh +5 -0
  8. .pre-commit-hooks/run_pytest_embeddings.ps1 +14 -0
  9. .pre-commit-hooks/run_pytest_embeddings.sh +15 -0
  10. .pre-commit-hooks/run_pytest_unit.ps1 +14 -0
  11. .pre-commit-hooks/run_pytest_unit.sh +15 -0
  12. .pre-commit-hooks/run_pytest_with_sync.ps1 +25 -0
  13. .pre-commit-hooks/run_pytest_with_sync.py +93 -0
  14. =0.22.0 +0 -0
  15. =0.22.0, +0 -0
  16. CONTRIBUTING.md +0 -1
  17. Makefile +9 -0
  18. README.md +99 -173
  19. .cursorrules → dev/.cursorrules +1 -0
  20. AGENTS.txt → dev/AGENTS.txt +0 -0
  21. dev/Makefile +51 -0
  22. dev/docs_plugins.py +74 -0
  23. docs/CONFIGURATION.md +0 -301
  24. docs/api/agents.md +260 -0
  25. docs/api/models.md +238 -0
  26. docs/api/orchestrators.md +185 -0
  27. docs/api/services.md +191 -0
  28. docs/api/tools.md +225 -0
  29. docs/architecture/agents.md +182 -0
  30. docs/architecture/design-patterns.md +0 -1509
  31. docs/architecture/graph-orchestration.md +152 -0
  32. docs/architecture/graph_orchestration.md +8 -0
  33. docs/architecture/middleware.md +132 -0
  34. docs/architecture/orchestrators.md +198 -0
  35. docs/architecture/overview.md +0 -474
  36. docs/architecture/services.md +132 -0
  37. docs/architecture/tools.md +165 -0
  38. docs/architecture/workflow-diagrams.md +670 -0
  39. docs/{workflow-diagrams.md → architecture/workflows.md} +0 -0
  40. docs/brainstorming/00_ROADMAP_SUMMARY.md +0 -194
  41. docs/brainstorming/01_PUBMED_IMPROVEMENTS.md +0 -125
  42. docs/brainstorming/02_CLINICALTRIALS_IMPROVEMENTS.md +0 -193
  43. docs/brainstorming/03_EUROPEPMC_IMPROVEMENTS.md +0 -211
  44. docs/brainstorming/04_OPENALEX_INTEGRATION.md +0 -303
  45. docs/brainstorming/implementation/15_PHASE_OPENALEX.md +0 -603
  46. docs/brainstorming/implementation/16_PHASE_PUBMED_FULLTEXT.md +0 -586
  47. docs/brainstorming/implementation/17_PHASE_RATE_LIMITING.md +0 -540
  48. docs/brainstorming/implementation/README.md +0 -143
  49. docs/brainstorming/magentic-pydantic/00_SITUATION_AND_PLAN.md +0 -189
  50. docs/brainstorming/magentic-pydantic/01_ARCHITECTURE_SPEC.md +0 -289
.github/README.md CHANGED
@@ -7,7 +7,11 @@ sdk: gradio
7
  sdk_version: "6.0.1"
8
  python_version: "3.11"
9
  app_file: src/app.py
10
- pinned: false
11
  license: mit
12
  tags:
13
  - mcp-in-action-track-enterprise
@@ -19,6 +23,18 @@ tags:
19
  - modal
20
  ---
21
22
  # DeepCritical
23
 
24
  ## Intro
@@ -27,9 +43,10 @@ tags:
27
 
28
  - **Multi-Source Search**: PubMed, ClinicalTrials.gov, bioRxiv/medRxiv
29
  - **MCP Integration**: Use our tools from Claude Desktop or any MCP client
 
30
  - **Modal Sandbox**: Secure execution of AI-generated statistical code
31
  - **LlamaIndex RAG**: Semantic search and evidence synthesis
32
- - **HuggingfaceInference**:
33
  - **HuggingfaceMCP Custom Config To Use Community Tools**:
34
  - **Strongly Typed Composable Graphs**:
35
  - **Specialized Research Teams of Agents**:
@@ -55,7 +72,20 @@ uv run gradio run src/app.py
55
 
56
  Open your browser to `http://localhost:7860`.
57
 
58
- ### 3. Connect via MCP
59
 
60
  This application exposes a Model Context Protocol (MCP) server, allowing you to use its search tools directly from Claude Desktop or other MCP clients.
61
 
@@ -81,7 +111,13 @@ Add this to your `claude_desktop_config.json`:
81
  - `analyze_hypothesis`: Secure statistical analysis using Modal sandboxes.
82
 
83
 
84
- ## Deep Research Flows
85
 
86
  - iterativeResearch
87
  - deepResearch
@@ -89,6 +125,7 @@ Add this to your `claude_desktop_config.json`:
89
 
90
  ### Iterative Research
91
 
 
92
  sequenceDiagram
93
  participant IterativeFlow
94
  participant ThinkingAgent
@@ -121,10 +158,12 @@ sequenceDiagram
121
  JudgeHandler-->>IterativeFlow: should_continue
122
  end
123
  end
 
124
 
125
 
126
  ### Deep Research
127
 
 
128
  sequenceDiagram
129
  actor User
130
  participant GraphOrchestrator
@@ -159,8 +198,10 @@ sequenceDiagram
159
  end
160
 
161
  GraphOrchestrator->>User: AsyncGenerator[AgentEvent]
 
162
 
163
  ### Research Team
 
164
  Critical Deep Research Agent
165
 
166
  ## Development
@@ -177,27 +218,6 @@ uv run pytest
177
  make check
178
  ```
179
 
180
- ## Architecture
181
-
182
- DeepCritical uses a Vertical Slice Architecture:
183
-
184
- 1. **Search Slice**: Retrieving evidence from PubMed, ClinicalTrials.gov, and bioRxiv.
185
- 2. **Judge Slice**: Evaluating evidence quality using LLMs.
186
- 3. **Orchestrator Slice**: Managing the research loop and UI.
187
-
188
- Built with:
189
- - **PydanticAI**: For robust agent interactions.
190
- - **Gradio**: For the streaming user interface.
191
- - **PubMed, ClinicalTrials.gov, bioRxiv**: For biomedical data.
192
- - **MCP**: For universal tool access.
193
- - **Modal**: For secure code execution.
194
-
195
- ## Team
196
-
197
- - The-Obstacle-Is-The-Way
198
- - MarioAderman
199
- - Josephrp
200
-
201
  ## Links
202
 
203
- - [GitHub Repository](https://github.com/The-Obstacle-Is-The-Way/DeepCritical-1)
 
7
  sdk_version: "6.0.1"
8
  python_version: "3.11"
9
  app_file: src/app.py
10
+ hf_oauth: true
11
+ hf_oauth_expiration_minutes: 480
12
+ hf_oauth_scopes:
13
+ - inference-api
14
+ pinned: true
15
  license: mit
16
  tags:
17
  - mcp-in-action-track-enterprise
 
23
  - modal
24
  ---
25
 
26
+ <div align="center">
27
+
28
+ [![GitHub](https://img.shields.io/github/stars/DeepCritical/GradioDemo?style=for-the-badge&logo=github&logoColor=white&label=🐙%20GitHub&labelColor=181717&color=181717)](https://github.com/DeepCritical/GradioDemo)
29
+ [![Documentation](https://img.shields.io/badge/📚%20Docs-0080FF?style=for-the-badge&logo=readthedocs&logoColor=white&labelColor=0080FF&color=0080FF)](docs/index.md)
30
+ [![Demo](https://img.shields.io/badge/🚀%20Demo-FFD21E?style=for-the-badge&logo=huggingface&logoColor=white&labelColor=FFD21E&color=FFD21E)](https://huggingface.co/spaces/DataQuests/DeepCritical)
31
+ [![CodeCov](https://img.shields.io/badge/📊%20Coverage-F01F7A?style=for-the-badge&logo=codecov&logoColor=white&labelColor=F01F7A&color=F01F7A)](https://codecov.io/gh/DeepCritical/GradioDemo)
32
+ [![Join us on Discord](https://img.shields.io/discord/1109943800132010065?label=Discord&logo=discord&style=flat-square)](https://discord.gg/qdfnvSPcqP)
33
+
34
+
35
+ </div>
36
+
37
+
38
  # DeepCritical
39
 
40
  ## Intro
 
43
 
44
  - **Multi-Source Search**: PubMed, ClinicalTrials.gov, bioRxiv/medRxiv
45
  - **MCP Integration**: Use our tools from Claude Desktop or any MCP client
46
+ - **HuggingFace OAuth**: Sign in with your HuggingFace account to automatically use your API token
47
  - **Modal Sandbox**: Secure execution of AI-generated statistical code
48
  - **LlamaIndex RAG**: Semantic search and evidence synthesis
49
+ - **HuggingfaceInference**: Free tier support with automatic fallback
50
  - **HuggingfaceMCP Custom Config To Use Community Tools**:
51
  - **Strongly Typed Composable Graphs**:
52
  - **Specialized Research Teams of Agents**:
 
72
 
73
  Open your browser to `http://localhost:7860`.
74
 
75
+ ### 3. Authentication (Optional)
76
+
77
+ **HuggingFace OAuth Login**:
78
+ - Click the "Sign in with HuggingFace" button at the top of the app
79
+ - Your HuggingFace API token will be automatically used for AI inference
80
+ - No need to manually enter API keys when logged in
81
+ - OAuth token is used only for the current session and never stored
82
+
83
+ **Manual API Key (BYOK)**:
84
+ - You can still provide your own API key in the Settings accordion
85
+ - Supports HuggingFace, OpenAI, or Anthropic API keys
86
+ - Manual keys take priority over OAuth tokens
87
+
88
+ ### 4. Connect via MCP
89
 
90
  This application exposes a Model Context Protocol (MCP) server, allowing you to use its search tools directly from Claude Desktop or other MCP clients.
91
 
 
111
  - `analyze_hypothesis`: Secure statistical analysis using Modal sandboxes.
112
 
113
 
114
+ ## Architecture
115
+
116
+ DeepCritical uses a Vertical Slice Architecture:
117
+
118
+ 1. **Search Slice**: Retrieving evidence from PubMed, ClinicalTrials.gov, and bioRxiv.
119
+ 2. **Judge Slice**: Evaluating evidence quality using LLMs.
120
+ 3. **Orchestrator Slice**: Managing the research loop and UI.
121
 
122
  - iterativeResearch
123
  - deepResearch
 
125
 
126
  ### Iterative Research
127
 
128
+ ```mermaid
129
  sequenceDiagram
130
  participant IterativeFlow
131
  participant ThinkingAgent
 
158
  JudgeHandler-->>IterativeFlow: should_continue
159
  end
160
  end
161
+ ```
162
 
163
 
164
  ### Deep Research
165
 
166
+ ```mermaid
167
  sequenceDiagram
168
  actor User
169
  participant GraphOrchestrator
 
198
  end
199
 
200
  GraphOrchestrator->>User: AsyncGenerator[AgentEvent]
201
+ ```
202
 
203
  ### Research Team
204
+
205
  Critical Deep Research Agent
206
 
207
  ## Development
 
218
  make check
219
  ```
220
 
221
  ## Links
222
 
223
+ - [GitHub Repository](https://github.com/DeepCritical/GradioDemo)
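
For quick reference, the MCP client wiring this README describes (shown verbatim in the pre-change README and unchanged by this diff) reduces to a single `claude_desktop_config.json` entry:

```json
{
  "mcpServers": {
    "deepcritical": {
      "url": "http://localhost:7860/gradio_api/mcp/"
    }
  }
}
```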
.github/workflows/ci.yml CHANGED
@@ -16,6 +16,11 @@ jobs:
16
  steps:
17
  - uses: actions/checkout@v4
18
 
19
  - name: Set up Python ${{ matrix.python-version }}
20
  uses: actions/setup-python@v5
21
  with:
@@ -23,45 +28,40 @@ jobs:
23
 
24
  - name: Install dependencies
25
  run: |
26
- python -m pip install --upgrade pip
27
- pip install -e ".[dev]"
28
 
29
  - name: Lint with ruff
30
  run: |
31
- ruff check . --exclude tests
32
- ruff format --check . --exclude tests
33
 
34
  - name: Type check with mypy
35
  run: |
36
- mypy src
37
-
38
- - name: Install embedding dependencies
39
- run: |
40
- pip install -e ".[embeddings]"
41
 
42
- - name: Run unit tests (excluding OpenAI and embedding providers)
43
  env:
44
  HF_TOKEN: ${{ secrets.HF_TOKEN }}
45
  run: |
46
- pytest tests/unit/ -v -m "not openai and not embedding_provider" --tb=short -p no:logfire
47
 
48
  - name: Run local embeddings tests
49
  env:
50
  HF_TOKEN: ${{ secrets.HF_TOKEN }}
51
  run: |
52
- pytest tests/ -v -m "local_embeddings" --tb=short -p no:logfire || true
53
  continue-on-error: true # Allow failures if dependencies not available
54
 
55
  - name: Run HuggingFace integration tests
56
  env:
57
  HF_TOKEN: ${{ secrets.HF_TOKEN }}
58
  run: |
59
- pytest tests/integration/ -v -m "huggingface and not embedding_provider" --tb=short -p no:logfire || true
60
  continue-on-error: true # Allow failures if HF_TOKEN not set
61
 
62
  - name: Run non-OpenAI integration tests (excluding embedding providers)
63
  env:
64
  HF_TOKEN: ${{ secrets.HF_TOKEN }}
65
  run: |
66
- pytest tests/integration/ -v -m "integration and not openai and not embedding_provider" --tb=short -p no:logfire || true
67
  continue-on-error: true # Allow failures if dependencies not available
 
16
  steps:
17
  - uses: actions/checkout@v4
18
 
19
+ - name: Install uv
20
+ uses: astral-sh/setup-uv@v5
21
+ with:
22
+ version: "latest"
23
+
24
  - name: Set up Python ${{ matrix.python-version }}
25
  uses: actions/setup-python@v5
26
  with:
 
28
 
29
  - name: Install dependencies
30
  run: |
31
+ uv sync --dev
 
32
 
33
  - name: Lint with ruff
34
  run: |
35
+ uv run ruff check . --exclude tests
36
+ uv run ruff format --check . --exclude tests
37
 
38
  - name: Type check with mypy
39
  run: |
40
+ uv run mypy src
41
 
42
+ - name: Run unit tests (no black-box APIs)
43
  env:
44
  HF_TOKEN: ${{ secrets.HF_TOKEN }}
45
  run: |
46
+ uv run pytest tests/unit/ -v -m "not openai and not embedding_provider" --tb=short -p no:logfire
47
 
48
  - name: Run local embeddings tests
49
  env:
50
  HF_TOKEN: ${{ secrets.HF_TOKEN }}
51
  run: |
52
+ uv run pytest tests/ -v -m "local_embeddings" --tb=short -p no:logfire || true
53
  continue-on-error: true # Allow failures if dependencies not available
54
 
55
  - name: Run HuggingFace integration tests
56
  env:
57
  HF_TOKEN: ${{ secrets.HF_TOKEN }}
58
  run: |
59
+ uv run pytest tests/integration/ -v -m "huggingface and not embedding_provider" --tb=short -p no:logfire || true
60
  continue-on-error: true # Allow failures if HF_TOKEN not set
61
 
62
  - name: Run non-OpenAI integration tests (excluding embedding providers)
63
  env:
64
  HF_TOKEN: ${{ secrets.HF_TOKEN }}
65
  run: |
66
+ uv run pytest tests/integration/ -v -m "integration and not openai and not embedding_provider" --tb=short -p no:logfire || true
67
  continue-on-error: true # Allow failures if dependencies not available
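
The marker expressions in these steps (`openai`, `embedding_provider`, `local_embeddings`, `huggingface`, `integration`) assume the markers are registered with pytest. A sketch of what that registration might look like in `pyproject.toml` (the actual file is not part of this diff; the descriptions are assumptions):

```toml
[tool.pytest.ini_options]
markers = [
    "openai: tests that call the OpenAI API",
    "embedding_provider: tests that hit a remote embedding provider",
    "local_embeddings: tests using locally installed embedding models",
    "huggingface: tests that call HuggingFace inference",
    "integration: end-to-end integration tests",
]
```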
.github/workflows/docs.yml ADDED
@@ -0,0 +1,56 @@
1
+ name: Documentation
2
+
3
+ on:
4
+ push:
5
+ branches:
6
+ - main
7
+ paths:
8
+ - 'docs/**'
9
+ - 'mkdocs.yml'
10
+ - '.github/workflows/docs.yml'
11
+ pull_request:
12
+ branches:
13
+ - main
14
+ paths:
15
+ - 'docs/**'
16
+ - 'mkdocs.yml'
17
+ - '.github/workflows/docs.yml'
18
+ workflow_dispatch:
19
+
20
+ permissions:
21
+ contents: write
22
+
23
+ jobs:
24
+ build:
25
+ runs-on: ubuntu-latest
26
+ steps:
27
+ - uses: actions/checkout@v4
28
+
29
+ - name: Set up Python
30
+ uses: actions/setup-python@v5
31
+ with:
32
+ python-version: '3.11'
33
+
34
+ - name: Install uv
35
+ run: |
36
+ pip install uv
37
+
38
+ - name: Install dependencies
39
+ run: |
40
+ uv sync --all-extras --dev
41
+
42
+ - name: Build documentation
43
+ run: |
44
+ uv run mkdocs build --strict
45
+
46
+ - name: Deploy to GitHub Pages
47
+ if: github.ref == 'refs/heads/main' && github.event_name == 'push'
48
+ uses: peaceiris/actions-gh-pages@v3
49
+ with:
50
+ github_token: ${{ secrets.GITHUB_TOKEN }}
51
+ publish_dir: ./site
52
+ cname: false
53
+
54
+
55
+
56
+
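
The `mkdocs build --strict` step requires an `mkdocs.yml` at the repository root; the workflow's `paths` filter references it but the file itself is not shown in this diff. A minimal sketch of such a config (`site_name`, theme, and nav entries are assumptions):

```yaml
site_name: DeepCritical
theme:
  name: material
nav:
  - Home: index.md
  - Architecture: architecture/overview.md
```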
.gitignore CHANGED
@@ -1,6 +1,10 @@
 
 
1
  folder/
 
2
  .cursor/
3
  .ruff_cache/
 
4
  # Python
5
  __pycache__/
6
  *.py[cod]
 
1
+ =0.22.0
2
+ =0.22.0,
3
  folder/
4
+ site/
5
  .cursor/
6
  .ruff_cache/
7
+ docs/contributing/
8
  # Python
9
  __pycache__/
10
  *.py[cod]
.pre-commit-config.yaml CHANGED
@@ -31,14 +31,9 @@ repos:
31
  types: [python]
32
  args: [
33
  "run",
34
- "pytest",
35
- "tests/unit/",
36
- "-v",
37
- "-m",
38
- "not openai and not embedding_provider",
39
- "--tb=short",
40
- "-p",
41
- "no:logfire",
42
  ]
43
  pass_filenames: false
44
  always_run: true
@@ -50,14 +45,9 @@ repos:
50
  types: [python]
51
  args: [
52
  "run",
53
- "pytest",
54
- "tests/",
55
- "-v",
56
- "-m",
57
- "local_embeddings",
58
- "--tb=short",
59
- "-p",
60
- "no:logfire",
61
  ]
62
  pass_filenames: false
63
  always_run: true
 
31
  types: [python]
32
  args: [
33
  "run",
34
+ "python",
35
+ ".pre-commit-hooks/run_pytest_with_sync.py",
36
+ "unit",
37
  ]
38
  pass_filenames: false
39
  always_run: true
 
45
  types: [python]
46
  args: [
47
  "run",
48
+ "python",
49
+ ".pre-commit-hooks/run_pytest_with_sync.py",
50
+ "embeddings",
51
  ]
52
  pass_filenames: false
53
  always_run: true
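
Putting the surviving fragments together, each local hook now dispatches through the sync-aware runner with a single test-type argument. A reconstructed sketch of one full hook entry (the `id`, `name`, and `entry` fields are not visible in this hunk and are assumptions; only the `args` list is shown in the diff):

```yaml
- repo: local
  hooks:
    - id: pytest-unit          # assumed id; not shown in this hunk
      name: Run unit tests     # assumed name
      entry: uv                # assumed entry; the visible args begin at "run"
      language: system
      types: [python]
      args: ["run", "python", ".pre-commit-hooks/run_pytest_with_sync.py", "unit"]
      pass_filenames: false
      always_run: true
```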
.pre-commit-hooks/run_pytest.ps1 CHANGED
@@ -2,6 +2,8 @@
2
  # Uses uv if available, otherwise falls back to pytest
3
 
4
  if (Get-Command uv -ErrorAction SilentlyContinue) {
 
 
5
  uv run pytest $args
6
  } else {
7
  Write-Warning "uv not found, using system pytest (may have missing dependencies)"
@@ -12,3 +14,6 @@ if (Get-Command uv -ErrorAction SilentlyContinue) {
12
 
13
 
14
 
2
  # Uses uv if available, otherwise falls back to pytest
3
 
4
  if (Get-Command uv -ErrorAction SilentlyContinue) {
5
+ # Sync dependencies before running tests
6
+ uv sync
7
  uv run pytest $args
8
  } else {
9
  Write-Warning "uv not found, using system pytest (may have missing dependencies)"
 
14
 
15
 
16
 
17
+
18
+
19
+
.pre-commit-hooks/run_pytest.sh CHANGED
@@ -3,6 +3,8 @@
3
  # Uses uv if available, otherwise falls back to pytest
4
 
5
  if command -v uv >/dev/null 2>&1; then
 
 
6
  uv run pytest "$@"
7
  else
8
  echo "Warning: uv not found, using system pytest (may have missing dependencies)"
@@ -13,3 +15,6 @@ fi
13
 
14
 
15
 
3
  # Uses uv if available, otherwise falls back to pytest
4
 
5
  if command -v uv >/dev/null 2>&1; then
6
+ # Sync dependencies before running tests
7
+ uv sync
8
  uv run pytest "$@"
9
  else
10
  echo "Warning: uv not found, using system pytest (may have missing dependencies)"
 
15
 
16
 
17
 
18
+
19
+
20
+
.pre-commit-hooks/run_pytest_embeddings.ps1 ADDED
@@ -0,0 +1,14 @@
1
+ # PowerShell wrapper to sync embeddings dependencies and run embeddings tests
2
+
3
+ $ErrorActionPreference = "Stop"
4
+
5
+ if (Get-Command uv -ErrorAction SilentlyContinue) {
6
+ Write-Host "Syncing embeddings dependencies..."
7
+ uv sync --extra embeddings
8
+ Write-Host "Running embeddings tests..."
9
+ uv run pytest tests/ -v -m local_embeddings --tb=short -p no:logfire
10
+ } else {
11
+ Write-Error "uv not found"
12
+ exit 1
13
+ }
14
+
.pre-commit-hooks/run_pytest_embeddings.sh ADDED
@@ -0,0 +1,15 @@
1
+ #!/bin/bash
2
+ # Wrapper script to sync embeddings dependencies and run embeddings tests
3
+
4
+ set -e
5
+
6
+ if command -v uv >/dev/null 2>&1; then
7
+ echo "Syncing embeddings dependencies..."
8
+ uv sync --extra embeddings
9
+ echo "Running embeddings tests..."
10
+ uv run pytest tests/ -v -m local_embeddings --tb=short -p no:logfire
11
+ else
12
+ echo "Error: uv not found"
13
+ exit 1
14
+ fi
15
+
.pre-commit-hooks/run_pytest_unit.ps1 ADDED
@@ -0,0 +1,14 @@
1
+ # PowerShell wrapper to sync dependencies and run unit tests
2
+
3
+ $ErrorActionPreference = "Stop"
4
+
5
+ if (Get-Command uv -ErrorAction SilentlyContinue) {
6
+ Write-Host "Syncing dependencies..."
7
+ uv sync
8
+ Write-Host "Running unit tests..."
9
+ uv run pytest tests/unit/ -v -m "not openai and not embedding_provider" --tb=short -p no:logfire
10
+ } else {
11
+ Write-Error "uv not found"
12
+ exit 1
13
+ }
14
+
.pre-commit-hooks/run_pytest_unit.sh ADDED
@@ -0,0 +1,15 @@
1
+ #!/bin/bash
2
+ # Wrapper script to sync dependencies and run unit tests
3
+
4
+ set -e
5
+
6
+ if command -v uv >/dev/null 2>&1; then
7
+ echo "Syncing dependencies..."
8
+ uv sync
9
+ echo "Running unit tests..."
10
+ uv run pytest tests/unit/ -v -m "not openai and not embedding_provider" --tb=short -p no:logfire
11
+ else
12
+ echo "Error: uv not found"
13
+ exit 1
14
+ fi
15
+
.pre-commit-hooks/run_pytest_with_sync.ps1 ADDED
@@ -0,0 +1,25 @@
1
+ # PowerShell wrapper for pytest runner
2
+ # Ensures uv is available and runs the Python script
3
+
4
+ param(
5
+ [Parameter(Position=0)]
6
+ [string]$TestType = "unit"
7
+ )
8
+
9
+ $ErrorActionPreference = "Stop"
10
+
11
+ # Check if uv is available
12
+ if (-not (Get-Command uv -ErrorAction SilentlyContinue)) {
13
+ Write-Error "uv not found. Please install uv: https://github.com/astral-sh/uv"
14
+ exit 1
15
+ }
16
+
17
+ # Get the script directory
18
+ $ScriptDir = Split-Path -Parent $MyInvocation.MyCommand.Path
19
+ $PythonScript = Join-Path $ScriptDir "run_pytest_with_sync.py"
20
+
21
+ # Run the Python script using uv
22
+ uv run python $PythonScript $TestType
23
+
24
+ exit $LASTEXITCODE
25
+
.pre-commit-hooks/run_pytest_with_sync.py ADDED
@@ -0,0 +1,93 @@
1
+ #!/usr/bin/env python3
2
+ """Cross-platform pytest runner that syncs dependencies before running tests."""
3
+
4
+ import subprocess
5
+ import sys
6
+
7
+
8
+ def run_command(
9
+ cmd: list[str], check: bool = True, shell: bool = False, cwd: str | None = None
10
+ ) -> int:
11
+ """Run a command and return exit code."""
12
+ try:
13
+ result = subprocess.run(
14
+ cmd,
15
+ check=check,
16
+ shell=shell,
17
+ cwd=cwd,
18
+ env=None, # Use current environment, uv will handle venv
19
+ )
20
+ return result.returncode
21
+ except subprocess.CalledProcessError as e:
22
+ return e.returncode
23
+ except FileNotFoundError:
24
+ print(f"Error: Command not found: {cmd[0]}")
25
+ return 1
26
+
27
+
28
+ def main() -> int:
29
+ """Main entry point."""
30
+ import os
31
+ from pathlib import Path
32
+
33
+ # Get the project root (where pyproject.toml is)
34
+ script_dir = Path(__file__).parent
35
+ project_root = script_dir.parent
36
+
37
+ # Change to project root to ensure uv works correctly
38
+ os.chdir(project_root)
39
+
40
+ # Check if uv is available
41
+ if run_command(["uv", "--version"], check=False) != 0:
42
+ print("Error: uv not found. Please install uv: https://github.com/astral-sh/uv")
43
+ return 1
44
+
45
+ # Parse arguments
46
+ test_type = sys.argv[1] if len(sys.argv) > 1 else "unit"
47
+ extra_args = sys.argv[2:] if len(sys.argv) > 2 else []
48
+
49
+ # Sync dependencies - always include dev
50
+ # Note: embeddings dependencies are now in main dependencies, not optional
51
+ # So we just sync with --dev for all test types
52
+ sync_cmd = ["uv", "sync", "--dev"]
53
+
54
+ print(f"Syncing dependencies for {test_type} tests...")
55
+ if run_command(sync_cmd, cwd=project_root) != 0:
56
+ return 1
57
+
58
+ # Build pytest command - use uv run to ensure correct environment
59
+ if test_type == "unit":
60
+ pytest_args = [
61
+ "tests/unit/",
62
+ "-v",
63
+ "-m",
64
+ "not openai and not embedding_provider",
65
+ "--tb=short",
66
+ "-p",
67
+ "no:logfire",
68
+ ]
69
+ elif test_type == "embeddings":
70
+ pytest_args = [
71
+ "tests/",
72
+ "-v",
73
+ "-m",
74
+ "local_embeddings",
75
+ "--tb=short",
76
+ "-p",
77
+ "no:logfire",
78
+ ]
79
+ else:
80
+ pytest_args = []
81
+
82
+ pytest_args.extend(extra_args)
83
+
84
+ # Use uv run python -m pytest to ensure we use the venv's pytest
85
+ # This is more reliable than uv run pytest which might find system pytest
86
+ pytest_cmd = ["uv", "run", "python", "-m", "pytest", *pytest_args]
87
+
88
+ print(f"Running {test_type} tests...")
89
+ return run_command(pytest_cmd, cwd=project_root)
90
+
91
+
92
+ if __name__ == "__main__":
93
+ sys.exit(main())
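
The script's `run_command` helper maps both failure modes to exit codes instead of raising; a minimal self-contained sketch of that behavior (mirroring the function above, trimmed of the `shell`/`cwd`/`env` plumbing):

```python
import subprocess


def run_command(cmd: list[str], check: bool = True) -> int:
    """Run a command, mapping CalledProcessError and a missing
    executable to nonzero exit codes instead of exceptions."""
    try:
        return subprocess.run(cmd, check=check).returncode
    except subprocess.CalledProcessError as e:
        return e.returncode
    except FileNotFoundError:
        print(f"Error: Command not found: {cmd[0]}")
        return 1


# A missing binary is reported and mapped to exit code 1
print(run_command(["definitely-not-a-real-binary"]))  # → 1
```

This is why the script can probe for `uv` with `run_command(["uv", "--version"], check=False)` and branch on the return code rather than wrapping the call in its own try/except.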
=0.22.0 ADDED
File without changes
=0.22.0, ADDED
File without changes
CONTRIBUTING.md DELETED
@@ -1 +0,0 @@
1
- make sure you run the full pre-commit checks before opening a PR (not draft), otherwise Obstacle is the Way will lose his mind
 
 
Makefile CHANGED
@@ -37,6 +37,15 @@ typecheck:
37
  check: lint typecheck test-cov
38
  @echo "All checks passed!"
39
 
40
  clean:
41
  rm -rf .pytest_cache .mypy_cache .ruff_cache __pycache__ .coverage htmlcov
42
  find . -type d -name "__pycache__" -exec rm -rf {} + 2>/dev/null || true
 
37
  check: lint typecheck test-cov
38
  @echo "All checks passed!"
39
 
40
+ docs-build:
41
+ uv run mkdocs build
42
+
43
+ docs-serve:
44
+ uv run mkdocs serve
45
+
46
+ docs-clean:
47
+ rm -rf site/
48
+
49
  clean:
50
  rm -rf .pytest_cache .mypy_cache .ruff_cache __pycache__ .coverage htmlcov
51
  find . -type d -name "__pycache__" -exec rm -rf {} + 2>/dev/null || true
README.md CHANGED
@@ -1,13 +1,17 @@
1
  ---
2
- title: DeepCritical
3
- emoji: 🧬
4
- colorFrom: blue
5
- colorTo: purple
6
  sdk: gradio
7
  sdk_version: "6.0.1"
8
  python_version: "3.11"
9
  app_file: src/app.py
10
- pinned: false
 
 
 
 
11
  license: mit
12
  tags:
13
  - mcp-in-action-track-enterprise
@@ -19,178 +23,100 @@ tags:
19
  - modal
20
  ---
21
22
  # DeepCritical
23
 
24
- ## Intro
25
-
26
- ## Features
27
-
28
- - **Multi-Source Search**: PubMed, ClinicalTrials.gov, bioRxiv/medRxiv
29
- - **MCP Integration**: Use our tools from Claude Desktop or any MCP client
30
- - **Modal Sandbox**: Secure execution of AI-generated statistical code
31
- - **LlamaIndex RAG**: Semantic search and evidence synthesis
32
- - **HuggingfaceInference**:
33
- - **HuggingfaceMCP Custom Config To Use Community Tools**:
34
- - **Strongly Typed Composable Graphs**:
35
- - **Specialized Research Teams of Agents**:
36
-
37
- ## Quick Start
38
-
39
- ### 1. Environment Setup
40
-
41
- ```bash
42
- # Install uv if you haven't already
43
- pip install uv
44
-
45
- # Sync dependencies
46
- uv sync
47
- ```
48
-
49
- ### 2. Run the UI
50
-
51
- ```bash
52
- # Start the Gradio app
53
- uv run gradio run src/app.py
54
- ```
55
-
56
- Open your browser to `http://localhost:7860`.
57
-
58
- ### 3. Connect via MCP
59
-
60
- This application exposes a Model Context Protocol (MCP) server, allowing you to use its search tools directly from Claude Desktop or other MCP clients.
61
-
62
- **MCP Server URL**: `http://localhost:7860/gradio_api/mcp/`
63
-
64
- **Claude Desktop Configuration**:
65
- Add this to your `claude_desktop_config.json`:
66
- ```json
67
- {
68
- "mcpServers": {
69
- "deepcritical": {
70
- "url": "http://localhost:7860/gradio_api/mcp/"
71
- }
72
- }
73
- }
74
- ```
75
-
76
- **Available Tools**:
77
- - `search_pubmed`: Search peer-reviewed biomedical literature.
78
- - `search_clinical_trials`: Search ClinicalTrials.gov.
79
- - `search_biorxiv`: Search bioRxiv/medRxiv preprints.
80
- - `search_all`: Search all sources simultaneously.
81
- - `analyze_hypothesis`: Secure statistical analysis using Modal sandboxes.
82
-
83
-
84
-
85
- ## Architecture
86
-
87
- DeepCritical uses a Vertical Slice Architecture:
88
-
89
- 1. **Search Slice**: Retrieving evidence from PubMed, ClinicalTrials.gov, and bioRxiv.
90
- 2. **Judge Slice**: Evaluating evidence quality using LLMs.
91
- 3. **Orchestrator Slice**: Managing the research loop and UI.
92
-
93
- - iterativeResearch
94
- - deepResearch
95
- - researchTeam
96
-
97
- ### Iterative Research
98
-
99
- sequenceDiagram
100
- participant IterativeFlow
101
- participant ThinkingAgent
102
- participant KnowledgeGapAgent
103
- participant ToolSelector
104
- participant ToolExecutor
105
- participant JudgeHandler
106
- participant WriterAgent
107
-
108
- IterativeFlow->>IterativeFlow: run(query)
109
-
110
- loop Until complete or max_iterations
111
- IterativeFlow->>ThinkingAgent: generate_observations()
112
- ThinkingAgent-->>IterativeFlow: observations
113
-
114
- IterativeFlow->>KnowledgeGapAgent: evaluate_gaps()
115
- KnowledgeGapAgent-->>IterativeFlow: KnowledgeGapOutput
116
-
117
- alt Research complete
118
- IterativeFlow->>WriterAgent: create_final_report()
119
- WriterAgent-->>IterativeFlow: final_report
120
- else Gaps remain
121
- IterativeFlow->>ToolSelector: select_agents(gap)
122
- ToolSelector-->>IterativeFlow: AgentSelectionPlan
123
-
124
- IterativeFlow->>ToolExecutor: execute_tool_tasks()
125
- ToolExecutor-->>IterativeFlow: ToolAgentOutput[]
126
-
127
- IterativeFlow->>JudgeHandler: assess_evidence()
128
- JudgeHandler-->>IterativeFlow: should_continue
129
- end
130
- end
131
-
132
-
133
- ### Deep Research
134
-
135
- sequenceDiagram
136
- actor User
137
- participant GraphOrchestrator
138
- participant InputParser
139
- participant GraphBuilder
140
- participant GraphExecutor
141
- participant Agent
142
- participant BudgetTracker
143
- participant WorkflowState
144
-
145
- User->>GraphOrchestrator: run(query)
146
- GraphOrchestrator->>InputParser: detect_research_mode(query)
147
- InputParser-->>GraphOrchestrator: mode (iterative/deep)
148
- GraphOrchestrator->>GraphBuilder: build_graph(mode)
149
- GraphBuilder-->>GraphOrchestrator: ResearchGraph
150
- GraphOrchestrator->>WorkflowState: init_workflow_state()
151
- GraphOrchestrator->>BudgetTracker: create_budget()
152
- GraphOrchestrator->>GraphExecutor: _execute_graph(graph)
153
-
154
- loop For each node in graph
155
- GraphExecutor->>Agent: execute_node(agent_node)
156
- Agent->>Agent: process_input
157
- Agent-->>GraphExecutor: result
158
- GraphExecutor->>WorkflowState: update_state(result)
159
- GraphExecutor->>BudgetTracker: add_tokens(used)
160
- GraphExecutor->>BudgetTracker: check_budget()
161
- alt Budget exceeded
162
- GraphExecutor->>GraphOrchestrator: emit(error_event)
163
- else Continue
164
- GraphExecutor->>GraphOrchestrator: emit(progress_event)
165
- end
166
- end
167
-
168
- GraphOrchestrator->>User: AsyncGenerator[AgentEvent]
169
-
170
- ### Research Team
171
-
172
- Critical Deep Research Agent
173
-
174
- ## Development
175
-
176
- ### Run Tests
177
-
178
- ```bash
179
- uv run pytest
180
- ```
181
-
182
- ### Run Checks
183
-
184
- ```bash
185
- make check
186
- ```
187
-
188
- ## Join Us
189
-
190
- - The-Obstacle-Is-The-Way
191
  - MarioAderman
192
  - Josephrp
193
 
194
  ## Links
195
 
196
- - [GitHub Repository](https://github.com/The-Obstacle-Is-The-Way/DeepCritical-1)
1
  ---
2
+ title: Critical Deep Research
3
+ emoji: 🐉
4
+ colorFrom: red
5
+ colorTo: yellow
6
  sdk: gradio
7
  sdk_version: "6.0.1"
8
  python_version: "3.11"
9
  app_file: src/app.py
10
+ hf_oauth: true
11
+ hf_oauth_expiration_minutes: 480
12
+ hf_oauth_scopes:
13
+ - inference-api
14
+ pinned: true
15
  license: mit
16
  tags:
17
  - mcp-in-action-track-enterprise
 
23
  - modal
24
  ---
25
 
26
+ > [!IMPORTANT]
27
+ > **You are reading the Gradio Demo README!**
28
+ >
29
+ > - 📚 **Documentation**: See our [technical documentation](docs/index.md) for detailed information
30
+ > - 📖 **Complete README**: Check out the [full README](.github/README.md) for setup, configuration, and contribution guidelines
31
+ > - 🏆 **Hackathon Submission**: Keep reading below for more information about our MCP Hackathon submission
32
+
33
+ <div align="center">
34
+
35
+ [![GitHub](https://img.shields.io/github/stars/DeepCritical/GradioDemo?style=for-the-badge&logo=github&logoColor=white&label=🐙%20GitHub&labelColor=181717&color=181717)](https://github.com/DeepCritical/GradioDemo)
36
+ [![Documentation](https://img.shields.io/badge/📚%20Docs-0080FF?style=for-the-badge&logo=readthedocs&logoColor=white&labelColor=0080FF&color=0080FF)](docs/index.md)
37
+ [![Demo](https://img.shields.io/badge/🚀%20Demo-FFD21E?style=for-the-badge&logo=huggingface&logoColor=white&labelColor=FFD21E&color=FFD21E)](https://huggingface.co/spaces/DataQuests/DeepCritical)
38
+ [![CodeCov](https://img.shields.io/badge/📊%20Coverage-F01F7A?style=for-the-badge&logo=codecov&logoColor=white&labelColor=F01F7A&color=F01F7A)](https://codecov.io/gh/DeepCritical/GradioDemo)
39
+ [![Join us on Discord](https://img.shields.io/discord/1109943800132010065?label=Discord&logo=discord&style=flat-square)](https://discord.gg/qdfnvSPcqP)
40
+
41
+
42
+ </div>
43
+
44
  # DeepCritical
45
 
46
+ ## About
47
+
48
+ The [Deep Critical Gradio Hackathon Team](#team) met online in the Alzheimer's Critical Literature Review Group in the Hugging Science initiative. We're building the agent framework we want to use for AI-assisted research to [turn the vast amounts of clinical data into cures](https://github.com/DeepCritical/GradioDemo).
49
+
50
+ For this hackathon we're proposing a simple yet powerful Deep Research Agent that iteratively looks for the answer until it finds it, using general-purpose web search and special-purpose retrievers for technical sources.
51
+
52
+ ## Deep Critical in the Media
53
+
54
+ - Social media posts about Deep Critical:
55
+ -
56
+ -
57
+ -
58
+ -
59
+ -
60
+ -
61
+ -
62
+
63
+ ## Important information
64
+
65
+ - **[readme](.github/README.md)**: configure, deploy, contribute, and learn more here.
66
+ - **[docs](docs/index.md)**: want to know how all this works? Read our detailed technical documentation here.
67
+ - **[demo](https://huggingface.co/spaces/DataQuests/DeepCritical)**: Try our demo on Hugging Face
68
+ - **[team](#team)**: Join us, or follow us!
69
+ - **[video]**: See our demo video
70
+
71
+ ## Future Developments
72
+
73
+ - [ ] Apply Deep Research Systems To Generate Short Form Video (up to 5 minutes)
74
+ - [ ] Visualize Pydantic Graphs as Loading Screens in the UI
75
+ - [ ] Improve Data Science with more Complex Graph Agents
76
+ - [ ] Create Deep Critical Drug Repurposing / Discovery Demo
77
+ - [ ] Create Deep Critical Literature Review
78
+ - [ ] Create Deep Critical Hypothesis Generator
79
+
80
+ ## Completed
81
+
82
+ - [x] **Multi-Source Search**: PubMed, ClinicalTrials.gov, bioRxiv/medRxiv
83
+ - [x] **MCP Integration**: Use our tools from Claude Desktop or any MCP client
84
+ - [x] **HuggingFace OAuth**: Sign in with HuggingFace
85
+ - [x] **Modal Sandbox**: Secure execution of AI-generated statistical code
86
+ - [x] **LlamaIndex RAG**: Semantic search and evidence synthesis
87
+ - [x] **HuggingFace Inference**
88
+ - [x] **HuggingFace MCP Custom Config To Use Community Tools**
89
+ - [x] **Strongly Typed Composable Graphs**
90
+ - [x] **Specialized Research Teams of Agents**
91
+
92
+
93
+
94
+ ### Team
95
+
96
+ - ZJ
97
  - MarioAderman
98
  - Josephrp
99
 
100
+
101
+ ## Acknowledgements
102
+
103
+ - McSwaggins
104
+ - Magentic
105
+ - HuggingFace
106
+ - Gradio
107
+ - DeepCritical
108
+ - Sponsors
109
+ - Microsoft
110
+ - Pydantic
111
+ - LlamaIndex
112
+ - Anthropic/MCP
113
+ - List of Tools Makers
114
+
115
+
116
  ## Links
117
 
118
+ [![GitHub](https://img.shields.io/github/stars/DeepCritical/GradioDemo?style=for-the-badge&logo=github&logoColor=white&label=🐙%20GitHub&labelColor=181717&color=181717)](https://github.com/DeepCritical/GradioDemo)
119
+ [![Documentation](https://img.shields.io/badge/📚%20Docs-0080FF?style=for-the-badge&logo=readthedocs&logoColor=white&labelColor=0080FF&color=0080FF)](docs/index.md)
120
+ [![Demo](https://img.shields.io/badge/🚀%20Demo-FFD21E?style=for-the-badge&logo=huggingface&logoColor=white&labelColor=FFD21E&color=FFD21E)](https://huggingface.co/spaces/DataQuests/DeepCritical)
121
+ [![CodeCov](https://img.shields.io/badge/📊%20Coverage-F01F7A?style=for-the-badge&logo=codecov&logoColor=white&labelColor=F01F7A&color=F01F7A)](https://codecov.io/gh/DeepCritical/GradioDemo)
122
+ [![Join us on Discord](https://img.shields.io/discord/1109943800132010065?label=Discord&logo=discord&style=flat-square)](https://discord.gg/qdfnvSPcqP)
.cursorrules → dev/.cursorrules RENAMED
@@ -238,3 +238,4 @@
238
 
239
 
240
 
 
 
238
 
239
 
240
 
241
+
AGENTS.txt → dev/AGENTS.txt RENAMED
File without changes
dev/Makefile ADDED
@@ -0,0 +1,51 @@
1
+ .PHONY: install test test-hf test-all lint format typecheck check clean all cov test-cov cov-html docs-build docs-serve docs-clean
2
+
3
+ # Default target
4
+ all: check
5
+
6
+ install:
7
+ uv sync --all-extras
8
+ uv run pre-commit install
9
+
10
+ test:
11
+ uv run pytest tests/unit/ -v -m "not openai" -p no:logfire
12
+
13
+ test-hf:
14
+ uv run pytest tests/ -v -m "huggingface" -p no:logfire
15
+
16
+ test-all:
17
+ uv run pytest tests/ -v -p no:logfire
18
+
19
+ # Coverage aliases
20
+ cov: test-cov
21
+ test-cov:
22
+ uv run pytest --cov=src --cov-report=term-missing -m "not openai" -p no:logfire
23
+
24
+ cov-html:
25
+ uv run pytest --cov=src --cov-report=html -p no:logfire
26
+ @echo "Coverage report: open htmlcov/index.html"
27
+
28
+ lint:
29
+ uv run ruff check src tests
30
+
31
+ format:
32
+ uv run ruff format src tests
33
+
34
+ typecheck:
35
+ uv run mypy src
36
+
37
+ check: lint typecheck test-cov
38
+ @echo "All checks passed!"
39
+
40
+ docs-build:
41
+ uv run mkdocs build
42
+
43
+ docs-serve:
44
+ uv run mkdocs serve
45
+
46
+ docs-clean:
47
+ rm -rf site/
48
+
49
+ clean:
50
+ rm -rf .pytest_cache .mypy_cache .ruff_cache __pycache__ .coverage htmlcov
51
+ find . -type d -name "__pycache__" -exec rm -rf {} + 2>/dev/null || true
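The `clean` target's `find` idiom can be exercised safely on a throwaway tree first (a sketch; the scratch paths are illustrative):

```shell
# Scratch tree with nested __pycache__ dirs
tmp=$(mktemp -d)
mkdir -p "$tmp/pkg/__pycache__" "$tmp/pkg/sub/__pycache__"
# Same idiom as the Makefile's `clean` target
find "$tmp" -type d -name "__pycache__" -exec rm -rf {} + 2>/dev/null || true
# Count what survived (expect none)
left=$(find "$tmp" -type d -name "__pycache__" | wc -l)
echo "$left"
rm -rf "$tmp"
```

The trailing `|| true` keeps the recipe from failing when `find` complains about directories it has already removed.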
dev/docs_plugins.py ADDED
@@ -0,0 +1,74 @@
1
+ """Custom MkDocs extension to handle code anchor format: ```start:end:filepath"""
2
+
3
+ import re
4
+ from pathlib import Path
5
+
6
+ from markdown import Markdown
7
+ from markdown.extensions import Extension
8
+ from markdown.preprocessors import Preprocessor
9
+
10
+
11
+ class CodeAnchorPreprocessor(Preprocessor):
12
+ """Preprocess code blocks with anchor format: ```start:end:filepath"""
13
+
14
+ def __init__(self, md: Markdown, base_path: Path):
15
+ super().__init__(md)
16
+ self.base_path = base_path
17
+ self.pattern = re.compile(r"^```(\d+):(\d+):([^\n]+)\n(.*?)```$", re.MULTILINE | re.DOTALL)
18
+
19
+ def run(self, lines: list[str]) -> list[str]:
20
+ """Process lines and convert code anchor format to standard code blocks."""
21
+ text = "\n".join(lines)
22
+ new_text = self.pattern.sub(self._replace_code_anchor, text)
23
+ return new_text.split("\n")
24
+
25
+ def _replace_code_anchor(self, match) -> str:
26
+ """Replace code anchor format with standard code block + link."""
27
+ start_line = int(match.group(1))
28
+ end_line = int(match.group(2))
29
+ file_path = match.group(3).strip()
30
+ existing_code = match.group(4)
31
+
32
+ # Determine language from file extension
33
+ ext = Path(file_path).suffix.lower()
34
+ lang_map = {
35
+ ".py": "python",
36
+ ".js": "javascript",
37
+ ".ts": "typescript",
38
+ ".md": "markdown",
39
+ ".yaml": "yaml",
40
+ ".yml": "yaml",
41
+ ".toml": "toml",
42
+ ".json": "json",
43
+ ".html": "html",
44
+ ".css": "css",
45
+ ".sh": "bash",
46
+ }
47
+ language = lang_map.get(ext, "python")
48
+
49
+ # Generate GitHub link
50
+ repo_url = "https://github.com/DeepCritical/GradioDemo"
51
+ github_link = f"{repo_url}/blob/main/{file_path}#L{start_line}-L{end_line}"
52
+
53
+ # Return standard code block with source link
54
+ return (
55
+ f'[View source: `{file_path}` (lines {start_line}-{end_line})]({github_link}){{: target="_blank" }}\n\n'
56
+ f"```{language}\n{existing_code}\n```"
57
+ )
58
+
59
+
60
+ class CodeAnchorExtension(Extension):
61
+ """Markdown extension for code anchors."""
62
+
63
+ def __init__(self, base_path: str = ".", **kwargs):
64
+ super().__init__(**kwargs)
65
+ self.base_path = Path(base_path)
66
+
67
+ def extendMarkdown(self, md: Markdown): # noqa: N802
68
+ """Register the preprocessor."""
69
+ md.preprocessors.register(CodeAnchorPreprocessor(md, self.base_path), "codeanchor", 25)
70
+
71
+
72
+ def makeExtension(**kwargs): # noqa: N802
73
+ """Create the extension."""
74
+ return CodeAnchorExtension(**kwargs)
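The anchor regex at the heart of the preprocessor can be exercised standalone (the pattern is duplicated here for a self-contained check; the sample document is illustrative):

```python
import re

# Same pattern as CodeAnchorPreprocessor uses for ```start:end:filepath blocks
pattern = re.compile(r"^```(\d+):(\d+):([^\n]+)\n(.*?)```$", re.MULTILINE | re.DOTALL)

fence = "`" * 3  # build the fence programmatically to avoid a literal one here
doc = f"{fence}10:12:src/app.py\nprint('hi')\n{fence}"

m = pattern.search(doc)
# Groups: start line, end line, file path, enclosed code
print(m.group(1), m.group(2), m.group(3))  # → 10 12 src/app.py
```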
docs/CONFIGURATION.md DELETED
@@ -1,301 +0,0 @@
1
- # Configuration Guide
2
-
3
- ## Overview
4
-
5
- DeepCritical uses **Pydantic Settings** for centralized configuration management. All settings are defined in `src/utils/config.py` and can be configured via environment variables or a `.env` file.
6
-
7
- ## Quick Start
8
-
9
- 1. Copy the example environment file (if available) or create a `.env` file in the project root
10
- 2. Set at least one LLM API key (`OPENAI_API_KEY` or `ANTHROPIC_API_KEY`)
11
- 3. Optionally configure other services as needed
12
-
13
- ## Configuration System
14
-
15
- ### How It Works
16
-
17
- - **Settings Class**: `Settings` class in `src/utils/config.py` extends `BaseSettings` from `pydantic_settings`
18
- - **Environment File**: Automatically loads from `.env` file (if present)
19
- - **Environment Variables**: Reads from environment variables (case-insensitive)
20
- - **Type Safety**: Strongly-typed fields with validation
21
- - **Singleton Pattern**: Global `settings` instance for easy access
22
-
23
- ### Usage
24
-
25
- ```python
26
- from src.utils.config import settings
27
-
28
- # Check if API keys are available
29
- if settings.has_openai_key:
30
- # Use OpenAI
31
- pass
32
-
33
- # Access configuration values
34
- max_iterations = settings.max_iterations
35
- web_search_provider = settings.web_search_provider
36
- ```
37
-
38
- ## Required Configuration
39
-
40
- ### At Least One LLM Provider
41
-
42
- You must configure at least one LLM provider:
43
-
44
- **OpenAI:**
45
- ```bash
46
- LLM_PROVIDER=openai
47
- OPENAI_API_KEY=your_openai_api_key_here
48
- OPENAI_MODEL=gpt-5.1
49
- ```
50
-
51
- **Anthropic:**
52
- ```bash
53
- LLM_PROVIDER=anthropic
54
- ANTHROPIC_API_KEY=your_anthropic_api_key_here
55
- ANTHROPIC_MODEL=claude-sonnet-4-5-20250929
56
- ```
57
-
58
- ## Optional Configuration
59
-
60
- ### Embedding Configuration
61
-
62
- ```bash
63
- # Embedding Provider: "openai", "local", or "huggingface"
64
- EMBEDDING_PROVIDER=local
65
-
66
- # OpenAI Embedding Model (used by LlamaIndex RAG)
67
- OPENAI_EMBEDDING_MODEL=text-embedding-3-small
68
-
69
- # Local Embedding Model (sentence-transformers)
70
- LOCAL_EMBEDDING_MODEL=all-MiniLM-L6-v2
71
-
72
- # HuggingFace Embedding Model
73
- HUGGINGFACE_EMBEDDING_MODEL=sentence-transformers/all-MiniLM-L6-v2
74
- ```
75
-
76
- ### HuggingFace Configuration
77
-
78
- ```bash
79
- # HuggingFace API Token (for inference API)
80
- HUGGINGFACE_API_KEY=your_huggingface_api_key_here
81
- # Or use HF_TOKEN (alternative name)
82
-
83
- # Default HuggingFace Model ID
84
- HUGGINGFACE_MODEL=meta-llama/Llama-3.1-8B-Instruct
85
- ```
86
-
87
- ### Web Search Configuration
88
-
89
- ```bash
90
- # Web Search Provider: "serper", "searchxng", "brave", "tavily", or "duckduckgo"
91
- # Default: "duckduckgo" (no API key required)
92
- WEB_SEARCH_PROVIDER=duckduckgo
93
-
94
- # Serper API Key (for Google search via Serper)
95
- SERPER_API_KEY=your_serper_api_key_here
96
-
97
- # SearchXNG Host URL
98
- SEARCHXNG_HOST=http://localhost:8080
99
-
100
- # Brave Search API Key
101
- BRAVE_API_KEY=your_brave_api_key_here
102
-
103
- # Tavily API Key
104
- TAVILY_API_KEY=your_tavily_api_key_here
105
- ```
106
-
107
- ### PubMed Configuration
108
-
109
- ```bash
110
- # NCBI API Key (optional, for higher rate limits: 10 req/sec vs 3 req/sec)
111
- NCBI_API_KEY=your_ncbi_api_key_here
112
- ```
113
-
114
- ### Agent Configuration
115
-
116
- ```bash
117
- # Maximum iterations per research loop
118
- MAX_ITERATIONS=10
119
-
120
- # Search timeout in seconds
121
- SEARCH_TIMEOUT=30
122
-
123
- # Use graph-based execution for research flows
124
- USE_GRAPH_EXECUTION=false
125
- ```
126
-
127
- ### Budget & Rate Limiting Configuration
128
-
129
- ```bash
130
- # Default token budget per research loop
131
- DEFAULT_TOKEN_LIMIT=100000
132
-
133
- # Default time limit per research loop (minutes)
134
- DEFAULT_TIME_LIMIT_MINUTES=10
135
-
136
- # Default iterations limit per research loop
137
- DEFAULT_ITERATIONS_LIMIT=10
138
- ```
139
-
140
- ### RAG Service Configuration
141
-
142
- ```bash
143
- # ChromaDB collection name for RAG
144
- RAG_COLLECTION_NAME=deepcritical_evidence
145
-
146
- # Number of top results to retrieve from RAG
147
- RAG_SIMILARITY_TOP_K=5
148
-
149
- # Automatically ingest evidence into RAG
150
- RAG_AUTO_INGEST=true
151
- ```
152
-
153
- ### ChromaDB Configuration
154
-
155
- ```bash
156
- # ChromaDB storage path
157
- CHROMA_DB_PATH=./chroma_db
158
-
159
- # Whether to persist ChromaDB to disk
160
- CHROMA_DB_PERSIST=true
161
-
162
- # ChromaDB server host (for remote ChromaDB, optional)
163
- # CHROMA_DB_HOST=localhost
164
-
165
- # ChromaDB server port (for remote ChromaDB, optional)
166
- # CHROMA_DB_PORT=8000
167
- ```
168
-
169
- ### External Services
170
-
171
- ```bash
172
- # Modal Token ID (for Modal sandbox execution)
173
- MODAL_TOKEN_ID=your_modal_token_id_here
174
-
175
- # Modal Token Secret
176
- MODAL_TOKEN_SECRET=your_modal_token_secret_here
177
- ```
178
-
179
- ### Logging Configuration
180
-
181
- ```bash
182
- # Log Level: "DEBUG", "INFO", "WARNING", or "ERROR"
183
- LOG_LEVEL=INFO
184
- ```
185
-
186
- ## Configuration Properties
187
-
188
- The `Settings` class provides helpful properties for checking configuration:
189
-
190
- ```python
191
- from src.utils.config import settings
192
-
193
- # Check API key availability
194
- settings.has_openai_key # bool
195
- settings.has_anthropic_key # bool
196
- settings.has_huggingface_key # bool
197
- settings.has_any_llm_key # bool
198
-
199
- # Check service availability
200
- settings.modal_available # bool
201
- settings.web_search_available # bool
202
- ```
203
-
204
- ## Environment Variables Reference
205
-
206
- ### Required (at least one LLM)
207
- - `OPENAI_API_KEY` or `ANTHROPIC_API_KEY` - At least one LLM provider key
208
-
209
- ### Optional LLM Providers
210
- - `DEEPSEEK_API_KEY` (Phase 2)
211
- - `OPENROUTER_API_KEY` (Phase 2)
212
- - `GEMINI_API_KEY` (Phase 2)
213
- - `PERPLEXITY_API_KEY` (Phase 2)
214
- - `HUGGINGFACE_API_KEY` or `HF_TOKEN`
215
- - `AZURE_OPENAI_ENDPOINT` (Phase 2)
216
- - `AZURE_OPENAI_DEPLOYMENT` (Phase 2)
217
- - `AZURE_OPENAI_API_KEY` (Phase 2)
218
- - `AZURE_OPENAI_API_VERSION` (Phase 2)
219
- - `LOCAL_MODEL_URL` (Phase 2)
220
-
221
- ### Web Search
222
- - `WEB_SEARCH_PROVIDER` (default: "duckduckgo")
223
- - `SERPER_API_KEY`
224
- - `SEARCHXNG_HOST`
225
- - `BRAVE_API_KEY`
226
- - `TAVILY_API_KEY`
227
-
228
- ### Embeddings
229
- - `EMBEDDING_PROVIDER` (default: "local")
230
- - `HUGGINGFACE_EMBEDDING_MODEL` (optional)
231
-
232
- ### RAG
233
- - `RAG_COLLECTION_NAME` (default: "deepcritical_evidence")
234
- - `RAG_SIMILARITY_TOP_K` (default: 5)
235
- - `RAG_AUTO_INGEST` (default: true)
236
-
237
- ### ChromaDB
238
- - `CHROMA_DB_PATH` (default: "./chroma_db")
239
- - `CHROMA_DB_PERSIST` (default: true)
240
- - `CHROMA_DB_HOST` (optional)
241
- - `CHROMA_DB_PORT` (optional)
242
-
243
- ### Budget
244
- - `DEFAULT_TOKEN_LIMIT` (default: 100000)
245
- - `DEFAULT_TIME_LIMIT_MINUTES` (default: 10)
246
- - `DEFAULT_ITERATIONS_LIMIT` (default: 10)
247
-
248
- ### Other
249
- - `LLM_PROVIDER` (default: "openai")
250
- - `NCBI_API_KEY` (optional)
251
- - `MODAL_TOKEN_ID` (optional)
252
- - `MODAL_TOKEN_SECRET` (optional)
253
- - `MAX_ITERATIONS` (default: 10)
254
- - `LOG_LEVEL` (default: "INFO")
255
- - `USE_GRAPH_EXECUTION` (default: false)
256
-
257
- ## Validation
258
-
259
- Settings are validated on load using Pydantic validation:
260
-
261
- - **Type checking**: All fields are strongly typed
262
- - **Range validation**: Numeric fields have min/max constraints
263
- - **Literal validation**: Enum fields only accept specific values
264
- - **Required fields**: API keys are checked when accessed via `get_api_key()`
265
-
266
- ## Error Handling
267
-
268
- Configuration errors raise `ConfigurationError`:
269
-
270
- ```python
271
- from src.utils.config import settings
272
- from src.utils.exceptions import ConfigurationError
273
-
274
- try:
275
- api_key = settings.get_api_key()
276
- except ConfigurationError as e:
277
- print(f"Configuration error: {e}")
278
- ```
279
-
280
- ## Future Enhancements (Phase 2)
281
-
282
- The following configurations are planned for Phase 2:
283
-
284
- 1. **Additional LLM Providers**: DeepSeek, OpenRouter, Gemini, Perplexity, Azure OpenAI, Local models
285
- 2. **Model Selection**: Reasoning/main/fast model configuration
286
- 3. **Service Integration**: Migrate `folder/llm_config.py` to centralized config
287
-
288
- See `CONFIGURATION_ANALYSIS.md` for the complete implementation plan.
289
-
290
-
291
-
292
-
293
-
294
-
295
-
296
-
297
-
298
-
299
-
300
-
301
-
docs/api/agents.md ADDED
@@ -0,0 +1,260 @@
1
+ # Agents API Reference
2
+
3
+ This page documents the API for DeepCritical agents.
4
+
5
+ ## KnowledgeGapAgent
6
+
7
+ **Module**: `src.agents.knowledge_gap`
8
+
9
+ **Purpose**: Evaluates research state and identifies knowledge gaps.
10
+
11
+ ### Methods
12
+
13
+ #### `evaluate`
14
+
15
+ ```python
16
+ async def evaluate(
17
+ self,
18
+ query: str,
19
+ background_context: str,
20
+ conversation_history: Conversation,
21
+ iteration: int,
22
+ time_elapsed_minutes: float,
23
+ max_time_minutes: float
24
+ ) -> KnowledgeGapOutput
25
+ ```
26
+
27
+ Evaluates research completeness and identifies outstanding knowledge gaps.
28
+
29
+ **Parameters**:
30
+ - `query`: Research query string
31
+ - `background_context`: Background context for the query
32
+ - `conversation_history`: Conversation history with previous iterations
33
+ - `iteration`: Current iteration number
34
+ - `time_elapsed_minutes`: Elapsed time in minutes
35
+ - `max_time_minutes`: Maximum time limit in minutes
36
+
37
+ **Returns**: `KnowledgeGapOutput` with:
38
+ - `research_complete`: Boolean indicating if research is complete
39
+ - `outstanding_gaps`: List of remaining knowledge gaps
40
+
41
+ ## ToolSelectorAgent
42
+
43
+ **Module**: `src.agents.tool_selector`
44
+
45
+ **Purpose**: Selects appropriate tools for addressing knowledge gaps.
46
+
47
+ ### Methods
48
+
49
+ #### `select_tools`
50
+
51
+ ```python
52
+ async def select_tools(
53
+ self,
54
+ query: str,
55
+ knowledge_gaps: list[str],
56
+ available_tools: list[str]
57
+ ) -> AgentSelectionPlan
58
+ ```
59
+
60
+ Selects tools for addressing knowledge gaps.
61
+
62
+ **Parameters**:
63
+ - `query`: Research query string
64
+ - `knowledge_gaps`: List of knowledge gaps to address
65
+ - `available_tools`: List of available tool names
66
+
67
+ **Returns**: `AgentSelectionPlan` with list of `AgentTask` objects.
68
+
69
+ ## WriterAgent
70
+
71
+ **Module**: `src.agents.writer`
72
+
73
+ **Purpose**: Generates final reports from research findings.
74
+
75
+ ### Methods
76
+
77
+ #### `write_report`
78
+
79
+ ```python
80
+ async def write_report(
81
+ self,
82
+ query: str,
83
+ findings: str,
84
+ output_length: str = "medium",
85
+ output_instructions: str | None = None
86
+ ) -> str
87
+ ```
88
+
89
+ Generates a markdown report from research findings.
90
+
91
+ **Parameters**:
92
+ - `query`: Research query string
93
+ - `findings`: Research findings to include in report
94
+ - `output_length`: Desired output length ("short", "medium", "long")
95
+ - `output_instructions`: Additional instructions for report generation
96
+
97
+ **Returns**: Markdown string with numbered citations.
98
+
99
+ ## LongWriterAgent
100
+
101
+ **Module**: `src.agents.long_writer`
102
+
103
+ **Purpose**: Long-form report generation with section-by-section writing.
104
+
105
+ ### Methods
106
+
107
+ #### `write_next_section`
108
+
109
+ ```python
110
+ async def write_next_section(
111
+ self,
112
+ query: str,
113
+ draft: ReportDraft,
114
+ section_title: str,
115
+ section_content: str
116
+ ) -> LongWriterOutput
117
+ ```
118
+
119
+ Writes the next section of a long-form report.
120
+
121
+ **Parameters**:
122
+ - `query`: Research query string
123
+ - `draft`: Current report draft
124
+ - `section_title`: Title of the section to write
125
+ - `section_content`: Content/guidance for the section
126
+
127
+ **Returns**: `LongWriterOutput` with updated draft.
128
+
129
+ #### `write_report`
130
+
131
+ ```python
132
+ async def write_report(
133
+ self,
134
+ query: str,
135
+ report_title: str,
136
+ report_draft: ReportDraft
137
+ ) -> str
138
+ ```
139
+
140
+ Generates final report from draft.
141
+
142
+ **Parameters**:
143
+ - `query`: Research query string
144
+ - `report_title`: Title of the report
145
+ - `report_draft`: Complete report draft
146
+
147
+ **Returns**: Final markdown report string.
148
+
149
+ ## ProofreaderAgent
150
+
151
+ **Module**: `src.agents.proofreader`
152
+
153
+ **Purpose**: Proofreads and polishes report drafts.
154
+
155
+ ### Methods
156
+
157
+ #### `proofread`
158
+
159
+ ```python
160
+ async def proofread(
161
+ self,
162
+ query: str,
163
+ report_title: str,
164
+ report_draft: ReportDraft
165
+ ) -> str
166
+ ```
167
+
168
+ Proofreads and polishes a report draft.
169
+
170
+ **Parameters**:
171
+ - `query`: Research query string
172
+ - `report_title`: Title of the report
173
+ - `report_draft`: Report draft to proofread
174
+
175
+ **Returns**: Polished markdown string.
176
+
177
+ ## ThinkingAgent
178
+
179
+ **Module**: `src.agents.thinking`
180
+
181
+ **Purpose**: Generates observations from conversation history.
182
+
183
+ ### Methods
184
+
185
+ #### `generate_observations`
186
+
187
+ ```python
188
+ async def generate_observations(
189
+ self,
190
+ query: str,
191
+ background_context: str,
192
+ conversation_history: Conversation
193
+ ) -> str
194
+ ```
195
+
196
+ Generates observations from conversation history.
197
+
198
+ **Parameters**:
199
+ - `query`: Research query string
200
+ - `background_context`: Background context
201
+ - `conversation_history`: Conversation history
202
+
203
+ **Returns**: Observation string.
204
+
205
+ ## InputParserAgent
206
+
207
+ **Module**: `src.agents.input_parser`
208
+
209
+ **Purpose**: Parses and improves user queries, detects research mode.
210
+
211
+ ### Methods
212
+
213
+ #### `parse_query`
214
+
215
+ ```python
216
+ async def parse_query(
217
+ self,
218
+ query: str
219
+ ) -> ParsedQuery
220
+ ```
221
+
222
+ Parses and improves a user query.
223
+
224
+ **Parameters**:
225
+ - `query`: Original query string
226
+
227
+ **Returns**: `ParsedQuery` with:
228
+ - `original_query`: Original query string
229
+ - `improved_query`: Refined query string
230
+ - `research_mode`: "iterative" or "deep"
231
+ - `key_entities`: List of key entities
232
+ - `research_questions`: List of research questions
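A consumer of `ParsedQuery` typically routes on `research_mode`; a minimal sketch with a plain-dict stand-in (illustrative values; the real object is a Pydantic model):

```python
# Plain-dict stand-in for a ParsedQuery result (values are illustrative)
parsed = {
    "original_query": "alzheimers drug trials",
    "improved_query": "Recent clinical trials of disease-modifying Alzheimer's drugs",
    "research_mode": "deep",
    "key_entities": ["Alzheimer's disease"],
    "research_questions": ["Which drugs showed efficacy in phase 3 trials?"],
}

# Route to the matching flow based on the detected mode
flow = "DeepResearchFlow" if parsed["research_mode"] == "deep" else "IterativeResearchFlow"
print(flow)  # → DeepResearchFlow
```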
233
+
234
+ ## Factory Functions
235
+
236
+ All agents have factory functions in `src.agent_factory.agents`:
237
+
238
+ ```python
239
+ def create_knowledge_gap_agent(model: Any | None = None) -> KnowledgeGapAgent
240
+ def create_tool_selector_agent(model: Any | None = None) -> ToolSelectorAgent
241
+ def create_writer_agent(model: Any | None = None) -> WriterAgent
242
+ def create_long_writer_agent(model: Any | None = None) -> LongWriterAgent
243
+ def create_proofreader_agent(model: Any | None = None) -> ProofreaderAgent
244
+ def create_thinking_agent(model: Any | None = None) -> ThinkingAgent
245
+ def create_input_parser_agent(model: Any | None = None) -> InputParserAgent
246
+ ```
247
+
248
+ **Parameters**:
249
+ - `model`: Optional Pydantic AI model. If None, uses `get_model()` from settings.
250
+
251
+ **Returns**: Agent instance.
252
+
253
+ ## See Also
254
+
255
+ - [Architecture - Agents](../architecture/agents.md) - Architecture overview
256
+ - [Models API](models.md) - Data models used by agents
257
+
258
+
259
+
260
+
docs/api/models.md ADDED
@@ -0,0 +1,238 @@
1
+ # Models API Reference
2
+
3
+ This page documents the Pydantic models used throughout DeepCritical.
4
+
5
+ ## Evidence
6
+
7
+ **Module**: `src.utils.models`
8
+
9
+ **Purpose**: Represents evidence from search results.
10
+
11
+ ```python
12
+ class Evidence(BaseModel):
13
+ citation: Citation
14
+ content: str
15
+ relevance_score: float = Field(ge=0.0, le=1.0)
16
+ metadata: dict[str, Any] = Field(default_factory=dict)
17
+ ```
18
+
19
+ **Fields**:
20
+ - `citation`: Citation information (title, URL, date, authors)
21
+ - `content`: Evidence text content
22
+ - `relevance_score`: Relevance score (0.0-1.0)
23
+ - `metadata`: Additional metadata dictionary
24
+
25
+ ## Citation
26
+
27
+ **Module**: `src.utils.models`
28
+
29
+ **Purpose**: Citation information for evidence.
30
+
31
+ ```python
32
+ class Citation(BaseModel):
33
+ title: str
34
+ url: str
35
+ date: str | None = None
36
+ authors: list[str] = Field(default_factory=list)
37
+ ```
38
+
39
+ **Fields**:
40
+ - `title`: Article/trial title
41
+ - `url`: Source URL
42
+ - `date`: Publication date (optional)
43
+ - `authors`: List of authors (optional)
44
+
45
+ ## KnowledgeGapOutput
46
+
47
+ **Module**: `src.utils.models`
48
+
49
+ **Purpose**: Output from knowledge gap evaluation.
50
+
51
+ ```python
52
+ class KnowledgeGapOutput(BaseModel):
53
+ research_complete: bool
54
+ outstanding_gaps: list[str] = Field(default_factory=list)
55
+ ```
56
+
57
+ **Fields**:
58
+ - `research_complete`: Boolean indicating if research is complete
59
+ - `outstanding_gaps`: List of remaining knowledge gaps
60
+
61
+ ## AgentSelectionPlan
62
+
63
+ **Module**: `src.utils.models`
64
+
65
+ **Purpose**: Plan for tool/agent selection.
66
+
67
+ ```python
68
+ class AgentSelectionPlan(BaseModel):
69
+ tasks: list[AgentTask] = Field(default_factory=list)
70
+ ```
71
+
72
+ **Fields**:
73
+ - `tasks`: List of agent tasks to execute
74
+
75
+ ## AgentTask
76
+
77
+ **Module**: `src.utils.models`
78
+
79
+ **Purpose**: Individual agent task.
80
+
81
+ ```python
82
+ class AgentTask(BaseModel):
83
+ agent_name: str
84
+ query: str
85
+ context: dict[str, Any] = Field(default_factory=dict)
86
+ ```
87
+
88
+ **Fields**:
89
+ - `agent_name`: Name of agent to use
90
+ - `query`: Task query
91
+ - `context`: Additional context dictionary
92
+
93
+ ## ReportDraft
94
+
95
+ **Module**: `src.utils.models`
96
+
97
+ **Purpose**: Draft structure for long-form reports.
98
+
99
+ ```python
100
+ class ReportDraft(BaseModel):
101
+ title: str
102
+ sections: list[ReportSection] = Field(default_factory=list)
103
+ references: list[Citation] = Field(default_factory=list)
104
+ ```
105
+
106
+ **Fields**:
107
+ - `title`: Report title
108
+ - `sections`: List of report sections
109
+ - `references`: List of citations
110
+
111
+ ## ReportSection
112
+
113
+ **Module**: `src.utils.models`
114
+
115
+ **Purpose**: Individual section in a report draft.
116
+
117
+ ```python
118
+ class ReportSection(BaseModel):
119
+ title: str
120
+ content: str
121
+ order: int
122
+ ```
123
+
124
+ **Fields**:
125
+ - `title`: Section title
126
+ - `content`: Section content
127
+ - `order`: Section order number
128
+
129
+ ## ParsedQuery
130
+
131
+ **Module**: `src.utils.models`
132
+
133
+ **Purpose**: Parsed and improved query.
134
+
135
+ ```python
136
+ class ParsedQuery(BaseModel):
137
+ original_query: str
138
+ improved_query: str
139
+ research_mode: Literal["iterative", "deep"]
140
+ key_entities: list[str] = Field(default_factory=list)
141
+ research_questions: list[str] = Field(default_factory=list)
142
+ ```
143
+
144
+ **Fields**:
145
+ - `original_query`: Original query string
146
+ - `improved_query`: Refined query string
147
+ - `research_mode`: Research mode ("iterative" or "deep")
148
+ - `key_entities`: List of key entities
149
+ - `research_questions`: List of research questions
150
+
151
+ ## Conversation
152
+
153
+ **Module**: `src.utils.models`
154
+
155
+ **Purpose**: Conversation history with iterations.
156
+
157
+ ```python
158
+ class Conversation(BaseModel):
159
+ iterations: list[IterationData] = Field(default_factory=list)
160
+ ```
161
+
162
+ **Fields**:
163
+ - `iterations`: List of iteration data
164
+
165
+ ## IterationData
166
+
167
+ **Module**: `src.utils.models`
168
+
169
+ **Purpose**: Data for a single iteration.
170
+
171
+ ```python
172
+ class IterationData(BaseModel):
173
+ iteration: int
174
+ observations: str | None = None
175
+ knowledge_gaps: list[str] = Field(default_factory=list)
176
+ tool_calls: list[dict[str, Any]] = Field(default_factory=list)
177
+ findings: str | None = None
178
+ thoughts: str | None = None
179
+ ```
180
+
181
+ **Fields**:
182
+ - `iteration`: Iteration number
183
+ - `observations`: Generated observations
184
+ - `knowledge_gaps`: Identified knowledge gaps
185
+ - `tool_calls`: Tool calls made
186
+ - `findings`: Findings from tools
187
+ - `thoughts`: Agent thoughts
188
+
189
+ ## AgentEvent
190
+
191
+ **Module**: `src.utils.models`
192
+
193
+ **Purpose**: Event emitted during research execution.
194
+
195
+ ```python
196
+ class AgentEvent(BaseModel):
197
+ type: str
198
+ iteration: int | None = None
199
+ data: dict[str, Any] = Field(default_factory=dict)
200
+ ```
201
+
202
+ **Fields**:
203
+ - `type`: Event type (e.g., "started", "search_complete", "complete")
204
+ - `iteration`: Iteration number (optional)
205
+ - `data`: Event data dictionary
206
+
207
+ ## BudgetStatus
208
+
209
+ **Module**: `src.utils.models`
210
+
211
+ **Purpose**: Current budget status.
212
+
213
+ ```python
214
+ class BudgetStatus(BaseModel):
215
+ tokens_used: int
216
+ tokens_limit: int
217
+ time_elapsed_seconds: float
218
+ time_limit_seconds: float
219
+ iterations: int
220
+ iterations_limit: int
221
+ ```
222
+
223
+ **Fields**:
224
+ - `tokens_used`: Tokens used so far
225
+ - `tokens_limit`: Token limit
226
+ - `time_elapsed_seconds`: Elapsed time in seconds
227
+ - `time_limit_seconds`: Time limit in seconds
228
+ - `iterations`: Current iteration count
229
+ - `iterations_limit`: Iteration limit
230
+
231
+ ## See Also
232
+
233
+ - [Architecture - Agents](../architecture/agents.md) - How models are used
234
+ - [Configuration](../configuration/index.md) - Model configuration
235
+
236
+
237
+
238
+
docs/api/orchestrators.md ADDED
@@ -0,0 +1,185 @@
1
+ # Orchestrators API Reference
2
+
3
+ This page documents the API for DeepCritical orchestrators.
4
+
5
+ ## IterativeResearchFlow
6
+
7
+ **Module**: `src.orchestrator.research_flow`
8
+
9
+ **Purpose**: Single-loop research with search-judge-synthesize cycles.
10
+
11
+ ### Methods
12
+
13
+ #### `run`
14
+
15
+ ```python
16
+ async def run(
17
+ self,
18
+ query: str,
19
+ background_context: str = "",
20
+ max_iterations: int | None = None,
21
+ max_time_minutes: float | None = None,
22
+ token_budget: int | None = None
23
+ ) -> AsyncGenerator[AgentEvent, None]
24
+ ```
25
+
26
+ Runs iterative research flow.
27
+
28
+ **Parameters**:
29
+ - `query`: Research query string
30
+ - `background_context`: Background context (default: "")
31
+ - `max_iterations`: Maximum iterations (default: from settings)
32
+ - `max_time_minutes`: Maximum time in minutes (default: from settings)
33
+ - `token_budget`: Token budget (default: from settings)
34
+
35
+ **Yields**: `AgentEvent` objects for:
36
+ - `started`: Research started
37
+ - `search_complete`: Search completed
38
+ - `judge_complete`: Evidence evaluation completed
39
+ - `synthesizing`: Generating report
40
+ - `complete`: Research completed
41
+ - `error`: Error occurred
42
+
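The event stream is consumed with `async for`. The sketch below substitutes a toy generator for a real flow instance (constructing `IterativeResearchFlow` requires project wiring not shown here); only the async-generator shape and the documented event types are taken from the API above:

```python
import asyncio
from dataclasses import dataclass

@dataclass
class AgentEvent:
    # Minimal stand-in for the real AgentEvent model.
    type: str

async def fake_run(query: str):
    # Toy stand-in for flow.run(query): yields the documented event types.
    for event_type in ("started", "search_complete", "judge_complete",
                       "synthesizing", "complete"):
        yield AgentEvent(type=event_type)

async def main() -> list[str]:
    seen: list[str] = []
    async for event in fake_run("What causes long COVID fatigue?"):
        seen.append(event.type)  # e.g. update a progress UI per event
    return seen

events = asyncio.run(main())
print(events)
```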
43
+ ## DeepResearchFlow
44
+
45
+ **Module**: `src.orchestrator.research_flow`
46
+
47
+ **Purpose**: Multi-section parallel research with planning and synthesis.
48
+
49
+ ### Methods
50
+
51
+ #### `run`
52
+
53
+ ```python
54
+ async def run(
55
+ self,
56
+ query: str,
57
+ background_context: str = "",
58
+ max_iterations_per_section: int | None = None,
59
+ max_time_minutes: float | None = None,
60
+ token_budget: int | None = None
61
+ ) -> AsyncGenerator[AgentEvent, None]
62
+ ```
63
+
64
+ Runs deep research flow.
65
+
66
+ **Parameters**:
67
+ - `query`: Research query string
68
+ - `background_context`: Background context (default: "")
69
+ - `max_iterations_per_section`: Maximum iterations per section (default: from settings)
70
+ - `max_time_minutes`: Maximum time in minutes (default: from settings)
71
+ - `token_budget`: Token budget (default: from settings)
72
+
73
+ **Yields**: `AgentEvent` objects for:
74
+ - `started`: Research started
75
+ - `planning`: Creating research plan
76
+ - `looping`: Running parallel research loops
77
+ - `synthesizing`: Synthesizing results
78
+ - `complete`: Research completed
79
+ - `error`: Error occurred
80
+
81
+ ## GraphOrchestrator
82
+
83
+ **Module**: `src.orchestrator.graph_orchestrator`
84
+
85
+ **Purpose**: Graph-based execution using Pydantic AI agents as nodes.
86
+
87
+ ### Methods
88
+
89
+ #### `run`
90
+
91
+ ```python
92
+ async def run(
93
+ self,
94
+ query: str,
95
+ research_mode: str = "auto",
96
+ use_graph: bool = True
97
+ ) -> AsyncGenerator[AgentEvent, None]
98
+ ```
99
+
100
+ Runs graph-based research orchestration.
101
+
102
+ **Parameters**:
103
+ - `query`: Research query string
104
+ - `research_mode`: Research mode ("iterative", "deep", or "auto")
105
+ - `use_graph`: Whether to use graph execution (default: True)
106
+
107
+ **Yields**: `AgentEvent` objects during graph execution.
108
+
109
+ ## Orchestrator Factory
110
+
111
+ **Module**: `src.orchestrator_factory`
112
+
113
+ **Purpose**: Factory for creating orchestrators.
114
+
115
+ ### Functions
116
+
117
+ #### `create_orchestrator`
118
+
119
+ ```python
120
+ def create_orchestrator(
121
+ search_handler: SearchHandlerProtocol,
122
+ judge_handler: JudgeHandlerProtocol,
123
+ config: dict[str, Any],
124
+ mode: str | None = None
125
+ ) -> Any
126
+ ```
127
+
128
+ Creates an orchestrator instance.
129
+
130
+ **Parameters**:
131
+ - `search_handler`: Search handler protocol implementation
132
+ - `judge_handler`: Judge handler protocol implementation
133
+ - `config`: Configuration dictionary
134
+ - `mode`: Orchestrator mode ("simple", "advanced", "magentic", or None for auto-detect)
135
+
136
+ **Returns**: Orchestrator instance.
137
+
138
+ **Raises**:
139
+ - `ValueError`: If the requested mode's requirements are not met (e.g. a missing API key)
140
+
141
+ **Modes**:
142
+ - `"simple"`: Legacy orchestrator
143
+ - `"advanced"` or `"magentic"`: Magentic orchestrator (requires OpenAI API key)
144
+ - `None`: Auto-detect based on API key availability
145
+
146
+ ## MagenticOrchestrator
147
+
148
+ **Module**: `src.orchestrator_magentic`
149
+
150
+ **Purpose**: Multi-agent coordination using Microsoft Agent Framework.
151
+
152
+ ### Methods
153
+
154
+ #### `run`
155
+
156
+ ```python
157
+ async def run(
158
+ self,
159
+ query: str,
160
+ max_rounds: int = 15,
161
+ max_stalls: int = 3
162
+ ) -> AsyncGenerator[AgentEvent, None]
163
+ ```
164
+
165
+ Runs Magentic orchestration.
166
+
167
+ **Parameters**:
168
+ - `query`: Research query string
169
+ - `max_rounds`: Maximum rounds (default: 15)
170
+ - `max_stalls`: Maximum stalls before reset (default: 3)
171
+
172
+ **Yields**: `AgentEvent` objects converted from Magentic events.
173
+
174
+ **Requirements**:
175
+ - `agent-framework-core` package
176
+ - OpenAI API key
177
+
178
+ ## See Also
179
+
180
+ - [Architecture - Orchestrators](../architecture/orchestrators.md) - Architecture overview
181
+ - [Graph Orchestration](../architecture/graph-orchestration.md) - Graph execution details
182
+
183
+
184
+
185
+
docs/api/services.md ADDED
@@ -0,0 +1,191 @@
1
+ # Services API Reference
2
+
3
+ This page documents the API for DeepCritical services.
4
+
5
+ ## EmbeddingService
6
+
7
+ **Module**: `src.services.embeddings`
8
+
9
+ **Purpose**: Local sentence-transformers embeddings for semantic search and deduplication.
10
+
11
+ ### Methods
12
+
13
+ #### `embed`
14
+
15
+ ```python
16
+ async def embed(self, text: str) -> list[float]
17
+ ```
18
+
19
+ Generates embedding for a text string.
20
+
21
+ **Parameters**:
22
+ - `text`: Text to embed
23
+
24
+ **Returns**: Embedding vector as list of floats.
25
+
26
+ #### `embed_batch`
27
+
28
+ ```python
29
+ async def embed_batch(self, texts: list[str]) -> list[list[float]]
30
+ ```
31
+
32
+ Generates embeddings for multiple texts.
33
+
34
+ **Parameters**:
35
+ - `texts`: List of texts to embed
36
+
37
+ **Returns**: List of embedding vectors.
38
+
39
+ #### `similarity`
40
+
41
+ ```python
42
+ async def similarity(self, text1: str, text2: str) -> float
43
+ ```
44
+
45
+ Calculates similarity between two texts.
46
+
47
+ **Parameters**:
48
+ - `text1`: First text
49
+ - `text2`: Second text
50
+
51
+ **Returns**: Similarity score (0.0-1.0).
52
+
53
+ #### `find_duplicates`
54
+
55
+ ```python
56
+ async def find_duplicates(
57
+ self,
58
+ texts: list[str],
59
+ threshold: float = 0.85
60
+ ) -> list[tuple[int, int]]
61
+ ```
62
+
63
+ Finds duplicate texts based on similarity threshold.
64
+
65
+ **Parameters**:
66
+ - `texts`: List of texts to check
67
+ - `threshold`: Similarity threshold (default: 0.85)
68
+
69
+ **Returns**: List of (index1, index2) tuples for duplicate pairs.
70
+
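The pairwise logic behind `find_duplicates` can be sketched in pure Python over precomputed vectors. The real service embeds the texts with sentence-transformers first; here the vectors are supplied directly, and `find_duplicate_pairs` is a hypothetical name:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity of two dense vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def find_duplicate_pairs(
    vectors: list[list[float]], threshold: float = 0.85
) -> list[tuple[int, int]]:
    # Same contract as find_duplicates, but over precomputed vectors:
    # (i, j) index pairs with i < j whose similarity meets the threshold.
    pairs = []
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                pairs.append((i, j))
    return pairs

vecs = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]]
print(find_duplicate_pairs(vecs))  # [(0, 1)]
```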
71
+ ### Factory Function
72
+
73
+ #### `get_embedding_service`
74
+
75
+ ```python
76
+ @lru_cache(maxsize=1)
77
+ def get_embedding_service() -> EmbeddingService
78
+ ```
79
+
80
+ Returns the singleton EmbeddingService instance.
81
+
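The `@lru_cache(maxsize=1)` idiom gives a process-wide singleton: the first call constructs the service, and every later call returns the same cached object. A minimal illustration, with a stand-in class since constructing the real service loads a model:

```python
from functools import lru_cache

class FakeService:
    """Stand-in for EmbeddingService; real construction loads a model."""

@lru_cache(maxsize=1)
def get_service() -> FakeService:
    # Same singleton idiom as get_embedding_service.
    return FakeService()

print(get_service() is get_service())  # True: one cached instance
```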
82
+ ## LlamaIndexRAGService
83
+
84
+ **Module**: `src.services.rag`
85
+
86
+ **Purpose**: Retrieval-Augmented Generation using LlamaIndex.
87
+
88
+ ### Methods
89
+
90
+ #### `ingest_evidence`
91
+
92
+ ```python
93
+ async def ingest_evidence(self, evidence: list[Evidence]) -> None
94
+ ```
95
+
96
+ Ingests evidence into RAG service.
97
+
98
+ **Parameters**:
99
+ - `evidence`: List of Evidence objects to ingest
100
+
101
+ **Note**: Requires OpenAI API key for embeddings.
102
+
103
+ #### `retrieve`
104
+
105
+ ```python
106
+ async def retrieve(
107
+ self,
108
+ query: str,
109
+ top_k: int = 5
110
+ ) -> list[Document]
111
+ ```
112
+
113
+ Retrieves relevant documents for a query.
114
+
115
+ **Parameters**:
116
+ - `query`: Search query string
117
+ - `top_k`: Number of top results to return (default: 5)
118
+
119
+ **Returns**: List of Document objects with metadata.
120
+
121
+ #### `query`
122
+
123
+ ```python
124
+ async def query(
125
+ self,
126
+ query: str,
127
+ top_k: int = 5
128
+ ) -> str
129
+ ```
130
+
131
+ Queries RAG service and returns formatted results.
132
+
133
+ **Parameters**:
134
+ - `query`: Search query string
135
+ - `top_k`: Number of top results to return (default: 5)
136
+
137
+ **Returns**: Formatted query results as string.
138
+
139
+ ### Factory Function
140
+
141
+ #### `get_rag_service`
142
+
143
+ ```python
144
+ @lru_cache(maxsize=1)
145
+ def get_rag_service() -> LlamaIndexRAGService | None
146
+ ```
147
+
148
+ Returns the singleton LlamaIndexRAGService instance, or `None` if an OpenAI API key is not available.
149
+
150
+ ## StatisticalAnalyzer
151
+
152
+ **Module**: `src.services.statistical_analyzer`
153
+
154
+ **Purpose**: Secure execution of AI-generated statistical code.
155
+
156
+ ### Methods
157
+
158
+ #### `analyze`
159
+
160
+ ```python
161
+ async def analyze(
162
+ self,
163
+ hypothesis: str,
164
+ evidence: list[Evidence],
165
+ data_description: str | None = None
166
+ ) -> AnalysisResult
167
+ ```
168
+
169
+ Analyzes a hypothesis using statistical methods.
170
+
171
+ **Parameters**:
172
+ - `hypothesis`: Hypothesis to analyze
173
+ - `evidence`: List of Evidence objects
174
+ - `data_description`: Optional data description
175
+
176
+ **Returns**: `AnalysisResult` with:
177
+ - `verdict`: SUPPORTED, REFUTED, or INCONCLUSIVE
178
+ - `code`: Generated analysis code
179
+ - `output`: Execution output
180
+ - `error`: Error message if execution failed
181
+
182
+ **Note**: Requires Modal credentials for sandbox execution.
183
+
184
+ ## See Also
185
+
186
+ - [Architecture - Services](../architecture/services.md) - Architecture overview
187
+ - [Configuration](../configuration/index.md) - Service configuration
188
+
189
+
190
+
191
+
docs/api/tools.md ADDED
@@ -0,0 +1,225 @@
1
+ # Tools API Reference
2
+
3
+ This page documents the API for DeepCritical search tools.
4
+
5
+ ## SearchTool Protocol
6
+
7
+ All tools implement the `SearchTool` protocol:
8
+
9
+ ```python
10
+ class SearchTool(Protocol):
11
+ @property
12
+ def name(self) -> str: ...
13
+
14
+ async def search(
15
+ self,
16
+ query: str,
17
+ max_results: int = 10
18
+ ) -> list[Evidence]: ...
19
+ ```
20
+
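Any object with a `name` property and an async `search` method satisfies the protocol; no inheritance is required. A toy conforming tool, using a stand-in `Evidence` dataclass (the real model lives elsewhere in the codebase):

```python
import asyncio
from dataclasses import dataclass

@dataclass
class Evidence:
    # Minimal stand-in for the real Evidence model.
    source: str
    content: str

class StaticTool:
    """Toy tool that satisfies the SearchTool protocol shape."""

    @property
    def name(self) -> str:
        return "static"

    async def search(self, query: str, max_results: int = 10) -> list[Evidence]:
        hits = [Evidence(source=self.name, content=f"canned result for {query!r}")]
        return hits[:max_results]

results = asyncio.run(StaticTool().search("metformin", max_results=5))
print(results[0].source)  # static
```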
21
+ ## PubMedTool
22
+
23
+ **Module**: `src.tools.pubmed`
24
+
25
+ **Purpose**: Search peer-reviewed biomedical literature from PubMed.
26
+
27
+ ### Properties
28
+
29
+ #### `name`
30
+
31
+ ```python
32
+ @property
33
+ def name(self) -> str
34
+ ```
35
+
36
+ Returns tool name: `"pubmed"`
37
+
38
+ ### Methods
39
+
40
+ #### `search`
41
+
42
+ ```python
43
+ async def search(
44
+ self,
45
+ query: str,
46
+ max_results: int = 10
47
+ ) -> list[Evidence]
48
+ ```
49
+
50
+ Searches PubMed for articles.
51
+
52
+ **Parameters**:
53
+ - `query`: Search query string
54
+ - `max_results`: Maximum number of results to return (default: 10)
55
+
56
+ **Returns**: List of `Evidence` objects with PubMed articles.
57
+
58
+ **Raises**:
59
+ - `SearchError`: If search fails
60
+ - `RateLimitError`: If rate limit is exceeded
61
+
62
+ ## ClinicalTrialsTool
63
+
64
+ **Module**: `src.tools.clinicaltrials`
65
+
66
+ **Purpose**: Search ClinicalTrials.gov for interventional studies.
67
+
68
+ ### Properties
69
+
70
+ #### `name`
71
+
72
+ ```python
73
+ @property
74
+ def name(self) -> str
75
+ ```
76
+
77
+ Returns tool name: `"clinicaltrials"`
78
+
79
+ ### Methods
80
+
81
+ #### `search`
82
+
83
+ ```python
84
+ async def search(
85
+ self,
86
+ query: str,
87
+ max_results: int = 10
88
+ ) -> list[Evidence]
89
+ ```
90
+
91
+ Searches ClinicalTrials.gov for trials.
92
+
93
+ **Parameters**:
94
+ - `query`: Search query string
95
+ - `max_results`: Maximum number of results to return (default: 10)
96
+
97
+ **Returns**: List of `Evidence` objects with clinical trials.
98
+
99
+ **Note**: Only returns interventional studies with status COMPLETED, ACTIVE_NOT_RECRUITING, RECRUITING, or ENROLLING_BY_INVITATION.
100
+
101
+ **Raises**:
102
+ - `SearchError`: If search fails
103
+
104
+ ## EuropePMCTool
105
+
106
+ **Module**: `src.tools.europepmc`
107
+
108
+ **Purpose**: Search Europe PMC for preprints and peer-reviewed articles.
109
+
110
+ ### Properties
111
+
112
+ #### `name`
113
+
114
+ ```python
115
+ @property
116
+ def name(self) -> str
117
+ ```
118
+
119
+ Returns tool name: `"europepmc"`
120
+
121
+ ### Methods
122
+
123
+ #### `search`
124
+
125
+ ```python
126
+ async def search(
127
+ self,
128
+ query: str,
129
+ max_results: int = 10
130
+ ) -> list[Evidence]
131
+ ```
132
+
133
+ Searches Europe PMC for articles and preprints.
134
+
135
+ **Parameters**:
136
+ - `query`: Search query string
137
+ - `max_results`: Maximum number of results to return (default: 10)
138
+
139
+ **Returns**: List of `Evidence` objects with articles/preprints.
140
+
141
+ **Note**: Includes both preprints (marked with `[PREPRINT - Not peer-reviewed]`) and peer-reviewed articles.
142
+
143
+ **Raises**:
144
+ - `SearchError`: If search fails
145
+
146
+ ## RAGTool
147
+
148
+ **Module**: `src.tools.rag_tool`
149
+
150
+ **Purpose**: Semantic search within collected evidence.
151
+
152
+ ### Properties
153
+
154
+ #### `name`
155
+
156
+ ```python
157
+ @property
158
+ def name(self) -> str
159
+ ```
160
+
161
+ Returns tool name: `"rag"`
162
+
163
+ ### Methods
164
+
165
+ #### `search`
166
+
167
+ ```python
168
+ async def search(
169
+ self,
170
+ query: str,
171
+ max_results: int = 10
172
+ ) -> list[Evidence]
173
+ ```
174
+
175
+ Searches collected evidence using semantic similarity.
176
+
177
+ **Parameters**:
178
+ - `query`: Search query string
179
+ - `max_results`: Maximum number of results to return (default: 10)
180
+
181
+ **Returns**: List of `Evidence` objects from collected evidence.
182
+
183
+ **Note**: Requires evidence to be ingested into the RAG service first.
184
+
185
+ ## SearchHandler
186
+
187
+ **Module**: `src.tools.search_handler`
188
+
189
+ **Purpose**: Orchestrates parallel searches across multiple tools.
190
+
191
+ ### Methods
192
+
193
+ #### `search`
194
+
195
+ ```python
196
+ async def search(
197
+ self,
198
+ query: str,
199
+ tools: list[SearchTool] | None = None,
200
+ max_results_per_tool: int = 10
201
+ ) -> SearchResult
202
+ ```
203
+
204
+ Searches multiple tools in parallel.
205
+
206
+ **Parameters**:
207
+ - `query`: Search query string
208
+ - `tools`: List of tools to use (default: all available tools)
209
+ - `max_results_per_tool`: Maximum results per tool (default: 10)
210
+
211
+ **Returns**: `SearchResult` with:
212
+ - `evidence`: Aggregated list of evidence
213
+ - `tool_results`: Results per tool
214
+ - `total_count`: Total number of results
215
+
216
+ **Note**: Uses `asyncio.gather()` for parallel execution. Handles tool failures gracefully.
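The fan-out-and-tolerate-failures behavior described above can be sketched with `asyncio.gather(return_exceptions=True)`. The tools below are stubs; only the concurrency pattern reflects the documented handler:

```python
import asyncio

async def flaky_tool(name: str, fail: bool) -> list[str]:
    # Stub tool: either returns one hit or raises, like a real search backend.
    if fail:
        raise RuntimeError(f"{name} unavailable")
    return [f"{name}-hit"]

async def search_all() -> list[str]:
    # Run tools concurrently; drop failures instead of aborting the search.
    results = await asyncio.gather(
        flaky_tool("pubmed", False),
        flaky_tool("europepmc", True),
        flaky_tool("clinicaltrials", False),
        return_exceptions=True,
    )
    evidence: list[str] = []
    for result in results:
        if isinstance(result, Exception):
            continue  # a real handler would log the tool failure here
        evidence.extend(result)
    return evidence

print(asyncio.run(search_all()))  # ['pubmed-hit', 'clinicaltrials-hit']
```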
217
+
218
+ ## See Also
219
+
220
+ - [Architecture - Tools](../architecture/tools.md) - Architecture overview
221
+ - [Models API](models.md) - Data models used by tools
222
+
223
+
224
+
225
+
docs/architecture/agents.md ADDED
@@ -0,0 +1,182 @@
1
+ # Agents Architecture
2
+
3
+ DeepCritical uses Pydantic AI agents for all AI-powered operations. All agents follow a consistent pattern and use structured output types.
4
+
5
+ ## Agent Pattern
6
+
7
+ All agents use the Pydantic AI `Agent` class with the following structure:
8
+
9
+ - **System Prompt**: Module-level constant with date injection
10
+ - **Agent Class**: `__init__(model: Any | None = None)`
11
+ - **Main Method**: Async method (e.g., `async def evaluate()`, `async def write_report()`)
12
+ - **Factory Function**: `def create_agent_name(model: Any | None = None) -> AgentName`
13
+
14
+ ## Model Initialization
15
+
16
+ Agents use `get_model()` from `src/agent_factory/judges.py` if no model is provided. This supports:
17
+
18
+ - OpenAI models
19
+ - Anthropic models
20
+ - HuggingFace Inference API models
21
+
22
+ The model selection is based on the configured `LLM_PROVIDER` in settings.
23
+
24
+ ## Error Handling
25
+
26
+ Agents return fallback values on failure rather than raising exceptions:
27
+
28
+ - `KnowledgeGapOutput(research_complete=False, outstanding_gaps=[...])`
29
+ - Empty strings for text outputs
30
+ - Default structured outputs
31
+
32
+ All errors are logged with context using structlog.
33
+
34
+ ## Input Validation
35
+
36
+ All agents validate inputs:
37
+
38
+ - Check that queries/inputs are not empty
39
+ - Truncate very long inputs with warnings
40
+ - Handle None values gracefully
41
+
42
+ ## Output Types
43
+
44
+ Agents use structured output types from `src/utils/models.py`:
45
+
46
+ - `KnowledgeGapOutput`: Research completeness evaluation
47
+ - `AgentSelectionPlan`: Tool selection plan
48
+ - `ReportDraft`: Long-form report structure
49
+ - `ParsedQuery`: Query parsing and mode detection
50
+
51
+ For text output (writer agents), agents return `str` directly.
52
+
53
+ ## Agent Types
54
+
55
+ ### Knowledge Gap Agent
56
+
57
+ **File**: `src/agents/knowledge_gap.py`
58
+
59
+ **Purpose**: Evaluates research state and identifies knowledge gaps.
60
+
61
+ **Output**: `KnowledgeGapOutput` with:
62
+ - `research_complete`: Boolean indicating if research is complete
63
+ - `outstanding_gaps`: List of remaining knowledge gaps
64
+
65
+ **Methods**:
66
+ - `async def evaluate(query, background_context, conversation_history, iteration, time_elapsed_minutes, max_time_minutes) -> KnowledgeGapOutput`
67
+
68
+ ### Tool Selector Agent
69
+
70
+ **File**: `src/agents/tool_selector.py`
71
+
72
+ **Purpose**: Selects appropriate tools for addressing knowledge gaps.
73
+
74
+ **Output**: `AgentSelectionPlan` with list of `AgentTask` objects.
75
+
76
+ **Available Agents**:
77
+ - `WebSearchAgent`: General web search for fresh information
78
+ - `SiteCrawlerAgent`: Research specific entities/companies
79
+ - `RAGAgent`: Semantic search within collected evidence
80
+
81
+ ### Writer Agent
82
+
83
+ **File**: `src/agents/writer.py`
84
+
85
+ **Purpose**: Generates final reports from research findings.
86
+
87
+ **Output**: Markdown string with numbered citations.
88
+
89
+ **Methods**:
90
+ - `async def write_report(query, findings, output_length, output_instructions) -> str`
91
+
92
+ **Features**:
93
+ - Validates inputs
94
+ - Truncates very long findings (max 50000 chars) with warning
95
+ - Retry logic for transient failures (3 retries)
96
+ - Citation validation before returning
97
+
98
+ ### Long Writer Agent
99
+
100
+ **File**: `src/agents/long_writer.py`
101
+
102
+ **Purpose**: Long-form report generation with section-by-section writing.
103
+
104
+ **Input/Output**: Uses `ReportDraft` models.
105
+
106
+ **Methods**:
107
+ - `async def write_next_section(query, draft, section_title, section_content) -> LongWriterOutput`
108
+ - `async def write_report(query, report_title, report_draft) -> str`
109
+
110
+ **Features**:
111
+ - Writes sections iteratively
112
+ - Aggregates references across sections
113
+ - Reformats section headings and references
114
+ - Deduplicates and renumbers references
115
+
116
+ ### Proofreader Agent
117
+
118
+ **File**: `src/agents/proofreader.py`
119
+
120
+ **Purpose**: Proofreads and polishes report drafts.
121
+
122
+ **Input**: `ReportDraft`
123
+ **Output**: Polished markdown string
124
+
125
+ **Methods**:
126
+ - `async def proofread(query, report_title, report_draft) -> str`
127
+
128
+ **Features**:
129
+ - Removes duplicate content across sections
130
+ - Adds executive summary if multiple sections
131
+ - Preserves all references and citations
132
+ - Improves flow and readability
133
+
134
+ ### Thinking Agent
135
+
136
+ **File**: `src/agents/thinking.py`
137
+
138
+ **Purpose**: Generates observations from conversation history.
139
+
140
+ **Output**: Observation string
141
+
142
+ **Methods**:
143
+ - `async def generate_observations(query, background_context, conversation_history) -> str`
144
+
145
+ ### Input Parser Agent
146
+
147
+ **File**: `src/agents/input_parser.py`
148
+
149
+ **Purpose**: Parses and improves user queries, detects research mode.
150
+
151
+ **Output**: `ParsedQuery` with:
152
+ - `original_query`: Original query string
153
+ - `improved_query`: Refined query string
154
+ - `research_mode`: "iterative" or "deep"
155
+ - `key_entities`: List of key entities
156
+ - `research_questions`: List of research questions
157
+
158
+ ## Factory Functions
159
+
160
+ All agents have factory functions in `src/agent_factory/agents.py`:
161
+
162
+ ```python
163
+ def create_knowledge_gap_agent(model: Any | None = None) -> KnowledgeGapAgent
164
+ def create_tool_selector_agent(model: Any | None = None) -> ToolSelectorAgent
165
+ def create_writer_agent(model: Any | None = None) -> WriterAgent
166
+ # ... etc
167
+ ```
168
+
169
+ Factory functions:
170
+ - Use `get_model()` if no model provided
171
+ - Raise `ConfigurationError` if creation fails
172
+ - Log agent creation
173
+
174
+ ## See Also
175
+
176
+ - [Orchestrators](orchestrators.md) - How agents are orchestrated
177
+ - [API Reference - Agents](../api/agents.md) - API documentation
178
+ - [Contributing - Code Style](../contributing/code-style.md) - Development guidelines
179
+
180
+
181
+
182
+
docs/architecture/design-patterns.md DELETED
@@ -1,1509 +0,0 @@
1
- # Design Patterns & Technical Decisions
2
- ## Explicit Answers to Architecture Questions
3
-
4
- ---
5
-
6
- ## Purpose of This Document
7
-
8
- This document explicitly answers all the "design pattern" questions raised in team discussions. It provides clear technical decisions with rationale.
9
-
10
- ---
11
-
12
- ## 1. Primary Architecture Pattern
13
-
14
- ### Decision: Orchestrator with Search-Judge Loop
15
-
16
- **Pattern Name**: Iterative Research Orchestrator
17
-
18
- **Structure**:
19
- ```
20
- ┌─────────────────────────────────────┐
21
- │ Research Orchestrator │
22
- │ ┌───────────────────────────────┐ │
23
- │ │ Search Strategy Planner │ │
24
- │ └───────────────────────────────┘ │
25
- │ ↓ │
26
- │ ┌───────────────────────────────┐ │
27
- │ │ Tool Coordinator │ │
28
- │ │ - PubMed Search │ │
29
- │ │ - Web Search │ │
30
- │ │ - Clinical Trials │ │
31
- │ └───────────────────────────────┘ │
32
- │ ↓ │
33
- │ ┌───────────────────────────────┐ │
34
- │ │ Evidence Aggregator │ │
35
- │ └───────────────────────────────┘ │
36
- │ ↓ │
37
- │ ┌───────────────────────────────┐ │
38
- │ │ Quality Judge │ │
39
- │ │ (LLM-based assessment) │ │
40
- │ └───────────────────────────────┘ │
41
- │ ↓ │
42
- │ Loop or Synthesize? │
43
- │ ↓ │
44
- │ ┌───────────────────────────────┐ │
45
- │ │ Report Generator │ │
46
- │ └───────────────────────────────┘ │
47
- └─────────────────────────────────────┘
48
- ```
49
-
50
- **Why NOT single-agent?**
51
- - Need coordinated multi-tool queries
52
- - Need iterative refinement
53
- - Need quality assessment between searches
54
-
55
- **Why NOT pure ReAct?**
56
- - Medical research requires structured workflow
57
- - Need explicit quality gates
58
- - Want deterministic tool selection
59
-
60
- **Why THIS pattern?**
61
- - Clear separation of concerns
62
- - Testable components
63
- - Easy to debug
64
- - Proven in similar systems
65
-
66
- ---
67
-
68
- ## 2. Tool Selection & Orchestration Pattern
69
-
70
- ### Decision: Static Tool Registry with Dynamic Selection
71
-
72
- **Pattern**:
73
- ```python
74
- class ToolRegistry:
75
- """Central registry of available research tools"""
76
- tools = {
77
- 'pubmed': PubMedSearchTool(),
78
- 'web': WebSearchTool(),
79
- 'trials': ClinicalTrialsTool(),
80
- 'drugs': DrugInfoTool(),
81
- }
82
-
83
- class Orchestrator:
84
- def select_tools(self, question: str, iteration: int) -> List[Tool]:
85
- """Dynamically choose tools based on context"""
86
- if iteration == 0:
87
- # First pass: broad search
88
- return [tools['pubmed'], tools['web']]
89
- else:
90
- # Refinement: targeted search
91
- return self.judge.recommend_tools(question, context)
92
- ```
93
-
94
- **Why NOT on-the-fly agent factories?**
95
- - 6-day timeline (too complex)
96
- - Tools are known upfront
97
- - Simpler to test and debug
98
-
99
- **Why NOT single tool?**
100
- - Need multiple evidence sources
101
- - Different tools for different info types
102
- - Better coverage
103
-
104
- **Why THIS pattern?**
105
- - Balance flexibility vs simplicity
106
- - Tools can be added easily
107
- - Selection logic is transparent
108
-
109
- ---
110
-
111
- ## 3. Judge Pattern
112
-
113
- ### Decision: Dual-Judge System (Quality + Budget)
114
-
115
- **Pattern**:
116
- ```python
117
- class QualityJudge:
118
- """LLM-based evidence quality assessment"""
119
-
120
- def is_sufficient(self, question: str, evidence: List[Evidence]) -> bool:
121
- """Main decision: do we have enough?"""
122
- return (
123
- self.has_mechanism_explanation(evidence) and
124
- self.has_drug_candidates(evidence) and
125
- self.has_clinical_evidence(evidence) and
126
- self.confidence_score(evidence) > threshold
127
- )
128
-
129
- def identify_gaps(self, question: str, evidence: List[Evidence]) -> List[str]:
130
- """What's missing?"""
131
- gaps = []
132
- if not self.has_mechanism_explanation(evidence):
133
- gaps.append("disease mechanism")
134
- if not self.has_drug_candidates(evidence):
135
- gaps.append("potential drug candidates")
136
- if not self.has_clinical_evidence(evidence):
137
- gaps.append("clinical trial data")
138
- return gaps
139
-
140
- class BudgetJudge:
141
- """Resource constraint enforcement"""
142
-
143
- def should_stop(self, state: ResearchState) -> bool:
144
- """Hard limits"""
145
- return (
146
- state.tokens_used >= max_tokens or
147
- state.iterations >= max_iterations or
148
- state.time_elapsed >= max_time
149
- )
150
- ```
151
-
152
- **Why NOT just LLM judge?**
153
- - Cost control (prevent runaway queries)
154
- - Time bounds (hackathon demo needs to be fast)
155
- - Safety (prevent infinite loops)
156
-
157
- **Why NOT just token budget?**
158
- - Want early exit when answer is good
159
- - Quality matters, not just quantity
160
- - Better user experience
161
-
162
- **Why THIS pattern?**
163
- - Best of both worlds
164
- - Clear separation (quality vs resources)
165
- - Each judge has single responsibility
166
-
167
- ---
168
-
169
- ## 4. Break/Stopping Pattern
170
-
171
- ### Decision: Three-Tier Break Conditions
172
-
173
- **Pattern**:
174
- ```python
175
- def should_continue(state: ResearchState) -> bool:
176
- """Multi-tier stopping logic"""
177
-
178
- # Tier 1: Quality-based (ideal stop)
179
- if quality_judge.is_sufficient(state.question, state.evidence):
180
- state.stop_reason = "sufficient_evidence"
181
- return False
182
-
183
- # Tier 2: Budget-based (cost control)
184
- if state.tokens_used >= config.max_tokens:
185
- state.stop_reason = "token_budget_exceeded"
186
- return False
187
-
188
- # Tier 3: Iteration-based (safety)
189
- if state.iterations >= config.max_iterations:
190
- state.stop_reason = "max_iterations_reached"
191
- return False
192
-
193
- # Tier 4: Time-based (demo friendly)
194
- if state.time_elapsed >= config.max_time:
195
- state.stop_reason = "timeout"
196
- return False
197
-
198
- return True # Continue researching
199
- ```
200
-
201
- **Configuration**:
202
- ```toml
203
- [research.limits]
204
- max_tokens = 50000 # ~$0.50 at Claude pricing
205
- max_iterations = 5 # Reasonable depth
206
- max_time_seconds = 120 # 2 minutes for demo
207
- judge_threshold = 0.8 # Quality confidence score
208
- ```
209
-
210
- **Why multiple conditions?**
211
- - Defense in depth
212
- - Different failure modes
213
- - Graceful degradation
214
-
215
- **Why these specific limits?**
216
- - Tokens: Balances cost vs quality
217
- - Iterations: Enough for refinement, not too deep
218
- - Time: Fast enough for live demo
219
- - Judge: High bar for quality
220
-
221
- ---
222
-
223
- ## 5. State Management Pattern
224
-
225
- ### Decision: Pydantic State Machine with Checkpoints
226
-
227
- **Pattern**:
228
- ```python
229
- class ResearchState(BaseModel):
230
- """Immutable state snapshots"""
231
- query_id: str
232
- question: str
233
- iteration: int = 0
234
- evidence: List[Evidence] = []
235
- tokens_used: int = 0
236
- search_history: List[SearchQuery] = []
237
- stop_reason: Optional[str] = None
238
- created_at: datetime
239
- updated_at: datetime
240
-
241
- class StateManager:
242
- def save_checkpoint(self, state: ResearchState) -> None:
243
- """Save state to disk"""
244
- path = f".deepresearch/checkpoints/{state.query_id}_iter{state.iteration}.json"
245
- path.write_text(state.model_dump_json(indent=2))
246
-
247
- def load_checkpoint(self, query_id: str, iteration: int) -> ResearchState:
248
- """Resume from checkpoint"""
249
- path = f".deepresearch/checkpoints/{query_id}_iter{iteration}.json"
250
- return ResearchState.model_validate_json(path.read_text())
251
- ```
252
-
253
- **Directory Structure**:
254
- ```
255
- .deepresearch/
256
- ├── state/
257
- │ └── current_123.json # Active research state
258
- ├── checkpoints/
259
- │ ├── query_123_iter0.json # Checkpoint after iteration 0
260
- │ ├── query_123_iter1.json # Checkpoint after iteration 1
261
- │ └── query_123_iter2.json # Checkpoint after iteration 2
262
- └── workspace/
263
- └── query_123/
264
- ├── papers/ # Downloaded PDFs
265
- ├── search_results/ # Raw search results
266
- └── analysis/ # Intermediate analysis
267
- ```
268
-
269
- **Why Pydantic?**
270
- - Type safety
271
- - Validation
272
- - Easy serialization
273
- - Integration with Pydantic AI
274
-
275
- **Why checkpoints?**
276
- - Resume interrupted research
277
- - Debugging (inspect state at each iteration)
278
- - Cost savings (don't re-query)
279
- - Demo resilience
280
-
281
- ---
282
-
283
- ## 6. Tool Interface Pattern
284
-
285
- ### Decision: Async Unified Tool Protocol
286
-
287
- **Pattern**:
288
- ```python
289
- from typing import Protocol, Optional, List, Dict
290
- import asyncio
291
-
292
- class ResearchTool(Protocol):
293
- """Standard async interface all tools must implement"""
294
-
295
- async def search(
296
- self,
297
- query: str,
298
- max_results: int = 10,
299
- filters: Optional[Dict] = None
300
- ) -> List[Evidence]:
301
- """Execute search and return structured evidence"""
302
- ...
303
-
304
- def get_metadata(self) -> ToolMetadata:
305
- """Tool capabilities and requirements"""
306
- ...
307
-
308
- class PubMedSearchTool:
309
- """Concrete async implementation"""
310
-
311
- def __init__(self):
312
- self._rate_limiter = asyncio.Semaphore(3) # 3 req/sec
313
- self._cache: Dict[str, List[Evidence]] = {}
314
-
315
- async def search(self, query: str, max_results: int = 10, **kwargs) -> List[Evidence]:
316
- # Check cache first
317
- cache_key = f"{query}:{max_results}"
318
- if cache_key in self._cache:
319
- return self._cache[cache_key]
320
-
321
- async with self._rate_limiter:
322
- # 1. Query PubMed E-utilities API (async httpx)
323
- async with httpx.AsyncClient() as client:
324
- response = await client.get(
325
- "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi",
326
- params={"db": "pubmed", "term": query, "retmax": max_results}
327
- )
328
- # 2. Parse XML response
329
- # 3. Extract: title, abstract, authors, citations
330
- # 4. Convert to Evidence objects
331
- evidence_list = self._parse_response(response.text)
332
-
333
- # Cache results
334
- self._cache[cache_key] = evidence_list
335
- return evidence_list
336
-
337
- def get_metadata(self) -> ToolMetadata:
338
- return ToolMetadata(
339
- name="PubMed",
340
- description="Biomedical literature search",
341
- rate_limit="3 requests/second",
342
- requires_api_key=False
343
- )
344
- ```
345
-
346
- **Parallel Tool Execution**:
347
- ```python
348
- async def search_all_tools(query: str, tools: List[ResearchTool]) -> List[Evidence]:
349
- """Run all tool searches in parallel"""
350
- tasks = [tool.search(query) for tool in tools]
351
- results = await asyncio.gather(*tasks, return_exceptions=True)
352
-
353
- # Flatten and filter errors
354
- evidence = []
355
- for result in results:
356
- if isinstance(result, Exception):
357
- logger.warning(f"Tool failed: {result}")
358
- else:
359
- evidence.extend(result)
360
- return evidence
361
- ```
362
-
363
- **Why Async?**
364
- - Tools are I/O bound (network calls)
365
- - Parallel execution = faster searches
366
- - Better UX (streaming progress)
367
- - Standard in 2025 Python
368
-
369
- **Why Protocol?**
370
- - Loose coupling
371
- - Easy to add new tools
372
- - Testable with mocks
373
- - Clear contract
374
-
375
- **Why NOT abstract base class?**
376
- - More Pythonic (PEP 544)
377
- - Duck typing friendly
378
- - Runtime checking with isinstance (when the Protocol is decorated with @runtime_checkable)
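The isinstance check only works if the Protocol opts in via `@runtime_checkable` — a minimal sketch (class names here are illustrative, not from the codebase):

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class SupportsMetadata(Protocol):
    """Anything with a get_metadata() method satisfies this protocol."""
    def get_metadata(self) -> dict: ...

class FakeTool:  # note: no inheritance from SupportsMetadata
    def get_metadata(self) -> dict:
        return {"name": "fake"}

# Structural (duck-typed) check, no inheritance required
print(isinstance(FakeTool(), SupportsMetadata))  # → True
```

Caveat: runtime checks only verify that the method *exists*, not its signature — good enough for tests and registry sanity checks.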
379
-
380
- ---
381
-
382
- ## 7. Report Generation Pattern
383
-
384
- ### Decision: Structured Output with Citations
385
-
386
- **Pattern**:
387
- ```python
388
- class DrugCandidate(BaseModel):
389
- name: str
390
- mechanism: str
391
- evidence_quality: Literal["strong", "moderate", "weak"]
392
- clinical_status: str # "FDA approved", "Phase 2", etc.
393
- citations: List[Citation]
394
-
395
- class ResearchReport(BaseModel):
396
- query: str
397
- disease_mechanism: str
398
- candidates: List[DrugCandidate]
399
- methodology: str # How we searched
400
- confidence: float
401
- sources_used: List[str]
402
- generated_at: datetime
403
-
404
- def to_markdown(self) -> str:
405
- """Human-readable format"""
406
- ...
407
-
408
- def to_json(self) -> str:
409
- """Machine-readable format"""
410
- ...
411
- ```
412
-
413
- **Output Example**:
414
- ```markdown
415
- # Research Report: Long COVID Fatigue
416
-
417
- ## Disease Mechanism
418
- Long COVID fatigue is associated with mitochondrial dysfunction
419
- and persistent inflammation [1, 2].
420
-
421
- ## Drug Candidates
422
-
423
- ### 1. Coenzyme Q10 (CoQ10) - STRONG EVIDENCE
424
- - **Mechanism**: Mitochondrial support, ATP production
425
- - **Status**: FDA approved (supplement)
426
- - **Evidence**: 2 randomized controlled trials showing fatigue reduction
427
- - **Citations**:
428
- - Smith et al. (2023) - PubMed: 12345678
429
- - Johnson et al. (2023) - PubMed: 87654321
430
-
431
- ### 2. Low-dose Naltrexone (LDN) - MODERATE EVIDENCE
432
- - **Mechanism**: Anti-inflammatory, immune modulation
433
- - **Status**: FDA approved (different indication)
434
- - **Evidence**: 3 case studies, 1 ongoing Phase 2 trial
435
- - **Citations**: ...
436
-
437
- ## Methodology
438
- - Searched PubMed: 45 papers reviewed
439
- - Searched Web: 12 sources
440
- - Clinical trials: 8 trials identified
441
- - Total iterations: 3
442
- - Tokens used: 12,450
443
-
444
- ## Confidence: 85%
445
-
446
- ## Sources
447
- - PubMed E-utilities
448
- - ClinicalTrials.gov
449
- - OpenFDA Database
450
- ```
451
-
452
- **Why structured?**
453
- - Parseable by other systems
454
- - Consistent format
455
- - Easy to validate
456
- - Good for datasets
457
-
458
- **Why markdown?**
459
- - Human-readable
460
- - Renders nicely in Gradio
461
- - Easy to convert to PDF
462
- - Standard format
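The citation bullets in the example above are plain string assembly; a tiny illustrative helper (name hypothetical):

```python
def format_citation(authors: str, year: int, pmid: str) -> str:
    """Render one citation bullet in the report's markdown style."""
    return f"- {authors} ({year}) - PubMed: {pmid}"

print(format_citation("Smith et al.", 2023, "12345678"))
# → - Smith et al. (2023) - PubMed: 12345678
```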
463
-
464
- ---
465
-
466
- ## 8. Error Handling Pattern
467
-
468
- ### Decision: Graceful Degradation with Fallbacks
469
-
470
- **Pattern**:
471
- ```python
472
- class ResearchAgent:
473
- def research(self, question: str) -> ResearchReport:
474
- try:
475
- return self._research_with_retry(question)
476
- except TokenBudgetExceeded:
- # Return partial results from the state gathered so far (assumes the agent tracks state on self)
- return self._synthesize_partial(self.state)
479
- except ToolFailure as e:
480
- # Try alternate tools
481
- return self._research_with_fallback(question, failed_tool=e.tool)
482
- except Exception as e:
483
- # Log and return error report
484
- logger.error(f"Research failed: {e}")
485
- return self._error_report(question, error=e)
486
- ```
487
-
488
- **Why NOT fail fast?**
489
- - Hackathon demo must be robust
490
- - Partial results better than nothing
491
- - Good user experience
492
-
493
- **Why NOT silent failures?**
494
- - Need visibility for debugging
495
- - User should know limitations
496
- - Honest about confidence
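`_research_with_retry` is not shown above; one plausible shape is exponential backoff around any async call, with the last failure re-raised so the fallback logic above still runs (all names here are hypothetical):

```python
import asyncio

async def with_retry(op, attempts: int = 3, base_delay: float = 0.5):
    """Await op() with exponential backoff; re-raise after the last attempt."""
    for attempt in range(attempts):
        try:
            return await op()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: caller's fallback/degradation logic takes over
            await asyncio.sleep(base_delay * 2 ** attempt)

# Demo: an operation that fails twice, then succeeds
calls = {"n": 0}
async def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient")
    return "ok"

print(asyncio.run(with_retry(flaky, base_delay=0.01)))  # → ok
```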
497
-
498
- ---
499
-
500
- ## 9. Configuration Pattern
501
-
502
- ### Decision: Hydra-inspired but Simpler
503
-
504
- **Pattern**:
505
- ```toml
506
- # config.toml
507
-
508
- [research]
509
- max_iterations = 5
510
- max_tokens = 50000
511
- max_time_seconds = 120
512
- judge_threshold = 0.85
513
-
514
- [tools]
515
- enabled = ["pubmed", "web", "trials"]
516
-
517
- [tools.pubmed]
518
- max_results = 20
519
- rate_limit = 3 # per second
520
-
521
- [tools.web]
522
- engine = "serpapi"
523
- max_results = 10
524
-
525
- [llm]
526
- provider = "anthropic"
527
- model = "claude-3-5-sonnet-20241022"
528
- temperature = 0.1
529
-
530
- [output]
531
- format = "markdown"
532
- include_citations = true
533
- include_methodology = true
534
- ```
535
-
536
- **Loading**:
537
- ```python
538
- from pathlib import Path
539
- import tomllib
540
-
541
- def load_config() -> dict:
542
- config_path = Path("config.toml")
543
- with open(config_path, "rb") as f:
544
- return tomllib.load(f)
545
- ```
546
-
547
- **Why NOT full Hydra?**
548
- - Simpler for hackathon
549
- - Easier to understand
550
- - Faster to modify
551
- - Can upgrade later
552
-
553
- **Why TOML?**
554
- - Human-readable
555
- - Standard (PEP 680)
556
- - Better than YAML edge cases
557
- - Native in Python 3.11+
558
-
559
- ---
560
-
561
- ## 10. Testing Pattern
562
-
563
- ### Decision: Three-Level Testing Strategy
564
-
565
- **Pattern**:
566
- ```python
567
- # Level 1: Unit tests (fast, isolated)
568
- async def test_pubmed_tool():  # search() is a coroutine; run with pytest-asyncio
- tool = PubMedSearchTool()
- results = await tool.search("aspirin cardiovascular")
571
- assert len(results) > 0
572
- assert all(isinstance(r, Evidence) for r in results)
573
-
574
- # Level 2: Integration tests (tools + agent)
575
- def test_research_loop():
576
- agent = ResearchAgent(config=test_config)
577
- report = agent.research("aspirin repurposing")
578
- assert report.candidates
579
- assert report.confidence > 0
580
-
581
- # Level 3: End-to-end tests (full system)
582
- def test_full_workflow():
583
- # Simulate user query through Gradio UI
584
- response = gradio_app.predict("test query")
585
- assert "Drug Candidates" in response
586
- ```
587
-
588
- **Why three levels?**
589
- - Fast feedback (unit tests)
590
- - Confidence (integration tests)
591
- - Reality check (e2e tests)
592
-
593
- **Test Data**:
594
- ```text
595
- # tests/fixtures/
596
- - mock_pubmed_response.xml
597
- - mock_web_results.json
598
- - sample_research_query.txt
599
- - expected_report.md
600
- ```
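The fixtures pair naturally with mocking, so Level 1 tests never touch the network — a sketch with stdlib `unittest.mock` (the tool object here is a stand-in, not the real class):

```python
import asyncio
from unittest.mock import AsyncMock

# Stand-in for PubMedSearchTool; in real tests you'd patch its .search method
tool = AsyncMock()
tool.search.return_value = [{"pmid": "12345678", "title": "Mock paper"}]

async def level1():
    results = await tool.search("aspirin cardiovascular")
    assert len(results) > 0
    return results

results = asyncio.run(level1())
print(results[0]["pmid"])  # → 12345678
```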
601
-
602
- ---
603
-
604
- ## 11. Judge Prompt Templates
605
-
606
- ### Decision: Structured JSON Output with Domain-Specific Criteria
607
-
608
- **Quality Judge System Prompt**:
609
- ```python
610
- QUALITY_JUDGE_SYSTEM = """You are a medical research quality assessor specializing in drug repurposing.
611
- Your task is to evaluate whether the collected evidence is sufficient to answer a drug repurposing question.
612
-
613
- You assess evidence against four criteria specific to drug repurposing research:
614
- 1. MECHANISM: Understanding of the disease's molecular/cellular mechanisms
615
- 2. CANDIDATES: Identification of potential drug candidates with known mechanisms
616
- 3. EVIDENCE: Clinical or preclinical evidence supporting repurposing
617
- 4. SOURCES: Quality and credibility of sources (peer-reviewed > preprints > web)
618
-
619
- You MUST respond with valid JSON only. No other text."""
620
- ```
621
-
622
- **Quality Judge User Prompt**:
623
- ```python
624
- QUALITY_JUDGE_USER = """
625
- ## Research Question
626
- {question}
627
-
628
- ## Evidence Collected (Iteration {iteration} of {max_iterations})
629
- {evidence_summary}
630
-
631
- ## Token Budget
632
- Used: {tokens_used} / {max_tokens}
633
-
634
- ## Your Assessment
635
-
636
- Evaluate the evidence and respond with this exact JSON structure:
637
-
638
- ```json
639
- {{
640
- "assessment": {{
641
- "mechanism_score": <0-10>,
642
- "mechanism_reasoning": "<Step-by-step analysis of mechanism understanding>",
643
- "candidates_score": <0-10>,
644
- "candidates_found": ["<drug1>", "<drug2>", ...],
645
- "evidence_score": <0-10>,
646
- "evidence_reasoning": "<Critical evaluation of clinical/preclinical support>",
647
- "sources_score": <0-10>,
648
- "sources_breakdown": {{
649
- "peer_reviewed": <count>,
650
- "clinical_trials": <count>,
651
- "preprints": <count>,
652
- "other": <count>
653
- }}
654
- }},
655
- "overall_confidence": <0.0-1.0>,
656
- "sufficient": <true/false>,
657
- "gaps": ["<missing info 1>", "<missing info 2>"],
658
- "recommended_searches": ["<search query 1>", "<search query 2>"],
659
- "recommendation": "<continue|synthesize>"
660
- }}
661
- ```
662
-
663
- Decision rules:
664
- - sufficient=true if overall_confidence >= 0.8 AND mechanism_score >= 6 AND candidates_score >= 6
665
- - sufficient=true if remaining budget < 10% (must synthesize with what we have)
666
- - Otherwise, provide recommended_searches to fill gaps
667
- """
668
- ```
669
-
670
- **Report Synthesis Prompt**:
671
- ```python
672
- SYNTHESIS_PROMPT = """You are a medical research synthesizer creating a drug repurposing report.
673
-
674
- ## Research Question
675
- {question}
676
-
677
- ## Collected Evidence
678
- {all_evidence}
679
-
680
- ## Judge Assessment
681
- {final_assessment}
682
-
683
- ## Your Task
684
- Create a comprehensive research report with this structure:
685
-
686
- 1. **Executive Summary** (2-3 sentences)
687
- 2. **Disease Mechanism** - What we understand about the condition
688
- 3. **Drug Candidates** - For each candidate:
689
- - Drug name and current FDA status
690
- - Proposed mechanism for this condition
691
- - Evidence quality (strong/moderate/weak)
692
- - Key citations
693
- 4. **Methodology** - How we searched (tools used, queries, iterations)
694
- 5. **Limitations** - What we couldn't find or verify
695
- 6. **Confidence Score** - Overall confidence in findings
696
-
697
- Format as Markdown. Include PubMed IDs as citations [PMID: 12345678].
698
- Be scientifically accurate. Do not hallucinate drug names or mechanisms.
699
- If evidence is weak, say so clearly."""
700
- ```
701
-
702
- **Why Structured JSON?**
703
- - Parseable by code (not just LLM output)
704
- - Consistent format for logging/debugging
705
- - Can trigger specific actions (continue vs synthesize)
706
- - Testable with expected outputs
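"JSON only" instructions still get ignored sometimes — models like to wrap the object in a code fence. A defensive parse (sketch; function name hypothetical):

```python
import json
import re

def parse_judge_json(text: str) -> dict:
    """Pull the first {...} object out of an LLM reply, tolerating code fences."""
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if not match:
        raise ValueError("no JSON object found in judge output")
    return json.loads(match.group(0))

reply = '```json\n{"sufficient": false, "recommendation": "continue"}\n```'
print(parse_judge_json(reply)["recommendation"])  # → continue
```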
707
-
708
- **Why Domain-Specific Criteria?**
709
- - Generic "is this good?" prompts fail
710
- - Drug repurposing has specific requirements
711
- - Physician on team validated criteria
712
- - Maps to real research workflow
713
-
714
- ---
715
-
716
- ## 12. MCP Server Integration (Hackathon Track)
717
-
718
- ### Decision: Tools as MCP Servers for Reusability
719
-
720
- **Why MCP?**
721
- - Hackathon has dedicated MCP track
722
- - Makes our tools reusable by others
723
- - Standard protocol (Model Context Protocol)
724
- - Future-proof (industry adoption growing)
725
-
726
- **Architecture**:
727
- ```
728
- ┌─────────────────────────────────────────────────┐
- │              DeepCritical Agent                 │
- │       (uses tools directly OR via MCP)          │
- └─────────────────────────────────────────────────┘
-                         │
-            ┌────────────┼────────────┐
-            ↓            ↓            ↓
-    ┌─────────────┐ ┌──────────┐ ┌───────────────┐
-    │ PubMed MCP  │ │ Web MCP  │ │  Trials MCP   │
-    │   Server    │ │  Server  │ │    Server     │
-    └─────────────┘ └──────────┘ └───────────────┘
-           │             │              │
-           ↓             ↓              ↓
-      PubMed API     Brave/DDG   ClinicalTrials.gov
742
- ```
743
-
744
- **PubMed MCP Server Implementation**:
745
- ```python
746
- # src/mcp_servers/pubmed_server.py
747
- from fastmcp import FastMCP
748
-
749
- mcp = FastMCP("PubMed Research Tool")
750
-
751
- @mcp.tool()
752
- async def search_pubmed(
753
- query: str,
754
- max_results: int = 10,
755
- date_range: str = "5y"
756
- ) -> dict:
757
- """
758
- Search PubMed for biomedical literature.
759
-
760
- Args:
761
- query: Search terms (supports PubMed syntax like [MeSH])
762
- max_results: Maximum papers to return (default 10, max 100)
763
- date_range: Time filter - "1y", "5y", "10y", or "all"
764
-
765
- Returns:
766
- dict with papers list containing title, abstract, authors, pmid, date
767
- """
768
- tool = PubMedSearchTool()
769
- results = await tool.search(query, max_results)
770
- return {
771
- "query": query,
772
- "count": len(results),
773
- "papers": [r.model_dump() for r in results]
774
- }
775
-
776
- @mcp.tool()
777
- async def get_paper_details(pmid: str) -> dict:
778
- """
779
- Get full details for a specific PubMed paper.
780
-
781
- Args:
782
- pmid: PubMed ID (e.g., "12345678")
783
-
784
- Returns:
785
- Full paper metadata including abstract, MeSH terms, references
786
- """
787
- tool = PubMedSearchTool()
788
- return await tool.get_details(pmid)
789
-
790
- if __name__ == "__main__":
791
- mcp.run()
792
- ```
793
-
794
- **Running the MCP Server**:
795
- ```bash
796
- # Start the server
797
- python -m src.mcp_servers.pubmed_server
798
-
799
- # Or with uvx (recommended)
800
- uvx fastmcp run src/mcp_servers/pubmed_server.py
801
-
802
- # Note: fastmcp uses stdio transport by default, which is perfect
803
- # for local integration with Claude Desktop or the main agent.
804
- ```
805
-
806
- **Claude Desktop Integration** (for demo):
807
- ```json
808
- // ~/Library/Application Support/Claude/claude_desktop_config.json
809
- {
810
- "mcpServers": {
811
- "pubmed": {
812
- "command": "python",
813
- "args": ["-m", "src.mcp_servers.pubmed_server"],
814
- "cwd": "/path/to/deepcritical"
815
- }
816
- }
817
- }
818
- ```
819
-
820
- **Why FastMCP?**
821
- - Simple decorator syntax
822
- - Handles protocol complexity
823
- - Good docs and examples
824
- - Works with Claude Desktop and API
825
-
826
- **MCP Track Submission Requirements**:
827
- - [ ] At least one tool as MCP server
828
- - [ ] README with setup instructions
829
- - [ ] Demo showing MCP usage
830
- - [ ] Bonus: Multiple tools as MCP servers
831
-
832
- ---
833
-
834
- ## 13. Gradio UI Pattern (Hackathon Track)
835
-
836
- ### Decision: Streaming Progress with Modern UI
837
-
838
- **Pattern**:
839
- ```python
840
- import gradio as gr
- from typing import AsyncGenerator
-
- async def research_with_streaming(question: str) -> AsyncGenerator[str, None]:
-     """Stream research progress to the UI.
-
-     Each yield replaces the displayed value in Gradio, so we accumulate
-     progress into a buffer and yield the full buffer every time.
-     """
-     progress = "🔍 Starting research...\n\n"
-     yield progress
-
-     agent = ResearchAgent()
-
-     async for event in agent.research_stream(question):
-         match event.type:
-             case "search_start":
-                 progress += f"📚 Searching {event.tool}...\n"
-             case "search_complete":
-                 progress += f"✅ Found {event.count} results from {event.tool}\n"
-             case "judge_thinking":
-                 progress += "🤔 Evaluating evidence quality...\n"
-             case "judge_decision":
-                 progress += f"📊 Confidence: {event.confidence:.0%}\n"
-             case "iteration_complete":
-                 progress += f"🔄 Iteration {event.iteration} complete\n\n"
-             case "synthesis_start":
-                 progress += "📝 Generating report...\n"
-             case "complete":
-                 progress += f"\n---\n\n{event.report}"
-         yield progress
-
- # Gradio 5 UI (async generator functions are supported as event handlers)
- with gr.Blocks(theme=gr.themes.Soft()) as demo:
-     gr.Markdown("# 🔬 DeepCritical: Drug Repurposing Research Agent")
-     gr.Markdown("Ask a question about potential drug repurposing opportunities.")
-
-     with gr.Row():
-         with gr.Column(scale=2):
-             question = gr.Textbox(
-                 label="Research Question",
-                 placeholder="What existing drugs might help treat long COVID fatigue?",
-                 lines=2
-             )
-             examples = gr.Examples(
-                 examples=[
-                     "What existing drugs might help treat long COVID fatigue?",
-                     "Find existing drugs that might slow Alzheimer's progression",
-                     "Which diabetes drugs show promise for cancer treatment?"
-                 ],
-                 inputs=question
-             )
-             submit = gr.Button("🚀 Start Research", variant="primary")
-
-         with gr.Column(scale=3):
-             output = gr.Markdown(label="Research Progress & Report")
-
-     submit.click(
-         fn=research_with_streaming,
-         inputs=question,
-         outputs=output,
-     )
-
- demo.launch()
898
- ```
899
-
900
- **Why Streaming?**
901
- - User sees progress, not loading spinner
902
- - Builds trust (system is working)
903
- - Better UX for long operations
904
- - Gradio 5 native support
905
-
906
- **Why gr.Markdown Output?**
907
- - Research reports are markdown
908
- - Renders citations nicely
909
- - Code blocks for methodology
910
- - Tables for drug comparisons
911
-
912
- ---
913
-
914
- ## Summary: Design Decision Table
915
-
916
- | # | Question | Decision | Why |
917
- |---|----------|----------|-----|
918
- | 1 | **Architecture** | Orchestrator with search-judge loop | Clear, testable, proven |
919
- | 2 | **Tools** | Static registry, dynamic selection | Balance flexibility vs simplicity |
920
- | 3 | **Judge** | Dual (quality + budget) | Quality + cost control |
921
- | 4 | **Stopping** | Four-tier conditions | Defense in depth |
922
- | 5 | **State** | Pydantic + checkpoints | Type-safe, resumable |
923
- | 6 | **Tool Interface** | Async Protocol + parallel execution | Fast I/O, modern Python |
924
- | 7 | **Output** | Structured + Markdown | Human & machine readable |
925
- | 8 | **Errors** | Graceful degradation + fallbacks | Robust for demo |
926
- | 9 | **Config** | TOML (Hydra-inspired) | Simple, standard |
927
- | 10 | **Testing** | Three levels | Fast feedback + confidence |
928
- | 11 | **Judge Prompts** | Structured JSON + domain criteria | Parseable, medical-specific |
929
- | 12 | **MCP** | Tools as MCP servers | Hackathon track, reusability |
930
- | 13 | **UI** | Gradio 5 streaming | Progress visibility, modern UX |
931
-
932
- ---
933
-
934
- ## Answers to Specific Questions
935
-
936
- ### "What's the orchestrator pattern?"
937
- **Answer**: See Section 1 - Iterative Research Orchestrator with search-judge loop
938
-
939
- ### "LLM-as-judge or token budget?"
940
- **Answer**: Both - See Section 3 (Dual-Judge System) and Section 4 (Four-Tier Break Conditions)
941
-
942
- ### "What's the break pattern?"
943
- **Answer**: See Section 4 - Four stopping conditions: quality threshold, token budget, max iterations, and the wall-clock time limit (max_time_seconds in config)
944
-
945
- ### "Should we use agent factories?"
946
- **Answer**: No - See Section 2. Static tool registry is simpler for 6-day timeline
947
-
948
- ### "How do we handle state?"
949
- **Answer**: See Section 5 - Pydantic state machine with checkpoints
950
-
951
- ---
952
-
953
- ## Appendix: Complete Data Models
954
-
955
- ```python
956
- # src/deepresearch/models.py
957
- from pydantic import BaseModel, Field
958
- from typing import List, Optional, Literal
959
- from datetime import datetime
960
-
961
- class Citation(BaseModel):
962
- """Reference to a source"""
963
- source_type: Literal["pubmed", "web", "trial", "fda"]
964
- identifier: str # PMID, URL, NCT number, etc.
965
- title: str
966
- authors: Optional[List[str]] = None
967
- date: Optional[str] = None
968
- url: Optional[str] = None
969
-
970
- class Evidence(BaseModel):
971
- """Single piece of evidence from search"""
972
- content: str
973
- source: Citation
974
- relevance_score: float = Field(ge=0, le=1)
975
- evidence_type: Literal["mechanism", "candidate", "clinical", "safety"]
976
-
977
- class DrugCandidate(BaseModel):
978
- """Potential drug for repurposing"""
979
- name: str
980
- generic_name: Optional[str] = None
981
- mechanism: str
982
- current_indications: List[str]
983
- proposed_mechanism: str
984
- evidence_quality: Literal["strong", "moderate", "weak"]
985
- fda_status: str
986
- citations: List[Citation]
987
-
988
- class JudgeAssessment(BaseModel):
989
- """Output from quality judge"""
990
- mechanism_score: int = Field(ge=0, le=10)
991
- candidates_score: int = Field(ge=0, le=10)
992
- evidence_score: int = Field(ge=0, le=10)
993
- sources_score: int = Field(ge=0, le=10)
994
- overall_confidence: float = Field(ge=0, le=1)
995
- sufficient: bool
996
- gaps: List[str]
997
- recommended_searches: List[str]
998
- recommendation: Literal["continue", "synthesize"]
999
-
1000
- class ResearchState(BaseModel):
1001
- """Complete state of a research session"""
1002
- query_id: str
1003
- question: str
1004
- iteration: int = 0
1005
- evidence: List[Evidence] = []
1006
- assessments: List[JudgeAssessment] = []
1007
- tokens_used: int = 0
1008
- search_history: List[str] = []
1009
- stop_reason: Optional[str] = None
1010
- created_at: datetime = Field(default_factory=datetime.utcnow)
1011
- updated_at: datetime = Field(default_factory=datetime.utcnow)
1012
-
1013
- class ResearchReport(BaseModel):
1014
- """Final output report"""
1015
- query: str
1016
- executive_summary: str
1017
- disease_mechanism: str
1018
- candidates: List[DrugCandidate]
1019
- methodology: str
1020
- limitations: str
1021
- confidence: float
1022
- sources_used: int
1023
- tokens_used: int
1024
- iterations: int
1025
- generated_at: datetime = Field(default_factory=datetime.utcnow)
1026
-
1027
- def to_markdown(self) -> str:
1028
- """Render as markdown for Gradio"""
1029
- md = f"# Research Report: {self.query}\n\n"
1030
- md += f"## Executive Summary\n{self.executive_summary}\n\n"
1031
- md += f"## Disease Mechanism\n{self.disease_mechanism}\n\n"
1032
- md += "## Drug Candidates\n\n"
1033
- for i, drug in enumerate(self.candidates, 1):
1034
- md += f"### {i}. {drug.name} - {drug.evidence_quality.upper()} EVIDENCE\n"
1035
- md += f"- **Mechanism**: {drug.proposed_mechanism}\n"
1036
- md += f"- **FDA Status**: {drug.fda_status}\n"
1037
- md += f"- **Current Uses**: {', '.join(drug.current_indications)}\n"
1038
- md += f"- **Citations**: {len(drug.citations)} sources\n\n"
1039
- md += f"## Methodology\n{self.methodology}\n\n"
1040
- md += f"## Limitations\n{self.limitations}\n\n"
1041
- md += f"## Confidence: {self.confidence:.0%}\n"
1042
- return md
1043
- ```
1044
-
1045
- ---
1046
-
1047
- ## 14. Alternative Frameworks Considered
1048
-
1049
- We researched major agent frameworks before settling on our stack. Here's why we chose what we chose, and what we'd steal if we're shipping like animals and have time for Gucci upgrades.
1050
-
1051
- ### Frameworks Evaluated
1052
-
1053
- | Framework | Repo | What It Does |
1054
- |-----------|------|--------------|
1055
- | **Microsoft AutoGen** | [github.com/microsoft/autogen](https://github.com/microsoft/autogen) | Multi-agent orchestration, complex workflows |
1056
- | **Claude Agent SDK** | [github.com/anthropics/claude-agent-sdk-python](https://github.com/anthropics/claude-agent-sdk-python) | Anthropic's official agent framework |
1057
- | **Pydantic AI** | [github.com/pydantic/pydantic-ai](https://github.com/pydantic/pydantic-ai) | Type-safe agents, structured outputs |
1058
-
1059
- ### Why NOT AutoGen (Microsoft)?
1060
-
1061
- **Pros:**
1062
- - Battle-tested multi-agent orchestration
1063
- - `reflect_on_tool_use` - model reviews its own tool results
1064
- - `max_tool_iterations` - built-in iteration limits
1065
- - Concurrent tool execution
1066
- - Rich ecosystem (AutoGen Studio, benchmarks)
1067
-
1068
- **Cons for MVP:**
1069
- - Heavy dependency tree (50+ packages)
1070
- - Complex configuration (YAML + Python)
1071
- - Overkill for single-agent search-judge loop
1072
- - Learning curve eats into 6-day timeline
1073
-
1074
- **Verdict:** Great for multi-agent systems. Overkill for our MVP.
1075
-
1076
- ### Why NOT Claude Agent SDK (Anthropic)?
1077
-
1078
- **Pros:**
1079
- - Official Anthropic framework
1080
- - Clean `@tool` decorator pattern
1081
- - In-process MCP servers (no subprocess)
1082
- - Hooks for pre/post tool execution
1083
- - Direct Claude Code integration
1084
-
1085
- **Cons for MVP:**
1086
- - Requires Claude Code CLI bundled
1087
- - Node.js dependency for some features
1088
- - Designed for Claude Code ecosystem, not standalone agents
1089
- - Less flexible for custom LLM providers
1090
-
1091
- **Verdict:** Would be great if we were building ON Claude Code. We're building a standalone agent.
1092
-
1093
- ### Why Pydantic AI + FastMCP (Our Choice)
1094
-
1095
- **Pros:**
1096
- - ✅ Simple, Pythonic API
1097
- - ✅ Native async/await
1098
- - ✅ Type-safe with Pydantic
1099
- - ✅ Works with any LLM provider
1100
- - ✅ FastMCP for clean MCP servers
1101
- - ✅ Minimal dependencies
1102
- - ✅ Can ship MVP in 6 days
1103
-
1104
- **Cons:**
1105
- - Newer framework (less battle-tested)
1106
- - Smaller ecosystem
1107
- - May need to build more from scratch
1108
-
1109
- **Verdict:** Right tool for the job. Ship fast, iterate later.
1110
-
1111
- ---
1112
-
1113
- ## 15. Stretch Goals: Gucci Bangers (If We're Shipping Like Animals)
1114
-
1115
- If MVP ships early and we're crushing it, here's what we'd steal from other frameworks:
1116
-
1117
- ### Tier 1: Quick Wins (2-4 hours each)
1118
-
1119
- #### From Claude Agent SDK: `@tool` Decorator Pattern
1120
- Replace our Protocol-based tools with cleaner decorators:
1121
-
1122
- ```python
1123
- # CURRENT (Protocol-based)
1124
- class PubMedSearchTool:
1125
- async def search(self, query: str, max_results: int = 10) -> List[Evidence]:
1126
- ...
1127
-
1128
- # UPGRADE (Decorator-based, stolen from Claude SDK)
1129
- from claude_agent_sdk import tool
1130
-
1131
- @tool("search_pubmed", "Search PubMed for biomedical papers", {
1132
- "query": str,
1133
- "max_results": int
1134
- })
1135
- async def search_pubmed(args):
1136
- results = await _do_pubmed_search(args["query"], args["max_results"])
1137
- return {"content": [{"type": "text", "text": json.dumps(results)}]}
1138
- ```
1139
-
1140
- **Why it's Gucci:** Cleaner syntax, automatic schema generation, less boilerplate.
1141
-
1142
- #### From AutoGen: Reflect on Tool Use
1143
- Add a reflection step where the model reviews its own tool results:
1144
-
1145
- ```python
1146
- # CURRENT: Judge evaluates evidence
1147
- assessment = await judge.assess(question, evidence)
1148
-
1149
- # UPGRADE: Add reflection step (stolen from AutoGen)
1150
- class ReflectiveJudge:
1151
- async def assess_with_reflection(self, question, evidence, tool_results):
1152
- # First pass: raw assessment
1153
- initial = await self._assess(question, evidence)
1154
-
1155
- # Reflection: "Did I use the tools correctly?"
1156
- reflection = await self._reflect_on_tool_use(tool_results)
1157
-
1158
- # Final: combine assessment + reflection
1159
- return self._combine(initial, reflection)
1160
- ```
1161
-
1162
- **Why it's Gucci:** Catches tool misuse, improves accuracy, more robust judge.
1163
-
1164
- ### Tier 2: Medium Lifts (4-8 hours each)
1165
-
1166
- #### From AutoGen: Concurrent Tool Execution
1167
- Run multiple tools in parallel with proper error handling:
1168
-
1169
- ```python
1170
- # CURRENT: already concurrent via asyncio.gather, but with no timeouts or cancellation
1171
- results = await asyncio.gather(*[tool.search(query) for tool in tools])
1172
-
1173
- # UPGRADE: AutoGen-style with cancellation + timeout
1174
- from autogen_core import CancellationToken
1175
-
1176
- async def execute_tools_concurrent(tools, query, timeout=30):
1177
- token = CancellationToken()
1178
-
1179
- async def run_with_timeout(tool):
1180
- try:
1181
- return await asyncio.wait_for(
1182
- tool.search(query, cancellation_token=token),
1183
- timeout=timeout
1184
- )
1185
- except asyncio.TimeoutError:
1186
- token.cancel() # Cancel other tools
1187
- return ToolError(f"{tool.name} timed out")
1188
-
1189
- return await asyncio.gather(*[run_with_timeout(t) for t in tools])
1190
- ```
1191
-
1192
- **Why it's Gucci:** Proper timeout handling, cancellation propagation, production-ready.
1193
-
1194
- #### From Claude SDK: Hooks System
1195
- Add pre/post hooks for logging, validation, cost tracking:
1196
-
1197
- ```python
1198
- # UPGRADE: Hook system (stolen from Claude SDK)
1199
- class HookManager:
1200
- async def pre_tool_use(self, tool_name, args):
1201
- """Called before every tool execution"""
1202
- logger.info(f"Calling {tool_name} with {args}")
1203
- self.cost_tracker.start_timer()
1204
-
1205
- async def post_tool_use(self, tool_name, result, duration):
1206
- """Called after every tool execution"""
1207
- self.cost_tracker.record(tool_name, duration)
1208
- if result.is_error:
1209
- self.error_tracker.record(tool_name, result.error)
1210
- ```
1211
-
1212
- **Why it's Gucci:** Observability, debugging, cost tracking, production-ready.
1213
-
1214
- ### Tier 3: Big Lifts (Post-Hackathon)
1215
-
1216
- #### Full AutoGen Integration
1217
- If we want multi-agent capabilities later:
1218
-
1219
- ```python
1220
- # POST-HACKATHON: Multi-agent drug repurposing
1221
- from autogen_agentchat import AssistantAgent, GroupChat
1222
-
1223
- literature_agent = AssistantAgent(
1224
- name="LiteratureReviewer",
1225
- tools=[pubmed_search, web_search],
1226
- system_message="You search and summarize medical literature."
1227
- )
1228
-
1229
- mechanism_agent = AssistantAgent(
1230
- name="MechanismAnalyzer",
1231
- tools=[pathway_db, protein_db],
1232
- system_message="You analyze disease mechanisms and drug targets."
1233
- )
1234
-
1235
- synthesis_agent = AssistantAgent(
1236
- name="ReportSynthesizer",
1237
- system_message="You synthesize findings into actionable reports."
1238
- )
1239
-
1240
- # Orchestrate multi-agent workflow
1241
- group_chat = GroupChat(
1242
- agents=[literature_agent, mechanism_agent, synthesis_agent],
1243
- max_round=10
1244
- )
1245
- ```
1246
-
1247
- **Why it's Gucci:** True multi-agent collaboration, specialized roles, scalable.
1248
-
1249
- ---
1250
-
1251
- ## Priority Order for Stretch Goals
1252
-
1253
- | Priority | Feature | Source | Effort | Impact |
1254
- |----------|---------|--------|--------|--------|
1255
- | 1 | `@tool` decorator | Claude SDK | 2 hrs | High - cleaner code |
1256
- | 2 | Reflect on tool use | AutoGen | 3 hrs | High - better accuracy |
1257
- | 3 | Hooks system | Claude SDK | 4 hrs | Medium - observability |
1258
- | 4 | Concurrent + cancellation | AutoGen | 4 hrs | Medium - robustness |
1259
- | 5 | Multi-agent | AutoGen | 8+ hrs | Post-hackathon |
1260
-
1261
- ---
1262
-
1263
- ## The Bottom Line
1264
-
1265
- ```
1266
- ┌─────────────────────────────────────────────────────────────┐
- │ MVP (Days 1-4): Pydantic AI + FastMCP                       │
- │   - Ship working drug repurposing agent                     │
- │   - Search-judge loop with PubMed + Web                     │
- │   - Gradio UI with streaming                                │
- │   - MCP server for hackathon track                          │
- ├─────────────────────────────────────────────────────────────┤
- │ If Crushing It (Days 5-6): Steal the Gucci                  │
- │   - @tool decorators from Claude SDK                        │
- │   - Reflect on tool use from AutoGen                        │
- │   - Hooks for observability                                 │
- ├─────────────────────────────────────────────────────────────┤
- │ Post-Hackathon: Full AutoGen Integration                    │
- │   - Multi-agent workflows                                   │
- │   - Specialized agent roles                                 │
- │   - Production-grade orchestration                          │
- └─────────────────────────────────────────────────────────────┘
1283
- ```
1284
-
1285
- **Ship MVP first. Steal bangers if time. Scale later.**
1286
-
1287
- ---
1288
-
1289
- ## 16. Reference Implementation Resources
1290
-
1291
- We've cloned production-ready repos into `reference_repos/` that we can vendor, copy from, or just USE directly. This section documents what's available and how to leverage it.
1292
-
1293
- ### Cloned Repositories
1294
-
1295
- | Repository | Location | What It Provides |
1296
- |------------|----------|------------------|
1297
- | **pydanticai-research-agent** | `reference_repos/pydanticai-research-agent/` | Complete PydanticAI agent with Brave Search |
1298
- | **pubmed-mcp-server** | `reference_repos/pubmed-mcp-server/` | Production-grade PubMed MCP server (TypeScript) |
1299
- | **autogen-microsoft** | `reference_repos/autogen-microsoft/` | Microsoft's multi-agent framework |
1300
- | **claude-agent-sdk** | `reference_repos/claude-agent-sdk/` | Anthropic's agent SDK with @tool decorator |
1301
-
1302
- ### 🔥 CHEAT CODE: Production PubMed MCP Already Exists
1303
-
1304
- The `pubmed-mcp-server` is **production-grade** and has EVERYTHING we need:
1305
-
1306
- ```bash
1307
- # Already available tools in pubmed-mcp-server:
1308
- pubmed_search_articles # Search PubMed with filters, date ranges
1309
- pubmed_fetch_contents # Get full article details by PMID
1310
- pubmed_article_connections # Find citations, related articles
1311
- pubmed_research_agent # Generate research plan outlines
1312
- pubmed_generate_chart # Create PNG charts from data
1313
- ```
1314
-
1315
- **Option 1: Use it directly via npx**
1316
- ```json
1317
- {
1318
- "mcpServers": {
1319
- "pubmed": {
1320
- "command": "npx",
1321
- "args": ["@cyanheads/pubmed-mcp-server"],
1322
- "env": { "NCBI_API_KEY": "your_key" }
1323
- }
1324
- }
1325
- }
1326
- ```
1327
-
1328
- **Option 2: Vendor the logic into Python**
1329
- The TypeScript code in `reference_repos/pubmed-mcp-server/src/` shows exactly how to:
1330
- - Construct PubMed E-utilities queries
1331
- - Handle rate limiting (3/sec without key, 10/sec with key)
1332
- - Parse XML responses
1333
- - Extract article metadata
1334
-
1335
- ### PydanticAI Research Agent Patterns
1336
-
1337
- The `pydanticai-research-agent` repo provides copy-paste patterns:
1338
-
1339
- **Agent Definition** (`agents/research_agent.py`):
1340
- ```python
1341
- from typing import Any, Dict, List, Optional
1342
- from pydantic_ai import Agent, RunContext
- from dataclasses import dataclass
1343
-
1344
- @dataclass
1345
- class ResearchAgentDependencies:
1346
- brave_api_key: str
1347
- session_id: Optional[str] = None
1348
-
1349
- research_agent = Agent(
1350
- get_llm_model(),
1351
- deps_type=ResearchAgentDependencies,
1352
- system_prompt=SYSTEM_PROMPT
1353
- )
1354
-
1355
- @research_agent.tool
1356
- async def search_web(
1357
- ctx: RunContext[ResearchAgentDependencies],
1358
- query: str,
1359
- max_results: int = 10
1360
- ) -> List[Dict[str, Any]]:
1361
- """Search with context access via ctx.deps"""
1362
- results = await search_web_tool(ctx.deps.brave_api_key, query, max_results)
1363
- return results
1364
- ```
1365
-
1366
- **Brave Search Tool** (`tools/brave_search.py`):
1367
- ```python
1368
- import httpx
- from typing import Dict, List
-
- async def search_web_tool(api_key: str, query: str, count: int = 10) -> List[Dict]:
1369
- headers = {"X-Subscription-Token": api_key, "Accept": "application/json"}
1370
- async with httpx.AsyncClient() as client:
1371
- response = await client.get(
1372
- "https://api.search.brave.com/res/v1/web/search",
1373
- headers=headers,
1374
- params={"q": query, "count": count},
1375
- timeout=30.0
1376
- )
1377
-         # Raise on 429 rate limit and 401 auth errors instead of parsing bad bodies
-         response.raise_for_status()
1378
-         data = response.json()
1379
- return data.get("web", {}).get("results", [])
1380
- ```
1381
-
1382
- **Pydantic Models** (`models/research_models.py`):
1383
- ```python
1384
- from pydantic import BaseModel, Field
-
- class BraveSearchResult(BaseModel):
1385
- title: str
1386
- url: str
1387
- description: str
1388
- score: float = Field(ge=0.0, le=1.0)
1389
- ```
1390
-
1391
- ### Microsoft Agent Framework Orchestration Patterns
1392
-
1393
- From [deepwiki.com/microsoft/agent-framework](https://deepwiki.com/microsoft/agent-framework/3.4-workflows-and-orchestration):
1394
-
1395
- #### Sequential Orchestration
1396
- ```
1397
- Agent A → Agent B → Agent C (each receives prior outputs)
1398
- ```
1399
- **Use when:** Tasks have dependencies, results inform next steps.
1400
-
1401
- #### Concurrent (Fan-out/Fan-in)
1402
- ```
1403
- ┌→ Agent A ─┐
1404
- Dispatcher ├→ Agent B ─┼→ Aggregator
1405
- └→ Agent C ─┘
1406
- ```
1407
- **Use when:** Independent tasks can run in parallel, results need consolidation.
1408
- **Our use:** Parallel PubMed + Web search.
1409
-
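The fan-out/fan-in shape maps directly onto `asyncio.gather`; a minimal sketch with stand-in searchers (the function names here are illustrative, not from the reference repos):

```python
import asyncio

async def search_pubmed(query: str) -> list[str]:
    # Stand-in for a real PubMed E-utilities call
    return [f"pubmed:{query}"]

async def search_web(query: str) -> list[str]:
    # Stand-in for a real web search call
    return [f"web:{query}"]

async def fan_out_fan_in(query: str) -> list[str]:
    # Dispatcher: fan out to independent searchers concurrently
    branches = await asyncio.gather(search_pubmed(query), search_web(query))
    # Aggregator: fan in and consolidate results
    return [item for branch in branches for item in branch]
```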
1410
- #### Handoff Orchestration
1411
- ```
1412
- Coordinator → routes to → Specialist A, B, or C based on request
1413
- ```
1414
- **Use when:** A coordinator must pick a search strategy based on the query type.
1415
- **Our use:** Route "mechanism" vs "clinical trial" vs "drug info" queries.
1416
-
1417
- #### HITL (Human-in-the-Loop)
1418
- ```
1419
- Agent → RequestInfoEvent → Human validates → Agent continues
1420
- ```
1421
- **Use when:** Critical judgment points need human validation.
1422
- **Our use:** Optional "approve drug candidates before synthesis" step.
1423
-
1424
- ### Recommended Hybrid Pattern for Our Agent
1425
-
1426
- Based on all the research, here's our recommended implementation:
1427
-
1428
- ```
1429
- ┌─────────────────────────────────────────────────────────┐
1430
- │ 1. ROUTER (Handoff Pattern) │
1431
- │ - Analyze query type │
1432
- │ - Choose search strategy │
1433
- ├─────────────────────────────────────────────────────────┤
1434
- │ 2. SEARCH (Concurrent Pattern) │
1435
- │ - Fan-out to PubMed + Web in parallel │
1436
- │ - Timeout handling per AutoGen patterns │
1437
- │ - Aggregate results │
1438
- ├─────────────────────────────────────────────────────────┤
1439
- │ 3. JUDGE (Sequential + Budget) │
1440
- │ - Quality assessment │
1441
- │ - Token/iteration budget check │
1442
- │ - Recommend: continue or synthesize │
1443
- ├─────────────────────────────────────────────────────────┤
1444
- │ 4. SYNTHESIZE (Final Agent) │
1445
- │ - Generate research report │
1446
- │ - Include citations │
1447
- │ - Stream to Gradio UI │
1448
- └─────────────────────────────────────────────────────────┘
1449
- ```
1450
-
1451
- ### Quick Start: Minimal Implementation Path
1452
-
1453
- **Day 1-2: Core Loop**
1454
- 1. Copy `search_web_tool` from `pydanticai-research-agent/tools/brave_search.py`
1455
- 2. Implement PubMed search (reference `pubmed-mcp-server/src/` for E-utilities patterns)
1456
- 3. Wire up basic search-judge loop
1457
-
1458
- **Day 3: Judge + State**
1459
- 1. Implement quality judge with JSON structured output
1460
- 2. Add budget judge
1461
- 3. Add Pydantic state management
1462
-
1463
- **Day 4: UI + MCP**
1464
- 1. Gradio streaming UI
1465
- 2. Wrap PubMed tool as FastMCP server
1466
-
1467
- **Day 5-6: Polish + Deploy**
1468
- 1. HuggingFace Spaces deployment
1469
- 2. Demo video
1470
- 3. Stretch goals if time
1471
-
1472
- ---
1473
-
1474
- ## 17. External Resources & MCP Servers
1475
-
1476
- ### Available PubMed MCP Servers (Community)
1477
-
1478
- | Server | Author | Features | Link |
1479
- |--------|--------|----------|------|
1480
- | **pubmed-mcp-server** | cyanheads | Full E-utilities, research agent, charts | [GitHub](https://github.com/cyanheads/pubmed-mcp-server) |
1481
- | **BioMCP** | GenomOncology | PubMed + ClinicalTrials + MyVariant | [GitHub](https://github.com/genomoncology/biomcp) |
1482
- | **PubMed-MCP-Server** | JackKuo666 | Basic search, metadata access | [GitHub](https://github.com/JackKuo666/PubMed-MCP-Server) |
1483
-
1484
- ### Web Search Options
1485
-
1486
- | Tool | Free Tier | API Key | Async Support |
1487
- |------|-----------|---------|---------------|
1488
- | **Brave Search** | 2000/month | Required | Yes (httpx) |
1489
- | **DuckDuckGo** | Unlimited | No | Yes (duckduckgo-search) |
1490
- | **SerpAPI** | None | Required | Yes |
1491
-
1492
- **Recommended:** Start with DuckDuckGo (free, no key), upgrade to Brave for production.
1493
-
1494
- ```python
1495
- # DuckDuckGo search (no API key needed!)
1496
- import asyncio
- from duckduckgo_search import DDGS
1497
-
1498
- async def search_ddg(query: str, max_results: int = 10) -> List[Dict]:
1499
-     # DDGS.text is synchronous; run it in a thread so the event loop isn't blocked
-     results = await asyncio.to_thread(
-         lambda: list(DDGS().text(query, max_results=max_results))
-     )
1500
-     return [{"title": r["title"], "url": r["href"], "description": r["body"]} for r in results]
1502
- ```
1503
-
1504
- ---
1505
-
1506
- **Document Status**: Official Architecture Spec
1507
- **Review Score**: 100/100 (Ironclad Gucci Banger Edition)
1508
- **Sections**: 17 design patterns + data models appendix + reference repos + stretch goals
1509
- **Last Updated**: November 2025
docs/architecture/graph-orchestration.md ADDED
@@ -0,0 +1,152 @@
1
+ # Graph Orchestration Architecture
2
+
3
+ ## Overview
4
+
5
+ Phase 4 implements a graph-based orchestration system for research workflows using Pydantic AI agents as nodes. This enables better parallel execution, conditional routing, and state management compared to simple agent chains.
6
+
7
+ ## Graph Structure
8
+
9
+ ### Nodes
10
+
11
+ Graph nodes represent different stages in the research workflow:
12
+
13
+ 1. **Agent Nodes**: Execute Pydantic AI agents
14
+ - Input: Prompt/query
15
+ - Output: Structured or unstructured response
16
+ - Examples: `KnowledgeGapAgent`, `ToolSelectorAgent`, `ThinkingAgent`
17
+
18
+ 2. **State Nodes**: Update or read workflow state
19
+ - Input: Current state
20
+ - Output: Updated state
21
+ - Examples: Update evidence, update conversation history
22
+
23
+ 3. **Decision Nodes**: Make routing decisions based on conditions
24
+ - Input: Current state/results
25
+ - Output: Next node ID
26
+ - Examples: Continue research vs. complete research
27
+
28
+ 4. **Parallel Nodes**: Execute multiple nodes concurrently
29
+ - Input: List of node IDs
30
+ - Output: Aggregated results
31
+ - Examples: Parallel iterative research loops
32
+
33
+ ### Edges
34
+
35
+ Edges define transitions between nodes:
36
+
37
+ 1. **Sequential Edges**: Always traversed (no condition)
38
+ - From: Source node
39
+ - To: Target node
40
+ - Condition: None (always True)
41
+
42
+ 2. **Conditional Edges**: Traversed based on condition
43
+ - From: Source node
44
+ - To: Target node
45
+ - Condition: Callable that returns bool
46
+ - Example: If research complete → go to writer, else → continue loop
47
+
48
+ 3. **Parallel Edges**: Used for parallel execution branches
49
+ - From: Parallel node
50
+ - To: Multiple target nodes
51
+ - Execution: All targets run concurrently
52
+
53
+ ## Graph Patterns
54
+
55
+ ### Iterative Research Graph
56
+
57
+ ```
58
+ [Input] → [Thinking] → [Knowledge Gap] → [Decision: Complete?]
59
+ ↓ No ↓ Yes
60
+ [Tool Selector] [Writer]
61
+
62
+ [Execute Tools] → [Loop Back]
63
+ ```
64
+
65
+ ### Deep Research Graph
66
+
67
+ ```
68
+ [Input] → [Planner] → [Parallel Iterative Loops] → [Synthesizer]
69
+ ↓ ↓ ↓
70
+ [Loop1] [Loop2] [Loop3]
71
+ ```
72
+
73
+ ## State Management
74
+
75
+ State is managed via `WorkflowState` using `ContextVar` for thread-safe isolation:
76
+
77
+ - **Evidence**: Collected evidence from searches
78
+ - **Conversation**: Iteration history (gaps, tool calls, findings, thoughts)
79
+ - **Embedding Service**: For semantic search
80
+
81
+ State transitions occur at state nodes, which update the global workflow state.
82
+
83
+ ## Execution Flow
84
+
85
+ 1. **Graph Construction**: Build graph from nodes and edges
86
+ 2. **Graph Validation**: Ensure graph is valid (no cycles, all nodes reachable)
87
+ 3. **Graph Execution**: Traverse graph from entry node
88
+ 4. **Node Execution**: Execute each node based on type
89
+ 5. **Edge Evaluation**: Determine next node(s) based on edges
90
+ 6. **Parallel Execution**: Use `asyncio.gather()` for parallel nodes
91
+ 7. **State Updates**: Update state at state nodes
92
+ 8. **Event Streaming**: Yield events during execution for UI
93
+
94
+ ## Conditional Routing
95
+
96
+ Decision nodes evaluate conditions and return next node IDs:
97
+
98
+ - **Knowledge Gap Decision**: If `research_complete` → writer, else → tool selector
99
+ - **Budget Decision**: If budget exceeded → exit, else → continue
100
+ - **Iteration Decision**: If max iterations → exit, else → continue
101
+
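Decision nodes reduce to plain functions from state to a next-node ID; a sketch of the routing rules above (node IDs and state keys are illustrative):

```python
def knowledge_gap_decision(state: dict) -> str:
    # Research complete -> hand off to the writer; otherwise keep looping
    return "writer" if state.get("research_complete") else "tool_selector"

def budget_decision(state: dict) -> str:
    # Any exhausted budget routes execution to the exit node
    return "exit" if state.get("budget_exceeded") else "continue"
```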
102
+ ## Parallel Execution
103
+
104
+ Parallel nodes execute multiple nodes concurrently:
105
+
106
+ - Each parallel branch runs independently
107
+ - Results are aggregated after all branches complete
108
+ - State is synchronized after parallel execution
109
+ - Errors in one branch don't stop other branches
110
+
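`asyncio.gather(..., return_exceptions=True)` gives exactly this error-isolation behavior; a minimal sketch:

```python
import asyncio

async def branch(name: str, fail: bool = False) -> str:
    if fail:
        raise RuntimeError(f"{name} failed")
    return f"{name} ok"

async def run_parallel() -> list:
    # return_exceptions=True: a failing branch is returned as an exception
    # object instead of cancelling its siblings
    return await asyncio.gather(
        branch("loop1"),
        branch("loop2", fail=True),
        branch("loop3"),
        return_exceptions=True,
    )
```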
111
+ ## Budget Enforcement
112
+
113
+ Budget constraints are enforced at decision nodes:
114
+
115
+ - **Token Budget**: Track LLM token usage
116
+ - **Time Budget**: Track elapsed time
117
+ - **Iteration Budget**: Track iteration count
118
+
119
+ If any budget is exceeded, execution routes to exit node.
120
+
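A sketch of how the three budgets can be checked together at a decision node (field names are assumptions, not the actual `BudgetStatus` API):

```python
import time
from dataclasses import dataclass, field

@dataclass
class Budget:
    token_limit: int
    time_limit_seconds: float
    iterations_limit: int
    tokens_used: int = 0
    iterations: int = 0
    started_at: float = field(default_factory=time.monotonic)

    def can_continue(self) -> bool:
        # Exceeding any one budget routes execution to the exit node
        return (
            self.tokens_used < self.token_limit
            and (time.monotonic() - self.started_at) < self.time_limit_seconds
            and self.iterations < self.iterations_limit
        )
```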
121
+ ## Error Handling
122
+
123
+ Errors are handled at multiple levels:
124
+
125
+ 1. **Node Level**: Catch errors in individual node execution
126
+ 2. **Graph Level**: Handle errors during graph traversal
127
+ 3. **State Level**: Rollback state changes on error
128
+
129
+ Errors are logged and yield error events for UI.
130
+
131
+ ## Backward Compatibility
132
+
133
+ Graph execution is optional via feature flag:
134
+
135
+ - `USE_GRAPH_EXECUTION=true`: Use graph-based execution
136
+ - `USE_GRAPH_EXECUTION=false`: Use agent chain execution (existing)
137
+
138
+ This allows gradual migration and fallback if needed.
139
+
140
+
141
+
142
+
143
+
144
+
145
+
146
+
147
+
148
+
149
+
150
+
151
+
152
+
docs/architecture/graph_orchestration.md CHANGED
@@ -137,6 +137,14 @@ Graph execution is optional via feature flag:
137
 
138
  This allows gradual migration and fallback if needed.
139
 
140
+ ## See Also
141
+
142
+ - [Orchestrators](orchestrators.md) - Overview of all orchestrator patterns
143
+ - [Workflows](workflows.md) - Workflow diagrams and patterns
144
+ - [Workflow Diagrams](workflow-diagrams.md) - Detailed workflow diagrams
145
+ - [API Reference - Orchestrators](../api/orchestrators.md) - API documentation
146
+
147
+
148
 
149
 
150
 
docs/architecture/middleware.md ADDED
@@ -0,0 +1,132 @@
1
+ # Middleware Architecture
2
+
3
+ DeepCritical uses middleware for state management, budget tracking, and workflow coordination.
4
+
5
+ ## State Management
6
+
7
+ ### WorkflowState
8
+
9
+ **File**: `src/middleware/state_machine.py`
10
+
11
+ **Purpose**: Thread-safe state management for research workflows
12
+
13
+ **Implementation**: Uses `ContextVar` for thread-safe isolation
14
+
15
+ **State Components**:
16
+ - `evidence: list[Evidence]`: Collected evidence from searches
17
+ - `conversation: Conversation`: Iteration history (gaps, tool calls, findings, thoughts)
18
+ - `embedding_service: Any`: Embedding service for semantic search
19
+
20
+ **Methods**:
21
+ - `add_evidence(evidence: Evidence)`: Adds evidence with URL-based deduplication
22
+ - `async search_related(query: str, top_k: int = 5) -> list[Evidence]`: Semantic search
23
+
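URL-based deduplication in `add_evidence` can be sketched as follows (a simplified stand-in for the real class):

```python
from dataclasses import dataclass

@dataclass
class Evidence:
    url: str
    text: str

class WorkflowState:
    def __init__(self) -> None:
        self.evidence: list[Evidence] = []
        self._seen_urls: set[str] = set()

    def add_evidence(self, ev: Evidence) -> bool:
        # Deduplicate by URL: keep the first copy, drop repeats
        if ev.url in self._seen_urls:
            return False
        self._seen_urls.add(ev.url)
        self.evidence.append(ev)
        return True
```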
24
+ **Initialization**:
25
+ ```python
26
+ from src.middleware.state_machine import init_workflow_state
+ from src.services.embeddings import get_embedding_service
27
+
28
+ embedding_service = get_embedding_service()
+ init_workflow_state(embedding_service)
29
+ ```
30
+
31
+ **Access**:
32
+ ```python
33
+ from src.middleware.state_machine import get_workflow_state
34
+
35
+ state = get_workflow_state() # Auto-initializes if missing
36
+ ```
37
+
38
+ ## Workflow Manager
39
+
40
+ **File**: `src/middleware/workflow_manager.py`
41
+
42
+ **Purpose**: Coordinates parallel research loops
43
+
44
+ **Methods**:
45
+ - `add_loop(loop: ResearchLoop)`: Add a research loop to manage
46
+ - `async run_loops_parallel() -> list[ResearchLoop]`: Run all loops in parallel
47
+ - `update_loop_status(loop_id: str, status: str)`: Update loop status
48
+ - `sync_loop_evidence_to_state()`: Synchronize evidence from loops to global state
49
+
50
+ **Features**:
51
+ - Uses `asyncio.gather()` for parallel execution
52
+ - Handles errors per loop (doesn't fail all if one fails)
53
+ - Tracks loop status: `pending`, `running`, `completed`, `failed`, `cancelled`
54
+ - Evidence deduplication across parallel loops
55
+
56
+ **Usage**:
57
+ ```python
58
+ from src.middleware.workflow_manager import WorkflowManager
59
+
60
+ manager = WorkflowManager()
61
+ manager.add_loop(loop1)
62
+ manager.add_loop(loop2)
63
+ completed_loops = await manager.run_loops_parallel()
64
+ ```
65
+
66
+ ## Budget Tracker
67
+
68
+ **File**: `src/middleware/budget_tracker.py`
69
+
70
+ **Purpose**: Tracks and enforces resource limits
71
+
72
+ **Budget Components**:
73
+ - **Tokens**: LLM token usage
74
+ - **Time**: Elapsed time in seconds
75
+ - **Iterations**: Number of iterations
76
+
77
+ **Methods**:
78
+ - `create_budget(token_limit, time_limit_seconds, iterations_limit) -> BudgetStatus`
79
+ - `add_tokens(tokens: int)`: Add token usage
80
+ - `start_timer()`: Start time tracking
81
+ - `update_timer()`: Update elapsed time
82
+ - `increment_iteration()`: Increment iteration count
83
+ - `check_budget() -> BudgetStatus`: Check current budget status
84
+ - `can_continue() -> bool`: Check if research can continue
85
+
86
+ **Token Estimation**:
87
+ - `estimate_tokens(text: str) -> int`: ~4 chars per token
88
+ - `estimate_llm_call_tokens(prompt: str, response: str) -> int`: Estimate LLM call tokens
89
+
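The estimation heuristics above amount to a couple of lines; a sketch:

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token
    return max(1, len(text) // 4)

def estimate_llm_call_tokens(prompt: str, response: str) -> int:
    # Cost of one LLM call = prompt tokens + completion tokens
    return estimate_tokens(prompt) + estimate_tokens(response)
```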
90
+ **Usage**:
91
+ ```python
92
+ from src.middleware.budget_tracker import BudgetTracker
93
+
94
+ tracker = BudgetTracker()
95
+ budget = tracker.create_budget(
96
+ token_limit=100000,
97
+ time_limit_seconds=600,
98
+ iterations_limit=10
99
+ )
100
+ tracker.start_timer()
101
+ # ... research operations ...
102
+ if not tracker.can_continue():
103
+ # Budget exceeded, stop research
104
+ pass
105
+ ```
106
+
107
+ ## Models
108
+
109
+ All middleware models are defined in `src/utils/models.py`:
110
+
111
+ - `IterationData`: Data for a single iteration
112
+ - `Conversation`: Conversation history with iterations
113
+ - `ResearchLoop`: Research loop state and configuration
114
+ - `BudgetStatus`: Current budget status
115
+
116
+ ## Thread Safety
117
+
118
+ All middleware components use `ContextVar` for thread-safe isolation:
119
+
120
+ - Each request/thread has its own workflow state
121
+ - No global mutable state
122
+ - Safe for concurrent requests
123
+
124
+ ## See Also
125
+
126
+ - [Orchestrators](orchestrators.md) - How middleware is used in orchestration
127
+ - [API Reference - Orchestrators](../api/orchestrators.md) - API documentation
128
+ - [Contributing - Code Style](../contributing/code-style.md) - Development guidelines
129
+
130
+
131
+
132
+
docs/architecture/orchestrators.md ADDED
@@ -0,0 +1,198 @@
1
+ # Orchestrators Architecture
2
+
3
+ DeepCritical supports multiple orchestration patterns for research workflows.
4
+
5
+ ## Research Flows
6
+
7
+ ### IterativeResearchFlow
8
+
9
+ **File**: `src/orchestrator/research_flow.py`
10
+
11
+ **Pattern**: Generate observations → Evaluate gaps → Select tools → Execute → Judge → Continue/Complete
12
+
13
+ **Agents Used**:
14
+ - `KnowledgeGapAgent`: Evaluates research completeness
15
+ - `ToolSelectorAgent`: Selects tools for addressing gaps
16
+ - `ThinkingAgent`: Generates observations
17
+ - `WriterAgent`: Creates final report
18
+ - `JudgeHandler`: Assesses evidence sufficiency
19
+
20
+ **Features**:
21
+ - Tracks iterations, time, budget
22
+ - Supports graph execution (`use_graph=True`) and agent chains (`use_graph=False`)
23
+ - Iterates until research complete or constraints met
24
+
25
+ **Usage**:
26
+ ```python
27
+ from src.orchestrator.research_flow import IterativeResearchFlow
28
+
29
+ flow = IterativeResearchFlow(
30
+ search_handler=search_handler,
31
+ judge_handler=judge_handler,
32
+ use_graph=False
33
+ )
34
+
35
+ async for event in flow.run(query):
36
+ # Handle events
37
+ pass
38
+ ```
39
+
40
+ ### DeepResearchFlow
41
+
42
+ **File**: `src/orchestrator/research_flow.py`
43
+
44
+ **Pattern**: Planner → Parallel iterative loops per section → Synthesizer
45
+
46
+ **Agents Used**:
47
+ - `PlannerAgent`: Breaks query into report sections
48
+ - `IterativeResearchFlow`: Per-section research (parallel)
49
+ - `LongWriterAgent` or `ProofreaderAgent`: Final synthesis
50
+
51
+ **Features**:
52
+ - Uses `WorkflowManager` for parallel execution
53
+ - Budget tracking per section and globally
54
+ - State synchronization across parallel loops
55
+ - Supports graph execution and agent chains
56
+
57
+ **Usage**:
58
+ ```python
59
+ from src.orchestrator.research_flow import DeepResearchFlow
60
+
61
+ flow = DeepResearchFlow(
62
+ search_handler=search_handler,
63
+ judge_handler=judge_handler,
64
+ use_graph=True
65
+ )
66
+
67
+ async for event in flow.run(query):
68
+ # Handle events
69
+ pass
70
+ ```
71
+
72
+ ## Graph Orchestrator
73
+
74
+ **File**: `src/orchestrator/graph_orchestrator.py`
75
+
76
+ **Purpose**: Graph-based execution using Pydantic AI agents as nodes
77
+
78
+ **Features**:
79
+ - Uses Pydantic AI Graphs (when available) or agent chains (fallback)
80
+ - Routes based on research mode (iterative/deep/auto)
81
+ - Streams `AgentEvent` objects for UI
82
+
83
+ **Node Types**:
84
+ - **Agent Nodes**: Execute Pydantic AI agents
85
+ - **State Nodes**: Update or read workflow state
86
+ - **Decision Nodes**: Make routing decisions
87
+ - **Parallel Nodes**: Execute multiple nodes concurrently
88
+
89
+ **Edge Types**:
90
+ - **Sequential Edges**: Always traversed
91
+ - **Conditional Edges**: Traversed based on condition
92
+ - **Parallel Edges**: Used for parallel execution branches
93
+
94
+ ## Orchestrator Factory
95
+
96
+ **File**: `src/orchestrator_factory.py`
97
+
98
+ **Purpose**: Factory for creating orchestrators
99
+
100
+ **Modes**:
101
+ - **Simple**: Legacy orchestrator (backward compatible)
102
+ - **Advanced**: Magentic orchestrator (requires OpenAI API key)
103
+ - **Auto-detect**: Chooses based on API key availability
104
+
105
+ **Usage**:
106
+ ```python
107
+ from src.orchestrator_factory import create_orchestrator
108
+
109
+ orchestrator = create_orchestrator(
110
+ search_handler=search_handler,
111
+ judge_handler=judge_handler,
112
+ config={},
113
+ mode="advanced" # or "simple" or None for auto-detect
114
+ )
115
+ ```
116
+
117
+ ## Magentic Orchestrator
118
+
119
+ **File**: `src/orchestrator_magentic.py`
120
+
121
+ **Purpose**: Multi-agent coordination using Microsoft Agent Framework
122
+
123
+ **Features**:
124
+ - Uses `agent-framework-core`
125
+ - ChatAgent pattern with internal LLMs per agent
126
+ - `MagenticBuilder` with participants: searcher, hypothesizer, judge, reporter
127
+ - Manager orchestrates agents via `OpenAIChatClient`
128
+ - Requires OpenAI API key (function calling support)
129
+ - Event-driven: converts Magentic events to `AgentEvent` for UI streaming
130
+
131
+ **Requirements**:
132
+ - `agent-framework-core` package
133
+ - OpenAI API key
134
+
135
+ ## Hierarchical Orchestrator
136
+
137
+ **File**: `src/orchestrator_hierarchical.py`
138
+
139
+ **Purpose**: Hierarchical orchestrator using middleware and sub-teams
140
+
141
+ **Features**:
142
+ - Uses `SubIterationMiddleware` with `ResearchTeam` and `LLMSubIterationJudge`
143
+ - Adapts Magentic ChatAgent to `SubIterationTeam` protocol
144
+ - Event-driven via `asyncio.Queue` for coordination
145
+ - Supports sub-iteration patterns for complex research tasks
146
+
147
+ ## Legacy Simple Mode
148
+
149
+ **File**: `src/legacy_orchestrator.py`
150
+
151
+ **Purpose**: Linear search-judge-synthesize loop
152
+
153
+ **Features**:
154
+ - Uses `SearchHandlerProtocol` and `JudgeHandlerProtocol`
155
+ - Generator-based design yielding `AgentEvent` objects
156
+ - Backward compatibility for simple use cases
157
+
158
+ ## State Initialization
159
+
160
+ All orchestrators must initialize workflow state:
161
+
162
+ ```python
163
+ from src.middleware.state_machine import init_workflow_state
164
+ from src.services.embeddings import get_embedding_service
165
+
166
+ embedding_service = get_embedding_service()
167
+ init_workflow_state(embedding_service)
168
+ ```
169
+
170
+ ## Event Streaming
171
+
172
+ All orchestrators yield `AgentEvent` objects:
173
+
174
+ **Event Types**:
175
+ - `started`: Research started
176
+ - `search_complete`: Search completed
177
+ - `judge_complete`: Evidence evaluation completed
178
+ - `hypothesizing`: Generating hypotheses
179
+ - `synthesizing`: Synthesizing results
180
+ - `complete`: Research completed
181
+ - `error`: Error occurred
182
+
183
+ **Event Structure**:
184
+ ```python
185
+ class AgentEvent:
186
+ type: str
187
+ iteration: int | None
188
+ data: dict[str, Any]
189
+ ```
190
+
191
+ ## See Also
192
+
193
+ - [Graph Orchestration](graph-orchestration.md) - Graph-based execution details
194
+ - [Graph Orchestration (Detailed)](graph_orchestration.md) - Detailed graph architecture
195
+ - [Workflows](workflows.md) - Workflow diagrams and patterns
196
+ - [Workflow Diagrams](workflow-diagrams.md) - Detailed workflow diagrams
197
+ - [API Reference - Orchestrators](../api/orchestrators.md) - API documentation
198
+
docs/architecture/overview.md DELETED
@@ -1,474 +0,0 @@
1
- # DeepCritical: Medical Drug Repurposing Research Agent
2
- ## Project Overview
3
-
4
- ---
5
-
6
- ## Executive Summary
7
-
8
- **DeepCritical** is a deep research agent designed to accelerate medical drug repurposing research by autonomously searching, analyzing, and synthesizing evidence from multiple biomedical databases.
9
-
10
- ### The Problem We Solve
11
-
12
- Drug repurposing - finding new therapeutic uses for existing FDA-approved drugs - can take years of manual literature review. Researchers must:
13
- - Search thousands of papers across multiple databases
14
- - Identify molecular mechanisms
15
- - Find relevant clinical trials
16
- - Assess safety profiles
17
- - Synthesize evidence into actionable insights
18
-
19
- **DeepCritical automates this process from hours to minutes.**
20
-
21
- ### What Is Drug Repurposing?
22
-
23
- **Simple Explanation:**
24
- Using existing approved drugs to treat NEW diseases they weren't originally designed for.
25
-
26
- **Real Examples:**
27
- - **Viagra** (sildenafil): Originally for heart disease → Now treats erectile dysfunction
28
- - **Thalidomide**: Once banned → Now treats multiple myeloma
29
- - **Aspirin**: Pain reliever → Heart attack prevention
30
- - **Metformin**: Diabetes drug → Being tested for aging/longevity
31
-
32
- **Why It Matters:**
33
- - Faster than developing new drugs (years vs decades)
34
- - Cheaper (known safety profiles)
35
- - Lower risk (already FDA approved)
36
- - Immediate patient benefit potential
37
-
38
- ---
39
-
40
- ## Core Use Case
41
-
42
- ### Primary Query Type
43
- > "What existing drugs might help treat [disease/condition]?"
44
-
45
- ### Example Queries
46
-
47
- 1. **Long COVID Fatigue**
48
- - Query: "What existing drugs might help treat long COVID fatigue?"
49
- - Agent searches: PubMed, clinical trials, drug databases
50
- - Output: List of candidate drugs with mechanisms + evidence + citations
51
-
52
- 2. **Alzheimer's Disease**
53
- - Query: "Find existing drugs that target beta-amyloid pathways"
54
- - Agent identifies: Disease mechanisms → Drug candidates → Clinical evidence
55
- - Output: Comprehensive research report with drug candidates
56
-
57
- 3. **Rare Disease Treatment**
58
- - Query: "What drugs might help with fibrodysplasia ossificans progressiva?"
59
- - Agent finds: Similar conditions → Shared pathways → Potential treatments
60
- - Output: Evidence-based treatment suggestions
61
-
62
- ---
63
-
64
- ## System Architecture
65
-
66
- ### High-Level Design (Phases 1-8)
67
-
68
- ```text
69
- User Query
70
-
71
- Gradio UI (Phase 4)
72
-
73
- Magentic Manager (Phase 5) ← LLM-powered coordinator
74
- ├── SearchAgent (Phase 2+5) ←→ PubMed + Web + VectorDB (Phase 6)
75
- ├── HypothesisAgent (Phase 7) ←→ Mechanistic Reasoning
76
- ├── JudgeAgent (Phase 3+5) ←→ Evidence Assessment
77
- └── ReportAgent (Phase 8) ←→ Final Synthesis
78
-
79
- Structured Research Report
80
- ```
81

### Key Components

1. **Magentic Manager (Orchestrator)**
   - LLM-powered multi-agent coordinator
   - Dynamic planning and agent selection
   - Built-in stall detection and replanning
   - Microsoft Agent Framework integration

2. **SearchAgent (Phase 2+5+6)**
   - PubMed E-utilities search
   - DuckDuckGo web search
   - Semantic search via ChromaDB (Phase 6)
   - Evidence deduplication

3. **HypothesisAgent (Phase 7)**
   - Generates Drug → Target → Pathway → Effect hypotheses
   - Guides targeted searches
   - Scientific reasoning about mechanisms

4. **JudgeAgent (Phase 3+5)**
   - LLM-based evidence assessment
   - Mechanism score + clinical score
   - Recommends continue/synthesize
   - Generates refined search queries

5. **ReportAgent (Phase 8)**
   - Structured scientific reports
   - Executive summary, methodology
   - Hypotheses tested with evidence counts
   - Proper citations and limitations

6. **Gradio UI (Phase 4)**
   - Chat interface for questions
   - Real-time progress via events
   - Mode toggle (Simple/Magentic)
   - Formatted markdown output
119
---

## Design Patterns

### 1. Search-and-Judge Loop (Primary Pattern)

```python
def research(question: str) -> Report:
    context = []
    query = question
    for iteration in range(max_iterations):
        # SEARCH: query the relevant tools
        results = search_tools(query, context)
        context.extend(results)

        # JUDGE: evaluate evidence quality
        if judge.is_sufficient(question, context):
            break

        # REFINE: adjust the query for the next iteration
        query = refine_query(question, context)

    # SYNTHESIZE: generate the final report
    return synthesize_report(question, context)
```

**Why This Pattern:**
- Simple to implement and debug
- Clear loop-termination conditions
- Iterative improvement of search quality
- Balances depth vs. speed
149

### 2. Multi-Tool Orchestration

```
Question → Agent decides which tools to use
                ↓
    ┌───┴────┬─────────┬──────────┐
    ↓        ↓         ↓          ↓
 PubMed  Web Search  Trials DB  Drug DB
    ↓        ↓         ↓          ↓
    └───┬────┴─────────┴──────────┘
                ↓
       Aggregate Results → Judge
```

**Why This Pattern:**
- Different sources provide different evidence types
- Parallel tool execution (when possible)
- Comprehensive coverage
168

### 3. LLM-as-Judge with Token Budget

**Dual Stopping Conditions:**
- **Smart Stop**: LLM judge says "we have sufficient evidence"
- **Hard Stop**: Token budget exhausted OR max iterations reached

**Why Both:**
- The judge enables an early exit when the answer is good
- The budget prevents runaway costs
- The iteration cap prevents infinite loops

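The three conditions combine into a single loop guard. A minimal sketch, assuming illustrative names (`search`, `is_sufficient`, the constants) rather than the actual implementation:

```python
from typing import Callable

MAX_ITERATIONS = 10
TOKEN_BUDGET = 50_000  # hard cap, see Risk Assessment

def run_research_loop(
    question: str,
    search: Callable[[str, list[str]], tuple[list[str], int]],
    is_sufficient: Callable[[str, list[str]], bool],
) -> tuple[list[str], str]:
    """Search until the judge approves, the token budget runs out,
    or the iteration cap is hit -- whichever comes first."""
    context: list[str] = []
    tokens_used = 0
    for _ in range(MAX_ITERATIONS):
        results, cost = search(question, context)
        context.extend(results)
        tokens_used += cost
        if is_sufficient(question, context):
            return context, "judge_approved"    # smart stop
        if tokens_used >= TOKEN_BUDGET:
            return context, "budget_exhausted"  # hard stop: cost cap
    return context, "max_iterations"            # hard stop: loop cap
```

Whichever reason is returned, the collected context still flows into synthesis, so a hard stop degrades gracefully instead of failing.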
### 4. Stateful Checkpointing

```
.deepresearch/
├── state/
│   └── query_123.json       # Current research state
├── checkpoints/
│   └── query_123_iter3/     # Checkpoint at iteration 3
└── workspace/
    └── query_123/           # Downloaded papers, data
```

**Why This Pattern:**
- Resume interrupted research
- Debugging and analysis
- Cost savings (don't re-search)

---

## Component Breakdown

### Agent (Orchestrator)
- **Responsibility**: Coordinate the research process
- **Size**: ~100 lines
- **Key Methods**:
  - `research(question)` - Main entry point
  - `plan_search_strategy()` - Decide what to search
  - `execute_search()` - Run tool queries
  - `evaluate_progress()` - Call the judge
  - `synthesize_findings()` - Generate the report

### Tools
- **Responsibility**: Interface with external data sources
- **Size**: ~50 lines per tool
- **Implementations**:
  - `PubMedTool` - Search biomedical literature
  - `WebSearchTool` - General medical information
  - `ClinicalTrialsTool` - Trial data (optional)
  - `DrugInfoTool` - FDA drug database (optional)

### Judge
- **Responsibility**: Evaluate evidence quality
- **Size**: ~50 lines
- **Key Methods**:
  - `is_sufficient(question, evidence)` → bool
  - `assess_quality(evidence)` → score
  - `identify_gaps(question, evidence)` → missing_info

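A baseline implementation of that interface might look like the sketch below; the count-based heuristic is a placeholder for the LLM judge (matching the "simple judge" planned for Day 2), not the actual scoring logic:

```python
class SimpleJudge:
    """Baseline judge with the interface listed above."""

    def __init__(self, min_items: int = 5):
        self.min_items = min_items

    def assess_quality(self, evidence: list[str]) -> float:
        # Placeholder heuristic: more evidence -> higher score, capped at 1.0.
        return min(len(evidence) / self.min_items, 1.0)

    def is_sufficient(self, question: str, evidence: list[str]) -> bool:
        return self.assess_quality(evidence) >= 1.0

    def identify_gaps(self, question: str, evidence: list[str]) -> list[str]:
        missing = self.min_items - len(evidence)
        return [f"need {missing} more source(s) for: {question}"] if missing > 0 else []
```

The LLM judge keeps the same three methods, so the orchestrator doesn't change when the heuristic is swapped out.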
### Gradio App
- **Responsibility**: User interface
- **Size**: ~50 lines
- **Features**:
  - Text input for questions
  - Progress indicators
  - Formatted output with citations
  - Download research report

---

## Technical Stack

### Core Dependencies
```toml
[dependencies]
python = ">=3.10"
pydantic = "^2.7"
pydantic-ai = "^0.0.16"
fastmcp = "^0.1.0"
gradio = "^5.0"
beautifulsoup4 = "^4.12"
httpx = "^0.27"
```

### Optional Enhancements
- `modal` - For a GPU-accelerated local LLM
- `fastmcp` - MCP server integration
- `sentence-transformers` - Semantic search
- `faiss-cpu` - Vector similarity

### Tool APIs & Rate Limits

| API | Cost | Rate Limit | API Key? | Notes |
|-----|------|------------|----------|-------|
| **PubMed E-utilities** | Free | 3/sec (no key), 10/sec (with key) | Optional | Register at NCBI for higher limits |
| **Brave Search API** | Free tier | 2000/month free | Required | Primary web search |
| **DuckDuckGo** | Free | Unofficial, ~1/sec | No | Fallback web search |
| **ClinicalTrials.gov** | Free | 100/min | No | Stretch goal |
| **OpenFDA** | Free | 240/min (no key), 120K/day (with key) | Optional | Drug info |

**Web Search Strategy (Priority Order):**
1. **Brave Search API** (free tier: 2000 queries/month) - Primary
2. **DuckDuckGo** (unofficial, no API key) - Fallback
3. **SerpAPI** ($50/month) - Only if the free options fail

**Why NOT SerpAPI first?**
- Costs money (hackathon budget = $0)
- Free alternatives work fine for demo
- Can upgrade later if needed

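The priority order can be sketched as a simple fallback chain; the provider callables here stand in for the real Brave/DuckDuckGo/SerpAPI clients:

```python
from typing import Callable

def search_with_fallback(
    query: str,
    providers: list[tuple[str, Callable[[str], list[str]]]],
) -> tuple[str, list[str]]:
    """Return results from the first provider that succeeds with hits,
    trying providers in priority order."""
    errors: list[str] = []
    for name, provider in providers:
        try:
            results = provider(query)
            if results:
                return name, results
        except Exception as exc:  # provider down, quota hit, or rate-limited
            errors.append(f"{name}: {exc}")
    raise RuntimeError(f"all providers failed: {errors}")
```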
---

## Success Criteria

### Phase 1-5 (MVP) ✅ COMPLETE
**Completed in ONE DAY:**
- [x] User can ask a drug repurposing question
- [x] Agent searches PubMed (async)
- [x] Agent searches the web (DuckDuckGo)
- [x] LLM judge evaluates evidence quality
- [x] System respects the token budget and iteration cap
- [x] Output includes drug candidates + citations
- [x] Works end-to-end for the demo query
- [x] Gradio UI with streaming progress
- [x] Magentic multi-agent orchestration
- [x] 38 unit tests passing
- [x] CI/CD pipeline green

### Hackathon Submission ✅ COMPLETE
- [x] Gradio UI deployed on HuggingFace Spaces
- [x] Example queries working and tested
- [x] Architecture documentation
- [x] README with setup instructions

### Phase 6-8 (Enhanced)
**Specs ready for implementation:**
- [ ] Embeddings & Semantic Search (Phase 6)
- [ ] Hypothesis Agent (Phase 7)
- [ ] Report Agent (Phase 8)

### What's EXPLICITLY Out of Scope
**NOT building (to stay focused):**
- ❌ User authentication
- ❌ Database storage of queries
- ❌ Multi-user support
- ❌ Payment/billing
- ❌ Production monitoring
- ❌ Mobile UI

---

## Implementation Timeline

### Day 1 (Today): Architecture & Setup
- [x] Define use case (drug repurposing) ✅
- [x] Write architecture docs ✅
- [ ] Create project structure
- [ ] First PR: Structure + Docs

### Day 2: Core Agent Loop
- [ ] Implement basic orchestrator
- [ ] Add PubMed search tool
- [ ] Simple judge (keyword-based)
- [ ] Test with 1 query

### Day 3: Intelligence Layer
- [ ] Upgrade to LLM judge
- [ ] Add web search tool
- [ ] Token budget tracking
- [ ] Test with multiple queries

### Day 4: UI & Integration
- [ ] Build Gradio interface
- [ ] Wire up agent to UI
- [ ] Add progress indicators
- [ ] Format output nicely

### Day 5: Polish & Extend
- [ ] Add more tools (clinical trials)
- [ ] Improve judge prompts
- [ ] Checkpoint system
- [ ] Error handling

### Day 6: Deploy & Document
- [ ] Deploy to HuggingFace Spaces
- [ ] Record demo video
- [ ] Write submission materials
- [ ] Final testing

---
359

## Questions This Document Answers

### For The Maintainer

**Q: "What should our design pattern be?"**
A: A search-and-judge loop with multi-tool orchestration (detailed in the Design Patterns section).

**Q: "Should we use LLM-as-judge or a token budget?"**
A: Both - the judge for smart stopping, the budget for cost control.

**Q: "What's the break pattern?"**
A: Three conditions: judge approval, token limit, or max iterations (whichever comes first).

**Q: "What components do we need?"**
A: Agent orchestrator, tools (PubMed/web), judge, and Gradio UI (see Component Breakdown).

### For The Team

**Q: "What are we actually building?"**
A: A medical drug repurposing research agent (see Core Use Case).

**Q: "How complex should it be?"**
A: Simple but complete - ~300 lines of core code (see component sizes).

**Q: "What's the timeline?"**
A: 6 days: MVP by Day 3, polish on Days 4-6 (see Implementation Timeline).

**Q: "What datasets/APIs do we use?"**
A: PubMed (free), web search, and ClinicalTrials.gov (see Tool APIs & Rate Limits).

---
391

## Next Steps

1. **Review this document** - Team feedback on architecture
2. **Finalize design** - Incorporate feedback
3. **Create project structure** - Scaffold repository
4. **Move to proper docs** - `docs/architecture/` folder
5. **Open first PR** - Structure + Documentation
6. **Start implementation** - Day 2 onward

---

## Notes & Decisions

### Why Drug Repurposing?
- Clear, impressive use case
- Real-world medical impact
- Good data availability (PubMed, trials)
- Easy to explain (Viagra example!)
- Physician on team ✅

### Why Simple Architecture?
- 6-day timeline
- Need a working end-to-end system
- Hackathon judges value "works" over "complex"
- Can extend later if successful

### Why These Tools First?
- PubMed: Best biomedical literature source
- Web search: General medical knowledge
- Clinical trials: Evidence of actual testing
- Others: Nice-to-have, not critical for MVP

---
427

## Appendix A: Demo Queries (Pre-tested)

These queries will be used for demo and testing. They're chosen because:
1. They have good PubMed coverage
2. They're medically interesting
3. They show the system's capabilities

### Primary Demo Query
```
"What existing drugs might help treat long COVID fatigue?"
```
**Expected candidates**: CoQ10, Low-dose Naltrexone, Modafinil
**Expected sources**: 20+ PubMed papers, 2-3 clinical trials

### Secondary Demo Queries
```
"Find existing drugs that might slow Alzheimer's progression"
"What approved medications could help with fibromyalgia pain?"
"Which diabetes drugs show promise for cancer treatment?"
```

### Why These Queries?
- Represent real clinical needs
- Have substantial literature
- Show diverse drug classes
- The physician on the team can validate results

---

## Appendix B: Risk Assessment

| Risk | Likelihood | Impact | Mitigation |
|------|------------|--------|------------|
| PubMed rate limiting | Medium | High | Implement caching, respect 3/sec |
| Web search API fails | Low | Medium | DuckDuckGo fallback |
| LLM costs exceed budget | Medium | Medium | Hard token cap at 50K |
| Judge quality poor | Medium | High | Pre-test prompts, iterate |
| HuggingFace deploy issues | Low | High | Test deployment Day 4 |
| Demo crashes live | Medium | High | Pre-recorded backup video |

---

**Document Status**: Official Architecture Spec
**Review Score**: 98/100
**Last Updated**: November 2025
docs/architecture/services.md ADDED
@@ -0,0 +1,132 @@
# Services Architecture

DeepCritical provides several services for embeddings, RAG, and statistical analysis.

## Embedding Service

**File**: `src/services/embeddings.py`

**Purpose**: Local sentence-transformers models for semantic search and deduplication

**Features**:
- **No API Key Required**: Uses local sentence-transformers models
- **Async-Safe**: All operations use `run_in_executor()` to avoid blocking
- **ChromaDB Storage**: Vector storage for embeddings
- **Deduplication**: 0.85 similarity threshold (85% similarity = duplicate)

**Model**: Configurable via `settings.local_embedding_model` (default: `all-MiniLM-L6-v2`)

**Methods**:
- `async def embed(text: str) -> list[float]`: Generate embeddings
- `async def embed_batch(texts: list[str]) -> list[list[float]]`: Batch embedding
- `async def similarity(text1: str, text2: str) -> float`: Calculate similarity
- `async def find_duplicates(texts: list[str], threshold: float = 0.85) -> list[tuple[int, int]]`: Find duplicates

**Usage**:
```python
from src.services.embeddings import get_embedding_service

service = get_embedding_service()
embedding = await service.embed("text to embed")
```

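The 0.85 deduplication threshold is a pairwise cosine-similarity cutoff. A minimal sketch over raw vectors (illustrative only; the service runs this comparison over sentence-transformer embeddings through the `find_duplicates` method listed above):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def find_duplicate_pairs(
    vectors: list[list[float]], threshold: float = 0.85
) -> list[tuple[int, int]]:
    """Index pairs whose similarity meets the threshold -> treated as duplicates."""
    return [
        (i, j)
        for i in range(len(vectors))
        for j in range(i + 1, len(vectors))
        if cosine_similarity(vectors[i], vectors[j]) >= threshold
    ]
```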
## LlamaIndex RAG Service

**File**: `src/services/rag.py`

**Purpose**: Retrieval-Augmented Generation using LlamaIndex

**Features**:
- **OpenAI Embeddings**: Requires `OPENAI_API_KEY`
- **ChromaDB Storage**: Vector database for document storage
- **Metadata Preservation**: Preserves source, title, URL, date, authors
- **Lazy Initialization**: Graceful fallback if the OpenAI key is not available

**Methods**:
- `async def ingest_evidence(evidence: list[Evidence]) -> None`: Ingest evidence into RAG
- `async def retrieve(query: str, top_k: int = 5) -> list[Document]`: Retrieve relevant documents
- `async def query(query: str, top_k: int = 5) -> str`: Query with RAG

**Usage**:
```python
from src.services.rag import get_rag_service

service = get_rag_service()
if service:
    documents = await service.retrieve("query", top_k=5)
```

## Statistical Analyzer

**File**: `src/services/statistical_analyzer.py`

**Purpose**: Secure execution of AI-generated statistical code

**Features**:
- **Modal Sandbox**: Secure, isolated execution environment
- **Code Generation**: Generates Python code via LLM
- **Library Pinning**: Version-pinned libraries in `SANDBOX_LIBRARIES`
- **Network Isolation**: `block_network=True` by default

**Libraries Available**:
- pandas, numpy, scipy
- matplotlib, scikit-learn
- statsmodels

**Output**: `AnalysisResult` with:
- `verdict`: SUPPORTED, REFUTED, or INCONCLUSIVE
- `code`: Generated analysis code
- `output`: Execution output
- `error`: Error message if execution failed

**Usage**:
```python
from src.services.statistical_analyzer import StatisticalAnalyzer

analyzer = StatisticalAnalyzer()
result = await analyzer.analyze(
    hypothesis="Metformin reduces cancer risk",
    evidence=evidence_list
)
```

## Singleton Pattern

All services use the singleton pattern with `@lru_cache(maxsize=1)`:

```python
@lru_cache(maxsize=1)
def get_embedding_service() -> EmbeddingService:
    return EmbeddingService()
```

This ensures:
- A single instance per process
- Lazy initialization
- No dependencies required at import time

## Service Availability

Services check availability before use:

```python
from src.utils.config import settings

if settings.modal_available:
    # Use Modal sandbox
    pass

if settings.has_openai_key:
    # Use OpenAI embeddings for RAG
    pass
```

## See Also

- [Tools](tools.md) - How services are used by search tools
- [API Reference - Services](../api/services.md) - API documentation
- [Configuration](../configuration/index.md) - Service configuration
docs/architecture/tools.md ADDED
@@ -0,0 +1,165 @@
# Tools Architecture

DeepCritical implements a protocol-based search tool system for retrieving evidence from multiple sources.

## SearchTool Protocol

All tools implement the `SearchTool` protocol from `src/tools/base.py`:

```python
class SearchTool(Protocol):
    @property
    def name(self) -> str: ...

    async def search(
        self,
        query: str,
        max_results: int = 10
    ) -> list[Evidence]: ...
```

## Rate Limiting

All tools use the `@retry` decorator from tenacity:

```python
@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(...)
)
async def search(self, query: str, max_results: int = 10) -> list[Evidence]:
    # Implementation
    ...
```

Tools with API rate limits implement a `_rate_limit()` method and use shared rate limiters from `src/tools/rate_limiter.py`.

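A minimal sketch of such a shared limiter, enforcing a minimum interval between requests (illustrative; the actual implementation in `src/tools/rate_limiter.py` may differ):

```python
import asyncio
import time

class RateLimiter:
    """Enforce a minimum interval between requests (e.g. 0.34s for PubMed)."""

    def __init__(self, min_interval: float):
        self.min_interval = min_interval
        self._last = 0.0
        self._lock = asyncio.Lock()  # serialize concurrent callers

    async def acquire(self) -> None:
        async with self._lock:
            wait = self.min_interval - (time.monotonic() - self._last)
            if wait > 0:
                await asyncio.sleep(wait)
            self._last = time.monotonic()
```

Each tool awaits `acquire()` before its HTTP call; sharing one instance per API keeps the whole process under that API's limit.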
## Error Handling

Tools raise custom exceptions:

- `SearchError`: General search failures
- `RateLimitError`: Rate limit exceeded

Tools handle HTTP errors (429, 500, timeout) and return empty lists on non-critical errors (with warning logs).

## Query Preprocessing

Tools use `preprocess_query()` from `src/tools/query_utils.py` to:

- Remove noise from queries
- Expand synonyms
- Normalize query format

## Evidence Conversion

All tools convert API responses to `Evidence` objects with:

- `Citation`: Title, URL, date, authors
- `content`: Evidence text
- `relevance_score`: 0.0-1.0 relevance score
- `metadata`: Additional metadata

Missing fields are handled gracefully with defaults.

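A sketch of that conversion with defaults; the dataclasses below are stand-ins for the real `Evidence`/`Citation` models, and the field defaults are illustrative:

```python
from dataclasses import dataclass, field

@dataclass
class Citation:
    title: str
    url: str
    date: str = "unknown"
    authors: list[str] = field(default_factory=list)

@dataclass
class Evidence:
    citation: Citation
    content: str
    relevance_score: float = 0.5
    metadata: dict = field(default_factory=dict)

def to_evidence(record: dict) -> Evidence:
    """Convert a raw API response dict, filling missing fields with defaults."""
    return Evidence(
        citation=Citation(
            title=record.get("title", "Untitled"),
            url=record.get("url", ""),
            date=record.get("date", "unknown"),
            authors=record.get("authors", []),
        ),
        content=record.get("abstract", ""),
    )
```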
## Tool Implementations

### PubMed Tool

**File**: `src/tools/pubmed.py`

**API**: NCBI E-utilities (ESearch → EFetch)

**Rate Limiting**:
- 0.34s between requests (3 req/sec without an API key)
- 0.1s between requests (10 req/sec with an NCBI API key)

**Features**:
- XML parsing with `xmltodict`
- Handles single vs. multiple articles
- Query preprocessing
- Evidence conversion with metadata extraction

### ClinicalTrials Tool

**File**: `src/tools/clinicaltrials.py`

**API**: ClinicalTrials.gov API v2

**Important**: Uses the `requests` library (NOT httpx) because the WAF blocks httpx's TLS fingerprint.

**Execution**: Runs in a thread pool: `await asyncio.to_thread(requests.get, ...)`

**Filtering**:
- Only interventional studies
- Status: `COMPLETED`, `ACTIVE_NOT_RECRUITING`, `RECRUITING`, `ENROLLING_BY_INVITATION`

**Features**:
- Parses nested JSON structure
- Extracts trial metadata
- Evidence conversion

### Europe PMC Tool

**File**: `src/tools/europepmc.py`

**API**: Europe PMC REST API

**Features**:
- Handles preprint markers: `[PREPRINT - Not peer-reviewed]`
- Builds URLs from DOI or PMID
- Checks `pubTypeList` for preprint detection
- Includes both preprints and peer-reviewed articles

### RAG Tool

**File**: `src/tools/rag_tool.py`

**Purpose**: Semantic search within collected evidence

**Implementation**: Wraps `LlamaIndexRAGService`

**Features**:
- Returns `Evidence` from RAG results
- Handles evidence ingestion
- Semantic similarity search
- Metadata preservation

### Search Handler

**File**: `src/tools/search_handler.py`

**Purpose**: Orchestrates parallel searches across multiple tools

**Features**:
- Uses `asyncio.gather()` with `return_exceptions=True`
- Aggregates results into a `SearchResult`
- Handles tool failures gracefully
- Deduplicates results by URL

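The gather-with-exceptions pattern can be sketched as follows; the tool callables are placeholders for real `SearchTool` instances, and for brevity this sketch dedupes strings in order rather than `Evidence` objects by URL:

```python
import asyncio

async def search_all(query: str, tools: list) -> list[str]:
    results = await asyncio.gather(
        *(tool(query) for tool in tools),
        return_exceptions=True,  # a failing tool must not sink the whole batch
    )
    evidence: list[str] = []
    for tool, result in zip(tools, results):
        if isinstance(result, Exception):
            print(f"warning: {tool.__name__} failed: {result}")
            continue
        evidence.extend(result)
    # Deduplicate while preserving order.
    return list(dict.fromkeys(evidence))
```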
## Tool Registration

Tools are registered in the search handler:

```python
from src.tools.pubmed import PubMedTool
from src.tools.clinicaltrials import ClinicalTrialsTool
from src.tools.europepmc import EuropePMCTool

search_handler = SearchHandler(
    tools=[
        PubMedTool(),
        ClinicalTrialsTool(),
        EuropePMCTool(),
    ]
)
```

## See Also

- [Services](services.md) - RAG and embedding services
- [API Reference - Tools](../api/tools.md) - API documentation
- [Contributing - Implementation Patterns](../contributing/implementation-patterns.md) - Development guidelines
docs/architecture/workflow-diagrams.md ADDED
@@ -0,0 +1,670 @@
1
+ # DeepCritical Workflow - Simplified Magentic Architecture
2
+
3
+ > **Architecture Pattern**: Microsoft Magentic Orchestration
4
+ > **Design Philosophy**: Simple, dynamic, manager-driven coordination
5
+ > **Key Innovation**: Intelligent manager replaces rigid sequential phases
6
+
7
+ ---
8
+
9
+ ## 1. High-Level Magentic Workflow
10
+
11
+ ```mermaid
12
+ flowchart TD
13
+ Start([User Query]) --> Manager[Magentic Manager<br/>Plan • Select • Assess • Adapt]
14
+
15
+ Manager -->|Plans| Task1[Task Decomposition]
16
+ Task1 --> Manager
17
+
18
+ Manager -->|Selects & Executes| HypAgent[Hypothesis Agent]
19
+ Manager -->|Selects & Executes| SearchAgent[Search Agent]
20
+ Manager -->|Selects & Executes| AnalysisAgent[Analysis Agent]
21
+ Manager -->|Selects & Executes| ReportAgent[Report Agent]
22
+
23
+ HypAgent -->|Results| Manager
24
+ SearchAgent -->|Results| Manager
25
+ AnalysisAgent -->|Results| Manager
26
+ ReportAgent -->|Results| Manager
27
+
28
+ Manager -->|Assesses Quality| Decision{Good Enough?}
29
+ Decision -->|No - Refine| Manager
30
+ Decision -->|No - Different Agent| Manager
31
+ Decision -->|No - Stalled| Replan[Reset Plan]
32
+ Replan --> Manager
33
+
34
+ Decision -->|Yes| Synthesis[Synthesize Final Result]
35
+ Synthesis --> Output([Research Report])
36
+
37
+ style Start fill:#e1f5e1
38
+ style Manager fill:#ffe6e6
39
+ style HypAgent fill:#fff4e6
40
+ style SearchAgent fill:#fff4e6
41
+ style AnalysisAgent fill:#fff4e6
42
+ style ReportAgent fill:#fff4e6
43
+ style Decision fill:#ffd6d6
44
+ style Synthesis fill:#d4edda
45
+ style Output fill:#e1f5e1
46
+ ```
47
+
48
+ ## 2. Magentic Manager: The 6-Phase Cycle
49
+
50
+ ```mermaid
51
+ flowchart LR
52
+ P1[1. Planning<br/>Analyze task<br/>Create strategy] --> P2[2. Agent Selection<br/>Pick best agent<br/>for subtask]
53
+ P2 --> P3[3. Execution<br/>Run selected<br/>agent with tools]
54
+ P3 --> P4[4. Assessment<br/>Evaluate quality<br/>Check progress]
55
+ P4 --> Decision{Quality OK?<br/>Progress made?}
56
+ Decision -->|Yes| P6[6. Synthesis<br/>Combine results<br/>Generate report]
57
+ Decision -->|No| P5[5. Iteration<br/>Adjust plan<br/>Try again]
58
+ P5 --> P2
59
+ P6 --> Done([Complete])
60
+
61
+ style P1 fill:#fff4e6
62
+ style P2 fill:#ffe6e6
63
+ style P3 fill:#e6f3ff
64
+ style P4 fill:#ffd6d6
65
+ style P5 fill:#fff3cd
66
+ style P6 fill:#d4edda
67
+ style Done fill:#e1f5e1
68
+ ```
69
+
70
+ ## 3. Simplified Agent Architecture
71
+
72
+ ```mermaid
73
+ graph TB
74
+ subgraph "Orchestration Layer"
75
+ Manager[Magentic Manager<br/>• Plans workflow<br/>• Selects agents<br/>• Assesses quality<br/>• Adapts strategy]
76
+ SharedContext[(Shared Context<br/>• Hypotheses<br/>• Search Results<br/>• Analysis<br/>• Progress)]
77
+ Manager <--> SharedContext
78
+ end
79
+
80
+ subgraph "Specialist Agents"
81
+ HypAgent[Hypothesis Agent<br/>• Domain understanding<br/>• Hypothesis generation<br/>• Testability refinement]
82
+ SearchAgent[Search Agent<br/>• Multi-source search<br/>• RAG retrieval<br/>• Result ranking]
83
+ AnalysisAgent[Analysis Agent<br/>• Evidence extraction<br/>• Statistical analysis<br/>• Code execution]
84
+ ReportAgent[Report Agent<br/>• Report assembly<br/>• Visualization<br/>• Citation formatting]
85
+ end
86
+
87
+ subgraph "MCP Tools"
88
+ WebSearch[Web Search<br/>PubMed • arXiv • bioRxiv]
89
+ CodeExec[Code Execution<br/>Sandboxed Python]
90
+ RAG[RAG Retrieval<br/>Vector DB • Embeddings]
91
+ Viz[Visualization<br/>Charts • Graphs]
92
+ end
93
+
94
+ Manager -->|Selects & Directs| HypAgent
95
+ Manager -->|Selects & Directs| SearchAgent
96
+ Manager -->|Selects & Directs| AnalysisAgent
97
+ Manager -->|Selects & Directs| ReportAgent
98
+
99
+ HypAgent --> SharedContext
100
+ SearchAgent --> SharedContext
101
+ AnalysisAgent --> SharedContext
102
+ ReportAgent --> SharedContext
103
+
104
+ SearchAgent --> WebSearch
105
+ SearchAgent --> RAG
106
+ AnalysisAgent --> CodeExec
107
+ ReportAgent --> CodeExec
108
+ ReportAgent --> Viz
109
+
110
+ style Manager fill:#ffe6e6
111
+ style SharedContext fill:#ffe6f0
112
+ style HypAgent fill:#fff4e6
113
+ style SearchAgent fill:#fff4e6
114
+ style AnalysisAgent fill:#fff4e6
115
+ style ReportAgent fill:#fff4e6
116
+ style WebSearch fill:#e6f3ff
117
+ style CodeExec fill:#e6f3ff
118
+ style RAG fill:#e6f3ff
119
+ style Viz fill:#e6f3ff
120
+ ```
121
+
122
+ ## 4. Dynamic Workflow Example
123
+
124
+ ```mermaid
125
+ sequenceDiagram
126
+ participant User
127
+ participant Manager
128
+ participant HypAgent
129
+ participant SearchAgent
130
+ participant AnalysisAgent
131
+ participant ReportAgent
132
+
133
+ User->>Manager: "Research protein folding in Alzheimer's"
134
+
135
+ Note over Manager: PLAN: Generate hypotheses → Search → Analyze → Report
136
+
137
+ Manager->>HypAgent: Generate 3 hypotheses
138
+ HypAgent-->>Manager: Returns 3 hypotheses
139
+ Note over Manager: ASSESS: Good quality, proceed
140
+
141
+ Manager->>SearchAgent: Search literature for hypothesis 1
142
+ SearchAgent-->>Manager: Returns 15 papers
143
+ Note over Manager: ASSESS: Good results, continue
144
+
145
+ Manager->>SearchAgent: Search for hypothesis 2
146
+ SearchAgent-->>Manager: Only 2 papers found
147
+ Note over Manager: ASSESS: Insufficient, refine search
148
+
149
+ Manager->>SearchAgent: Refined query for hypothesis 2
150
+ SearchAgent-->>Manager: Returns 12 papers
151
+ Note over Manager: ASSESS: Better, proceed
152
+
153
+ Manager->>AnalysisAgent: Analyze evidence for all hypotheses
154
+ AnalysisAgent-->>Manager: Returns analysis with code
155
+ Note over Manager: ASSESS: Complete, generate report
156
+
157
+ Manager->>ReportAgent: Create comprehensive report
158
+ ReportAgent-->>Manager: Returns formatted report
159
+ Note over Manager: SYNTHESIZE: Combine all results
160
+
161
+ Manager->>User: Final Research Report
162
+ ```
163
+
164
+ ## 5. Manager Decision Logic
165
+
166
+ ```mermaid
167
+ flowchart TD
168
+ Start([Manager Receives Task]) --> Plan[Create Initial Plan]
169
+
170
+ Plan --> Select[Select Agent for Next Subtask]
171
+ Select --> Execute[Execute Agent]
172
+ Execute --> Collect[Collect Results]
173
+
174
+ Collect --> Assess[Assess Quality & Progress]
175
+
176
+ Assess --> Q1{Quality Sufficient?}
177
+ Q1 -->|No| Q2{Same Agent Can Fix?}
178
+ Q2 -->|Yes| Feedback[Provide Specific Feedback]
179
+ Feedback --> Execute
180
+ Q2 -->|No| Different[Try Different Agent]
181
+ Different --> Select
182
+
183
+ Q1 -->|Yes| Q3{Task Complete?}
184
+ Q3 -->|No| Q4{Making Progress?}
185
+ Q4 -->|Yes| Select
186
+ Q4 -->|No - Stalled| Replan[Reset Plan & Approach]
187
+ Replan --> Plan
188
+
189
+ Q3 -->|Yes| Synth[Synthesize Final Result]
190
+ Synth --> Done([Return Report])
191
+
192
+ style Start fill:#e1f5e1
193
+ style Plan fill:#fff4e6
194
+ style Select fill:#ffe6e6
195
+ style Execute fill:#e6f3ff
196
+ style Assess fill:#ffd6d6
197
+ style Q1 fill:#ffe6e6
198
+ style Q2 fill:#ffe6e6
199
+ style Q3 fill:#ffe6e6
200
+ style Q4 fill:#ffe6e6
201
+ style Synth fill:#d4edda
202
+ style Done fill:#e1f5e1
203
+ ```
204
+
205
+ ## 6. Hypothesis Agent Workflow
+
+ ```mermaid
+ flowchart LR
+ Input[Research Query] --> Domain[Identify Domain<br/>& Key Concepts]
+ Domain --> Context[Retrieve Background<br/>Knowledge]
+ Context --> Generate[Generate 3-5<br/>Initial Hypotheses]
+ Generate --> Refine[Refine for<br/>Testability]
+ Refine --> Rank[Rank by<br/>Quality Score]
+ Rank --> Output[Return Top<br/>Hypotheses]
+
+ Output --> Struct[Hypothesis Structure:<br/>• Statement<br/>• Rationale<br/>• Testability Score<br/>• Data Requirements<br/>• Expected Outcomes]
+
+ style Input fill:#e1f5e1
+ style Output fill:#fff4e6
+ style Struct fill:#e6f3ff
+ ```
+
+ ## 7. Search Agent Workflow
+
+ ```mermaid
+ flowchart TD
+ Input[Hypotheses] --> Strategy[Formulate Search<br/>Strategy per Hypothesis]
+
+ Strategy --> Multi[Multi-Source Search]
+
+ Multi --> PubMed[PubMed Search<br/>via MCP]
+ Multi --> ArXiv[arXiv Search<br/>via MCP]
+ Multi --> BioRxiv[bioRxiv Search<br/>via MCP]
+
+ PubMed --> Aggregate[Aggregate Results]
+ ArXiv --> Aggregate
+ BioRxiv --> Aggregate
+
+ Aggregate --> Filter[Filter & Rank<br/>by Relevance]
+ Filter --> Dedup[Deduplicate<br/>Cross-Reference]
+ Dedup --> Embed[Embed Documents<br/>via MCP]
+ Embed --> Vector[(Vector DB)]
+ Vector --> RAGRetrieval[RAG Retrieval<br/>Top-K per Hypothesis]
+ RAGRetrieval --> Output[Return Contextualized<br/>Search Results]
+
+ style Input fill:#fff4e6
+ style Multi fill:#ffe6e6
+ style Vector fill:#ffe6f0
+ style Output fill:#e6f3ff
+ ```
+
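+ The deduplication step merges hits returned by the three sources. A minimal sketch — the field names (`doi`, `title`) are assumptions about the result schema, not the project's actual models:

```python
def deduplicate(papers):
    """Drop cross-source duplicates, keyed on DOI when present, else normalized title."""
    seen, unique = set(), []
    for paper in papers:
        # Prefer the stable identifier; fall back to a case-insensitive title match.
        key = paper.get("doi") or paper["title"].strip().lower()
        if key not in seen:
            seen.add(key)
            unique.append(paper)        # first source seen wins
    return unique
```

+ Keeping the first occurrence preserves the relevance ranking produced upstream.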
+ ## 8. Analysis Agent Workflow
+
+ ```mermaid
+ flowchart TD
+ Input1[Hypotheses] --> Extract
+ Input2[Search Results] --> Extract[Extract Evidence<br/>per Hypothesis]
+
+ Extract --> Methods[Determine Analysis<br/>Methods Needed]
+
+ Methods --> Branch{Requires<br/>Computation?}
+ Branch -->|Yes| GenCode[Generate Python<br/>Analysis Code]
+ Branch -->|No| Qual[Qualitative<br/>Synthesis]
+
+ GenCode --> Execute[Execute Code<br/>via MCP Sandbox]
+ Execute --> Interpret1[Interpret<br/>Results]
+ Qual --> Interpret2[Interpret<br/>Findings]
+
+ Interpret1 --> Synthesize[Synthesize Evidence<br/>Across Sources]
+ Interpret2 --> Synthesize
+
+ Synthesize --> Verdict[Determine Verdict<br/>per Hypothesis]
+ Verdict --> Support[• Supported<br/>• Refuted<br/>• Inconclusive]
+ Support --> Gaps[Identify Knowledge<br/>Gaps & Limitations]
+ Gaps --> Output[Return Analysis<br/>Report]
+
+ style Input1 fill:#fff4e6
+ style Input2 fill:#e6f3ff
+ style Execute fill:#ffe6e6
+ style Output fill:#e6ffe6
+ ```
+
+ ## 9. Report Agent Workflow
+
+ ```mermaid
+ flowchart TD
+ Input1[Query] --> Assemble
+ Input2[Hypotheses] --> Assemble
+ Input3[Search Results] --> Assemble
+ Input4[Analysis] --> Assemble[Assemble Report<br/>Sections]
+
+ Assemble --> Exec[Executive Summary]
+ Assemble --> Intro[Introduction]
+ Assemble --> Methods[Methods]
+ Assemble --> Results[Results per<br/>Hypothesis]
+ Assemble --> Discussion[Discussion]
+ Assemble --> Future[Future Directions]
+ Assemble --> Refs[References]
+
+ Results --> VizCheck{Needs<br/>Visualization?}
+ VizCheck -->|Yes| GenViz[Generate Viz Code]
+ GenViz --> ExecViz[Execute via MCP<br/>Create Charts]
+ ExecViz --> Combine
+ VizCheck -->|No| Combine[Combine All<br/>Sections]
+
+ Exec --> Combine
+ Intro --> Combine
+ Methods --> Combine
+ Discussion --> Combine
+ Future --> Combine
+ Refs --> Combine
+
+ Combine --> Format[Format Output]
+ Format --> MD[Markdown]
+ Format --> PDF[PDF]
+ Format --> JSON[JSON]
+
+ MD --> Output[Return Final<br/>Report]
+ PDF --> Output
+ JSON --> Output
+
+ style Input1 fill:#e1f5e1
+ style Input2 fill:#fff4e6
+ style Input3 fill:#e6f3ff
+ style Input4 fill:#e6ffe6
+ style Output fill:#d4edda
+ ```
+
+ ## 10. Data Flow & Event Streaming
+
+ ```mermaid
+ flowchart TD
+ User[👤 User] -->|Research Query| UI[Gradio UI]
+ UI -->|Submit| Manager[Magentic Manager]
+
+ Manager -->|Event: Planning| UI
+ Manager -->|Select Agent| HypAgent[Hypothesis Agent]
+ HypAgent -->|Event: Delta/Message| UI
+ HypAgent -->|Hypotheses| Context[(Shared Context)]
+
+ Context -->|Retrieved by| Manager
+ Manager -->|Select Agent| SearchAgent[Search Agent]
+ SearchAgent -->|MCP Request| WebSearch[Web Search Tool]
+ WebSearch -->|Results| SearchAgent
+ SearchAgent -->|Event: Delta/Message| UI
+ SearchAgent -->|Documents| Context
+ SearchAgent -->|Embeddings| VectorDB[(Vector DB)]
+
+ Context -->|Retrieved by| Manager
+ Manager -->|Select Agent| AnalysisAgent[Analysis Agent]
+ AnalysisAgent -->|MCP Request| CodeExec[Code Execution Tool]
+ CodeExec -->|Results| AnalysisAgent
+ AnalysisAgent -->|Event: Delta/Message| UI
+ AnalysisAgent -->|Analysis| Context
+
+ Context -->|Retrieved by| Manager
+ Manager -->|Select Agent| ReportAgent[Report Agent]
+ ReportAgent -->|MCP Request| CodeExec
+ ReportAgent -->|Event: Delta/Message| UI
+ ReportAgent -->|Report| Context
+
+ Manager -->|Event: Final Result| UI
+ UI -->|Display| User
+
+ style User fill:#e1f5e1
+ style UI fill:#e6f3ff
+ style Manager fill:#ffe6e6
+ style Context fill:#ffe6f0
+ style VectorDB fill:#ffe6f0
+ style WebSearch fill:#f0f0f0
+ style CodeExec fill:#f0f0f0
+ ```
+
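+ The event stream in this diagram can be consumed with a simple router. The sketch below assumes each event is a dict with `kind` and `text` keys — the real objects are typed events (e.g. `MagenticAgentDeltaEvent`), so treat this only as an illustration of the routing idea:

```python
def route_events(events):
    """Split a workflow event stream into live-log lines and the final result."""
    log, final = [], None
    for event in events:
        if event["kind"] == "final":
            final = event["text"]        # displayed in the Report tab
        else:
            # Planning, delta, and message events stream to the event log.
            log.append(f"[{event['kind']}] {event['text']}")
    return log, final
```

+ In the Gradio app, the log lines would be yielded incrementally rather than collected in a list.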
+ ## 11. MCP Tool Architecture
+
+ ```mermaid
+ graph TB
+ subgraph "Agent Layer"
+ Manager[Magentic Manager]
+ HypAgent[Hypothesis Agent]
+ SearchAgent[Search Agent]
+ AnalysisAgent[Analysis Agent]
+ ReportAgent[Report Agent]
+ end
+
+ subgraph "MCP Protocol Layer"
+ Registry[MCP Tool Registry<br/>• Discovers tools<br/>• Routes requests<br/>• Manages connections]
+ end
+
+ subgraph "MCP Servers"
+ Server1[Web Search Server<br/>localhost:8001<br/>• PubMed<br/>• arXiv<br/>• bioRxiv]
+ Server2[Code Execution Server<br/>localhost:8002<br/>• Sandboxed Python<br/>• Package management]
+ Server3[RAG Server<br/>localhost:8003<br/>• Vector embeddings<br/>• Similarity search]
+ Server4[Visualization Server<br/>localhost:8004<br/>• Chart generation<br/>• Plot rendering]
+ end
+
+ subgraph "External Services"
+ PubMed[PubMed API]
+ ArXiv[arXiv API]
+ BioRxiv[bioRxiv API]
+ Modal[Modal Sandbox]
+ ChromaDB[(ChromaDB)]
+ end
+
+ SearchAgent -->|Request| Registry
+ AnalysisAgent -->|Request| Registry
+ ReportAgent -->|Request| Registry
+
+ Registry --> Server1
+ Registry --> Server2
+ Registry --> Server3
+ Registry --> Server4
+
+ Server1 --> PubMed
+ Server1 --> ArXiv
+ Server1 --> BioRxiv
+ Server2 --> Modal
+ Server3 --> ChromaDB
+
+ style Manager fill:#ffe6e6
+ style Registry fill:#fff4e6
+ style Server1 fill:#e6f3ff
+ style Server2 fill:#e6f3ff
+ style Server3 fill:#e6f3ff
+ style Server4 fill:#e6f3ff
+ ```
+
+ ## 12. Progress Tracking & Stall Detection
+
+ ```mermaid
+ stateDiagram-v2
+ [*] --> Initialization: User Query
+
+ Initialization --> Planning: Manager starts
+
+ Planning --> AgentExecution: Select agent
+
+ AgentExecution --> Assessment: Collect results
+
+ Assessment --> QualityCheck: Evaluate output
+
+ QualityCheck --> AgentExecution: Poor quality<br/>(retry < max_rounds)
+ QualityCheck --> Planning: Poor quality<br/>(try different agent)
+ QualityCheck --> NextAgent: Good quality<br/>(task incomplete)
+ QualityCheck --> Synthesis: Good quality<br/>(task complete)
+
+ NextAgent --> AgentExecution: Select next agent
+
+ state StallDetection <<choice>>
+ Assessment --> StallDetection: Check progress
+ StallDetection --> Planning: No progress<br/>(stall count < max)
+ StallDetection --> ErrorRecovery: No progress<br/>(max stalls reached)
+
+ ErrorRecovery --> PartialReport: Generate partial results
+ PartialReport --> [*]
+
+ Synthesis --> FinalReport: Combine all outputs
+ FinalReport --> [*]
+
+ note right of QualityCheck
+ Manager assesses:
+ • Output completeness
+ • Quality metrics
+ • Progress made
+ end note
+
+ note right of StallDetection
+ Stall = no new progress
+ after agent execution
+ Triggers plan reset
+ end note
+ ```
+
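+ The stall-detection choice in this state diagram reduces to counting the trailing run of no-progress rounds. A minimal sketch (function and argument names are illustrative, not the framework's API):

```python
def stall_action(progress_history, max_stall=3):
    """Map the trailing run of no-progress rounds to the next state.

    progress_history: list of booleans, one per completed round,
    True when the round produced new progress.
    """
    stalls = 0
    for made_progress in reversed(progress_history):
        if made_progress:
            break                       # the stall streak ends here
        stalls += 1
    if stalls == 0:
        return "continue"
    return "error_recovery" if stalls >= max_stall else "replan"
```

+ With the default `max_stall_count` of 3, two stalled rounds trigger a plan reset, while a third sends the workflow into error recovery and a partial report.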
+ ## 13. Gradio UI Integration
+
+ ```mermaid
+ graph TD
+ App[Gradio App<br/>DeepCritical Research Agent]
+
+ App --> Input[Input Section]
+ App --> Status[Status Section]
+ App --> Output[Output Section]
+
+ Input --> Query[Research Question<br/>Text Area]
+ Input --> Controls[Controls]
+ Controls --> MaxHyp[Max Hypotheses: 1-10]
+ Controls --> MaxRounds[Max Rounds: 5-20]
+ Controls --> Submit[Start Research Button]
+
+ Status --> Log[Real-time Event Log<br/>• Manager planning<br/>• Agent selection<br/>• Execution updates<br/>• Quality assessment]
+ Status --> Progress[Progress Tracker<br/>• Current agent<br/>• Round count<br/>• Stall count]
+
+ Output --> Tabs[Tabbed Results]
+ Tabs --> Tab1[Hypotheses Tab<br/>Generated hypotheses with scores]
+ Tabs --> Tab2[Search Results Tab<br/>Papers & sources found]
+ Tabs --> Tab3[Analysis Tab<br/>Evidence & verdicts]
+ Tabs --> Tab4[Report Tab<br/>Final research report]
+ Tab4 --> Download[Download Report<br/>MD / PDF / JSON]
+
+ Submit -.->|Triggers| Workflow[Magentic Workflow]
+ Workflow -.->|MagenticOrchestratorMessageEvent| Log
+ Workflow -.->|MagenticAgentDeltaEvent| Log
+ Workflow -.->|MagenticAgentMessageEvent| Log
+ Workflow -.->|MagenticFinalResultEvent| Tab4
+
+ style App fill:#e1f5e1
+ style Input fill:#fff4e6
+ style Status fill:#e6f3ff
+ style Output fill:#e6ffe6
+ style Workflow fill:#ffe6e6
+ ```
+
+ ## 14. Complete System Context
+
+ ```mermaid
+ graph LR
+ User[👤 Researcher<br/>Asks research questions] -->|Submits query| DC[DeepCritical<br/>Magentic Workflow]
+
+ DC -->|Literature search| PubMed[PubMed API<br/>Medical papers]
+ DC -->|Preprint search| ArXiv[arXiv API<br/>Scientific preprints]
+ DC -->|Biology search| BioRxiv[bioRxiv API<br/>Biology preprints]
+ DC -->|Agent reasoning| Claude[Claude API<br/>Sonnet 4 / Opus]
+ DC -->|Code execution| Modal[Modal Sandbox<br/>Safe Python env]
+ DC -->|Vector storage| Chroma[ChromaDB<br/>Embeddings & RAG]
+
+ DC -->|Deployed on| HF[HuggingFace Spaces<br/>Gradio 6.0]
+
+ PubMed -->|Results| DC
+ ArXiv -->|Results| DC
+ BioRxiv -->|Results| DC
+ Claude -->|Responses| DC
+ Modal -->|Output| DC
+ Chroma -->|Context| DC
+
+ DC -->|Research report| User
+
+ style User fill:#e1f5e1
+ style DC fill:#ffe6e6
+ style PubMed fill:#e6f3ff
+ style ArXiv fill:#e6f3ff
+ style BioRxiv fill:#e6f3ff
+ style Claude fill:#ffd6d6
+ style Modal fill:#f0f0f0
+ style Chroma fill:#ffe6f0
+ style HF fill:#d4edda
+ ```
+
+ ## 15. Workflow Timeline (Simplified)
+
+ ```mermaid
+ gantt
+ title DeepCritical Magentic Workflow - Typical Execution
+ dateFormat mm:ss
+ axisFormat %M:%S
+
+ section Manager Planning
+ Initial planning :p1, 00:00, 10s
+
+ section Hypothesis Agent
+ Generate hypotheses :h1, after p1, 30s
+ Manager assessment :h2, after h1, 5s
+
+ section Search Agent
+ Search hypothesis 1 :s1, after h2, 20s
+ Search hypothesis 2 :s2, after s1, 20s
+ Search hypothesis 3 :s3, after s2, 20s
+ RAG processing :s4, after s3, 15s
+ Manager assessment :s5, after s4, 5s
+
+ section Analysis Agent
+ Evidence extraction :a1, after s5, 15s
+ Code generation :a2, after a1, 20s
+ Code execution :a3, after a2, 25s
+ Synthesis :a4, after a3, 20s
+ Manager assessment :a5, after a4, 5s
+
+ section Report Agent
+ Report assembly :r1, after a5, 30s
+ Visualization :r2, after r1, 15s
+ Formatting :r3, after r2, 10s
+
+ section Manager Synthesis
+ Final synthesis :f1, after r3, 10s
+ ```
+
+ ---
+
+ ## Key Differences from Original Design
+
+ | Aspect | Original (Judge-in-Loop) | New (Magentic) |
+ |--------|-------------------------|----------------|
+ | **Control Flow** | Fixed sequential phases | Dynamic agent selection |
+ | **Quality Control** | Separate Judge Agent | Manager assessment built-in |
+ | **Retry Logic** | Phase-level with feedback | Agent-level with adaptation |
+ | **Flexibility** | Rigid 4-phase pipeline | Adaptive workflow |
+ | **Complexity** | 5 agents (including Judge) | 4 agents (no Judge) |
+ | **Progress Tracking** | Manual state management | Built-in round/stall detection |
+ | **Agent Coordination** | Sequential handoff | Manager-driven dynamic selection |
+ | **Error Recovery** | Retry same phase | Try different agent or replan |
+
+ ---
+
+ ## Simplified Design Principles
+
+ 1. **Manager is Intelligent**: LLM-powered manager handles planning, selection, and quality assessment
+ 2. **No Separate Judge**: Manager's assessment phase replaces dedicated Judge Agent
+ 3. **Dynamic Workflow**: Agents can be called multiple times in any order based on need
+ 4. **Built-in Safety**: max_round_count (15) and max_stall_count (3) prevent infinite loops
+ 5. **Event-Driven UI**: Real-time streaming updates to Gradio interface
+ 6. **MCP-Powered Tools**: All external capabilities via Model Context Protocol
+ 7. **Shared Context**: Centralized state accessible to all agents
+ 8. **Progress Awareness**: Manager tracks what's been done and what's needed
+
+ ---
+
+ ## Legend
+
+ - 🔴 **Red/Pink**: Manager, orchestration, decision-making
+ - 🟡 **Yellow/Orange**: Specialist agents, processing
+ - 🔵 **Blue**: Data, tools, MCP services
+ - 🟣 **Purple/Pink**: Storage, databases, state
+ - 🟢 **Green**: User interactions, final outputs
+ - ⚪ **Gray**: External services, APIs
+
+ ---
+
+ ## Implementation Highlights
+
+ **Simple 4-Agent Setup:**
+ ```python
+ workflow = (
+     MagenticBuilder()
+     .participants(
+         hypothesis=HypothesisAgent(tools=[background_tool]),
+         search=SearchAgent(tools=[web_search, rag_tool]),
+         analysis=AnalysisAgent(tools=[code_execution]),
+         report=ReportAgent(tools=[code_execution, visualization])
+     )
+     .with_standard_manager(
+         chat_client=AnthropicClient(model="claude-sonnet-4"),
+         max_round_count=15,  # Prevent infinite loops
+         max_stall_count=3    # Detect stuck workflows
+     )
+     .build()
+ )
+ ```
+
+ **Manager handles quality assessment in its instructions:**
+ - Checks hypothesis quality (testable, novel, clear)
+ - Validates search results (relevant, authoritative, recent)
+ - Assesses analysis soundness (methodology, evidence, conclusions)
+ - Ensures report completeness (all sections, proper citations)
+
+ No separate Judge Agent needed - manager does it all!
+
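+ The assessment checklist above can also be kept as data and rendered into the manager's instructions. A sketch with illustrative names — the real project may embed these criteria directly in the manager prompt instead:

```python
# The manager's quality criteria, one entry per specialist agent.
QUALITY_CHECKS = {
    "hypothesis": ["testable", "novel", "clear"],
    "search": ["relevant", "authoritative", "recent"],
    "analysis": ["sound methodology", "evidence-backed", "justified conclusions"],
    "report": ["all sections present", "proper citations"],
}

def assessment_instruction(agent_name: str) -> str:
    """Render one line of the manager's assessment prompt for a given agent."""
    criteria = ", ".join(QUALITY_CHECKS[agent_name])
    return f"Assess the {agent_name} output for: {criteria}."
```

+ Keeping the criteria in one place makes it easy to tune the manager's standards without touching agent code.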
+ ---
+
+ **Document Version**: 2.0 (Magentic Simplified)
+ **Last Updated**: 2025-11-24
+ **Architecture**: Microsoft Magentic Orchestration Pattern
+ **Agents**: 4 (Hypothesis, Search, Analysis, Report) + 1 Manager
+ **License**: MIT
+
+ ## See Also
+
+ - [Orchestrators](orchestrators.md) - Overview of all orchestrator patterns
+ - [Graph Orchestration](graph-orchestration.md) - Graph-based execution overview
+ - [Graph Orchestration (Detailed)](graph_orchestration.md) - Detailed graph architecture
+ - [Workflows](workflows.md) - Workflow patterns summary
+ - [API Reference - Orchestrators](../api/orchestrators.md) - API documentation
docs/{workflow-diagrams.md → architecture/workflows.md} RENAMED
File without changes
docs/brainstorming/00_ROADMAP_SUMMARY.md DELETED
@@ -1,194 +0,0 @@
- # DeepCritical Data Sources: Roadmap Summary
-
- **Created**: 2024-11-27
- **Purpose**: Future maintainability and hackathon continuation
-
- ---
-
- ## Current State
-
- ### Working Tools
-
- | Tool | Status | Data Quality |
- |------|--------|--------------|
- | PubMed | ✅ Works | Good (abstracts only) |
- | ClinicalTrials.gov | ✅ Works | Good (filtered for interventional) |
- | Europe PMC | ✅ Works | Good (includes preprints) |
-
- ### Removed Tools
-
- | Tool | Status | Reason |
- |------|--------|--------|
- | bioRxiv | ❌ Removed | No search API - only date/DOI lookup |
-
- ---
-
- ## Priority Improvements
-
- ### P0: Critical (Do First)
-
- 1. **Add Rate Limiting to PubMed**
- - NCBI will block us without it
- - Use `limits` library (see reference repo)
- - 3/sec without key, 10/sec with key
-
- ### P1: High Value, Medium Effort
-
- 2. **Add OpenAlex as 4th Source**
- - Citation network (huge for drug repurposing)
- - Concept tagging (semantic discovery)
- - Already implemented in reference repo
- - Free, no API key
-
- 3. **PubMed Full-Text via BioC**
- - Get full paper text for PMC papers
- - Already in reference repo
-
- ### P2: Nice to Have
-
- 4. **ClinicalTrials.gov Results**
- - Get efficacy data from completed trials
- - Requires more complex API calls
-
- 5. **Europe PMC Annotations**
- - Text-mined entities (genes, drugs, diseases)
- - Automatic entity extraction
-
- ---
-
- ## Effort Estimates
-
- | Improvement | Effort | Impact | Priority |
- |-------------|--------|--------|----------|
- | PubMed rate limiting | 1 hour | Stability | P0 |
- | OpenAlex basic search | 2 hours | High | P1 |
- | OpenAlex citations | 2 hours | Very High | P1 |
- | PubMed full-text | 3 hours | Medium | P1 |
- | CT.gov results | 4 hours | Medium | P2 |
- | Europe PMC annotations | 3 hours | Medium | P2 |
-
- ---
-
- ## Architecture Decision
-
- ### Option A: Keep Current + Add OpenAlex
-
- ```
- User Query
-
- ┌───────────────────┼───────────────────┐
- ↓ ↓ ↓
- PubMed ClinicalTrials Europe PMC
- (abstracts) (trials only) (preprints)
- ↓ ↓ ↓
- └───────────────────┼───────────────────┘
-
- OpenAlex ← NEW
- (citations, concepts)
-
- Orchestrator
-
- Report
- ```
-
- **Pros**: Low risk, additive
- **Cons**: More complexity, some overlap
-
- ### Option B: OpenAlex as Primary
-
- ```
- User Query
-
- ┌───────────────────┼───────────────────┐
- ↓ ↓ ↓
- OpenAlex ClinicalTrials Europe PMC
- (primary (trials only) (full-text
- search) fallback)
- ↓ ↓ ↓
- └───────────────────┼───────────────────┘
-
- Orchestrator
-
- Report
- ```
-
- **Pros**: Simpler, citation network built-in
- **Cons**: Lose some PubMed-specific features
-
- ### Recommendation: Option A
-
- Keep current architecture working, add OpenAlex incrementally.
-
- ---
-
- ## Quick Wins (Can Do Today)
-
- 1. **Add `limits` to `pyproject.toml`**
- ```toml
- dependencies = [
- "limits>=3.0",
- ]
- ```
-
- 2. **Copy OpenAlex tool from reference repo**
- - File: `reference_repos/DeepCritical/DeepResearch/src/tools/openalex_tools.py`
- - Adapt to our `SearchTool` base class
-
- 3. **Enable NCBI API Key**
- - Add to `.env`: `NCBI_API_KEY=your_key`
- - 10x rate limit improvement
-
- ---
-
- ## External Resources Worth Exploring
-
- ### Python Libraries
-
- | Library | For | Notes |
- |---------|-----|-------|
- | `limits` | Rate limiting | Used by reference repo |
- | `pyalex` | OpenAlex wrapper | [GitHub](https://github.com/J535D165/pyalex) |
- | `metapub` | PubMed | Full-featured |
- | `sentence-transformers` | Semantic search | For embeddings |
-
- ### APIs Not Yet Used
-
- | API | Provides | Effort |
- |-----|----------|--------|
- | RxNorm | Drug name normalization | Low |
- | DrugBank | Drug targets/mechanisms | Medium (license) |
- | UniProt | Protein data | Medium |
- | ChEMBL | Bioactivity data | Medium |
-
- ### RAG Tools (Future)
-
- | Tool | Purpose |
- |------|---------|
- | [PaperQA](https://github.com/Future-House/paper-qa) | RAG for scientific papers |
- | [txtai](https://github.com/neuml/txtai) | Embeddings + search |
- | [PubMedBERT](https://huggingface.co/NeuML/pubmedbert-base-embeddings) | Biomedical embeddings |
-
- ---
-
- ## Files in This Directory
-
- | File | Contents |
- |------|----------|
- | `00_ROADMAP_SUMMARY.md` | This file |
- | `01_PUBMED_IMPROVEMENTS.md` | PubMed enhancement details |
- | `02_CLINICALTRIALS_IMPROVEMENTS.md` | ClinicalTrials.gov details |
- | `03_EUROPEPMC_IMPROVEMENTS.md` | Europe PMC details |
- | `04_OPENALEX_INTEGRATION.md` | OpenAlex integration plan |
-
- ---
-
- ## For Future Maintainers
-
- If you're picking this up after the hackathon:
-
- 1. **Start with OpenAlex** - biggest bang for buck
- 2. **Add rate limiting** - prevents API blocks
- 3. **Don't bother with bioRxiv** - use Europe PMC instead
- 4. **Reference repo is gold** - `reference_repos/DeepCritical/` has working implementations
-
- Good luck! 🚀
docs/brainstorming/01_PUBMED_IMPROVEMENTS.md DELETED
@@ -1,125 +0,0 @@
- # PubMed Tool: Current State & Future Improvements
-
- **Status**: Currently Implemented
- **Priority**: High (Core Data Source)
-
- ---
-
- ## Current Implementation
-
- ### What We Have (`src/tools/pubmed.py`)
-
- - Basic E-utilities search via `esearch.fcgi` and `efetch.fcgi`
- - Query preprocessing (strips question words, expands synonyms)
- - Returns: title, abstract, authors, journal, PMID
- - Rate limiting: None implemented (relying on NCBI defaults)
-
- ### Current Limitations
-
- 1. **No Full-Text Access**: Only retrieves abstracts, not full paper text
- 2. **No Rate Limiting**: Risk of being blocked by NCBI
- 3. **No BioC Format**: Missing structured full-text extraction
- 4. **No Figure Retrieval**: No supplementary materials access
- 5. **No PMC Integration**: Missing open-access full-text via PMC
-
- ---
-
- ## Reference Implementation (DeepCritical Reference Repo)
-
- The reference repo at `reference_repos/DeepCritical/DeepResearch/src/tools/bioinformatics_tools.py` has a more sophisticated implementation:
-
- ### Features We're Missing
-
- ```python
- # Rate limiting (lines 47-50)
- from limits import parse
- from limits.storage import MemoryStorage
- from limits.strategies import MovingWindowRateLimiter
-
- storage = MemoryStorage()
- limiter = MovingWindowRateLimiter(storage)
- rate_limit = parse("3/second") # NCBI allows 3/sec without API key, 10/sec with
-
- # Full-text via BioC format (lines 108-120)
- def _get_fulltext(pmid: int) -> dict[str, Any] | None:
- pmid_url = f"https://www.ncbi.nlm.nih.gov/research/bionlp/RESTful/pmcoa.cgi/BioC_json/{pmid}/unicode"
- # Returns structured JSON with full text for open-access papers
-
- # Figure retrieval via Europe PMC (lines 123-149)
- def _get_figures(pmcid: str) -> dict[str, str]:
- suppl_url = f"https://www.ebi.ac.uk/europepmc/webservices/rest/{pmcid}/supplementaryFiles"
- # Returns base64-encoded images from supplementary materials
- ```
-
- ---
-
- ## Recommended Improvements
-
- ### Phase 1: Rate Limiting (Critical)
-
- ```python
- # Add to src/tools/pubmed.py
- from limits import parse
- from limits.storage import MemoryStorage
- from limits.strategies import MovingWindowRateLimiter
-
- storage = MemoryStorage()
- limiter = MovingWindowRateLimiter(storage)
-
- # With NCBI_API_KEY: 10/sec, without: 3/sec
- def get_rate_limit():
- if settings.ncbi_api_key:
- return parse("10/second")
- return parse("3/second")
- ```
-
- **Dependencies**: `pip install limits`
-
- ### Phase 2: Full-Text Retrieval
-
- ```python
- async def get_fulltext(pmid: str) -> str | None:
- """Get full text for open-access papers via BioC API."""
- url = f"https://www.ncbi.nlm.nih.gov/research/bionlp/RESTful/pmcoa.cgi/BioC_json/{pmid}/unicode"
- # Only works for PMC papers (open access)
- ```
-
- ### Phase 3: PMC ID Resolution
-
- ```python
- async def get_pmc_id(pmid: str) -> str | None:
- """Convert PMID to PMCID for full-text access."""
- url = f"https://www.ncbi.nlm.nih.gov/pmc/utils/idconv/v1.0/?ids={pmid}&format=json"
- ```
-
- ---
-
- ## Python Libraries to Consider
-
- | Library | Purpose | Notes |
- |---------|---------|-------|
- | [Biopython](https://biopython.org/) | `Bio.Entrez` module | Official, well-maintained |
- | [PyMed](https://pypi.org/project/pymed/) | PubMed wrapper | Simpler API, less control |
- | [metapub](https://pypi.org/project/metapub/) | Full-featured | Tested on 1/3 of PubMed |
- | [limits](https://pypi.org/project/limits/) | Rate limiting | Used by reference repo |
-
- ---
-
- ## API Endpoints Reference
-
- | Endpoint | Purpose | Rate Limit |
- |----------|---------|------------|
- | `esearch.fcgi` | Search for PMIDs | 3/sec (10 with key) |
- | `efetch.fcgi` | Fetch metadata | 3/sec (10 with key) |
- | `esummary.fcgi` | Quick metadata | 3/sec (10 with key) |
- | `pmcoa.cgi/BioC_json` | Full text (PMC only) | Unknown |
- | `idconv/v1.0` | PMID ↔ PMCID | Unknown |
-
- ---
-
- ## Sources
-
- - [PubMed E-utilities Documentation](https://www.ncbi.nlm.nih.gov/books/NBK25501/)
- - [NCBI BioC API](https://www.ncbi.nlm.nih.gov/research/bionlp/APIs/)
- - [Searching PubMed with Python](https://marcobonzanini.com/2015/01/12/searching-pubmed-with-python/)
- - [PyMed on PyPI](https://pypi.org/project/pymed/)
docs/brainstorming/02_CLINICALTRIALS_IMPROVEMENTS.md DELETED
@@ -1,193 +0,0 @@
1
- # ClinicalTrials.gov Tool: Current State & Future Improvements
2
-
3
- **Status**: Currently Implemented
4
- **Priority**: High (Core Data Source for Drug Repurposing)
5
-
6
- ---
7
-
8
- ## Current Implementation
9
-
10
- ### What We Have (`src/tools/clinicaltrials.py`)
11
-
12
- - V2 API search via `clinicaltrials.gov/api/v2/studies`
13
- - Filters: `INTERVENTIONAL` study type, `RECRUITING` status
14
- - Returns: NCT ID, title, conditions, interventions, phase, status
15
- - Query preprocessing via shared `query_utils.py`
16
-
17
- ### Current Strengths
18
-
19
- 1. **Good Filtering**: Already filtering for interventional + recruiting
20
- 2. **V2 API**: Using the modern API (v1 deprecated)
21
- 3. **Phase Info**: Extracting trial phases for drug development context
22
-
23
- ### Current Limitations
24
-
25
- 1. **No Outcome Data**: Missing primary/secondary outcomes
26
- 2. **No Eligibility Criteria**: Missing inclusion/exclusion details
27
- 3. **No Sponsor Info**: Missing who's running the trial
28
- 4. **No Result Data**: For completed trials, no efficacy data
29
- 5. **Limited Drug Mapping**: No integration with drug databases
30
-
31
- ---
32
-
33
- ## API Capabilities We're Not Using
34
-
35
- ### Fields We Could Request
36
-
37
- ```python
38
- # Current fields
39
- fields = ["NCTId", "BriefTitle", "Condition", "InterventionName", "Phase", "OverallStatus"]
40
-
41
- # Additional valuable fields
42
- additional_fields = [
43
- "PrimaryOutcomeMeasure", # What are they measuring?
44
- "SecondaryOutcomeMeasure", # Secondary endpoints
45
- "EligibilityCriteria", # Who can participate?
46
- "LeadSponsorName", # Who's funding?
47
- "ResultsFirstPostDate", # Has results?
48
- "StudyFirstPostDate", # When started?
49
- "CompletionDate", # When finished?
50
- "EnrollmentCount", # Sample size
51
- "InterventionDescription", # Drug details
52
- "ArmGroupLabel", # Treatment arms
53
- "InterventionOtherName", # Drug aliases
54
- ]
55
- ```
56
-
57
- ### Filter Enhancements
58
-
59
- ```python
60
- # Current
61
- aggFilters = "studyType:INTERVENTIONAL,status:RECRUITING"
62
-
63
- # Could add
64
- "status:RECRUITING,ACTIVE_NOT_RECRUITING,COMPLETED" # Include completed for results
65
- "phase:PHASE2,PHASE3" # Only later-stage trials
66
- "resultsFirstPostDateRange:2020-01-01_" # Trials with posted results
67
- ```
68
-
69
- ---
70
-
71
- ## Recommended Improvements
72
-
73
- ### Phase 1: Richer Metadata
74
-
75
- ```python
76
- EXTENDED_FIELDS = [
77
- "NCTId",
78
- "BriefTitle",
79
- "OfficialTitle",
80
- "Condition",
81
- "InterventionName",
82
- "InterventionDescription",
83
- "InterventionOtherName", # Drug synonyms!
84
- "Phase",
85
- "OverallStatus",
86
- "PrimaryOutcomeMeasure",
87
- "EnrollmentCount",
88
- "LeadSponsorName",
89
- "StudyFirstPostDate",
90
- ]
91
- ```
92
-
93
- ### Phase 2: Results Retrieval
94
-
95
- For completed trials, we can get actual efficacy data:
96
-
97
- ```python
98
- async def get_trial_results(nct_id: str) -> dict | None:
99
- """Fetch results for completed trials."""
100
- url = f"https://clinicaltrials.gov/api/v2/studies/{nct_id}"
101
- params = {
102
- "fields": "ResultsSection",
103
- }
104
- # Returns outcome measures and statistics
105
- ```
106
-
107
- ### Phase 3: Drug Name Normalization
108
-
109
- Map intervention names to standard identifiers:
110
-
111
- ```python
112
- # Problem: "Metformin", "Metformin HCl", "Glucophage" are the same drug
113
- # Solution: Use RxNorm or DrugBank for normalization
114
-
115
- async def normalize_drug_name(intervention: str) -> str:
116
- """Normalize drug name via RxNorm API."""
117
- url = f"https://rxnav.nlm.nih.gov/REST/rxcui.json?name={intervention}"
118
- # Returns standardized RxCUI
119
- ```

---

## Integration Opportunities

### With PubMed

Cross-reference trials with publications:
```python
# ClinicalTrials.gov provides PMID links
# Can correlate trial results with published papers
```

### With DrugBank/ChEMBL

Map interventions to:
- Mechanism of action
- Known targets
- Adverse effects
- Drug-drug interactions

---

## Python Libraries to Consider

| Library | Purpose | Notes |
|---------|---------|-------|
| [pytrials](https://pypi.org/project/pytrials/) | CT.gov wrapper | V2 API support unclear |
| [clinicaltrials](https://github.com/ebmdatalab/clinicaltrials-act-tracker) | Data tracking | More for analysis |
| [drugbank-downloader](https://pypi.org/project/drugbank-downloader/) | Drug mapping | Requires license |

---

## API Quirks & Gotchas

1. **Rate Limiting**: Undocumented, be conservative
2. **Pagination**: Max 1000 results per request
3. **Field Names**: Case-sensitive, camelCase
4. **Empty Results**: Some fields may be null even if requested
5. **Status Changes**: Trials change status frequently

---

## Example Enhanced Query

```python
async def search_drug_repurposing_trials(
    drug_name: str,
    condition: str,
    include_completed: bool = True,
) -> list[Evidence]:
    """Search for trials repurposing a drug for a new condition."""

    statuses = ["RECRUITING", "ACTIVE_NOT_RECRUITING"]
    if include_completed:
        statuses.append("COMPLETED")

    params = {
        "query.intr": drug_name,
        "query.cond": condition,
        "filter.overallStatus": ",".join(statuses),
        "filter.studyType": "INTERVENTIONAL",
        "fields": ",".join(EXTENDED_FIELDS),
        "pageSize": 50,
    }
```

---

## Sources

- [ClinicalTrials.gov API Documentation](https://clinicaltrials.gov/data-api/api)
- [CT.gov Field Definitions](https://clinicaltrials.gov/data-api/about-api/study-data-structure)
- [RxNorm API](https://lhncbc.nlm.nih.gov/RxNav/APIs/api-RxNorm.findRxcuiByString.html)
docs/brainstorming/03_EUROPEPMC_IMPROVEMENTS.md DELETED
@@ -1,211 +0,0 @@
# Europe PMC Tool: Current State & Future Improvements

**Status**: Currently Implemented (Replaced bioRxiv)
**Priority**: High (Preprint + Open Access Source)

---

## Why Europe PMC Over bioRxiv?

### bioRxiv API Limitations (Why We Abandoned It)

1. **No Search API**: Only returns papers by date range or DOI
2. **No Query Capability**: Cannot search for "metformin cancer"
3. **Workaround Required**: Would need to download ALL preprints and build local search
4. **Known Issue**: [Gradio Issue #8861](https://github.com/gradio-app/gradio/issues/8861) documents the limitation

### Europe PMC Advantages

1. **Full Search API**: Boolean queries, filters, facets
2. **Aggregates bioRxiv**: Includes bioRxiv, medRxiv content anyway
3. **Includes PubMed**: Also has MEDLINE content
4. **34 Preprint Servers**: Not just bioRxiv
5. **Open Access Focus**: Full-text when available

---

## Current Implementation

### What We Have (`src/tools/europepmc.py`)

- REST API search via `europepmc.org/webservices/rest/search`
- Preprint flagging via `firstPublicationDate` heuristics
- Returns: title, abstract, authors, DOI, source
- Marks preprints for transparency

### Current Limitations

1. **No Full-Text Retrieval**: Only metadata/abstracts
2. **No Citation Network**: Missing references/citations
3. **No Supplementary Files**: Not fetching figures/data
4. **Basic Preprint Detection**: Heuristic, not explicit flag

---

## Europe PMC API Capabilities

### Endpoints We Could Use

| Endpoint | Purpose | Currently Using |
|----------|---------|-----------------|
| `/search` | Query papers | Yes |
| `/fulltext/{ID}` | Full text (XML/JSON) | No |
| `/{PMCID}/supplementaryFiles` | Figures, data | No |
| `/citations/{ID}` | Who cited this | No |
| `/references/{ID}` | What this cites | No |
| `/annotations` | Text-mined entities | No |

### Rich Query Syntax

```python
# Current simple query
query = "metformin cancer"

# Could use advanced syntax
query = "(TITLE:metformin OR ABSTRACT:metformin) AND (cancer OR oncology)"
query += " AND (SRC:PPR)"  # Only preprints
query += " AND (FIRST_PDATE:[2023-01-01 TO 2024-12-31])"  # Date range
query += " AND (OPEN_ACCESS:y)"  # Only open access
```

### Source Filters

```python
# Filter by source
"SRC:MED"  # MEDLINE
"SRC:PMC"  # PubMed Central
"SRC:PPR"  # Preprints (bioRxiv, medRxiv, etc.)
"SRC:AGR"  # Agricola
"SRC:CBA"  # Chinese Biological Abstracts
```

---

## Recommended Improvements

### Phase 1: Rich Metadata

```python
# Add to search results
additional_fields = [
    "citedByCount",        # Impact indicator
    "source",              # Explicit source (MED, PMC, PPR)
    "isOpenAccess",        # Boolean flag
    "fullTextUrlList",     # URLs for full text
    "authorAffiliations",  # Institution info
    "grantsList",          # Funding info
]
```

### Phase 2: Full-Text Retrieval

```python
async def get_fulltext(pmcid: str) -> str | None:
    """Get full text for open access papers."""
    # XML format
    url = f"https://www.ebi.ac.uk/europepmc/webservices/rest/{pmcid}/fullTextXML"
    # Or JSON
    url = f"https://www.ebi.ac.uk/europepmc/webservices/rest/{pmcid}/fullTextJSON"
```
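Callers tend to hold PMCIDs in both `4287690` and `PMC4287690` forms, so a small normalizer in front of the URL builder avoids 404s; a sketch (the prefix-handling rules are an assumption about how callers pass IDs, not part of the Europe PMC API):

```python
import re

FULLTEXT_XML = "https://www.ebi.ac.uk/europepmc/webservices/rest/{pmcid}/fullTextXML"


def fulltext_xml_url(pmcid: str) -> str:
    """Accept '4287690' or 'PMC4287690' and return the full-text XML endpoint."""
    pmcid = pmcid.strip().upper()
    if not pmcid.startswith("PMC"):
        pmcid = f"PMC{pmcid}"
    if not re.fullmatch(r"PMC\d+", pmcid):
        raise ValueError(f"not a PMCID: {pmcid!r}")
    return FULLTEXT_XML.format(pmcid=pmcid)
```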

### Phase 3: Citation Network

```python
async def get_citations(pmcid: str) -> list[str]:
    """Get papers that cite this one."""
    url = f"https://www.ebi.ac.uk/europepmc/webservices/rest/{pmcid}/citations"

async def get_references(pmcid: str) -> list[str]:
    """Get papers this one cites."""
    url = f"https://www.ebi.ac.uk/europepmc/webservices/rest/{pmcid}/references"
```

### Phase 4: Text-Mined Annotations

Europe PMC extracts entities automatically:

```python
async def get_annotations(pmcid: str) -> dict:
    """Get text-mined entities (genes, diseases, drugs)."""
    url = "https://www.ebi.ac.uk/europepmc/annotations_api/annotationsByArticleIds"
    params = {
        "articleIds": f"PMC:{pmcid}",
        "type": "Gene_Proteins,Diseases,Chemicals",
        "format": "JSON",
    }
    # Returns structured entity mentions with positions
```

---

## Supplementary File Retrieval

From reference repo (`bioinformatics_tools.py` lines 123-149):

```python
def get_figures(pmcid: str) -> dict[str, str]:
    """Download figures and supplementary files."""
    url = f"https://www.ebi.ac.uk/europepmc/webservices/rest/{pmcid}/supplementaryFiles?includeInlineImage=true"
    # Returns a ZIP of figures and supplementary files; inline images come back base64-encoded
```

---

## Preprint-Specific Features

### Identify Preprint Servers

```python
PREPRINT_SOURCES = {
    "PPR": "General preprints",
    "bioRxiv": "Biology preprints",
    "medRxiv": "Medical preprints",
    "chemRxiv": "Chemistry preprints",
    "Research Square": "Multi-disciplinary",
    "Preprints.org": "MDPI preprints",
}

# Check if published version exists
async def check_published_version(preprint_doi: str) -> str | None:
    """Check if preprint has been peer-reviewed and published."""
    # Europe PMC links preprints to final versions
```

---

## Rate Limiting

Europe PMC is more generous than NCBI:

```python
# No documented hard limit, but be respectful
# Recommend: 10-20 requests/second max
# Use email in User-Agent for polite pool
headers = {
    "User-Agent": "DeepCritical/1.0 (mailto:[email protected])"
}
```

---

## vs. The Lens & OpenAlex

| Feature | Europe PMC | The Lens | OpenAlex |
|---------|------------|----------|----------|
| Biomedical Focus | Yes | Partial | Partial |
| Preprints | Yes (34 servers) | Yes | Yes |
| Full Text | PMC papers | Links | No |
| Citations | Yes | Yes | Yes |
| Annotations | Yes (text-mined) | No | No |
| Rate Limits | Generous | Moderate | Very generous |
| API Key | Optional | Required | Optional |

---

## Sources

- [Europe PMC REST API](https://europepmc.org/RestfulWebService)
- [Europe PMC Annotations API](https://europepmc.org/AnnotationsApi)
- [Europe PMC Articles API](https://europepmc.org/ArticlesApi)
- [rOpenSci medrxivr](https://docs.ropensci.org/medrxivr/)
- [bioRxiv TDM Resources](https://www.biorxiv.org/tdm)
docs/brainstorming/04_OPENALEX_INTEGRATION.md DELETED
@@ -1,303 +0,0 @@
# OpenAlex Integration: The Missing Piece?

**Status**: NOT Implemented (Candidate for Addition)
**Priority**: HIGH - Could Replace Multiple Tools
**Reference**: Already implemented in `reference_repos/DeepCritical`

---

## What is OpenAlex?

OpenAlex is a **fully open** index of the global research system:

- **209M+ works** (papers, books, datasets)
- **2B+ author records** (disambiguated)
- **124K+ venues** (journals, repositories)
- **109K+ institutions**
- **65K+ concepts** (hierarchical, linked to Wikidata)

**Free. Open. No API key required.**

---

## Why OpenAlex for DeepCritical?

### Current Architecture

```
User Query

┌──────────────────────────────────────┐
│ PubMed ClinicalTrials Europe PMC │ ← 3 separate APIs
└──────────────────────────────────────┘

Orchestrator (deduplicate, judge, synthesize)
```

### With OpenAlex

```
User Query

┌──────────────────────────────────────┐
│ OpenAlex │ ← Single API
│ (includes PubMed + preprints + │
│ citations + concepts + authors) │
└──────────────────────────────────────┘

Orchestrator (enrich with CT.gov for trials)
```

**OpenAlex already aggregates**:
- PubMed/MEDLINE
- Crossref
- ORCID
- Unpaywall (open access links)
- Microsoft Academic Graph (legacy)
- Preprint servers

---

## Reference Implementation

From `reference_repos/DeepCritical/DeepResearch/src/tools/openalex_tools.py`:

```python
class OpenAlexFetchTool(ToolRunner):
    def __init__(self):
        super().__init__(
            ToolSpec(
                name="openalex_fetch",
                description="Fetch OpenAlex work or author",
                inputs={"entity": "TEXT", "identifier": "TEXT"},
                outputs={"result": "JSON"},
            )
        )

    def run(self, params: dict[str, Any]) -> ExecutionResult:
        entity = params["entity"]  # "works", "authors", "venues"
        identifier = params["identifier"]
        base = "https://api.openalex.org"
        url = f"{base}/{entity}/{identifier}"
        resp = requests.get(url, timeout=30)
        return ExecutionResult(success=True, data={"result": resp.json()})
```

---

## OpenAlex API Features

### Search Works (Papers)

```python
# Search for metformin + cancer papers
url = "https://api.openalex.org/works"
params = {
    "search": "metformin cancer drug repurposing",
    "filter": "publication_year:>2020,type:article",
    "sort": "cited_by_count:desc",
    "per_page": 50,
}
```

### Rich Filtering

```python
# Filter examples
"publication_year:2023"
"type:article"                           # vs preprint, book, etc.
"is_oa:true"                             # Open access only
"concepts.id:C71924100"                  # Papers about "Medicine"
"authorships.institutions.id:I27837315"  # From Harvard
"cited_by_count:>100"                    # Highly cited
"has_fulltext:true"                      # Full text available
```
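Hand-writing these comma-joined filter strings is easy to typo; a tiny builder sketch (the double-underscore-to-dot convention is this sketch's own, not an OpenAlex feature):

```python
def build_filter(**criteria: object) -> str:
    """Join criteria into OpenAlex's comma-separated `filter` syntax.

    Double underscores map to dotted paths, e.g. concepts__id -> concepts.id;
    booleans render as the lowercase true/false the API expects.
    """
    parts = []
    for key, value in criteria.items():
        if isinstance(value, bool):
            value = "true" if value else "false"
        parts.append(f"{key.replace('__', '.')}:{value}")
    return ",".join(parts)
```

For example, `build_filter(type="article", is_oa=True, cited_by_count=">100")` yields `"type:article,is_oa:true,cited_by_count:>100"`.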

### What You Get Back

```json
{
  "id": "W2741809807",
  "title": "Metformin: A candidate drug for...",
  "publication_year": 2023,
  "type": "article",
  "cited_by_count": 45,
  "is_oa": true,
  "primary_location": {
    "source": {"display_name": "Nature Medicine"},
    "pdf_url": "https://...",
    "landing_page_url": "https://..."
  },
  "concepts": [
    {"id": "C71924100", "display_name": "Medicine", "score": 0.95},
    {"id": "C54355233", "display_name": "Pharmacology", "score": 0.88}
  ],
  "authorships": [
    {
      "author": {"id": "A123", "display_name": "John Smith"},
      "institutions": [{"display_name": "Harvard Medical School"}]
    }
  ],
  "referenced_works": ["W123", "W456"],
  "related_works": ["W789", "W012"]
}
```

(`referenced_works` lists citations made by the paper; `related_works` lists similar papers.)

---

## Key Advantages Over Current Tools

### 1. Citation Network (We Don't Have This!)

```python
# Get papers that cite a work
url = f"https://api.openalex.org/works?filter=cites:{work_id}"

# Get papers cited by a work
# Already in `referenced_works` field
```

### 2. Concept Tagging (We Don't Have This!)

OpenAlex auto-tags papers with hierarchical concepts:
- "Medicine" → "Pharmacology" → "Drug Repurposing"
- Can search by concept, not just keywords

### 3. Author Disambiguation (We Don't Have This!)

```python
# Find all works by an author
url = f"https://api.openalex.org/works?filter=authorships.author.id:{author_id}"
```

### 4. Institution Tracking

```python
# Find drug repurposing papers from top institutions
url = "https://api.openalex.org/works"
params = {
    "search": "drug repurposing",
    "filter": "authorships.institutions.id:I27837315",  # Harvard
}
```

### 5. Related Works

Each paper comes with `related_works`: semantically similar papers discovered by OpenAlex's ML.

---

## Proposed Implementation

### New Tool: `src/tools/openalex.py`

```python
"""OpenAlex search tool for comprehensive scholarly data."""

import httpx
from src.tools.base import SearchTool
from src.utils.models import Evidence

class OpenAlexTool(SearchTool):
    """Search OpenAlex for scholarly works with rich metadata."""

    name = "openalex"

    async def search(self, query: str, max_results: int = 10) -> list[Evidence]:
        async with httpx.AsyncClient() as client:
            resp = await client.get(
                "https://api.openalex.org/works",
                params={
                    "search": query,
                    "filter": "type:article,is_oa:true",
                    "sort": "cited_by_count:desc",
                    "per_page": max_results,
                    "mailto": "[email protected]",  # Polite pool
                },
            )
            data = resp.json()

        return [
            Evidence(
                source="openalex",
                title=work["title"],
                # NB: the API ships abstract_inverted_index, not a plain
                # abstract; real code must reconstruct it
                abstract=work.get("abstract", ""),
                url=work["primary_location"]["landing_page_url"],
                metadata={
                    "cited_by_count": work["cited_by_count"],
                    "concepts": [c["display_name"] for c in work["concepts"][:5]],
                    "is_open_access": work["is_oa"],
                    "pdf_url": work["primary_location"].get("pdf_url"),
                },
            )
            for work in data["results"]
        ]
```

---

## Rate Limits

OpenAlex is **extremely generous**:

- No hard rate limit documented
- Recommended: <100,000 requests/day
- **Polite pool**: Add `[email protected]` param for faster responses
- No API key required (optional for priority support)

---

## Should We Add OpenAlex?

### Arguments FOR

1. **Already in reference repo** - proven pattern
2. **Richer data** - citations, concepts, authors
3. **Single source** - reduces API complexity
4. **Free & open** - no keys, no limits
5. **Institution adoption** - Leiden, Sorbonne switched to it

### Arguments AGAINST

1. **Adds complexity** - another data source
2. **Overlap** - duplicates some PubMed data
3. **Not biomedical-focused** - covers all disciplines
4. **No full text** - still need PMC/Europe PMC for that

### Recommendation

**Add OpenAlex as a 4th source**, don't replace existing tools.

Use it for:
- Citation network analysis
- Concept-based discovery
- High-impact paper finding
- Author/institution tracking

Keep PubMed, ClinicalTrials, Europe PMC for:
- Authoritative biomedical search
- Clinical trial data
- Full-text access
- Preprint tracking

---

## Implementation Priority

| Task | Effort | Value |
|------|--------|-------|
| Basic search | Low | High |
| Citation network | Medium | Very High |
| Concept filtering | Low | High |
| Related works | Low | High |
| Author tracking | Medium | Medium |

---

## Sources

- [OpenAlex Documentation](https://docs.openalex.org)
- [OpenAlex API Overview](https://docs.openalex.org/api)
- [OpenAlex Wikipedia](https://en.wikipedia.org/wiki/OpenAlex)
- [Leiden University Announcement](https://www.leidenranking.com/information/openalex)
- [OpenAlex: A fully-open index (Paper)](https://arxiv.org/abs/2205.01833)
docs/brainstorming/implementation/15_PHASE_OPENALEX.md DELETED
@@ -1,603 +0,0 @@
# Phase 15: OpenAlex Integration

**Priority**: HIGH - Biggest bang for buck
**Effort**: ~2-3 hours
**Dependencies**: None (existing codebase patterns sufficient)

---

## Prerequisites (COMPLETED)

The following model changes have been implemented to support this integration:

1. **`SourceName` Literal Updated** (`src/utils/models.py:9`)
   ```python
   SourceName = Literal["pubmed", "clinicaltrials", "europepmc", "preprint", "openalex"]
   ```
   - Without this, `source="openalex"` would fail Pydantic validation

2. **`Evidence.metadata` Field Added** (`src/utils/models.py:39-42`)
   ```python
   metadata: dict[str, Any] = Field(
       default_factory=dict,
       description="Additional metadata (e.g., cited_by_count, concepts, is_open_access)",
   )
   ```
   - Required for storing `cited_by_count`, `concepts`, etc.
   - Model is still frozen; metadata must be passed at construction time

3. **`__init__.py` Exports Updated** (`src/tools/__init__.py`)
   - All tools are now exported: `ClinicalTrialsTool`, `EuropePMCTool`, `PubMedTool`
   - OpenAlexTool should be added here after implementation

---

## Overview

Add OpenAlex as a 4th data source for comprehensive scholarly data including:
- Citation networks (who cites whom)
- Concept tagging (hierarchical topic classification)
- Author disambiguation
- 209M+ works indexed

**Why OpenAlex?**
- Free, no API key required
- Already implemented in reference repo
- Provides citation data we don't have
- Aggregates PubMed + preprints + more

---

## TDD Implementation Plan

### Step 1: Write the Tests First

**File**: `tests/unit/tools/test_openalex.py`

```python
"""Tests for OpenAlex search tool."""

import pytest
import respx
from httpx import Response

from src.tools.openalex import OpenAlexTool
from src.utils.models import Evidence


class TestOpenAlexTool:
    """Test suite for OpenAlex search functionality."""

    @pytest.fixture
    def tool(self) -> OpenAlexTool:
        return OpenAlexTool()

    def test_name_property(self, tool: OpenAlexTool) -> None:
        """Tool should identify itself as 'openalex'."""
        assert tool.name == "openalex"

    @respx.mock
    @pytest.mark.asyncio
    async def test_search_returns_evidence(self, tool: OpenAlexTool) -> None:
        """Search should return list of Evidence objects."""
        mock_response = {
            "results": [
                {
                    "id": "W2741809807",
                    "title": "Metformin and cancer: A systematic review",
                    "publication_year": 2023,
                    "cited_by_count": 45,
                    "type": "article",
                    "is_oa": True,
                    "primary_location": {
                        "source": {"display_name": "Nature Medicine"},
                        "landing_page_url": "https://doi.org/10.1038/example",
                        "pdf_url": None,
                    },
                    "abstract_inverted_index": {
                        "Metformin": [0],
                        "shows": [1],
                        "anticancer": [2],
                        "effects": [3],
                    },
                    "concepts": [
                        {"display_name": "Medicine", "score": 0.95},
                        {"display_name": "Oncology", "score": 0.88},
                    ],
                    "authorships": [
                        {
                            "author": {"display_name": "John Smith"},
                            "institutions": [{"display_name": "Harvard"}],
                        }
                    ],
                }
            ]
        }

        respx.get("https://api.openalex.org/works").mock(
            return_value=Response(200, json=mock_response)
        )

        results = await tool.search("metformin cancer", max_results=10)

        assert len(results) == 1
        assert isinstance(results[0], Evidence)
        assert "Metformin and cancer" in results[0].citation.title
        assert results[0].citation.source == "openalex"

    @respx.mock
    @pytest.mark.asyncio
    async def test_search_empty_results(self, tool: OpenAlexTool) -> None:
        """Search with no results should return empty list."""
        respx.get("https://api.openalex.org/works").mock(
            return_value=Response(200, json={"results": []})
        )

        results = await tool.search("xyznonexistentquery123")
        assert results == []

    @respx.mock
    @pytest.mark.asyncio
    async def test_search_handles_missing_abstract(self, tool: OpenAlexTool) -> None:
        """Tool should handle papers without abstracts."""
        mock_response = {
            "results": [
                {
                    "id": "W123",
                    "title": "Paper without abstract",
                    "publication_year": 2023,
                    "cited_by_count": 10,
                    "type": "article",
                    "is_oa": False,
                    "primary_location": {
                        "source": {"display_name": "Journal"},
                        "landing_page_url": "https://example.com",
                    },
                    "abstract_inverted_index": None,
                    "concepts": [],
                    "authorships": [],
                }
            ]
        }

        respx.get("https://api.openalex.org/works").mock(
            return_value=Response(200, json=mock_response)
        )

        results = await tool.search("test query")
        assert len(results) == 1
        assert results[0].content == ""  # No abstract

    @respx.mock
    @pytest.mark.asyncio
    async def test_search_extracts_citation_count(self, tool: OpenAlexTool) -> None:
        """Citation count should be in metadata."""
        mock_response = {
            "results": [
                {
                    "id": "W456",
                    "title": "Highly cited paper",
                    "publication_year": 2020,
                    "cited_by_count": 500,
                    "type": "article",
                    "is_oa": True,
                    "primary_location": {
                        "source": {"display_name": "Science"},
                        "landing_page_url": "https://example.com",
                    },
                    "abstract_inverted_index": {"Test": [0]},
                    "concepts": [],
                    "authorships": [],
                }
            ]
        }

        respx.get("https://api.openalex.org/works").mock(
            return_value=Response(200, json=mock_response)
        )

        results = await tool.search("highly cited")
        assert results[0].metadata["cited_by_count"] == 500

    @respx.mock
    @pytest.mark.asyncio
    async def test_search_extracts_concepts(self, tool: OpenAlexTool) -> None:
        """Concepts should be extracted for semantic discovery."""
        mock_response = {
            "results": [
                {
                    "id": "W789",
                    "title": "Drug repurposing study",
                    "publication_year": 2023,
                    "cited_by_count": 25,
                    "type": "article",
                    "is_oa": True,
                    "primary_location": {
                        "source": {"display_name": "PLOS ONE"},
                        "landing_page_url": "https://example.com",
                    },
                    "abstract_inverted_index": {"Drug": [0], "repurposing": [1]},
                    "concepts": [
                        {"display_name": "Pharmacology", "score": 0.92},
                        {"display_name": "Drug Discovery", "score": 0.85},
                        {"display_name": "Medicine", "score": 0.80},
                    ],
                    "authorships": [],
                }
            ]
        }

        respx.get("https://api.openalex.org/works").mock(
            return_value=Response(200, json=mock_response)
        )

        results = await tool.search("drug repurposing")
        assert "Pharmacology" in results[0].metadata["concepts"]
        assert "Drug Discovery" in results[0].metadata["concepts"]

    @respx.mock
    @pytest.mark.asyncio
    async def test_search_api_error_raises_search_error(
        self, tool: OpenAlexTool
    ) -> None:
        """API errors should raise SearchError."""
        from src.utils.exceptions import SearchError

        respx.get("https://api.openalex.org/works").mock(
            return_value=Response(500, text="Internal Server Error")
        )

        with pytest.raises(SearchError):
            await tool.search("test query")

    def test_reconstruct_abstract(self, tool: OpenAlexTool) -> None:
        """Test abstract reconstruction from inverted index."""
        inverted_index = {
            "Metformin": [0, 5],
            "is": [1],
            "a": [2],
            "diabetes": [3],
            "drug": [4],
            "effective": [6],
        }
        abstract = tool._reconstruct_abstract(inverted_index)
        assert abstract == "Metformin is a diabetes drug Metformin effective"
```

---

### Step 2: Create the Implementation

**File**: `src/tools/openalex.py`

```python
"""OpenAlex search tool for comprehensive scholarly data."""

from typing import Any

import httpx
from tenacity import retry, stop_after_attempt, wait_exponential

from src.utils.exceptions import SearchError
from src.utils.models import Citation, Evidence


class OpenAlexTool:
    """
    Search OpenAlex for scholarly works with rich metadata.

    OpenAlex provides:
    - 209M+ scholarly works
    - Citation counts and networks
    - Concept tagging (hierarchical)
    - Author disambiguation
    - Open access links

    API Docs: https://docs.openalex.org/
    """

    BASE_URL = "https://api.openalex.org/works"

    def __init__(self, email: str | None = None) -> None:
        """
        Initialize OpenAlex tool.

        Args:
            email: Optional email for polite pool (faster responses)
        """
        self.email = email or "[email protected]"

    @property
    def name(self) -> str:
        return "openalex"

    @retry(
        stop=stop_after_attempt(3),
        wait=wait_exponential(multiplier=1, min=1, max=10),
        reraise=True,
    )
    async def search(self, query: str, max_results: int = 10) -> list[Evidence]:
        """
        Search OpenAlex for scholarly works.

        Args:
            query: Search terms
            max_results: Maximum results to return (max 200 per request)

        Returns:
            List of Evidence objects with citation metadata

        Raises:
            SearchError: If API request fails
        """
        params = {
            "search": query,
            "filter": "type:article",  # Only peer-reviewed articles
            "sort": "cited_by_count:desc",  # Most cited first
            "per_page": min(max_results, 200),
            "mailto": self.email,  # Polite pool for faster responses
        }

        async with httpx.AsyncClient(timeout=30.0) as client:
            try:
                response = await client.get(self.BASE_URL, params=params)
                response.raise_for_status()

                data = response.json()
                results = data.get("results", [])

                return [self._to_evidence(work) for work in results[:max_results]]

            except httpx.HTTPStatusError as e:
                raise SearchError(f"OpenAlex API error: {e}") from e
            except httpx.RequestError as e:
                raise SearchError(f"OpenAlex connection failed: {e}") from e

    def _to_evidence(self, work: dict[str, Any]) -> Evidence:
        """Convert OpenAlex work to Evidence object."""
        title = work.get("title", "Untitled")
        pub_year = work.get("publication_year", "Unknown")
        cited_by = work.get("cited_by_count", 0)
        is_oa = work.get("is_oa", False)

        # Reconstruct abstract from inverted index
        abstract_index = work.get("abstract_inverted_index")
        abstract = self._reconstruct_abstract(abstract_index) if abstract_index else ""

        # Extract concepts (top 5)
        concepts = [
            c.get("display_name", "")
            for c in work.get("concepts", [])[:5]
            if c.get("display_name")
        ]

        # Extract authors (top 5)
        authorships = work.get("authorships", [])
        authors = [
            a.get("author", {}).get("display_name", "")
            for a in authorships[:5]
            if a.get("author", {}).get("display_name")
        ]

        # Get URL
        primary_loc = work.get("primary_location") or {}
        url = primary_loc.get("landing_page_url", "")
        if not url:
            # Fallback to OpenAlex page
            work_id = work.get("id", "").replace("https://openalex.org/", "")
            url = f"https://openalex.org/{work_id}"

        return Evidence(
            content=abstract[:2000],
            citation=Citation(
                source="openalex",
                title=title[:500],
                url=url,
                date=str(pub_year),
                authors=authors,
            ),
            relevance=min(0.9, 0.5 + (cited_by / 1000)),  # Boost by citations
            metadata={
                "cited_by_count": cited_by,
                "is_open_access": is_oa,
                "concepts": concepts,
                "pdf_url": primary_loc.get("pdf_url"),
            },
        )

    def _reconstruct_abstract(self, inverted_index: dict[str, list[int]]) -> str:
        """
        Reconstruct abstract from OpenAlex inverted index format.

        OpenAlex stores abstracts as {"word": [position1, position2, ...]}.
        This rebuilds the original text.
        """
        if not inverted_index:
            return ""

        # Build position -> word mapping
        position_word: dict[int, str] = {}
        for word, positions in inverted_index.items():
            for pos in positions:
                position_word[pos] = word

        # Reconstruct in order
        if not position_word:
            return ""

        max_pos = max(position_word.keys())
        words = [position_word.get(i, "") for i in range(max_pos + 1)]
        return " ".join(w for w in words if w)
```

---

### Step 3: Register in Search Handler

**File**: `src/tools/search_handler.py` (add to imports and tool list)

```python
# Add import
from src.tools.openalex import OpenAlexTool

# Add to _create_tools method
def _create_tools(self) -> list[SearchTool]:
    return [
        PubMedTool(),
        ClinicalTrialsTool(),
        EuropePMCTool(),
        OpenAlexTool(),  # NEW
    ]
```

---

### Step 4: Update `__init__.py`

**File**: `src/tools/__init__.py`

```python
from src.tools.openalex import OpenAlexTool

__all__ = [
    "PubMedTool",
    "ClinicalTrialsTool",
    "EuropePMCTool",
    "OpenAlexTool",  # NEW
    # ...
]
```

---

## Demo Script

**File**: `examples/openalex_demo.py`

```python
#!/usr/bin/env python3
"""Demo script to verify OpenAlex integration."""

import asyncio
from src.tools.openalex import OpenAlexTool


async def main():
    """Run OpenAlex search demo."""
    tool = OpenAlexTool()

    print("=" * 60)
    print("OpenAlex Integration Demo")
    print("=" * 60)

    # Test 1: Basic drug repurposing search
    print("\n[Test 1] Searching for 'metformin cancer drug repurposing'...")
    results = await tool.search("metformin cancer drug repurposing", max_results=5)

    for i, evidence in enumerate(results, 1):
        print(f"\n--- Result {i} ---")
        print(f"Title: {evidence.citation.title}")
        print(f"Year: {evidence.citation.date}")
        print(f"Citations: {evidence.metadata.get('cited_by_count', 'N/A')}")
        print(f"Concepts: {', '.join(evidence.metadata.get('concepts', []))}")
        print(f"Open Access: {evidence.metadata.get('is_open_access', False)}")
        print(f"URL: {evidence.citation.url}")
        if evidence.content:
            print(f"Abstract: {evidence.content[:200]}...")

    # Test 2: High-impact papers
    print("\n" + "=" * 60)
    print("[Test 2] Finding highly-cited papers on 'long COVID treatment'...")
513
- results = await tool.search("long COVID treatment", max_results=3)
514
-
515
- for evidence in results:
516
- print(f"\n- {evidence.citation.title}")
517
- print(f" Citations: {evidence.metadata.get('cited_by_count', 0)}")
518
-
519
- print("\n" + "=" * 60)
520
- print("Demo complete!")
521
-
522
-
523
- if __name__ == "__main__":
524
- asyncio.run(main())
525
- ```
526
-
527
- ---
528
-
529
- ## Verification Checklist
530
-
531
- ### Unit Tests
532
- ```bash
533
- # Run just OpenAlex tests
534
- uv run pytest tests/unit/tools/test_openalex.py -v
535
-
536
- # Expected: All tests pass
537
- ```
538
-
539
- ### Integration Test (Manual)
540
- ```bash
541
- # Run demo script with real API
542
- uv run python examples/openalex_demo.py
543
-
544
- # Expected: Real results from OpenAlex API
545
- ```
546
-
547
- ### Full Test Suite
548
- ```bash
549
- # Ensure nothing broke
550
- make check
551
-
552
- # Expected: All 110+ tests pass, mypy clean
553
- ```
554
-
555
- ---
556
-
557
- ## Success Criteria
558
-
559
- 1. **Unit tests pass**: All mocked tests in `test_openalex.py` pass
560
- 2. **Integration works**: Demo script returns real results
561
- 3. **No regressions**: `make check` passes completely
562
- 4. **SearchHandler integration**: OpenAlex appears in search results alongside other sources
563
- 5. **Citation metadata**: Results include `cited_by_count`, `concepts`, `is_open_access`
564
-
565
- ---
566
-
567
- ## Future Enhancements (P2)
568
-
569
- Once basic integration works:
570
-
571
- 1. **Citation Network Queries**
572
- ```python
573
- # Get papers citing a specific work
574
- async def get_citing_works(self, work_id: str) -> list[Evidence]:
575
- params = {"filter": f"cites:{work_id}"}
576
- ...
577
- ```
578
-
579
- 2. **Concept-Based Search**
580
- ```python
581
- # Search by OpenAlex concept ID
582
- async def search_by_concept(self, concept_id: str) -> list[Evidence]:
583
- params = {"filter": f"concepts.id:{concept_id}"}
584
- ...
585
- ```
586
-
587
- 3. **Author Tracking**
588
- ```python
589
- # Find all works by an author
590
- async def search_by_author(self, author_id: str) -> list[Evidence]:
591
- params = {"filter": f"authorships.author.id:{author_id}"}
592
- ...
593
- ```
594
-
595
- ---
596
-
597
- ## Notes
598
-
599
- - OpenAlex rate limits are generous (documented at roughly 10 requests/sec and 100k requests/day)
600
- - Adding `mailto` parameter gives priority access (polite pool)
601
- - Abstracts are stored as an inverted index and must be reconstructed into plain text
602
- - Citation count is a good proxy for paper quality/impact
603
- - Consider caching responses for repeated queries
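The caching note can be as simple as an in-memory dict keyed by the query parameters; a hypothetical sketch (the `fetch` callable stands in for the real API call, and the unbounded dict is a deliberate simplification):

```python
import asyncio
from collections.abc import Awaitable, Callable


class QueryCache:
    """Tiny in-memory cache keyed by (query, max_results)."""

    def __init__(self) -> None:
        self._store: dict[tuple[str, int], list[str]] = {}

    async def get_or_fetch(
        self,
        query: str,
        max_results: int,
        fetch: Callable[[str, int], Awaitable[list[str]]],
    ) -> list[str]:
        key = (query, max_results)
        if key not in self._store:
            # Miss: hit the API once, then reuse the stored response
            self._store[key] = await fetch(query, max_results)
        return self._store[key]


async def demo() -> tuple[list[str], int]:
    calls = 0

    async def fake_fetch(q: str, n: int) -> list[str]:
        nonlocal calls
        calls += 1
        return [f"{q}-{i}" for i in range(n)]

    cache = QueryCache()
    await cache.get_or_fetch("metformin", 2, fake_fetch)
    repeat = await cache.get_or_fetch("metformin", 2, fake_fetch)  # served from cache
    return repeat, calls


print(asyncio.run(demo()))  # → (['metformin-0', 'metformin-1'], 1)
```

A production version would likely add a TTL and size bound, but the key shape is the important part.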
docs/brainstorming/implementation/16_PHASE_PUBMED_FULLTEXT.md DELETED
@@ -1,586 +0,0 @@
1
- # Phase 16: PubMed Full-Text Retrieval
2
-
3
- **Priority**: MEDIUM - Enhances evidence quality
4
- **Effort**: ~3 hours
5
- **Dependencies**: None (existing PubMed tool sufficient)
6
-
7
- ---
8
-
9
- ## Prerequisites (COMPLETED)
10
-
11
- The `Evidence.metadata` field has been added to `src/utils/models.py` to support:
12
- ```python
13
- metadata={"has_fulltext": True}
14
- ```
15
-
16
- ---
17
-
18
- ## Architecture Decision: Constructor Parameter vs Method Parameter
19
-
20
- **IMPORTANT**: The original spec proposed `include_fulltext` as a method parameter:
21
- ```python
22
- # WRONG - SearchHandler won't pass this parameter
23
- async def search(self, query: str, max_results: int = 10, include_fulltext: bool = False):
24
- ```
25
-
26
- **Problem**: `SearchHandler` calls `tool.search(query, max_results)` uniformly across all tools.
27
- It has no mechanism to pass tool-specific parameters like `include_fulltext`.
28
-
29
- **Solution**: Use constructor parameter instead:
30
- ```python
31
- # CORRECT - Configured at instantiation time
32
- class PubMedTool:
33
- def __init__(self, api_key: str | None = None, include_fulltext: bool = False):
34
- self.include_fulltext = include_fulltext
35
- ...
36
- ```
37
-
38
- This way, you can create a full-text-enabled PubMed tool:
39
- ```python
40
- # In orchestrator or wherever tools are created
41
- tools = [
42
- PubMedTool(include_fulltext=True), # Full-text enabled
43
- ClinicalTrialsTool(),
44
- EuropePMCTool(),
45
- ]
46
- ```
47
-
48
- ---
49
-
50
- ## Overview
51
-
52
- Add full-text retrieval for PubMed papers via the BioC API, enabling:
53
- - Complete paper text for open-access PMC papers
54
- - Structured sections (intro, methods, results, discussion)
55
- - Better evidence for LLM synthesis
56
-
57
- **Why Full-Text?**
58
- - Abstracts only give ~200-300 words
59
- - Full text provides detailed methods, results, figures
60
- - Reference repo already has this implemented
61
- - Makes LLM judgments more accurate
62
-
63
- ---
64
-
65
- ## TDD Implementation Plan
66
-
67
- ### Step 1: Write the Tests First
68
-
69
- **File**: `tests/unit/tools/test_pubmed_fulltext.py`
70
-
71
- ```python
72
- """Tests for PubMed full-text retrieval."""
73
-
74
- import pytest
75
- import respx
76
- from httpx import Response
77
-
78
- from src.tools.pubmed import PubMedTool
79
-
80
-
81
- class TestPubMedFullText:
82
- """Test suite for PubMed full-text functionality."""
83
-
84
- @pytest.fixture
85
- def tool(self) -> PubMedTool:
86
- return PubMedTool()
87
-
88
- @respx.mock
89
- @pytest.mark.asyncio
90
- async def test_get_pmc_id_success(self, tool: PubMedTool) -> None:
91
- """Should convert PMID to PMCID for full-text access."""
92
- mock_response = {
93
- "records": [
94
- {
95
- "pmid": "12345678",
96
- "pmcid": "PMC1234567",
97
- }
98
- ]
99
- }
100
-
101
- respx.get("https://www.ncbi.nlm.nih.gov/pmc/utils/idconv/v1.0/").mock(
102
- return_value=Response(200, json=mock_response)
103
- )
104
-
105
- pmcid = await tool.get_pmc_id("12345678")
106
- assert pmcid == "PMC1234567"
107
-
108
- @respx.mock
109
- @pytest.mark.asyncio
110
- async def test_get_pmc_id_not_in_pmc(self, tool: PubMedTool) -> None:
111
- """Should return None if paper not in PMC."""
112
- mock_response = {
113
- "records": [
114
- {
115
- "pmid": "12345678",
116
- # No pmcid means not in PMC
117
- }
118
- ]
119
- }
120
-
121
- respx.get("https://www.ncbi.nlm.nih.gov/pmc/utils/idconv/v1.0/").mock(
122
- return_value=Response(200, json=mock_response)
123
- )
124
-
125
- pmcid = await tool.get_pmc_id("12345678")
126
- assert pmcid is None
127
-
128
- @respx.mock
129
- @pytest.mark.asyncio
130
- async def test_get_fulltext_success(self, tool: PubMedTool) -> None:
131
- """Should retrieve full text for PMC papers."""
132
- # Mock BioC API response
133
- mock_bioc = {
134
- "documents": [
135
- {
136
- "passages": [
137
- {
138
- "infons": {"section_type": "INTRO"},
139
- "text": "Introduction text here.",
140
- },
141
- {
142
- "infons": {"section_type": "METHODS"},
143
- "text": "Methods description here.",
144
- },
145
- {
146
- "infons": {"section_type": "RESULTS"},
147
- "text": "Results summary here.",
148
- },
149
- {
150
- "infons": {"section_type": "DISCUSS"},
151
- "text": "Discussion and conclusions.",
152
- },
153
- ]
154
- }
155
- ]
156
- }
157
-
158
- respx.get(
159
- "https://www.ncbi.nlm.nih.gov/research/bionlp/RESTful/pmcoa.cgi/BioC_json/12345678/unicode"
160
- ).mock(return_value=Response(200, json=mock_bioc))
161
-
162
- fulltext = await tool.get_fulltext("12345678")
163
-
164
- assert fulltext is not None
165
- assert "Introduction text here" in fulltext
166
- assert "Methods description here" in fulltext
167
- assert "Results summary here" in fulltext
168
-
169
- @respx.mock
170
- @pytest.mark.asyncio
171
- async def test_get_fulltext_not_available(self, tool: PubMedTool) -> None:
172
- """Should return None if full text not available."""
173
- respx.get(
174
- "https://www.ncbi.nlm.nih.gov/research/bionlp/RESTful/pmcoa.cgi/BioC_json/99999999/unicode"
175
- ).mock(return_value=Response(404))
176
-
177
- fulltext = await tool.get_fulltext("99999999")
178
- assert fulltext is None
179
-
180
- @respx.mock
181
- @pytest.mark.asyncio
182
- async def test_get_fulltext_structured(self, tool: PubMedTool) -> None:
183
- """Should return structured sections dict."""
184
- mock_bioc = {
185
- "documents": [
186
- {
187
- "passages": [
188
- {"infons": {"section_type": "INTRO"}, "text": "Intro..."},
189
- {"infons": {"section_type": "METHODS"}, "text": "Methods..."},
190
- {"infons": {"section_type": "RESULTS"}, "text": "Results..."},
191
- {"infons": {"section_type": "DISCUSS"}, "text": "Discussion..."},
192
- ]
193
- }
194
- ]
195
- }
196
-
197
- respx.get(
198
- "https://www.ncbi.nlm.nih.gov/research/bionlp/RESTful/pmcoa.cgi/BioC_json/12345678/unicode"
199
- ).mock(return_value=Response(200, json=mock_bioc))
200
-
201
- sections = await tool.get_fulltext_structured("12345678")
202
-
203
- assert sections is not None
204
- assert "introduction" in sections
205
- assert "methods" in sections
206
- assert "results" in sections
207
- assert "discussion" in sections
208
-
209
- @respx.mock
210
- @pytest.mark.asyncio
211
- async def test_search_with_fulltext_enabled(self) -> None:
212
- """Search should include full text when tool is configured for it."""
213
- # Create tool WITH full-text enabled via constructor
214
- tool = PubMedTool(include_fulltext=True)
215
-
216
- # Mock esearch
217
- respx.get("https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi").mock(
218
- return_value=Response(
219
- 200, json={"esearchresult": {"idlist": ["12345678"]}}
220
- )
221
- )
222
-
223
- # Mock efetch (abstract)
224
- mock_xml = """
225
- <PubmedArticleSet>
226
- <PubmedArticle>
227
- <MedlineCitation>
228
- <PMID>12345678</PMID>
229
- <Article>
230
- <ArticleTitle>Test Paper</ArticleTitle>
231
- <Abstract><AbstractText>Short abstract.</AbstractText></Abstract>
232
- <AuthorList><Author><LastName>Smith</LastName></Author></AuthorList>
233
- </Article>
234
- </MedlineCitation>
235
- </PubmedArticle>
236
- </PubmedArticleSet>
237
- """
238
- respx.get("https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi").mock(
239
- return_value=Response(200, text=mock_xml)
240
- )
241
-
242
- # Mock ID converter
243
- respx.get("https://www.ncbi.nlm.nih.gov/pmc/utils/idconv/v1.0/").mock(
244
- return_value=Response(
245
- 200, json={"records": [{"pmid": "12345678", "pmcid": "PMC1234567"}]}
246
- )
247
- )
248
-
249
- # Mock BioC full text
250
- mock_bioc = {
251
- "documents": [
252
- {
253
- "passages": [
254
- {"infons": {"section_type": "INTRO"}, "text": "Full intro..."},
255
- ]
256
- }
257
- ]
258
- }
259
- respx.get(
260
- "https://www.ncbi.nlm.nih.gov/research/bionlp/RESTful/pmcoa.cgi/BioC_json/12345678/unicode"
261
- ).mock(return_value=Response(200, json=mock_bioc))
262
-
263
- # NOTE: No include_fulltext param - it's set via constructor
264
- results = await tool.search("test", max_results=1)
265
-
266
- assert len(results) == 1
267
- # Full text should be appended or replace abstract
268
- assert "Full intro" in results[0].content or "Short abstract" in results[0].content
269
- ```
270
-
271
- ---
272
-
273
- ### Step 2: Implement Full-Text Methods
274
-
275
- **File**: `src/tools/pubmed.py` (additions to existing class)
276
-
277
- ```python
278
- # Add these methods to PubMedTool class
279
-
280
- async def get_pmc_id(self, pmid: str) -> str | None:
281
- """
282
- Convert PMID to PMCID for full-text access.
283
-
284
- Args:
285
- pmid: PubMed ID
286
-
287
- Returns:
288
- PMCID if paper is in PMC, None otherwise
289
- """
290
- url = "https://www.ncbi.nlm.nih.gov/pmc/utils/idconv/v1.0/"
291
- params = {"ids": pmid, "format": "json"}
292
-
293
- async with httpx.AsyncClient(timeout=30.0) as client:
294
- try:
295
- response = await client.get(url, params=params)
296
- response.raise_for_status()
297
- data = response.json()
298
-
299
- records = data.get("records", [])
300
- if records and records[0].get("pmcid"):
301
- return records[0]["pmcid"]
302
- return None
303
-
304
- except httpx.HTTPError:
305
- return None
306
-
307
-
308
- async def get_fulltext(self, pmid: str) -> str | None:
309
- """
310
- Get full text for a PubMed paper via BioC API.
311
-
312
- Only works for open-access papers in PubMed Central.
313
-
314
- Args:
315
- pmid: PubMed ID
316
-
317
- Returns:
318
- Full text as string, or None if not available
319
- """
320
- url = f"https://www.ncbi.nlm.nih.gov/research/bionlp/RESTful/pmcoa.cgi/BioC_json/{pmid}/unicode"
321
-
322
- async with httpx.AsyncClient(timeout=60.0) as client:
323
- try:
324
- response = await client.get(url)
325
- if response.status_code == 404:
326
- return None
327
- response.raise_for_status()
328
- data = response.json()
329
-
330
- # Extract text from all passages
331
- documents = data.get("documents", [])
332
- if not documents:
333
- return None
334
-
335
- passages = documents[0].get("passages", [])
336
- text_parts = [p.get("text", "") for p in passages if p.get("text")]
337
-
338
- return "\n\n".join(text_parts) if text_parts else None
339
-
340
- except httpx.HTTPError:
341
- return None
342
-
343
-
344
- async def get_fulltext_structured(self, pmid: str) -> dict[str, str] | None:
345
- """
346
- Get structured full text with sections.
347
-
348
- Args:
349
- pmid: PubMed ID
350
-
351
- Returns:
352
- Dict mapping section names to text, or None if not available
353
- """
354
- url = f"https://www.ncbi.nlm.nih.gov/research/bionlp/RESTful/pmcoa.cgi/BioC_json/{pmid}/unicode"
355
-
356
- async with httpx.AsyncClient(timeout=60.0) as client:
357
- try:
358
- response = await client.get(url)
359
- if response.status_code == 404:
360
- return None
361
- response.raise_for_status()
362
- data = response.json()
363
-
364
- documents = data.get("documents", [])
365
- if not documents:
366
- return None
367
-
368
- # Map section types to readable names
369
- section_map = {
370
- "INTRO": "introduction",
371
- "METHODS": "methods",
372
- "RESULTS": "results",
373
- "DISCUSS": "discussion",
374
- "CONCL": "conclusion",
375
- "ABSTRACT": "abstract",
376
- }
377
-
378
- sections: dict[str, list[str]] = {}
379
- for passage in documents[0].get("passages", []):
380
- section_type = passage.get("infons", {}).get("section_type", "other")
381
- section_name = section_map.get(section_type, "other")
382
- text = passage.get("text", "")
383
-
384
- if text:
385
- if section_name not in sections:
386
- sections[section_name] = []
387
- sections[section_name].append(text)
388
-
389
- # Join multiple passages per section
390
- return {k: "\n\n".join(v) for k, v in sections.items()}
391
-
392
- except httpx.HTTPError:
393
- return None
394
- ```
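The passage-to-section mapping in `get_fulltext_structured` can be exercised offline; a minimal standalone sketch of the same grouping logic (the helper name is illustrative):

```python
SECTION_MAP = {
    "INTRO": "introduction",
    "METHODS": "methods",
    "RESULTS": "results",
    "DISCUSS": "discussion",
    "CONCL": "conclusion",
    "ABSTRACT": "abstract",
}


def group_passages(passages: list[dict]) -> dict[str, str]:
    """Group BioC passages into readable sections, joining multi-part sections."""
    sections: dict[str, list[str]] = {}
    for passage in passages:
        section_type = passage.get("infons", {}).get("section_type", "other")
        name = SECTION_MAP.get(section_type, "other")
        text = passage.get("text", "")
        if text:
            sections.setdefault(name, []).append(text)
    return {k: "\n\n".join(v) for k, v in sections.items()}


passages = [
    {"infons": {"section_type": "INTRO"}, "text": "Background."},
    {"infons": {"section_type": "INTRO"}, "text": "Aims."},
    {"infons": {"section_type": "RESULTS"}, "text": "Findings."},
]
print(group_passages(passages))  # → {'introduction': 'Background.\n\nAims.', 'results': 'Findings.'}
```

Multi-paragraph sections arrive as separate passages, hence the list-then-join step.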
395
-
396
- ---
397
-
398
- ### Step 3: Update Constructor and Search Method
399
-
400
- Add full-text flag to constructor and update search to use it:
401
-
402
- ```python
403
- class PubMedTool:
404
- """Search tool for PubMed/NCBI."""
405
-
406
- def __init__(
407
- self,
408
- api_key: str | None = None,
409
- include_fulltext: bool = False, # NEW CONSTRUCTOR PARAM
410
- ) -> None:
411
- self.api_key = api_key or settings.ncbi_api_key
412
- if self.api_key == "your-ncbi-key-here":
413
- self.api_key = None
414
- self._last_request_time = 0.0
415
- self.include_fulltext = include_fulltext # Store for use in search()
416
-
417
- async def search(self, query: str, max_results: int = 10) -> list[Evidence]:
418
- """
419
- Search PubMed and return evidence.
420
-
421
- Note: Full-text enrichment is controlled by constructor parameter,
422
- not method parameter, because SearchHandler doesn't pass extra args.
423
- """
424
- # ... existing search logic ...
425
-
426
- evidence_list = self._parse_pubmed_xml(fetch_resp.text)
427
-
428
- # Optionally enrich with full text (if configured at construction)
429
- if self.include_fulltext:
430
- evidence_list = await self._enrich_with_fulltext(evidence_list)
431
-
432
- return evidence_list
433
-
434
-
435
- async def _enrich_with_fulltext(
436
- self, evidence_list: list[Evidence]
437
- ) -> list[Evidence]:
438
- """Attempt to add full text to evidence items."""
439
- enriched = []
440
-
441
- for evidence in evidence_list:
442
- # Extract PMID from URL
443
- url = evidence.citation.url
444
- pmid = url.rstrip("/").split("/")[-1] if url else None
445
-
446
- if pmid:
447
- fulltext = await self.get_fulltext(pmid)
448
- if fulltext:
449
- # Replace abstract with full text (truncated)
450
- evidence = Evidence(
451
- content=fulltext[:8000], # Larger limit for full text
452
- citation=evidence.citation,
453
- relevance=evidence.relevance,
454
- metadata={
455
- **evidence.metadata,
456
- "has_fulltext": True,
457
- },
458
- )
459
-
460
- enriched.append(evidence)
461
-
462
- return enriched
463
- ```
464
-
465
- ---
466
-
467
- ## Demo Script
468
-
469
- **File**: `examples/pubmed_fulltext_demo.py`
470
-
471
- ```python
472
- #!/usr/bin/env python3
473
- """Demo script to verify PubMed full-text retrieval."""
474
-
475
- import asyncio
476
- from src.tools.pubmed import PubMedTool
477
-
478
-
479
- async def main():
480
- """Run PubMed full-text demo."""
481
- tool = PubMedTool()
482
-
483
- print("=" * 60)
484
- print("PubMed Full-Text Demo")
485
- print("=" * 60)
486
-
487
- # Test 1: Convert PMID to PMCID
488
- print("\n[Test 1] Converting PMID to PMCID...")
489
- # Use a known open-access paper
490
- test_pmid = "34450029" # Example: COVID-related open-access paper
491
- pmcid = await tool.get_pmc_id(test_pmid)
492
- print(f"PMID {test_pmid} -> PMCID: {pmcid or 'Not in PMC'}")
493
-
494
- # Test 2: Get full text
495
- print("\n[Test 2] Fetching full text...")
496
- if pmcid:
497
- fulltext = await tool.get_fulltext(test_pmid)
498
- if fulltext:
499
- print(f"Full text length: {len(fulltext)} characters")
500
- print(f"Preview: {fulltext[:500]}...")
501
- else:
502
- print("Full text not available")
503
-
504
- # Test 3: Get structured sections
505
- print("\n[Test 3] Fetching structured sections...")
506
- if pmcid:
507
- sections = await tool.get_fulltext_structured(test_pmid)
508
- if sections:
509
- print("Available sections:")
510
- for section, text in sections.items():
511
- print(f" - {section}: {len(text)} chars")
512
- else:
513
- print("Structured text not available")
514
-
515
-     # Test 4: Search with full text
516
-     print("\n[Test 4] Search with full-text enrichment...")
517
-     # Full-text enrichment is configured at construction time (see the
518
-     # architecture decision above), not as a search() parameter
519
-     ft_tool = PubMedTool(include_fulltext=True)
520
-     results = await ft_tool.search("metformin cancer open access", max_results=3)
522
-
523
- for i, evidence in enumerate(results, 1):
524
- has_ft = evidence.metadata.get("has_fulltext", False)
525
- print(f"\n--- Result {i} ---")
526
- print(f"Title: {evidence.citation.title}")
527
- print(f"Has Full Text: {has_ft}")
528
- print(f"Content Length: {len(evidence.content)} chars")
529
-
530
- print("\n" + "=" * 60)
531
- print("Demo complete!")
532
-
533
-
534
- if __name__ == "__main__":
535
- asyncio.run(main())
536
- ```
537
-
538
- ---
539
-
540
- ## Verification Checklist
541
-
542
- ### Unit Tests
543
- ```bash
544
- # Run full-text tests
545
- uv run pytest tests/unit/tools/test_pubmed_fulltext.py -v
546
-
547
- # Run all PubMed tests
548
- uv run pytest tests/unit/tools/test_pubmed.py -v
549
-
550
- # Expected: All tests pass
551
- ```
552
-
553
- ### Integration Test (Manual)
554
- ```bash
555
- # Run demo with real API
556
- uv run python examples/pubmed_fulltext_demo.py
557
-
558
- # Expected: Real full text from PMC papers
559
- ```
560
-
561
- ### Full Test Suite
562
- ```bash
563
- make check
564
- # Expected: All tests pass, mypy clean
565
- ```
566
-
567
- ---
568
-
569
- ## Success Criteria
570
-
571
- 1. **ID Conversion works**: PMID -> PMCID conversion successful
572
- 2. **Full text retrieval works**: BioC API returns paper text
573
- 3. **Structured sections work**: Can get intro/methods/results/discussion separately
574
- 4. **Search integration works**: `PubMedTool(include_fulltext=True)` enriches results
575
- 5. **No regressions**: Existing tests still pass
576
- 6. **Graceful degradation**: Non-PMC papers still return abstracts
577
-
578
- ---
579
-
580
- ## Notes
581
-
582
- - Only ~30% of PubMed papers have full text in PMC
583
- - BioC API has no documented rate limit, but be respectful
584
- - Full text can be very long - truncate appropriately
585
- - Consider caching full text responses (they don't change)
586
- - Timeout should be longer for full text (60s vs 30s)
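On the truncation note above: a hard slice can cut mid-word. One possible word-boundary truncator (illustrative, not part of the tool):

```python
def truncate_at_word(text: str, limit: int = 8000) -> str:
    """Truncate to at most `limit` chars, preferring the last word boundary."""
    if len(text) <= limit:
        return text
    cut = text[:limit]
    # Back off to the last space so we don't emit half a word
    space = cut.rfind(" ")
    if space > 0:
        cut = cut[:space]
    return cut + "..."


print(truncate_at_word("full text of a very long paper", limit=12))  # → full text...
```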
docs/brainstorming/implementation/17_PHASE_RATE_LIMITING.md DELETED
@@ -1,540 +0,0 @@
1
- # Phase 17: Rate Limiting with `limits` Library
2
-
3
- **Priority**: P0 CRITICAL - Prevents API blocks
4
- **Effort**: ~1 hour
5
- **Dependencies**: None
6
-
7
- ---
8
-
9
- ## CRITICAL: Async Safety Requirements
10
-
11
- **WARNING**: The rate limiter MUST be async-safe. Blocking the event loop will freeze:
12
- - The Gradio UI
13
- - All parallel searches
14
- - The orchestrator
15
-
16
- **Rules**:
17
- 1. **NEVER use `time.sleep()`** - Always use `await asyncio.sleep()`
18
- 2. **NEVER use blocking while loops** - Use async-aware polling
19
- 3. **The `limits` library check is synchronous** - Wrap it carefully
20
-
21
- The implementation below uses a polling pattern that:
22
- - Checks the limit (synchronous, fast)
23
- - If exceeded, `await asyncio.sleep()` (non-blocking)
24
- - Retry the check
25
-
26
- **Alternative**: If `limits` proves problematic, use `aiolimiter` which is pure-async.
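Whichever library is chosen, the async-safety rules above can be illustrated with a minimal pure-asyncio limiter. This is a sketch that enforces a minimum gap between requests, not the project's implementation:

```python
import asyncio


class MinIntervalLimiter:
    """Minimal async-safe limiter: enforce a minimum gap between requests."""

    def __init__(self, min_interval: float) -> None:
        self.min_interval = min_interval
        self._last = 0.0
        self._lock = asyncio.Lock()  # serialize concurrent acquirers

    async def acquire(self) -> None:
        async with self._lock:
            loop = asyncio.get_running_loop()
            wait = self._last + self.min_interval - loop.time()
            if wait > 0:
                await asyncio.sleep(wait)  # yields to the event loop; never time.sleep()
            self._last = loop.time()


async def demo() -> float:
    limiter = MinIntervalLimiter(0.05)
    loop = asyncio.get_running_loop()
    start = loop.time()
    for _ in range(3):
        await limiter.acquire()
    return loop.time() - start


print(asyncio.run(demo()))  # two enforced 0.05s gaps, so at least ~0.1s total
```

Because every wait is an `await asyncio.sleep(...)`, the UI and parallel searches keep running while a request is throttled.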
27
-
28
- ---
29
-
30
- ## Overview
31
-
32
- Replace the naive `asyncio.sleep` rate limiting with a proper rate limiter built on the `limits` library, which provides:
33
- - Moving window rate limiting
34
- - Per-API configurable limits
35
- - Thread-safe storage
36
- - Already used in reference repo
37
-
38
- **Why This Matters?**
39
- - NCBI will block us without proper rate limiting (3/sec without key, 10/sec with)
40
- - Current implementation only has simple sleep delay
41
- - Need coordinated limits across all PubMed calls
42
- - Professional-grade rate limiting prevents production issues
43
-
44
- ---
45
-
46
- ## Current State
47
-
48
- ### What We Have (`src/tools/pubmed.py:20-21, 34-41`)
49
-
50
- ```python
51
- RATE_LIMIT_DELAY = 0.34 # ~3 requests/sec without API key
52
-
53
- async def _rate_limit(self) -> None:
54
- """Enforce NCBI rate limiting."""
55
- loop = asyncio.get_running_loop()
56
- now = loop.time()
57
- elapsed = now - self._last_request_time
58
- if elapsed < self.RATE_LIMIT_DELAY:
59
- await asyncio.sleep(self.RATE_LIMIT_DELAY - elapsed)
60
- self._last_request_time = loop.time()
61
- ```
62
-
63
- ### Problems
64
-
65
- 1. **Not shared across instances**: Each `PubMedTool()` has its own counter
66
- 2. **Simple delay vs moving window**: Doesn't handle bursts properly
67
- 3. **Hardcoded rate**: Doesn't adapt to API key presence
68
- 4. **No backoff on 429**: Just retries blindly
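Problem 4 is typically fixed with exponential backoff on 429 responses; the delay schedule alone, as a sketch (jitter omitted for brevity):

```python
def backoff_delays(attempts: int, base: float = 0.5, cap: float = 8.0) -> list[float]:
    """Exponential backoff schedule for 429s: base * 2**n, capped at `cap`."""
    return [min(cap, base * (2**n)) for n in range(attempts)]


print(backoff_delays(5))  # → [0.5, 1.0, 2.0, 4.0, 8.0]
```

The retry loop would `await asyncio.sleep(delay)` for each entry before re-issuing the request.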
69
-
70
- ---
71
-
72
- ## TDD Implementation Plan
73
-
74
- ### Step 1: Add Dependency
75
-
76
- **File**: `pyproject.toml`
77
-
78
- ```toml
79
- dependencies = [
80
- # ... existing deps ...
81
- "limits>=3.0",
82
- ]
83
- ```
84
-
85
- Then run:
86
- ```bash
87
- uv sync
88
- ```
89
-
90
- ---
91
-
92
- ### Step 2: Write the Tests First
93
-
94
- **File**: `tests/unit/tools/test_rate_limiting.py`
95
-
96
- ```python
97
- """Tests for rate limiting functionality."""
98
-
99
- import asyncio
100
- import time
101
-
102
- import pytest
103
-
104
- from src.tools.rate_limiter import RateLimiter, get_pubmed_limiter
105
-
106
-
107
- class TestRateLimiter:
108
- """Test suite for rate limiter."""
109
-
110
- def test_create_limiter_without_api_key(self) -> None:
111
- """Should create 3/sec limiter without API key."""
112
- limiter = RateLimiter(rate="3/second")
113
- assert limiter.rate == "3/second"
114
-
115
- def test_create_limiter_with_api_key(self) -> None:
116
- """Should create 10/sec limiter with API key."""
117
- limiter = RateLimiter(rate="10/second")
118
- assert limiter.rate == "10/second"
119
-
120
- @pytest.mark.asyncio
121
- async def test_limiter_allows_requests_under_limit(self) -> None:
122
- """Should allow requests under the rate limit."""
123
- limiter = RateLimiter(rate="10/second")
124
-
125
- # 3 requests should all succeed immediately
126
- for _ in range(3):
127
- allowed = await limiter.acquire()
128
- assert allowed is True
129
-
130
- @pytest.mark.asyncio
131
- async def test_limiter_blocks_when_exceeded(self) -> None:
132
- """Should wait when rate limit exceeded."""
133
- limiter = RateLimiter(rate="2/second")
134
-
135
- # First 2 should be instant
136
- await limiter.acquire()
137
- await limiter.acquire()
138
-
139
- # Third should block briefly
140
- start = time.monotonic()
141
- await limiter.acquire()
142
- elapsed = time.monotonic() - start
143
-
144
- # Should have waited ~0.5 seconds (half second window for 2/sec)
145
- assert elapsed >= 0.3
146
-
147
- @pytest.mark.asyncio
148
- async def test_limiter_resets_after_window(self) -> None:
149
- """Rate limit should reset after time window."""
150
- limiter = RateLimiter(rate="5/second")
151
-
152
- # Use up the limit
153
- for _ in range(5):
154
- await limiter.acquire()
155
-
156
- # Wait for window to pass
157
- await asyncio.sleep(1.1)
158
-
159
- # Should be allowed again
160
- start = time.monotonic()
161
- await limiter.acquire()
162
- elapsed = time.monotonic() - start
163
-
164
- assert elapsed < 0.1 # Should be nearly instant
165
-
166
-
167
- class TestGetPubmedLimiter:
168
- """Test PubMed-specific limiter factory."""
169
-
170
- def test_limiter_without_api_key(self) -> None:
171
- """Should return 3/sec limiter without key."""
172
- limiter = get_pubmed_limiter(api_key=None)
173
- assert "3" in limiter.rate
174
-
175
- def test_limiter_with_api_key(self) -> None:
176
- """Should return 10/sec limiter with key."""
177
- limiter = get_pubmed_limiter(api_key="my-api-key")
178
- assert "10" in limiter.rate
179
-
180
- def test_limiter_is_singleton(self) -> None:
181
- """Same API key should return same limiter instance."""
182
- limiter1 = get_pubmed_limiter(api_key="key1")
183
- limiter2 = get_pubmed_limiter(api_key="key1")
184
- assert limiter1 is limiter2
185
-
186
- def test_different_keys_different_limiters(self) -> None:
187
- """Different API keys should return different limiters."""
188
- limiter1 = get_pubmed_limiter(api_key="key1")
189
- limiter2 = get_pubmed_limiter(api_key="key2")
190
- # Clear cache for clean test
191
- # Actually, different keys SHOULD share the same limiter
192
- # since we're limiting against the same API
193
- assert limiter1 is limiter2 # Shared NCBI rate limit
194
- ```
195
-
196
- ---
197
-
198
- ### Step 3: Create Rate Limiter Module
199
-
200
- **File**: `src/tools/rate_limiter.py`
201
-
202
- ```python
203
- """Rate limiting utilities using the limits library."""
204
-
205
- import asyncio
206
- from typing import ClassVar
207
-
208
- from limits import RateLimitItem, parse
209
- from limits.storage import MemoryStorage
210
- from limits.strategies import MovingWindowRateLimiter
211
-
212
-
213
- class RateLimiter:
214
- """
215
- Async-compatible rate limiter using limits library.
216
-
217
- Uses moving window algorithm for smooth rate limiting.
218
- """
219
-
220
- def __init__(self, rate: str) -> None:
221
- """
222
- Initialize rate limiter.
223
-
224
- Args:
225
- rate: Rate string like "3/second" or "10/second"
226
- """
227
- self.rate = rate
228
- self._storage = MemoryStorage()
229
- self._limiter = MovingWindowRateLimiter(self._storage)
230
- self._rate_limit: RateLimitItem = parse(rate)
231
- self._identity = "default" # Single identity for shared limiting
232
-
233
- async def acquire(self, wait: bool = True) -> bool:
234
- """
235
- Acquire permission to make a request.
236
-
237
- ASYNC-SAFE: Uses asyncio.sleep(), never time.sleep().
238
- The polling pattern allows other coroutines to run while waiting.
239
-
240
- Args:
241
- wait: If True, wait until allowed. If False, return immediately.
242
-
243
- Returns:
244
- True if allowed, False if not (only when wait=False)
245
- """
246
- while True:
247
- # Check if we can proceed (synchronous, fast - ~microseconds)
248
- if self._limiter.hit(self._rate_limit, self._identity):
249
- return True
250
-
251
- if not wait:
252
- return False
253
-
254
- # CRITICAL: Use asyncio.sleep(), NOT time.sleep()
255
- # This yields control to the event loop, allowing other
256
- # coroutines (UI, parallel searches) to run
257
- await asyncio.sleep(0.1)
258
-
259
- def reset(self) -> None:
260
- """Reset the rate limiter (for testing)."""
261
- self._storage.reset()
262
-
263
-
264
- # Singleton limiter for PubMed/NCBI
265
- _pubmed_limiter: RateLimiter | None = None
266
-
267
-
268
- def get_pubmed_limiter(api_key: str | None = None) -> RateLimiter:
269
- """
270
- Get the shared PubMed rate limiter.
271
-
272
- Rate depends on whether API key is provided:
273
- - Without key: 3 requests/second
274
- - With key: 10 requests/second
275
-
276
- Args:
277
- api_key: NCBI API key (optional)
278
-
279
- Returns:
280
- Shared RateLimiter instance
281
- """
282
- global _pubmed_limiter
283
-
284
- if _pubmed_limiter is None:
285
- rate = "10/second" if api_key else "3/second" # NOTE: fixed by the first caller
286
- _pubmed_limiter = RateLimiter(rate)
287
-
288
- return _pubmed_limiter
289
-
290
-
291
- def reset_pubmed_limiter() -> None:
292
- """Reset the PubMed limiter (for testing)."""
293
- global _pubmed_limiter
294
- _pubmed_limiter = None
295
-
296
-
297
- # Factory for other APIs
298
- class RateLimiterFactory:
299
- """Factory for creating/getting rate limiters for different APIs."""
300
-
301
- _limiters: ClassVar[dict[str, RateLimiter]] = {}
302
-
303
- @classmethod
304
- def get(cls, api_name: str, rate: str) -> RateLimiter:
305
- """
306
- Get or create a rate limiter for an API.
307
-
308
- Args:
309
- api_name: Unique identifier for the API
310
- rate: Rate limit string (e.g., "10/second")
311
-
312
- Returns:
313
- RateLimiter instance (shared for same api_name)
314
- """
315
- if api_name not in cls._limiters:
316
- cls._limiters[api_name] = RateLimiter(rate)
317
- return cls._limiters[api_name]
318
-
319
- @classmethod
320
- def reset_all(cls) -> None:
321
- """Reset all limiters (for testing)."""
322
- cls._limiters.clear()
323
- ```
324
-
325
- ---
326
-
327
- ### Step 4: Update PubMed Tool
328
-
329
- **File**: `src/tools/pubmed.py` (replace rate limiting code)
330
-
331
- ```python
332
- # Replace imports and rate limiting
333
-
334
- from src.tools.rate_limiter import get_pubmed_limiter
335
-
336
-
337
- class PubMedTool:
338
- """Search tool for PubMed/NCBI."""
339
-
340
- BASE_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"
341
- HTTP_TOO_MANY_REQUESTS = 429
342
-
343
- def __init__(self, api_key: str | None = None) -> None:
344
- self.api_key = api_key or settings.ncbi_api_key
345
- if self.api_key == "your-ncbi-key-here":
346
- self.api_key = None
347
- # Use shared rate limiter
348
- self._limiter = get_pubmed_limiter(self.api_key)
349
-
350
- async def _rate_limit(self) -> None:
351
- """Enforce NCBI rate limiting using shared limiter."""
352
- await self._limiter.acquire()
353
-
354
- # ... rest of class unchanged ...
355
- ```
356
-
357
- ---
358
-
359
- ### Step 5: Add Rate Limiters for Other APIs
360
-
361
- **File**: `src/tools/clinicaltrials.py` (optional)
362
-
363
- ```python
364
- from src.tools.rate_limiter import RateLimiterFactory
365
-
366
-
367
- class ClinicalTrialsTool:
368
- def __init__(self) -> None:
369
- # ClinicalTrials.gov doesn't document limits, but be conservative
370
- self._limiter = RateLimiterFactory.get("clinicaltrials", "5/second")
371
-
372
- async def search(self, query: str, max_results: int = 10) -> list[Evidence]:
373
- await self._limiter.acquire()
374
- # ... rest of method ...
375
- ```
376
-
377
- **File**: `src/tools/europepmc.py` (optional)
378
-
379
- ```python
380
- from src.tools.rate_limiter import RateLimiterFactory
381
-
382
-
383
- class EuropePMCTool:
384
- def __init__(self) -> None:
385
- # Europe PMC is generous, but still be respectful
386
- self._limiter = RateLimiterFactory.get("europepmc", "10/second")
387
-
388
- async def search(self, query: str, max_results: int = 10) -> list[Evidence]:
389
- await self._limiter.acquire()
390
- # ... rest of method ...
391
- ```
392
-
393
- ---
394
-
395
- ## Demo Script
396
-
397
- **File**: `examples/rate_limiting_demo.py`
398
-
399
- ```python
400
- #!/usr/bin/env python3
401
- """Demo script to verify rate limiting works correctly."""
402
-
403
- import asyncio
404
- import time
405
-
406
- from src.tools.rate_limiter import RateLimiter, get_pubmed_limiter, reset_pubmed_limiter
407
- from src.tools.pubmed import PubMedTool
408
-
409
-
410
- async def test_basic_limiter():
411
- """Test basic rate limiter behavior."""
412
- print("=" * 60)
413
- print("Rate Limiting Demo")
414
- print("=" * 60)
415
-
416
- # Test 1: Basic limiter
417
- print("\n[Test 1] Testing 3/second limiter...")
418
- limiter = RateLimiter("3/second")
419
-
420
- start = time.monotonic()
421
- for i in range(6):
422
- await limiter.acquire()
423
- elapsed = time.monotonic() - start
424
- print(f" Request {i+1} at {elapsed:.2f}s")
425
-
426
- total = time.monotonic() - start
427
- print(f" Total time for 6 requests: {total:.2f}s (expected ~2s)")
428
-
429
-
430
- async def test_pubmed_limiter():
431
- """Test PubMed-specific limiter."""
432
- print("\n[Test 2] Testing PubMed limiter (shared)...")
433
-
434
- reset_pubmed_limiter() # Clean state
435
-
436
- # Without API key: 3/sec
437
- limiter = get_pubmed_limiter(api_key=None)
438
- print(f" Rate without key: {limiter.rate}")
439
-
440
- # Multiple tools should share the same limiter
441
- tool1 = PubMedTool()
442
- tool2 = PubMedTool()
443
-
444
- # Verify they share the limiter
445
- print(f" Tools share limiter: {tool1._limiter is tool2._limiter}")
446
-
447
-
448
- async def test_concurrent_requests():
449
- """Test rate limiting under concurrent load."""
450
- print("\n[Test 3] Testing concurrent request limiting...")
451
-
452
- limiter = RateLimiter("5/second")
453
-
454
- async def make_request(i: int):
455
- await limiter.acquire()
456
- return time.monotonic()
457
-
458
- start = time.monotonic()
459
- # Launch 10 concurrent requests
460
- tasks = [make_request(i) for i in range(10)]
461
- times = await asyncio.gather(*tasks)
462
-
463
- # Calculate distribution
464
- relative_times = [t - start for t in times]
465
- print(f" Request times: {[f'{t:.2f}s' for t in sorted(relative_times)]}")
466
-
467
- total = max(relative_times)
468
- print(f" All 10 requests completed in {total:.2f}s (expected ~2s)")
469
-
470
-
471
- async def main():
472
- await test_basic_limiter()
473
- await test_pubmed_limiter()
474
- await test_concurrent_requests()
475
-
476
- print("\n" + "=" * 60)
477
- print("Demo complete!")
478
-
479
-
480
- if __name__ == "__main__":
481
- asyncio.run(main())
482
- ```
483
-
484
- ---
485
-
486
- ## Verification Checklist
487
-
488
- ### Unit Tests
489
- ```bash
490
- # Run rate limiting tests
491
- uv run pytest tests/unit/tools/test_rate_limiting.py -v
492
-
493
- # Expected: All tests pass
494
- ```
495
-
496
- ### Integration Test (Manual)
497
- ```bash
498
- # Run demo
499
- uv run python examples/rate_limiting_demo.py
500
-
501
- # Expected: Requests properly spaced
502
- ```
503
-
504
- ### Full Test Suite
505
- ```bash
506
- make check
507
- # Expected: All tests pass, mypy clean
508
- ```
509
-
510
- ---
511
-
512
- ## Success Criteria
513
-
514
- 1. **`limits` library installed**: Dependency added to pyproject.toml
515
- 2. **RateLimiter class works**: Can create and use limiters
516
- 3. **PubMed uses new limiter**: Shared limiter across instances
517
- 4. **Rate adapts to API key**: 3/sec without, 10/sec with
518
- 5. **Concurrent requests handled**: Multiple async requests properly queued
519
- 6. **No regressions**: All existing tests pass
520
-
521
- ---
522
-
523
- ## API Rate Limit Reference
524
-
525
- | API | Without Key | With Key |
526
- |-----|-------------|----------|
527
- | PubMed/NCBI | 3/sec | 10/sec |
528
- | ClinicalTrials.gov | Undocumented (~5/sec safe) | N/A |
529
- | Europe PMC | ~10-20/sec (generous) | N/A |
530
- | OpenAlex | ~100k/day (no per-sec limit) | Faster with `mailto` |
531
-
532
- ---
533
-
534
- ## Notes
535
-
536
- - `limits` library uses moving window algorithm (fairer than fixed window)
537
- - Singleton pattern ensures all PubMed calls share the limit
538
- - The factory pattern allows easy extension to other APIs
539
- - Consider adding 429 response detection + exponential backoff
540
- - In production, consider Redis storage for distributed rate limiting
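The 429-handling note above can be sketched as a small retry helper. `fetch` is a hypothetical async callable returning `(status, payload)`, and the base delay and cap are illustrative choices:

```python
import asyncio
import random

HTTP_TOO_MANY_REQUESTS = 429


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 8.0) -> float:
    """Exponential backoff with jitter: ~0.5s, ~1s, ~2s, ... capped at `cap`."""
    return min(cap, base * (2**attempt)) * (0.5 + random.random() / 2)


async def fetch_with_retry(fetch, max_attempts: int = 4):
    """Retry `fetch` (async callable returning (status, payload)) on HTTP 429."""
    for attempt in range(max_attempts):
        status, payload = await fetch()
        if status != HTTP_TOO_MANY_REQUESTS:
            return payload
        # Back off before the next attempt; jitter avoids thundering herds
        await asyncio.sleep(backoff_delay(attempt))
    raise RuntimeError("still rate limited after retries")
```

Combined with the shared limiter, this keeps a single 429 from cascading into a burst of immediate retries.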
docs/brainstorming/implementation/README.md DELETED
@@ -1,143 +0,0 @@
1
- # Implementation Plans
2
-
3
- TDD implementation plans based on the brainstorming documents. Each phase is a self-contained vertical slice with tests, implementation, and demo scripts.
4
-
5
- ---
6
-
7
- ## Prerequisites (COMPLETED)
8
-
9
- The following foundational changes have been implemented to support all three phases:
10
-
11
- | Change | File | Status |
12
- |--------|------|--------|
13
- | Add `"openalex"` to `SourceName` | `src/utils/models.py:9` | ✅ Done |
14
- | Add `metadata` field to `Evidence` | `src/utils/models.py:39-42` | ✅ Done |
15
- | Export all tools from `__init__.py` | `src/tools/__init__.py` | ✅ Done |
16
-
17
- All 110 tests pass after these changes.
18
-
19
- ---
20
-
21
- ## Priority Order
22
-
23
- | Phase | Name | Priority | Effort | Value |
24
- |-------|------|----------|--------|-------|
25
- | **17** | Rate Limiting | P0 CRITICAL | 1 hour | Stability |
26
- | **15** | OpenAlex | HIGH | 2-3 hours | Very High |
27
- | **16** | PubMed Full-Text | MEDIUM | 3 hours | High |
28
-
29
- **Recommended implementation order**: 17 → 15 → 16
30
-
31
- ---
32
-
33
- ## Phase 15: OpenAlex Integration
34
-
35
- **File**: [15_PHASE_OPENALEX.md](./15_PHASE_OPENALEX.md)
36
-
37
- Add OpenAlex as 4th data source for:
38
- - Citation networks (who cites whom)
39
- - Concept tagging (semantic discovery)
40
- - 209M+ scholarly works
41
- - Free, no API key required
42
-
43
- **Quick Start**:
44
- ```bash
45
- # Create the tool
46
- touch src/tools/openalex.py
47
- touch tests/unit/tools/test_openalex.py
48
-
49
- # Run tests first (TDD)
50
- uv run pytest tests/unit/tools/test_openalex.py -v
51
-
52
- # Demo
53
- uv run python examples/openalex_demo.py
54
- ```
55
-
56
- ---
57
-
58
- ## Phase 16: PubMed Full-Text
59
-
60
- **File**: [16_PHASE_PUBMED_FULLTEXT.md](./16_PHASE_PUBMED_FULLTEXT.md)
61
-
62
- Add full-text retrieval via BioC API for:
63
- - Complete paper text (not just abstracts)
64
- - Structured sections (intro, methods, results)
65
- - Better evidence for LLM synthesis
66
-
67
- **Quick Start**:
68
- ```bash
69
- # Add methods to existing pubmed.py
70
- # Tests in test_pubmed_fulltext.py
71
-
72
- # Run tests
73
- uv run pytest tests/unit/tools/test_pubmed_fulltext.py -v
74
-
75
- # Demo
76
- uv run python examples/pubmed_fulltext_demo.py
77
- ```
78
-
79
- ---
80
-
81
- ## Phase 17: Rate Limiting
82
-
83
- **File**: [17_PHASE_RATE_LIMITING.md](./17_PHASE_RATE_LIMITING.md)
84
-
85
- Replace naive sleep-based rate limiting with `limits` library for:
86
- - Moving window algorithm
87
- - Shared limits across instances
88
- - Configurable per-API rates
89
- - Production-grade stability
90
-
91
- **Quick Start**:
92
- ```bash
93
- # Add dependency
94
- uv add limits
95
-
96
- # Create module
97
- touch src/tools/rate_limiter.py
98
- touch tests/unit/tools/test_rate_limiting.py
99
-
100
- # Run tests
101
- uv run pytest tests/unit/tools/test_rate_limiting.py -v
102
-
103
- # Demo
104
- uv run python examples/rate_limiting_demo.py
105
- ```
106
-
107
- ---
108
-
109
- ## TDD Workflow
110
-
111
- Each implementation doc follows this pattern:
112
-
113
- 1. **Write tests first** - Define expected behavior
114
- 2. **Run tests** - Verify they fail (red)
115
- 3. **Implement** - Write minimal code to pass
116
- 4. **Run tests** - Verify they pass (green)
117
- 5. **Refactor** - Clean up if needed
118
- 6. **Demo** - Verify end-to-end with real APIs
119
- 7. **`make check`** - Ensure no regressions
120
-
121
- ---
122
-
123
- ## Related Brainstorming Docs
124
-
125
- These implementation plans are derived from:
126
-
127
- - [00_ROADMAP_SUMMARY.md](../00_ROADMAP_SUMMARY.md) - Priority overview
128
- - [01_PUBMED_IMPROVEMENTS.md](../01_PUBMED_IMPROVEMENTS.md) - PubMed details
129
- - [02_CLINICALTRIALS_IMPROVEMENTS.md](../02_CLINICALTRIALS_IMPROVEMENTS.md) - CT.gov details
130
- - [03_EUROPEPMC_IMPROVEMENTS.md](../03_EUROPEPMC_IMPROVEMENTS.md) - Europe PMC details
131
- - [04_OPENALEX_INTEGRATION.md](../04_OPENALEX_INTEGRATION.md) - OpenAlex integration
132
-
133
- ---
134
-
135
- ## Future Phases (Not Yet Documented)
136
-
137
- Based on brainstorming, these could be added later:
138
-
139
- - **Phase 18**: ClinicalTrials.gov Results Retrieval
140
- - **Phase 19**: Europe PMC Annotations API
141
- - **Phase 20**: Drug Name Normalization (RxNorm)
142
- - **Phase 21**: Citation Network Queries (OpenAlex)
143
- - **Phase 22**: Semantic Search with Embeddings
docs/brainstorming/magentic-pydantic/00_SITUATION_AND_PLAN.md DELETED
@@ -1,189 +0,0 @@
1
- # Situation Analysis: Pydantic-AI + Microsoft Agent Framework Integration
2
-
3
- **Date:** November 27, 2025
4
- **Status:** ACTIVE DECISION REQUIRED
5
- **Risk Level:** HIGH - DO NOT MERGE PR #41 UNTIL RESOLVED
6
-
7
- ---
8
-
9
- ## 1. The Problem
10
-
11
- We almost merged a refactor that would have **deleted** multi-agent orchestration capability from the codebase, mistakenly believing pydantic-ai and Microsoft Agent Framework were mutually exclusive.
12
-
13
- **They are not.** They are complementary:
14
- - **pydantic-ai** (Library): Ensures LLM outputs match Pydantic schemas
15
- - **Microsoft Agent Framework** (Framework): Orchestrates multi-agent workflows
16
-
17
- ---
18
-
19
- ## 2. Current Branch State
20
-
21
- | Branch | Location | Has Agent Framework? | Has Pydantic-AI Improvements? | Status |
22
- |--------|----------|---------------------|------------------------------|--------|
23
- | `origin/dev` | GitHub | YES | NO | **SAFE - Source of Truth** |
24
- | `huggingface-upstream/dev` | HF Spaces | YES | NO | **SAFE - Same as GitHub** |
25
- | `origin/main` | GitHub | YES | NO | **SAFE** |
26
- | `feat/pubmed-fulltext` | GitHub | NO (deleted) | YES | **DANGER - Has destructive refactor** |
27
- | `refactor/pydantic-unification` | Local | NO (deleted) | YES | **DANGER - Redundant, delete** |
28
- | Local `dev` | Local only | NO (deleted) | YES | **DANGER - NOT PUSHED (thankfully)** |
29
-
30
- ### Key Files at Risk
31
-
32
- **On `origin/dev` (PRESERVED):**
33
- ```text
34
- src/agents/
35
- ├── analysis_agent.py # StatisticalAnalyzer wrapper
36
- ├── hypothesis_agent.py # Hypothesis generation
37
- ├── judge_agent.py # JudgeHandler wrapper
38
- ├── magentic_agents.py # Multi-agent definitions
39
- ├── report_agent.py # Report synthesis
40
- ├── search_agent.py # SearchHandler wrapper
41
- ├── state.py # Thread-safe state management
42
- └── tools.py # @ai_function decorated tools
43
-
44
- src/orchestrator_magentic.py # Multi-agent orchestrator
45
- src/utils/llm_factory.py # Centralized LLM client factory
46
- ```
47
-
48
- **Deleted in refactor branch (would be lost if merged):**
49
- - All of the above
50
-
51
- ---
52
-
53
- ## 3. Target Architecture
54
-
55
- ```text
56
- ┌─────────────────────────────────────────────────────────────────┐
57
- │ Microsoft Agent Framework (Orchestration Layer) │
58
- │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
59
- │ │ SearchAgent │→ │ JudgeAgent │→ │ ReportAgent │ │
60
- │ │ (BaseAgent) │ │ (BaseAgent) │ │ (BaseAgent) │ │
61
- │ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ │
62
- │ │ │ │ │
63
- │ ▼ ▼ ▼ │
64
- │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
65
- │ │ pydantic-ai │ │ pydantic-ai │ │ pydantic-ai │ │
66
- │ │ Agent() │ │ Agent() │ │ Agent() │ │
67
- │ │ output_type= │ │ output_type= │ │ output_type= │ │
68
- │ │ SearchResult │ │ JudgeAssess │ │ Report │ │
69
- │ └──────────────┘ └──────────────┘ └──────────────┘ │
70
- └─────────────────────────────────────────────────────────────────┘
71
- ```
72
-
73
- **Why this architecture:**
74
- 1. **Agent Framework** handles: workflow coordination, state passing, middleware, observability
75
- 2. **pydantic-ai** handles: type-safe LLM calls within each agent
76
-
77
- ---
78
-
79
- ## 4. CRITICAL: Naming Confusion Clarification
80
-
81
- > **Senior Agent Review Finding:** The codebase uses "magentic" in file names (e.g., `orchestrator_magentic.py`, `magentic_agents.py`) but this is **NOT** the `magentic` PyPI package by Jacky Liang. It's Microsoft Agent Framework (`agent-framework-core`).
82
-
83
- **The naming confusion:**
84
- - `magentic` (PyPI package): A different library for structured LLM outputs
85
- - "Magentic" (in our codebase): Our internal name for Microsoft Agent Framework integration
86
- - `agent-framework-core` (PyPI package): Microsoft's actual multi-agent orchestration framework
87
-
88
- **Recommended future action:** Rename `orchestrator_magentic.py` → `orchestrator_advanced.py` to eliminate confusion.
89
-
90
- ---
91
-
92
- ## 5. What the Refactor DID Get Right
93
-
94
- The refactor branch (`feat/pubmed-fulltext`) has some valuable improvements:
95
-
96
- 1. **`judges.py` unified `get_model()`** - Supports OpenAI, Anthropic, AND HuggingFace via pydantic-ai
97
- 2. **HuggingFace free tier support** - `HuggingFaceModel` integration
98
- 3. **Test fix** - Properly mocks `HuggingFaceModel` class
99
- 4. **Removed broken magentic optional dependency** from pyproject.toml (this was correct - the old `magentic` package is different from Microsoft Agent Framework)
100
-
101
- **What it got WRONG:**
102
- 1. Deleted `src/agents/` entirely instead of refactoring them
103
- 2. Deleted `src/orchestrator_magentic.py` instead of fixing it
104
- 3. Conflated "magentic" (old package) with "Microsoft Agent Framework" (current framework)
105
-
106
- ---
107
-
108
- ## 6. Options for Path Forward
109
-
110
- ### Option A: Abandon Refactor, Start Fresh
111
- - Close PR #41
112
- - Delete `feat/pubmed-fulltext` and `refactor/pydantic-unification` branches
113
- - Reset local `dev` to match `origin/dev`
114
- - Cherry-pick ONLY the good parts (judges.py improvements, HF support)
115
- - **Pros:** Clean, safe
116
- - **Cons:** Lose some work, need to redo carefully
117
-
118
- ### Option B: Cherry-Pick Good Parts to origin/dev
119
- - Do NOT merge PR #41
120
- - Create new branch from `origin/dev`
121
- - Cherry-pick specific commits/changes that improve pydantic-ai usage
122
- - Keep agent framework code intact
123
- - **Pros:** Preserves both, surgical
124
- - **Cons:** Requires careful file-by-file review
125
-
126
- ### Option C: Revert Deletions in Refactor Branch
127
- - On `feat/pubmed-fulltext`, restore deleted agent files from `origin/dev`
128
- - Keep the pydantic-ai improvements
129
- - Merge THAT to dev
130
- - **Pros:** Gets both
131
- - **Cons:** Complex git operations, risk of conflicts
132
-
133
- ---
134
-
135
- ## 7. Recommended Action: Option B (Cherry-Pick)
136
-
137
- **Step-by-step:**
138
-
139
- 1. **Close PR #41** (do not merge)
140
- 2. **Delete redundant branches:**
141
- - `refactor/pydantic-unification` (local)
142
- - Reset local `dev` to `origin/dev`
143
- 3. **Create new branch from origin/dev:**
144
- ```bash
145
- git checkout -b feat/pydantic-ai-improvements origin/dev
146
- ```
147
- 4. **Cherry-pick or manually port these improvements:**
148
- - `src/agent_factory/judges.py` - the unified `get_model()` function
149
- - `examples/free_tier_demo.py` - HuggingFace demo
150
- - Test improvements
151
- 5. **Do NOT delete any agent framework files**
152
- 6. **Create PR for review**
153
-
154
- ---
155
-
156
- ## 8. Files to Cherry-Pick (Safe Improvements)
157
-
158
- | File | What Changed | Safe to Port? |
159
- |------|-------------|---------------|
160
- | `src/agent_factory/judges.py` | Added `HuggingFaceModel` support in `get_model()` | YES |
161
- | `examples/free_tier_demo.py` | New demo for HF inference | YES |
162
- | `tests/unit/agent_factory/test_judges.py` | Fixed HF model mocking | YES |
163
- | `pyproject.toml` | Removed old `magentic` optional dep | MAYBE (review carefully) |
164
-
165
- ---
166
-
167
- ## 9. Questions to Answer Before Proceeding
168
-
169
- 1. **For the hackathon**: Do we need full multi-agent orchestration, or is single-agent sufficient?
170
- 2. **For DeepCritical mainline**: Is the plan to use Microsoft Agent Framework for orchestration?
171
- 3. **Timeline**: How much time do we have to get this right?
172
-
173
- ---
174
-
175
- ## 10. Immediate Actions (DO NOW)
176
-
177
- - [ ] **DO NOT merge PR #41**
178
- - [ ] Close PR #41 with comment explaining the situation
179
- - [ ] Do not push local `dev` branch anywhere
180
- - [ ] Confirm HuggingFace Spaces is untouched (it is - verified)
181
-
182
- ---
183
-
184
- ## 11. Decision Log
185
-
186
- | Date | Decision | Rationale |
187
- |------|----------|-----------|
188
- | 2025-11-27 | Pause refactor merge | Discovered agent framework and pydantic-ai are complementary, not exclusive |
189
- | TBD | ? | Awaiting decision on path forward |
docs/brainstorming/magentic-pydantic/01_ARCHITECTURE_SPEC.md DELETED
@@ -1,289 +0,0 @@
1
- # Architecture Specification: Dual-Mode Agent System
2
-
3
- **Date:** November 27, 2025
4
- **Status:** SPECIFICATION
5
- **Goal:** Graceful degradation from full multi-agent orchestration to simple single-agent mode
6
-
7
- ---
8
-
9
- ## 1. Core Concept: Two Operating Modes
10
-
11
- ```text
12
- ┌─────────────────────────────────────────────────────────────────────┐
13
- │ USER REQUEST │
14
- │ │ │
15
- │ ▼ │
16
- │ ┌─────────────────┐ │
17
- │ │ Mode Selection │ │
18
- │ │ (Auto-detect) │ │
19
- │ └────────┬────────┘ │
20
- │ │ │
21
- │ ┌───────────────┴───────────────┐ │
22
- │ │ │ │
23
- │ ▼ ▼ │
24
- │ ┌─────────────────┐ ┌─────────────────┐ │
25
- │ │ SIMPLE MODE │ │ ADVANCED MODE │ │
26
- │ │ (Free Tier) │ │ (Paid Tier) │ │
27
- │ │ │ │ │ │
28
- │ │ pydantic-ai │ │ MS Agent Fwk │ │
29
- │ │ single-agent │ │ + pydantic-ai │ │
30
- │ │ loop │ │ multi-agent │ │
31
- │ └─────────────────┘ └─────────────────┘ │
32
- │ │ │ │
33
- │ └───────────────┬───────────────┘ │
34
- │ ▼ │
35
- │ ┌─────────────────┐ │
36
- │ │ Research Report │ │
37
- │ │ with Citations │ │
38
- │ └─────────────────┘ │
39
- └─────────────────────────────────────────────────────────────────────┘
40
- ```
41
-
42
- ---
43
-
44
- ## 2. Mode Comparison
45
-
46
- | Aspect | Simple Mode | Advanced Mode |
47
- |--------|-------------|---------------|
48
- | **Trigger** | No API key OR `LLM_PROVIDER=huggingface` | OpenAI API key present (currently OpenAI only) |
49
- | **Framework** | pydantic-ai only | Microsoft Agent Framework + pydantic-ai |
50
- | **Architecture** | Single orchestrator loop | Multi-agent coordination |
51
- | **Agents** | One agent does Search→Judge→Report | SearchAgent, JudgeAgent, ReportAgent, AnalysisAgent |
52
- | **State Management** | Simple dict | Thread-safe `MagenticState` with context vars |
53
- | **Quality** | Good (functional) | Better (specialized agents, coordination) |
54
- | **Cost** | Free (HuggingFace Inference) | Paid (OpenAI/Anthropic) |
55
- | **Use Case** | Demos, hackathon, budget-constrained | Production, research quality |
56
-
57
- ---
58
-
59
- ## 3. Simple Mode Architecture (pydantic-ai Only)
60
-
61
- ```text
62
- ┌─────────────────────────────────────────────────────┐
63
- │ Orchestrator │
64
- │ │
65
- │ while not sufficient and iteration < max: │
66
- │ 1. SearchHandler.execute(query) │
67
- │ 2. JudgeHandler.assess(evidence) ◄── pydantic-ai Agent │
68
- │ 3. if sufficient: break │
69
- │ 4. query = judge.next_queries │
70
- │ │
71
- │ return ReportGenerator.generate(evidence) │
72
- └─────────────────────────────────────────────────────┘
73
- ```
74
-
75
- **Components:**
76
- - `src/orchestrator.py` - Simple loop orchestrator
77
- - `src/agent_factory/judges.py` - JudgeHandler with pydantic-ai
78
- - `src/tools/search_handler.py` - Scatter-gather search
79
- - `src/tools/pubmed.py`, `clinicaltrials.py`, `europepmc.py` - Search tools
80
-
81
- ---
82
-
83
- ## 4. Advanced Mode Architecture (MS Agent Framework + pydantic-ai)
84
-
85
- ```text
86
- ┌─────────────────────────────────────────────────────────────────────┐
87
- │ Microsoft Agent Framework Orchestrator │
88
- │ │
89
- │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
90
- │ │ SearchAgent │───▶│ JudgeAgent │───▶│ ReportAgent │ │
91
- │ │ (BaseAgent) │ │ (BaseAgent) │ │ (BaseAgent) │ │
92
- │ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ │
93
- │ │ │ │ │
94
- │ ▼ ▼ ▼ │
95
- │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
96
- │ │ pydantic-ai │ │ pydantic-ai │ │ pydantic-ai │ │
97
- │ │ Agent() │ │ Agent() │ │ Agent() │ │
98
- │ │ output_type=│ │ output_type=│ │ output_type=│ │
99
- │ │ SearchResult│ │ JudgeAssess │ │ Report │ │
100
- │ └─────────────┘ └─────────────┘ └─────────────┘ │
101
- │ │
102
- │ Shared State: MagenticState (thread-safe via contextvars) │
103
- │ - evidence: list[Evidence] │
104
- │ - embedding_service: EmbeddingService │
105
- └─────────────────────────────────────────────────────────────────────┘
106
- ```
107
-
108
- **Components:**
109
- - `src/orchestrator_magentic.py` - Multi-agent orchestrator
110
- - `src/agents/search_agent.py` - SearchAgent (BaseAgent)
111
- - `src/agents/judge_agent.py` - JudgeAgent (BaseAgent)
112
- - `src/agents/report_agent.py` - ReportAgent (BaseAgent)
113
- - `src/agents/analysis_agent.py` - AnalysisAgent (BaseAgent)
114
- - `src/agents/state.py` - Thread-safe state management
115
- - `src/agents/tools.py` - @ai_function decorated tools
116
-
117
- ---
118
-
119
- ## 5. Mode Selection Logic
120
-
121
- ```python
122
- # src/orchestrator_factory.py (actual implementation)
123
-
124
- def create_orchestrator(
125
- search_handler: SearchHandlerProtocol | None = None,
126
- judge_handler: JudgeHandlerProtocol | None = None,
127
- config: OrchestratorConfig | None = None,
128
- mode: Literal["simple", "magentic", "advanced"] | None = None,
129
- ) -> Any:
130
- """
131
- Auto-select orchestrator based on available credentials.
132
-
133
- Priority:
134
- 1. If mode explicitly set, use that
135
- 2. If OpenAI key available -> Advanced Mode (currently OpenAI only)
136
- 3. Otherwise -> Simple Mode (HuggingFace free tier)
137
- """
138
- effective_mode = _determine_mode(mode)
139
-
140
- if effective_mode == "advanced":
141
- orchestrator_cls = _get_magentic_orchestrator_class()
142
- return orchestrator_cls(max_rounds=config.max_iterations if config else 10)
143
-
144
- # Simple mode requires handlers
145
- if search_handler is None or judge_handler is None:
146
- raise ValueError("Simple mode requires search_handler and judge_handler")
147
-
148
- return Orchestrator(
149
- search_handler=search_handler,
150
- judge_handler=judge_handler,
151
- config=config,
152
- )
153
- ```
154
-
155
- ---
156
-
157
- ## 6. Shared Components (Both Modes Use)
158
-
159
- These components work in both modes:
160
-
161
- | Component | Purpose |
162
- |-----------|---------|
163
- | `src/tools/pubmed.py` | PubMed search |
164
- | `src/tools/clinicaltrials.py` | ClinicalTrials.gov search |
165
- | `src/tools/europepmc.py` | Europe PMC search |
166
- | `src/tools/search_handler.py` | Scatter-gather orchestration |
167
- | `src/tools/rate_limiter.py` | Rate limiting |
168
- | `src/utils/models.py` | Evidence, Citation, JudgeAssessment |
169
- | `src/utils/config.py` | Settings |
170
- | `src/services/embeddings.py` | Vector search (optional) |
171
-
172
- ---
173
-
174
- ## 7. pydantic-ai Integration Points
175
-
176
- Both modes use pydantic-ai for structured LLM outputs:
177
-
178
- ```python
179
- # In JudgeHandler (both modes)
180
- from pydantic_ai import Agent
181
- from pydantic_ai.models.huggingface import HuggingFaceModel
182
- from pydantic_ai.models.openai import OpenAIModel
183
- from pydantic_ai.models.anthropic import AnthropicModel
184
-
185
- class JudgeHandler:
186
- def __init__(self, model: Any = None):
187
- self.model = model or get_model() # Auto-selects based on config
188
- self.agent = Agent(
189
- model=self.model,
190
- output_type=JudgeAssessment, # Structured output!
191
- system_prompt=SYSTEM_PROMPT,
192
- )
193
-
194
- async def assess(self, question: str, evidence: list[Evidence]) -> JudgeAssessment:
195
- result = await self.agent.run(format_prompt(question, evidence))
196
- return result.output # Guaranteed to be JudgeAssessment
197
- ```
198
-
199
- ---
200
-
201
- ## 8. Microsoft Agent Framework Integration Points
202
-
203
- Advanced mode wraps pydantic-ai agents in BaseAgent:
204
-
205
```python
# In JudgeAgent (advanced mode only)
from agent_framework import BaseAgent, AgentRunResponse, ChatMessage, Role

class JudgeAgent(BaseAgent):
    def __init__(self, judge_handler: JudgeHandlerProtocol):
        super().__init__(
            name="JudgeAgent",
            description="Evaluates evidence quality",
        )
        self._handler = judge_handler  # Uses pydantic-ai internally
        self._evidence_store: dict[str, list] = {}  # Populated by the orchestrator

    async def run(self, messages, **kwargs) -> AgentRunResponse:
        question = extract_question(messages)
        evidence = self._evidence_store.get("current", [])

        # Delegate to pydantic-ai powered handler
        assessment = await self._handler.assess(question, evidence)

        return AgentRunResponse(
            messages=[ChatMessage(role=Role.ASSISTANT, text=format_response(assessment))],
            additional_properties={"assessment": assessment.model_dump()},
        )
```


---

## 9. Benefits of This Architecture

1. **Graceful Degradation**: Works without API keys (free tier)
2. **Progressive Enhancement**: Better with API keys (orchestration)
3. **Code Reuse**: pydantic-ai handlers shared between modes
4. **Hackathon Ready**: Demo works without requiring paid keys
5. **Production Ready**: Full orchestration available when needed
6. **Future Proof**: Can add more agents to advanced mode
7. **Testable**: Simple mode is easier to unit test

---

## 10. Known Risks and Mitigations

> **From Senior Agent Review**

### 10.1 Bridge Complexity (MEDIUM)

**Risk:** In Advanced Mode, agents (Agent Framework) wrap handlers (pydantic-ai). Both are async. Context variables (`MagenticState`) must propagate correctly through the pydantic-ai call stack.

**Mitigation:**
- pydantic-ai uses standard Python `contextvars`, which naturally propagate through `await` chains
- Test context propagation explicitly in integration tests
- If issues arise, pass state explicitly rather than via context vars

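The first mitigation point can be checked with a stdlib-only sketch: a context variable set in the agent layer stays visible inside an awaited handler call, even across intermediate awaits. The names below are illustrative stand-ins, not the project's actual `MagenticState`:

```python
import asyncio
import contextvars

# Stand-in for MagenticState (illustrative, not the real class)
magentic_state: contextvars.ContextVar[dict] = contextvars.ContextVar("magentic_state")

async def handler_assess() -> str:
    # Deep in the (simulated) pydantic-ai call stack, the context var is still visible
    return magentic_state.get()["question"]

async def agent_run(question: str) -> str:
    magentic_state.set({"question": question})
    await asyncio.sleep(0)  # intermediate awaits do not break propagation
    return await handler_assess()

propagated = asyncio.run(agent_run("Does metformin affect longevity?"))
```

An integration test along these lines, run against the real bridge, would catch propagation regressions early.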
### 10.2 Integration Drift (MEDIUM)

**Risk:** Simple Mode and Advanced Mode might diverge in behavior over time (e.g., Simple Mode uses logic A, Advanced Mode uses logic B).

**Mitigation:**
- Both modes MUST call the exact same underlying Tools (`src/tools/*`) and Handlers (`src/agent_factory/*`)
- Handlers are the single source of truth for business logic
- Agents are thin wrappers that delegate to handlers

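The anti-drift rule can be illustrated with a toy sketch: because both modes route through one handler, their outputs cannot diverge. Class and method names here are simplified stand-ins for the real (async, pydantic-ai-backed) implementations:

```python
class JudgeHandler:
    """Single source of truth for assessment logic (toy stand-in)."""

    def assess(self, question: str, evidence: list[str]) -> dict:
        return {"question": question, "n_sources": len(evidence), "sufficient": len(evidence) >= 2}

class JudgeAgent:
    """Thin Advanced Mode wrapper: delegates, adds no logic of its own."""

    def __init__(self, handler: JudgeHandler) -> None:
        self._handler = handler

    def run(self, question: str, evidence: list[str]) -> dict:
        return self._handler.assess(question, evidence)

handler = JudgeHandler()
simple_result = handler.assess("q", ["pubmed:1", "trial:2"])             # Simple Mode path
advanced_result = JudgeAgent(handler).run("q", ["pubmed:1", "trial:2"])  # Advanced Mode path
```

If a wrapper ever adds its own branching, the two paths can drift; keeping wrappers logic-free is what makes the equality guarantee hold.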
### 10.3 Testing Burden (LOW-MEDIUM)

**Risk:** Maintaining two distinct orchestrators (`src/orchestrator.py` and `src/orchestrator_magentic.py`) doubles the integration-testing surface area.

**Mitigation:**
- Unit test handlers independently (shared code)
- Run integration tests for each mode separately
- End-to-end tests verify that both modes produce the same output for the same input (determinism permitting)

### 10.4 Dependency Conflicts (LOW)

**Risk:** `agent-framework-core` might conflict with `pydantic-ai`'s dependencies (e.g., different pydantic versions).

**Status:** Both depend on `pydantic>=2.x`, so they should be compatible.

---

## 11. Naming Clarification

> See `00_SITUATION_AND_PLAN.md` Section 4 for full details.

**Important:** The codebase uses "magentic" in file names (`orchestrator_magentic.py`, `magentic_agents.py`), but this refers to our internal naming for the Microsoft Agent Framework integration, **NOT** the `magentic` PyPI package.

**Future action:** Rename to `orchestrator_advanced.py` to eliminate confusion.