VibecoderMcSwaggins committed
Commit 8a98024 · 1 Parent(s): 1515e72

docs: replace completed bug docs with new Magentic bug report


Deleted (all implemented):
- P0_ACTIONABLE_FIXES.md
- P0_CRITICAL_BUGS.md
- P0_MAGENTIC_AND_SEARCH_AUDIT.md
- PHASE_00-03 implementation docs

Added new bug report for actual issue found:
- Magentic mode returns ChatMessage object instead of text
- Root cause: event.message.text extraction fails
- Max rounds reached before ReportAgent can synthesize

OpenAI key works. Simple mode works. The bug is in how
the final result event is processed.

docs/bugs/P0_ACTIONABLE_FIXES.md DELETED
@@ -1,281 +0,0 @@
# P0 Actionable Fixes - What to Do

**Date:** November 27, 2025
**Status:** ACTIONABLE

---

## Summary: What's Broken and What's Fixable

| Tool | Problem | Fixable? | How |
|------|---------|----------|-----|
| BioRxiv | API has NO search endpoint | **NO** | Replace with Europe PMC |
| PubMed | No query preprocessing | **YES** | Add query cleaner |
| ClinicalTrials | No filters applied | **YES** | Add filter params |
| Magentic Framework | Nothing wrong | N/A | Already working |

---

## FIX 1: Replace BioRxiv with Europe PMC (30 min)

### Why BioRxiv Can't Be Fixed

The bioRxiv API only has this endpoint:

```
https://api.biorxiv.org/details/{server}/{date-range}/{cursor}/json
```

This returns papers **by date**, not by keyword. There is NO search endpoint.

**Proof:** I queried `medrxiv/2024-01-01/2024-01-02` and got:
- "Global risk of Plasmodium falciparum" (malaria)
- "Multiple Endocrine Neoplasia in India"
- "Acupuncture for Acute Musculoskeletal Pain"

**None of these are about Long COVID** because the API doesn't search.
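The proof can be reproduced directly against the date-window endpoint; a minimal sketch (the `collection` and `title` response fields are assumptions about the bioRxiv details API, not code from this repo):

```python
# Fetch the date-window endpoint used in the proof above.
# Note: there is no keyword parameter anywhere in this URL.
import httpx

url = "https://api.biorxiv.org/details/medrxiv/2024-01-01/2024-01-02/0/json"
papers = httpx.get(url, timeout=30.0).json().get("collection", [])
for paper in papers[:3]:
    print(paper.get("title"))  # recent papers by date, regardless of topic
```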
### Europe PMC Has Search + Preprints

```bash
curl "https://www.ebi.ac.uk/europepmc/webservices/rest/search?query=long+covid+treatment&resultType=core&pageSize=3&format=json"
```

Returns 283,058 results including:
- "Long COVID Treatment No Silver Bullets, Only a Few Bronze BBs" ✅

### The Fix

Replace `src/tools/biorxiv.py` with `src/tools/europepmc.py`:

```python
"""Europe PMC preprint and paper search tool."""

import httpx

from src.utils.models import Citation, Evidence


class EuropePMCTool:
    """Search Europe PMC for papers and preprints."""

    BASE_URL = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"

    @property
    def name(self) -> str:
        return "europepmc"

    async def search(self, query: str, max_results: int = 10) -> list[Evidence]:
        """Search Europe PMC (includes preprints from bioRxiv/medRxiv)."""
        params = {
            "query": query,
            "resultType": "core",
            "pageSize": max_results,
            "format": "json",
        }

        async with httpx.AsyncClient(timeout=30.0) as client:
            response = await client.get(self.BASE_URL, params=params)
            response.raise_for_status()
            data = response.json()

        results = data.get("resultList", {}).get("result", [])
        return [self._to_evidence(r) for r in results]

    def _to_evidence(self, result: dict) -> Evidence:
        """Convert a Europe PMC result to Evidence."""
        title = result.get("title", "Untitled")
        abstract = result.get("abstractText", "No abstract")
        doi = result.get("doi", "")
        pub_year = result.get("pubYear", "Unknown")

        # Mark preprints
        pub_type = result.get("pubTypeList", {}).get("pubType", [])
        is_preprint = "Preprint" in pub_type

        content = f"{'[PREPRINT] ' if is_preprint else ''}{abstract[:1800]}"

        return Evidence(
            content=content,
            citation=Citation(
                source="preprint" if is_preprint else "europepmc",
                title=title[:500],
                url=f"https://doi.org/{doi}" if doi else "",
                date=str(pub_year),
            ),
            relevance=0.75 if is_preprint else 0.9,
        )
```

---

## FIX 2: Add PubMed Query Preprocessing (1 hour)

### Current Problem

User enters: `What medications show promise for Long COVID?`
PubMed receives: `What medications show promise for Long COVID?`

The question words pollute the search.

### The Fix

Add `src/tools/query_utils.py`:

```python
"""Query preprocessing utilities."""

# Question words to remove
QUESTION_WORDS = {
    "what", "which", "how", "why", "when", "where", "who",
    "is", "are", "can", "could", "would", "should", "do", "does",
    "show", "promise", "help", "treat", "cure",
}

# Medical synonyms to expand
SYNONYMS = {
    "long covid": ["long COVID", "PASC", "post-COVID syndrome", "post-acute sequelae"],
    "alzheimer": ["Alzheimer's disease", "AD", "Alzheimer dementia"],
    "cancer": ["neoplasm", "tumor", "malignancy", "carcinoma"],
}


def preprocess_pubmed_query(raw_query: str) -> str:
    """Convert natural language to a cleaner PubMed query."""
    # Lowercase
    query = raw_query.lower()

    # Remove question marks
    query = query.replace("?", "")

    # Remove question words
    words = query.split()
    words = [w for w in words if w not in QUESTION_WORDS]
    query = " ".join(words)

    # Expand synonyms
    for term, expansions in SYNONYMS.items():
        if term in query:
            # Add OR clause
            expansion = " OR ".join([f'"{e}"' for e in expansions])
            query = query.replace(term, f"({expansion})")

    return query.strip()
```

Then update `src/tools/pubmed.py`:

```python
from src.tools.query_utils import preprocess_pubmed_query


async def search(self, query: str, max_results: int = 10) -> list[Evidence]:
    # Preprocess query
    clean_query = preprocess_pubmed_query(query)

    search_params = self._build_params(
        db="pubmed",
        term=clean_query,  # Use cleaned query
        retmax=max_results,
        sort="relevance",
    )
    # ... rest unchanged
```

---

## FIX 3: Add ClinicalTrials.gov Filters (30 min)

### Current Problem

Returns ALL trials, including withdrawn and terminated trials and observational studies.

### The Fix

The API supports `filter.overallStatus` and other filters. Update `src/tools/clinicaltrials.py`:

```python
async def search(self, query: str, max_results: int = 10) -> list[Evidence]:
    params: dict[str, str | int] = {
        "query.term": query,
        "pageSize": min(max_results, 100),
        "fields": "|".join(self.FIELDS),
        # ADD THESE FILTERS:
        "filter.overallStatus": "COMPLETED|RECRUITING|ACTIVE_NOT_RECRUITING",
        # Only interventional studies (not observational)
        "aggFilters": "studyType:int",
    }
    # ... rest unchanged
```

**Note:** I tested the API - it supports filtering, but with slightly different syntax. Check the [API docs](https://clinicaltrials.gov/data-api/api).

---

## What NOT to Change

### Microsoft Agent Framework - WORKING

I verified:

```python
from agent_framework import MagenticBuilder, ChatAgent
from agent_framework.openai import OpenAIChatClient
# All imports OK

orchestrator = MagenticOrchestrator(max_rounds=2)
workflow = orchestrator._build_workflow()
# Workflow built successfully
```

The Magentic agents are correctly wired:
- SearchAgent → GPT-5.1 ✅
- JudgeAgent → GPT-5.1 ✅
- HypothesisAgent → GPT-5.1 ✅
- ReportAgent → GPT-5.1 ✅

**The framework is fine. The tools it calls are broken.**

---

## Priority Order

1. **Replace BioRxiv** → Immediate, fundamental
2. **Add PubMed preprocessing** → High impact, easy
3. **Add ClinicalTrials filters** → Medium impact, easy

---

## Test After Fixes

```bash
# Test Europe PMC
uv run python -c "
import asyncio
from src.tools.europepmc import EuropePMCTool
tool = EuropePMCTool()
results = asyncio.run(tool.search('long covid treatment', 3))
for r in results:
    print(r.citation.title)
"

# Test PubMed with preprocessing
uv run python -c "
from src.tools.query_utils import preprocess_pubmed_query
q = 'What medications show promise for Long COVID?'
print(preprocess_pubmed_query(q))
# Should output: medications for (\"long COVID\" OR \"PASC\" OR \"post-COVID syndrome\" OR \"post-acute sequelae\")
"
```

---

## After These Fixes

The Magentic workflow will:
1. SearchAgent calls `search_pubmed("long COVID treatment")` → Gets RELEVANT papers
2. SearchAgent calls `search_preprints("long COVID treatment")` → Gets RELEVANT preprints via Europe PMC
3. SearchAgent calls `search_clinical_trials("long COVID")` → Gets INTERVENTIONAL trials only
4. JudgeAgent evaluates GOOD evidence
5. HypothesisAgent generates hypotheses from GOOD evidence
6. ReportAgent synthesizes a GOOD report

**The framework will work once we feed it good data.**
docs/bugs/P0_CRITICAL_BUGS.md DELETED
@@ -1,298 +0,0 @@
# P0 CRITICAL BUGS - Why DeepCritical Produces Garbage Results

**Date:** November 27, 2025
**Status:** CRITICAL - App is functionally useless
**Severity:** P0 (Blocker)

## TL;DR

The app produces garbage because:
1. **BioRxiv search doesn't work** - returns random papers
2. **Free tier LLM is too dumb** - can't identify drugs
3. **Query construction is naive** - no optimization for PubMed/CT.gov syntax
4. **Loop terminates too early** - 5 iterations isn't enough

---

## P0-001: BioRxiv Search is Fundamentally Broken

**File:** `src/tools/biorxiv.py:248-286`

**The Problem:**
The bioRxiv API **DOES NOT SUPPORT KEYWORD SEARCH**.

The code does this:

```python
# Fetch recent papers (last 90 days, first 100 papers)
url = f"{self.BASE_URL}/{self.server}/{interval}/0/json"
# Then filter client-side for keywords
```

**What Actually Happens:**
1. Fetches the first 100 papers from medRxiv in the last 90 days (chronological order)
2. Filters those 100 random papers for query keywords
3. Returns whatever garbage matches

**Result:** For "Long COVID medications", you get random papers like:
- "Calf muscle structure-function adaptations"
- "Work-Life Balance of Ophthalmologists During COVID"

These papers contain "COVID" somewhere but have NOTHING to do with Long COVID treatments.

**Root Cause:** The `/0/json` pagination only returns 100 papers. You'd need to paginate through ALL papers (thousands) to do proper keyword filtering.

**Fix Options:**
1. **Remove BioRxiv entirely** - It's unusable without a proper search API
2. **Use a different preprint aggregator** - Europe PMC has preprints WITH search
3. **Add pagination** - Fetch all papers (slow, expensive)
4. **Use Semantic Scholar API** - Has preprints and proper search

---

## P0-002: Free Tier LLM Cannot Perform Drug Identification

**File:** `src/agent_factory/judges.py:153-211`

**The Problem:**
Without an API key, the app uses `HFInferenceJudgeHandler` with:
- Llama 3.1 8B Instruct
- Mistral 7B Instruct

These are **7-8 billion parameter models**. They cannot:
- Reliably parse complex biomedical abstracts
- Identify drug candidates from scientific text
- Generate structured JSON output consistently
- Reason about mechanism of action

**Evidence of Failure:**

```python
# From MockJudgeHandler - the honest fallback when the LLM fails
drug_candidates=[
    "Drug identification requires AI analysis",
    "Enter API key above for full results",
]
```

The team KNEW the free tier can't identify drugs and added this message.

**Root Cause:** Drug repurposing requires understanding:
- Drug mechanisms
- Disease pathophysiology
- Clinical trial phases
- Statistical significance

This requires GPT-4 / Claude Sonnet class models (100B+ parameters).

**Fix Options:**
1. **Require an API key** - No free tier; be honest
2. **Use larger HF models** - Llama 70B or Mixtral 8x7B (expensive on the free tier)
3. **Hybrid approach** - Use the free tier for search, require paid models for synthesis

---

## P0-003: PubMed Query Not Optimized

**File:** `src/tools/pubmed.py:54-71`

**The Problem:**
The query is passed directly to PubMed without optimization:

```python
search_params = self._build_params(
    db="pubmed",
    term=query,  # Raw user query!
    retmax=max_results,
    sort="relevance",
)
```

**What User Enters:** "What medications show promise for Long COVID?"

**What PubMed Receives:** `What medications show promise for Long COVID?`

**What PubMed Should Receive:**

```
("long covid"[Title/Abstract] OR "post-COVID"[Title/Abstract] OR "PASC"[Title/Abstract])
AND (drug[Title/Abstract] OR treatment[Title/Abstract] OR medication[Title/Abstract] OR therapy[Title/Abstract])
AND (clinical trial[Publication Type] OR randomized[Title/Abstract])
```

**Root Cause:** No query preprocessing or medical term expansion.

**Fix Options:**
1. **Add a query preprocessor** - Extract medical entities, expand synonyms
2. **Use MeSH terms** - PubMed's controlled vocabulary for better recall
3. **LLM query generation** - Use an LLM to generate an optimized PubMed query

---

## P0-004: Loop Terminates Too Early

**File:** `src/app.py:42-45` and `src/utils/models.py`

**The Problem:**

```python
config = OrchestratorConfig(
    max_iterations=5,
    max_results_per_tool=10,
)
```

5 iterations is not enough to:
1. Search multiple variations of the query
2. Gather enough evidence for the Judge to synthesize
3. Refine queries based on initial results

**Evidence:** The user's output shows "Max Iterations Reached" with only 6 sources.

**Root Cause:** Conservative defaults to avoid API costs, but they make the app useless.

**Fix Options:**
1. **Increase the default to 10-15** - More iterations = better results
2. **Dynamic termination** - Stop when confidence > threshold, not at an iteration count (see the sketch below)
3. **Parallel query expansion** - Run more queries per iteration
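A minimal sketch of option 2, assuming a judge that exposes a numeric confidence score; `search_round`, `judge.evaluate`, and the `confidence` field are hypothetical names, not this repo's actual API:

```python
# Sketch: stop on judge confidence instead of a fixed iteration count.
CONFIDENCE_THRESHOLD = 0.8
MAX_ITERATIONS = 15  # hard safety cap so the loop always terminates

async def run_until_confident(orchestrator, judge, query: str):
    evidence = []
    for _ in range(MAX_ITERATIONS):
        evidence += await orchestrator.search_round(query)  # hypothetical helper
        verdict = await judge.evaluate(query, evidence)     # hypothetical helper
        if verdict.confidence >= CONFIDENCE_THRESHOLD:
            return verdict  # enough good evidence; synthesize now
    return verdict  # fall back to whatever we have at the cap
```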
---

## P0-005: No Query Understanding Layer

**Files:** `src/orchestrator.py`, `src/tools/search_handler.py`

**The Problem:**
There is no NLU (Natural Language Understanding) layer. The system:
1. Takes the raw user query
2. Passes it directly to the search tools
3. No entity extraction
4. No intent classification
5. No query expansion

For drug repurposing, you need to extract:
- **Disease:** "Long COVID" → [Long COVID, PASC, Post-COVID syndrome, chronic COVID]
- **Drug intent:** "medications" → [drugs, treatments, therapeutics, interventions]
- **Evidence type:** "show promise" → [clinical trials, efficacy, RCT]

**Root Cause:** No preprocessing pipeline between user input and search execution.

**Fix Options:**
1. **Add entity extraction** - Use BioBERT or PubMedBERT for medical NER
2. **Add query expansion** - Use medical ontologies (UMLS, MeSH)
3. **LLM preprocessing** - Use an LLM to generate a search strategy before searching

---

## P0-006: ClinicalTrials.gov Results Not Filtered

**File:** `src/tools/clinicaltrials.py`

**The Problem:**
ClinicalTrials.gov returns ALL matching trials, including:
- Withdrawn trials
- Terminated trials
- Not yet recruiting
- Observational studies (not interventional)

For drug repurposing, you want:
- Interventional studies
- Phase 2+ (has safety/efficacy data)
- Completed or with results

**Root Cause:** No filtering of trial metadata.

---

## Summary: Why This App Produces Garbage

```
User Query: "What medications show promise for Long COVID?"
        ↓
┌─────────────────────────────────────────────────────────────┐
│ NO QUERY PREPROCESSING                                      │
│ - No entity extraction                                      │
│ - No synonym expansion                                      │
│ - No medical term normalization                             │
└─────────────────────────────────────────────────────────────┘
        ↓
┌─────────────────────────────────────────────────────────────┐
│ BROKEN SEARCH LAYER                                         │
│ - PubMed: Raw query, no MeSH, gets 1 result                 │
│ - BioRxiv: Returns random papers (API doesn't support search)│
│ - ClinicalTrials: Returns all trials, no filtering          │
└─────────────────────────────────────────────────────────────┘
        ↓
┌─────────────────────────────────────────────────────────────┐
│ GARBAGE EVIDENCE                                            │
│ - 6 papers, most irrelevant                                 │
│ - "Calf muscle adaptations" (mentions COVID once)           │
│ - "Ophthalmologist work-life balance"                       │
└─────────────────────────────────────────────────────────────┘
        ↓
┌─────────────────────────────────────────────────────────────┐
│ DUMB JUDGE (Free Tier)                                      │
│ - Llama 8B can't identify drugs from garbage                │
│ - JSON parsing fails                                        │
│ - Falls back to "Drug identification requires AI analysis"  │
└─────────────────────────────────────────────────────────────┘
        ↓
┌─────────────────────────────────────────────────────────────┐
│ LOOP HITS MAX (5 iterations)                                │
│ - Never finds enough good evidence                          │
│ - Never synthesizes anything useful                         │
└─────────────────────────────────────────────────────────────┘
        ↓
GARBAGE OUTPUT
```

---

## What Would Make This Actually Work

### Minimum Viable Fix (1-2 days)

1. **Remove BioRxiv** - It doesn't work
2. **Require an API key** - Be honest that the free tier is useless
3. **Add basic query preprocessing** - Strip question words, expand COVID synonyms
4. **Increase iterations to 10**

### Proper Fix (1-2 weeks)

1. **Query Understanding Layer**
   - Medical NER (BioBERT/SciBERT)
   - Query expansion with MeSH/UMLS
   - Intent classification (drug discovery vs mechanism vs safety)

2. **Optimized Search**
   - PubMed: Proper query syntax with MeSH terms
   - ClinicalTrials: Filter by phase, status, intervention type
   - Replace BioRxiv with Europe PMC (has preprints + search)

3. **Evidence Ranking**
   - Score by publication type (RCT > cohort > case report)
   - Score by journal impact factor
   - Score by recency
   - Score by citation count

4. **Proper LLM Pipeline**
   - Use GPT-4 / Claude for synthesis
   - Structured extraction of: drug, mechanism, evidence level, effect size
   - Multi-step reasoning: identify → validate → rank → synthesize

---

## The Hard Truth

Building a drug repurposing agent that works is HARD. The state of the art is:

- **Drug2Disease (IBM)** - Uses knowledge graphs + ML
- **COVID-KG (Stanford)** - A dedicated COVID knowledge graph
- **Literature mining at scale (PubMed)** - Millions of papers, not 10

This hackathon project is fundamentally a **search wrapper with an LLM prompt**. That's not enough.

To make it useful:
1. Either scope it down (e.g., "find clinical trials for X disease")
2. Or invest serious engineering in the NLU + search + ranking pipeline
docs/bugs/P0_MAGENTIC_AND_SEARCH_AUDIT.md DELETED
@@ -1,249 +0,0 @@
# P0 Audit: Microsoft Agent Framework (Magentic) & Search Tools

**Date:** November 27, 2025
**Auditor:** Claude Code
**Status:** VERIFIED

---

## TL;DR

| Component | Status | Verdict |
|-----------|--------|---------|
| Microsoft Agent Framework | ✅ WORKING | Correctly wired, no bugs |
| GPT-5.1 Model Config | ✅ CORRECT | Using `gpt-5.1` as configured |
| Search Tools | ❌ BROKEN | Root cause of garbage results |

**The orchestration framework is fine. The search layer is garbage.**

---

## Microsoft Agent Framework Verification

### Import Test: PASSED

```python
from agent_framework import MagenticBuilder, ChatAgent
from agent_framework.openai import OpenAIChatClient
# All imports successful
```

### Agent Creation Test: PASSED

```python
from src.agents.magentic_agents import create_search_agent
search_agent = create_search_agent()
# SearchAgent created: SearchAgent
# Description: Searches biomedical databases (PubMed, ClinicalTrials.gov, bioRxiv)
```

### Workflow Build Test: PASSED

```python
from src.orchestrator_magentic import MagenticOrchestrator
orchestrator = MagenticOrchestrator(max_rounds=2)
workflow = orchestrator._build_workflow()
# Workflow built successfully: <class 'agent_framework._workflows._workflow.Workflow'>
```

### Model Configuration: CORRECT

```python
settings.openai_model = "gpt-5.1"  # ✅ Using GPT-5.1, not GPT-4o
settings.openai_api_key = True     # ✅ API key is set (truthy check)
```

---

## What Magentic Provides (Working)

1. **Multi-Agent Coordination**
   - Manager agent orchestrates SearchAgent, JudgeAgent, HypothesisAgent, ReportAgent
   - Uses `MagenticBuilder().with_standard_manager()` for coordination

2. **ChatAgent Pattern**
   - Each agent has an internal LLM (GPT-5.1)
   - Can call tools via the `@ai_function` decorator
   - Has proper instructions for domain-specific tasks

3. **Workflow Streaming** (see the sketch after this list)
   - Events: `MagenticAgentMessageEvent`, `MagenticFinalResultEvent`, etc.
   - Real-time UI updates via `workflow.run_stream(task)`

4. **State Management**
   - `MagenticState` persists evidence across agents
   - `get_bibliography()` tool for ReportAgent
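A minimal sketch of consuming these streaming events; the import locations and the shape of `event.message` are assumptions pieced together from the snippets in this audit, not verified library API:

```python
# Sketch: drive the workflow and react to the two event types named above.
import asyncio

from agent_framework import MagenticAgentMessageEvent, MagenticFinalResultEvent
from src.orchestrator_magentic import MagenticOrchestrator

async def stream_demo(task: str) -> None:
    workflow = MagenticOrchestrator(max_rounds=2)._build_workflow()
    async for event in workflow.run_stream(task):
        if isinstance(event, MagenticAgentMessageEvent):
            print(f"[agent] {event.message}")   # per-agent progress update
        elif isinstance(event, MagenticFinalResultEvent):
            print(f"[final] {event.message}")   # synthesized final report

asyncio.run(stream_demo("What medications show promise for Long COVID?"))
```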
---

## What's Actually Broken: The Search Tools

### File: `src/agents/tools.py`

The Magentic agents call these tools:
- `search_pubmed` → Uses `PubMedTool`
- `search_clinical_trials` → Uses `ClinicalTrialsTool`
- `search_preprints` → Uses `BioRxivTool`

**These tools are the problem, not the framework.**

---

## Search Tool Bugs (Detailed)

### BUG 1: BioRxiv API Does Not Support Search

**File:** `src/tools/biorxiv.py:248-286`

```python
# This fetches the FIRST 100 papers from the last 90 days.
# It does NOT search by keyword - the API doesn't support that.
url = f"{self.BASE_URL}/{self.server}/{interval}/0/json"

# Then filters client-side for keywords
matching = self._filter_by_keywords(papers, query_terms, max_results)
```

**Problem:**
- Fetches 100 random chronological papers
- Filters for ANY keyword match in title/abstract
- "Long COVID medications" returns papers about "calf muscles" because they mention "COVID" once

**Fix:** Remove BioRxiv or use Europe PMC (which has actual search)

---

### BUG 2: PubMed Query Not Optimized

**File:** `src/tools/pubmed.py:54-71`

```python
search_params = self._build_params(
    db="pubmed",
    term=query,  # RAW USER QUERY - no preprocessing!
    retmax=max_results,
    sort="relevance",
)
```

**Problem:**
- User enters: "What medications show promise for Long COVID?"
- PubMed receives: `What medications show promise for Long COVID?`
- Should receive: `("long covid"[Title/Abstract] OR "PASC"[Title/Abstract]) AND (treatment[Title/Abstract] OR drug[Title/Abstract])`

**Fix:** Add query preprocessing:
1. Strip question words (what, which, how, etc.)
2. Expand medical synonyms (Long COVID → PASC, Post-COVID)
3. Use MeSH terms for better recall

---

### BUG 3: ClinicalTrials.gov No Filtering

**File:** `src/tools/clinicaltrials.py`

Returns ALL trials, including:
- Withdrawn trials
- Terminated trials
- Observational studies (not drug interventions)
- Phase 1 (no efficacy data)

**Fix:** Filter by:
- `studyType=INTERVENTIONAL`
- `phase=PHASE2,PHASE3,PHASE4`
- `status=COMPLETED,ACTIVE_NOT_RECRUITING,RECRUITING`

---

## Evidence: Garbage In → Garbage Out

When the Magentic SearchAgent calls these tools:

```
SearchAgent: "Find evidence for Long COVID medications"
        ↓
search_pubmed("Long COVID medications")
    → Returns 1 semi-relevant paper (raw query hits)

search_preprints("Long COVID medications")
    → Returns garbage (BioRxiv API doesn't search)
    → "Calf muscle adaptations" (has "COVID" somewhere)
    → "Ophthalmologist work-life balance" (mentions COVID)

search_clinical_trials("Long COVID medications")
    → Returns all trials, no filtering
        ↓
JudgeAgent receives garbage evidence
        ↓
HypothesisAgent can't generate good hypotheses from garbage
        ↓
ReportAgent produces a garbage report
```

**The framework is doing its job. It's orchestrating agents correctly. But the agents are being fed garbage data.**

---

## Recommended Fixes

### Priority 1: Delete or Fix BioRxiv (30 min)

**Option A: Delete it**

```python
# In src/agents/tools.py, remove:
# from src.tools.biorxiv import BioRxivTool
# _biorxiv = BioRxivTool()
# @ai_function search_preprints(...)
```

**Option B: Replace with Europe PMC**
Europe PMC has preprints AND a proper search API:

```
https://www.ebi.ac.uk/europepmc/webservices/rest/search?query=long+covid+treatment&format=json
```

### Priority 2: Fix PubMed Query (1 hour)

Add a query preprocessor (a minimal runnable version of the steps originally sketched as comments):

```python
def preprocess_query(raw_query: str) -> str:
    """Convert natural language to PubMed query syntax."""
    # Strip question words and punctuation
    words = raw_query.replace("?", "").lower().split()
    words = [w for w in words if w not in {"what", "which", "how", "is", "are"}]
    query = " ".join(words)
    # Expand medical synonyms, e.g. long covid -> PASC
    query = query.replace("long covid", '("long covid" OR "PASC" OR "post-COVID")')
    # Field tags like [Title/Abstract] can be appended per term here
    return query.strip()
```

### Priority 3: Filter ClinicalTrials (30 min)

Add parameters to the API call:

```python
params = {
    "query.term": query,
    "filter.overallStatus": "COMPLETED,RECRUITING",
    "filter.studyType": "INTERVENTIONAL",
    "pageSize": max_results,
}
```

---

## Conclusion

**Microsoft Agent Framework: NO BUGS FOUND**
- Imports work ✅
- Agent creation works ✅
- Workflow building works ✅
- Model config correct (GPT-5.1) ✅
- Streaming events work ✅

**Search Tools: CRITICALLY BROKEN**
- BioRxiv: API doesn't support search (fundamental)
- PubMed: No query optimization (fixable)
- ClinicalTrials: No filtering (fixable)

**Recommendation:**
1. Delete BioRxiv immediately (unusable)
2. Add PubMed query preprocessing
3. Add ClinicalTrials filtering
4. Then the Magentic multi-agent system will work as designed
docs/bugs/P0_MAGENTIC_MODE_BROKEN.md ADDED
@@ -0,0 +1,116 @@
# P0 Bug: Magentic Mode Returns ChatMessage Object Instead of Report Text

**Status**: OPEN
**Priority**: P0 (Critical)
**Date**: 2025-11-27

---

## Actual Bug Found (Not What We Thought)

**The OpenAI key works fine.** The real bug is different:

### The Problem

When Magentic mode completes, the final report returns a `ChatMessage` object instead of the actual text:

```
FINAL REPORT:
<agent_framework._types.ChatMessage object at 0x11db70310>
```

### Evidence

Full test output shows:
1. Magentic orchestrator starts correctly
2. SearchAgent finds evidence
3. HypothesisAgent generates hypotheses
4. JudgeAgent evaluates
5. **BUT**: Final output is a `ChatMessage` object, not text

### Root Cause

In `src/orchestrator_magentic.py` line 193:

```python
elif isinstance(event, MagenticFinalResultEvent):
    text = event.message.text if event.message else "No result"
```

The `event.message` is a `ChatMessage` object, and `.text` may not extract the content correctly, or the message structure changed in the agent-framework library.

---

## Secondary Issue: Max Rounds Reached

The orchestrator hits max rounds before producing a report:

```
[ERROR] Magentic Orchestrator: Max round count reached
```

This means the workflow times out before the ReportAgent synthesizes the final output.

---

## What Works

- OpenAI API key: **Works** (loaded from .env)
- SearchAgent: **Works** (finds evidence from PubMed, ClinicalTrials, Europe PMC)
- HypothesisAgent: **Works** (generates Drug -> Target -> Pathway chains)
- JudgeAgent: **Partial** (evaluates but sometimes loses context)

---

## Files to Fix

| File | Line | Issue |
|------|------|-------|
| `src/orchestrator_magentic.py` | 193 | `event.message.text` returns an object, not a string |
| `src/orchestrator_magentic.py` | 97-99 | `max_round_count=3` too low for the full pipeline |

---

## Suggested Fix

```python
# In _process_event, line 192-199
elif isinstance(event, MagenticFinalResultEvent):
    # Handle the ChatMessage object properly
    if event.message:
        if hasattr(event.message, 'content'):
            text = event.message.content
        elif hasattr(event.message, 'text'):
            text = event.message.text
        else:
            text = str(event.message)
    else:
        text = "No result"
```

And increase rounds:

```python
# In _build_workflow, line 97
max_round_count=self._max_rounds,  # Use the configured value, default 10
```
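The same extraction logic could be pulled into a small helper so it can be unit-tested without running the whole workflow; a sketch (the helper name and attribute order are suggestions, not existing repo code):

```python
def extract_text(message: object) -> str:
    """Best-effort text extraction from a ChatMessage-like object."""
    if message is None:
        return "No result"
    for attr in ("content", "text"):
        value = getattr(message, attr, None)
        if isinstance(value, str) and value:
            return value
    return str(message)  # last resort: the repr seen in the bug output
```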
---

## Test Command

```bash
set -a && source .env && set +a && uv run python examples/orchestrator_demo/run_magentic.py "metformin alzheimer"
```

---

## Simple Mode Works

For reference, simple mode produces full reports:

```bash
uv run python examples/orchestrator_demo/run_agent.py "metformin alzheimer"
```

Output includes a structured report with Drug Candidates, Key Findings, etc.
docs/bugs/PHASE_00_IMPLEMENTATION_ORDER.md DELETED
@@ -1,156 +0,0 @@
# Phase 00: Implementation Order & Summary

**Total Effort:** 5-8 hours
**Parallelizable:** Yes (all 3 phases are independent)

---

## Executive Summary

The DeepCritical drug repurposing agent produces garbage results because the search tools are broken:

| Tool | Problem | Fix |
|------|---------|-----|
| BioRxiv | API doesn't support search | Replace with Europe PMC |
| PubMed | Raw queries, no preprocessing | Add query cleaner |
| ClinicalTrials | No filtering | Add status/type filters |

**The Microsoft Agent Framework (Magentic) is working correctly.** The orchestration layer is fine. The data layer is broken.

---

## Phase Specs

| Phase | Title | Effort | Priority | Dependencies |
|-------|-------|--------|----------|--------------|
| **01** | [Replace BioRxiv with Europe PMC](./PHASE_01_REPLACE_BIORXIV.md) | 2-3 hrs | P0 | None |
| **02** | [PubMed Query Preprocessing](./PHASE_02_PUBMED_QUERY_PREPROCESSING.md) | 2-3 hrs | P0 | None |
| **03** | [ClinicalTrials Filtering](./PHASE_03_CLINICALTRIALS_FILTERING.md) | 1-2 hrs | P1 | None |

---

## Recommended Execution Order

Since all phases are independent, they can be done in parallel by different developers.

**If doing them sequentially, order by impact:**

1. **Phase 01** - BioRxiv is completely broken (returns random papers)
2. **Phase 02** - PubMed is partially broken (returns suboptimal results)
3. **Phase 03** - ClinicalTrials returns too much noise

---

## TDD Workflow (Per Phase)

```
1. Write failing tests
2. Run tests (confirm they fail)
3. Implement fix
4. Run tests (confirm they pass)
5. Run ALL tests (confirm no regressions)
6. Manual verification
7. Commit
```

---

## Verification After All Phases

After completing all 3 phases, run this integration test:

```bash
# Full system test
uv run python -c "
import asyncio
from src.tools.europepmc import EuropePMCTool
from src.tools.pubmed import PubMedTool
from src.tools.clinicaltrials import ClinicalTrialsTool

async def test_all():
    query = 'long covid treatment'

    print('=== Europe PMC (Preprints) ===')
    epmc = EuropePMCTool()
    results = await epmc.search(query, 2)
    for r in results:
        print(f'  - {r.citation.title[:60]}...')

    print()
    print('=== PubMed ===')
    pm = PubMedTool()
    results = await pm.search(query, 2)
    for r in results:
        print(f'  - {r.citation.title[:60]}...')

    print()
    print('=== ClinicalTrials.gov ===')
    ct = ClinicalTrialsTool()
    results = await ct.search(query, 2)
    for r in results:
        print(f'  - {r.citation.title[:60]}...')

asyncio.run(test_all())
"
```

**Expected:** All results should be relevant to "long covid treatment".

---

## Test Magentic Integration

After all phases are complete, test the full Magentic workflow:

```bash
# Test Magentic mode (requires OPENAI_API_KEY)
uv run python -c "
import asyncio
from src.orchestrator_magentic import MagenticOrchestrator

async def test_magentic():
    orchestrator = MagenticOrchestrator(max_rounds=3)

    print('Running Magentic workflow...')
    async for event in orchestrator.run('What drugs show promise for Long COVID?'):
        print(f'[{event.type}] {event.message[:100]}...')

asyncio.run(test_magentic())
"
```

---

## Files Changed (All Phases)

| File | Phase | Action |
|------|-------|--------|
| `src/tools/europepmc.py` | 01 | CREATE |
| `tests/unit/tools/test_europepmc.py` | 01 | CREATE |
| `src/agents/tools.py` | 01 | MODIFY |
| `src/tools/search_handler.py` | 01 | MODIFY |
| `src/tools/biorxiv.py` | 01 | DELETE |
| `tests/unit/tools/test_biorxiv.py` | 01 | DELETE |
| `src/tools/query_utils.py` | 02 | CREATE |
| `tests/unit/tools/test_query_utils.py` | 02 | CREATE |
| `src/tools/pubmed.py` | 02 | MODIFY |
| `src/tools/clinicaltrials.py` | 03 | MODIFY |
| `tests/unit/tools/test_clinicaltrials.py` | 03 | MODIFY |

---

## Success Criteria (Overall)

- [ ] All unit tests pass
- [ ] All integration tests pass (real APIs)
- [ ] Query "What drugs show promise for Long COVID?" returns relevant results from all 3 sources
- [ ] Magentic workflow produces a coherent research report
- [ ] No regressions in existing functionality

---

## Related Documentation

- [P0 Critical Bugs](./P0_CRITICAL_BUGS.md) - Root cause analysis
- [P0 Magentic Audit](./P0_MAGENTIC_AND_SEARCH_AUDIT.md) - Framework verification
- [P0 Actionable Fixes](./P0_ACTIONABLE_FIXES.md) - Fix summaries
docs/bugs/PHASE_01_REPLACE_BIORXIV.md DELETED
@@ -1,371 +0,0 @@
# Phase 01: Replace BioRxiv with Europe PMC

**Priority:** P0 - Critical
**Effort:** 2-3 hours
**Dependencies:** None

---

## Problem Statement

The BioRxiv API does not support keyword search. It only returns papers by date range, resulting in completely irrelevant results for any query.

## Success Criteria

- [ ] `search_preprints("long covid treatment")` returns papers actually about Long COVID
- [ ] All existing tests pass
- [ ] New tests cover the Europe PMC integration

---

## TDD Implementation Order

### Step 1: Write Failing Tests

**File:** `tests/unit/tools/test_europepmc.py`

```python
"""Unit tests for the Europe PMC tool."""

from unittest.mock import AsyncMock, patch

import pytest

from src.tools.europepmc import EuropePMCTool
from src.utils.models import Evidence


@pytest.mark.unit
class TestEuropePMCTool:
    """Tests for EuropePMCTool."""

    @pytest.fixture
    def tool(self):
        return EuropePMCTool()

    def test_tool_name(self, tool):
        assert tool.name == "europepmc"

    @pytest.mark.asyncio
    async def test_search_returns_evidence(self, tool):
        """Test that search returns Evidence objects."""
        mock_response = {
            "resultList": {
                "result": [
                    {
                        "id": "12345",
                        "title": "Long COVID Treatment Study",
                        "abstractText": "This study examines treatments for Long COVID.",
                        "doi": "10.1234/test",
                        "pubYear": "2024",
                        "source": "MED",
                        "pubTypeList": {"pubType": ["research-article"]},
                    }
                ]
            }
        }

        with patch("httpx.AsyncClient") as mock_client:
            mock_instance = AsyncMock()
            mock_client.return_value.__aenter__.return_value = mock_instance
            mock_instance.get.return_value.json.return_value = mock_response
            mock_instance.get.return_value.raise_for_status = lambda: None

            results = await tool.search("long covid treatment", max_results=5)

        assert len(results) == 1
        assert isinstance(results[0], Evidence)
        assert "Long COVID Treatment Study" in results[0].citation.title

    @pytest.mark.asyncio
    async def test_search_marks_preprints(self, tool):
        """Test that preprints are marked correctly."""
        mock_response = {
            "resultList": {
                "result": [
                    {
                        "id": "PPR12345",
                        "title": "Preprint Study",
                        "abstractText": "Abstract text",
                        "doi": "10.1234/preprint",
                        "pubYear": "2024",
                        "source": "PPR",
                        "pubTypeList": {"pubType": ["Preprint"]},
                    }
                ]
            }
        }

        with patch("httpx.AsyncClient") as mock_client:
            mock_instance = AsyncMock()
            mock_client.return_value.__aenter__.return_value = mock_instance
            mock_instance.get.return_value.json.return_value = mock_response
            mock_instance.get.return_value.raise_for_status = lambda: None

            results = await tool.search("test", max_results=5)

        # The implementation's marker is "[PREPRINT - Not peer-reviewed]",
        # so match on the opening bracket and keyword only.
        assert "[PREPRINT" in results[0].content
        assert results[0].citation.source == "preprint"

    @pytest.mark.asyncio
    async def test_search_empty_results(self, tool):
        """Test handling of empty results."""
        mock_response = {"resultList": {"result": []}}

        with patch("httpx.AsyncClient") as mock_client:
            mock_instance = AsyncMock()
            mock_client.return_value.__aenter__.return_value = mock_instance
            mock_instance.get.return_value.json.return_value = mock_response
            mock_instance.get.return_value.raise_for_status = lambda: None

            results = await tool.search("nonexistent query xyz", max_results=5)

        assert results == []


@pytest.mark.integration
class TestEuropePMCIntegration:
    """Integration tests with the real API."""

    @pytest.mark.asyncio
    async def test_real_api_call(self):
        """Test that the actual API returns relevant results."""
        tool = EuropePMCTool()
        results = await tool.search("long covid treatment", max_results=3)

        assert len(results) > 0
        # At least one result should mention COVID
        titles = " ".join([r.citation.title.lower() for r in results])
        assert "covid" in titles or "sars" in titles
```

### Step 2: Implement Europe PMC Tool

**File:** `src/tools/europepmc.py`

```python
"""Europe PMC search tool - replaces BioRxiv."""

from typing import Any

import httpx
from tenacity import retry, stop_after_attempt, wait_exponential

from src.utils.exceptions import SearchError
from src.utils.models import Citation, Evidence


class EuropePMCTool:
    """
    Search Europe PMC for papers and preprints.

    Europe PMC indexes:
    - PubMed/MEDLINE articles
    - PMC full-text articles
    - Preprints from bioRxiv, medRxiv, ChemRxiv, etc.
    - Patents and clinical guidelines

    API Docs: https://europepmc.org/RestfulWebService
    """

    BASE_URL = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"

    @property
    def name(self) -> str:
        return "europepmc"

    @retry(
        stop=stop_after_attempt(3),
        wait=wait_exponential(multiplier=1, min=1, max=10),
        reraise=True,
    )
    async def search(self, query: str, max_results: int = 10) -> list[Evidence]:
        """
        Search Europe PMC for papers matching the query.

        Args:
            query: Search keywords
            max_results: Maximum results to return

        Returns:
            List of Evidence objects
        """
        params = {
            "query": query,
            "resultType": "core",
            "pageSize": min(max_results, 100),
            "format": "json",
        }

        async with httpx.AsyncClient(timeout=30.0) as client:
            try:
                response = await client.get(self.BASE_URL, params=params)
                response.raise_for_status()

                data = response.json()
                results = data.get("resultList", {}).get("result", [])

                return [self._to_evidence(r) for r in results[:max_results]]

            except httpx.HTTPStatusError as e:
                raise SearchError(f"Europe PMC API error: {e}") from e
            except httpx.RequestError as e:
                raise SearchError(f"Europe PMC connection failed: {e}") from e

    def _to_evidence(self, result: dict[str, Any]) -> Evidence:
        """Convert a Europe PMC result to Evidence."""
        title = result.get("title", "Untitled")
        abstract = result.get("abstractText", "No abstract available.")
        doi = result.get("doi", "")
        pub_year = result.get("pubYear", "Unknown")

        # Get authors
        author_list = result.get("authorList", {}).get("author", [])
        authors = [a.get("fullName", "") for a in author_list[:5] if a.get("fullName")]

        # Check if preprint
        pub_types = result.get("pubTypeList", {}).get("pubType", [])
        is_preprint = "Preprint" in pub_types
        source_db = result.get("source", "europepmc")

        # Build content
        preprint_marker = "[PREPRINT - Not peer-reviewed] " if is_preprint else ""
        content = f"{preprint_marker}{abstract[:1800]}"

        # Build URL
        if doi:
            url = f"https://doi.org/{doi}"
        elif result.get("pmid"):
            url = f"https://pubmed.ncbi.nlm.nih.gov/{result['pmid']}/"
        else:
            url = f"https://europepmc.org/article/{source_db}/{result.get('id', '')}"

        return Evidence(
            content=content[:2000],
            citation=Citation(
                source="preprint" if is_preprint else "europepmc",
                title=title[:500],
                url=url,
                date=str(pub_year),
                authors=authors,
            ),
            relevance=0.75 if is_preprint else 0.9,
        )
```

### Step 3: Update Magentic Tools

**File:** `src/agents/tools.py` - Replace the biorxiv import:

```python
# REMOVE:
# from src.tools.biorxiv import BioRxivTool
# _biorxiv = BioRxivTool()

# ADD:
from src.tools.europepmc import EuropePMCTool

_europepmc = EuropePMCTool()


# UPDATE the search_preprints function:
@ai_function
async def search_preprints(query: str, max_results: int = 10) -> str:
    """Search Europe PMC for preprints and papers.

    Use this tool to find the latest research, including preprints
    from bioRxiv, medRxiv, and peer-reviewed papers.

    Args:
        query: Search terms (e.g., "long covid treatment")
        max_results: Maximum results to return (default 10)

    Returns:
        Formatted list of papers with abstracts and links
    """
    state = get_magentic_state()

    results = await _europepmc.search(query, max_results)
    if not results:
        return f"No papers found for: {query}"

    new_count = state.add_evidence(results)

    output = [f"Found {len(results)} papers ({new_count} new stored):\n"]
    for i, r in enumerate(results[:max_results], 1):
        title = r.citation.title
        date = r.citation.date
        source = r.citation.source
        content_clean = r.content[:300].replace("\n", " ")
        url = r.citation.url

        output.append(f"{i}. **{title}**")
        output.append(f"   Source: {source} | Date: {date}")
        output.append(f"   {content_clean}...")
        output.append(f"   URL: {url}\n")

    return "\n".join(output)
```

### Step 4: Update Search Handler (Simple Mode)

**File:** `src/tools/search_handler.py` - Update imports:

```python
# REMOVE:
# from src.tools.biorxiv import BioRxivTool

# ADD:
from src.tools.europepmc import EuropePMCTool
```

### Step 5: Delete Old BioRxiv Tests

```bash
# After all new tests pass:
rm tests/unit/tools/test_biorxiv.py
```

---

## Verification

```bash
# Run the new tests
uv run pytest tests/unit/tools/test_europepmc.py -v

# Run the integration test (real API)
uv run pytest tests/unit/tools/test_europepmc.py::TestEuropePMCIntegration -v

# Run all tests to ensure no regressions
uv run pytest tests/unit/ -v

# Manual verification
uv run python -c "
import asyncio
from src.tools.europepmc import EuropePMCTool
tool = EuropePMCTool()
results = asyncio.run(tool.search('long covid treatment', 3))
for r in results:
    print(f'- {r.citation.title}')
"
```

---

## Files Changed

| File | Action |
|------|--------|
| `src/tools/europepmc.py` | CREATE |
| `tests/unit/tools/test_europepmc.py` | CREATE |
| `src/agents/tools.py` | MODIFY (replace biorxiv import) |
| `src/tools/search_handler.py` | MODIFY (replace biorxiv import) |
| `src/tools/biorxiv.py` | DELETE (after verification) |
| `tests/unit/tools/test_biorxiv.py` | DELETE (after verification) |

---

## Rollback Plan

If issues arise:
1. Revert `src/agents/tools.py` to use BioRxivTool
2. Revert `src/tools/search_handler.py`
3. Keep `europepmc.py` for future use
docs/bugs/PHASE_02_PUBMED_QUERY_PREPROCESSING.md DELETED
@@ -1,355 +0,0 @@
1
- # Phase 02: PubMed Query Preprocessing
2
-
3
- **Priority:** P0 - Critical
4
- **Effort:** 2-3 hours
5
- **Dependencies:** None (can run parallel with Phase 01)
6
-
7
- ---
8
-
9
- ## Problem Statement
10
-
11
- PubMed receives raw natural language queries like "What medications show promise for Long COVID?" which include question words that pollute search results.
12
-
13
- ## Success Criteria
14
-
15
- - [ ] Question words stripped from queries
16
- - [ ] Medical synonyms expanded (Long COVID → PASC, etc.)
17
- - [ ] Relevant results returned for natural language questions
18
- - [ ] All existing tests pass
19
- - [ ] New tests cover query preprocessing
20
-
21
- ---
22
-
23
- ## TDD Implementation Order
24
-
25
- ### Step 1: Write Failing Tests
26
-
27
- **File:** `tests/unit/tools/test_query_utils.py`
28
-
29
- ```python
30
- """Unit tests for query preprocessing utilities."""
31
-
32
- import pytest
33
-
34
- from src.tools.query_utils import preprocess_query, expand_synonyms, strip_question_words
35
-
36
-
37
- @pytest.mark.unit
38
- class TestQueryPreprocessing:
39
- """Tests for query preprocessing."""
40
-
41
- def test_strip_question_words(self):
42
- """Test removal of question words."""
43
- assert strip_question_words("What drugs treat cancer") == "drugs treat cancer"
44
- assert strip_question_words("Which medications help diabetes") == "medications diabetes"
45
- assert strip_question_words("How can we cure alzheimer") == "cure alzheimer"
46
- assert strip_question_words("Is metformin effective") == "metformin effective"
47
-
48
- def test_strip_preserves_medical_terms(self):
49
- """Test that medical terms are preserved."""
50
- result = strip_question_words("What is the mechanism of metformin")
51
- assert "metformin" in result
52
- assert "mechanism" in result
53
-
54
- def test_expand_synonyms_long_covid(self):
55
- """Test Long COVID synonym expansion."""
56
- result = expand_synonyms("long covid treatment")
57
- assert "PASC" in result or "post-COVID" in result
58
-
59
- def test_expand_synonyms_alzheimer(self):
60
- """Test Alzheimer's synonym expansion."""
61
- result = expand_synonyms("alzheimer drug")
62
- assert "Alzheimer" in result
63
-
64
- def test_expand_synonyms_preserves_unknown(self):
65
- """Test that unknown terms are preserved."""
66
- result = expand_synonyms("metformin diabetes")
67
- assert "metformin" in result
68
- assert "diabetes" in result
69
-
70
- def test_preprocess_query_full_pipeline(self):
71
- """Test complete preprocessing pipeline."""
72
- raw = "What medications show promise for Long COVID?"
73
- result = preprocess_query(raw)
74
-
75
- # Should not contain question words
76
- assert "what" not in result.lower()
77
- assert "show" not in result.lower()
78
- assert "promise" not in result.lower()
79
-
80
- # Should contain expanded terms
81
- assert "PASC" in result or "post-COVID" in result or "long covid" in result.lower()
82
- assert "medications" in result.lower() or "drug" in result.lower()
83
-
84
- def test_preprocess_query_removes_punctuation(self):
85
- """Test that question marks are removed."""
86
- result = preprocess_query("Is metformin safe?")
87
- assert "?" not in result
88
-
89
- def test_preprocess_query_handles_empty(self):
90
- """Test handling of empty/whitespace queries."""
91
- assert preprocess_query("") == ""
92
- assert preprocess_query(" ") == ""
93
-
94
- def test_preprocess_query_already_clean(self):
95
- """Test that clean queries pass through."""
96
- clean = "metformin diabetes mechanism"
97
- result = preprocess_query(clean)
98
- assert "metformin" in result
99
- assert "diabetes" in result
100
- assert "mechanism" in result
101
- ```
102
-
103
- ### Step 2: Implement Query Utils
104
-
105
- **File:** `src/tools/query_utils.py`
106
-
107
- ```python
108
- """Query preprocessing utilities for biomedical search."""
109
-
110
- import re
111
- from typing import ClassVar
112
-
113
- # Question words and filler words to remove
114
- QUESTION_WORDS: set[str] = {
115
- # Question starters
116
- "what", "which", "how", "why", "when", "where", "who", "whom",
117
- # Auxiliary verbs in questions
118
- "is", "are", "was", "were", "do", "does", "did", "can", "could",
119
- "would", "should", "will", "shall", "may", "might",
120
- # Filler words in natural questions
121
- "show", "promise", "help", "believe", "think", "suggest",
122
- "possible", "potential", "effective", "useful", "good",
123
- # Articles (remove but less aggressively)
124
- "the", "a", "an",
125
- }
126
-
127
- # Medical synonym expansions
128
- SYNONYMS: dict[str, list[str]] = {
129
- "long covid": [
130
- "long COVID",
131
- "PASC",
132
- "post-acute sequelae of SARS-CoV-2",
133
- "post-COVID syndrome",
134
- "post-COVID-19 condition",
135
- ],
136
- "alzheimer": [
137
- "Alzheimer's disease",
138
- "Alzheimer disease",
139
- "AD",
140
- "Alzheimer dementia",
141
- ],
142
- "parkinson": [
143
- "Parkinson's disease",
144
- "Parkinson disease",
145
- "PD",
146
- ],
147
- "diabetes": [
148
- "diabetes mellitus",
149
- "type 2 diabetes",
150
- "T2DM",
151
- "diabetic",
152
- ],
153
- "cancer": [
154
- "cancer",
155
- "neoplasm",
156
- "tumor",
157
- "malignancy",
158
- "carcinoma",
159
- ],
160
- "heart disease": [
161
- "cardiovascular disease",
162
- "CVD",
163
- "coronary artery disease",
164
- "heart failure",
165
- ],
166
- }
167
-
168
-
169
- def strip_question_words(query: str) -> str:
170
- """
171
- Remove question words and filler terms from query.
172
-
173
- Args:
174
- query: Raw query string
175
-
176
- Returns:
177
- Query with question words removed
178
- """
179
- words = query.lower().split()
180
- filtered = [w for w in words if w not in QUESTION_WORDS]
181
- return " ".join(filtered)
182
-
183
-
184
- def expand_synonyms(query: str) -> str:
185
- """
186
- Expand medical terms to include synonyms.
187
-
188
- Args:
189
- query: Query string
190
-
191
- Returns:
192
- Query with synonym expansions in OR groups
193
- """
194
- result = query.lower()
195
-
196
- for term, expansions in SYNONYMS.items():
197
- if term in result:
198
- # Create OR group: ("term1" OR "term2" OR "term3")
199
- or_group = " OR ".join([f'"{exp}"' for exp in expansions])
200
- result = result.replace(term, f"({or_group})")
201
-
202
- return result
203
-
204
-
205
- def preprocess_query(raw_query: str) -> str:
206
- """
207
- Full preprocessing pipeline for PubMed queries.
208
-
209
- Pipeline:
210
- 1. Strip whitespace and punctuation
211
- 2. Remove question words
212
- 3. Expand medical synonyms
213
-
214
- Args:
215
- raw_query: Natural language query from user
216
-
217
- Returns:
218
- Optimized query for PubMed
219
- """
220
- if not raw_query or not raw_query.strip():
221
- return ""
222
-
223
- # Remove question marks and extra whitespace
224
- query = raw_query.replace("?", "").strip()
225
- query = re.sub(r"\s+", " ", query)
226
-
227
- # Strip question words
228
- query = strip_question_words(query)
229
-
230
- # Expand synonyms
231
- query = expand_synonyms(query)
232
-
233
- return query.strip()
234
- ```
235
-
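- Before wiring this into PubMed, the two stages can be sanity-checked in isolation; the outputs below follow directly from the word list and synonym table above:
- 
- ```python
- >>> from src.tools.query_utils import strip_question_words, expand_synonyms
- >>> strip_question_words("is metformin effective for cancer treatment")
- 'metformin for cancer treatment'
- >>> expand_synonyms("metformin for cancer treatment")
- 'metformin for ("cancer" OR "neoplasm" OR "tumor" OR "malignancy" OR "carcinoma") treatment'
- ```
- 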
236
- ### Step 3: Update PubMed Tool
237
-
238
- **File:** `src/tools/pubmed.py` - Add preprocessing:
239
-
240
- ```python
241
- # Add import at top:
242
- from src.tools.query_utils import preprocess_query
243
-
244
- # Update search method:
245
- @retry(
246
- stop=stop_after_attempt(3),
247
- wait=wait_exponential(multiplier=1, min=1, max=10),
248
- reraise=True,
249
- )
250
- async def search(self, query: str, max_results: int = 10) -> list[Evidence]:
251
- """
252
- Search PubMed and return evidence.
253
- """
254
- await self._rate_limit()
255
-
256
- # PREPROCESS QUERY
257
- clean_query = preprocess_query(query)
258
- if not clean_query:
259
- clean_query = query # Fallback to original if preprocessing empties it
260
-
261
- async with httpx.AsyncClient(timeout=30.0) as client:
262
- search_params = self._build_params(
263
- db="pubmed",
264
- term=clean_query, # Use preprocessed query
265
- retmax=max_results,
266
- sort="relevance",
267
- )
268
- # ... rest unchanged
269
- ```
270
-
271
- ### Step 4: Update PubMed Tests
272
-
273
- **File:** `tests/unit/tools/test_pubmed.py` - Add preprocessing test:
274
-
275
- ```python
276
- @pytest.mark.asyncio
277
- async def test_search_preprocesses_query(self, pubmed_tool, mock_httpx_client):
278
- """Test that queries are preprocessed before search."""
279
- # This test verifies the integration - the actual preprocessing
280
- # is tested in test_query_utils.py
281
-
282
- mock_httpx_client.get.return_value = httpx.Response(
283
- 200,
284
- json={"esearchresult": {"idlist": []}},
285
- )
286
-
287
- # Natural language query
288
- await pubmed_tool.search("What drugs help with Long COVID?")
289
-
290
- # Verify the call was made (preprocessing happens internally)
291
- assert mock_httpx_client.get.called
292
- ```
293
-
294
- ---
295
-
296
- ## Verification
297
-
298
- ```bash
299
- # Run query utils tests
300
- uv run pytest tests/unit/tools/test_query_utils.py -v
301
-
302
- # Run pubmed tests
303
- uv run pytest tests/unit/tools/test_pubmed.py -v
304
-
305
- # Run all tests
306
- uv run pytest tests/unit/ -v
307
-
308
- # Manual verification
309
- uv run python -c "
310
- from src.tools.query_utils import preprocess_query
311
-
312
- queries = [
313
- 'What medications show promise for Long COVID?',
314
- 'Is metformin effective for cancer treatment?',
315
- 'How can we treat Alzheimer with existing drugs?',
316
- ]
317
-
318
- for q in queries:
319
- print(f'Input: {q}')
320
- print(f'Output: {preprocess_query(q)}')
321
- print()
322
- "
323
- ```
324
-
325
- Expected output:
326
- ```
327
- Input: What medications show promise for Long COVID?
328
- Output: medications for ("long COVID" OR "PASC" OR "post-acute sequelae of SARS-CoV-2" OR "post-COVID syndrome" OR "post-COVID-19 condition")
329
-
330
- Input: Is metformin effective for cancer treatment?
331
- Output: metformin for ("cancer" OR "neoplasm" OR "tumor" OR "malignancy" OR "carcinoma") treatment
332
-
333
- Input: How can we treat Alzheimer with existing drugs?
334
- Output: we treat ("Alzheimer's disease" OR "Alzheimer disease" OR "AD" OR "Alzheimer dementia") with existing drugs
335
- ```
336
-
337
- ---
338
-
339
- ## Files Changed
340
-
341
- | File | Action |
342
- |------|--------|
343
- | `src/tools/query_utils.py` | CREATE |
344
- | `tests/unit/tools/test_query_utils.py` | CREATE |
345
- | `src/tools/pubmed.py` | MODIFY (add preprocessing) |
346
- | `tests/unit/tools/test_pubmed.py` | MODIFY (add integration test) |
347
-
348
- ---
349
-
350
- ## Future Enhancements (Out of Scope)
351
-
352
- - MeSH term lookup via NCBI API (rough sketch below)
353
- - Drug name normalization (brand → generic)
354
- - Disease ontology integration (UMLS)
355
- - Query intent classification
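- 
- Out of scope for this fix, but for orientation: a minimal sketch of the MeSH lookup against NCBI E-utilities. The esearch endpoint and response shape are real; the function name and where it would live are hypothetical:
- 
- ```python
- """Hypothetical sketch only - not part of this fix."""
- 
- import httpx
- 
- ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
- 
- 
- async def lookup_mesh_uids(term: str) -> list[str]:
-     """Return MeSH descriptor UIDs matching a free-text term (hypothetical helper)."""
-     params = {"db": "mesh", "term": term, "retmode": "json"}
-     async with httpx.AsyncClient(timeout=30.0) as client:
-         response = await client.get(ESEARCH_URL, params=params)
-         response.raise_for_status()
-     return response.json().get("esearchresult", {}).get("idlist", [])
- ```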
 
docs/bugs/PHASE_03_CLINICALTRIALS_FILTERING.md DELETED
@@ -1,386 +0,0 @@
1
- # Phase 03: ClinicalTrials.gov Filtering
2
-
3
- **Priority:** P1 - High
4
- **Effort:** 1-2 hours
5
- **Dependencies:** None (can run parallel with Phase 01 & 02)
6
-
7
- ---
8
-
9
- ## Problem Statement
10
-
11
- ClinicalTrials.gov returns ALL matching trials including:
12
- - Withdrawn/Terminated trials (no useful data)
13
- - Observational studies (not drug interventions)
14
- - Phase 1 trials (safety only, no efficacy)
15
-
16
- For drug repurposing, we need interventional studies with efficacy data.
17
-
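- Concretely, the fix adds two request parameters; Step 2 below has the full change. A minimal sketch of the request this plan aims for (filter values are the ones this plan assumes the v2 API accepts):
- 
- ```python
- # Sketch: the two filters this plan adds to the API v2 request.
- params = {
-     "query.term": "long covid treatment",
-     "filter.overallStatus": "COMPLETED|ACTIVE_NOT_RECRUITING|RECRUITING|ENROLLING_BY_INVITATION",
-     "filter.studyType": "INTERVENTIONAL",  # drops observational studies
- }
- ```
- 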
18
- ## Success Criteria
19
-
20
- - [ ] Only interventional studies returned
21
- - [ ] Withdrawn/terminated trials filtered out
22
- - [ ] Phase information included in results
23
- - [ ] All existing tests pass
24
- - [ ] New tests cover filtering
25
-
26
- ---
27
-
28
- ## TDD Implementation Order
29
-
30
- ### Step 1: Write Failing Tests
31
-
32
- **File:** `tests/unit/tools/test_clinicaltrials.py` - Add filter tests:
33
-
34
- ```python
35
- """Unit tests for ClinicalTrials.gov tool."""
36
-
37
- import pytest
38
- from unittest.mock import patch, MagicMock
39
-
40
- from src.tools.clinicaltrials import ClinicalTrialsTool
41
- from src.utils.models import Evidence
42
-
43
-
44
- @pytest.mark.unit
45
- class TestClinicalTrialsTool:
46
- """Tests for ClinicalTrialsTool."""
47
-
48
- @pytest.fixture
49
- def tool(self):
50
- return ClinicalTrialsTool()
51
-
52
- def test_tool_name(self, tool):
53
- assert tool.name == "clinicaltrials"
54
-
55
- @pytest.mark.asyncio
56
- async def test_search_uses_filters(self, tool):
57
- """Test that search applies status and type filters."""
58
- mock_response = MagicMock()
59
- mock_response.json.return_value = {"studies": []}
60
- mock_response.raise_for_status = MagicMock()
61
-
62
- with patch("requests.get", return_value=mock_response) as mock_get:
63
- await tool.search("test query", max_results=5)
64
-
65
- # Verify filters were applied
66
- call_args = mock_get.call_args
67
- params = call_args.kwargs.get("params", {})
68
-
69
- # Should filter for active/completed studies
70
- assert "filter.overallStatus" in params
71
- assert "COMPLETED" in params["filter.overallStatus"]
72
- assert "RECRUITING" in params["filter.overallStatus"]
73
-
74
- # Should filter for interventional studies
75
- assert "filter.studyType" in params
76
- assert "INTERVENTIONAL" in params["filter.studyType"]
77
-
78
- @pytest.mark.asyncio
79
- async def test_search_returns_evidence(self, tool):
80
- """Test that search returns Evidence objects."""
81
- mock_study = {
82
- "protocolSection": {
83
- "identificationModule": {
84
- "nctId": "NCT12345678",
85
- "briefTitle": "Metformin for Long COVID Treatment",
86
- },
87
- "statusModule": {
88
- "overallStatus": "COMPLETED",
89
- "startDateStruct": {"date": "2023-01-01"},
90
- },
91
- "descriptionModule": {
92
- "briefSummary": "A study examining metformin for Long COVID symptoms.",
93
- },
94
- "designModule": {
95
- "phases": ["PHASE2", "PHASE3"],
96
- },
97
- "conditionsModule": {
98
- "conditions": ["Long COVID", "PASC"],
99
- },
100
- "armsInterventionsModule": {
101
- "interventions": [{"name": "Metformin"}],
102
- },
103
- }
104
- }
105
-
106
- mock_response = MagicMock()
107
- mock_response.json.return_value = {"studies": [mock_study]}
108
- mock_response.raise_for_status = MagicMock()
109
-
110
- with patch("requests.get", return_value=mock_response):
111
- results = await tool.search("long covid metformin", max_results=5)
112
-
113
- assert len(results) == 1
114
- assert isinstance(results[0], Evidence)
115
- assert "Metformin" in results[0].citation.title
116
- assert "PHASE2" in results[0].content or "Phase" in results[0].content
117
-
118
- @pytest.mark.asyncio
119
- async def test_search_includes_phase_info(self, tool):
120
- """Test that phase information is included in content."""
121
- mock_study = {
122
- "protocolSection": {
123
- "identificationModule": {
124
- "nctId": "NCT12345678",
125
- "briefTitle": "Test Study",
126
- },
127
- "statusModule": {
128
- "overallStatus": "RECRUITING",
129
- "startDateStruct": {"date": "2024-01-01"},
130
- },
131
- "descriptionModule": {
132
- "briefSummary": "Test summary.",
133
- },
134
- "designModule": {
135
- "phases": ["PHASE3"],
136
- },
137
- "conditionsModule": {"conditions": ["Test"]},
138
- "armsInterventionsModule": {"interventions": []},
139
- }
140
- }
141
-
142
- mock_response = MagicMock()
143
- mock_response.json.return_value = {"studies": [mock_study]}
144
- mock_response.raise_for_status = MagicMock()
145
-
146
- with patch("requests.get", return_value=mock_response):
147
- results = await tool.search("test", max_results=5)
148
-
149
- # Phase should be in content
150
- assert "PHASE3" in results[0].content or "Phase 3" in results[0].content
151
-
152
- @pytest.mark.asyncio
153
- async def test_search_empty_results(self, tool):
154
- """Test handling of empty results."""
155
- mock_response = MagicMock()
156
- mock_response.json.return_value = {"studies": []}
157
- mock_response.raise_for_status = MagicMock()
158
-
159
- with patch("requests.get", return_value=mock_response):
160
- results = await tool.search("nonexistent xyz 12345", max_results=5)
161
- assert results == []
162
-
163
-
164
- @pytest.mark.integration
165
- class TestClinicalTrialsIntegration:
166
- """Integration tests with real API."""
167
-
168
- @pytest.mark.asyncio
169
- async def test_real_api_returns_interventional(self):
170
- """Test that real API returns interventional studies."""
171
- tool = ClinicalTrialsTool()
172
- results = await tool.search("long covid treatment", max_results=3)
173
-
174
- # Should get results
175
- assert len(results) > 0
176
-
177
- # Results should mention interventions or treatments
178
- all_content = " ".join([r.content.lower() for r in results])
179
- has_intervention = (
180
- "intervention" in all_content
181
- or "treatment" in all_content
182
- or "drug" in all_content
183
- or "phase" in all_content
184
- )
185
- assert has_intervention
186
- ```
187
-
188
- ### Step 2: Update ClinicalTrials Tool
189
-
190
- **File:** `src/tools/clinicaltrials.py` - Add filters:
191
-
192
- ```python
193
- """ClinicalTrials.gov search tool using API v2."""
194
-
195
- import asyncio
196
- from typing import Any, ClassVar
197
-
198
- import requests
199
- from tenacity import retry, stop_after_attempt, wait_exponential
200
-
201
- from src.utils.exceptions import SearchError
202
- from src.utils.models import Citation, Evidence
203
-
204
-
205
- class ClinicalTrialsTool:
206
- """Search tool for ClinicalTrials.gov.
207
-
208
- Note: Uses `requests` library instead of `httpx` because ClinicalTrials.gov's
209
- WAF blocks httpx's TLS fingerprint. The `requests` library is not blocked.
210
- See: https://clinicaltrials.gov/data-api/api
211
- """
212
-
213
- BASE_URL = "https://clinicaltrials.gov/api/v2/studies"
214
-
215
- # Fields to retrieve
216
- FIELDS: ClassVar[list[str]] = [
217
- "NCTId",
218
- "BriefTitle",
219
- "Phase",
220
- "OverallStatus",
221
- "Condition",
222
- "InterventionName",
223
- "StartDate",
224
- "BriefSummary",
225
- ]
226
-
227
- # Status filter: Only active/completed studies with potential data
228
- STATUS_FILTER = "COMPLETED|ACTIVE_NOT_RECRUITING|RECRUITING|ENROLLING_BY_INVITATION"
229
-
230
- # Study type filter: Only interventional (drug/treatment studies)
231
- STUDY_TYPE_FILTER = "INTERVENTIONAL"
232
-
233
- @property
234
- def name(self) -> str:
235
- return "clinicaltrials"
236
-
237
- @retry(
238
- stop=stop_after_attempt(3),
239
- wait=wait_exponential(multiplier=1, min=1, max=10),
240
- reraise=True,
241
- )
242
- async def search(self, query: str, max_results: int = 10) -> list[Evidence]:
243
- """Search ClinicalTrials.gov for interventional studies.
244
-
245
- Args:
246
- query: Search query (e.g., "metformin alzheimer")
247
- max_results: Maximum results to return (max 100)
248
-
249
- Returns:
250
- List of Evidence objects from clinical trials
251
- """
252
- params: dict[str, str | int] = {
253
- "query.term": query,
254
- "pageSize": min(max_results, 100),
255
- "fields": "|".join(self.FIELDS),
256
- # FILTERS - Only interventional, active/completed studies
257
- "filter.overallStatus": self.STATUS_FILTER,
258
- "filter.studyType": self.STUDY_TYPE_FILTER,
259
- }
260
-
261
- try:
262
- # Run blocking requests.get in a separate thread for async compatibility
263
- response = await asyncio.to_thread(
264
- requests.get,
265
- self.BASE_URL,
266
- params=params,
267
- headers={"User-Agent": "DeepCritical-Research-Agent/1.0"},
268
- timeout=30,
269
- )
270
- response.raise_for_status()
271
-
272
- data = response.json()
273
- studies = data.get("studies", [])
274
- return [self._study_to_evidence(study) for study in studies[:max_results]]
275
-
276
- except requests.HTTPError as e:
277
- raise SearchError(f"ClinicalTrials.gov API error: {e}") from e
278
- except requests.RequestException as e:
279
- raise SearchError(f"ClinicalTrials.gov request failed: {e}") from e
280
-
281
- def _study_to_evidence(self, study: dict[str, Any]) -> Evidence:
282
- """Convert a clinical trial study to Evidence."""
283
- # Navigate nested structure
284
- protocol = study.get("protocolSection", {})
285
- id_module = protocol.get("identificationModule", {})
286
- status_module = protocol.get("statusModule", {})
287
- desc_module = protocol.get("descriptionModule", {})
288
- design_module = protocol.get("designModule", {})
289
- conditions_module = protocol.get("conditionsModule", {})
290
- arms_module = protocol.get("armsInterventionsModule", {})
291
-
292
- nct_id = id_module.get("nctId", "Unknown")
293
- title = id_module.get("briefTitle", "Untitled Study")
294
- status = status_module.get("overallStatus", "Unknown")
295
- start_date = status_module.get("startDateStruct", {}).get("date", "Unknown")
296
-
297
- # Phases come back as a list (e.g., ["PHASE2", "PHASE3"]); keep them all
298
- phases = design_module.get("phases", [])
299
- phase = "/".join(phases) if phases else "Not Applicable"
300
-
301
- # Get conditions
302
- conditions = conditions_module.get("conditions", [])
303
- conditions_str = ", ".join(conditions[:3]) if conditions else "Unknown"
304
-
305
- # Get interventions
306
- interventions = arms_module.get("interventions", [])
307
- intervention_names = [i.get("name", "") for i in interventions[:3]]
308
- interventions_str = ", ".join(intervention_names) if intervention_names else "Unknown"
309
-
310
- # Get summary
311
- summary = desc_module.get("briefSummary", "No summary available.")
312
-
313
- # Build content with key trial info
314
- content = (
315
- f"{summary[:500]}... "
316
- f"Trial Phase: {phase}. "
317
- f"Status: {status}. "
318
- f"Conditions: {conditions_str}. "
319
- f"Interventions: {interventions_str}."
320
- )
321
-
322
- return Evidence(
323
- content=content[:2000],
324
- citation=Citation(
325
- source="clinicaltrials",
326
- title=title[:500],
327
- url=f"https://clinicaltrials.gov/study/{nct_id}",
328
- date=start_date,
329
- authors=[], # Trials don't have traditional authors
330
- ),
331
- relevance=0.85, # Trials are highly relevant for repurposing
332
- )
333
- ```
334
-
335
- ---
336
-
337
- ## Verification
338
-
339
- ```bash
340
- # Run clinicaltrials tests
341
- uv run pytest tests/unit/tools/test_clinicaltrials.py -v
342
-
343
- # Run integration test (real API)
344
- uv run pytest tests/unit/tools/test_clinicaltrials.py::TestClinicalTrialsIntegration -v
345
-
346
- # Run all tests
347
- uv run pytest tests/unit/ -v
348
-
349
- # Manual verification
350
- uv run python -c "
351
- import asyncio
352
- from src.tools.clinicaltrials import ClinicalTrialsTool
353
-
354
- tool = ClinicalTrialsTool()
355
- results = asyncio.run(tool.search('long covid treatment', 3))
356
-
357
- for r in results:
358
- print(f'Title: {r.citation.title}')
359
- print(f'Content: {r.content[:200]}...')
360
- print()
361
- "
362
- ```
363
-
364
- ---
365
-
366
- ## Files Changed
367
-
368
- | File | Action |
369
- |------|--------|
370
- | `src/tools/clinicaltrials.py` | MODIFY (add filters) |
371
- | `tests/unit/tools/test_clinicaltrials.py` | MODIFY (add filter tests) |
372
-
373
- ---
374
-
375
- ## API Filter Reference
376
-
377
- ClinicalTrials.gov API v2 supports these filters:
378
-
379
- | Parameter | Values | Purpose |
380
- |-----------|--------|---------|
381
- | `filter.overallStatus` | COMPLETED, RECRUITING, etc. | Trial status |
382
- | `filter.studyType` | INTERVENTIONAL, OBSERVATIONAL | Study design |
383
- | `filter.phase` | PHASE1, PHASE2, PHASE3, PHASE4 | Trial phase |
384
- | `filter.geo` | Country codes | Geographic filter |
385
-
386
- See: https://clinicaltrials.gov/data-api/api
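- 
- The tool above uses only the first two filters. If Phase 1 noise ever needs to be excluded server-side, the table suggests a single extra parameter would do it. This is a hypothetical extension, and it assumes `filter.phase` takes pipe-separated values like the other filters:
- 
- ```python
- # Hypothetical extension: keep only efficacy-stage trials.
- params: dict[str, str | int] = {
-     "query.term": "long covid treatment",
-     "filter.overallStatus": "COMPLETED|RECRUITING",
-     "filter.studyType": "INTERVENTIONAL",
-     "filter.phase": "PHASE2|PHASE3|PHASE4",  # assumption: same separator convention
- }
- ```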