# Phase 2 Implementation Spec: Search Vertical Slice

**Goal**: Implement the "Eyes and Ears" of the agent — retrieving real biomedical data.

**Philosophy**: "Real data, mocked connections."

**Prerequisite**: Phase 1 complete (all tests passing)

---

## 1. The Slice Definition

This slice covers:

1. **Input**: A string query (e.g., "metformin Alzheimer's disease").
2. **Process**:
   - Fetch from PubMed (E-utilities API).
   - Fetch from the web (DuckDuckGo).
   - Normalize results into `Evidence` models.
3. **Output**: A list of `Evidence` objects.

**Files to Create**:

- `src/utils/models.py` - Pydantic models (Evidence, Citation, SearchResult)
- `src/tools/pubmed.py` - PubMed E-utilities tool
- `src/tools/websearch.py` - DuckDuckGo search tool
- `src/tools/search_handler.py` - Orchestrates multiple tools
- `src/tools/__init__.py` - Exports

---
## 2. PubMed E-utilities API Reference

**Base URL**: `https://eutils.ncbi.nlm.nih.gov/entrez/eutils/`

### Key Endpoints

| Endpoint | Purpose | Example |
|----------|---------|---------|
| `esearch.fcgi` | Search for article IDs | `?db=pubmed&term=metformin+alzheimer&retmax=10` |
| `efetch.fcgi` | Fetch article details | `?db=pubmed&id=12345,67890&rettype=abstract&retmode=xml` |

### Rate Limiting (CRITICAL!)

NCBI **requires** rate limiting:

- **Without API key**: 3 requests/second
- **With API key**: 10 requests/second

Get a free API key: https://www.ncbi.nlm.nih.gov/account/settings/
```bash
# Add to .env
NCBI_API_KEY=your-key-here  # Optional but recommended
```
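Those limits translate directly into a minimum delay between requests (3 req/s → wait ~0.34 s; 10 req/s → wait ~0.1 s). A minimal sketch of how a client might derive that delay; `ncbi_request_delay` is a hypothetical helper, not part of the spec:

```python
import os

def ncbi_request_delay(api_key: str | None = None) -> float:
    """Pick the inter-request delay from NCBI's documented rate limits."""
    key = api_key or os.environ.get("NCBI_API_KEY")
    return 0.1 if key else 0.34  # rounded conservatively from 1/10 and 1/3 sec
```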
### Example Search Flow

```
1. esearch: "metformin alzheimer" → [PMID: 12345, 67890, ...]
2. efetch: PMIDs → Full abstracts/metadata
3. Parse XML → Evidence objects
```
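Before the full tool in Section 4, here is a minimal sketch of that flow with plain `httpx` (no rate limiting, retries, or error handling; `quick_flow` is illustrative only):

```python
import asyncio
import httpx

BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

async def quick_flow(query: str) -> str:
    async with httpx.AsyncClient(timeout=30.0) as client:
        # 1. esearch: query -> PMIDs (JSON)
        resp = await client.get(
            f"{BASE}/esearch.fcgi",
            params={"db": "pubmed", "term": query, "retmax": 3, "retmode": "json"},
        )
        pmids = resp.json().get("esearchresult", {}).get("idlist", [])
        if not pmids:
            return ""
        # 2. efetch: PMIDs -> abstracts/metadata (XML)
        resp = await client.get(
            f"{BASE}/efetch.fcgi",
            params={"db": "pubmed", "id": ",".join(pmids),
                    "rettype": "abstract", "retmode": "xml"},
        )
        return resp.text  # 3. Parsing XML into Evidence is PubMedTool's job (Section 4)

# print(asyncio.run(quick_flow("metformin alzheimer"))[:500])
```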
---

## 3. Models (`src/utils/models.py`)

```python
"""Data models for the Search feature."""
from typing import Literal

from pydantic import BaseModel, ConfigDict, Field


class Citation(BaseModel):
    """A citation to a source document."""

    source: Literal["pubmed", "web"] = Field(description="Where this came from")
    title: str = Field(min_length=1, max_length=500)
    url: str = Field(description="URL to the source")
    date: str = Field(description="Publication date (YYYY-MM-DD or 'Unknown')")
    authors: list[str] = Field(default_factory=list)

    @property
    def formatted(self) -> str:
        """Format as a citation string."""
        author_str = ", ".join(self.authors[:3])
        if len(self.authors) > 3:
            author_str += " et al."
        return f"{author_str} ({self.date}). {self.title}. {self.source.upper()}"


class Evidence(BaseModel):
    """A piece of evidence retrieved from search."""

    # Immutable after creation (Pydantic v2 style; on v1 use `class Config: frozen = True`)
    model_config = ConfigDict(frozen=True)

    content: str = Field(min_length=1, description="The actual text content")
    citation: Citation
    relevance: float = Field(default=0.0, ge=0.0, le=1.0, description="Relevance score 0-1")


class SearchResult(BaseModel):
    """Result of a search operation."""

    query: str
    evidence: list[Evidence]
    # Plain strings rather than Literal["pubmed", "web"]: the handler records
    # tool.name, and the unit tests use mock tools named "mock1", "mock2", etc.
    sources_searched: list[str]
    total_found: int
    errors: list[str] = Field(default_factory=list)
```
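A quick usage sketch of these models (values are illustrative):

```python
from src.utils.models import Citation, Evidence

cite = Citation(
    source="pubmed",
    title="Metformin in Alzheimer's Disease",
    url="https://pubmed.ncbi.nlm.nih.gov/12345678/",
    date="2024-01-01",
    authors=["Smith John", "Doe Jane"],
)
ev = Evidence(content="Metformin shows neuroprotective properties...", citation=cite)

print(cite.formatted)
# Smith John, Doe Jane (2024-01-01). Metformin in Alzheimer's Disease. PUBMED

# Evidence is frozen: `ev.content = "x"` raises a validation error.
```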
---

## 4. Tool Protocol (`src/tools/pubmed.py` and `src/tools/websearch.py`)

### The Interface (Protocol) - Add to `src/tools/__init__.py`

```python
"""Search tools package."""
from typing import List, Protocol

from src.utils.models import Evidence

# Import implementations
from src.tools.pubmed import PubMedTool
from src.tools.websearch import WebTool
from src.tools.search_handler import SearchHandler

# Re-export
__all__ = ["SearchTool", "PubMedTool", "WebTool", "SearchHandler"]


class SearchTool(Protocol):
    """Protocol defining the interface for all search tools."""

    @property
    def name(self) -> str:
        """Human-readable name of this tool."""
        ...

    async def search(self, query: str, max_results: int = 10) -> List[Evidence]:
        """
        Execute a search and return evidence.

        Args:
            query: The search query string
            max_results: Maximum number of results to return

        Returns:
            List of Evidence objects

        Raises:
            SearchError: If the search fails
            RateLimitError: If we hit rate limits
        """
        ...
```
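Because `SearchTool` is a `Protocol`, tools conform structurally: no subclassing needed, anything exposing a matching `name` property and `search` coroutine type-checks. A minimal illustration (`FakeTool` is hypothetical, handy as a test stub):

```python
from typing import List

from src.utils.models import Citation, Evidence


class FakeTool:
    """Hypothetical stand-in that satisfies SearchTool without inheriting from it."""

    @property
    def name(self) -> str:
        return "fake"

    async def search(self, query: str, max_results: int = 10) -> List[Evidence]:
        # Return one canned result regardless of the query.
        return [
            Evidence(
                content=f"Canned result for {query!r}",
                citation=Citation(
                    source="web", title="Stub", url="https://example.com", date="Unknown"
                ),
            )
        ]
```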
### PubMed Tool Implementation (`src/tools/pubmed.py`)

```python
"""PubMed search tool using NCBI E-utilities."""
import asyncio
import time
from typing import List

import httpx
import xmltodict
from tenacity import retry, stop_after_attempt, wait_exponential

from src.utils.config import settings
from src.utils.exceptions import SearchError, RateLimitError
from src.utils.models import Evidence, Citation


class PubMedTool:
    """Search tool for PubMed/NCBI."""

    BASE_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"
    RATE_LIMIT_DELAY = 0.34  # ~3 requests/sec; conservative, safe with or without a key

    def __init__(self, api_key: str | None = None):
        self.api_key = api_key or getattr(settings, "ncbi_api_key", None)
        self._last_request_time = 0.0

    @property
    def name(self) -> str:
        return "pubmed"

    async def _rate_limit(self) -> None:
        """Enforce NCBI rate limiting."""
        now = time.monotonic()
        elapsed = now - self._last_request_time
        if elapsed < self.RATE_LIMIT_DELAY:
            await asyncio.sleep(self.RATE_LIMIT_DELAY - elapsed)
        self._last_request_time = time.monotonic()

    def _build_params(self, **kwargs) -> dict:
        """Build request params with optional API key."""
        params = {**kwargs, "retmode": "json"}
        if self.api_key:
            params["api_key"] = self.api_key
        return params

    @retry(
        stop=stop_after_attempt(3),
        wait=wait_exponential(multiplier=1, min=1, max=10),
        reraise=True,
    )
    async def search(self, query: str, max_results: int = 10) -> List[Evidence]:
        """
        Search PubMed and return evidence.

        1. ESearch: Get PMIDs matching query
        2. EFetch: Get abstracts for those PMIDs
        3. Parse and return Evidence objects
        """
        await self._rate_limit()
        async with httpx.AsyncClient(timeout=30.0) as client:
            # Step 1: Search for PMIDs
            search_params = self._build_params(
                db="pubmed",
                term=query,
                retmax=max_results,
                sort="relevance",
            )
            try:
                search_resp = await client.get(
                    f"{self.BASE_URL}/esearch.fcgi",
                    params=search_params,
                )
                search_resp.raise_for_status()
            except httpx.HTTPStatusError as e:
                if e.response.status_code == 429:
                    raise RateLimitError("PubMed rate limit exceeded") from e
                raise SearchError(f"PubMed search failed: {e}") from e

            search_data = search_resp.json()
            pmids = search_data.get("esearchresult", {}).get("idlist", [])
            if not pmids:
                return []

            # Step 2: Fetch abstracts
            await self._rate_limit()
            fetch_params = self._build_params(
                db="pubmed",
                id=",".join(pmids),
                rettype="abstract",
            )
            # Use XML for fetch (more reliable parsing)
            fetch_params["retmode"] = "xml"
            fetch_resp = await client.get(
                f"{self.BASE_URL}/efetch.fcgi",
                params=fetch_params,
            )
            fetch_resp.raise_for_status()

            # Step 3: Parse XML to Evidence
            return self._parse_pubmed_xml(fetch_resp.text)

    def _parse_pubmed_xml(self, xml_text: str) -> List[Evidence]:
        """Parse PubMed XML into Evidence objects."""
        try:
            data = xmltodict.parse(xml_text)
        except Exception as e:
            raise SearchError(f"Failed to parse PubMed XML: {e}") from e

        articles = data.get("PubmedArticleSet", {}).get("PubmedArticle", [])
        # Handle single article (xmltodict returns dict instead of list)
        if isinstance(articles, dict):
            articles = [articles]

        evidence_list = []
        for article in articles:
            try:
                evidence = self._article_to_evidence(article)
                if evidence:
                    evidence_list.append(evidence)
            except Exception:
                continue  # Skip malformed articles
        return evidence_list

    def _article_to_evidence(self, article: dict) -> Evidence | None:
        """Convert a single PubMed article to Evidence."""
        medline = article.get("MedlineCitation", {})
        article_data = medline.get("Article", {})

        # Extract PMID
        pmid = medline.get("PMID", {})
        if isinstance(pmid, dict):
            pmid = pmid.get("#text", "")

        # Extract title
        title = article_data.get("ArticleTitle", "")
        if isinstance(title, dict):
            title = title.get("#text", str(title))

        # Extract abstract
        abstract_data = article_data.get("Abstract", {}).get("AbstractText", "")
        if isinstance(abstract_data, list):
            abstract = " ".join(
                item.get("#text", str(item)) if isinstance(item, dict) else str(item)
                for item in abstract_data
            )
        elif isinstance(abstract_data, dict):
            abstract = abstract_data.get("#text", str(abstract_data))
        else:
            abstract = str(abstract_data)

        if not abstract or not title:
            return None

        # Extract date (note: PubDate Month may be a name like "Jan" rather than "01")
        pub_date = article_data.get("Journal", {}).get("JournalIssue", {}).get("PubDate", {})
        year = pub_date.get("Year", "Unknown")
        month = pub_date.get("Month", "01")
        day = pub_date.get("Day", "01")
        date_str = f"{year}-{month}-{day}" if year != "Unknown" else "Unknown"

        # Extract authors
        author_list = article_data.get("AuthorList", {}).get("Author", [])
        if isinstance(author_list, dict):
            author_list = [author_list]
        authors = []
        for author in author_list[:5]:  # Limit to 5 authors
            last = author.get("LastName", "")
            first = author.get("ForeName", "")
            if last:
                authors.append(f"{last} {first}".strip())

        return Evidence(
            content=abstract[:2000],  # Truncate long abstracts
            citation=Citation(
                source="pubmed",
                title=title[:500],
                url=f"https://pubmed.ncbi.nlm.nih.gov/{pmid}/",
                date=date_str,
                authors=authors,
            ),
        )
```
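Once implemented, a quick standalone smoke check of the tool (hits the live API, so run it manually rather than in CI; `main` is just a throwaway sketch):

```python
import asyncio

from src.tools.pubmed import PubMedTool

async def main() -> None:
    tool = PubMedTool()
    results = await tool.search("metformin alzheimer", max_results=3)
    for ev in results:
        print(ev.citation.formatted)

# asyncio.run(main())
```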
### DuckDuckGo Tool Implementation (`src/tools/websearch.py`)

```python
"""Web search tool using DuckDuckGo."""
import asyncio
from typing import List

from duckduckgo_search import DDGS

from src.utils.exceptions import SearchError
from src.utils.models import Evidence, Citation


class WebTool:
    """Search tool for general web search via DuckDuckGo."""

    @property
    def name(self) -> str:
        return "web"

    async def search(self, query: str, max_results: int = 10) -> List[Evidence]:
        """
        Search DuckDuckGo and return evidence.

        Note: duckduckgo-search is synchronous, so we run it in an executor
        to avoid blocking the event loop.
        """
        loop = asyncio.get_running_loop()
        try:
            return await loop.run_in_executor(
                None,
                lambda: self._sync_search(query, max_results),
            )
        except Exception as e:
            raise SearchError(f"Web search failed: {e}") from e

    def _sync_search(self, query: str, max_results: int) -> List[Evidence]:
        """Synchronous search implementation."""
        evidence_list = []
        with DDGS() as ddgs:
            results = list(ddgs.text(query, max_results=max_results))
        for result in results:
            evidence_list.append(
                Evidence(
                    content=result.get("body", "")[:1000],
                    citation=Citation(
                        source="web",
                        title=result.get("title", "Unknown")[:500],
                        url=result.get("href", ""),
                        date="Unknown",
                        authors=[],
                    ),
                )
            )
        return evidence_list
```
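Design note: `loop.run_in_executor(None, ...)` keeps the event loop responsive while the synchronous DDGS call runs on a worker thread. On Python 3.9+, `asyncio.to_thread` expresses the same hop more directly; a self-contained sketch with a stand-in for the blocking call:

```python
import asyncio
import time

def blocking_lookup(query: str) -> str:
    time.sleep(0.1)  # stand-in for the synchronous DDGS call
    return f"results for {query}"

async def main() -> None:
    # Equivalent to loop.run_in_executor(None, ...), with less ceremony.
    print(await asyncio.to_thread(blocking_lookup, "drug repurposing"))

asyncio.run(main())
```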
---

## 5. Search Handler (`src/tools/search_handler.py`)

The handler orchestrates multiple tools using the **Scatter-Gather** pattern.

```python
"""Search handler - orchestrates multiple search tools."""
import asyncio
from typing import List, Protocol

import structlog

from src.utils.exceptions import SearchError
from src.utils.models import Evidence, SearchResult

logger = structlog.get_logger()


class SearchTool(Protocol):
    """Protocol for search tools.

    Structurally identical to the one in src/tools/__init__.py; redefined here
    so this module doesn't import from the package __init__, which would
    create a circular import.
    """

    @property
    def name(self) -> str:
        ...

    async def search(self, query: str, max_results: int = 10) -> List[Evidence]:
        ...


class SearchHandler:
    """Orchestrates parallel searches across multiple tools."""

    def __init__(self, tools: List[SearchTool], timeout: float = 30.0):
        """
        Initialize the search handler.

        Args:
            tools: List of search tools to use
            timeout: Timeout for each search in seconds
        """
        self.tools = tools
        self.timeout = timeout

    async def execute(self, query: str, max_results_per_tool: int = 10) -> SearchResult:
        """
        Execute search across all tools in parallel.

        Args:
            query: The search query
            max_results_per_tool: Max results from each tool

        Returns:
            SearchResult containing all evidence and metadata
        """
        logger.info("Starting search", query=query, tools=[t.name for t in self.tools])

        # Scatter: one task per tool, executed in parallel
        tasks = [
            self._search_with_timeout(tool, query, max_results_per_tool)
            for tool in self.tools
        ]

        # Gather results (don't fail if one tool fails)
        results = await asyncio.gather(*tasks, return_exceptions=True)

        # Process results
        all_evidence: List[Evidence] = []
        sources_searched: List[str] = []
        errors: List[str] = []

        for tool, result in zip(self.tools, results):
            if isinstance(result, Exception):
                errors.append(f"{tool.name}: {result}")
                logger.warning("Search tool failed", tool=tool.name, error=str(result))
            else:
                all_evidence.extend(result)
                sources_searched.append(tool.name)
                logger.info("Search tool succeeded", tool=tool.name, count=len(result))

        return SearchResult(
            query=query,
            evidence=all_evidence,
            sources_searched=sources_searched,
            total_found=len(all_evidence),
            errors=errors,
        )

    async def _search_with_timeout(
        self,
        tool: SearchTool,
        query: str,
        max_results: int,
    ) -> List[Evidence]:
        """Execute a single tool search with timeout."""
        try:
            return await asyncio.wait_for(
                tool.search(query, max_results),
                timeout=self.timeout,
            )
        except asyncio.TimeoutError as e:
            raise SearchError(f"{tool.name} search timed out after {self.timeout}s") from e
```
---

## 6. TDD Workflow

### Test File: `tests/unit/tools/test_pubmed.py`

```python
"""Unit tests for PubMed tool."""
import pytest
from unittest.mock import AsyncMock, MagicMock

# Sample PubMed XML response for mocking
SAMPLE_PUBMED_XML = """<?xml version="1.0" ?>
<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>12345678</PMID>
      <Article>
        <ArticleTitle>Metformin in Alzheimer's Disease: A Systematic Review</ArticleTitle>
        <Abstract>
          <AbstractText>Metformin shows neuroprotective properties...</AbstractText>
        </Abstract>
        <AuthorList>
          <Author>
            <LastName>Smith</LastName>
            <ForeName>John</ForeName>
          </Author>
        </AuthorList>
        <Journal>
          <JournalIssue>
            <PubDate>
              <Year>2024</Year>
              <Month>01</Month>
            </PubDate>
          </JournalIssue>
        </Journal>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>
"""


class TestPubMedTool:
    """Tests for PubMedTool."""

    @pytest.mark.asyncio
    async def test_search_returns_evidence(self, mocker):
        """PubMedTool should return Evidence objects from search."""
        from src.tools.pubmed import PubMedTool

        # Mock the HTTP responses
        mock_search_response = MagicMock()
        mock_search_response.json.return_value = {
            "esearchresult": {"idlist": ["12345678"]}
        }
        mock_search_response.raise_for_status = MagicMock()

        mock_fetch_response = MagicMock()
        mock_fetch_response.text = SAMPLE_PUBMED_XML
        mock_fetch_response.raise_for_status = MagicMock()

        mock_client = AsyncMock()
        mock_client.get = AsyncMock(side_effect=[mock_search_response, mock_fetch_response])
        mock_client.__aenter__ = AsyncMock(return_value=mock_client)
        mock_client.__aexit__ = AsyncMock(return_value=None)
        mocker.patch("httpx.AsyncClient", return_value=mock_client)

        # Act
        tool = PubMedTool()
        results = await tool.search("metformin alzheimer")

        # Assert
        assert len(results) == 1
        assert results[0].citation.source == "pubmed"
        assert "Metformin" in results[0].citation.title
        assert "12345678" in results[0].citation.url

    @pytest.mark.asyncio
    async def test_search_empty_results(self, mocker):
        """PubMedTool should return empty list when no results."""
        from src.tools.pubmed import PubMedTool

        mock_response = MagicMock()
        mock_response.json.return_value = {"esearchresult": {"idlist": []}}
        mock_response.raise_for_status = MagicMock()

        mock_client = AsyncMock()
        mock_client.get = AsyncMock(return_value=mock_response)
        mock_client.__aenter__ = AsyncMock(return_value=mock_client)
        mock_client.__aexit__ = AsyncMock(return_value=None)
        mocker.patch("httpx.AsyncClient", return_value=mock_client)

        tool = PubMedTool()
        results = await tool.search("xyznonexistentquery123")

        assert results == []

    def test_parse_pubmed_xml(self):
        """PubMedTool should correctly parse XML."""
        from src.tools.pubmed import PubMedTool

        tool = PubMedTool()
        results = tool._parse_pubmed_xml(SAMPLE_PUBMED_XML)

        assert len(results) == 1
        assert results[0].citation.source == "pubmed"
        assert "Smith John" in results[0].citation.authors
```
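These tests rely on the `pytest-asyncio` plugin (for `@pytest.mark.asyncio`) and `pytest-mock` (for the `mocker` fixture), assumed installed in Phase 1. If the httpx mocking boilerplate grows repetitive, one option is a shared fixture; a sketch for a hypothetical `tests/unit/tools/conftest.py`:

```python
"""Shared fixtures for tool tests (sketch)."""
import pytest
from unittest.mock import AsyncMock


@pytest.fixture
def mock_async_client(mocker):
    """Patch httpx.AsyncClient and return the mock so tests can queue responses."""
    client = AsyncMock()
    client.__aenter__ = AsyncMock(return_value=client)
    client.__aexit__ = AsyncMock(return_value=None)
    mocker.patch("httpx.AsyncClient", return_value=client)
    return client
```

A test would then declare `mock_async_client` as a parameter and queue responses via `mock_async_client.get = AsyncMock(side_effect=[...])`.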
### Test File: `tests/unit/tools/test_websearch.py`

```python
"""Unit tests for WebTool."""
import pytest
from unittest.mock import MagicMock


class TestWebTool:
    """Tests for WebTool."""

    @pytest.mark.asyncio
    async def test_search_returns_evidence(self, mocker):
        """WebTool should return Evidence objects from search."""
        from src.tools.websearch import WebTool

        mock_results = [
            {
                "title": "Drug Repurposing Article",
                "href": "https://example.com/article",
                "body": "Some content about drug repurposing...",
            }
        ]
        mock_ddgs = MagicMock()
        mock_ddgs.__enter__ = MagicMock(return_value=mock_ddgs)
        mock_ddgs.__exit__ = MagicMock(return_value=None)
        mock_ddgs.text = MagicMock(return_value=mock_results)
        mocker.patch("src.tools.websearch.DDGS", return_value=mock_ddgs)

        tool = WebTool()
        results = await tool.search("drug repurposing")

        assert len(results) == 1
        assert results[0].citation.source == "web"
        assert "Drug Repurposing" in results[0].citation.title
```
### Test File: `tests/unit/tools/test_search_handler.py`

```python
"""Unit tests for SearchHandler."""
import pytest
from unittest.mock import AsyncMock

from src.utils.models import Evidence, Citation
from src.utils.exceptions import SearchError


class TestSearchHandler:
    """Tests for SearchHandler."""

    @pytest.mark.asyncio
    async def test_execute_aggregates_results(self):
        """SearchHandler should aggregate results from all tools."""
        from src.tools.search_handler import SearchHandler

        # Create mock tools
        mock_tool_1 = AsyncMock()
        mock_tool_1.name = "mock1"
        mock_tool_1.search = AsyncMock(return_value=[
            Evidence(
                content="Result 1",
                citation=Citation(source="pubmed", title="T1", url="u1", date="2024"),
            )
        ])

        mock_tool_2 = AsyncMock()
        mock_tool_2.name = "mock2"
        mock_tool_2.search = AsyncMock(return_value=[
            Evidence(
                content="Result 2",
                citation=Citation(source="web", title="T2", url="u2", date="2024"),
            )
        ])

        handler = SearchHandler(tools=[mock_tool_1, mock_tool_2])
        result = await handler.execute("test query")

        assert result.total_found == 2
        assert "mock1" in result.sources_searched
        assert "mock2" in result.sources_searched
        assert len(result.errors) == 0

    @pytest.mark.asyncio
    async def test_execute_handles_tool_failure(self):
        """SearchHandler should continue if one tool fails."""
        from src.tools.search_handler import SearchHandler

        mock_tool_ok = AsyncMock()
        mock_tool_ok.name = "ok_tool"
        mock_tool_ok.search = AsyncMock(return_value=[
            Evidence(
                content="Good result",
                citation=Citation(source="pubmed", title="T", url="u", date="2024"),
            )
        ])

        mock_tool_fail = AsyncMock()
        mock_tool_fail.name = "fail_tool"
        mock_tool_fail.search = AsyncMock(side_effect=SearchError("API down"))

        handler = SearchHandler(tools=[mock_tool_ok, mock_tool_fail])
        result = await handler.execute("test")

        assert result.total_found == 1
        assert "ok_tool" in result.sources_searched
        assert len(result.errors) == 1
        assert "fail_tool" in result.errors[0]
```
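One path these tests don't cover is the per-tool timeout in `_search_with_timeout`. A sketch of such a test, using a deliberately slow mock tool and a tiny timeout:

```python
import asyncio
import pytest
from unittest.mock import AsyncMock

from src.tools.search_handler import SearchHandler


@pytest.mark.asyncio
async def test_execute_reports_timeout():
    """A tool that exceeds the timeout should surface as an error, not a hang."""
    slow_tool = AsyncMock()
    slow_tool.name = "slow_tool"

    async def never_returns(query, max_results=10):
        await asyncio.sleep(10)  # far longer than the handler's timeout

    slow_tool.search = never_returns

    handler = SearchHandler(tools=[slow_tool], timeout=0.05)
    result = await handler.execute("test")

    assert result.total_found == 0
    assert "timed out" in result.errors[0]
```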
---

## 7. Integration Test (Optional, Real API)

```python
# tests/integration/test_pubmed_live.py
"""Integration tests that hit real APIs (run manually)."""
import pytest


@pytest.mark.integration
@pytest.mark.slow
@pytest.mark.asyncio
async def test_pubmed_live_search():
    """Test real PubMed search (requires network)."""
    from src.tools.pubmed import PubMedTool

    tool = PubMedTool()
    results = await tool.search("metformin diabetes", max_results=3)

    assert len(results) > 0
    assert results[0].citation.source == "pubmed"
    assert "pubmed.ncbi.nlm.nih.gov" in results[0].citation.url

# Run with: uv run pytest tests/integration -m integration
```
---

## 8. Implementation Checklist

- [ ] Create `src/utils/models.py` with all Pydantic models (Evidence, Citation, SearchResult)
- [ ] Create `src/tools/__init__.py` with SearchTool Protocol and exports
- [ ] Implement `src/tools/pubmed.py` with PubMedTool class
- [ ] Implement `src/tools/websearch.py` with WebTool class
- [ ] Create `src/tools/search_handler.py` with SearchHandler class
- [ ] Write tests in `tests/unit/tools/test_pubmed.py`
- [ ] Write tests in `tests/unit/tools/test_websearch.py`
- [ ] Write tests in `tests/unit/tools/test_search_handler.py`
- [ ] Run `uv run pytest tests/unit/tools/ -v` — **ALL TESTS MUST PASS**
- [ ] (Optional) Run integration test: `uv run pytest -m integration`
- [ ] Commit: `git commit -m "feat: phase 2 search slice complete"`

---

## 9. Definition of Done

Phase 2 is **COMPLETE** when:

1. All unit tests pass: `uv run pytest tests/unit/tools/ -v`
2. `SearchHandler` can execute with both tools
3. Graceful degradation: if PubMed fails, WebTool results still return
4. Rate limiting is enforced (verify no 429 errors)
5. You can run this in a Python REPL:

```python
import asyncio
from src.tools.pubmed import PubMedTool
from src.tools.websearch import WebTool
from src.tools.search_handler import SearchHandler

async def test():
    handler = SearchHandler([PubMedTool(), WebTool()])
    result = await handler.execute("metformin alzheimer")
    print(f"Found {result.total_found} results")
    for e in result.evidence[:3]:
        print(f"- {e.citation.title}")

asyncio.run(test())
```

**Proceed to Phase 3 ONLY after all checkboxes are complete.**