# Phase 13 Implementation Spec: Modal Pipeline Integration

**Goal**: Wire the existing Modal code execution into the agent pipeline.

**Philosophy**: "Sandboxed execution makes AI-generated code trustworthy."

**Prerequisite**: Phase 12 complete (MCP server working)

**Priority**: P1 - HIGH VALUE ($2,500 Modal Innovation Award)

**Estimated Time**: 2-3 hours

---
## 1. Why Modal Integration?

### Current State Analysis

Mario already implemented `src/tools/code_execution.py`:

| Component | Status | Notes |
|-----------|--------|-------|
| `ModalCodeExecutor` class | Built | Executes Python in a Modal sandbox |
| `SANDBOX_LIBRARIES` | Defined | pandas, numpy, scipy, etc. |
| `execute()` method | Implemented | Captures stdout/stderr |
| `execute_with_return()` | Implemented | Returns the `result` variable |
| `AnalysisAgent` | Built | Uses Modal for statistical analysis |
| **Pipeline Integration** | **MISSING** | Not wired into the main orchestrator |
### What's Missing

```text
Current Flow:
User Query → Orchestrator → Search → Judge → [Report] → Done

With Modal:
User Query → Orchestrator → Search → Judge → [Analysis*] → Report → Done
                                                  ↓
                                     Modal Sandbox Execution
```

*The `AnalysisAgent` exists but is NOT called by either orchestrator.
---

## 2. Critical Dependency Analysis

### The Problem (Senior Feedback)

```python
# src/agents/analysis_agent.py - Line 8
from agent_framework import (
    AgentRunResponse,
    BaseAgent,
    ...
)
```

```toml
# pyproject.toml - agent-framework is OPTIONAL
[project.optional-dependencies]
magentic = [
    "agent-framework-core",
]
```

**If we import `AnalysisAgent` in the simple orchestrator without the `magentic` extra installed, the app CRASHES on startup.**
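
For concreteness, a minimal sketch of the failure mode (the exact module name in the error depends on how `agent-framework-core` packages its imports):

```python
# Hypothetical reproduction: in an environment installed WITHOUT the
# magentic extra, a top-level import of the agent module fails immediately.
try:
    from src.agents.analysis_agent import AnalysisAgent  # noqa: F401
except ModuleNotFoundError as err:
    # Expected without `uv pip install ".[magentic]"`, e.g.:
    # "No module named 'agent_framework'"
    print(f"Startup would crash here: {err}")
```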
### The SOLID Solution

**Single Responsibility Principle**: Decouple the Modal execution logic from `agent_framework`.

```text
BEFORE (Coupled):
AnalysisAgent (requires agent_framework)
        ↓
ModalCodeExecutor

AFTER (Decoupled):
AnalysisAgent (wraps StatisticalAnalyzer)            ← Magentic mode uses this
        ↓
StatisticalAnalyzer (no agent_framework dependency)  ← Simple mode uses this
        ↓
ModalCodeExecutor
```

**Key insight**: Create `src/services/statistical_analyzer.py` with ZERO `agent_framework` imports.
---

## 3. Prize Opportunity

### Modal Innovation Award: $2,500

**Judging Criteria**:

1. **Sandbox Isolation** - Code runs in a container, not locally
2. **Scientific Computing** - Real pandas/scipy analysis
3. **Safety** - Can't access the local filesystem
4. **Speed** - Modal's fast cold starts

### What We Need to Show

```python
# LLM generates analysis code
code = """
import pandas as pd
import scipy.stats as stats

data = pd.DataFrame({
    'study': ['Study1', 'Study2', 'Study3'],
    'effect_size': [0.45, 0.52, 0.38],
    'sample_size': [120, 85, 200]
})

weighted_mean = (data['effect_size'] * data['sample_size']).sum() / data['sample_size'].sum()
t_stat, p_value = stats.ttest_1samp(data['effect_size'], 0)

print(f"Weighted Effect Size: {weighted_mean:.3f}")
print(f"P-value: {p_value:.4f}")
result = "SUPPORTED" if p_value < 0.05 else "INCONCLUSIVE"
"""

# Executed SAFELY in Modal sandbox
executor = get_code_executor()
output = executor.execute(code)  # Runs in an isolated container!
```
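
The snippet above glosses over the return value. Judging from how `execute()` is consumed in Section 5.2, it returns a plain result dict rather than raising on sandbox failures; a sketch of the assumed contract:

```python
# Continuing the snippet above: assumed shape of the dict returned by
# executor.execute(), inferred from how it is consumed in Section 5.2.
if output["success"]:
    print(output["stdout"])                   # captured sandbox stdout
else:
    print(output["error"], output["stderr"])  # failure diagnostics
```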
---

## 4. Technical Specification

### 4.1 Dependencies

```toml
# pyproject.toml - NO CHANGES to dependencies
# StatisticalAnalyzer uses only:
#   - pydantic-ai (already in main deps)
#   - modal (already in main deps)
#   - src.tools.code_execution (no agent_framework)
```

### 4.2 Environment Variables

```bash
# .env
MODAL_TOKEN_ID=your-token-id
MODAL_TOKEN_SECRET=your-token-secret
```
### 4.3 Integration Points

| Integration Point | File | Change Required |
|-------------------|------|-----------------|
| New Service | `src/services/statistical_analyzer.py` | CREATE (no agent_framework) |
| Simple Orchestrator | `src/orchestrator.py` | Use `StatisticalAnalyzer` |
| Config | `src/utils/config.py` | Add `enable_modal_analysis` setting |
| AnalysisAgent | `src/agents/analysis_agent.py` | Refactor to wrap `StatisticalAnalyzer` |
| MCP Tool | `src/mcp_tools.py` | Add `analyze_hypothesis` tool |

---
| ## 5. Implementation | |
| ### 5.1 Configuration Update (`src/utils/config.py`) | |
| ```python | |
| class Settings(BaseSettings): | |
| # ... existing settings ... | |
| # Modal Configuration | |
| modal_token_id: str | None = None | |
| modal_token_secret: str | None = None | |
| enable_modal_analysis: bool = False # Opt-in for hackathon demo | |
| @property | |
| def modal_available(self) -> bool: | |
| """Check if Modal credentials are configured.""" | |
| return bool(self.modal_token_id and self.modal_token_secret) | |
| ``` | |
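
Callers should combine the opt-in flag with the credential check; a minimal sketch of the intended gating (the same logic the orchestrator applies in Section 5.3):

```python
# Sketch: analysis runs only when explicitly enabled AND credentials exist.
from src.utils.config import settings

run_analysis = settings.enable_modal_analysis and settings.modal_available
```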
### 5.2 StatisticalAnalyzer Service (`src/services/statistical_analyzer.py`)

**This is the key fix - NO agent_framework imports.**

```python
"""Statistical analysis service using Modal code execution.

This module provides Modal-based statistical analysis WITHOUT depending on
agent_framework. This allows it to be used in the simple orchestrator mode
without requiring the magentic optional dependency.

The AnalysisAgent (in src/agents/) wraps this service for magentic mode.
"""

import asyncio
import re
from functools import partial
from typing import Any

from pydantic import BaseModel, Field
from pydantic_ai import Agent

from src.agent_factory.judges import get_model
from src.tools.code_execution import (
    CodeExecutionError,
    get_code_executor,
    get_sandbox_library_prompt,
)
from src.utils.models import Evidence


class AnalysisResult(BaseModel):
    """Result of statistical analysis."""

    verdict: str = Field(
        description="SUPPORTED, REFUTED, or INCONCLUSIVE",
    )
    confidence: float = Field(ge=0.0, le=1.0, description="Confidence in verdict (0-1)")
    statistical_evidence: str = Field(
        description="Summary of statistical findings from code execution"
    )
    code_generated: str = Field(description="Python code that was executed")
    execution_output: str = Field(description="Output from code execution")
    key_findings: list[str] = Field(default_factory=list, description="Key takeaways")
    limitations: list[str] = Field(default_factory=list, description="Limitations")


class StatisticalAnalyzer:
    """Performs statistical analysis using Modal code execution.

    This service:
    1. Generates Python code for statistical analysis using an LLM
    2. Executes the code in a Modal sandbox
    3. Interprets the results
    4. Returns a verdict (SUPPORTED/REFUTED/INCONCLUSIVE)

    Note: This class has NO agent_framework dependency, making it safe
    to use in the simple orchestrator without the magentic extra.
    """

    def __init__(self) -> None:
        """Initialize the analyzer."""
        self._code_executor: Any = None
        self._agent: Agent[None, str] | None = None

    def _get_code_executor(self) -> Any:
        """Lazy initialization of the code executor."""
        if self._code_executor is None:
            self._code_executor = get_code_executor()
        return self._code_executor

    def _get_agent(self) -> Agent[None, str]:
        """Lazy initialization of the LLM agent for code generation."""
        if self._agent is None:
            library_versions = get_sandbox_library_prompt()
            self._agent = Agent(
                model=get_model(),
                output_type=str,
                system_prompt=f"""You are a biomedical data scientist.

Generate Python code to analyze research evidence and test hypotheses.

Guidelines:
1. Use pandas, numpy, scipy.stats for analysis
2. Print clear, interpretable results
3. Include statistical tests (t-tests, chi-square, etc.)
4. Calculate effect sizes and confidence intervals
5. Keep code concise (<50 lines)
6. Set 'result' variable to SUPPORTED, REFUTED, or INCONCLUSIVE

Available libraries:
{library_versions}

Output format: Return ONLY executable Python code, no explanations.""",
            )
        return self._agent

    async def analyze(
        self,
        query: str,
        evidence: list[Evidence],
        hypothesis: dict[str, Any] | None = None,
    ) -> AnalysisResult:
        """Run statistical analysis on evidence.

        Args:
            query: The research question
            evidence: List of Evidence objects to analyze
            hypothesis: Optional hypothesis dict with drug, target, pathway, effect

        Returns:
            AnalysisResult with verdict and statistics
        """
        # Build the analysis prompt
        evidence_summary = self._summarize_evidence(evidence[:10])

        hypothesis_text = ""
        if hypothesis:
            hypothesis_text = f"""
Hypothesis: {hypothesis.get('drug', 'Unknown')} → {hypothesis.get('target', '?')} → {hypothesis.get('pathway', '?')} → {hypothesis.get('effect', '?')}
Confidence: {hypothesis.get('confidence', 0.5):.0%}
"""

        prompt = f"""Generate Python code to statistically analyze:

**Research Question**: {query}
{hypothesis_text}
**Evidence Summary**:
{evidence_summary}

Generate executable Python code to analyze this evidence."""

        try:
            # Generate code
            agent = self._get_agent()
            code_result = await agent.run(prompt)
            generated_code = code_result.output

            # Execute in the Modal sandbox
            loop = asyncio.get_running_loop()
            executor = self._get_code_executor()
            execution = await loop.run_in_executor(
                None, partial(executor.execute, generated_code, timeout=120)
            )

            if not execution["success"]:
                return AnalysisResult(
                    verdict="INCONCLUSIVE",
                    confidence=0.0,
                    statistical_evidence=f"Execution failed: {execution['error']}",
                    code_generated=generated_code,
                    execution_output=execution.get("stderr", ""),
                    key_findings=[],
                    limitations=["Code execution failed"],
                )

            # Interpret the results
            return self._interpret_results(generated_code, execution)

        except CodeExecutionError as e:
            return AnalysisResult(
                verdict="INCONCLUSIVE",
                confidence=0.0,
                statistical_evidence=str(e),
                code_generated="",
                execution_output="",
                key_findings=[],
                limitations=[f"Analysis error: {e}"],
            )

    def _summarize_evidence(self, evidence: list[Evidence]) -> str:
        """Summarize evidence for the code generation prompt."""
        if not evidence:
            return "No evidence available."

        lines = []
        for i, ev in enumerate(evidence[:5], 1):
            lines.append(f"{i}. {ev.content[:200]}...")
            lines.append(f"   Source: {ev.citation.title}")
            lines.append(f"   Relevance: {ev.relevance:.0%}\n")
        return "\n".join(lines)

    def _interpret_results(
        self,
        code: str,
        execution: dict[str, Any],
    ) -> AnalysisResult:
        """Interpret code execution results."""
        stdout = execution["stdout"]
        stdout_upper = stdout.upper()

        # Extract verdict with word-boundary matching; the guard also excludes
        # negated forms ("NOT SUPPORTED", "UNSUPPORTED").
        verdict = "INCONCLUSIVE"
        if re.search(r"\bSUPPORTED\b", stdout_upper) and not re.search(
            r"\b(?:NOT\s+|UN)SUPPORTED\b", stdout_upper
        ):
            verdict = "SUPPORTED"
        elif re.search(r"\bREFUTED\b", stdout_upper):
            verdict = "REFUTED"

        # Extract key findings
        key_findings = []
        for line in stdout.split("\n"):
            line_lower = line.lower()
            if any(kw in line_lower for kw in ["p-value", "significant", "effect", "mean"]):
                key_findings.append(line.strip())

        # Calculate confidence from p-values
        confidence = self._calculate_confidence(stdout)

        return AnalysisResult(
            verdict=verdict,
            confidence=confidence,
            statistical_evidence=stdout.strip(),
            code_generated=code,
            execution_output=stdout,
            key_findings=key_findings[:5],
            limitations=[
                "Analysis based on summary data only",
                "Limited to available evidence",
                "Statistical tests assume data independence",
            ],
        )

    def _calculate_confidence(self, output: str) -> float:
        """Calculate confidence based on statistical results."""
        p_values = re.findall(r"p[-\s]?value[:\s]+(\d+\.?\d*)", output.lower())
        if p_values:
            try:
                min_p = min(float(p) for p in p_values)
                if min_p < 0.001:
                    return 0.95
                elif min_p < 0.01:
                    return 0.90
                elif min_p < 0.05:
                    return 0.80
                else:
                    return 0.60
            except ValueError:
                pass
        return 0.70  # Default


# Singleton for reuse
_analyzer: StatisticalAnalyzer | None = None


def get_statistical_analyzer() -> StatisticalAnalyzer:
    """Get or create the singleton StatisticalAnalyzer instance."""
    global _analyzer
    if _analyzer is None:
        _analyzer = StatisticalAnalyzer()
    return _analyzer
```
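
For reference, a standalone usage sketch (assumes you already have `Evidence` objects, e.g. from `PubMedTool` as in the demo script below):

```python
# Sketch: using the service directly, outside any orchestrator.
import asyncio

from src.services.statistical_analyzer import get_statistical_analyzer


async def demo(evidence) -> None:  # evidence: list[Evidence]
    analyzer = get_statistical_analyzer()
    result = await analyzer.analyze("Can metformin slow Alzheimer's?", evidence)
    print(result.verdict, f"({result.confidence:.0%})")


# asyncio.run(demo(my_evidence))
```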
### 5.3 Simple Orchestrator Update (`src/orchestrator.py`)

**Uses `StatisticalAnalyzer` directly - NO agent_framework import.**

```python
"""Main orchestrator with optional Modal analysis."""

from src.utils.config import settings

# ... existing imports ...


class Orchestrator:
    """Search-Judge-Analyze orchestration loop."""

    def __init__(
        self,
        search_handler: SearchHandlerProtocol,
        judge_handler: JudgeHandlerProtocol,
        config: OrchestratorConfig | None = None,
        enable_analysis: bool = False,  # New parameter
    ) -> None:
        self.search = search_handler
        self.judge = judge_handler
        self.config = config or OrchestratorConfig()
        self.history: list[dict[str, Any]] = []
        self._enable_analysis = enable_analysis and settings.modal_available
        # Lazy-load analysis (NO agent_framework dependency!)
        self._analyzer: Any = None

    def _get_analyzer(self) -> Any:
        """Lazy initialization of StatisticalAnalyzer.

        Note: This imports from src.services, NOT src.agents,
        so it works without the magentic optional dependency.
        """
        if self._analyzer is None:
            from src.services.statistical_analyzer import get_statistical_analyzer

            self._analyzer = get_statistical_analyzer()
        return self._analyzer

    async def run(self, query: str) -> AsyncGenerator[AgentEvent, None]:
        """Main orchestration loop with optional Modal analysis."""
        # ... existing search/judge loop ...

        # After the judge says "synthesize", optionally run analysis
        if self._enable_analysis and assessment.recommendation == "synthesize":
            yield AgentEvent(
                type="analyzing",
                message="Running statistical analysis in Modal sandbox...",
                data={},
                iteration=iteration,
            )
            try:
                analyzer = self._get_analyzer()
                # Run Modal analysis (no agent_framework needed!)
                analysis_result = await analyzer.analyze(
                    query=query,
                    evidence=all_evidence,
                    hypothesis=None,  # Could add hypothesis generation later
                )
                yield AgentEvent(
                    type="analysis_complete",
                    message=f"Analysis verdict: {analysis_result.verdict}",
                    data=analysis_result.model_dump(),
                    iteration=iteration,
                )
            except Exception as e:
                yield AgentEvent(
                    type="error",
                    message=f"Modal analysis failed: {e}",
                    data={"error": str(e)},
                    iteration=iteration,
                )

        # Continue to synthesis...
```
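
A wiring sketch for the assumed call site (wherever the app constructs the orchestrator today; `search` and `judge` are the existing handlers):

```python
# Sketch: opt in at construction time; defaults preserve current behavior.
from src.utils.config import settings

orchestrator = Orchestrator(
    search_handler=search,
    judge_handler=judge,
    enable_analysis=settings.enable_modal_analysis,  # gated again by modal_available
)
```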
### 5.4 Refactor AnalysisAgent (`src/agents/analysis_agent.py`)

**Wrap `StatisticalAnalyzer` for magentic mode.**

```python
"""Analysis agent for statistical analysis using Modal code execution.

This agent wraps StatisticalAnalyzer for use in magentic multi-agent mode.
The core logic is in src/services/statistical_analyzer.py to avoid
coupling agent_framework to the simple orchestrator.
"""

from collections.abc import AsyncIterable
from typing import TYPE_CHECKING, Any

from agent_framework import (
    AgentRunResponse,
    AgentRunResponseUpdate,
    AgentThread,
    BaseAgent,
    ChatMessage,
    Role,
)

from src.services.statistical_analyzer import (
    AnalysisResult,
    get_statistical_analyzer,
)
from src.utils.models import Evidence

if TYPE_CHECKING:
    from src.services.embeddings import EmbeddingService


class AnalysisAgent(BaseAgent):  # type: ignore[misc]
    """Wraps StatisticalAnalyzer for magentic multi-agent mode."""

    def __init__(
        self,
        evidence_store: dict[str, Any],
        embedding_service: "EmbeddingService | None" = None,
    ) -> None:
        super().__init__(
            name="AnalysisAgent",
            description="Performs statistical analysis using Modal sandbox",
        )
        self._evidence_store = evidence_store
        self._embeddings = embedding_service
        self._analyzer = get_statistical_analyzer()

    async def run(
        self,
        messages: str | ChatMessage | list[str] | list[ChatMessage] | None = None,
        *,
        thread: AgentThread | None = None,
        **kwargs: Any,
    ) -> AgentRunResponse:
        """Analyze evidence and return a verdict."""
        query = self._extract_query(messages)
        hypotheses = self._evidence_store.get("hypotheses", [])
        evidence = self._evidence_store.get("current", [])

        if not evidence:
            return self._error_response("No evidence available.")

        # Get the primary hypothesis if available
        hypothesis_dict = None
        if hypotheses:
            h = hypotheses[0]
            hypothesis_dict = {
                "drug": getattr(h, "drug", "Unknown"),
                "target": getattr(h, "target", "?"),
                "pathway": getattr(h, "pathway", "?"),
                "effect": getattr(h, "effect", "?"),
                "confidence": getattr(h, "confidence", 0.5),
            }

        # Delegate to StatisticalAnalyzer
        result = await self._analyzer.analyze(
            query=query,
            evidence=evidence,
            hypothesis=hypothesis_dict,
        )

        # Store in shared context
        self._evidence_store["analysis"] = result.model_dump()

        # Format the response
        response_text = self._format_response(result)
        return AgentRunResponse(
            messages=[ChatMessage(role=Role.ASSISTANT, text=response_text)],
            response_id=f"analysis-{result.verdict.lower()}",
            additional_properties={"analysis": result.model_dump()},
        )

    def _format_response(self, result: AnalysisResult) -> str:
        """Format the analysis result as markdown."""
        lines = [
            "## Statistical Analysis Complete\n",
            f"### Verdict: **{result.verdict}**",
            f"**Confidence**: {result.confidence:.0%}\n",
            "### Key Findings",
        ]
        for finding in result.key_findings:
            lines.append(f"- {finding}")
        lines.extend([
            "\n### Statistical Evidence",
            "```",
            result.statistical_evidence,
            "```",
        ])
        return "\n".join(lines)

    def _error_response(self, message: str) -> AgentRunResponse:
        """Create an error response."""
        return AgentRunResponse(
            messages=[ChatMessage(role=Role.ASSISTANT, text=f"**Error**: {message}")],
            response_id="analysis-error",
        )

    def _extract_query(
        self, messages: str | ChatMessage | list[str] | list[ChatMessage] | None
    ) -> str:
        """Extract the query from messages."""
        if isinstance(messages, str):
            return messages
        elif isinstance(messages, ChatMessage):
            return messages.text or ""
        elif isinstance(messages, list):
            for msg in reversed(messages):
                if isinstance(msg, ChatMessage) and msg.role == Role.USER:
                    return msg.text or ""
                elif isinstance(msg, str):
                    return msg
        return ""

    async def run_stream(
        self,
        messages: str | ChatMessage | list[str] | list[ChatMessage] | None = None,
        *,
        thread: AgentThread | None = None,
        **kwargs: Any,
    ) -> AsyncIterable[AgentRunResponseUpdate]:
        """Streaming wrapper."""
        result = await self.run(messages, thread=thread, **kwargs)
        yield AgentRunResponseUpdate(messages=result.messages, response_id=result.response_id)
```
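
The shared `evidence_store` contract the agent assumes (keys inferred from the code above; the exact population order is up to the magentic workflow):

```python
# Sketch: the dict the magentic workflow is expected to share with the agent.
from typing import Any

from src.agents.analysis_agent import AnalysisAgent

evidence_store: dict[str, Any] = {
    "current": [],     # list[Evidence] populated by the search agent
    "hypotheses": [],  # optional hypothesis objects from an upstream agent
}
agent = AnalysisAgent(evidence_store=evidence_store)
# After run(), evidence_store["analysis"] holds the result as a dict.
```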
### 5.5 MCP Tool for Modal Analysis (`src/mcp_tools.py`)

Add to the existing MCP tools:

````python
async def analyze_hypothesis(
    drug: str,
    condition: str,
    evidence_summary: str,
) -> str:
    """Perform statistical analysis of a drug repurposing hypothesis using Modal.

    Executes AI-generated Python code in a secure Modal sandbox to analyze
    the statistical evidence for a drug repurposing hypothesis.

    Args:
        drug: The drug being evaluated (e.g., "metformin")
        condition: The target condition (e.g., "Alzheimer's disease")
        evidence_summary: Summary of evidence to analyze

    Returns:
        Analysis result with verdict (SUPPORTED/REFUTED/INCONCLUSIVE) and statistics
    """
    from src.services.statistical_analyzer import get_statistical_analyzer
    from src.utils.config import settings
    from src.utils.models import Citation, Evidence

    if not settings.modal_available:
        return "Error: Modal credentials not configured. Set MODAL_TOKEN_ID and MODAL_TOKEN_SECRET."

    # Create evidence from the summary
    evidence = [
        Evidence(
            content=evidence_summary,
            citation=Citation(
                source="pubmed",
                title=f"Evidence for {drug} in {condition}",
                url="https://example.com",
                date="2024-01-01",
                authors=["User Provided"],
            ),
            relevance=0.9,
        )
    ]

    analyzer = get_statistical_analyzer()
    result = await analyzer.analyze(
        query=f"Can {drug} treat {condition}?",
        evidence=evidence,
        hypothesis={"drug": drug, "target": "unknown", "pathway": "unknown", "effect": condition},
    )

    return f"""## Statistical Analysis: {drug} for {condition}

### Verdict: **{result.verdict}**
**Confidence**: {result.confidence:.0%}

### Key Findings
{chr(10).join(f"- {f}" for f in result.key_findings) or "- No specific findings extracted"}

### Execution Output
```
{result.execution_output}
```

### Generated Code
```python
{result.code_generated}
```

**Executed in Modal Sandbox** - Isolated, secure, reproducible.
"""
````
### 5.6 Demo Scripts

#### `examples/modal_demo/verify_sandbox.py`

```python
#!/usr/bin/env python3
"""Verify that the Modal sandbox is properly isolated.

This script proves to judges that code runs in Modal, not locally.
NO agent_framework dependency - uses only src.tools.code_execution.

Usage:
    uv run python examples/modal_demo/verify_sandbox.py
"""

import asyncio
from functools import partial

from src.tools.code_execution import get_code_executor
from src.utils.config import settings


async def main() -> None:
    """Verify Modal sandbox isolation."""
    if not settings.modal_available:
        print("Error: Modal credentials not configured.")
        print("Set MODAL_TOKEN_ID and MODAL_TOKEN_SECRET in .env")
        return

    executor = get_code_executor()
    loop = asyncio.get_running_loop()

    print("=" * 60)
    print("Modal Sandbox Isolation Verification")
    print("=" * 60 + "\n")

    # Test 1: Hostname
    print("Test 1: Check hostname (should NOT be your machine)")
    code1 = "import socket; print(f'Hostname: {socket.gethostname()}')"
    result1 = await loop.run_in_executor(None, partial(executor.execute, code1))
    print(f"  {result1['stdout'].strip()}\n")

    # Test 2: Scientific libraries
    print("Test 2: Verify scientific libraries")
    code2 = """
import pandas as pd
import numpy as np
import scipy
print(f"pandas: {pd.__version__}")
print(f"numpy: {np.__version__}")
print(f"scipy: {scipy.__version__}")
"""
    result2 = await loop.run_in_executor(None, partial(executor.execute, code2))
    print(f"  {result2['stdout'].strip()}\n")

    # Test 3: Network blocked
    print("Test 3: Verify network isolation")
    code3 = """
import urllib.request
try:
    urllib.request.urlopen("https://google.com", timeout=2)
    print("Network: ALLOWED (unexpected!)")
except Exception:
    print("Network: BLOCKED (as expected)")
"""
    result3 = await loop.run_in_executor(None, partial(executor.execute, code3))
    print(f"  {result3['stdout'].strip()}\n")

    # Test 4: Real statistics
    print("Test 4: Execute statistical analysis")
    code4 = """
import pandas as pd
import scipy.stats as stats
data = pd.DataFrame({'effect': [0.42, 0.38, 0.51]})
mean = data['effect'].mean()
t_stat, p_val = stats.ttest_1samp(data['effect'], 0)
print(f"Mean Effect: {mean:.3f}")
print(f"P-value: {p_val:.4f}")
print(f"Verdict: {'SUPPORTED' if p_val < 0.05 else 'INCONCLUSIVE'}")
"""
    result4 = await loop.run_in_executor(None, partial(executor.execute, code4))
    print(f"  {result4['stdout'].strip()}\n")

    print("=" * 60)
    print("All tests complete - Modal sandbox verified!")
    print("=" * 60)


if __name__ == "__main__":
    asyncio.run(main())
```
#### `examples/modal_demo/run_analysis.py`

```python
#!/usr/bin/env python3
"""Demo: Modal-powered statistical analysis.

This script uses StatisticalAnalyzer directly (NO agent_framework dependency).

Usage:
    uv run python examples/modal_demo/run_analysis.py "metformin alzheimer"
"""

import argparse
import asyncio
import os
import sys

from src.services.statistical_analyzer import get_statistical_analyzer
from src.tools.pubmed import PubMedTool
from src.utils.config import settings


async def main() -> None:
    """Run the Modal analysis demo."""
    parser = argparse.ArgumentParser(description="Modal Analysis Demo")
    parser.add_argument("query", help="Research query")
    args = parser.parse_args()

    if not settings.modal_available:
        print("Error: Modal credentials not configured.")
        sys.exit(1)
    if not (os.getenv("OPENAI_API_KEY") or os.getenv("ANTHROPIC_API_KEY")):
        print("Error: No LLM API key found.")
        sys.exit(1)

    print(f"\n{'=' * 60}")
    print("DeepCritical Modal Analysis Demo")
    print(f"Query: {args.query}")
    print(f"{'=' * 60}\n")

    # Step 1: Gather evidence
    print("Step 1: Gathering evidence from PubMed...")
    pubmed = PubMedTool()
    evidence = await pubmed.search(args.query, max_results=5)
    print(f"  Found {len(evidence)} papers\n")

    # Step 2: Run Modal analysis
    print("Step 2: Running statistical analysis in Modal sandbox...")
    analyzer = get_statistical_analyzer()
    result = await analyzer.analyze(query=args.query, evidence=evidence)

    # Step 3: Display results
    print("\n" + "=" * 60)
    print("ANALYSIS RESULTS")
    print("=" * 60)
    print(f"\nVerdict: {result.verdict}")
    print(f"Confidence: {result.confidence:.0%}")
    print("\nKey Findings:")
    for finding in result.key_findings:
        print(f"  - {finding}")

    print("\n[Demo Complete - Code executed in Modal, not locally]")


if __name__ == "__main__":
    asyncio.run(main())
```
---

## 6. TDD Test Suite

### 6.1 Unit Tests (`tests/unit/services/test_statistical_analyzer.py`)

```python
"""Unit tests for the StatisticalAnalyzer service."""

import ast
from unittest.mock import AsyncMock, MagicMock, patch

import pytest

from src.services.statistical_analyzer import (
    AnalysisResult,
    StatisticalAnalyzer,
    get_statistical_analyzer,
)
from src.utils.models import Citation, Evidence


@pytest.fixture
def sample_evidence() -> list[Evidence]:
    """Sample evidence for testing."""
    return [
        Evidence(
            content="Metformin shows effect size of 0.45.",
            citation=Citation(
                source="pubmed",
                title="Metformin Study",
                url="https://pubmed.ncbi.nlm.nih.gov/12345/",
                date="2024-01-15",
                authors=["Smith J"],
            ),
            relevance=0.9,
        )
    ]


class TestStatisticalAnalyzer:
    """Tests for StatisticalAnalyzer (no agent_framework dependency)."""

    def test_no_agent_framework_import(self) -> None:
        """StatisticalAnalyzer must NOT import agent_framework.

        Note: a plain substring check would be too strict - the module's
        docstring legitimately mentions agent_framework - so we parse the
        source and inspect only the import statements.
        """
        import src.services.statistical_analyzer as module

        with open(module.__file__) as f:
            tree = ast.parse(f.read())

        imported: set[str] = set()
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                imported.update(alias.name.split(".")[0] for alias in node.names)
            elif isinstance(node, ast.ImportFrom) and node.module:
                imported.add(node.module.split(".")[0])

        assert "agent_framework" not in imported

    @pytest.mark.asyncio
    async def test_analyze_returns_result(
        self, sample_evidence: list[Evidence]
    ) -> None:
        """analyze() should return an AnalysisResult."""
        analyzer = StatisticalAnalyzer()

        with patch.object(analyzer, "_get_agent") as mock_agent, \
             patch.object(analyzer, "_get_code_executor") as mock_executor:
            # Mock the LLM
            mock_agent.return_value.run = AsyncMock(
                return_value=MagicMock(output="print('SUPPORTED')")
            )
            # Mock Modal
            mock_executor.return_value.execute.return_value = {
                "stdout": "SUPPORTED\np-value: 0.01",
                "stderr": "",
                "success": True,
            }

            result = await analyzer.analyze("test query", sample_evidence)

        assert isinstance(result, AnalysisResult)
        assert result.verdict == "SUPPORTED"

    def test_singleton(self) -> None:
        """get_statistical_analyzer should return a singleton."""
        a1 = get_statistical_analyzer()
        a2 = get_statistical_analyzer()
        assert a1 is a2


class TestAnalysisResult:
    """Tests for the AnalysisResult model."""

    def test_verdict_values(self) -> None:
        """Verdict should be one of the expected values."""
        for verdict in ["SUPPORTED", "REFUTED", "INCONCLUSIVE"]:
            result = AnalysisResult(
                verdict=verdict,
                confidence=0.8,
                statistical_evidence="test",
                code_generated="print('test')",
                execution_output="test",
            )
            assert result.verdict == verdict

    def test_confidence_bounds(self) -> None:
        """Confidence must be 0.0-1.0."""
        with pytest.raises(ValueError):
            AnalysisResult(
                verdict="SUPPORTED",
                confidence=1.5,  # Invalid
                statistical_evidence="test",
                code_generated="test",
                execution_output="test",
            )
```
### 6.2 Integration Test (`tests/integration/test_modal.py`)

```python
"""Integration tests for Modal (requires credentials)."""

import pytest

from src.utils.config import settings


@pytest.mark.integration
@pytest.mark.skipif(not settings.modal_available, reason="Modal not configured")
class TestModalIntegration:
    """Integration tests requiring Modal credentials."""

    @pytest.mark.asyncio
    async def test_sandbox_executes_code(self) -> None:
        """The Modal sandbox should execute Python code."""
        import asyncio
        from functools import partial

        from src.tools.code_execution import get_code_executor

        executor = get_code_executor()
        code = "import pandas as pd; print(pd.DataFrame({'a': [1,2,3]})['a'].sum())"

        loop = asyncio.get_running_loop()
        result = await loop.run_in_executor(
            None, partial(executor.execute, code, timeout=30)
        )

        assert result["success"]
        assert "6" in result["stdout"]

    @pytest.mark.asyncio
    async def test_statistical_analyzer_works(self) -> None:
        """StatisticalAnalyzer should work end-to-end."""
        from src.services.statistical_analyzer import get_statistical_analyzer
        from src.utils.models import Citation, Evidence

        evidence = [
            Evidence(
                content="Drug shows 40% improvement in trial.",
                citation=Citation(
                    source="pubmed",
                    title="Test",
                    url="https://test.com",
                    date="2024-01-01",
                    authors=["Test"],
                ),
                relevance=0.9,
            )
        ]

        analyzer = get_statistical_analyzer()
        result = await analyzer.analyze("test drug efficacy", evidence)

        assert result.verdict in ["SUPPORTED", "REFUTED", "INCONCLUSIVE"]
        assert 0.0 <= result.confidence <= 1.0
```
---

## 7. Verification Commands

```bash
# 1. Verify StatisticalAnalyzer never IMPORTS agent_framework
#    (the module docstring mentions it, so grep import statements only)
grep -E "^(import|from) agent_framework" src/services/statistical_analyzer.py
# Should return nothing!

# 2. Run unit tests (no Modal needed)
uv run pytest tests/unit/services/test_statistical_analyzer.py -v

# 3. Run the verification script (requires Modal)
uv run python examples/modal_demo/verify_sandbox.py

# 4. Run the analysis demo (requires Modal + LLM)
uv run python examples/modal_demo/run_analysis.py "metformin alzheimer"

# 5. Run integration tests
uv run pytest tests/integration/test_modal.py -v -m integration

# 6. Full test suite
make check
```
---

## 8. Definition of Done

Phase 13 is **COMPLETE** when:

- [ ] `src/services/statistical_analyzer.py` created (NO agent_framework)
- [ ] `src/utils/config.py` has the `enable_modal_analysis` setting
- [ ] `src/orchestrator.py` uses `StatisticalAnalyzer` directly
- [ ] `src/agents/analysis_agent.py` refactored to wrap `StatisticalAnalyzer`
- [ ] `src/mcp_tools.py` has the `analyze_hypothesis` tool
- [ ] `examples/modal_demo/verify_sandbox.py` working
- [ ] `examples/modal_demo/run_analysis.py` working
- [ ] Unit tests pass WITHOUT the magentic extra installed (see the probe sketch after this list)
- [ ] Integration tests pass WITH Modal credentials
- [ ] All lints pass
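
To check the "works without magentic" item locally even when `agent_framework` happens to be installed, a small probe (a sketch, not part of the deliverables) can hide the package and confirm the simple-mode import path stays clean:

```python
# Sketch: simulate a no-magentic environment by blocking agent_framework
# imports, then confirm the service module still imports cleanly.
import builtins
import importlib

_real_import = builtins.__import__


def _blocked(name, *args, **kwargs):
    if name.split(".")[0] == "agent_framework":
        raise ModuleNotFoundError(f"No module named '{name}'")
    return _real_import(name, *args, **kwargs)


builtins.__import__ = _blocked
try:
    importlib.import_module("src.services.statistical_analyzer")
    print("OK: simple-mode imports succeed without agent_framework")
finally:
    builtins.__import__ = _real_import
```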
---
## 9. Architecture After Phase 13

```text
┌──────────────────────────────────────────────────────────────────┐
│                           MCP Clients                            │
│                  (Claude Desktop, Cursor, etc.)                  │
└─────────────────────────────────┬────────────────────────────────┘
                                  │ MCP Protocol
                                  ▼
┌──────────────────────────────────────────────────────────────────┐
│                     Gradio App + MCP Server                      │
│  ┌────────────────────────────────────────────────────────────┐  │
│  │ MCP Tools: search_pubmed, search_trials, search_biorxiv,   │  │
│  │            search_all, analyze_hypothesis                  │  │
│  └────────────────────────────────────────────────────────────┘  │
└─────────────────────────────────┬────────────────────────────────┘
                                  │
                ┌─────────────────┴─────────────────┐
                │                                   │
                ▼                                   ▼
  ┌───────────────────────────┐     ┌───────────────────────────────┐
  │    Simple Orchestrator    │     │     Magentic Orchestrator     │
  │   (no agent_framework)    │     │    (with agent_framework)     │
  │                           │     │                               │
  │  SearchHandler            │     │  SearchAgent                  │
  │  JudgeHandler             │     │  JudgeAgent                   │
  │  StatisticalAnalyzer      │     │  AnalysisAgent                │
  │                           │     │  (wraps StatisticalAnalyzer)  │
  └─────────────┬─────────────┘     └───────────────┬───────────────┘
                │                                   │
                └─────────────────┬─────────────────┘
                                  ▼
┌──────────────────────────────────────────────────────────────────┐
│                       StatisticalAnalyzer                        │
│             (src/services/statistical_analyzer.py)               │
│                  NO agent_framework dependency                   │
│                                                                  │
│  1. Generate code with pydantic-ai                               │
│  2. Execute in Modal sandbox                                     │
│  3. Return AnalysisResult                                        │
└─────────────────────────────────┬────────────────────────────────┘
                                  │
                                  ▼
┌──────────────────────────────────────────────────────────────────┐
│                          Modal Sandbox                           │
│  ┌────────────────────────────────────────────────────────────┐  │
│  │ - pandas, numpy, scipy, sklearn, statsmodels               │  │
│  │ - Network: BLOCKED                                         │  │
│  │ - Filesystem: ISOLATED                                     │  │
│  │ - Timeout: ENFORCED                                        │  │
│  └────────────────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────────────────┘
```

**This is the dependency-safe Modal stack.**
---

## 10. Files Summary

| File | Action | Purpose |
|------|--------|---------|
| `src/services/statistical_analyzer.py` | **CREATE** | Core analysis (no agent_framework) |
| `src/utils/config.py` | MODIFY | Add `enable_modal_analysis` |
| `src/orchestrator.py` | MODIFY | Use `StatisticalAnalyzer` |
| `src/agents/analysis_agent.py` | MODIFY | Wrap `StatisticalAnalyzer` |
| `src/mcp_tools.py` | MODIFY | Add `analyze_hypothesis` |
| `examples/modal_demo/verify_sandbox.py` | CREATE | Sandbox verification |
| `examples/modal_demo/run_analysis.py` | CREATE | Demo script |
| `tests/unit/services/test_statistical_analyzer.py` | CREATE | Unit tests |
| `tests/integration/test_modal.py` | CREATE | Integration tests |

**Key Fix**: `StatisticalAnalyzer` has ZERO agent_framework imports, making it safe for the simple orchestrator.