feat: implement dual-mode architecture (Simple + Advanced) (#45)
* docs: add dual-mode architecture specification
Senior agent reviewed and approved. Key documents:
- 00_SITUATION_AND_PLAN.md: Problem analysis, branch states, recommended path
- 01_ARCHITECTURE_SPEC.md: Dual-mode architecture (Simple + Advanced)
- 02_IMPLEMENTATION_PHASES.md: 6-phase implementation plan
- 03_IMMEDIATE_ACTIONS.md: Quick reference checklist
Architecture: pydantic-ai (structured outputs) + Microsoft Agent Framework
(orchestration) are COMPLEMENTARY, not competing. Dual-mode allows
graceful degradation to free tier when no API keys available.
* docs: add follow-up review request for senior agent verification
* feat: implement dual-mode architecture (Simple + Advanced)
Phase 1 - Pydantic-AI Improvements (Simple Mode):
- Add HuggingFace provider support in judges.py with get_model()
- Add huggingface_model and hf_token config fields
- Tests in test_judges_factory.py
Phase 2 - Orchestrator Factory:
- Implement create_orchestrator() with auto-detection logic
- Simple mode for free tier, Advanced mode when OpenAI key present
- Lazy loading of MagenticOrchestrator to avoid hard dependency
- Tests in test_orchestrator_factory.py
Phase 3 - Agent Framework Integration:
- Use agent-framework-core from PyPI (Microsoft package)
- Verify imports work with test_agent_imports.py
Phase 4 - UI Updates:
- Rename "magentic" to "advanced" in app.py
- Update mode selection labels and descriptions
All 126 unit tests pass. Lint and type checks clean.
* fix: address CodeRabbit review feedback
- Add pytestmark to integration tests (integration, slow markers)
- Add pytestmark to unit tests (unit marker)
- Fix unused OpenAIChatClient import by adding assertion
- Update docs spec to match actual factory implementation
- Add code fence languages (text) to markdown blocks
Note: CodeRabbit incorrectly flagged has_openai_key as a method
when it's actually a @property that returns bool correctly.
All 126 unit tests pass.
* fix: address remaining CodeRabbit nitpicks
- Add 'text' language to ASCII diagram code blocks in docs
- Update Advanced Mode trigger description to clarify OpenAI-only
- Rename and clarify test_advanced_mode_explicit_instantiation
- Improve test docstring explaining explicit vs auto-detect path
All 128 tests pass.
- docs/brainstorming/magentic-pydantic/00_SITUATION_AND_PLAN.md +189 -0
- docs/brainstorming/magentic-pydantic/01_ARCHITECTURE_SPEC.md +289 -0
- docs/brainstorming/magentic-pydantic/02_IMPLEMENTATION_PHASES.md +112 -0
- docs/brainstorming/magentic-pydantic/03_IMMEDIATE_ACTIONS.md +112 -0
- docs/brainstorming/magentic-pydantic/04_FOLLOWUP_REVIEW_REQUEST.md +158 -0
- docs/brainstorming/magentic-pydantic/REVIEW_PROMPT_FOR_SENIOR_AGENT.md +113 -0
- pyproject.toml +1 -1
- src/agent_factory/judges.py +8 -0
- src/app.py +11 -7
- src/orchestrator_factory.py +42 -15
- src/utils/config.py +14 -2
- tests/integration/test_dual_mode_e2e.py +82 -0
- tests/unit/agent_factory/test_judges_factory.py +64 -0
- tests/unit/agents/test_agent_imports.py +32 -0
- tests/unit/test_orchestrator_factory.py +66 -0
@@ -0,0 +1,189 @@
| 1 |
+
# Situation Analysis: Pydantic-AI + Microsoft Agent Framework Integration
|
| 2 |
+
|
| 3 |
+
**Date:** November 27, 2025
|
| 4 |
+
**Status:** ACTIVE DECISION REQUIRED
|
| 5 |
+
**Risk Level:** HIGH - DO NOT MERGE PR #41 UNTIL RESOLVED
|
| 6 |
+
|
| 7 |
+
---
|
| 8 |
+
|
| 9 |
+
## 1. The Problem
|
| 10 |
+
|
| 11 |
+
We almost merged a refactor that would have **deleted** multi-agent orchestration capability from the codebase, mistakenly believing pydantic-ai and Microsoft Agent Framework were mutually exclusive.
|
| 12 |
+
|
| 13 |
+
**They are not.** They are complementary:
|
| 14 |
+
- **pydantic-ai** (Library): Ensures LLM outputs match Pydantic schemas
|
| 15 |
+
- **Microsoft Agent Framework** (Framework): Orchestrates multi-agent workflows
|
| 16 |
+
|
| 17 |
+
---
|
| 18 |
+
|
| 19 |
+
## 2. Current Branch State
|
| 20 |
+
|
| 21 |
+
| Branch | Location | Has Agent Framework? | Has Pydantic-AI Improvements? | Status |
|--------|----------|---------------------|------------------------------|--------|
| `origin/dev` | GitHub | YES | NO | **SAFE - Source of Truth** |
| `huggingface-upstream/dev` | HF Spaces | YES | NO | **SAFE - Same as GitHub** |
| `origin/main` | GitHub | YES | NO | **SAFE** |
| `feat/pubmed-fulltext` | GitHub | NO (deleted) | YES | **DANGER - Has destructive refactor** |
| `refactor/pydantic-unification` | Local | NO (deleted) | YES | **DANGER - Redundant, delete** |
| Local `dev` | Local only | NO (deleted) | YES | **DANGER - NOT PUSHED (thankfully)** |
|
| 29 |
+
|
| 30 |
+
### Key Files at Risk
|
| 31 |
+
|
| 32 |
+
**On `origin/dev` (PRESERVED):**
|
| 33 |
+
```text
src/agents/
├── analysis_agent.py     # StatisticalAnalyzer wrapper
├── hypothesis_agent.py   # Hypothesis generation
├── judge_agent.py        # JudgeHandler wrapper
├── magentic_agents.py    # Multi-agent definitions
├── report_agent.py       # Report synthesis
├── search_agent.py       # SearchHandler wrapper
├── state.py              # Thread-safe state management
└── tools.py              # @ai_function decorated tools

src/orchestrator_magentic.py   # Multi-agent orchestrator
src/utils/llm_factory.py       # Centralized LLM client factory
```
|
| 47 |
+
|
| 48 |
+
**Deleted in refactor branch (would be lost if merged):**
|
| 49 |
+
- All of the above
|
| 50 |
+
|
| 51 |
+
---
|
| 52 |
+
|
| 53 |
+
## 3. Target Architecture
|
| 54 |
+
|
| 55 |
+
```text
┌─────────────────────────────────────────────────────────────────┐
│         Microsoft Agent Framework (Orchestration Layer)         │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐           │
│  │ SearchAgent  │→ │ JudgeAgent   │→ │ ReportAgent  │           │
│  │ (BaseAgent)  │  │ (BaseAgent)  │  │ (BaseAgent)  │           │
│  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘           │
│         │                 │                 │                   │
│         ▼                 ▼                 ▼                   │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐           │
│  │ pydantic-ai  │  │ pydantic-ai  │  │ pydantic-ai  │           │
│  │ Agent()      │  │ Agent()      │  │ Agent()      │           │
│  │ output_type= │  │ output_type= │  │ output_type= │           │
│  │ SearchResult │  │ JudgeAssess  │  │ Report       │           │
│  └──────────────┘  └──────────────┘  └──────────────┘           │
└─────────────────────────────────────────────────────────────────┘
```
|
| 72 |
+
|
| 73 |
+
**Why this architecture:**
|
| 74 |
+
1. **Agent Framework** handles: workflow coordination, state passing, middleware, observability
|
| 75 |
+
2. **pydantic-ai** handles: type-safe LLM calls within each agent
|
| 76 |
+
|
| 77 |
+
---
|
| 78 |
+
|
| 79 |
+
## 4. CRITICAL: Naming Confusion Clarification
|
| 80 |
+
|
| 81 |
+
> **Senior Agent Review Finding:** The codebase uses "magentic" in file names (e.g., `orchestrator_magentic.py`, `magentic_agents.py`) but this is **NOT** the `magentic` PyPI package by Jack Collins. It's Microsoft Agent Framework (`agent-framework-core`).
|
| 82 |
+
|
| 83 |
+
**The naming confusion:**
|
| 84 |
+
- `magentic` (PyPI package): A different library for structured LLM outputs
|
| 85 |
+
- "Magentic" (in our codebase): Our internal name for Microsoft Agent Framework integration
|
| 86 |
+
- `agent-framework-core` (PyPI package): Microsoft's actual multi-agent orchestration framework
|
| 87 |
+
|
| 88 |
+
**Recommended future action:** Rename `orchestrator_magentic.py` → `orchestrator_advanced.py` to eliminate confusion.
|
| 89 |
+
|
| 90 |
+
---
|
| 91 |
+
|
| 92 |
+
## 5. What the Refactor DID Get Right
|
| 93 |
+
|
| 94 |
+
The refactor branch (`feat/pubmed-fulltext`) has some valuable improvements:
|
| 95 |
+
|
| 96 |
+
1. **`judges.py` unified `get_model()`** - Supports OpenAI, Anthropic, AND HuggingFace via pydantic-ai
|
| 97 |
+
2. **HuggingFace free tier support** - `HuggingFaceModel` integration
|
| 98 |
+
3. **Test fix** - Properly mocks `HuggingFaceModel` class
|
| 99 |
+
4. **Removed broken magentic optional dependency** from pyproject.toml (this was correct - the old `magentic` package is different from Microsoft Agent Framework)
|
| 100 |
+
|
| 101 |
+
**What it got WRONG:**
|
| 102 |
+
1. Deleted `src/agents/` entirely instead of refactoring it
|
| 103 |
+
2. Deleted `src/orchestrator_magentic.py` instead of fixing it
|
| 104 |
+
3. Conflated "magentic" (old package) with "Microsoft Agent Framework" (current framework)
|
| 105 |
+
|
| 106 |
+
---
|
| 107 |
+
|
| 108 |
+
## 6. Options for Path Forward
|
| 109 |
+
|
| 110 |
+
### Option A: Abandon Refactor, Start Fresh
|
| 111 |
+
- Close PR #41
|
| 112 |
+
- Delete `feat/pubmed-fulltext` and `refactor/pydantic-unification` branches
|
| 113 |
+
- Reset local `dev` to match `origin/dev`
|
| 114 |
+
- Cherry-pick ONLY the good parts (judges.py improvements, HF support)
|
| 115 |
+
- **Pros:** Clean, safe
|
| 116 |
+
- **Cons:** Lose some work, need to redo carefully
|
| 117 |
+
|
| 118 |
+
### Option B: Cherry-Pick Good Parts to origin/dev
|
| 119 |
+
- Do NOT merge PR #41
|
| 120 |
+
- Create new branch from `origin/dev`
|
| 121 |
+
- Cherry-pick specific commits/changes that improve pydantic-ai usage
|
| 122 |
+
- Keep agent framework code intact
|
| 123 |
+
- **Pros:** Preserves both, surgical
|
| 124 |
+
- **Cons:** Requires careful file-by-file review
|
| 125 |
+
|
| 126 |
+
### Option C: Revert Deletions in Refactor Branch
|
| 127 |
+
- On `feat/pubmed-fulltext`, restore deleted agent files from `origin/dev`
|
| 128 |
+
- Keep the pydantic-ai improvements
|
| 129 |
+
- Merge THAT to dev
|
| 130 |
+
- **Pros:** Gets both
|
| 131 |
+
- **Cons:** Complex git operations, risk of conflicts
|
| 132 |
+
|
| 133 |
+
---
|
| 134 |
+
|
| 135 |
+
## 7. Recommended Action: Option B (Cherry-Pick)
|
| 136 |
+
|
| 137 |
+
**Step-by-step:**
|
| 138 |
+
|
| 139 |
+
1. **Close PR #41** (do not merge)
2. **Clean up redundant branches and local state:**
   - Delete `refactor/pydantic-unification` (local)
   - Reset local `dev` to `origin/dev`
3. **Create new branch from origin/dev:**
   ```bash
   git checkout -b feat/pydantic-ai-improvements origin/dev
   ```
4. **Cherry-pick or manually port these improvements:**
   - `src/agent_factory/judges.py` - the unified `get_model()` function
   - `examples/free_tier_demo.py` - HuggingFace demo
   - Test improvements
5. **Do NOT delete any agent framework files**
6. **Create PR for review**
|
| 153 |
+
|
| 154 |
+
---
|
| 155 |
+
|
| 156 |
+
## 8. Files to Cherry-Pick (Safe Improvements)
|
| 157 |
+
|
| 158 |
+
| File | What Changed | Safe to Port? |
|------|-------------|---------------|
| `src/agent_factory/judges.py` | Added `HuggingFaceModel` support in `get_model()` | YES |
| `examples/free_tier_demo.py` | New demo for HF inference | YES |
| `tests/unit/agent_factory/test_judges.py` | Fixed HF model mocking | YES |
| `pyproject.toml` | Removed old `magentic` optional dep | MAYBE (review carefully) |
|
| 164 |
+
|
| 165 |
+
---
|
| 166 |
+
|
| 167 |
+
## 9. Questions to Answer Before Proceeding
|
| 168 |
+
|
| 169 |
+
1. **For the hackathon**: Do we need full multi-agent orchestration, or is single-agent sufficient?
|
| 170 |
+
2. **For DeepCritical mainline**: Is the plan to use Microsoft Agent Framework for orchestration?
|
| 171 |
+
3. **Timeline**: How much time do we have to get this right?
|
| 172 |
+
|
| 173 |
+
---
|
| 174 |
+
|
| 175 |
+
## 10. Immediate Actions (DO NOW)
|
| 176 |
+
|
| 177 |
+
- [ ] **DO NOT merge PR #41**
|
| 178 |
+
- [ ] Close PR #41 with comment explaining the situation
|
| 179 |
+
- [ ] Do not push local `dev` branch anywhere
|
| 180 |
+
- [ ] Confirm HuggingFace Spaces is untouched (it is - verified)
|
| 181 |
+
|
| 182 |
+
---
|
| 183 |
+
|
| 184 |
+
## 11. Decision Log
|
| 185 |
+
|
| 186 |
+
| Date | Decision | Rationale |
|------|----------|-----------|
| 2025-11-27 | Pause refactor merge | Discovered agent framework and pydantic-ai are complementary, not exclusive |
| TBD | ? | Awaiting decision on path forward |
@@ -0,0 +1,289 @@
| 1 |
+
# Architecture Specification: Dual-Mode Agent System
|
| 2 |
+
|
| 3 |
+
**Date:** November 27, 2025
|
| 4 |
+
**Status:** SPECIFICATION
|
| 5 |
+
**Goal:** Graceful degradation from full multi-agent orchestration to simple single-agent mode
|
| 6 |
+
|
| 7 |
+
---
|
| 8 |
+
|
| 9 |
+
## 1. Core Concept: Two Operating Modes
|
| 10 |
+
|
| 11 |
+
```text
┌─────────────────────────────────────────────────────────────────────┐
│                            USER REQUEST                             │
│                                  │                                  │
│                                  ▼                                  │
│                         ┌─────────────────┐                         │
│                         │ Mode Selection  │                         │
│                         │  (Auto-detect)  │                         │
│                         └────────┬────────┘                         │
│                                  │                                  │
│                  ┌───────────────┴───────────────┐                  │
│                  │                               │                  │
│                  ▼                               ▼                  │
│         ┌─────────────────┐             ┌─────────────────┐         │
│         │   SIMPLE MODE   │             │  ADVANCED MODE  │         │
│         │   (Free Tier)   │             │   (Paid Tier)   │         │
│         │                 │             │                 │         │
│         │   pydantic-ai   │             │  MS Agent Fwk   │         │
│         │  single-agent   │             │  + pydantic-ai  │         │
│         │      loop       │             │   multi-agent   │         │
│         └─────────────────┘             └─────────────────┘         │
│                  │                               │                  │
│                  └───────────────┬───────────────┘                  │
│                                  ▼                                  │
│                         ┌─────────────────┐                         │
│                         │ Research Report │                         │
│                         │ with Citations  │                         │
│                         └─────────────────┘                         │
└─────────────────────────────────────────────────────────────────────┘
```
|
| 41 |
+
|
| 42 |
+
---
|
| 43 |
+
|
| 44 |
+
## 2. Mode Comparison
|
| 45 |
+
|
| 46 |
+
| Aspect | Simple Mode | Advanced Mode |
|--------|-------------|---------------|
| **Trigger** | No API key OR `LLM_PROVIDER=huggingface` | OpenAI API key present (currently OpenAI only) |
| **Framework** | pydantic-ai only | Microsoft Agent Framework + pydantic-ai |
| **Architecture** | Single orchestrator loop | Multi-agent coordination |
| **Agents** | One agent does Search→Judge→Report | SearchAgent, JudgeAgent, ReportAgent, AnalysisAgent |
| **State Management** | Simple dict | Thread-safe `MagenticState` with context vars |
| **Quality** | Good (functional) | Better (specialized agents, coordination) |
| **Cost** | Free (HuggingFace Inference) | Paid (OpenAI/Anthropic) |
| **Use Case** | Demos, hackathon, budget-constrained | Production, research quality |
|
| 56 |
+
|
| 57 |
+
---
|
| 58 |
+
|
| 59 |
+
## 3. Simple Mode Architecture (pydantic-ai Only)
|
| 60 |
+
|
| 61 |
+
```text
┌──────────────────────────────────────────────────────────────┐
│  Orchestrator                                                │
│                                                              │
│  while not sufficient and iteration < max:                   │
│    1. SearchHandler.execute(query)                           │
│    2. JudgeHandler.assess(evidence)  ◄── pydantic-ai Agent   │
│    3. if sufficient: break                                   │
│    4. query = judge.next_queries                             │
│                                                              │
│  return ReportGenerator.generate(evidence)                   │
└──────────────────────────────────────────────────────────────┘
```
|
| 74 |
+
|
| 75 |
+
**Components:**
|
| 76 |
+
- `src/orchestrator.py` - Simple loop orchestrator
|
| 77 |
+
- `src/agent_factory/judges.py` - JudgeHandler with pydantic-ai
|
| 78 |
+
- `src/tools/search_handler.py` - Scatter-gather search
|
| 79 |
+
- `src/tools/pubmed.py`, `clinicaltrials.py`, `europepmc.py` - Search tools
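
The loop in the diagram above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code in `src/orchestrator.py`: `Assessment`, `run_simple_loop`, and the stub `search` and `judge` callables are hypothetical stand-ins for `JudgeAssessment`, the orchestrator, `SearchHandler`, and `JudgeHandler`, and the report step is left out.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Awaitable, Callable


@dataclass
class Assessment:
    """Hypothetical stand-in for JudgeAssessment."""

    sufficient: bool
    next_queries: list[str] = field(default_factory=list)


async def run_simple_loop(
    question: str,
    search: Callable[[str], Awaitable[list[str]]],
    judge: Callable[[str, list[str]], Awaitable[Assessment]],
    max_iterations: int = 10,
) -> list[str]:
    """Search, judge, and refine the query until the judge is satisfied."""
    evidence: list[str] = []
    query = question
    for _ in range(max_iterations):
        evidence.extend(await search(query))
        assessment = await judge(question, evidence)
        if assessment.sufficient:
            break
        if assessment.next_queries:
            # Use the judge's first suggestion as the next search query.
            query = assessment.next_queries[0]
    return evidence


async def _demo() -> list[str]:
    async def search(query: str) -> list[str]:
        return [f"result for {query}"]

    async def judge(question: str, evidence: list[str]) -> Assessment:
        # Treat the evidence as sufficient once two items were collected.
        return Assessment(sufficient=len(evidence) >= 2, next_queries=["refined query"])

    return await run_simple_loop("original question", search, judge)


demo_evidence = asyncio.run(_demo())
```

Passing the handlers in as callables keeps the loop independent of any provider, which is the same property Simple Mode relies on when it swaps in the HuggingFace free tier.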
|
| 80 |
+
|
| 81 |
+
---
|
| 82 |
+
|
| 83 |
+
## 4. Advanced Mode Architecture (MS Agent Framework + pydantic-ai)
|
| 84 |
+
|
| 85 |
+
```text
┌─────────────────────────────────────────────────────────────────────┐
│               Microsoft Agent Framework Orchestrator                │
│                                                                     │
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────┐              │
│  │ SearchAgent │───▶│ JudgeAgent  │───▶│ ReportAgent │              │
│  │ (BaseAgent) │    │ (BaseAgent) │    │ (BaseAgent) │              │
│  └──────┬──────┘    └──────┬──────┘    └──────┬──────┘              │
│         │                  │                  │                     │
│         ▼                  ▼                  ▼                     │
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────┐              │
│  │ pydantic-ai │    │ pydantic-ai │    │ pydantic-ai │              │
│  │ Agent()     │    │ Agent()     │    │ Agent()     │              │
│  │ output_type=│    │ output_type=│    │ output_type=│              │
│  │ SearchResult│    │ JudgeAssess │    │ Report      │              │
│  └─────────────┘    └─────────────┘    └─────────────┘              │
│                                                                     │
│  Shared State: MagenticState (thread-safe via contextvars)          │
│    - evidence: list[Evidence]                                       │
│    - embedding_service: EmbeddingService                            │
└─────────────────────────────────────────────────────────────────────┘
```
|
| 107 |
+
|
| 108 |
+
**Components:**
|
| 109 |
+
- `src/orchestrator_magentic.py` - Multi-agent orchestrator
|
| 110 |
+
- `src/agents/search_agent.py` - SearchAgent (BaseAgent)
|
| 111 |
+
- `src/agents/judge_agent.py` - JudgeAgent (BaseAgent)
|
| 112 |
+
- `src/agents/report_agent.py` - ReportAgent (BaseAgent)
|
| 113 |
+
- `src/agents/analysis_agent.py` - AnalysisAgent (BaseAgent)
|
| 114 |
+
- `src/agents/state.py` - Thread-safe state management
|
| 115 |
+
- `src/agents/tools.py` - @ai_function decorated tools
|
| 116 |
+
|
| 117 |
+
---
|
| 118 |
+
|
| 119 |
+
## 5. Mode Selection Logic
|
| 120 |
+
|
| 121 |
+
```python
# src/orchestrator_factory.py (actual implementation)
from typing import Any, Literal

# SearchHandlerProtocol, JudgeHandlerProtocol, OrchestratorConfig, Orchestrator,
# _determine_mode and _get_magentic_orchestrator_class are defined or imported
# elsewhere in the project and are not shown here.


def create_orchestrator(
    search_handler: SearchHandlerProtocol | None = None,
    judge_handler: JudgeHandlerProtocol | None = None,
    config: OrchestratorConfig | None = None,
    mode: Literal["simple", "magentic", "advanced"] | None = None,
) -> Any:
    """
    Auto-select orchestrator based on available credentials.

    Priority:
    1. If mode explicitly set, use that
    2. If OpenAI key available -> Advanced Mode (currently OpenAI only)
    3. Otherwise -> Simple Mode (HuggingFace free tier)
    """
    effective_mode = _determine_mode(mode)

    if effective_mode == "advanced":
        # Resolved lazily so Simple Mode does not need the optional dependency
        orchestrator_cls = _get_magentic_orchestrator_class()
        return orchestrator_cls(max_rounds=config.max_iterations if config else 10)

    # Simple mode requires handlers
    if search_handler is None or judge_handler is None:
        raise ValueError("Simple mode requires search_handler and judge_handler")

    return Orchestrator(
        search_handler=search_handler,
        judge_handler=judge_handler,
        config=config,
    )
```
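
The `_determine_mode` helper called above is not shown in this spec. A minimal sketch of the documented priority order follows. The name `determine_mode`, the `has_openai_key` argument, and the treatment of `"magentic"` as a legacy alias for `"advanced"` are assumptions made for illustration, not the contents of `src/orchestrator_factory.py`.

```python
from typing import Literal, Optional

Mode = Literal["simple", "advanced"]


def determine_mode(requested: Optional[str], has_openai_key: bool) -> Mode:
    """Apply the priority order: explicit mode, then OpenAI key, then simple."""
    if requested is not None:
        # Assumption: "magentic" is kept as a legacy alias for "advanced".
        if requested in ("advanced", "magentic"):
            return "advanced"
        if requested == "simple":
            return "simple"
        raise ValueError(f"Unknown mode: {requested!r}")
    # No explicit request: auto-detect from the available credentials.
    return "advanced" if has_openai_key else "simple"
```

Taking the key check as a boolean argument keeps the function pure, so the auto-detect path and the explicit path can each be unit tested without touching environment variables.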
|
| 154 |
+
|
| 155 |
+
---
|
| 156 |
+
|
| 157 |
+
## 6. Shared Components (Both Modes Use)
|
| 158 |
+
|
| 159 |
+
These components work in both modes:
|
| 160 |
+
|
| 161 |
+
| Component | Purpose |
|-----------|---------|
| `src/tools/pubmed.py` | PubMed search |
| `src/tools/clinicaltrials.py` | ClinicalTrials.gov search |
| `src/tools/europepmc.py` | Europe PMC search |
| `src/tools/search_handler.py` | Scatter-gather orchestration |
| `src/tools/rate_limiter.py` | Rate limiting |
| `src/utils/models.py` | Evidence, Citation, JudgeAssessment |
| `src/utils/config.py` | Settings |
| `src/services/embeddings.py` | Vector search (optional) |
|
| 171 |
+
|
| 172 |
+
---
|
| 173 |
+
|
| 174 |
+
## 7. pydantic-ai Integration Points
|
| 175 |
+
|
| 176 |
+
Both modes use pydantic-ai for structured LLM outputs:
|
| 177 |
+
|
| 178 |
+
```python
# In JudgeHandler (both modes)
from typing import Any

from pydantic_ai import Agent

# The provider model classes are used inside get_model(), which is not shown here
from pydantic_ai.models.huggingface import HuggingFaceModel
from pydantic_ai.models.openai import OpenAIModel
from pydantic_ai.models.anthropic import AnthropicModel

# get_model, JudgeAssessment, Evidence, SYSTEM_PROMPT and format_prompt are
# project-level names defined elsewhere.


class JudgeHandler:
    def __init__(self, model: Any = None):
        self.model = model or get_model()  # Auto-selects based on config
        self.agent = Agent(
            model=self.model,
            output_type=JudgeAssessment,  # Structured output!
            system_prompt=SYSTEM_PROMPT,
        )

    async def assess(self, question: str, evidence: list[Evidence]) -> JudgeAssessment:
        result = await self.agent.run(format_prompt(question, evidence))
        return result.output  # Guaranteed to be JudgeAssessment
```
|
| 198 |
+
|
| 199 |
+
---
|
| 200 |
+
|
| 201 |
+
## 8. Microsoft Agent Framework Integration Points
|
| 202 |
+
|
| 203 |
+
Advanced mode wraps pydantic-ai agents in BaseAgent:
|
| 204 |
+
|
| 205 |
+
```python
# In JudgeAgent (advanced mode only)
from agent_framework import BaseAgent, AgentRunResponse, ChatMessage, Role

# JudgeHandlerProtocol, Evidence, extract_question and format_response are
# project-level names defined elsewhere.


class JudgeAgent(BaseAgent):
    def __init__(
        self,
        judge_handler: JudgeHandlerProtocol,
        evidence_store: dict[str, list[Evidence]],
    ):
        super().__init__(
            name="JudgeAgent",
            description="Evaluates evidence quality",
        )
        self._handler = judge_handler  # Uses pydantic-ai internally
        # Assumption: run() reads self._evidence_store, which the original
        # sketch never set, so the store is injected here.
        self._evidence_store = evidence_store

    async def run(self, messages, **kwargs) -> AgentRunResponse:
        question = extract_question(messages)
        evidence = self._evidence_store.get("current", [])

        # Delegate to pydantic-ai powered handler
        assessment = await self._handler.assess(question, evidence)

        return AgentRunResponse(
            messages=[ChatMessage(role=Role.ASSISTANT, text=format_response(assessment))],
            additional_properties={"assessment": assessment.model_dump()},
        )
```
|
| 229 |
+
|
| 230 |
+
---
|
| 231 |
+
|
| 232 |
+
## 9. Benefits of This Architecture
|
| 233 |
+
|
| 234 |
+
1. **Graceful Degradation**: Works without API keys (free tier)
|
| 235 |
+
2. **Progressive Enhancement**: Better with API keys (orchestration)
|
| 236 |
+
3. **Code Reuse**: pydantic-ai handlers shared between modes
|
| 237 |
+
4. **Hackathon Ready**: Demo works without requiring paid keys
|
| 238 |
+
5. **Production Ready**: Full orchestration available when needed
|
| 239 |
+
6. **Future Proof**: Can add more agents to advanced mode
|
| 240 |
+
7. **Testable**: Simple mode is easier to unit test
|
| 241 |
+
|
| 242 |
+
---
|
| 243 |
+
|
| 244 |
+
## 10. Known Risks and Mitigations
|
| 245 |
+
|
| 246 |
+
> **From Senior Agent Review**
|
| 247 |
+
|
| 248 |
+
### 10.1 Bridge Complexity (MEDIUM)
|
| 249 |
+
|
| 250 |
+
**Risk:** In Advanced Mode, agents (Agent Framework) wrap handlers (pydantic-ai). Both are async. Context variables (`MagenticState`) must propagate correctly through the pydantic-ai call stack.
|
| 251 |
+
|
| 252 |
+
**Mitigation:**
|
| 253 |
+
- pydantic-ai uses standard Python `contextvars`, which naturally propagate through `await` chains
|
| 254 |
+
- Test context propagation explicitly in integration tests
|
| 255 |
+
- If issues arise, pass state explicitly rather than via context vars
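
The propagation behavior can be checked with the standard library alone. The sketch below uses a hypothetical `current_evidence` variable in place of `MagenticState`. It shows that a value set before an `await` is visible in nested coroutines, and that a task started with `asyncio.create_task` receives a copy of the context, so rebinding the variable inside the task does not leak back to the caller.

```python
import asyncio
from contextvars import ContextVar

# Hypothetical stand-in for the state held by MagenticState.
current_evidence: ContextVar[list[str]] = ContextVar("current_evidence")


async def handler() -> list[str]:
    # Deep in the await chain: reads the value set by the caller.
    return current_evidence.get()


async def agent() -> list[str]:
    return await handler()


async def child_task() -> None:
    # Rebinding the variable here only affects this task's context copy.
    current_evidence.set(["changed in child"])


async def main() -> tuple[list[str], list[str]]:
    current_evidence.set(["paper-1"])
    seen_in_chain = await agent()
    await asyncio.create_task(child_task())
    seen_after_task = current_evidence.get()
    return seen_in_chain, seen_after_task


seen_in_chain, seen_after_task = asyncio.run(main())
```

The context copy is shallow: mutating a shared list in place would still be visible to every task, which is one reason to prefer passing state explicitly if isolation problems appear.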
|
| 256 |
+
|
| 257 |
+
### 10.2 Integration Drift (MEDIUM)
|
| 258 |
+
|
| 259 |
+
**Risk:** Simple Mode and Advanced Mode might diverge in behavior over time (e.g., Simple Mode uses logic A, Advanced Mode uses logic B).
|
| 260 |
+
|
| 261 |
+
**Mitigation:**
|
| 262 |
+
- Both modes MUST call the exact same underlying Tools (`src/tools/*`) and Handlers (`src/agent_factory/*`)
|
| 263 |
+
- Handlers are the single source of truth for business logic
|
| 264 |
+
- Agents are thin wrappers that delegate to handlers
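
One way to keep the two modes from drifting is to route both through the same handler object. The sketch below is illustrative only: `Verdict`, `SharedJudge`, `SimpleLoopStep`, and `AgentWrapper` are hypothetical names, and the real agents subclass the framework's `BaseAgent` rather than a plain class.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class Verdict:
    """Hypothetical stand-in for JudgeAssessment."""

    sufficient: bool
    reason: str


class SharedJudge:
    """Single source of truth for the judging logic."""

    async def assess(self, question: str, evidence: list[str]) -> Verdict:
        enough = len(evidence) >= 3
        return Verdict(enough, f"{len(evidence)} item(s) for {question!r}")


class SimpleLoopStep:
    """Simple mode: calls the handler directly."""

    def __init__(self, judge: SharedJudge) -> None:
        self._judge = judge

    async def step(self, question: str, evidence: list[str]) -> Verdict:
        return await self._judge.assess(question, evidence)


class AgentWrapper:
    """Advanced mode: a thin wrapper that only delegates and repackages."""

    def __init__(self, judge: SharedJudge) -> None:
        self._judge = judge

    async def run(self, question: str, evidence: list[str]) -> dict:
        verdict = await self._judge.assess(question, evidence)
        return {"text": verdict.reason, "assessment": verdict}


async def _compare() -> tuple[Verdict, Verdict]:
    judge = SharedJudge()
    evidence = ["a", "b", "c"]
    simple = await SimpleLoopStep(judge).step("q", evidence)
    advanced = await AgentWrapper(judge).run("q", evidence)
    return simple, advanced["assessment"]


simple_verdict, advanced_verdict = asyncio.run(_compare())
```

Because neither wrapper contains decision logic of its own, a parity check between the two results is enough to detect drift.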
|
| 265 |
+
|
| 266 |
+
### 10.3 Testing Burden (LOW-MEDIUM)
|
| 267 |
+
|
| 268 |
+
**Risk:** Two distinct orchestrators (`src/orchestrator.py` and `src/orchestrator_magentic.py`) double the integration-testing surface area.
|
| 269 |
+
|
| 270 |
+
**Mitigation:**
|
| 271 |
+
- Unit test handlers independently (shared code)
|
| 272 |
+
- Integration tests for each mode separately
|
| 273 |
+
- End-to-end tests verify same output for same input (determinism permitting)
|
| 274 |
+
|
| 275 |
+
### 10.4 Dependency Conflicts (LOW)
|
| 276 |
+
|
| 277 |
+
**Risk:** `agent-framework-core` might conflict with `pydantic-ai`'s dependencies (e.g., different pydantic versions).
|
| 278 |
+
|
| 279 |
+
**Status:** Both require pydantic 2.x, so they should be compatible.
|
| 280 |
+
|
| 281 |
+
---
|
| 282 |
+
|
| 283 |
+
## 11. Naming Clarification
|
| 284 |
+
|
| 285 |
+
> See `00_SITUATION_AND_PLAN.md` Section 4 for full details.
|
| 286 |
+
|
| 287 |
+
**Important:** The codebase uses "magentic" in file names (`orchestrator_magentic.py`, `magentic_agents.py`) but this refers to our internal naming for Microsoft Agent Framework integration, **NOT** the `magentic` PyPI package.
|
| 288 |
+
|
| 289 |
+
**Future action:** Rename to `orchestrator_advanced.py` to eliminate confusion.
@@ -0,0 +1,112 @@
| 1 |
+
# Implementation Phases: Dual-Mode Agent System
|
| 2 |
+
|
| 3 |
+
**Date:** November 27, 2025
|
| 4 |
+
**Status:** IMPLEMENTATION PLAN (REVISED)
|
| 5 |
+
**Strategy:** TDD (Test-Driven Development), SOLID Principles
|
| 6 |
+
**Dependency Strategy:** PyPI (agent-framework-core)
|
| 7 |
+
|
| 8 |
+
---
|
| 9 |
+
|
| 10 |
+
## Phase 0: Environment Validation & Cleanup
|
| 11 |
+
|
| 12 |
+
**Goal:** Ensure clean state and dependencies are correctly installed.
|
| 13 |
+
|
| 14 |
+
### Step 0.1: Verify PyPI Package
|
| 15 |
+
The `agent-framework-core` package is published on PyPI by Microsoft. Verify installation:
|
| 16 |
+
|
| 17 |
+
```bash
uv sync --all-extras
python -c "from agent_framework import ChatAgent; print('OK')"
```
|
| 21 |
+
|
| 22 |
+
### Step 0.2: Branch State
|
| 23 |
+
We are on `feat/dual-mode-architecture`. Ensure it is up to date with `origin/dev` before starting.
|
| 24 |
+
|
| 25 |
+
**Note:** The `reference_repos/agent-framework` folder is kept for reference/documentation only.
|
| 26 |
+
The production dependency uses the official PyPI release.
|
| 27 |
+
|
| 28 |
+
---
|
| 29 |
+
|
| 30 |
+
## Phase 1: Pydantic-AI Improvements (Simple Mode)
|
| 31 |
+
|
| 32 |
+
**Goal:** Implement `HuggingFaceModel` support in `JudgeHandler` using strict TDD.
|
| 33 |
+
|
| 34 |
+
### Step 1.1: Test First (Red)
|
| 35 |
+
Create `tests/unit/agent_factory/test_judges_factory.py`:
|
| 36 |
+
- Test `get_model()` returns `HuggingFaceModel` when `LLM_PROVIDER=huggingface`.
|
| 37 |
+
- Test `get_model()` respects `HF_TOKEN`.
|
| 38 |
+
- Test fallback to OpenAI.
|
| 39 |
+
|
| 40 |
+
### Step 1.2: Implementation (Green)
|
| 41 |
+
Update `src/utils/config.py`:
|
| 42 |
+
- Add `huggingface_model` and `hf_token` fields.
|
| 43 |
+
|
| 44 |
+
Update `src/agent_factory/judges.py`:
|
| 45 |
+
- Implement `get_model` with the logic derived from the tests.
|
| 46 |
+
- Use dependency injection for the model where possible.
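
The provider selection that the tests in Step 1.1 describe could be shaped like the sketch below. It is not the code in `judges.py`: the `llm_provider` and `openai_model` field names, the `ModelChoice` record, the function name `choose_model`, and the placeholder model names are all assumptions, and the real `get_model()` returns pydantic-ai model objects rather than a plain record.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical subset of the project settings."""

    llm_provider: str = "openai"                 # assumed field name
    openai_model: str = "example-openai-model"   # placeholder, not a real model id
    huggingface_model: str = "example-hf-model"  # placeholder, not a real model id
    hf_token: Optional[str] = None


@dataclass(frozen=True)
class ModelChoice:
    """Plain record standing in for a pydantic-ai model object."""

    provider: str
    model_name: str
    token: Optional[str] = None


def choose_model(settings: Settings) -> ModelChoice:
    """Pick HuggingFace when requested, otherwise fall back to OpenAI."""
    if settings.llm_provider.lower() == "huggingface":
        return ModelChoice("huggingface", settings.huggingface_model, settings.hf_token)
    return ModelChoice("openai", settings.openai_model)
```

Passing the settings in as an argument, instead of reading a global, is what lets each of the three test cases build its own configuration.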
|
| 47 |
+
|
| 48 |
+
### Step 1.3: Refactor
|
| 49 |
+
Ensure `JudgeHandler` is loosely coupled from the specific model provider.
|
| 50 |
+
|
| 51 |
+
---
|
| 52 |
+
|
| 53 |
+
## Phase 2: Orchestrator Factory (The Switch)
|
| 54 |
+
|
| 55 |
+
**Goal:** Implement the factory pattern to switch between Simple and Advanced modes.
|
| 56 |
+
|
| 57 |
+
### Step 2.1: Test First (Red)
|
| 58 |
+
Create `tests/unit/test_orchestrator_factory.py`:
|
| 59 |
+
- Test `create_orchestrator` returns `Orchestrator` (simple) when API keys are missing.
|
| 60 |
+
- Test `create_orchestrator` returns `MagenticOrchestrator` (advanced) when OpenAI key exists.
|
| 61 |
+
- Test explicit mode override.
|
| 62 |
+
|
| 63 |
+
### Step 2.2: Implementation (Green)
|
| 64 |
+
Update `src/orchestrator_factory.py` to implement the selection logic.
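
The commit notes for this change say `MagenticOrchestrator` is loaded lazily so that Simple Mode does not need the optional dependency. A generic sketch of that pattern with `importlib` follows. `load_optional_class` and its error message are hypothetical, and the demonstration uses standard-library modules so that it runs without `agent-framework-core` installed.

```python
import importlib
from typing import Any


def load_optional_class(module_name: str, class_name: str) -> Any:
    """Import a class only when it is needed, with a readable failure."""
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:
        raise RuntimeError(
            f"Advanced mode needs the optional module {module_name!r}; "
            "install the extra dependencies or use simple mode."
        ) from exc
    return getattr(module, class_name)


# Demonstration with standard-library modules only.
ordered_dict_cls = load_optional_class("collections", "OrderedDict")

try:
    load_optional_class("module_that_does_not_exist_123", "Anything")
    missing_error = None
except RuntimeError as err:
    missing_error = str(err)
```

Deferring the import to the call site means a missing optional package only surfaces when Advanced Mode is actually requested.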
|
| 65 |
+
|
| 66 |
+
---
|
| 67 |
+
|
| 68 |
+
## Phase 3: Agent Framework Integration (Advanced Mode)
|
| 69 |
+
|
| 70 |
+
**Goal:** Integrate Microsoft Agent Framework from PyPI.
|
| 71 |
+
|
| 72 |
+
### Step 3.1: Dependency Management
|
| 73 |
+
The `agent-framework-core` package is installed from PyPI:
|
| 74 |
+
```toml
|
| 75 |
+
[project.optional-dependencies]
|
| 76 |
+
magentic = [
|
| 77 |
+
"agent-framework-core>=1.0.0b251120,<2.0.0", # Microsoft Agent Framework (PyPI)
|
| 78 |
+
]
|
| 79 |
+
```
|
| 80 |
+
Install with: `uv sync --all-extras`
|
| 81 |
+
|
| 82 |
+
### Step 3.2: Verify Imports (Test First)
|
| 83 |
+
Create `tests/unit/agents/test_agent_imports.py`:
|
| 84 |
+
- Verify `from agent_framework import ChatAgent` works.
|
| 85 |
+
- Verify instantiation of `ChatAgent` with a mock client.
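A minimal sketch of the import check, assuming the optional extra may be absent in some environments. The helper name `module_available` is hypothetical; the `agent_framework` module name and the `ChatAgent` symbol are the ones named in this step.

```python
import importlib
import importlib.util


def module_available(name: str) -> bool:
    """Return True if the top-level module `name` can be found."""
    return importlib.util.find_spec(name) is not None


def test_chat_agent_importable() -> None:
    if not module_available("agent_framework"):
        # Optional extra not installed: nothing to verify in this environment.
        return
    module = importlib.import_module("agent_framework")
    assert hasattr(module, "ChatAgent")
```

Under pytest, `pytest.importorskip("agent_framework")` would report the case as skipped rather than silently passing, which is probably preferable in the real test file.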
|
| 86 |
+
|
| 87 |
+
### Step 3.3: Update Agents
|
| 88 |
+
Refactor `src/agents/*.py` to ensure they match the exact signature of the `ChatAgent` class provided by the installed `agent-framework-core` package.
|
| 89 |
+
- **SOLID:** Ensure agents have single responsibilities.
|
| 90 |
+
- **DRY:** Share tool definitions between Pydantic-AI simple mode and Agent Framework advanced mode.
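The DRY point could look roughly like the following, where both modes resolve a tool from one shared registry. Everything here is a hypothetical illustration: `search_evidence`, `SHARED_TOOLS`, and the two mode functions are invented names, and the real tools live under `src/tools/`.

```python
from __future__ import annotations

import asyncio
from collections.abc import Awaitable, Callable


async def search_evidence(query: str, max_results: int = 3) -> list[str]:
    """Hypothetical shared tool: the single place where search logic lives."""
    return [f"result {i} for {query!r}" for i in range(1, max_results + 1)]


# Hypothetical registry both modes read from, so neither redefines a tool.
SHARED_TOOLS: dict[str, Callable[..., Awaitable[list[str]]]] = {
    "search_evidence": search_evidence,
}


async def simple_mode_search(query: str) -> list[str]:
    """Simple-mode path: call the shared tool directly."""
    return await SHARED_TOOLS["search_evidence"](query)


async def advanced_mode_search(query: str) -> list[str]:
    """Advanced-mode path: an agent wrapper would delegate to the same tool."""
    return await SHARED_TOOLS["search_evidence"](query)


simple_hits = asyncio.run(simple_mode_search("metformin"))
advanced_hits = asyncio.run(advanced_mode_search("metformin"))
```

With a single definition per tool, a behavioral difference between the two modes can only come from orchestration, not from the tools themselves.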
|
| 91 |
+
|
| 92 |
+
---
|
| 93 |
+
|
| 94 |
+
## Phase 4: UI & End-to-End Verification
|
| 95 |
+
|
| 96 |
+
**Goal:** Update Gradio to reflect the active mode.
|
| 97 |
+
|
| 98 |
+
### Step 4.1: UI Updates
|
| 99 |
+
Update `src/app.py` to display "Simple Mode" vs "Advanced Mode".
|
| 100 |
+
|
| 101 |
+
### Step 4.2: End-to-End Test
|
| 102 |
+
Run the full loop:
|
| 103 |
+
1. Simple Mode (No Keys) -> Search -> Judge (HF) -> Report.
|
| 104 |
+
2. Advanced Mode (OpenAI Key) -> SearchAgent -> JudgeAgent -> ReportAgent.
|
| 105 |
+
|
| 106 |
+
---
|
| 107 |
+
|
| 108 |
+
## Phase 5: Cleanup & Documentation
|
| 109 |
+
|
| 110 |
+
- Remove unused code.
|
| 111 |
+
- Update main README.md.
|
| 112 |
+
- Final `make check`.
|
|
@@ -0,0 +1,112 @@
|
| 1 |
+
# Immediate Actions Checklist
|
| 2 |
+
|
| 3 |
+
**Date:** November 27, 2025
|
| 4 |
+
**Priority:** Execute in order
|
| 5 |
+
|
| 6 |
+
---
|
| 7 |
+
|
| 8 |
+
## Before Starting Implementation
|
| 9 |
+
|
| 10 |
+
### 1. Close PR #41 (CRITICAL)
|
| 11 |
+
|
| 12 |
+
```bash
|
| 13 |
+
gh pr close 41 --comment "Architecture decision changed. Cherry-picking improvements to preserve both pydantic-ai and Agent Framework capabilities."
|
| 14 |
+
```
|
| 15 |
+
|
| 16 |
+
### 2. Verify HuggingFace Spaces is Safe
|
| 17 |
+
|
| 18 |
+
```bash
|
| 19 |
+
# Should show agent framework files exist
|
| 20 |
+
git ls-tree --name-only huggingface-upstream/dev -- src/agents/
|
| 21 |
+
git ls-tree --name-only huggingface-upstream/dev -- src/orchestrator_magentic.py
|
| 22 |
+
```
|
| 23 |
+
|
| 24 |
+
Expected output: Files should exist (they do as of this writing).
|
| 25 |
+
|
| 26 |
+
### 3. Clean Local Environment
|
| 27 |
+
|
| 28 |
+
```bash
|
| 29 |
+
# Switch to main first
|
| 30 |
+
git checkout main
|
| 31 |
+
|
| 32 |
+
# Delete problematic branches
|
| 33 |
+
git branch -D refactor/pydantic-unification 2>/dev/null || true
|
| 34 |
+
git branch -D feat/pubmed-fulltext 2>/dev/null || true
|
| 35 |
+
|
| 36 |
+
# Reset local dev to origin/dev
|
| 37 |
+
git branch -D dev 2>/dev/null || true
|
| 38 |
+
git checkout -b dev origin/dev
|
| 39 |
+
|
| 40 |
+
# Verify agent framework code exists
|
| 41 |
+
ls src/agents/
|
| 42 |
+
# Expected: __init__.py, analysis_agent.py, hypothesis_agent.py, judge_agent.py,
|
| 43 |
+
# magentic_agents.py, report_agent.py, search_agent.py, state.py, tools.py
|
| 44 |
+
|
| 45 |
+
ls src/orchestrator_magentic.py
|
| 46 |
+
# Expected: file exists
|
| 47 |
+
```
|
| 48 |
+
|
| 49 |
+
### 4. Create Fresh Feature Branch
|
| 50 |
+
|
| 51 |
+
```bash
|
| 52 |
+
git checkout -b feat/dual-mode-architecture origin/dev
|
| 53 |
+
```
|
| 54 |
+
|
| 55 |
+
---
|
| 56 |
+
|
| 57 |
+
## Decision Points
|
| 58 |
+
|
| 59 |
+
Before proceeding, confirm:
|
| 60 |
+
|
| 61 |
+
1. **For hackathon**: Do we need advanced mode, or is simple mode sufficient?
|
| 62 |
+
- Simple mode = faster to implement, works today
|
| 63 |
+
- Advanced mode = better quality, more work
|
| 64 |
+
|
| 65 |
+
2. **Timeline**: How much time do we have?
|
| 66 |
+
- If < 1 day: Focus on simple mode only
|
| 67 |
+
- If > 1 day: Implement dual-mode
|
| 68 |
+
|
| 69 |
+
3. **Dependencies**: Is `agent-framework-core` available?
|
| 70 |
+
- Check: `pip index versions --pre agent-framework-core` (`--pre` includes pre-releases such as `1.0.0b251120`)
|
| 71 |
+
- If not on PyPI, may need to install from GitHub
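For the dependency question, the local environment can also be checked from Python. This is a small sketch using the standard library's `importlib.metadata`; the helper name `installed_version` is hypothetical, and the check only reports what is installed locally, not what PyPI offers.

```python
from __future__ import annotations

from importlib import metadata


def installed_version(distribution: str) -> str | None:
    """Return the installed version of `distribution`, or None if absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("agent-framework-core")
print(version or "agent-framework-core is not installed in this environment")
```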
|
| 72 |
+
|
| 73 |
+
---
|
| 74 |
+
|
| 75 |
+
## Quick Start (Simple Mode Only)
|
| 76 |
+
|
| 77 |
+
If time is limited, implement only simple mode improvements:
|
| 78 |
+
|
| 79 |
+
```bash
|
| 80 |
+
# On feat/dual-mode-architecture branch
|
| 81 |
+
|
| 82 |
+
# 1. Update judges.py to add HuggingFace support
|
| 83 |
+
# 2. Update config.py to add HF settings
|
| 84 |
+
# 3. Create free_tier_demo.py
|
| 85 |
+
# 4. Run make check
|
| 86 |
+
# 5. Create PR to dev
|
| 87 |
+
```
|
| 88 |
+
|
| 89 |
+
This gives you free-tier capability without touching agent framework code.
|
| 90 |
+
|
| 91 |
+
---
|
| 92 |
+
|
| 93 |
+
## Quick Start (Full Dual-Mode)
|
| 94 |
+
|
| 95 |
+
If time permits, implement full dual-mode:
|
| 96 |
+
|
| 97 |
+
Follow phases 1-6 in `02_IMPLEMENTATION_PHASES.md`
|
| 98 |
+
|
| 99 |
+
---
|
| 100 |
+
|
| 101 |
+
## Emergency Rollback
|
| 102 |
+
|
| 103 |
+
If anything goes wrong:
|
| 104 |
+
|
| 105 |
+
```bash
|
| 106 |
+
# Reset to safe state
|
| 107 |
+
git checkout main
|
| 108 |
+
git branch -D feat/dual-mode-architecture
|
| 109 |
+
git checkout -b feat/dual-mode-architecture origin/dev
|
| 110 |
+
```
|
| 111 |
+
|
| 112 |
+
`origin/dev` is the safe fallback: it still has the agent framework code intact.
|
|
@@ -0,0 +1,158 @@
|
| 1 |
+
# Follow-Up Review Request: Did We Implement Your Feedback?
|
| 2 |
+
|
| 3 |
+
**Date:** November 27, 2025
|
| 4 |
+
**Context:** You previously reviewed our dual-mode architecture plan and provided feedback. We have updated the documentation. Please verify we correctly implemented your recommendations.
|
| 5 |
+
|
| 6 |
+
---
|
| 7 |
+
|
| 8 |
+
## Your Original Feedback vs Our Changes
|
| 9 |
+
|
| 10 |
+
### 1. Naming Confusion Clarification
|
| 11 |
+
|
| 12 |
+
**Your feedback:** "You are using Microsoft Agent Framework, but you've named your integration 'Magentic'. This caused the confusion."
|
| 13 |
+
|
| 14 |
+
**Our change:** Added Section 4 in `00_SITUATION_AND_PLAN.md`:
|
| 15 |
+
```markdown
|
| 16 |
+
## 4. CRITICAL: Naming Confusion Clarification
|
| 17 |
+
|
| 18 |
+
> **Senior Agent Review Finding:** The codebase uses "magentic" in file names
|
| 19 |
+
> (e.g., `orchestrator_magentic.py`, `magentic_agents.py`) but this is **NOT**
|
| 20 |
+
> the `magentic` PyPI package by Jacky Liang. It's Microsoft Agent Framework.
|
| 21 |
+
|
| 22 |
+
**The naming confusion:**
|
| 23 |
+
- `magentic` (PyPI package): A different library for structured LLM outputs
|
| 24 |
+
- "Magentic" (in our codebase): Our internal name for Microsoft Agent Framework integration
|
| 25 |
+
- `agent-framework-core` (PyPI package): Microsoft's actual multi-agent orchestration framework
|
| 26 |
+
|
| 27 |
+
**Recommended future action:** Rename `orchestrator_magentic.py` → `orchestrator_advanced.py`
|
| 28 |
+
```
|
| 29 |
+
|
| 30 |
+
**Status:** ✅ IMPLEMENTED
|
| 31 |
+
|
| 32 |
+
---
|
| 33 |
+
|
| 34 |
+
### 2. Bridge Complexity Warning
|
| 35 |
+
|
| 36 |
+
**Your feedback:** "You must ensure MagenticState (context vars) propagates correctly through the pydantic-ai call stack."
|
| 37 |
+
|
| 38 |
+
**Our change:** Added Section 10.1 in `01_ARCHITECTURE_SPEC.md`:
|
| 39 |
+
```markdown
|
| 40 |
+
### 10.1 Bridge Complexity (MEDIUM)
|
| 41 |
+
|
| 42 |
+
**Risk:** In Advanced Mode, agents (Agent Framework) wrap handlers (pydantic-ai).
|
| 43 |
+
Both are async. Context variables (`MagenticState`) must propagate correctly.
|
| 44 |
+
|
| 45 |
+
**Mitigation:**
|
| 46 |
+
- pydantic-ai uses standard Python `contextvars`, which naturally propagate through `await` chains
|
| 47 |
+
- Test context propagation explicitly in integration tests
|
| 48 |
+
- If issues arise, pass state explicitly rather than via context vars
|
| 49 |
+
```
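The propagation claim in the mitigation can be checked with a small standalone experiment. This is a sketch under stated assumptions: `state_var` is a hypothetical stand-in for the project's `MagenticState` context variable, and the two coroutine layers only imitate an agent calling a handler.

```python
import asyncio
import contextvars

# Hypothetical stand-in for the project's MagenticState context variable.
state_var: contextvars.ContextVar[dict[str, str]] = contextvars.ContextVar("state")


async def handler_layer() -> str:
    """Innermost call: reads the state set by the outer layer."""
    return state_var.get()["query"]


async def agent_layer() -> str:
    """Middle layer: a plain await keeps the caller's context."""
    return await handler_layer()


async def main() -> tuple[str, str]:
    state_var.set({"query": "metformin"})
    via_await = await agent_layer()
    # create_task runs the coroutine in a copy of the current context,
    # so values set before the task is created are visible inside it.
    via_task = await asyncio.create_task(agent_layer())
    return via_await, via_task


via_await, via_task = asyncio.run(main())
```

One caveat worth testing explicitly: because a task works on a copy of the context, a `state_var.set(...)` performed inside the task is not seen by the code that spawned it.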
|
| 50 |
+
|
| 51 |
+
**Status:** ✅ IMPLEMENTED
|
| 52 |
+
|
| 53 |
+
---
|
| 54 |
+
|
| 55 |
+
### 3. Integration Drift Warning
|
| 56 |
+
|
| 57 |
+
**Your feedback:** "Simple Mode and Advanced Mode might diverge in behavior."
|
| 58 |
+
|
| 59 |
+
**Our change:** Added Section 10.2 in `01_ARCHITECTURE_SPEC.md`:
|
| 60 |
+
```markdown
|
| 61 |
+
### 10.2 Integration Drift (MEDIUM)
|
| 62 |
+
|
| 63 |
+
**Risk:** Simple Mode and Advanced Mode might diverge in behavior over time.
|
| 64 |
+
|
| 65 |
+
**Mitigation:**
|
| 66 |
+
- Both modes MUST call the exact same underlying Tools (`src/tools/*`) and Handlers (`src/agent_factory/*`)
|
| 67 |
+
- Handlers are the single source of truth for business logic
|
| 68 |
+
- Agents are thin wrappers that delegate to handlers
|
| 69 |
+
```
|
| 70 |
+
|
| 71 |
+
**Status:** ✅ IMPLEMENTED
|
| 72 |
+
|
| 73 |
+
---
|
| 74 |
+
|
| 75 |
+
### 4. Testing Burden Warning
|
| 76 |
+
|
| 77 |
+
**Your feedback:** "You now have two distinct orchestrators to maintain. This doubles your integration testing surface area."
|
| 78 |
+
|
| 79 |
+
**Our change:** Added Section 10.3 in `01_ARCHITECTURE_SPEC.md`:
|
| 80 |
+
```markdown
|
| 81 |
+
### 10.3 Testing Burden (LOW-MEDIUM)
|
| 82 |
+
|
| 83 |
+
**Risk:** Two distinct orchestrators double the integration testing surface area.
|
| 84 |
+
|
| 85 |
+
**Mitigation:**
|
| 86 |
+
- Unit test handlers independently (shared code)
|
| 87 |
+
- Integration tests for each mode separately
|
| 88 |
+
- End-to-end tests verify same output for same input
|
| 89 |
+
```
|
| 90 |
+
|
| 91 |
+
**Status:** ✅ IMPLEMENTED
|
| 92 |
+
|
| 93 |
+
---
|
| 94 |
+
|
| 95 |
+
### 5. Rename Recommendation
|
| 96 |
+
|
| 97 |
+
**Your feedback:** "Rename `src/orchestrator_magentic.py` to `src/orchestrator_advanced.py`"
|
| 98 |
+
|
| 99 |
+
**Our change:** Added Step 3.4 in `02_IMPLEMENTATION_PHASES.md`:
|
| 100 |
+
```markdown
|
| 101 |
+
### Step 3.4: (OPTIONAL) Rename "Magentic" to "Advanced"
|
| 102 |
+
|
| 103 |
+
> **Senior Agent Recommendation:** Rename files to eliminate confusion.
|
| 104 |
+
|
| 105 |
+
git mv src/orchestrator_magentic.py src/orchestrator_advanced.py
|
| 106 |
+
git mv src/agents/magentic_agents.py src/agents/advanced_agents.py
|
| 107 |
+
|
| 108 |
+
**Note:** This is optional for the hackathon. Can be done in a follow-up PR.
|
| 109 |
+
```
|
| 110 |
+
|
| 111 |
+
**Status:** ✅ DOCUMENTED (marked as optional for hackathon)
|
| 112 |
+
|
| 113 |
+
---
|
| 114 |
+
|
| 115 |
+
### 6. Standardize Wrapper Recommendation
|
| 116 |
+
|
| 117 |
+
**Your feedback:** "Create a generic `PydanticAiAgentWrapper(BaseAgent)` class instead of manually wrapping each handler."
|
| 118 |
+
|
| 119 |
+
**Our change:** NOT YET DOCUMENTED
|
| 120 |
+
|
| 121 |
+
**Status:** ⚠️ NOT IMPLEMENTED - Should we add this?
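To make the open question concrete, a generic wrapper might look something like the sketch below. It is deliberately framework-free: `BaseAgentSketch` is a hypothetical stand-in and NOT the Agent Framework's real base class, whose required methods and signatures would need to be checked before adopting this shape.

```python
from __future__ import annotations

import asyncio
from abc import ABC, abstractmethod
from typing import Any, Awaitable, Callable


class BaseAgentSketch(ABC):
    """Hypothetical base class; not the Agent Framework's real BaseAgent."""

    def __init__(self, name: str) -> None:
        self.name = name

    @abstractmethod
    async def run(self, message: str) -> Any: ...


class HandlerAgentWrapper(BaseAgentSketch):
    """Generic adapter: exposes any async handler callable as an agent."""

    def __init__(self, name: str, handler: Callable[[str], Awaitable[Any]]) -> None:
        super().__init__(name)
        self._handler = handler

    async def run(self, message: str) -> Any:
        # All business logic stays in the handler; the wrapper only delegates.
        return await self._handler(message)


async def judge_handler(message: str) -> dict[str, Any]:
    """Illustrative handler returning a structured result."""
    return {"question": message, "sufficient": False}


judge_agent = HandlerAgentWrapper("judge", judge_handler)
result = asyncio.run(judge_agent.run("Does drug X help condition Y?"))
```

A single adapter like this would replace one hand-written wrapper per handler, at the cost of a less specific type for each agent's output.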
|
| 122 |
+
|
| 123 |
+
---
|
| 124 |
+
|
| 125 |
+
## Questions for Your Review
|
| 126 |
+
|
| 127 |
+
1. **Did we correctly implement your feedback?** Are there any misunderstandings in how we interpreted your recommendations?
|
| 128 |
+
|
| 129 |
+
2. **Is the "Standardize Wrapper" recommendation critical?** Should we add it to the implementation phases, or is it a nice-to-have for later?
|
| 130 |
+
|
| 131 |
+
3. **Dependency versioning:** You noted `agent-framework-core>=1.0.0b251120` might be ephemeral. Should we:
|
| 132 |
+
- Pin to a specific version?
|
| 133 |
+
- Use a version range?
|
| 134 |
+
- Install from GitHub source?
|
| 135 |
+
|
| 136 |
+
4. **Anything else we missed?**
|
| 137 |
+
|
| 138 |
+
---
|
| 139 |
+
|
| 140 |
+
## Files to Re-Review
|
| 141 |
+
|
| 142 |
+
1. `00_SITUATION_AND_PLAN.md` - Added Section 4 (Naming Clarification)
|
| 143 |
+
2. `01_ARCHITECTURE_SPEC.md` - Added Sections 10-11 (Risks, Naming)
|
| 144 |
+
3. `02_IMPLEMENTATION_PHASES.md` - Added Step 3.4 (Optional Rename)
|
| 145 |
+
|
| 146 |
+
---
|
| 147 |
+
|
| 148 |
+
## Current Branch State
|
| 149 |
+
|
| 150 |
+
We are now on `feat/dual-mode-architecture` branched from `origin/dev`:
|
| 151 |
+
- ✅ Agent framework code intact (`src/agents/`, `src/orchestrator_magentic.py`)
|
| 152 |
+
- ✅ Documentation committed
|
| 153 |
+
- ❌ PR #41 still open (need to close it)
|
| 154 |
+
- ❌ Cherry-pick of pydantic-ai improvements not yet done
|
| 155 |
+
|
| 156 |
+
---
|
| 157 |
+
|
| 158 |
+
Please confirm: **GO / NO-GO** to proceed with Phase 1 (cherry-picking pydantic-ai improvements)?
|
|
@@ -0,0 +1,113 @@
|
| 1 |
+
# Senior Agent Review Prompt
|
| 2 |
+
|
| 3 |
+
Copy and paste everything below this line to a fresh Claude/AI session:
|
| 4 |
+
|
| 5 |
+
---
|
| 6 |
+
|
| 7 |
+
## Context
|
| 8 |
+
|
| 9 |
+
I am a junior developer working on a HuggingFace hackathon project called DeepCritical. We made a significant architectural mistake and are now trying to course-correct. I need you to act as a **senior staff engineer** and critically review our proposed solution.
|
| 10 |
+
|
| 11 |
+
## The Situation
|
| 12 |
+
|
| 13 |
+
We almost merged a refactor that would have **deleted** our multi-agent orchestration capability, mistakenly believing that `pydantic-ai` (a library for structured LLM outputs) and Microsoft's `agent-framework` (a framework for multi-agent orchestration) were mutually exclusive alternatives.
|
| 14 |
+
|
| 15 |
+
**They are not.** They are complementary:
|
| 16 |
+
- `pydantic-ai` ensures LLM responses match Pydantic schemas (type-safe outputs)
|
| 17 |
+
- `agent-framework` orchestrates multiple agents working together (coordination layer)
|
| 18 |
+
|
| 19 |
+
We now want to implement a **dual-mode architecture** where:
|
| 20 |
+
- **Simple Mode (No API key):** Uses only pydantic-ai with HuggingFace free tier
|
| 21 |
+
- **Advanced Mode (With API key):** Uses Microsoft Agent Framework for orchestration, with pydantic-ai inside each agent for structured outputs
|
| 22 |
+
|
| 23 |
+
## Your Task
|
| 24 |
+
|
| 25 |
+
Please perform a **deep, critical review** of:
|
| 26 |
+
|
| 27 |
+
1. **The architecture diagram** (image attached: `assets/magentic-pydantic.png`)
|
| 28 |
+
2. **Our documentation** (4 files listed below)
|
| 29 |
+
3. **The actual codebase** to verify our claims
|
| 30 |
+
|
| 31 |
+
## Specific Questions to Answer
|
| 32 |
+
|
| 33 |
+
### Architecture Validation
|
| 34 |
+
1. Is our understanding correct that pydantic-ai and agent-framework are complementary, not competing?
|
| 35 |
+
2. Does the dual-mode architecture diagram accurately represent how these should integrate?
|
| 36 |
+
3. Are there any architectural flaws or anti-patterns in our proposed design?
|
| 37 |
+
|
| 38 |
+
### Documentation Accuracy
|
| 39 |
+
4. Are the branch states we documented accurate? (Check `git log`, `git ls-tree`)
|
| 40 |
+
5. Is our understanding of what code exists where correct?
|
| 41 |
+
6. Are the implementation phases realistic and in the correct order?
|
| 42 |
+
7. Are there any missing steps or dependencies we overlooked?
|
| 43 |
+
|
| 44 |
+
### Codebase Reality Check
|
| 45 |
+
8. Does `origin/dev` actually have the agent framework code intact? Verify by checking:
|
| 46 |
+
- `git ls-tree origin/dev -- src/agents/`
|
| 47 |
+
- `git ls-tree origin/dev -- src/orchestrator_magentic.py`
|
| 48 |
+
9. What does the current `src/agents/` code actually import? Does it use `agent_framework` or `agent-framework-core`?
|
| 49 |
+
10. Is the `agent-framework-core` package actually available on PyPI, or do we need to install from source?
|
| 50 |
+
|
| 51 |
+
### Implementation Feasibility
|
| 52 |
+
11. Can the cherry-pick strategy we outlined actually work, or are there merge conflicts we're not seeing?
|
| 53 |
+
12. Is the mode auto-detection logic sound?
|
| 54 |
+
13. What are the risks we haven't identified?
|
| 55 |
+
|
| 56 |
+
### Critical Errors Check
|
| 57 |
+
14. Did we miss anything critical in our analysis?
|
| 58 |
+
15. Are there any factual errors in our documentation?
|
| 59 |
+
16. Would a Google/DeepMind senior engineer approve this plan, or would they flag issues?
|
| 60 |
+
|
| 61 |
+
## Files to Review
|
| 62 |
+
|
| 63 |
+
Please read these files in order:
|
| 64 |
+
|
| 65 |
+
1. `/Users/ray/Desktop/CLARITY-DIGITAL-TWIN/DeepCritical-1/docs/brainstorming/magentic-pydantic/00_SITUATION_AND_PLAN.md`
|
| 66 |
+
2. `/Users/ray/Desktop/CLARITY-DIGITAL-TWIN/DeepCritical-1/docs/brainstorming/magentic-pydantic/01_ARCHITECTURE_SPEC.md`
|
| 67 |
+
3. `/Users/ray/Desktop/CLARITY-DIGITAL-TWIN/DeepCritical-1/docs/brainstorming/magentic-pydantic/02_IMPLEMENTATION_PHASES.md`
|
| 68 |
+
4. `/Users/ray/Desktop/CLARITY-DIGITAL-TWIN/DeepCritical-1/docs/brainstorming/magentic-pydantic/03_IMMEDIATE_ACTIONS.md`
|
| 69 |
+
|
| 70 |
+
And the architecture diagram:
|
| 71 |
+
5. `/Users/ray/Desktop/CLARITY-DIGITAL-TWIN/DeepCritical-1/assets/magentic-pydantic.png`
|
| 72 |
+
|
| 73 |
+
## Reference Repositories to Consult
|
| 74 |
+
|
| 75 |
+
We have local clones of the source-of-truth repositories:
|
| 76 |
+
|
| 77 |
+
- **Original DeepCritical:** `/Users/ray/Desktop/CLARITY-DIGITAL-TWIN/DeepCritical-1/reference_repos/DeepCritical/`
|
| 78 |
+
- **Microsoft Agent Framework:** `/Users/ray/Desktop/CLARITY-DIGITAL-TWIN/DeepCritical-1/reference_repos/agent-framework/`
|
| 79 |
+
- **Microsoft AutoGen:** `/Users/ray/Desktop/CLARITY-DIGITAL-TWIN/DeepCritical-1/reference_repos/autogen-microsoft/`
|
| 80 |
+
|
| 81 |
+
Please cross-reference our hackathon fork against these to verify architectural alignment.
|
| 82 |
+
|
| 83 |
+
## Codebase to Analyze
|
| 84 |
+
|
| 85 |
+
Our hackathon fork is at:
|
| 86 |
+
`/Users/ray/Desktop/CLARITY-DIGITAL-TWIN/DeepCritical-1/`
|
| 87 |
+
|
| 88 |
+
Key files to examine:
|
| 89 |
+
- `src/agents/` - Agent framework integration
|
| 90 |
+
- `src/agent_factory/judges.py` - pydantic-ai integration
|
| 91 |
+
- `src/orchestrator.py` - Simple mode orchestrator
|
| 92 |
+
- `src/orchestrator_magentic.py` - Advanced mode orchestrator
|
| 93 |
+
- `src/orchestrator_factory.py` - Mode selection
|
| 94 |
+
- `pyproject.toml` - Dependencies
|
| 95 |
+
|
| 96 |
+
## Expected Output
|
| 97 |
+
|
| 98 |
+
Please provide:
|
| 99 |
+
|
| 100 |
+
1. **Validation Summary:** Is our plan sound? (YES/NO with explanation)
|
| 101 |
+
2. **Errors Found:** List any factual errors in our documentation
|
| 102 |
+
3. **Missing Items:** What did we overlook?
|
| 103 |
+
4. **Risk Assessment:** What could go wrong?
|
| 104 |
+
5. **Recommended Changes:** Specific edits to our documentation or plan
|
| 105 |
+
6. **Go/No-Go Recommendation:** Should we proceed with this plan?
|
| 106 |
+
|
| 107 |
+
## Tone
|
| 108 |
+
|
| 109 |
+
Be brutally honest. If our plan is flawed, say so directly. We would rather know now than after implementation. Don't soften criticism - we need accuracy.
|
| 110 |
+
|
| 111 |
+
---
|
| 112 |
+
|
| 113 |
+
END OF PROMPT
|
|
@@ -44,7 +44,7 @@ dev = [
|
|
| 44 |
"pre-commit>=3.7",
|
| 45 |
]
|
| 46 |
magentic = [
|
| 47 |
-
"agent-framework-core>=1.0.0b251120,<2.0.0", #
|
| 48 |
]
|
| 49 |
embeddings = [
|
| 50 |
"chromadb>=0.4.0",
|
|
|
|
| 44 |
"pre-commit>=3.7",
|
| 45 |
]
|
| 46 |
magentic = [
|
| 47 |
+
"agent-framework-core>=1.0.0b251120,<2.0.0", # Microsoft Agent Framework (PyPI)
|
| 48 |
]
|
| 49 |
embeddings = [
|
| 50 |
"chromadb>=0.4.0",
|
|
@@ -8,8 +8,10 @@ import structlog
|
|
| 8 |
from huggingface_hub import InferenceClient
|
| 9 |
from pydantic_ai import Agent
|
| 10 |
from pydantic_ai.models.anthropic import AnthropicModel
|
|
|
|
| 11 |
from pydantic_ai.models.openai import OpenAIModel
|
| 12 |
from pydantic_ai.providers.anthropic import AnthropicProvider
|
|
|
|
| 13 |
from pydantic_ai.providers.openai import OpenAIProvider
|
| 14 |
from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_exponential
|
| 15 |
|
|
@@ -36,6 +38,12 @@ def get_model() -> Any:
|
|
| 36 |
provider = AnthropicProvider(api_key=settings.anthropic_api_key)
|
| 37 |
return AnthropicModel(settings.anthropic_model, provider=provider)
|
| 38 |
|
| 39 |
if llm_provider != "openai":
|
| 40 |
logger.warning("Unknown LLM provider, defaulting to OpenAI", provider=llm_provider)
|
| 41 |
|
|
|
|
| 8 |
from huggingface_hub import InferenceClient
|
| 9 |
from pydantic_ai import Agent
|
| 10 |
from pydantic_ai.models.anthropic import AnthropicModel
|
| 11 |
+
from pydantic_ai.models.huggingface import HuggingFaceModel
|
| 12 |
from pydantic_ai.models.openai import OpenAIModel
|
| 13 |
from pydantic_ai.providers.anthropic import AnthropicProvider
|
| 14 |
+
from pydantic_ai.providers.huggingface import HuggingFaceProvider
|
| 15 |
from pydantic_ai.providers.openai import OpenAIProvider
|
| 16 |
from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_exponential
|
| 17 |
|
|
|
|
| 38 |
provider = AnthropicProvider(api_key=settings.anthropic_api_key)
|
| 39 |
return AnthropicModel(settings.anthropic_model, provider=provider)
|
| 40 |
|
| 41 |
+
if llm_provider == "huggingface":
|
| 42 |
+
# Free tier - uses HF_TOKEN from environment if available
|
| 43 |
+
model_name = settings.huggingface_model or "meta-llama/Llama-3.1-70B-Instruct"
|
| 44 |
+
hf_provider = HuggingFaceProvider(api_key=settings.hf_token)
|
| 45 |
+
return HuggingFaceModel(model_name, provider=hf_provider)
|
| 46 |
+
|
| 47 |
if llm_provider != "openai":
|
| 48 |
logger.warning("Unknown LLM provider, defaulting to OpenAI", provider=llm_provider)
|
| 49 |
|
|
@@ -31,7 +31,7 @@ def configure_orchestrator(
|
|
| 31 |
|
| 32 |
Args:
|
| 33 |
use_mock: If True, use MockJudgeHandler (no API key needed)
|
| 34 |
-
mode: Orchestrator mode ("simple" or "
|
| 35 |
user_api_key: Optional user-provided API key (BYOK)
|
| 36 |
api_provider: API provider ("openai" or "anthropic")
|
| 37 |
|
|
@@ -115,7 +115,7 @@ async def research_agent(
|
|
| 115 |
Args:
|
| 116 |
message: User's research question
|
| 117 |
history: Chat history (Gradio format)
|
| 118 |
-
mode: Orchestrator mode ("simple" or "
|
| 119 |
api_key: Optional user-provided API key (BYOK - Bring Your Own Key)
|
| 120 |
api_provider: API provider ("openai" or "anthropic")
|
| 121 |
|
|
@@ -135,10 +135,11 @@ async def research_agent(
|
|
| 135 |
has_user_key = bool(user_api_key)
|
| 136 |
has_paid_key = has_openai or has_anthropic or has_user_key
|
| 137 |
|
| 138 |
-
#
|
| 139 |
-
if mode == "
|
| 140 |
yield (
|
| 141 |
-
"⚠️ **Warning**:
|
|
|
|
| 142 |
)
|
| 143 |
mode = "simple"
|
| 144 |
|
|
@@ -227,10 +228,13 @@ def create_demo() -> gr.ChatInterface:
|
|
| 227 |
additional_inputs_accordion=gr.Accordion(label="⚙️ Settings", open=False),
|
| 228 |
additional_inputs=[
|
| 229 |
gr.Radio(
|
| 230 |
-
choices=["simple", "
|
| 231 |
value="simple",
|
| 232 |
label="Orchestrator Mode",
|
| 233 |
-
info=
|
| 234 |
),
|
| 235 |
gr.Textbox(
|
| 236 |
label="🔑 API Key (Optional - BYOK)",
|
|
|
|
| 31 |
|
| 32 |
Args:
|
| 33 |
use_mock: If True, use MockJudgeHandler (no API key needed)
|
| 34 |
+
mode: Orchestrator mode ("simple" or "advanced")
|
| 35 |
user_api_key: Optional user-provided API key (BYOK)
|
| 36 |
api_provider: API provider ("openai" or "anthropic")
|
| 37 |
|
|
|
|
| 115 |
Args:
|
| 116 |
message: User's research question
|
| 117 |
history: Chat history (Gradio format)
|
| 118 |
+
mode: Orchestrator mode ("simple" or "advanced")
|
| 119 |
api_key: Optional user-provided API key (BYOK - Bring Your Own Key)
|
| 120 |
api_provider: API provider ("openai" or "anthropic")
|
| 121 |
|
|
|
|
| 135 |
has_user_key = bool(user_api_key)
|
| 136 |
has_paid_key = has_openai or has_anthropic or has_user_key
|
| 137 |
|
| 138 |
+
# Advanced mode requires OpenAI specifically (due to agent-framework binding)
|
| 139 |
+
if mode == "advanced" and not (has_openai or (has_user_key and api_provider == "openai")):
|
| 140 |
yield (
|
| 141 |
+
"⚠️ **Warning**: Advanced mode currently requires OpenAI API key. "
|
| 142 |
+
"Falling back to simple mode.\n\n"
|
| 143 |
)
|
| 144 |
mode = "simple"
|
| 145 |
|
|
|
|
| 228 |
additional_inputs_accordion=gr.Accordion(label="⚙️ Settings", open=False),
|
| 229 |
additional_inputs=[
|
| 230 |
gr.Radio(
|
| 231 |
+
choices=["simple", "advanced"],
|
| 232 |
value="simple",
|
| 233 |
label="Orchestrator Mode",
|
| 234 |
+
info=(
|
| 235 |
+
"Simple: Linear (Free Tier Friendly) | "
|
| 236 |
+
"Advanced: Multi-Agent (Requires OpenAI)"
|
| 237 |
+
),
|
| 238 |
),
|
| 239 |
gr.Textbox(
|
| 240 |
label="🔑 API Key (Optional - BYOK)",
|
|
@@ -2,15 +2,34 @@
|
|
| 2 |
|
| 3 |
from typing import Any, Literal
|
| 4 |
|
|
|
|
|
|
|
| 5 |
from src.orchestrator import JudgeHandlerProtocol, Orchestrator, SearchHandlerProtocol
|
|
|
|
| 6 |
from src.utils.models import OrchestratorConfig
|
| 7 |
|
| 8 |
|
| 9 |
def create_orchestrator(
|
| 10 |
search_handler: SearchHandlerProtocol | None = None,
|
| 11 |
judge_handler: JudgeHandlerProtocol | None = None,
|
| 12 |
config: OrchestratorConfig | None = None,
|
| 13 |
-
mode: Literal["simple", "magentic"] =
|
| 14 |
) -> Any:
|
| 15 |
"""
|
| 16 |
Create an orchestrator instance.
|
|
@@ -19,25 +38,19 @@ def create_orchestrator(
|
|
| 19 |
search_handler: The search handler (required for simple mode)
|
| 20 |
judge_handler: The judge handler (required for simple mode)
|
| 21 |
config: Optional configuration
|
| 22 |
-
mode: "simple"
|
| 23 |
|
| 24 |
Returns:
|
| 25 |
Orchestrator instance
|
| 26 |
-
|
| 27 |
-
Note:
|
| 28 |
-
Magentic mode does NOT use search_handler/judge_handler.
|
| 29 |
-
It creates ChatAgent instances with internal LLMs that call tools directly.
|
| 30 |
"""
|
| 31 |
-
|
| 32 |
-
|
| 33 |
-
from src.orchestrator_magentic import MagenticOrchestrator
|
| 34 |
|
| 35 |
-
|
| 36 |
-
|
| 37 |
-
|
| 38 |
-
|
| 39 |
-
|
| 40 |
-
pass
|
| 41 |
|
| 42 |
# Simple mode requires handlers
|
| 43 |
if search_handler is None or judge_handler is None:
|
|
@@ -48,3 +61,17 @@ def create_orchestrator(
|
|
| 48 |
judge_handler=judge_handler,
|
| 49 |
config=config,
|
| 50 |
)
|
|
|
| 2 |
|
| 3 |
from typing import Any, Literal
|
| 4 |
|
| 5 |
+
import structlog
|
| 6 |
+
|
| 7 |
from src.orchestrator import JudgeHandlerProtocol, Orchestrator, SearchHandlerProtocol
|
| 8 |
+
from src.utils.config import settings
|
| 9 |
from src.utils.models import OrchestratorConfig
|
| 10 |
|
| 11 |
+
logger = structlog.get_logger()
|
| 12 |
+
|
| 13 |
+
|
| 14 |
+
def _get_magentic_orchestrator_class() -> Any:
|
| 15 |
+
"""Import MagenticOrchestrator lazily to avoid hard dependency."""
|
| 16 |
+
try:
|
| 17 |
+
from src.orchestrator_magentic import MagenticOrchestrator
|
| 18 |
+
|
| 19 |
+
return MagenticOrchestrator
|
| 20 |
+
except ImportError as e:
|
| 21 |
+
logger.error("Failed to import MagenticOrchestrator", error=str(e))
|
| 22 |
+
raise ValueError(
|
| 23 |
+
"Advanced mode requires agent-framework-core. "
|
| 24 |
+
"Please install it or use mode='simple'."
|
| 25 |
+
) from e
|
| 26 |
+
|
| 27 |
|
| 28 |
def create_orchestrator(
|
| 29 |
search_handler: SearchHandlerProtocol | None = None,
|
| 30 |
judge_handler: JudgeHandlerProtocol | None = None,
|
| 31 |
config: OrchestratorConfig | None = None,
|
| 32 |
+
mode: Literal["simple", "magentic", "advanced"] | None = None,
|
| 33 |
) -> Any:
|
| 34 |
"""
|
| 35 |
Create an orchestrator instance.
|
|
|
|
| 38 |
search_handler: The search handler (required for simple mode)
|
| 39 |
judge_handler: The judge handler (required for simple mode)
|
| 40 |
config: Optional configuration
|
| 41 |
+
mode: "simple", "magentic", "advanced" or None (auto-detect)
|
| 42 |
|
| 43 |
Returns:
|
| 44 |
Orchestrator instance
|
| 45 |
"""
|
| 46 |
+
effective_mode = _determine_mode(mode)
|
| 47 |
+
logger.info("Creating orchestrator", mode=effective_mode)
|
|
|
|
| 48 |
|
| 49 |
+
if effective_mode == "advanced":
|
| 50 |
+
orchestrator_cls = _get_magentic_orchestrator_class()
|
| 51 |
+
return orchestrator_cls(
|
| 52 |
+
max_rounds=config.max_iterations if config else 10,
|
| 53 |
+
)
|
|
|
|
| 54 |
|
| 55 |
# Simple mode requires handlers
|
| 56 |
if search_handler is None or judge_handler is None:
|
|
|
|
| 61 |
judge_handler=judge_handler,
|
| 62 |
config=config,
|
| 63 |
)
|
| 64 |
+
|
| 65 |
+
|
| 66 |
+
def _determine_mode(explicit_mode: str | None) -> str:
|
| 67 |
+
"""Determine which mode to use."""
|
| 68 |
+
if explicit_mode:
|
| 69 |
+
if explicit_mode in ("magentic", "advanced"):
|
| 70 |
+
return "advanced"
|
| 71 |
+
return "simple"
|
| 72 |
+
|
| 73 |
+
# Auto-detect: advanced only if an OpenAI key is available (other keys do not qualify)
|
| 74 |
+
if settings.has_openai_key:
|
| 75 |
+
return "advanced"
|
| 76 |
+
|
| 77 |
+
return "simple"
|
|
@@ -23,13 +23,20 @@ class Settings(BaseSettings):
|
|
| 23 |
# LLM Configuration
|
| 24 |
openai_api_key: str | None = Field(default=None, description="OpenAI API key")
|
| 25 |
anthropic_api_key: str | None = Field(default=None, description="Anthropic API key")
|
| 26 |
-
llm_provider: Literal["openai", "anthropic"] = Field(
|
| 27 |
default="openai", description="Which LLM provider to use"
|
| 28 |
)
|
| 29 |
openai_model: str = Field(default="gpt-5.1", description="OpenAI model name")
|
| 30 |
anthropic_model: str = Field(
|
| 31 |
default="claude-sonnet-4-5-20250929", description="Anthropic model"
|
| 32 |
)
|
|
| 33 |
|
| 34 |
# Embedding Configuration
|
| 35 |
# Note: OpenAI embeddings require OPENAI_API_KEY (Anthropic has no embeddings API)
|
|
@@ -97,10 +104,15 @@ class Settings(BaseSettings):
|
|
| 97 |
"""Check if Anthropic API key is available."""
|
| 98 |
return bool(self.anthropic_api_key)
|
| 99 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 100 |
@property
|
| 101 |
def has_any_llm_key(self) -> bool:
|
| 102 |
"""Check if any LLM API key is available."""
|
| 103 |
-
return self.has_openai_key or self.has_anthropic_key
|
| 104 |
|
| 105 |
|
| 106 |
def get_settings() -> Settings:
|
|
|
|
| 23 |
# LLM Configuration
|
| 24 |
openai_api_key: str | None = Field(default=None, description="OpenAI API key")
|
| 25 |
anthropic_api_key: str | None = Field(default=None, description="Anthropic API key")
|
| 26 |
+
llm_provider: Literal["openai", "anthropic", "huggingface"] = Field(
|
| 27 |
default="openai", description="Which LLM provider to use"
|
| 28 |
)
|
| 29 |
openai_model: str = Field(default="gpt-5.1", description="OpenAI model name")
|
| 30 |
anthropic_model: str = Field(
|
| 31 |
default="claude-sonnet-4-5-20250929", description="Anthropic model"
|
| 32 |
)
|
| 33 |
+
# HuggingFace (free tier)
|
| 34 |
+
huggingface_model: str | None = Field(
|
| 35 |
+
default="meta-llama/Llama-3.1-70B-Instruct", description="HuggingFace model name"
|
| 36 |
+
)
|
| 37 |
+
hf_token: str | None = Field(
|
| 38 |
+
default=None, alias="HF_TOKEN", description="HuggingFace API token"
|
| 39 |
+
)
|
| 40 |
|
| 41 |
# Embedding Configuration
|
| 42 |
# Note: OpenAI embeddings require OPENAI_API_KEY (Anthropic has no embeddings API)
|
|
|
|
| 104 |
"""Check if Anthropic API key is available."""
|
| 105 |
return bool(self.anthropic_api_key)
|
| 106 |
|
| 107 |
+
@property
|
| 108 |
+
def has_huggingface_key(self) -> bool:
|
| 109 |
+
"""Check if HuggingFace token is available."""
|
| 110 |
+
return bool(self.hf_token)
|
| 111 |
+
|
| 112 |
@property
|
| 113 |
def has_any_llm_key(self) -> bool:
|
| 114 |
"""Check if any LLM API key is available."""
|
| 115 |
+
return self.has_openai_key or self.has_anthropic_key or self.has_huggingface_key
|
| 116 |
|
| 117 |
|
| 118 |
def get_settings() -> Settings:
|
|
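The key-availability properties added in this hunk can be tried out without pydantic. The sketch below is an assumption-laden stand-in: `KeySettings` is a hypothetical dataclass that mirrors only the three credential fields and the `has_*` properties, not the real `Settings(BaseSettings)` class or its environment loading.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class KeySettings:
    """Hypothetical stand-in for the credential fields of Settings."""

    openai_api_key: Optional[str] = None
    anthropic_api_key: Optional[str] = None
    hf_token: Optional[str] = None

    @property
    def has_openai_key(self) -> bool:
        return bool(self.openai_api_key)

    @property
    def has_anthropic_key(self) -> bool:
        return bool(self.anthropic_api_key)

    @property
    def has_huggingface_key(self) -> bool:
        # bool("") is False, so an empty token counts as missing.
        return bool(self.hf_token)

    @property
    def has_any_llm_key(self) -> bool:
        return self.has_openai_key or self.has_anthropic_key or self.has_huggingface_key
```

Because these are properties, they are read as attributes (`settings.has_openai_key`), not called, which matches the note in the commit message about `has_openai_key` being a `@property`.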
@@ -0,0 +1,82 @@
+"""End-to-End Integration Tests for Dual-Mode Architecture."""
+
+from unittest.mock import AsyncMock, MagicMock, patch
+
+import pytest
+
+pytestmark = [pytest.mark.integration, pytest.mark.slow]
+
+from src.orchestrator_factory import create_orchestrator
+from src.utils.models import Citation, Evidence, OrchestratorConfig
+
+
+@pytest.fixture
+def mock_search_handler():
+    handler = MagicMock()
+    handler.execute = AsyncMock(
+        return_value=[
+            Evidence(
+                citation=Citation(
+                    title="Test Paper", url="http://test", date="2024", source="pubmed"
+                ),
+                content="Metformin increases lifespan in mice.",
+            )
+        ]
+    )
+    return handler
+
+
+@pytest.fixture
+def mock_judge_handler():
+    handler = MagicMock()
+    # Mock return value of assess
+    assessment = MagicMock()
+    assessment.sufficient = True
+    assessment.recommendation = "synthesize"
+    handler.assess = AsyncMock(return_value=assessment)
+    return handler
+
+
+@pytest.mark.asyncio
+async def test_simple_mode_e2e(mock_search_handler, mock_judge_handler):
+    """Test Simple Mode Orchestration flow."""
+    orch = create_orchestrator(
+        search_handler=mock_search_handler,
+        judge_handler=mock_judge_handler,
+        mode="simple",
+        config=OrchestratorConfig(max_iterations=1),
+    )
+
+    # Run
+    results = []
+    async for event in orch.run("Test query"):
+        results.append(event)
+
+    assert len(results) > 0
+    assert mock_search_handler.execute.called
+    assert mock_judge_handler.assess.called
+
+
+@pytest.mark.asyncio
+async def test_advanced_mode_explicit_instantiation():
+    """Test explicit Advanced Mode instantiation (not auto-detect).
+
+    This tests the explicit mode="advanced" path, verifying that
+    MagenticOrchestrator can be instantiated when explicitly requested.
+    The settings patch ensures any internal checks pass.
+    """
+    with patch("src.orchestrator_factory.settings") as mock_settings:
+        # Settings patch ensures factory checks pass (even though mode is explicit)
+        mock_settings.has_openai_key = True
+
+        with patch("src.agents.magentic_agents.OpenAIChatClient"):
+            # Mock agent creation to avoid real API calls during init
+            with (
+                patch("src.orchestrator_magentic.create_search_agent"),
+                patch("src.orchestrator_magentic.create_judge_agent"),
+                patch("src.orchestrator_magentic.create_hypothesis_agent"),
+                patch("src.orchestrator_magentic.create_report_agent"),
+            ):
+                # Explicit mode="advanced" - tests the explicit path, not auto-detect
+                orch = create_orchestrator(mode="advanced")
+                assert orch is not None
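The simple-mode test above relies on `AsyncMock` handlers and on consuming an async generator with `async for`. The following sketch shows that pattern in isolation, using only the standard library. `TinyOrchestrator` and its event dictionaries are hypothetical stand-ins; the real `Orchestrator.run` and its event types are not shown in this diff.

```python
import asyncio
from unittest.mock import AsyncMock, MagicMock


class TinyOrchestrator:
    """Hypothetical stand-in: one search call, one judge call, two events."""

    def __init__(self, search_handler, judge_handler):
        self.search_handler = search_handler
        self.judge_handler = judge_handler

    async def run(self, query):
        # Awaiting an AsyncMock returns its configured return_value.
        evidence = await self.search_handler.execute(query)
        yield {"type": "search_complete", "count": len(evidence)}
        assessment = await self.judge_handler.assess(query, evidence)
        yield {"type": "judge_complete", "sufficient": assessment.sufficient}


async def collect_events(orchestrator, query):
    # Drain the async generator into a list, as the test does with a loop.
    return [event async for event in orchestrator.run(query)]


def run_demo():
    search = MagicMock()
    search.execute = AsyncMock(return_value=["evidence-1"])
    assessment = MagicMock()
    assessment.sufficient = True
    judge = MagicMock()
    judge.assess = AsyncMock(return_value=assessment)
    events = asyncio.run(collect_events(TinyOrchestrator(search, judge), "Test query"))
    return events, search, judge
```

With `AsyncMock`, checks such as `assert_awaited_once_with(...)` are available in addition to `.called`, which can make a test stricter about what the handler actually received.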
@@ -0,0 +1,64 @@
+"""Unit tests for Judge Factory and Model Selection."""
+
+from unittest.mock import patch
+
+import pytest
+
+pytestmark = pytest.mark.unit
+from pydantic_ai.models.anthropic import AnthropicModel
+
+# We expect this import to exist after we implement it, or we mock it if it's not there yet
+# For TDD, we assume we will use the library class
+from pydantic_ai.models.huggingface import HuggingFaceModel
+from pydantic_ai.models.openai import OpenAIModel
+
+from src.agent_factory.judges import get_model
+
+
+@pytest.fixture
+def mock_settings():
+    with patch("src.agent_factory.judges.settings", autospec=True) as mock_settings:
+        yield mock_settings
+
+
+def test_get_model_openai(mock_settings):
+    """Test that OpenAI model is returned when provider is openai."""
+    mock_settings.llm_provider = "openai"
+    mock_settings.openai_api_key = "sk-test"
+    mock_settings.openai_model = "gpt-4o"
+
+    model = get_model()
+    assert isinstance(model, OpenAIModel)
+    assert model.model_name == "gpt-4o"
+
+
+def test_get_model_anthropic(mock_settings):
+    """Test that Anthropic model is returned when provider is anthropic."""
+    mock_settings.llm_provider = "anthropic"
+    mock_settings.anthropic_api_key = "sk-ant-test"
+    mock_settings.anthropic_model = "claude-3-5-sonnet"
+
+    model = get_model()
+    assert isinstance(model, AnthropicModel)
+    assert model.model_name == "claude-3-5-sonnet"
+
+
+def test_get_model_huggingface(mock_settings):
+    """Test that HuggingFace model is returned when provider is huggingface."""
+    mock_settings.llm_provider = "huggingface"
+    mock_settings.hf_token = "hf_test_token"
+    mock_settings.huggingface_model = "meta-llama/Llama-3.1-70B-Instruct"
+
+    model = get_model()
+    assert isinstance(model, HuggingFaceModel)
+    assert model.model_name == "meta-llama/Llama-3.1-70B-Instruct"
+
+
+def test_get_model_default_fallback(mock_settings):
+    """Test fallback to OpenAI if provider is unknown."""
+    mock_settings.llm_provider = "unknown_provider"
+    mock_settings.openai_api_key = "sk-test"
+    mock_settings.openai_model = "gpt-4o"
+
+    model = get_model()
+    assert isinstance(model, OpenAIModel)
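The four tests above pin down a dispatch rule: two named providers get their own model, and anything else falls back to OpenAI. The body of `get_model()` is not part of this diff, so the following is only a sketch of that rule under stated assumptions. `ModelChoice` and `choose_model` are hypothetical names, and plain strings stand in for the pydantic-ai model objects.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class ModelChoice:
    """Hypothetical result type standing in for a pydantic-ai model object."""

    provider: str
    model_name: str


def choose_model(llm_provider: str, model_names: Mapping[str, str]) -> ModelChoice:
    """Pick a provider and model name; unknown providers fall back to OpenAI."""
    if llm_provider in ("anthropic", "huggingface"):
        return ModelChoice(llm_provider, model_names[llm_provider])
    # Covers "openai" and any unrecognised provider string.
    return ModelChoice("openai", model_names["openai"])
```

A silent fallback keeps the application running on a misconfigured provider name, at the cost of hiding the typo; raising an error instead would be the stricter design choice.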
@@ -0,0 +1,32 @@
+"""Test that agent framework dependencies are importable and usable."""
+
+from unittest.mock import MagicMock
+
+import pytest
+
+pytestmark = pytest.mark.unit
+
+# Import conditional on package availability, but for this test we expect it to be there
+try:
+    from agent_framework import ChatAgent
+    from agent_framework.openai import OpenAIChatClient
+except ImportError:
+    ChatAgent = None
+    OpenAIChatClient = None
+
+
+@pytest.mark.skipif(ChatAgent is None, reason="agent-framework-core not installed")
+def test_agent_framework_import():
+    """Test that agent_framework can be imported."""
+    assert ChatAgent is not None
+    assert OpenAIChatClient is not None  # Verify both imports work
+
+
+@pytest.mark.skipif(ChatAgent is None, reason="agent-framework-core not installed")
+def test_chat_agent_instantiation():
+    """Test that ChatAgent can be instantiated with a mock client."""
+    mock_client = MagicMock()
+    # We assume ChatAgent takes chat_client as first argument based on _agents.py source
+    agent = ChatAgent(chat_client=mock_client, name="TestAgent")
+    assert agent.name == "TestAgent"
+    assert agent.chat_client == mock_client
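The `try`/`except ImportError` guard in this test file is one way to treat a package as optional. A reusable variant of the same idea is sketched below with `importlib`. `optional_attr` is a hypothetical helper, not something defined in this repository.

```python
import importlib
from typing import Any, Optional


def optional_attr(module_name: str, attr_name: str) -> Optional[Any]:
    """Return module_name.attr_name, or None if the import or lookup fails."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # ModuleNotFoundError is a subclass of ImportError, so a missing
        # package lands here as well.
        return None
    return getattr(module, attr_name, None)
```

Usage would look like `ChatAgent = optional_attr("agent_framework", "ChatAgent")`, after which a `skipif(ChatAgent is None, ...)` marker works the same way as in the file above.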
@@ -0,0 +1,66 @@
+"""Unit tests for Orchestrator Factory."""
+
+from unittest.mock import MagicMock, patch
+
+import pytest
+
+pytestmark = pytest.mark.unit
+
+from src.orchestrator import Orchestrator
+from src.orchestrator_factory import create_orchestrator
+
+
+@pytest.fixture
+def mock_settings():
+    with patch("src.orchestrator_factory.settings", autospec=True) as mock_settings:
+        yield mock_settings
+
+
+@pytest.fixture
+def mock_magentic_cls():
+    with patch("src.orchestrator_factory._get_magentic_orchestrator_class") as mock:
+        # The mock returns a class (callable), which returns an instance
+        mock_class = MagicMock()
+        mock.return_value = mock_class
+        yield mock_class
+
+
+@pytest.fixture
+def mock_handlers():
+    return MagicMock(), MagicMock()
+
+
+def test_create_orchestrator_simple_explicit(mock_settings, mock_handlers):
+    """Test explicit simple mode."""
+    search, judge = mock_handlers
+    orch = create_orchestrator(search_handler=search, judge_handler=judge, mode="simple")
+    assert isinstance(orch, Orchestrator)
+
+
+def test_create_orchestrator_advanced_explicit(mock_settings, mock_handlers, mock_magentic_cls):
+    """Test explicit advanced mode."""
+    # Ensure has_openai_key is True so it doesn't error if we add checks
+    mock_settings.has_openai_key = True
+
+    orch = create_orchestrator(mode="advanced")
+    # verify instantiated
+    mock_magentic_cls.assert_called_once()
+    assert orch == mock_magentic_cls.return_value
+
+
+def test_create_orchestrator_auto_advanced(mock_settings, mock_magentic_cls):
+    """Test auto-detect advanced mode when OpenAI key exists."""
+    mock_settings.has_openai_key = True
+
+    orch = create_orchestrator()
+    mock_magentic_cls.assert_called_once()
+    assert orch == mock_magentic_cls.return_value
+
+
+def test_create_orchestrator_auto_simple(mock_settings, mock_handlers):
+    """Test auto-detect simple mode when no paid keys."""
+    mock_settings.has_openai_key = False
+
+    search, judge = mock_handlers
+    orch = create_orchestrator(search_handler=search, judge_handler=judge)
+    assert isinstance(orch, Orchestrator)
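The `mock_magentic_cls` fixture above works because a `MagicMock` used as a class records the constructor call and hands back a fixed `return_value` as the "instance". The sketch below shows that mechanism on its own. `build_advanced` is a hypothetical function that takes the class getter as a parameter rather than patching a module attribute, so it runs without the project's modules.

```python
from unittest.mock import MagicMock


def build_advanced(get_orchestrator_class, max_rounds: int = 10):
    """Hypothetical factory step: resolve the class lazily, then instantiate it."""
    orchestrator_cls = get_orchestrator_class()
    return orchestrator_cls(max_rounds=max_rounds)


# The fake "class" is just a callable mock; calling it returns return_value.
fake_cls = MagicMock(name="FakeOrchestratorClass")
orch = build_advanced(lambda: fake_cls, max_rounds=3)
```

After this runs, `fake_cls.assert_called_once_with(max_rounds=3)` passes and `orch is fake_cls.return_value` holds, which is the same pair of checks the advanced-mode tests make.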