VibecoderMcSwaggins committed on
Commit b2929fc · unverified · 1 Parent(s): b72f9f1

feat: implement dual-mode architecture (Simple + Advanced) (#45)

* docs: add dual-mode architecture specification

Senior agent reviewed and approved. Key documents:
- 00_SITUATION_AND_PLAN.md: Problem analysis, branch states, recommended path
- 01_ARCHITECTURE_SPEC.md: Dual-mode architecture (Simple + Advanced)
- 02_IMPLEMENTATION_PHASES.md: 6-phase implementation plan
- 03_IMMEDIATE_ACTIONS.md: Quick reference checklist

Architecture: pydantic-ai (structured outputs) + Microsoft Agent Framework
(orchestration) are COMPLEMENTARY, not competing. Dual-mode allows
graceful degradation to the free tier when no API keys are available.

* docs: add follow-up review request for senior agent verification

* feat: implement dual-mode architecture (Simple + Advanced)

Phase 1 - Pydantic-AI Improvements (Simple Mode):
- Add HuggingFace provider support in judges.py with get_model()
- Add huggingface_model and hf_token config fields
- Tests in test_judges_factory.py

Phase 2 - Orchestrator Factory:
- Implement create_orchestrator() with auto-detection logic
- Simple mode for free tier, Advanced mode when OpenAI key present
- Lazy loading of MagenticOrchestrator to avoid hard dependency
- Tests in test_orchestrator_factory.py

Phase 3 - Agent Framework Integration:
- Use agent-framework-core from PyPI (Microsoft package)
- Verify imports work with test_agent_imports.py

Phase 4 - UI Updates:
- Rename "magentic" to "advanced" in app.py
- Update mode selection labels and descriptions

All 126 unit tests pass. Lint and type checks clean.

* fix: address CodeRabbit review feedback

- Add pytestmark to integration tests (integration, slow markers)
- Add pytestmark to unit tests (unit marker)
- Fix unused OpenAIChatClient import by adding assertion
- Update docs spec to match actual factory implementation
- Add code fence languages (text) to markdown blocks

Note: CodeRabbit incorrectly flagged has_openai_key as a method
when it's actually a @property that returns bool correctly.

All 126 unit tests pass.

* fix: address remaining CodeRabbit nitpicks

- Add 'text' language to ASCII diagram code blocks in docs
- Update Advanced Mode trigger description to clarify OpenAI-only
- Rename and clarify test_advanced_mode_explicit_instantiation
- Improve test docstring explaining explicit vs auto-detect path

All 128 tests pass.

docs/brainstorming/magentic-pydantic/00_SITUATION_AND_PLAN.md ADDED
@@ -0,0 +1,189 @@
1
+ # Situation Analysis: Pydantic-AI + Microsoft Agent Framework Integration
2
+
3
+ **Date:** November 27, 2025
4
+ **Status:** ACTIVE DECISION REQUIRED
5
+ **Risk Level:** HIGH - DO NOT MERGE PR #41 UNTIL RESOLVED
6
+
7
+ ---
8
+
9
+ ## 1. The Problem
10
+
11
+ We almost merged a refactor that would have **deleted** multi-agent orchestration capability from the codebase, mistakenly believing pydantic-ai and Microsoft Agent Framework were mutually exclusive.
12
+
13
+ **They are not.** They are complementary:
14
+ - **pydantic-ai** (Library): Ensures LLM outputs match Pydantic schemas
15
+ - **Microsoft Agent Framework** (Framework): Orchestrates multi-agent workflows
16
+
17
+ ---
18
+
19
+ ## 2. Current Branch State
20
+
21
+ | Branch | Location | Has Agent Framework? | Has Pydantic-AI Improvements? | Status |
22
+ |--------|----------|---------------------|------------------------------|--------|
23
+ | `origin/dev` | GitHub | YES | NO | **SAFE - Source of Truth** |
24
+ | `huggingface-upstream/dev` | HF Spaces | YES | NO | **SAFE - Same as GitHub** |
25
+ | `origin/main` | GitHub | YES | NO | **SAFE** |
26
+ | `feat/pubmed-fulltext` | GitHub | NO (deleted) | YES | **DANGER - Has destructive refactor** |
27
+ | `refactor/pydantic-unification` | Local | NO (deleted) | YES | **DANGER - Redundant, delete** |
28
+ | Local `dev` | Local only | NO (deleted) | YES | **DANGER - NOT PUSHED (thankfully)** |
29
+
30
+ ### Key Files at Risk
31
+
32
+ **On `origin/dev` (PRESERVED):**
33
+ ```text
34
+ src/agents/
35
+ ├── analysis_agent.py # StatisticalAnalyzer wrapper
36
+ ├── hypothesis_agent.py # Hypothesis generation
37
+ ├── judge_agent.py # JudgeHandler wrapper
38
+ ├── magentic_agents.py # Multi-agent definitions
39
+ ├── report_agent.py # Report synthesis
40
+ ├── search_agent.py # SearchHandler wrapper
41
+ ├── state.py # Thread-safe state management
42
+ └── tools.py # @ai_function decorated tools
43
+
44
+ src/orchestrator_magentic.py # Multi-agent orchestrator
45
+ src/utils/llm_factory.py # Centralized LLM client factory
46
+ ```
47
+
48
+ **Deleted in refactor branch (would be lost if merged):**
49
+ - All of the above
50
+
51
+ ---
52
+
53
+ ## 3. Target Architecture
54
+
55
+ ```text
56
+ ┌─────────────────────────────────────────────────────────────────┐
57
+ │ Microsoft Agent Framework (Orchestration Layer) │
58
+ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
59
+ │ │ SearchAgent │→ │ JudgeAgent │→ │ ReportAgent │ │
60
+ │ │ (BaseAgent) │ │ (BaseAgent) │ │ (BaseAgent) │ │
61
+ │ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ │
62
+ │ │ │ │ │
63
+ │ ▼ ▼ ▼ │
64
+ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
65
+ │ │ pydantic-ai │ │ pydantic-ai │ │ pydantic-ai │ │
66
+ │ │ Agent() │ │ Agent() │ │ Agent() │ │
67
+ │ │ output_type= │ │ output_type= │ │ output_type= │ │
68
+ │ │ SearchResult │ │ JudgeAssess │ │ Report │ │
69
+ │ └──────────────┘ └──────────────┘ └──────────────┘ │
70
+ └─────────────────────────────────────────────────────────────────┘
71
+ ```
72
+
73
+ **Why this architecture:**
74
+ 1. **Agent Framework** handles: workflow coordination, state passing, middleware, observability
75
+ 2. **pydantic-ai** handles: type-safe LLM calls within each agent
76
+
77
+ ---
78
+
79
+ ## 4. CRITICAL: Naming Confusion Clarification
80
+
81
+ > **Senior Agent Review Finding:** The codebase uses "magentic" in file names (e.g., `orchestrator_magentic.py`, `magentic_agents.py`) but this is **NOT** the `magentic` PyPI package by Jack Collins. It refers to our Microsoft Agent Framework (`agent-framework-core`) integration.
82
+
83
+ **The naming confusion:**
84
+ - `magentic` (PyPI package): A different library for structured LLM outputs
85
+ - "Magentic" (in our codebase): Our internal name for Microsoft Agent Framework integration
86
+ - `agent-framework-core` (PyPI package): Microsoft's actual multi-agent orchestration framework
87
+
88
+ **Recommended future action:** Rename `orchestrator_magentic.py` → `orchestrator_advanced.py` to eliminate confusion.
89
+
90
+ ---
91
+
92
+ ## 5. What the Refactor DID Get Right
93
+
94
+ The refactor branch (`feat/pubmed-fulltext`) has some valuable improvements:
95
+
96
+ 1. **`judges.py` unified `get_model()`** - Supports OpenAI, Anthropic, AND HuggingFace via pydantic-ai
97
+ 2. **HuggingFace free tier support** - `HuggingFaceModel` integration
98
+ 3. **Test fix** - Properly mocks `HuggingFaceModel` class
99
+ 4. **Removed broken magentic optional dependency** from pyproject.toml (this was correct - the old `magentic` package is different from Microsoft Agent Framework)
100
+
101
+ **What it got WRONG:**
102
+ 1. Deleted `src/agents/` entirely instead of refactoring them
103
+ 2. Deleted `src/orchestrator_magentic.py` instead of fixing it
104
+ 3. Conflated "magentic" (old package) with "Microsoft Agent Framework" (current framework)
105
+
106
+ ---
107
+
108
+ ## 6. Options for Path Forward
109
+
110
+ ### Option A: Abandon Refactor, Start Fresh
111
+ - Close PR #41
112
+ - Delete `feat/pubmed-fulltext` and `refactor/pydantic-unification` branches
113
+ - Reset local `dev` to match `origin/dev`
114
+ - Cherry-pick ONLY the good parts (judges.py improvements, HF support)
115
+ - **Pros:** Clean, safe
116
+ - **Cons:** Lose some work, need to redo carefully
117
+
118
+ ### Option B: Cherry-Pick Good Parts to origin/dev
119
+ - Do NOT merge PR #41
120
+ - Create new branch from `origin/dev`
121
+ - Cherry-pick specific commits/changes that improve pydantic-ai usage
122
+ - Keep agent framework code intact
123
+ - **Pros:** Preserves both, surgical
124
+ - **Cons:** Requires careful file-by-file review
125
+
126
+ ### Option C: Revert Deletions in Refactor Branch
127
+ - On `feat/pubmed-fulltext`, restore deleted agent files from `origin/dev`
128
+ - Keep the pydantic-ai improvements
129
+ - Merge THAT to dev
130
+ - **Pros:** Gets both
131
+ - **Cons:** Complex git operations, risk of conflicts
132
+
133
+ ---
134
+
135
+ ## 7. Recommended Action: Option B (Cherry-Pick)
136
+
137
+ **Step-by-step:**
138
+
139
+ 1. **Close PR #41** (do not merge)
140
+ 2. **Clean up local branches:**
141
+ - `refactor/pydantic-unification` (local)
142
+ - Reset local `dev` to `origin/dev`
143
+ 3. **Create new branch from origin/dev:**
144
+ ```bash
145
+ git checkout -b feat/pydantic-ai-improvements origin/dev
146
+ ```
147
+ 4. **Cherry-pick or manually port these improvements:**
148
+ - `src/agent_factory/judges.py` - the unified `get_model()` function
149
+ - `examples/free_tier_demo.py` - HuggingFace demo
150
+ - Test improvements
151
+ 5. **Do NOT delete any agent framework files**
152
+ 6. **Create PR for review**
153
+
154
+ ---
155
+
156
+ ## 8. Files to Cherry-Pick (Safe Improvements)
157
+
158
+ | File | What Changed | Safe to Port? |
159
+ |------|-------------|---------------|
160
+ | `src/agent_factory/judges.py` | Added `HuggingFaceModel` support in `get_model()` | YES |
161
+ | `examples/free_tier_demo.py` | New demo for HF inference | YES |
162
+ | `tests/unit/agent_factory/test_judges.py` | Fixed HF model mocking | YES |
163
+ | `pyproject.toml` | Removed old `magentic` optional dep | MAYBE (review carefully) |
164
+
165
+ ---
166
+
167
+ ## 9. Questions to Answer Before Proceeding
168
+
169
+ 1. **For the hackathon**: Do we need full multi-agent orchestration, or is single-agent sufficient?
170
+ 2. **For DeepCritical mainline**: Is the plan to use Microsoft Agent Framework for orchestration?
171
+ 3. **Timeline**: How much time do we have to get this right?
172
+
173
+ ---
174
+
175
+ ## 10. Immediate Actions (DO NOW)
176
+
177
+ - [ ] **DO NOT merge PR #41**
178
+ - [ ] Close PR #41 with comment explaining the situation
179
+ - [ ] Do not push local `dev` branch anywhere
180
+ - [ ] Confirm HuggingFace Spaces is untouched (it is - verified)
181
+
182
+ ---
183
+
184
+ ## 11. Decision Log
185
+
186
+ | Date | Decision | Rationale |
187
+ |------|----------|-----------|
188
+ | 2025-11-27 | Pause refactor merge | Discovered agent framework and pydantic-ai are complementary, not exclusive |
189
+ | TBD | ? | Awaiting decision on path forward |
docs/brainstorming/magentic-pydantic/01_ARCHITECTURE_SPEC.md ADDED
@@ -0,0 +1,289 @@
1
+ # Architecture Specification: Dual-Mode Agent System
2
+
3
+ **Date:** November 27, 2025
4
+ **Status:** SPECIFICATION
5
+ **Goal:** Graceful degradation from full multi-agent orchestration to simple single-agent mode
6
+
7
+ ---
8
+
9
+ ## 1. Core Concept: Two Operating Modes
10
+
11
+ ```text
12
+ ┌─────────────────────────────────────────────────────────────────────┐
13
+ │ USER REQUEST │
14
+ │ │ │
15
+ │ ▼ │
16
+ │ ┌─────────────────┐ │
17
+ │ │ Mode Selection │ │
18
+ │ │ (Auto-detect) │ │
19
+ │ └────────┬────────┘ │
20
+ │ │ │
21
+ │ ┌───────────────┴───────────────┐ │
22
+ │ │ │ │
23
+ │ ▼ ▼ │
24
+ │ ┌─────────────────┐ ┌─────────────────┐ │
25
+ │ │ SIMPLE MODE │ │ ADVANCED MODE │ │
26
+ │ │ (Free Tier) │ │ (Paid Tier) │ │
27
+ │ │ │ │ │ │
28
+ │ │ pydantic-ai │ │ MS Agent Fwk │ │
29
+ │ │ single-agent │ │ + pydantic-ai │ │
30
+ │ │ loop │ │ multi-agent │ │
31
+ │ └─────────────────┘ └─────────────────┘ │
32
+ │ │ │ │
33
+ │ └───────────────┬───────────────┘ │
34
+ │ ▼ │
35
+ │ ┌─────────────────┐ │
36
+ │ │ Research Report │ │
37
+ │ │ with Citations │ │
38
+ │ └─────────────────┘ │
39
+ └─────────────────────────────────────────────────────────────────────┘
40
+ ```
41
+
42
+ ---
43
+
44
+ ## 2. Mode Comparison
45
+
46
+ | Aspect | Simple Mode | Advanced Mode |
47
+ |--------|-------------|---------------|
48
+ | **Trigger** | No API key OR `LLM_PROVIDER=huggingface` | OpenAI API key present (currently OpenAI only) |
49
+ | **Framework** | pydantic-ai only | Microsoft Agent Framework + pydantic-ai |
50
+ | **Architecture** | Single orchestrator loop | Multi-agent coordination |
51
+ | **Agents** | One agent does Search→Judge→Report | SearchAgent, JudgeAgent, ReportAgent, AnalysisAgent |
52
+ | **State Management** | Simple dict | Thread-safe `MagenticState` with context vars |
53
+ | **Quality** | Good (functional) | Better (specialized agents, coordination) |
54
+ | **Cost** | Free (HuggingFace Inference) | Paid (OpenAI; currently the only provider that triggers this mode) |
55
+ | **Use Case** | Demos, hackathon, budget-constrained | Production, research quality |
56
+
57
+ ---
58
+
59
+ ## 3. Simple Mode Architecture (pydantic-ai Only)
60
+
61
+ ```text
62
+ ┌─────────────────────────────────────────────────────┐
63
+ │ Orchestrator │
64
+ │ │
65
+ │ while not sufficient and iteration < max: │
66
+ │ 1. SearchHandler.execute(query) │
67
+ │ 2. JudgeHandler.assess(evidence) ◄── pydantic-ai Agent │
68
+ │ 3. if sufficient: break │
69
+ │ 4. query = judge.next_queries │
70
+ │ │
71
+ │ return ReportGenerator.generate(evidence) │
72
+ └─────────────────────────────────────────────────────┘
73
+ ```
74
+
75
+ **Components:**
76
+ - `src/orchestrator.py` - Simple loop orchestrator
77
+ - `src/agent_factory/judges.py` - JudgeHandler with pydantic-ai
78
+ - `src/tools/search_handler.py` - Scatter-gather search
79
+ - `src/tools/pubmed.py`, `clinicaltrials.py`, `europepmc.py` - Search tools
80
+
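The loop in the diagram above can be sketched as runnable Python. This is a minimal illustration, not the code in `src/orchestrator.py`: `StubSearch`, `StubJudge`, `Assessment`, and `run_simple` are hypothetical stand-ins, and the "sufficient after three evidence items" rule is an assumption chosen only to make the loop terminate.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class Assessment:
    """Hypothetical stand-in for JudgeAssessment."""
    sufficient: bool
    next_queries: list[str] = field(default_factory=list)


class StubSearch:
    """Hypothetical search handler: returns one evidence string per query."""

    async def execute(self, query: str) -> list[str]:
        return [f"evidence for {query!r}"]


class StubJudge:
    """Hypothetical judge: calls the evidence sufficient once it has 3 items."""

    async def assess(self, question: str, evidence: list[str]) -> Assessment:
        if len(evidence) >= 3:
            return Assessment(sufficient=True)
        refined = f"{question} (refinement {len(evidence)})"
        return Assessment(sufficient=False, next_queries=[refined])


async def run_simple(question, search, judge, max_iterations=10):
    """Search -> judge -> refine until sufficient or the budget runs out."""
    evidence: list[str] = []
    query = question
    iteration = 0
    for iteration in range(1, max_iterations + 1):
        evidence.extend(await search.execute(query))
        assessment = await judge.assess(question, evidence)
        if assessment.sufficient:
            break
        if assessment.next_queries:
            query = assessment.next_queries[0]
    return evidence, iteration


evidence, iterations = asyncio.run(
    run_simple("metformin and longevity", StubSearch(), StubJudge())
)
print(iterations, len(evidence))
```

Keeping the handlers behind small `execute`/`assess` methods is what lets the same loop run against real handlers or stubs without changes.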
81
+ ---
82
+
83
+ ## 4. Advanced Mode Architecture (MS Agent Framework + pydantic-ai)
84
+
85
+ ```text
86
+ ┌─────────────────────────────────────────────────────────────────────┐
87
+ │ Microsoft Agent Framework Orchestrator │
88
+ │ │
89
+ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
90
+ │ │ SearchAgent │───▶│ JudgeAgent │───▶│ ReportAgent │ │
91
+ │ │ (BaseAgent) │ │ (BaseAgent) │ │ (BaseAgent) │ │
92
+ │ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ │
93
+ │ │ │ │ │
94
+ │ ▼ ▼ ▼ │
95
+ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
96
+ │ │ pydantic-ai │ │ pydantic-ai │ │ pydantic-ai │ │
97
+ │ │ Agent() │ │ Agent() │ │ Agent() │ │
98
+ │ │ output_type=│ │ output_type=│ │ output_type=│ │
99
+ │ │ SearchResult│ │ JudgeAssess │ │ Report │ │
100
+ │ └─────────────┘ └─────────────┘ └─────────────┘ │
101
+ │ │
102
+ │ Shared State: MagenticState (thread-safe via contextvars) │
103
+ │ - evidence: list[Evidence] │
104
+ │ - embedding_service: EmbeddingService │
105
+ └─────────────────────────────────────────────────────────────────────┘
106
+ ```
107
+
108
+ **Components:**
109
+ - `src/orchestrator_magentic.py` - Multi-agent orchestrator
110
+ - `src/agents/search_agent.py` - SearchAgent (BaseAgent)
111
+ - `src/agents/judge_agent.py` - JudgeAgent (BaseAgent)
112
+ - `src/agents/report_agent.py` - ReportAgent (BaseAgent)
113
+ - `src/agents/analysis_agent.py` - AnalysisAgent (BaseAgent)
114
+ - `src/agents/state.py` - Thread-safe state management
115
+ - `src/agents/tools.py` - @ai_function decorated tools
116
+
117
+ ---
118
+
119
+ ## 5. Mode Selection Logic
120
+
121
+ ```python
122
+ # src/orchestrator_factory.py (actual implementation)
123
+
124
+ def create_orchestrator(
125
+ search_handler: SearchHandlerProtocol | None = None,
126
+ judge_handler: JudgeHandlerProtocol | None = None,
127
+ config: OrchestratorConfig | None = None,
128
+ mode: Literal["simple", "magentic", "advanced"] | None = None,
129
+ ) -> Any:
130
+ """
131
+ Auto-select orchestrator based on available credentials.
132
+
133
+ Priority:
134
+ 1. If mode explicitly set, use that
135
+ 2. If OpenAI key available -> Advanced Mode (currently OpenAI only)
136
+ 3. Otherwise -> Simple Mode (HuggingFace free tier)
137
+ """
138
+ effective_mode = _determine_mode(mode)
139
+
140
+ if effective_mode == "advanced":
141
+ orchestrator_cls = _get_magentic_orchestrator_class()
142
+ return orchestrator_cls(max_rounds=config.max_iterations if config else 10)
143
+
144
+ # Simple mode requires handlers
145
+ if search_handler is None or judge_handler is None:
146
+ raise ValueError("Simple mode requires search_handler and judge_handler")
147
+
148
+ return Orchestrator(
149
+ search_handler=search_handler,
150
+ judge_handler=judge_handler,
151
+ config=config,
152
+ )
153
+ ```
154
+
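The snippet above calls `_determine_mode` and `_get_magentic_orchestrator_class` without showing them. A minimal sketch of what such helpers might look like follows, under stated assumptions: the `OPENAI_API_KEY` environment variable name, the treatment of `"magentic"` as a legacy alias for `"advanced"`, and the helper names `determine_mode` and `load_class` are all illustrative, not taken from `src/orchestrator_factory.py`.

```python
import importlib
import os
from typing import Literal, Mapping, Optional

Mode = Literal["simple", "advanced"]


def determine_mode(
    explicit: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Mode:
    """Explicit mode wins; otherwise an OpenAI key selects advanced mode."""
    env = os.environ if env is None else env
    if explicit is not None:
        if explicit in ("advanced", "magentic"):  # assumed legacy alias
            return "advanced"
        if explicit == "simple":
            return "simple"
        raise ValueError(f"Unknown mode: {explicit!r}")
    # Assumed variable name; the real settings object may differ.
    return "advanced" if env.get("OPENAI_API_KEY") else "simple"


def load_class(module_name: str, class_name: str):
    """Import lazily so simple mode works without the optional dependency."""
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:
        raise RuntimeError(
            f"Advanced mode needs the optional module {module_name!r}"
        ) from exc
    return getattr(module, class_name)


print(determine_mode(None, {}))
print(determine_mode(None, {"OPENAI_API_KEY": "sk-test"}))
```

Passing `env` explicitly keeps the selection logic testable without patching the process environment.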
155
+ ---
156
+
157
+ ## 6. Shared Components (Both Modes Use)
158
+
159
+ These components work in both modes:
160
+
161
+ | Component | Purpose |
162
+ |-----------|---------|
163
+ | `src/tools/pubmed.py` | PubMed search |
164
+ | `src/tools/clinicaltrials.py` | ClinicalTrials.gov search |
165
+ | `src/tools/europepmc.py` | Europe PMC search |
166
+ | `src/tools/search_handler.py` | Scatter-gather orchestration |
167
+ | `src/tools/rate_limiter.py` | Rate limiting |
168
+ | `src/utils/models.py` | Evidence, Citation, JudgeAssessment |
169
+ | `src/utils/config.py` | Settings |
170
+ | `src/services/embeddings.py` | Vector search (optional) |
171
+
172
+ ---
173
+
174
+ ## 7. pydantic-ai Integration Points
175
+
176
+ Both modes use pydantic-ai for structured LLM outputs:
177
+
178
+ ```python
179
+ # In JudgeHandler (both modes)
180
+ from pydantic_ai import Agent
181
+ from pydantic_ai.models.huggingface import HuggingFaceModel
182
+ from pydantic_ai.models.openai import OpenAIModel
183
+ from pydantic_ai.models.anthropic import AnthropicModel
184
+
185
+ class JudgeHandler:
186
+ def __init__(self, model: Any = None):
187
+ self.model = model or get_model() # Auto-selects based on config
188
+ self.agent = Agent(
189
+ model=self.model,
190
+ output_type=JudgeAssessment, # Structured output!
191
+ system_prompt=SYSTEM_PROMPT,
192
+ )
193
+
194
+ async def assess(self, question: str, evidence: list[Evidence]) -> JudgeAssessment:
195
+ result = await self.agent.run(format_prompt(question, evidence))
196
+ return result.output # Guaranteed to be JudgeAssessment
197
+ ```
198
+
199
+ ---
200
+
201
+ ## 8. Microsoft Agent Framework Integration Points
202
+
203
+ Advanced mode wraps pydantic-ai agents in BaseAgent:
204
+
205
+ ```python
206
+ # In JudgeAgent (advanced mode only)
207
+ from agent_framework import BaseAgent, AgentRunResponse, ChatMessage, Role
208
+
209
+ class JudgeAgent(BaseAgent):
210
+ def __init__(self, judge_handler: JudgeHandlerProtocol):
211
+ super().__init__(
212
+ name="JudgeAgent",
213
+ description="Evaluates evidence quality",
214
+ )
215
+ self._handler = judge_handler # Uses pydantic-ai internally
216
+
217
+ async def run(self, messages, **kwargs) -> AgentRunResponse:
218
+ question = extract_question(messages)
219
+ evidence = self._evidence_store.get("current", [])
220
+
221
+ # Delegate to pydantic-ai powered handler
222
+ assessment = await self._handler.assess(question, evidence)
223
+
224
+ return AgentRunResponse(
225
+ messages=[ChatMessage(role=Role.ASSISTANT, text=format_response(assessment))],
226
+ additional_properties={"assessment": assessment.model_dump()},
227
+ )
228
+ ```
229
+
230
+ ---
231
+
232
+ ## 9. Benefits of This Architecture
233
+
234
+ 1. **Graceful Degradation**: Works without API keys (free tier)
235
+ 2. **Progressive Enhancement**: Better with API keys (orchestration)
236
+ 3. **Code Reuse**: pydantic-ai handlers shared between modes
237
+ 4. **Hackathon Ready**: Demo works without requiring paid keys
238
+ 5. **Production Ready**: Full orchestration available when needed
239
+ 6. **Future Proof**: Can add more agents to advanced mode
240
+ 7. **Testable**: Simple mode is easier to unit test
241
+
242
+ ---
243
+
244
+ ## 10. Known Risks and Mitigations
245
+
246
+ > **From Senior Agent Review**
247
+
248
+ ### 10.1 Bridge Complexity (MEDIUM)
249
+
250
+ **Risk:** In Advanced Mode, agents (Agent Framework) wrap handlers (pydantic-ai). Both are async. Context variables (`MagenticState`) must propagate correctly through the pydantic-ai call stack.
251
+
252
+ **Mitigation:**
253
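+ - State held in standard Python `contextvars` propagates naturally through `await` chains within a task, so it stays visible inside pydantic-ai calls; note that a new `asyncio` task receives a copy of the context, so values rebound there do not flow back to the caller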
254
+ - Test context propagation explicitly in integration tests
255
+ - If issues arise, pass state explicitly rather than via context vars
256
+
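The propagation behavior this mitigation relies on can be checked with the standard library alone. The sketch below is an assumption-laden illustration: `current_evidence`, `agent_layer`, and `handler_layer` are hypothetical names standing in for `MagenticState`, an Agent Framework agent, and a pydantic-ai backed handler.

```python
import asyncio
from contextvars import ContextVar

# Hypothetical stand-in for the state MagenticState keeps in a context variable.
current_evidence: ContextVar[list[str]] = ContextVar("current_evidence")


async def handler_layer() -> int:
    # Deep in the await chain (where a pydantic-ai call would sit),
    # the value set by the caller is still visible.
    return len(current_evidence.get())


async def agent_layer() -> int:
    return await handler_layer()


async def child_task() -> None:
    # Rebinding inside a separate task only changes that task's context copy.
    current_evidence.set(["overwritten in child"])


async def main() -> tuple[int, list[str]]:
    current_evidence.set(["a", "b"])
    seen = await agent_layer()
    await asyncio.create_task(child_task())
    return seen, current_evidence.get()


seen, after_child = asyncio.run(main())
print(seen, after_child)
```

An integration test along these lines would catch the case where an agent spawns tasks and expects their `set()` calls to be visible afterwards, which is the situation where passing state explicitly is the safer fallback.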
257
+ ### 10.2 Integration Drift (MEDIUM)
258
+
259
+ **Risk:** Simple Mode and Advanced Mode might diverge in behavior over time (e.g., Simple Mode uses logic A, Advanced Mode uses logic B).
260
+
261
+ **Mitigation:**
262
+ - Both modes MUST call the exact same underlying Tools (`src/tools/*`) and Handlers (`src/agent_factory/*`)
263
+ - Handlers are the single source of truth for business logic
264
+ - Agents are thin wrappers that delegate to handlers
265
+
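One way to guard against drift is a parity check that drives both entry points through the same handler instance. The sketch below uses hypothetical stand-ins (`JudgeLogic`, `SimpleLoop`, `JudgeAgentWrapper`) rather than the project's real classes, and the message shape is an assumption.

```python
import asyncio


class JudgeLogic:
    """Hypothetical shared handler: the only place the decision rule lives."""

    async def assess(self, question: str, evidence: list[str]) -> dict:
        return {"question": question, "sufficient": len(evidence) >= 2}


class SimpleLoop:
    """Stand-in for the simple-mode orchestrator."""

    def __init__(self, judge: JudgeLogic) -> None:
        self._judge = judge

    async def judge_step(self, question: str, evidence: list[str]) -> dict:
        return await self._judge.assess(question, evidence)


class JudgeAgentWrapper:
    """Stand-in for an advanced-mode agent: adapts messages, then delegates."""

    def __init__(self, judge: JudgeLogic) -> None:
        self._judge = judge

    async def run(self, messages: list[dict], evidence: list[str]) -> dict:
        question = messages[-1]["text"]  # message shape is an assumption
        return await self._judge.assess(question, evidence)


async def parity_check() -> tuple[dict, dict]:
    judge = JudgeLogic()
    evidence = ["trial A", "review B"]
    simple = await SimpleLoop(judge).judge_step("Does X help Y?", evidence)
    advanced = await JudgeAgentWrapper(judge).run(
        [{"text": "Does X help Y?"}], evidence
    )
    return simple, advanced


simple_result, advanced_result = asyncio.run(parity_check())
print(simple_result == advanced_result)
```

Because the wrapper contains no decision logic of its own, any difference between the two results points at the adapter code rather than at the handler.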
266
+ ### 10.3 Testing Burden (LOW-MEDIUM)
267
+
268
+ **Risk:** Having two distinct orchestrators (`src/orchestrator.py` and `src/orchestrator_magentic.py`) doubles the integration testing surface area.
269
+
270
+ **Mitigation:**
271
+ - Unit test handlers independently (shared code)
272
+ - Integration tests for each mode separately
273
+ - End-to-end tests verify same output for same input (determinism permitting)
274
+
275
+ ### 10.4 Dependency Conflicts (LOW)
276
+
277
+ **Risk:** `agent-framework-core` might conflict with `pydantic-ai`'s dependencies (e.g., different pydantic versions).
278
+
279
+ **Status:** Both depend on Pydantic v2 (`pydantic>=2`), so they should be compatible.
280
+
281
+ ---
282
+
283
+ ## 11. Naming Clarification
284
+
285
+ > See `00_SITUATION_AND_PLAN.md` Section 4 for full details.
286
+
287
+ **Important:** The codebase uses "magentic" in file names (`orchestrator_magentic.py`, `magentic_agents.py`) but this refers to our internal naming for Microsoft Agent Framework integration, **NOT** the `magentic` PyPI package.
288
+
289
+ **Future action:** Rename to `orchestrator_advanced.py` to eliminate confusion.
docs/brainstorming/magentic-pydantic/02_IMPLEMENTATION_PHASES.md ADDED
@@ -0,0 +1,112 @@
1
+ # Implementation Phases: Dual-Mode Agent System
2
+
3
+ **Date:** November 27, 2025
4
+ **Status:** IMPLEMENTATION PLAN (REVISED)
5
+ **Strategy:** TDD (Test-Driven Development), SOLID Principles
6
+ **Dependency Strategy:** PyPI (agent-framework-core)
7
+
8
+ ---
9
+
10
+ ## Phase 0: Environment Validation & Cleanup
11
+
12
+ **Goal:** Ensure clean state and dependencies are correctly installed.
13
+
14
+ ### Step 0.1: Verify PyPI Package
15
+ The `agent-framework-core` package is published on PyPI by Microsoft. Verify installation:
16
+
17
+ ```bash
18
+ uv sync --all-extras
19
+ python -c "from agent_framework import ChatAgent; print('OK')"
20
+ ```
21
+
22
+ ### Step 0.2: Branch State
23
+ We are on `feat/dual-mode-architecture`. Ensure it is up to date with `origin/dev` before starting.
24
+
25
+ **Note:** The `reference_repos/agent-framework` folder is kept for reference/documentation only.
26
+ The production dependency uses the official PyPI release.
27
+
28
+ ---
29
+
30
+ ## Phase 1: Pydantic-AI Improvements (Simple Mode)
31
+
32
+ **Goal:** Implement `HuggingFaceModel` support in `JudgeHandler` using strict TDD.
33
+
34
+ ### Step 1.1: Test First (Red)
35
+ Create `tests/unit/agent_factory/test_judges_factory.py`:
36
+ - Test `get_model()` returns `HuggingFaceModel` when `LLM_PROVIDER=huggingface`.
37
+ - Test `get_model()` respects `HF_TOKEN`.
38
+ - Test fallback to OpenAI.
39
+
40
+ ### Step 1.2: Implementation (Green)
41
+ Update `src/utils/config.py`:
42
+ - Add `huggingface_model` and `hf_token` fields.
43
+
44
+ Update `src/agent_factory/judges.py`:
45
+ - Implement `get_model` with the logic derived from the tests.
46
+ - Use dependency injection for the model where possible.
47
+
48
+ ### Step 1.3: Refactor
49
+ Ensure `JudgeHandler` is loosely coupled from the specific model provider.
50
+
51
+ ---
52
+
53
+ ## Phase 2: Orchestrator Factory (The Switch)
54
+
55
+ **Goal:** Implement the factory pattern to switch between Simple and Advanced modes.
56
+
57
+ ### Step 2.1: Test First (Red)
58
+ Create `tests/unit/test_orchestrator_factory.py`:
59
+ - Test `create_orchestrator` returns `Orchestrator` (simple) when API keys are missing.
60
+ - Test `create_orchestrator` returns `MagenticOrchestrator` (advanced) when OpenAI key exists.
61
+ - Test explicit mode override.
62
+
63
+ ### Step 2.2: Implementation (Green)
64
+ Update `src/orchestrator_factory.py` to implement the selection logic.
65
+
66
+ ---
67
+
68
+ ## Phase 3: Agent Framework Integration (Advanced Mode)
69
+
70
+ **Goal:** Integrate Microsoft Agent Framework from PyPI.
71
+
72
+ ### Step 3.1: Dependency Management
73
+ The `agent-framework-core` package is installed from PyPI:
74
+ ```toml
75
+ [project.optional-dependencies]
76
+ magentic = [
77
+ "agent-framework-core>=1.0.0b251120,<2.0.0", # Microsoft Agent Framework (PyPI)
78
+ ]
79
+ ```
80
+ Install with: `uv sync --all-extras`
81
+
82
+ ### Step 3.2: Verify Imports (Test First)
83
+ Create `tests/unit/agents/test_agent_imports.py`:
84
+ - Verify `from agent_framework import ChatAgent` works.
85
+ - Verify instantiation of `ChatAgent` with a mock client.
86
+
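An import check like the one described in Step 3.2 can be written so that it degrades cleanly when the optional extra is not installed. The sketch below is illustrative only: `optional_dependency_available` is a hypothetical helper, and the `ChatAgent` import path is the one shown in Step 0.1.

```python
import importlib.util


def optional_dependency_available(module_name: str) -> bool:
    """Return True when a top-level module can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None


if optional_dependency_available("agent_framework"):
    # Import path taken from Step 0.1 of this plan.
    from agent_framework import ChatAgent

    print("agent_framework available:", ChatAgent.__name__)
else:
    print("agent_framework not installed; advanced-mode tests would be skipped")
```

Checking availability first keeps the simple-mode test suite green on machines that only installed the base dependencies.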
87
+ ### Step 3.3: Update Agents
88
+ Refactor `src/agents/*.py` to ensure they match the exact signature of the installed `ChatAgent` class.
89
+ - **SOLID:** Ensure agents have single responsibilities.
90
+ - **DRY:** Share tool definitions between Pydantic-AI simple mode and Agent Framework advanced mode.
91
+
92
+ ---
93
+
94
+ ## Phase 4: UI & End-to-End Verification
95
+
96
+ **Goal:** Update Gradio to reflect the active mode.
97
+
98
+ ### Step 4.1: UI Updates
99
+ Update `src/app.py` to display "Simple Mode" vs "Advanced Mode".
100
+
101
+ ### Step 4.2: End-to-End Test
102
+ Run the full loop:
103
+ 1. Simple Mode (No Keys) -> Search -> Judge (HF) -> Report.
104
+ 2. Advanced Mode (OpenAI Key) -> SearchAgent -> JudgeAgent -> ReportAgent.
105
+
106
+ ---
107
+
108
+ ## Phase 5: Cleanup & Documentation
109
+
110
+ - Remove unused code.
111
+ - Update main README.md.
112
+ - Final `make check`.
docs/brainstorming/magentic-pydantic/03_IMMEDIATE_ACTIONS.md ADDED
@@ -0,0 +1,112 @@
1
+ # Immediate Actions Checklist
2
+
3
+ **Date:** November 27, 2025
4
+ **Priority:** Execute in order
5
+
6
+ ---
7
+
8
+ ## Before Starting Implementation
9
+
10
+ ### 1. Close PR #41 (CRITICAL)
11
+
12
+ ```bash
13
+ gh pr close 41 --comment "Architecture decision changed. Cherry-picking improvements to preserve both pydantic-ai and Agent Framework capabilities."
14
+ ```
15
+
16
+ ### 2. Verify HuggingFace Spaces is Safe
17
+
18
+ ```bash
19
+ # Should show agent framework files exist
20
+ git ls-tree --name-only huggingface-upstream/dev -- src/agents/
21
+ git ls-tree --name-only huggingface-upstream/dev -- src/orchestrator_magentic.py
22
+ ```
23
+
24
+ Expected output: Files should exist (they do as of this writing).
25
+
26
+ ### 3. Clean Local Environment
27
+
28
+ ```bash
29
+ # Switch to main first
30
+ git checkout main
31
+
32
+ # Delete problematic branches
33
+ git branch -D refactor/pydantic-unification 2>/dev/null || true
34
+ git branch -D feat/pubmed-fulltext 2>/dev/null || true
35
+
36
+ # Reset local dev to origin/dev
37
+ git branch -D dev 2>/dev/null || true
38
+ git checkout -b dev origin/dev
39
+
40
+ # Verify agent framework code exists
41
+ ls src/agents/
42
+ # Expected: __init__.py, analysis_agent.py, hypothesis_agent.py, judge_agent.py,
43
+ # magentic_agents.py, report_agent.py, search_agent.py, state.py, tools.py
44
+
45
+ ls src/orchestrator_magentic.py
46
+ # Expected: file exists
47
+ ```
48
+
49
+ ### 4. Create Fresh Feature Branch
50
+
51
+ ```bash
52
+ git checkout -b feat/dual-mode-architecture origin/dev
53
+ ```
54
+
55
+ ---
56
+
57
+ ## Decision Points
58
+
59
+ Before proceeding, confirm:
60
+
61
+ 1. **For hackathon**: Do we need advanced mode, or is simple mode sufficient?
62
+ - Simple mode = faster to implement, works today
63
+ - Advanced mode = better quality, more work
64
+
65
+ 2. **Timeline**: How much time do we have?
66
+ - If < 1 day: Focus on simple mode only
67
+ - If > 1 day: Implement dual-mode
68
+
69
+ 3. **Dependencies**: Is `agent-framework-core` available?
70
+ - Check: `pip index versions --pre agent-framework-core` (the pinned release is a pre-release, so `--pre` is needed to list it)
71
+ - If not on PyPI, may need to install from GitHub
72
+
73
+ ---
74
+
75
+ ## Quick Start (Simple Mode Only)
76
+
77
+ If time is limited, implement only simple mode improvements:
78
+
79
+ ```bash
80
+ # On feat/dual-mode-architecture branch
81
+
82
+ # 1. Update judges.py to add HuggingFace support
83
+ # 2. Update config.py to add HF settings
84
+ # 3. Create free_tier_demo.py
85
+ # 4. Run make check
86
+ # 5. Create PR to dev
87
+ ```
88
+
89
+ This gives you free-tier capability without touching agent framework code.
90
+
91
+ ---
92
+
93
+ ## Quick Start (Full Dual-Mode)
94
+
95
+ If time permits, implement full dual-mode:
96
+
97
+ Follow Phases 0-5 in `02_IMPLEMENTATION_PHASES.md`
98
+
99
+ ---
100
+
101
+ ## Emergency Rollback
102
+
103
+ If anything goes wrong:
104
+
105
+ ```bash
106
+ # Reset to safe state
107
+ git checkout main
108
+ git branch -D feat/dual-mode-architecture
109
+ git checkout -b feat/dual-mode-architecture origin/dev
110
+ ```
111
+
112
+ `origin/dev` is the safe fallback: it has the agent framework code intact.
docs/brainstorming/magentic-pydantic/04_FOLLOWUP_REVIEW_REQUEST.md ADDED
@@ -0,0 +1,158 @@
1
+ # Follow-Up Review Request: Did We Implement Your Feedback?
2
+
3
+ **Date:** November 27, 2025
4
+ **Context:** You previously reviewed our dual-mode architecture plan and provided feedback. We have updated the documentation. Please verify we correctly implemented your recommendations.
5
+
6
+ ---
7
+
8
+ ## Your Original Feedback vs Our Changes
9
+
10
+ ### 1. Naming Confusion Clarification
11
+
12
+ **Your feedback:** "You are using Microsoft Agent Framework, but you've named your integration 'Magentic'. This caused the confusion."
13
+
14
+ **Our change:** Added Section 4 in `00_SITUATION_AND_PLAN.md`:
15
+ ```markdown
16
+ ## 4. CRITICAL: Naming Confusion Clarification
17
+
18
+ > **Senior Agent Review Finding:** The codebase uses "magentic" in file names
19
+ > (e.g., `orchestrator_magentic.py`, `magentic_agents.py`) but this is **NOT**
20
+ > the `magentic` PyPI package by Jacky Liang. It's Microsoft Agent Framework.
21
+
22
+ **The naming confusion:**
23
+ - `magentic` (PyPI package): A different library for structured LLM outputs
24
+ - "Magentic" (in our codebase): Our internal name for Microsoft Agent Framework integration
25
+ - `agent-framework-core` (PyPI package): Microsoft's actual multi-agent orchestration framework
26
+
27
+ **Recommended future action:** Rename `orchestrator_magentic.py` → `orchestrator_advanced.py`
28
+ ```
29
+
30
+ **Status:** ✅ IMPLEMENTED
31
+
32
+ ---
33
+
34
+ ### 2. Bridge Complexity Warning
35
+
36
+ **Your feedback:** "You must ensure MagenticState (context vars) propagates correctly through the pydantic-ai call stack."
37
+
38
+ **Our change:** Added Section 10.1 in `01_ARCHITECTURE_SPEC.md`:
39
+ ```markdown
40
+ ### 10.1 Bridge Complexity (MEDIUM)
41
+
42
+ **Risk:** In Advanced Mode, agents (Agent Framework) wrap handlers (pydantic-ai).
43
+ Both are async. Context variables (`MagenticState`) must propagate correctly.
44
+
45
+ **Mitigation:**
46
+ - pydantic-ai uses standard Python `contextvars`, which naturally propagate through `await` chains
47
+ - Test context propagation explicitly in integration tests
48
+ - If issues arise, pass state explicitly rather than via context vars
49
+ ```
50
+
51
+ **Status:** ✅ IMPLEMENTED
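
The propagation claim in Section 10.1 can be checked with the standard library alone. The sketch below is a minimal illustration, not project code: `state_var` is a hypothetical stand-in for `MagenticState`, and `agent` / `handler` only imitate the Agent Framework and pydantic-ai layers. It shows that a value set by the caller is visible through a plain `await` chain and inside a task created afterwards, while a `set()` made inside a child task does not leak back to the parent.

```python
import asyncio
import contextvars

# Hypothetical stand-in for the project's MagenticState context variable.
state_var: contextvars.ContextVar[str] = contextvars.ContextVar("state", default="unset")


async def handler() -> str:
    """Innermost call, playing the role of a pydantic-ai handler."""
    return state_var.get()


async def agent() -> str:
    """Outer wrapper, playing the role of an Agent Framework agent."""
    return await handler()


async def child_task_mutates() -> None:
    """Runs in its own task, so set() only changes that task's context copy."""
    state_var.set("changed-in-child")


async def main() -> dict[str, str]:
    state_var.set("research-session-1")
    seen_via_await = await agent()
    # create_task() copies the current context when the task is created.
    seen_in_task = await asyncio.create_task(agent())
    await asyncio.create_task(child_task_mutates())
    return {
        "await": seen_via_await,
        "task": seen_in_task,
        "parent_after_child": state_var.get(),
    }


result = asyncio.run(main())
print(result)
```

If agents ever need to write state that the orchestrator reads back, the last case is the one to watch: a mutable object stored in the context variable, or explicit state passing, would be needed instead of `set()` in a child task.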
52
+
53
+ ---
54
+
55
+ ### 3. Integration Drift Warning
56
+
57
+ **Your feedback:** "Simple Mode and Advanced Mode might diverge in behavior."
58
+
59
+ **Our change:** Added Section 10.2 in `01_ARCHITECTURE_SPEC.md`:
60
+ ```markdown
61
+ ### 10.2 Integration Drift (MEDIUM)
62
+
63
+ **Risk:** Simple Mode and Advanced Mode might diverge in behavior over time.
64
+
65
+ **Mitigation:**
66
+ - Both modes MUST call the exact same underlying Tools (`src/tools/*`) and Handlers (`src/agent_factory/*`)
67
+ - Handlers are the single source of truth for business logic
68
+ - Agents are thin wrappers that delegate to handlers
69
+ ```
70
+
71
+ **Status:** ✅ IMPLEMENTED
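
A rough sketch of the "thin wrapper" mitigation, assuming nothing about the real classes: `JudgeHandler`, `SimpleOrchestrator`, and `AdvancedJudgeAgent` below are hypothetical stand-ins, and the "two pieces of evidence" rule is invented purely for illustration. The point is only that both modes delegate to one handler, so a parity check between them is cheap to write.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class Assessment:
    sufficient: bool
    recommendation: str


class JudgeHandler:
    """Hypothetical shared handler: the only place the judging rule lives."""

    async def assess(self, evidence: list[str]) -> Assessment:
        enough = len(evidence) >= 2  # illustrative rule, not the project's
        return Assessment(enough, "synthesize" if enough else "continue")


class SimpleOrchestrator:
    """Simple Mode stand-in: calls the handler directly."""

    def __init__(self, judge: JudgeHandler) -> None:
        self._judge = judge

    async def run(self, evidence: list[str]) -> Assessment:
        return await self._judge.assess(evidence)


class AdvancedJudgeAgent:
    """Advanced Mode stand-in: a thin agent that only delegates."""

    def __init__(self, judge: JudgeHandler) -> None:
        self._judge = judge

    async def handle(self, evidence: list[str]) -> Assessment:
        return await self._judge.assess(evidence)


async def modes_agree(evidence: list[str]) -> bool:
    """Run both paths on the same input and compare the results."""
    judge = JudgeHandler()
    simple = await SimpleOrchestrator(judge).run(evidence)
    advanced = await AdvancedJudgeAgent(judge).handle(evidence)
    return simple == advanced


print(asyncio.run(modes_agree(["paper A", "paper B"])))
```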
72
+
73
+ ---
74
+
75
+ ### 4. Testing Burden Warning
76
+
77
+ **Your feedback:** "You now have two distinct orchestrators to maintain. This doubles your integration testing surface area."
78
+
79
+ **Our change:** Added Section 10.3 in `01_ARCHITECTURE_SPEC.md`:
80
+ ```markdown
81
+ ### 10.3 Testing Burden (LOW-MEDIUM)
82
+
83
+ **Risk:** Two distinct orchestrators double the integration testing surface area.
84
+
85
+ **Mitigation:**
86
+ - Unit test handlers independently (shared code)
87
+ - Integration tests for each mode separately
88
+ - End-to-end tests verify same output for same input
89
+ ```
90
+
91
+ **Status:** ✅ IMPLEMENTED
92
+
93
+ ---
94
+
95
+ ### 5. Rename Recommendation
96
+
97
+ **Your feedback:** "Rename `src/orchestrator_magentic.py` to `src/orchestrator_advanced.py`"
98
+
99
+ **Our change:** Added Step 3.4 in `02_IMPLEMENTATION_PHASES.md`:
100
+ ```markdown
101
+ ### Step 3.4: (OPTIONAL) Rename "Magentic" to "Advanced"
102
+
103
+ > **Senior Agent Recommendation:** Rename files to eliminate confusion.
104
+
105
+ git mv src/orchestrator_magentic.py src/orchestrator_advanced.py
106
+ git mv src/agents/magentic_agents.py src/agents/advanced_agents.py
107
+
108
+ **Note:** This is optional for the hackathon. Can be done in a follow-up PR.
109
+ ```
110
+
111
+ **Status:** ✅ DOCUMENTED (marked as optional for hackathon)
112
+
113
+ ---
114
+
115
+ ### 6. Standardize Wrapper Recommendation
116
+
117
+ **Your feedback:** "Create a generic `PydanticAiAgentWrapper(BaseAgent)` class instead of manually wrapping each handler."
118
+
119
+ **Our change:** NOT YET DOCUMENTED
120
+
121
+ **Status:** ⚠️ NOT IMPLEMENTED - Should we add this?
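
To make the open question concrete, here is one possible shape for such a wrapper. Everything in it is an assumption: `BaseAgent` is a hypothetical abstract class defined locally, not the Agent Framework's real base class, and the handlers are fakes. It only illustrates the idea of one generic adapter that takes any async handler plus a function that renders its structured result as text.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Any, Awaitable, Callable


class BaseAgent(ABC):
    """Hypothetical base class; NOT the Agent Framework's real interface."""

    def __init__(self, name: str) -> None:
        self.name = name

    @abstractmethod
    async def run(self, message: str) -> str:
        """Handle one message and return a text reply."""


class PydanticAiAgentWrapper(BaseAgent):
    """Generic adapter: one class wraps any async handler."""

    def __init__(
        self,
        name: str,
        handler: Callable[[str], Awaitable[Any]],
        render: Callable[[Any], str] = str,
    ) -> None:
        super().__init__(name)
        self._handler = handler
        self._render = render

    async def run(self, message: str) -> str:
        result = await self._handler(message)  # structured output from the handler
        return self._render(result)  # converted to text for the orchestrator


async def fake_judge(message: str) -> dict[str, Any]:
    """Fake handler standing in for a pydantic-ai judge."""
    return {"sufficient": True, "question": message}


async def fake_search(message: str) -> list[str]:
    """Fake handler standing in for a search tool."""
    return [f"result for {message}"]


judge_agent = PydanticAiAgentWrapper(
    "judge", fake_judge, render=lambda r: f"sufficient={r['sufficient']}"
)
search_agent = PydanticAiAgentWrapper("search", fake_search, render="; ".join)

print(asyncio.run(judge_agent.run("Does metformin extend lifespan?")))
print(asyncio.run(search_agent.run("metformin")))
```

Whether this is worth adding depends on how many handlers need wrapping; with only a handful, hand-written wrappers may be simpler to debug.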
122
+
123
+ ---
124
+
125
+ ## Questions for Your Review
126
+
127
+ 1. **Did we correctly implement your feedback?** Are there any misunderstandings in how we interpreted your recommendations?
128
+
129
+ 2. **Is the "Standardize Wrapper" recommendation critical?** Should we add it to the implementation phases, or is it a nice-to-have for later?
130
+
131
+ 3. **Dependency versioning:** You noted `agent-framework-core>=1.0.0b251120` might be ephemeral. Should we:
132
+ - Pin to a specific version?
133
+ - Use a version range?
134
+ - Install from GitHub source?
135
+
136
+ 4. **Anything else we missed?**
137
+
138
+ ---
139
+
140
+ ## Files to Re-Review
141
+
142
+ 1. `00_SITUATION_AND_PLAN.md` - Added Section 4 (Naming Clarification)
143
+ 2. `01_ARCHITECTURE_SPEC.md` - Added Sections 10-11 (Risks, Naming)
144
+ 3. `02_IMPLEMENTATION_PHASES.md` - Added Step 3.4 (Optional Rename)
145
+
146
+ ---
147
+
148
+ ## Current Branch State
149
+
150
+ We are now on `feat/dual-mode-architecture` branched from `origin/dev`:
151
+ - ✅ Agent framework code intact (`src/agents/`, `src/orchestrator_magentic.py`)
152
+ - ✅ Documentation committed
153
+ - ❌ PR #41 still open (need to close it)
154
+ - ❌ Cherry-pick of pydantic-ai improvements not yet done
155
+
156
+ ---
157
+
158
+ Please confirm: **GO / NO-GO** to proceed with Phase 1 (cherry-picking pydantic-ai improvements)?
docs/brainstorming/magentic-pydantic/REVIEW_PROMPT_FOR_SENIOR_AGENT.md ADDED
@@ -0,0 +1,113 @@
1
+ # Senior Agent Review Prompt
2
+
3
+ Copy and paste everything below this line into a fresh Claude/AI session:
4
+
5
+ ---
6
+
7
+ ## Context
8
+
9
+ I am a junior developer working on a HuggingFace hackathon project called DeepCritical. We made a significant architectural mistake and are now trying to course-correct. I need you to act as a **senior staff engineer** and critically review our proposed solution.
10
+
11
+ ## The Situation
12
+
13
+ We almost merged a refactor that would have **deleted** our multi-agent orchestration capability, mistakenly believing that `pydantic-ai` (a library for structured LLM outputs) and Microsoft's `agent-framework` (a framework for multi-agent orchestration) were mutually exclusive alternatives.
14
+
15
+ **They are not.** They are complementary:
16
+ - `pydantic-ai` ensures LLM responses match Pydantic schemas (type-safe outputs)
17
+ - `agent-framework` orchestrates multiple agents working together (coordination layer)
18
+
19
+ We now want to implement a **dual-mode architecture** where:
20
+ - **Simple Mode (No API key):** Uses only pydantic-ai with HuggingFace free tier
21
+ - **Advanced Mode (With API key):** Uses Microsoft Agent Framework for orchestration, with pydantic-ai inside each agent for structured outputs
22
+
23
+ ## Your Task
24
+
25
+ Please perform a **deep, critical review** of:
26
+
27
+ 1. **The architecture diagram** (image attached: `assets/magentic-pydantic.png`)
28
+ 2. **Our documentation** (4 files listed below)
29
+ 3. **The actual codebase** to verify our claims
30
+
31
+ ## Specific Questions to Answer
32
+
33
+ ### Architecture Validation
34
+ 1. Is our understanding correct that pydantic-ai and agent-framework are complementary, not competing?
35
+ 2. Does the dual-mode architecture diagram accurately represent how these should integrate?
36
+ 3. Are there any architectural flaws or anti-patterns in our proposed design?
37
+
38
+ ### Documentation Accuracy
39
+ 4. Are the branch states we documented accurate? (Check `git log`, `git ls-tree`)
40
+ 5. Is our understanding of what code exists where correct?
41
+ 6. Are the implementation phases realistic and in the correct order?
42
+ 7. Are there any missing steps or dependencies we overlooked?
43
+
44
+ ### Codebase Reality Check
45
+ 8. Does `origin/dev` actually have the agent framework code intact? Verify by checking:
46
+ - `git ls-tree origin/dev -- src/agents/`
47
+ - `git ls-tree origin/dev -- src/orchestrator_magentic.py`
48
+ 9. What does the current `src/agents/` code actually import? Does it use `agent_framework` or `agent-framework-core`?
49
+ 10. Is the `agent-framework-core` package actually available on PyPI, or do we need to install from source?
50
+
51
+ ### Implementation Feasibility
52
+ 11. Can the cherry-pick strategy we outlined actually work, or are there merge conflicts we're not seeing?
53
+ 12. Is the mode auto-detection logic sound?
54
+ 13. What are the risks we haven't identified?
55
+
56
+ ### Critical Errors Check
57
+ 14. Did we miss anything critical in our analysis?
58
+ 15. Are there any factual errors in our documentation?
59
+ 16. Would a Google/DeepMind senior engineer approve this plan, or would they flag issues?
60
+
61
+ ## Files to Review
62
+
63
+ Please read these files in order:
64
+
65
+ 1. `/Users/ray/Desktop/CLARITY-DIGITAL-TWIN/DeepCritical-1/docs/brainstorming/magentic-pydantic/00_SITUATION_AND_PLAN.md`
66
+ 2. `/Users/ray/Desktop/CLARITY-DIGITAL-TWIN/DeepCritical-1/docs/brainstorming/magentic-pydantic/01_ARCHITECTURE_SPEC.md`
67
+ 3. `/Users/ray/Desktop/CLARITY-DIGITAL-TWIN/DeepCritical-1/docs/brainstorming/magentic-pydantic/02_IMPLEMENTATION_PHASES.md`
68
+ 4. `/Users/ray/Desktop/CLARITY-DIGITAL-TWIN/DeepCritical-1/docs/brainstorming/magentic-pydantic/03_IMMEDIATE_ACTIONS.md`
69
+
70
+ And the architecture diagram:
71
+ 5. `/Users/ray/Desktop/CLARITY-DIGITAL-TWIN/DeepCritical-1/assets/magentic-pydantic.png`
72
+
73
+ ## Reference Repositories to Consult
74
+
75
+ We have local clones of the source-of-truth repositories:
76
+
77
+ - **Original DeepCritical:** `/Users/ray/Desktop/CLARITY-DIGITAL-TWIN/DeepCritical-1/reference_repos/DeepCritical/`
78
+ - **Microsoft Agent Framework:** `/Users/ray/Desktop/CLARITY-DIGITAL-TWIN/DeepCritical-1/reference_repos/agent-framework/`
79
+ - **Microsoft AutoGen:** `/Users/ray/Desktop/CLARITY-DIGITAL-TWIN/DeepCritical-1/reference_repos/autogen-microsoft/`
80
+
81
+ Please cross-reference our hackathon fork against these to verify architectural alignment.
82
+
83
+ ## Codebase to Analyze
84
+
85
+ Our hackathon fork is at:
86
+ `/Users/ray/Desktop/CLARITY-DIGITAL-TWIN/DeepCritical-1/`
87
+
88
+ Key files to examine:
89
+ - `src/agents/` - Agent framework integration
90
+ - `src/agent_factory/judges.py` - pydantic-ai integration
91
+ - `src/orchestrator.py` - Simple mode orchestrator
92
+ - `src/orchestrator_magentic.py` - Advanced mode orchestrator
93
+ - `src/orchestrator_factory.py` - Mode selection
94
+ - `pyproject.toml` - Dependencies
95
+
96
+ ## Expected Output
97
+
98
+ Please provide:
99
+
100
+ 1. **Validation Summary:** Is our plan sound? (YES/NO with explanation)
101
+ 2. **Errors Found:** List any factual errors in our documentation
102
+ 3. **Missing Items:** What did we overlook?
103
+ 4. **Risk Assessment:** What could go wrong?
104
+ 5. **Recommended Changes:** Specific edits to our documentation or plan
105
+ 6. **Go/No-Go Recommendation:** Should we proceed with this plan?
106
+
107
+ ## Tone
108
+
109
+ Be brutally honest. If our plan is flawed, say so directly. We would rather know now than after implementation. Don't soften criticism - we need accuracy.
110
+
111
+ ---
112
+
113
+ END OF PROMPT
pyproject.toml CHANGED
@@ -44,7 +44,7 @@ dev = [
44
  "pre-commit>=3.7",
45
  ]
46
  magentic = [
47
- "agent-framework-core>=1.0.0b251120,<2.0.0", # Pin to avoid breaking changes
48
  ]
49
  embeddings = [
50
  "chromadb>=0.4.0",
 
44
  "pre-commit>=3.7",
45
  ]
46
  magentic = [
47
+ "agent-framework-core>=1.0.0b251120,<2.0.0", # Microsoft Agent Framework (PyPI)
48
  ]
49
  embeddings = [
50
  "chromadb>=0.4.0",
src/agent_factory/judges.py CHANGED
@@ -8,8 +8,10 @@ import structlog
8
  from huggingface_hub import InferenceClient
9
  from pydantic_ai import Agent
10
  from pydantic_ai.models.anthropic import AnthropicModel
 
11
  from pydantic_ai.models.openai import OpenAIModel
12
  from pydantic_ai.providers.anthropic import AnthropicProvider
 
13
  from pydantic_ai.providers.openai import OpenAIProvider
14
  from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_exponential
15
 
@@ -36,6 +38,12 @@ def get_model() -> Any:
36
  provider = AnthropicProvider(api_key=settings.anthropic_api_key)
37
  return AnthropicModel(settings.anthropic_model, provider=provider)
38
 
 
 
 
 
 
 
39
  if llm_provider != "openai":
40
  logger.warning("Unknown LLM provider, defaulting to OpenAI", provider=llm_provider)
41
 
 
8
  from huggingface_hub import InferenceClient
9
  from pydantic_ai import Agent
10
  from pydantic_ai.models.anthropic import AnthropicModel
11
+ from pydantic_ai.models.huggingface import HuggingFaceModel
12
  from pydantic_ai.models.openai import OpenAIModel
13
  from pydantic_ai.providers.anthropic import AnthropicProvider
14
+ from pydantic_ai.providers.huggingface import HuggingFaceProvider
15
  from pydantic_ai.providers.openai import OpenAIProvider
16
  from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_exponential
17
 
 
38
  provider = AnthropicProvider(api_key=settings.anthropic_api_key)
39
  return AnthropicModel(settings.anthropic_model, provider=provider)
40
 
41
+     if llm_provider == "huggingface":
42
+         # Free tier - uses HF_TOKEN from environment if available
43
+         model_name = settings.huggingface_model or "meta-llama/Llama-3.1-70B-Instruct"
44
+         hf_provider = HuggingFaceProvider(api_key=settings.hf_token)
45
+         return HuggingFaceModel(model_name, provider=hf_provider)
46
+
47
  if llm_provider != "openai":
48
  logger.warning("Unknown LLM provider, defaulting to OpenAI", provider=llm_provider)
49
 
src/app.py CHANGED
@@ -31,7 +31,7 @@ def configure_orchestrator(
31
 
32
  Args:
33
  use_mock: If True, use MockJudgeHandler (no API key needed)
34
- mode: Orchestrator mode ("simple" or "magentic")
35
  user_api_key: Optional user-provided API key (BYOK)
36
  api_provider: API provider ("openai" or "anthropic")
37
 
@@ -115,7 +115,7 @@ async def research_agent(
115
  Args:
116
  message: User's research question
117
  history: Chat history (Gradio format)
118
- mode: Orchestrator mode ("simple" or "magentic")
119
  api_key: Optional user-provided API key (BYOK - Bring Your Own Key)
120
  api_provider: API provider ("openai" or "anthropic")
121
 
@@ -135,10 +135,11 @@ async def research_agent(
135
  has_user_key = bool(user_api_key)
136
  has_paid_key = has_openai or has_anthropic or has_user_key
137
 
138
- # Magentic mode requires OpenAI specifically
139
- if mode == "magentic" and not (has_openai or (has_user_key and api_provider == "openai")):
140
  yield (
141
- "⚠️ **Warning**: Magentic mode requires OpenAI API key. Falling back to simple mode.\n\n"
 
142
  )
143
  mode = "simple"
144
 
@@ -227,10 +228,13 @@ def create_demo() -> gr.ChatInterface:
227
  additional_inputs_accordion=gr.Accordion(label="⚙️ Settings", open=False),
228
  additional_inputs=[
229
  gr.Radio(
230
- choices=["simple", "magentic"],
231
  value="simple",
232
  label="Orchestrator Mode",
233
- info="Simple: Linear | Magentic: Multi-Agent (OpenAI)",
 
 
 
234
  ),
235
  gr.Textbox(
236
  label="🔑 API Key (Optional - BYOK)",
 
31
 
32
  Args:
33
  use_mock: If True, use MockJudgeHandler (no API key needed)
34
+ mode: Orchestrator mode ("simple" or "advanced")
35
  user_api_key: Optional user-provided API key (BYOK)
36
  api_provider: API provider ("openai" or "anthropic")
37
 
 
115
  Args:
116
  message: User's research question
117
  history: Chat history (Gradio format)
118
+ mode: Orchestrator mode ("simple" or "advanced")
119
  api_key: Optional user-provided API key (BYOK - Bring Your Own Key)
120
  api_provider: API provider ("openai" or "anthropic")
121
 
 
135
  has_user_key = bool(user_api_key)
136
  has_paid_key = has_openai or has_anthropic or has_user_key
137
 
138
+ # Advanced mode requires OpenAI specifically (due to agent-framework binding)
139
+ if mode == "advanced" and not (has_openai or (has_user_key and api_provider == "openai")):
140
  yield (
141
+ "⚠️ **Warning**: Advanced mode currently requires OpenAI API key. "
142
+ "Falling back to simple mode.\n\n"
143
  )
144
  mode = "simple"
145
 
 
228
  additional_inputs_accordion=gr.Accordion(label="⚙️ Settings", open=False),
229
  additional_inputs=[
230
  gr.Radio(
231
+ choices=["simple", "advanced"],
232
  value="simple",
233
  label="Orchestrator Mode",
234
+ info=(
235
+ "Simple: Linear (Free Tier Friendly) | "
236
+ "Advanced: Multi-Agent (Requires OpenAI)"
237
+ ),
238
  ),
239
  gr.Textbox(
240
  label="🔑 API Key (Optional - BYOK)",
src/orchestrator_factory.py CHANGED
@@ -2,15 +2,34 @@
2
 
3
  from typing import Any, Literal
4
 
 
 
5
  from src.orchestrator import JudgeHandlerProtocol, Orchestrator, SearchHandlerProtocol
 
6
  from src.utils.models import OrchestratorConfig
7
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
8
 
9
  def create_orchestrator(
10
  search_handler: SearchHandlerProtocol | None = None,
11
  judge_handler: JudgeHandlerProtocol | None = None,
12
  config: OrchestratorConfig | None = None,
13
- mode: Literal["simple", "magentic"] = "simple",
14
  ) -> Any:
15
  """
16
  Create an orchestrator instance.
@@ -19,25 +38,19 @@ def create_orchestrator(
19
  search_handler: The search handler (required for simple mode)
20
  judge_handler: The judge handler (required for simple mode)
21
  config: Optional configuration
22
- mode: "simple" for Phase 4 loop, "magentic" for ChatAgent-based multi-agent
23
 
24
  Returns:
25
  Orchestrator instance
26
-
27
- Note:
28
- Magentic mode does NOT use search_handler/judge_handler.
29
- It creates ChatAgent instances with internal LLMs that call tools directly.
30
  """
31
- if mode == "magentic":
32
- try:
33
- from src.orchestrator_magentic import MagenticOrchestrator
34
 
35
- return MagenticOrchestrator(
36
- max_rounds=config.max_iterations if config else 10,
37
- )
38
- except ImportError:
39
- # Fallback to simple if agent-framework not installed
40
- pass
41
 
42
  # Simple mode requires handlers
43
  if search_handler is None or judge_handler is None:
@@ -48,3 +61,17 @@ def create_orchestrator(
48
  judge_handler=judge_handler,
49
  config=config,
50
  )
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
2
 
3
  from typing import Any, Literal
4
 
5
+ import structlog
6
+
7
  from src.orchestrator import JudgeHandlerProtocol, Orchestrator, SearchHandlerProtocol
8
+ from src.utils.config import settings
9
  from src.utils.models import OrchestratorConfig
10
 
11
+ logger = structlog.get_logger()
12
+
13
+
14
+ def _get_magentic_orchestrator_class() -> Any:
15
+     """Import MagenticOrchestrator lazily to avoid hard dependency."""
16
+     try:
17
+         from src.orchestrator_magentic import MagenticOrchestrator
18
+
19
+         return MagenticOrchestrator
20
+     except ImportError as e:
21
+         logger.error("Failed to import MagenticOrchestrator", error=str(e))
22
+         raise ValueError(
23
+             "Advanced mode requires agent-framework-core. "
24
+             "Please install it or use mode='simple'."
25
+         ) from e
26
+
27
 
28
  def create_orchestrator(
29
  search_handler: SearchHandlerProtocol | None = None,
30
  judge_handler: JudgeHandlerProtocol | None = None,
31
  config: OrchestratorConfig | None = None,
32
+ mode: Literal["simple", "magentic", "advanced"] | None = None,
33
  ) -> Any:
34
  """
35
  Create an orchestrator instance.
 
38
  search_handler: The search handler (required for simple mode)
39
  judge_handler: The judge handler (required for simple mode)
40
  config: Optional configuration
41
+ mode: "simple", "magentic", "advanced" or None (auto-detect)
42
 
43
  Returns:
44
  Orchestrator instance
 
 
 
 
45
  """
46
+ effective_mode = _determine_mode(mode)
47
+ logger.info("Creating orchestrator", mode=effective_mode)
 
48
 
49
+ if effective_mode == "advanced":
50
+ orchestrator_cls = _get_magentic_orchestrator_class()
51
+ return orchestrator_cls(
52
+ max_rounds=config.max_iterations if config else 10,
53
+ )
 
54
 
55
  # Simple mode requires handlers
56
  if search_handler is None or judge_handler is None:
 
61
  judge_handler=judge_handler,
62
  config=config,
63
  )
64
+
65
+
66
+ def _determine_mode(explicit_mode: str | None) -> str:
67
+     """Determine which mode to use."""
68
+     if explicit_mode:
69
+         if explicit_mode in ("magentic", "advanced"):
70
+             return "advanced"
71
+         return "simple"
72
+
73
+     # Auto-detect: advanced only if an OpenAI key is available
74
+     if settings.has_openai_key:
75
+         return "advanced"
76
+
77
+     return "simple"
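
Read on its own, the selection rule in `_determine_mode` can be sketched as a pure function. `pick_mode` and its `has_openai_key` parameter are illustrative names, not part of the commit; the real function reads the flag from `settings`. The table of cases makes two behaviours of the rule explicit: the legacy name `"magentic"` is treated as `"advanced"`, and any other explicit value falls through to `"simple"` without an error.

```python
from typing import Optional


def pick_mode(explicit_mode: Optional[str], has_openai_key: bool) -> str:
    """Illustrative re-statement of the factory's mode selection rule."""
    if explicit_mode:
        # An explicit choice wins; "magentic" is kept as an alias for "advanced".
        if explicit_mode in ("magentic", "advanced"):
            return "advanced"
        return "simple"
    # No explicit choice: auto-detect from the OpenAI key only.
    return "advanced" if has_openai_key else "simple"


cases = [
    ("simple", True),
    ("advanced", False),
    ("magentic", False),
    ("typo-mode", True),
    (None, True),
    (None, False),
]
for mode, has_key in cases:
    print(f"mode={mode!r:12} has_openai_key={has_key!s:5} -> {pick_mode(mode, has_key)}")
```

Note that an explicit `"advanced"` is honoured even without a key in this rule; in the commit, the key check for that case lives in `src/app.py`, which falls back to simple mode with a warning.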
src/utils/config.py CHANGED
@@ -23,13 +23,20 @@ class Settings(BaseSettings):
23
  # LLM Configuration
24
  openai_api_key: str | None = Field(default=None, description="OpenAI API key")
25
  anthropic_api_key: str | None = Field(default=None, description="Anthropic API key")
26
- llm_provider: Literal["openai", "anthropic"] = Field(
27
  default="openai", description="Which LLM provider to use"
28
  )
29
  openai_model: str = Field(default="gpt-5.1", description="OpenAI model name")
30
  anthropic_model: str = Field(
31
  default="claude-sonnet-4-5-20250929", description="Anthropic model"
32
  )
 
 
 
 
 
 
 
33
 
34
  # Embedding Configuration
35
  # Note: OpenAI embeddings require OPENAI_API_KEY (Anthropic has no embeddings API)
@@ -97,10 +104,15 @@ class Settings(BaseSettings):
97
  """Check if Anthropic API key is available."""
98
  return bool(self.anthropic_api_key)
99
 
 
 
 
 
 
100
  @property
101
  def has_any_llm_key(self) -> bool:
102
  """Check if any LLM API key is available."""
103
- return self.has_openai_key or self.has_anthropic_key
104
 
105
 
106
  def get_settings() -> Settings:
 
23
  # LLM Configuration
24
  openai_api_key: str | None = Field(default=None, description="OpenAI API key")
25
  anthropic_api_key: str | None = Field(default=None, description="Anthropic API key")
26
+ llm_provider: Literal["openai", "anthropic", "huggingface"] = Field(
27
  default="openai", description="Which LLM provider to use"
28
  )
29
  openai_model: str = Field(default="gpt-5.1", description="OpenAI model name")
30
  anthropic_model: str = Field(
31
  default="claude-sonnet-4-5-20250929", description="Anthropic model"
32
  )
33
+ # HuggingFace (free tier)
34
+ huggingface_model: str | None = Field(
35
+ default="meta-llama/Llama-3.1-70B-Instruct", description="HuggingFace model name"
36
+ )
37
+ hf_token: str | None = Field(
38
+ default=None, alias="HF_TOKEN", description="HuggingFace API token"
39
+ )
40
 
41
  # Embedding Configuration
42
  # Note: OpenAI embeddings require OPENAI_API_KEY (Anthropic has no embeddings API)
 
104
  """Check if Anthropic API key is available."""
105
  return bool(self.anthropic_api_key)
106
 
107
+ @property
108
+ def has_huggingface_key(self) -> bool:
109
+ """Check if HuggingFace token is available."""
110
+ return bool(self.hf_token)
111
+
112
  @property
113
  def has_any_llm_key(self) -> bool:
114
  """Check if any LLM API key is available."""
115
+ return self.has_openai_key or self.has_anthropic_key or self.has_huggingface_key
116
 
117
 
118
  def get_settings() -> Settings:
tests/integration/test_dual_mode_e2e.py ADDED
@@ -0,0 +1,82 @@
1
+ """End-to-End Integration Tests for Dual-Mode Architecture."""
2
+
3
+ from unittest.mock import AsyncMock, MagicMock, patch
4
+
5
+ import pytest
6
+
7
+ pytestmark = [pytest.mark.integration, pytest.mark.slow]
8
+
9
+ from src.orchestrator_factory import create_orchestrator
10
+ from src.utils.models import Citation, Evidence, OrchestratorConfig
11
+
12
+
13
+ @pytest.fixture
14
+ def mock_search_handler():
15
+ handler = MagicMock()
16
+ handler.execute = AsyncMock(
17
+ return_value=[
18
+ Evidence(
19
+ citation=Citation(
20
+ title="Test Paper", url="http://test", date="2024", source="pubmed"
21
+ ),
22
+ content="Metformin increases lifespan in mice.",
23
+ )
24
+ ]
25
+ )
26
+ return handler
27
+
28
+
29
+ @pytest.fixture
30
+ def mock_judge_handler():
31
+ handler = MagicMock()
32
+ # Mock return value of assess
33
+ assessment = MagicMock()
34
+ assessment.sufficient = True
35
+ assessment.recommendation = "synthesize"
36
+ handler.assess = AsyncMock(return_value=assessment)
37
+ return handler
38
+
39
+
40
+ @pytest.mark.asyncio
41
+ async def test_simple_mode_e2e(mock_search_handler, mock_judge_handler):
42
+ """Test Simple Mode Orchestration flow."""
43
+ orch = create_orchestrator(
44
+ search_handler=mock_search_handler,
45
+ judge_handler=mock_judge_handler,
46
+ mode="simple",
47
+ config=OrchestratorConfig(max_iterations=1),
48
+ )
49
+
50
+ # Run
51
+ results = []
52
+ async for event in orch.run("Test query"):
53
+ results.append(event)
54
+
55
+ assert len(results) > 0
56
+ assert mock_search_handler.execute.called
57
+ assert mock_judge_handler.assess.called
58
+
59
+
60
+ @pytest.mark.asyncio
61
+ async def test_advanced_mode_explicit_instantiation():
62
+ """Test explicit Advanced Mode instantiation (not auto-detect).
63
+
64
+ This tests the explicit mode="advanced" path, verifying that
65
+ MagenticOrchestrator can be instantiated when explicitly requested.
66
+ The settings patch ensures any internal checks pass.
67
+ """
68
+ with patch("src.orchestrator_factory.settings") as mock_settings:
69
+ # Settings patch ensures factory checks pass (even though mode is explicit)
70
+ mock_settings.has_openai_key = True
71
+
72
+ with patch("src.agents.magentic_agents.OpenAIChatClient"):
73
+ # Mock agent creation to avoid real API calls during init
74
+ with (
75
+ patch("src.orchestrator_magentic.create_search_agent"),
76
+ patch("src.orchestrator_magentic.create_judge_agent"),
77
+ patch("src.orchestrator_magentic.create_hypothesis_agent"),
78
+ patch("src.orchestrator_magentic.create_report_agent"),
79
+ ):
80
+ # Explicit mode="advanced" - tests the explicit path, not auto-detect
81
+ orch = create_orchestrator(mode="advanced")
82
+ assert orch is not None
tests/unit/agent_factory/test_judges_factory.py ADDED
@@ -0,0 +1,64 @@
1
+ """Unit tests for Judge Factory and Model Selection."""
2
+
3
+ from unittest.mock import patch
4
+
5
+ import pytest
6
+
7
+ pytestmark = pytest.mark.unit
8
+ from pydantic_ai.models.anthropic import AnthropicModel
9
+
10
+ # HuggingFaceModel is provided by pydantic-ai; get_model() should return it
11
+ # when llm_provider is set to huggingface.
12
+ from pydantic_ai.models.huggingface import HuggingFaceModel
13
+ from pydantic_ai.models.openai import OpenAIModel
14
+
15
+ from src.agent_factory.judges import get_model
16
+
17
+
18
+ @pytest.fixture
19
+ def mock_settings():
20
+ with patch("src.agent_factory.judges.settings", autospec=True) as mock_settings:
21
+ yield mock_settings
22
+
23
+
24
+ def test_get_model_openai(mock_settings):
25
+ """Test that OpenAI model is returned when provider is openai."""
26
+ mock_settings.llm_provider = "openai"
27
+ mock_settings.openai_api_key = "sk-test"
28
+ mock_settings.openai_model = "gpt-4o"
29
+
30
+ model = get_model()
31
+ assert isinstance(model, OpenAIModel)
32
+ assert model.model_name == "gpt-4o"
33
+
34
+
35
+ def test_get_model_anthropic(mock_settings):
36
+ """Test that Anthropic model is returned when provider is anthropic."""
37
+ mock_settings.llm_provider = "anthropic"
38
+ mock_settings.anthropic_api_key = "sk-ant-test"
39
+ mock_settings.anthropic_model = "claude-3-5-sonnet"
40
+
41
+ model = get_model()
42
+ assert isinstance(model, AnthropicModel)
43
+ assert model.model_name == "claude-3-5-sonnet"
44
+
45
+
46
+ def test_get_model_huggingface(mock_settings):
47
+ """Test that HuggingFace model is returned when provider is huggingface."""
48
+ mock_settings.llm_provider = "huggingface"
49
+ mock_settings.hf_token = "hf_test_token"
50
+ mock_settings.huggingface_model = "meta-llama/Llama-3.1-70B-Instruct"
51
+
52
+ model = get_model()
53
+ assert isinstance(model, HuggingFaceModel)
54
+ assert model.model_name == "meta-llama/Llama-3.1-70B-Instruct"
55
+
56
+
57
+ def test_get_model_default_fallback(mock_settings):
58
+ """Test fallback to OpenAI if provider is unknown."""
59
+ mock_settings.llm_provider = "unknown_provider"
60
+ mock_settings.openai_api_key = "sk-test"
61
+ mock_settings.openai_model = "gpt-4o"
62
+
63
+ model = get_model()
64
+ assert isinstance(model, OpenAIModel)
tests/unit/agents/test_agent_imports.py ADDED
@@ -0,0 +1,32 @@
1
+ """Test that agent framework dependencies are importable and usable."""
2
+
3
+ from unittest.mock import MagicMock
4
+
5
+ import pytest
6
+
7
+ pytestmark = pytest.mark.unit
8
+
9
+ # Import conditional on package availability, but for this test we expect it to be there
10
+ try:
11
+ from agent_framework import ChatAgent
12
+ from agent_framework.openai import OpenAIChatClient
13
+ except ImportError:
14
+ ChatAgent = None
15
+ OpenAIChatClient = None
16
+
17
+
18
+ @pytest.mark.skipif(ChatAgent is None, reason="agent-framework-core not installed")
19
+ def test_agent_framework_import():
20
+ """Test that agent_framework can be imported."""
21
+ assert ChatAgent is not None
22
+ assert OpenAIChatClient is not None # Verify both imports work
23
+
24
+
25
+ @pytest.mark.skipif(ChatAgent is None, reason="agent-framework-core not installed")
26
+ def test_chat_agent_instantiation():
27
+ """Test that ChatAgent can be instantiated with a mock client."""
28
+ mock_client = MagicMock()
29
+ # We assume ChatAgent takes chat_client as first argument based on _agents.py source
30
+ agent = ChatAgent(chat_client=mock_client, name="TestAgent")
31
+ assert agent.name == "TestAgent"
32
+ assert agent.chat_client == mock_client
tests/unit/test_orchestrator_factory.py ADDED
@@ -0,0 +1,66 @@
1
+ """Unit tests for Orchestrator Factory."""
2
+
3
+ from unittest.mock import MagicMock, patch
4
+
5
+ import pytest
6
+
7
+ pytestmark = pytest.mark.unit
8
+
9
+ from src.orchestrator import Orchestrator
10
+ from src.orchestrator_factory import create_orchestrator
11
+
12
+
13
+ @pytest.fixture
14
+ def mock_settings():
15
+ with patch("src.orchestrator_factory.settings", autospec=True) as mock_settings:
16
+ yield mock_settings
17
+
18
+
19
+ @pytest.fixture
20
+ def mock_magentic_cls():
21
+ with patch("src.orchestrator_factory._get_magentic_orchestrator_class") as mock:
22
+ # The mock returns a class (callable), which returns an instance
23
+ mock_class = MagicMock()
24
+ mock.return_value = mock_class
25
+ yield mock_class
26
+
27
+
28
+ @pytest.fixture
29
+ def mock_handlers():
30
+ return MagicMock(), MagicMock()
31
+
32
+
33
+ def test_create_orchestrator_simple_explicit(mock_settings, mock_handlers):
34
+ """Test explicit simple mode."""
35
+ search, judge = mock_handlers
36
+ orch = create_orchestrator(search_handler=search, judge_handler=judge, mode="simple")
37
+ assert isinstance(orch, Orchestrator)
38
+
39
+
40
+ def test_create_orchestrator_advanced_explicit(mock_settings, mock_handlers, mock_magentic_cls):
41
+ """Test explicit advanced mode."""
42
+ # Ensure has_openai_key is True so it doesn't error if we add checks
43
+ mock_settings.has_openai_key = True
44
+
45
+ orch = create_orchestrator(mode="advanced")
46
+ # verify instantiated
47
+ mock_magentic_cls.assert_called_once()
48
+ assert orch == mock_magentic_cls.return_value
49
+
50
+
51
+ def test_create_orchestrator_auto_advanced(mock_settings, mock_magentic_cls):
52
+ """Test auto-detect advanced mode when OpenAI key exists."""
53
+ mock_settings.has_openai_key = True
54
+
55
+ orch = create_orchestrator()
56
+ mock_magentic_cls.assert_called_once()
57
+ assert orch == mock_magentic_cls.return_value
58
+
59
+
60
+ def test_create_orchestrator_auto_simple(mock_settings, mock_handlers):
61
+ """Test auto-detect falls back to simple mode when no OpenAI key is set."""
62
+ mock_settings.has_openai_key = False
63
+
64
+ search, judge = mock_handlers
65
+ orch = create_orchestrator(search_handler=search, judge_handler=judge)
66
+ assert isinstance(orch, Orchestrator)