Space: DataQuests/DeepCritical

VibecoderMcSwaggins committed on 10 days ago

Commit

b2929fc

unverified ·

1 Parent(s): b72f9f1

feat: implement dual-mode architecture (Simple + Advanced) (#45)


* docs: add dual-mode architecture specification

Senior agent reviewed and approved. Key documents:
- 00_SITUATION_AND_PLAN.md: Problem analysis, branch states, recommended path
- 01_ARCHITECTURE_SPEC.md: Dual-mode architecture (Simple + Advanced)
- 02_IMPLEMENTATION_PHASES.md: 6-phase implementation plan
- 03_IMMEDIATE_ACTIONS.md: Quick reference checklist

Architecture: pydantic-ai (structured outputs) + Microsoft Agent Framework
(orchestration) are COMPLEMENTARY, not competing. Dual-mode allows
graceful degradation to free tier when no API keys available.

* docs: add follow-up review request for senior agent verification

* feat: implement dual-mode architecture (Simple + Advanced)

Phase 1 - Pydantic-AI Improvements (Simple Mode):
- Add HuggingFace provider support in judges.py with get_model()
- Add huggingface_model and hf_token config fields
- Tests in test_judges_factory.py
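
The provider selection described in Phase 1 might look roughly like the sketch below. This is not the code from `judges.py`: `JudgeSettings`, `select_model_name`, and both default model names are hypothetical placeholders, and the function returns a `provider:model` label instead of building a pydantic-ai model object so that it runs without third-party packages. The rule "no OpenAI key means HuggingFace" is an assumption drawn from the free-tier fallback described in this commit.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JudgeSettings:
    """Hypothetical stand-in for the project's config object."""

    llm_provider: str = "openai"
    openai_api_key: Optional[str] = None
    hf_token: Optional[str] = None
    huggingface_model: str = "example-org/example-model"  # placeholder name
    openai_model: str = "example-openai-model"  # placeholder name


def select_model_name(settings: JudgeSettings) -> str:
    """Return a 'provider:model' label for the judge to use."""
    # Assumption: HuggingFace is used when requested explicitly
    # or when no OpenAI key is configured (free-tier fallback).
    if settings.llm_provider == "huggingface" or not settings.openai_api_key:
        return f"huggingface:{settings.huggingface_model}"
    return f"openai:{settings.openai_model}"
```

Keeping the decision in one small function makes each branch easy to cover with a unit test, which fits the test-first approach the plan calls for.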

Phase 2 - Orchestrator Factory:
- Implement create_orchestrator() with auto-detection logic
- Simple mode for free tier, Advanced mode when OpenAI key present
- Lazy loading of MagenticOrchestrator to avoid hard dependency
- Tests in test_orchestrator_factory.py
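
A minimal sketch of the two ideas in Phase 2, auto-detection and lazy loading, is shown below. The names `determine_mode` and `load_class` are hypothetical, reading `OPENAI_API_KEY` directly from the environment is an assumption (the project reads it through its settings object), and treating "magentic" as an alias for "advanced" is inferred from the rename in Phase 4 rather than confirmed by the source.

```python
import importlib
import os
from typing import Any, Optional


def determine_mode(explicit: Optional[str] = None) -> str:
    """Explicit mode wins; otherwise an OpenAI key selects advanced mode."""
    if explicit is not None:
        # Assumption: "magentic" is kept as a legacy alias for "advanced".
        return "advanced" if explicit == "magentic" else explicit
    return "advanced" if os.environ.get("OPENAI_API_KEY") else "simple"


def load_class(module_name: str, class_name: str) -> Any:
    """Import a class only when needed, so the dependency stays optional."""
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:
        raise RuntimeError(
            f"Advanced mode needs the optional module '{module_name}'"
        ) from exc
    return getattr(module, class_name)
```

Because the import happens inside `load_class`, a simple-mode install never touches the optional package, and a missing package surfaces as a clear error only when advanced mode is requested.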

Phase 3 - Agent Framework Integration:
- Use agent-framework-core from PyPI (Microsoft package)
- Verify imports work with test_agent_imports.py

Phase 4 - UI Updates:
- Rename "magentic" to "advanced" in app.py
- Update mode selection labels and descriptions

All 126 unit tests pass. Lint and type checks clean.

* fix: address CodeRabbit review feedback

- Add pytestmark to integration tests (integration, slow markers)
- Add pytestmark to unit tests (unit marker)
- Fix unused OpenAIChatClient import by adding assertion
- Update docs spec to match actual factory implementation
- Add code fence languages (text) to markdown blocks

Note: CodeRabbit incorrectly flagged has_openai_key as a method
when it's actually a @property that returns bool correctly.
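
To illustrate the point about `has_openai_key`, here is a small sketch of a property that returns a bool. `DemoSettings` is a hypothetical stand-in, not the project's settings class; only the property name comes from the commit message.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DemoSettings:
    """Hypothetical stand-in for the real settings class."""

    openai_api_key: Optional[str] = None

    @property
    def has_openai_key(self) -> bool:
        # Read as an attribute (no parentheses); evaluates to a bool.
        return bool(self.openai_api_key)
```

With a property, `if settings.has_openai_key:` tests the returned bool. The same expression on a plain method would test the bound method object, which is always truthy, and that is the bug a reviewer would be looking for.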

All 126 unit tests pass.

* fix: address remaining CodeRabbit nitpicks

- Add 'text' language to ASCII diagram code blocks in docs
- Update Advanced Mode trigger description to clarify OpenAI-only
- Rename and clarify test_advanced_mode_explicit_instantiation
- Improve test docstring explaining explicit vs auto-detect path

All 128 tests pass.

Files changed (15)

docs/brainstorming/magentic-pydantic/00_SITUATION_AND_PLAN.md +189 -0
docs/brainstorming/magentic-pydantic/01_ARCHITECTURE_SPEC.md +289 -0
docs/brainstorming/magentic-pydantic/02_IMPLEMENTATION_PHASES.md +112 -0
docs/brainstorming/magentic-pydantic/03_IMMEDIATE_ACTIONS.md +112 -0
docs/brainstorming/magentic-pydantic/04_FOLLOWUP_REVIEW_REQUEST.md +158 -0
docs/brainstorming/magentic-pydantic/REVIEW_PROMPT_FOR_SENIOR_AGENT.md +113 -0
pyproject.toml +1 -1
src/agent_factory/judges.py +8 -0
src/app.py +11 -7
src/orchestrator_factory.py +42 -15
src/utils/config.py +14 -2
tests/integration/test_dual_mode_e2e.py +82 -0
tests/unit/agent_factory/test_judges_factory.py +64 -0
tests/unit/agents/test_agent_imports.py +32 -0
tests/unit/test_orchestrator_factory.py +66 -0

docs/brainstorming/magentic-pydantic/00_SITUATION_AND_PLAN.md ADDED

	@@ -0,0 +1,189 @@

+# Situation Analysis: Pydantic-AI + Microsoft Agent Framework Integration
+**Date:** November 27, 2025
+**Status:** ACTIVE DECISION REQUIRED
+**Risk Level:** HIGH - DO NOT MERGE PR #41 UNTIL RESOLVED
+---
+## 1. The Problem
+We almost merged a refactor that would have **deleted** multi-agent orchestration capability from the codebase, mistakenly believing pydantic-ai and Microsoft Agent Framework were mutually exclusive.
+**They are not.** They are complementary:
+- **pydantic-ai** (Library): Ensures LLM outputs match Pydantic schemas
+- **Microsoft Agent Framework** (Framework): Orchestrates multi-agent workflows
+---
+## 2. Current Branch State
+| Branch | Location | Has Agent Framework? | Has Pydantic-AI Improvements? | Status |
+|--------|----------|---------------------|------------------------------|--------|
+| `origin/dev` | GitHub | YES | NO | **SAFE - Source of Truth** |
+| `huggingface-upstream/dev` | HF Spaces | YES | NO | **SAFE - Same as GitHub** |
+| `origin/main` | GitHub | YES | NO | **SAFE** |
+| `feat/pubmed-fulltext` | GitHub | NO (deleted) | YES | **DANGER - Has destructive refactor** |
+| `refactor/pydantic-unification` | Local | NO (deleted) | YES | **DANGER - Redundant, delete** |
+| Local `dev` | Local only | NO (deleted) | YES | **DANGER - NOT PUSHED (thankfully)** |
+### Key Files at Risk
+**On `origin/dev` (PRESERVED):**
+```text
+src/agents/
+├── analysis_agent.py      # StatisticalAnalyzer wrapper
+├── hypothesis_agent.py    # Hypothesis generation
+├── judge_agent.py         # JudgeHandler wrapper
+├── magentic_agents.py     # Multi-agent definitions
+├── report_agent.py        # Report synthesis
+├── search_agent.py        # SearchHandler wrapper
+├── state.py               # Thread-safe state management
+└── tools.py               # @ai_function decorated tools
+src/orchestrator_magentic.py  # Multi-agent orchestrator
+src/utils/llm_factory.py      # Centralized LLM client factory
+```
+**Deleted in refactor branch (would be lost if merged):**
+- All of the above
+---
+## 3. Target Architecture
+```text
+┌─────────────────────────────────────────────────────────────────┐
+│  Microsoft Agent Framework (Orchestration Layer)                │
+│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐          │
+│  │ SearchAgent  │→ │ JudgeAgent   │→ │ ReportAgent  │          │
+│  │ (BaseAgent)  │  │ (BaseAgent)  │  │ (BaseAgent)  │          │
+│  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘          │
+│         │                 │                 │                  │
+│         ▼                 ▼                 ▼                  │
+│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐          │
+│  │ pydantic-ai  │  │ pydantic-ai  │  │ pydantic-ai  │          │
+│  │ Agent()      │  │ Agent()      │  │ Agent()      │          │
+│  │ output_type= │  │ output_type= │  │ output_type= │          │
+│  │ SearchResult │  │ JudgeAssess  │  │ Report       │          │
+│  └──────────────┘  └──────────────┘  └──────────────┘          │
+└─────────────────────────────────────────────────────────────────┘
+```
+**Why this architecture:**
+1. **Agent Framework** handles: workflow coordination, state passing, middleware, observability
+2. **pydantic-ai** handles: type-safe LLM calls within each agent
+---
+## 4. CRITICAL: Naming Confusion Clarification
+> **Senior Agent Review Finding:** The codebase uses "magentic" in file names (e.g., `orchestrator_magentic.py`, `magentic_agents.py`) but this is **NOT** the `magentic` PyPI package by Jack Collins. It's Microsoft Agent Framework (`agent-framework-core`).
+**The naming confusion:**
+- `magentic` (PyPI package): A different library for structured LLM outputs
+- "Magentic" (in our codebase): Our internal name for Microsoft Agent Framework integration
+- `agent-framework-core` (PyPI package): Microsoft's actual multi-agent orchestration framework
+**Recommended future action:** Rename `orchestrator_magentic.py` → `orchestrator_advanced.py` to eliminate confusion.
+---
+## 5. What the Refactor DID Get Right
+The refactor branch (`feat/pubmed-fulltext`) has some valuable improvements:
+1. **`judges.py` unified `get_model()`** - Supports OpenAI, Anthropic, AND HuggingFace via pydantic-ai
+2. **HuggingFace free tier support** - `HuggingFaceModel` integration
+3. **Test fix** - Properly mocks `HuggingFaceModel` class
+4. **Removed broken magentic optional dependency** from pyproject.toml (this was correct - the old `magentic` package is different from Microsoft Agent Framework)
+**What it got WRONG:**
+1. Deleted `src/agents/` entirely instead of refactoring them
+2. Deleted `src/orchestrator_magentic.py` instead of fixing it
+3. Conflated "magentic" (old package) with "Microsoft Agent Framework" (current framework)
+---
+## 6. Options for Path Forward
+### Option A: Abandon Refactor, Start Fresh
+- Close PR #41
+- Delete `feat/pubmed-fulltext` and `refactor/pydantic-unification` branches
+- Reset local `dev` to match `origin/dev`
+- Cherry-pick ONLY the good parts (judges.py improvements, HF support)
+- **Pros:** Clean, safe
+- **Cons:** Lose some work, need to redo carefully
+### Option B: Cherry-Pick Good Parts to origin/dev
+- Do NOT merge PR #41
+- Create new branch from `origin/dev`
+- Cherry-pick specific commits/changes that improve pydantic-ai usage
+- Keep agent framework code intact
+- **Pros:** Preserves both, surgical
+- **Cons:** Requires careful file-by-file review
+### Option C: Revert Deletions in Refactor Branch
+- On `feat/pubmed-fulltext`, restore deleted agent files from `origin/dev`
+- Keep the pydantic-ai improvements
+- Merge THAT to dev
+- **Pros:** Gets both
+- **Cons:** Complex git operations, risk of conflicts
+---
+## 7. Recommended Action: Option B (Cherry-Pick)
+**Step-by-step:**
+1. **Close PR #41** (do not merge)
+2. **Delete redundant branches:**
+   - `refactor/pydantic-unification` (local)
+   - Reset local `dev` to `origin/dev`
+3. **Create new branch from origin/dev:**
+   ```bash
+   git checkout -b feat/pydantic-ai-improvements origin/dev
+   ```
+4. **Cherry-pick or manually port these improvements:**
+   - `src/agent_factory/judges.py` - the unified `get_model()` function
+   - `examples/free_tier_demo.py` - HuggingFace demo
+   - Test improvements
+5. **Do NOT delete any agent framework files**
+6. **Create PR for review**
+---
+## 8. Files to Cherry-Pick (Safe Improvements)
+| File | What Changed | Safe to Port? |
+|------|-------------|---------------|
+| `src/agent_factory/judges.py` | Added `HuggingFaceModel` support in `get_model()` | YES |
+| `examples/free_tier_demo.py` | New demo for HF inference | YES |
+| `tests/unit/agent_factory/test_judges.py` | Fixed HF model mocking | YES |
+| `pyproject.toml` | Removed old `magentic` optional dep | MAYBE (review carefully) |
+---
+## 9. Questions to Answer Before Proceeding
+1. **For the hackathon**: Do we need full multi-agent orchestration, or is single-agent sufficient?
+2. **For DeepCritical mainline**: Is the plan to use Microsoft Agent Framework for orchestration?
+3. **Timeline**: How much time do we have to get this right?
+---
+## 10. Immediate Actions (DO NOW)
+- [ ] **DO NOT merge PR #41**
+- [ ] Close PR #41 with comment explaining the situation
+- [ ] Do not push local `dev` branch anywhere
+- [ ] Confirm HuggingFace Spaces is untouched (it is - verified)
+---
+## 11. Decision Log
+| Date | Decision | Rationale |
+|------|----------|-----------|
+| 2025-11-27 | Pause refactor merge | Discovered agent framework and pydantic-ai are complementary, not exclusive |
+| TBD | ? | Awaiting decision on path forward |

docs/brainstorming/magentic-pydantic/01_ARCHITECTURE_SPEC.md ADDED

	@@ -0,0 +1,289 @@

+# Architecture Specification: Dual-Mode Agent System
+**Date:** November 27, 2025
+**Status:** SPECIFICATION
+**Goal:** Graceful degradation from full multi-agent orchestration to simple single-agent mode
+---
+## 1. Core Concept: Two Operating Modes
+```text
+┌─────────────────────────────────────────────────────────────────────┐
+│                        USER REQUEST                                 │
+│                            │                                        │
+│                            ▼                                        │
+│                   ┌─────────────────┐                               │
+│                   │  Mode Selection │                               │
+│                   │  (Auto-detect)  │                               │
+│                   └────────┬────────┘                               │
+│                            │                                        │
+│            ┌───────────────┴───────────────┐                        │
+│            │                               │                        │
+│            ▼                               ▼                        │
+│   ┌─────────────────┐             ┌─────────────────┐               │
+│   │   SIMPLE MODE   │             │  ADVANCED MODE  │               │
+│   │  (Free Tier)    │             │  (Paid Tier)    │               │
+│   │                 │             │                 │               │
+│   │  pydantic-ai    │             │  MS Agent Fwk   │               │
+│   │  single-agent   │             │  + pydantic-ai  │               │
+│   │  loop           │             │  multi-agent    │               │
+│   └─────────────────┘             └─────────────────┘               │
+│            │                               │                        │
+│            └───────────────┬───────────────┘                        │
+│                            ▼                                        │
+│                   ┌─────────────────┐                               │
+│                   │  Research Report │                              │
+│                   │  with Citations  │                              │
+│                   └─────────────────┘                               │
+└─────────────────────────────────────────────────────────────────────┘
+```
+---
+## 2. Mode Comparison
+| Aspect | Simple Mode | Advanced Mode |
+|--------|-------------|---------------|
+| **Trigger** | No API key OR `LLM_PROVIDER=huggingface` | OpenAI API key present (currently OpenAI only) |
+| **Framework** | pydantic-ai only | Microsoft Agent Framework + pydantic-ai |
+| **Architecture** | Single orchestrator loop | Multi-agent coordination |
+| **Agents** | One agent does Search→Judge→Report | SearchAgent, JudgeAgent, ReportAgent, AnalysisAgent |
+| **State Management** | Simple dict | Thread-safe `MagenticState` with context vars |
+| **Quality** | Good (functional) | Better (specialized agents, coordination) |
+| **Cost** | Free (HuggingFace Inference) | Paid (currently OpenAI only) |
+| **Use Case** | Demos, hackathon, budget-constrained | Production, research quality |
+---
+## 3. Simple Mode Architecture (pydantic-ai Only)
+```text
+┌─────────────────────────────────────────────────────┐
+│                  Orchestrator                       │
+│                                                     │
+│   while not sufficient and iteration < max:        │
+│       1. SearchHandler.execute(query)              │
+│       2. JudgeHandler.assess(evidence)    ◄── pydantic-ai Agent  │
+│       3. if sufficient: break                      │
+│       4. query = judge.next_queries                │
+│                                                     │
+│   return ReportGenerator.generate(evidence)        │
+└─────────────────────────────────────────────────────┘
+```
+**Components:**
+- `src/orchestrator.py` - Simple loop orchestrator
+- `src/agent_factory/judges.py` - JudgeHandler with pydantic-ai
+- `src/tools/search_handler.py` - Scatter-gather search
+- `src/tools/pubmed.py`, `clinicaltrials.py`, `europepmc.py` - Search tools
+---
+## 4. Advanced Mode Architecture (MS Agent Framework + pydantic-ai)
+```text
+┌─────────────────────────────────────────────────────────────────────┐
+│              Microsoft Agent Framework Orchestrator                 │
+│                                                                     │
+│   ┌─────────────┐    ┌─────────────┐    ┌─────────────┐            │
+│   │ SearchAgent │───▶│ JudgeAgent  │───▶│ ReportAgent │            │
+│   │ (BaseAgent) │    │ (BaseAgent) │    │ (BaseAgent) │            │
+│   └──────┬──────┘    └──────┬──────┘    └──────┬──────┘            │
+│          │                  │                  │                    │
+│          ▼                  ▼                  ▼                    │
+│   ┌─────────────┐    ┌─────────────┐    ┌─────────────┐            │
+│   │ pydantic-ai │    │ pydantic-ai │    │ pydantic-ai │            │
+│   │ Agent()     │    │ Agent()     │    │ Agent()     │            │
+│   │ output_type=│    │ output_type=│    │ output_type=│            │
+│   │ SearchResult│    │ JudgeAssess │    │ Report      │            │
+│   └─────────────┘    └─────────────┘    └─────────────┘            │
+│                                                                     │
+│   Shared State: MagenticState (thread-safe via contextvars)        │
+│   - evidence: list[Evidence]                                       │
+│   - embedding_service: EmbeddingService                            │
+└─────────────────────────────────────────────────────────────────────┘
+```
+**Components:**
+- `src/orchestrator_magentic.py` - Multi-agent orchestrator
+- `src/agents/search_agent.py` - SearchAgent (BaseAgent)
+- `src/agents/judge_agent.py` - JudgeAgent (BaseAgent)
+- `src/agents/report_agent.py` - ReportAgent (BaseAgent)
+- `src/agents/analysis_agent.py` - AnalysisAgent (BaseAgent)
+- `src/agents/state.py` - Thread-safe state management
+- `src/agents/tools.py` - @ai_function decorated tools
+---
+## 5. Mode Selection Logic
+```python
+# src/orchestrator_factory.py (actual implementation)
+def create_orchestrator(
+    search_handler: SearchHandlerProtocol | None = None,
+    judge_handler: JudgeHandlerProtocol | None = None,
+    config: OrchestratorConfig | None = None,
+    mode: Literal["simple", "magentic", "advanced"] | None = None,
+) -> Any:
+    """
+    Auto-select orchestrator based on available credentials.
+    Priority:
+    1. If mode explicitly set, use that
+    2. If OpenAI key available -> Advanced Mode (currently OpenAI only)
+    3. Otherwise -> Simple Mode (HuggingFace free tier)
+    """
+    effective_mode = _determine_mode(mode)
+    if effective_mode == "advanced":
+        orchestrator_cls = _get_magentic_orchestrator_class()
+        return orchestrator_cls(max_rounds=config.max_iterations if config else 10)
+    # Simple mode requires handlers
+    if search_handler is None or judge_handler is None:
+        raise ValueError("Simple mode requires search_handler and judge_handler")
+    return Orchestrator(
+        search_handler=search_handler,
+        judge_handler=judge_handler,
+        config=config,
+    )
+```
+---
+## 6. Shared Components (Both Modes Use)
+These components work in both modes:
+| Component | Purpose |
+|-----------|---------|
+| `src/tools/pubmed.py` | PubMed search |
+| `src/tools/clinicaltrials.py` | ClinicalTrials.gov search |
+| `src/tools/europepmc.py` | Europe PMC search |
+| `src/tools/search_handler.py` | Scatter-gather orchestration |
+| `src/tools/rate_limiter.py` | Rate limiting |
+| `src/utils/models.py` | Evidence, Citation, JudgeAssessment |
+| `src/utils/config.py` | Settings |
+| `src/services/embeddings.py` | Vector search (optional) |
+---
+## 7. pydantic-ai Integration Points
+Both modes use pydantic-ai for structured LLM outputs:
+```python
+# In JudgeHandler (both modes)
+from pydantic_ai import Agent
+from pydantic_ai.models.huggingface import HuggingFaceModel
+from pydantic_ai.models.openai import OpenAIModel
+from pydantic_ai.models.anthropic import AnthropicModel
+class JudgeHandler:
+    def __init__(self, model: Any = None):
+        self.model = model or get_model()  # Auto-selects based on config
+        self.agent = Agent(
+            model=self.model,
+            output_type=JudgeAssessment,  # Structured output!
+            system_prompt=SYSTEM_PROMPT,
+        )
+    async def assess(self, question: str, evidence: list[Evidence]) -> JudgeAssessment:
+        result = await self.agent.run(format_prompt(question, evidence))
+        return result.output  # Guaranteed to be JudgeAssessment
+```
+---
+## 8. Microsoft Agent Framework Integration Points
+Advanced mode wraps pydantic-ai agents in BaseAgent:
+```python
+# In JudgeAgent (advanced mode only)
+from agent_framework import BaseAgent, AgentRunResponse, ChatMessage, Role
+class JudgeAgent(BaseAgent):
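+    def __init__(
+        self,
+        judge_handler: JudgeHandlerProtocol,
+        evidence_store: dict[str, list[Evidence]],
+    ):
+        super().__init__(
+            name="JudgeAgent",
+            description="Evaluates evidence quality",
+        )
+        self._handler = judge_handler  # Uses pydantic-ai internally
+        self._evidence_store = evidence_store  # Shared evidence read in run()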
+    async def run(self, messages, **kwargs) -> AgentRunResponse:
+        question = extract_question(messages)
+        evidence = self._evidence_store.get("current", [])
+        # Delegate to pydantic-ai powered handler
+        assessment = await self._handler.assess(question, evidence)
+        return AgentRunResponse(
+            messages=[ChatMessage(role=Role.ASSISTANT, text=format_response(assessment))],
+            additional_properties={"assessment": assessment.model_dump()},
+        )
+```
+---
+## 9. Benefits of This Architecture
+1. **Graceful Degradation**: Works without API keys (free tier)
+2. **Progressive Enhancement**: Better with API keys (orchestration)
+3. **Code Reuse**: pydantic-ai handlers shared between modes
+4. **Hackathon Ready**: Demo works without requiring paid keys
+5. **Production Ready**: Full orchestration available when needed
+6. **Future Proof**: Can add more agents to advanced mode
+7. **Testable**: Simple mode is easier to unit test
+---
+## 10. Known Risks and Mitigations
+> **From Senior Agent Review**
+### 10.1 Bridge Complexity (MEDIUM)
+**Risk:** In Advanced Mode, agents (Agent Framework) wrap handlers (pydantic-ai). Both are async. Context variables (`MagenticState`) must propagate correctly through the pydantic-ai call stack.
+**Mitigation:**
+- pydantic-ai uses standard Python `contextvars`, which naturally propagate through `await` chains
+- Test context propagation explicitly in integration tests
+- If issues arise, pass state explicitly rather than via context vars
+### 10.2 Integration Drift (MEDIUM)
+**Risk:** Simple Mode and Advanced Mode might diverge in behavior over time (e.g., Simple Mode uses logic A, Advanced Mode uses logic B).
+**Mitigation:**
+- Both modes MUST call the exact same underlying Tools (`src/tools/*`) and Handlers (`src/agent_factory/*`)
+- Handlers are the single source of truth for business logic
+- Agents are thin wrappers that delegate to handlers
+### 10.3 Testing Burden (LOW-MEDIUM)
+**Risk:** Maintaining two distinct orchestrators (`src/orchestrator.py` and `src/orchestrator_magentic.py`) doubles the integration testing surface area.
+**Mitigation:**
+- Unit test handlers independently (shared code)
+- Integration tests for each mode separately
+- End-to-end tests verify same output for same input (determinism permitting)
+### 10.4 Dependency Conflicts (LOW)
+**Risk:** `agent-framework-core` might conflict with `pydantic-ai`'s dependencies (e.g., different pydantic versions).
+**Status:** Both use `pydantic>=2.x`. Should be compatible.
+---
+## 11. Naming Clarification
+> See `00_SITUATION_AND_PLAN.md` Section 4 for full details.
+**Important:** The codebase uses "magentic" in file names (`orchestrator_magentic.py`, `magentic_agents.py`) but this refers to our internal naming for Microsoft Agent Framework integration, **NOT** the `magentic` PyPI package.
+**Future action:** Rename to `orchestrator_advanced.py` to eliminate confusion.
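
The mitigation in Section 10.1 of the spec above relies on `contextvars` values staying visible through `await` chains. A minimal sketch of that behaviour follows. It uses only the standard library; `current_state`, `handler_assess`, and `agent_run` are hypothetical stand-ins for `MagenticState`, a pydantic-ai handler, and an Agent Framework agent, so it says nothing about either library's internals.

```python
import asyncio
import contextvars

# Hypothetical stand-in for the shared state the spec calls MagenticState.
current_state = contextvars.ContextVar("current_state")


async def handler_assess() -> int:
    # Innermost coroutine (the "handler"): reads the state set by the caller.
    return len(current_state.get()["evidence"])


async def agent_run() -> int:
    # The "agent" awaits the handler directly, so it shares the same context.
    return await handler_assess()


async def main() -> int:
    current_state.set({"evidence": ["paper-1", "paper-2"]})
    return await agent_run()


result = asyncio.run(main())
```

Direct `await` calls share one context, which is the case shown here. A task started with `asyncio.create_task` runs in a copy of the context, so values set inside that task are not visible to its parent. That boundary is worth covering in the integration tests the spec asks for.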

docs/brainstorming/magentic-pydantic/02_IMPLEMENTATION_PHASES.md ADDED

	@@ -0,0 +1,112 @@

+# Implementation Phases: Dual-Mode Agent System
+**Date:** November 27, 2025
+**Status:** IMPLEMENTATION PLAN (REVISED)
+**Strategy:** TDD (Test-Driven Development), SOLID Principles
+**Dependency Strategy:** PyPI (agent-framework-core)
+---
+## Phase 0: Environment Validation & Cleanup
+**Goal:** Ensure clean state and dependencies are correctly installed.
+### Step 0.1: Verify PyPI Package
+The `agent-framework-core` package is published on PyPI by Microsoft. Verify installation:
+```bash
+uv sync --all-extras
+python -c "from agent_framework import ChatAgent; print('OK')"
+```
+### Step 0.2: Branch State
+We are on `feat/dual-mode-architecture`. Ensure it is up to date with `origin/dev` before starting.
+**Note:** The `reference_repos/agent-framework` folder is kept for reference/documentation only.
+The production dependency uses the official PyPI release.
+---
+## Phase 1: Pydantic-AI Improvements (Simple Mode)
+**Goal:** Implement `HuggingFaceModel` support in `JudgeHandler` using strict TDD.
+### Step 1.1: Test First (Red)
+Create `tests/unit/agent_factory/test_judges_factory.py`:
+- Test `get_model()` returns `HuggingFaceModel` when `LLM_PROVIDER=huggingface`.
+- Test `get_model()` respects `HF_TOKEN`.
+- Test fallback to OpenAI.
+### Step 1.2: Implementation (Green)
+Update `src/utils/config.py`:
+- Add `huggingface_model` and `hf_token` fields.
+Update `src/agent_factory/judges.py`:
+- Implement `get_model` with the logic derived from the tests.
+- Use dependency injection for the model where possible.
+### Step 1.3: Refactor
+Ensure `JudgeHandler` is loosely coupled from the specific model provider.
+---
+## Phase 2: Orchestrator Factory (The Switch)
+**Goal:** Implement the factory pattern to switch between Simple and Advanced modes.
+### Step 2.1: Test First (Red)
+Create `tests/unit/test_orchestrator_factory.py`:
+- Test `create_orchestrator` returns `Orchestrator` (simple) when API keys are missing.
+- Test `create_orchestrator` returns `MagenticOrchestrator` (advanced) when OpenAI key exists.
+- Test explicit mode override.
+### Step 2.2: Implementation (Green)
+Update `src/orchestrator_factory.py` to implement the selection logic.
+---
+## Phase 3: Agent Framework Integration (Advanced Mode)
+**Goal:** Integrate Microsoft Agent Framework from PyPI.
+### Step 3.1: Dependency Management
+The `agent-framework-core` package is installed from PyPI:
+```toml
+[project.optional-dependencies]
+magentic = [
+    "agent-framework-core>=1.0.0b251120,<2.0.0",  # Microsoft Agent Framework (PyPI)
+]
+```
+Install with: `uv sync --all-extras`
+### Step 3.2: Verify Imports (Test First)
+Create `tests/unit/agents/test_agent_imports.py`:
+- Verify `from agent_framework import ChatAgent` works.
+- Verify instantiation of `ChatAgent` with a mock client.
+### Step 3.3: Update Agents
+Refactor `src/agents/*.py` to ensure they match the exact signature of the installed `ChatAgent` class.
+- **SOLID:** Ensure agents have single responsibilities.
+- **DRY:** Share tool definitions between Pydantic-AI simple mode and Agent Framework advanced mode.
+---
+## Phase 4: UI & End-to-End Verification
+**Goal:** Update Gradio to reflect the active mode.
+### Step 4.1: UI Updates
+Update `src/app.py` to display "Simple Mode" vs "Advanced Mode".
+### Step 4.2: End-to-End Test
+Run the full loop:
+1. Simple Mode (No Keys) -> Search -> Judge (HF) -> Report.
+2. Advanced Mode (OpenAI Key) -> SearchAgent -> JudgeAgent -> ReportAgent.
+---
+## Phase 5: Cleanup & Documentation
+- Remove unused code.
+- Update main README.md.
+- Final `make check`.

docs/brainstorming/magentic-pydantic/03_IMMEDIATE_ACTIONS.md ADDED

	@@ -0,0 +1,112 @@

+# Immediate Actions Checklist
+**Date:** November 27, 2025
+**Priority:** Execute in order
+---
+## Before Starting Implementation
+### 1. Close PR #41 (CRITICAL)
+```bash
+gh pr close 41 --comment "Architecture decision changed. Cherry-picking improvements to preserve both pydantic-ai and Agent Framework capabilities."
+```
+### 2. Verify HuggingFace Spaces is Safe
+```bash
+# Should show agent framework files exist
+git ls-tree --name-only huggingface-upstream/dev -- src/agents/
+git ls-tree --name-only huggingface-upstream/dev -- src/orchestrator_magentic.py
+```
+Expected output: Files should exist (they do as of this writing).
+### 3. Clean Local Environment
+```bash
+# Switch to main first
+git checkout main
+# Delete problematic branches
+git branch -D refactor/pydantic-unification 2>/dev/null || true
+git branch -D feat/pubmed-fulltext 2>/dev/null || true
+# Reset local dev to origin/dev
+git branch -D dev 2>/dev/null || true
+git checkout -b dev origin/dev
+# Verify agent framework code exists
+ls src/agents/
+# Expected: __init__.py, analysis_agent.py, hypothesis_agent.py, judge_agent.py,
+#           magentic_agents.py, report_agent.py, search_agent.py, state.py, tools.py
+ls src/orchestrator_magentic.py
+# Expected: file exists
+```
+### 4. Create Fresh Feature Branch
+```bash
+git checkout -b feat/dual-mode-architecture origin/dev
+```
+---
+## Decision Points
+Before proceeding, confirm:
+1. **For hackathon**: Do we need advanced mode, or is simple mode sufficient?
+   - Simple mode = faster to implement, works today
+   - Advanced mode = better quality, more work
+2. **Timeline**: How much time do we have?
+   - If < 1 day: Focus on simple mode only
+   - If > 1 day: Implement dual-mode
+3. **Dependencies**: Is `agent-framework-core` available?
+   - Check: `pip index versions agent-framework-core`
+   - If not on PyPI, may need to install from GitHub
+---
+## Quick Start (Simple Mode Only)
+If time is limited, implement only simple mode improvements:
+```bash
+# On feat/dual-mode-architecture branch
+# 1. Update judges.py to add HuggingFace support
+# 2. Update config.py to add HF settings
+# 3. Create free_tier_demo.py
+# 4. Run make check
+# 5. Create PR to dev
+```
+This gives you free-tier capability without touching agent framework code.
+---
+## Quick Start (Full Dual-Mode)
+If time permits, implement full dual-mode:
+Follow Phases 0-5 in `02_IMPLEMENTATION_PHASES.md`
+---
+## Emergency Rollback
+If anything goes wrong:
+```bash
+# Reset to safe state
+git checkout main
+git branch -D feat/dual-mode-architecture
+git checkout -b feat/dual-mode-architecture origin/dev
+```
+`origin/dev` is the safe fallback: it has the agent framework code intact.

docs/brainstorming/magentic-pydantic/04_FOLLOWUP_REVIEW_REQUEST.md ADDED

	@@ -0,0 +1,158 @@

+# Follow-Up Review Request: Did We Implement Your Feedback?
+**Date:** November 27, 2025
+**Context:** You previously reviewed our dual-mode architecture plan and provided feedback. We have updated the documentation. Please verify we correctly implemented your recommendations.
+---
+## Your Original Feedback vs Our Changes
+### 1. Naming Confusion Clarification
+**Your feedback:** "You are using Microsoft Agent Framework, but you've named your integration 'Magentic'. This caused the confusion."
+**Our change:** Added Section 4 in `00_SITUATION_AND_PLAN.md`:
+```markdown
+## 4. CRITICAL: Naming Confusion Clarification
+> **Senior Agent Review Finding:** The codebase uses "magentic" in file names
+> (e.g., `orchestrator_magentic.py`, `magentic_agents.py`) but this is **NOT**
+> the `magentic` PyPI package by Jack Collins. It's Microsoft Agent Framework.
+**The naming confusion:**
+- `magentic` (PyPI package): A different library for structured LLM outputs
+- "Magentic" (in our codebase): Our internal name for Microsoft Agent Framework integration
+- `agent-framework-core` (PyPI package): Microsoft's actual multi-agent orchestration framework
+**Recommended future action:** Rename `orchestrator_magentic.py` → `orchestrator_advanced.py`
+```
+**Status:** ✅ IMPLEMENTED
+---
+### 2. Bridge Complexity Warning
+**Your feedback:** "You must ensure MagenticState (context vars) propagates correctly through the pydantic-ai call stack."
+**Our change:** Added Section 10.1 in `01_ARCHITECTURE_SPEC.md`:
+```markdown
+### 10.1 Bridge Complexity (MEDIUM)
+**Risk:** In Advanced Mode, agents (Agent Framework) wrap handlers (pydantic-ai).
+Both are async. Context variables (`MagenticState`) must propagate correctly.
+**Mitigation:**
+- pydantic-ai uses standard Python `contextvars`, which naturally propagate through `await` chains
+- Test context propagation explicitly in integration tests
+- If issues arise, pass state explicitly rather than via context vars
+```
+**Status:** ✅ IMPLEMENTED
+---
+### 3. Integration Drift Warning
+**Your feedback:** "Simple Mode and Advanced Mode might diverge in behavior."
+**Our change:** Added Section 10.2 in `01_ARCHITECTURE_SPEC.md`:
+```markdown
+### 10.2 Integration Drift (MEDIUM)
+**Risk:** Simple Mode and Advanced Mode might diverge in behavior over time.
+**Mitigation:**
+- Both modes MUST call the exact same underlying Tools (`src/tools/*`) and Handlers (`src/agent_factory/*`)
+- Handlers are the single source of truth for business logic
+- Agents are thin wrappers that delegate to handlers
+```
+**Status:** ✅ IMPLEMENTED
+---
+### 4. Testing Burden Warning
+**Your feedback:** "You now have two distinct orchestrators to maintain. This doubles your integration testing surface area."
+**Our change:** Added Section 10.3 in `01_ARCHITECTURE_SPEC.md`:
+```markdown
+### 10.3 Testing Burden (LOW-MEDIUM)
+**Risk:** Two distinct orchestrators double the integration testing surface area.
+**Mitigation:**
+- Unit test handlers independently (shared code)
+- Integration tests for each mode separately
+- End-to-end tests verify same output for same input
+```
+**Status:** ✅ IMPLEMENTED
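
The mitigations in Sections 10.2 and 10.3 (shared handlers, thin agents, same output for same input) can be sketched as a parity check. Every class below is a hypothetical stand-in, not the repository's `Orchestrator` or `MagenticOrchestrator`; the point is only that both paths end in the same handler function.

```python
import asyncio


async def judge_handler(evidence: list) -> str:
    # Shared business logic: the single source of truth for both modes.
    return "synthesize" if len(evidence) >= 2 else "continue"


class SimpleOrchestrator:
    """Hypothetical linear mode that calls the handler directly."""

    async def run(self, evidence: list) -> str:
        return await judge_handler(evidence)


class JudgeAgent:
    """Hypothetical thin agent wrapper with no logic of its own."""

    async def invoke(self, evidence: list) -> str:
        return await judge_handler(evidence)


class AdvancedOrchestrator:
    """Hypothetical multi-agent mode that routes through the wrapper."""

    def __init__(self) -> None:
        self.judge = JudgeAgent()

    async def run(self, evidence: list) -> str:
        return await self.judge.invoke(evidence)


async def check_parity(cases: list) -> list:
    # Run every case through both modes and collect (simple, advanced) pairs.
    simple, advanced = SimpleOrchestrator(), AdvancedOrchestrator()
    return [(await simple.run(c), await advanced.run(c)) for c in cases]


pairs = asyncio.run(check_parity([[], ["a"], ["a", "b"]]))
```

In a real suite the same idea would likely become a parametrized test over both modes, with LLM calls mocked so that outputs are deterministic.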
+---
+### 5. Rename Recommendation
+**Your feedback:** "Rename `src/orchestrator_magentic.py` to `src/orchestrator_advanced.py`"
+**Our change:** Added Step 3.4 in `02_IMPLEMENTATION_PHASES.md`:
+```markdown
+### Step 3.4: (OPTIONAL) Rename "Magentic" to "Advanced"
+> **Senior Agent Recommendation:** Rename files to eliminate confusion.
+git mv src/orchestrator_magentic.py src/orchestrator_advanced.py
+git mv src/agents/magentic_agents.py src/agents/advanced_agents.py
+**Note:** This is optional for the hackathon. Can be done in a follow-up PR.
+```
+**Status:** ✅ DOCUMENTED (marked as optional for hackathon)
+---
+### 6. Standardize Wrapper Recommendation
+**Your feedback:** "Create a generic `PydanticAiAgentWrapper(BaseAgent)` class instead of manually wrapping each handler."
+**Our change:** NOT YET DOCUMENTED
+**Status:** ⚠️ NOT IMPLEMENTED - Should we add this?
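
To make the open question concrete, here is one possible shape for a generic wrapper. This is a sketch under assumptions: `HandlerAgentWrapper` is a hypothetical name, it does not subclass any real Agent Framework base class, and the `run` method is illustrative rather than the framework's actual interface.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Generic, TypeVar

In = TypeVar("In")
Out = TypeVar("Out")


@dataclass
class HandlerAgentWrapper(Generic[In, Out]):
    """Hypothetical generic adapter around any async handler."""

    name: str
    handler: Callable[[In], Awaitable[Out]]

    async def run(self, payload: In) -> Out:
        # Delegate directly; the wrapper adds no business logic.
        return await self.handler(payload)


async def search(query: str) -> list:
    # Placeholder handler standing in for a real search handler.
    return [f"result for {query}"]


async def judge(evidence: list) -> bool:
    # Placeholder handler standing in for a real judge handler.
    return len(evidence) > 0


async def demo() -> bool:
    search_agent = HandlerAgentWrapper("search", search)
    judge_agent = HandlerAgentWrapper("judge", judge)
    return await judge_agent.run(await search_agent.run("metformin"))


verdict = asyncio.run(demo())
```

The trade-off is one adapter to maintain instead of one hand-written wrapper per handler, at the cost of fitting every handler to a single call signature.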
+---
+## Questions for Your Review
+1. **Did we correctly implement your feedback?** Are there any misunderstandings in how we interpreted your recommendations?
+2. **Is the "Standardize Wrapper" recommendation critical?** Should we add it to the implementation phases, or is it a nice-to-have for later?
+3. **Dependency versioning:** You noted `agent-framework-core>=1.0.0b251120` might be ephemeral. Should we:
+   - Pin to a specific version?
+   - Use a version range?
+   - Install from GitHub source?
+4. **Anything else we missed?**
+---
+## Files to Re-Review
+1. `00_SITUATION_AND_PLAN.md` - Added Section 4 (Naming Clarification)
+2. `01_ARCHITECTURE_SPEC.md` - Added Sections 10-11 (Risks, Naming)
+3. `02_IMPLEMENTATION_PHASES.md` - Added Step 3.4 (Optional Rename)
+---
+## Current Branch State
+We are now on `feat/dual-mode-architecture` branched from `origin/dev`:
+- ✅ Agent framework code intact (`src/agents/`, `src/orchestrator_magentic.py`)
+- ✅ Documentation committed
+- ❌ PR #41 still open (need to close it)
+- ❌ Cherry-pick of pydantic-ai improvements not yet done
+---
+Please confirm: **GO / NO-GO** to proceed with Phase 1 (cherry-picking pydantic-ai improvements)?

docs/brainstorming/magentic-pydantic/REVIEW_PROMPT_FOR_SENIOR_AGENT.md ADDED

	@@ -0,0 +1,113 @@

+# Senior Agent Review Prompt
+Copy and paste everything below this line into a fresh Claude/AI session:
+---
+## Context
+I am a junior developer working on a HuggingFace hackathon project called DeepCritical. We made a significant architectural mistake and are now trying to course-correct. I need you to act as a **senior staff engineer** and critically review our proposed solution.
+## The Situation
+We almost merged a refactor that would have **deleted** our multi-agent orchestration capability, mistakenly believing that `pydantic-ai` (a library for structured LLM outputs) and Microsoft's `agent-framework` (a framework for multi-agent orchestration) were mutually exclusive alternatives.
+**They are not.** They are complementary:
+- `pydantic-ai` ensures LLM responses match Pydantic schemas (type-safe outputs)
+- `agent-framework` orchestrates multiple agents working together (coordination layer)
+We now want to implement a **dual-mode architecture** where:
+- **Simple Mode (No API key):** Uses only pydantic-ai with HuggingFace free tier
+- **Advanced Mode (With API key):** Uses Microsoft Agent Framework for orchestration, with pydantic-ai inside each agent for structured outputs
+## Your Task
+Please perform a **deep, critical review** of:
+1. **The architecture diagram** (image attached: `assets/magentic-pydantic.png`)
+2. **Our documentation** (4 files listed below)
+3. **The actual codebase** to verify our claims
+## Specific Questions to Answer
+### Architecture Validation
+1. Is our understanding correct that pydantic-ai and agent-framework are complementary, not competing?
+2. Does the dual-mode architecture diagram accurately represent how these should integrate?
+3. Are there any architectural flaws or anti-patterns in our proposed design?
+### Documentation Accuracy
+4. Are the branch states we documented accurate? (Check `git log`, `git ls-tree`)
+5. Is our understanding of what code exists where correct?
+6. Are the implementation phases realistic and in the correct order?
+7. Are there any missing steps or dependencies we overlooked?
+### Codebase Reality Check
+8. Does `origin/dev` actually have the agent framework code intact? Verify by checking:
+   - `git ls-tree origin/dev -- src/agents/`
+   - `git ls-tree origin/dev -- src/orchestrator_magentic.py`
+9. What does the current `src/agents/` code actually import? Does it use `agent_framework` or `agent-framework-core`?
+10. Is the `agent-framework-core` package actually available on PyPI, or do we need to install from source?
+### Implementation Feasibility
+11. Can the cherry-pick strategy we outlined actually work, or are there merge conflicts we're not seeing?
+12. Is the mode auto-detection logic sound?
+13. What are the risks we haven't identified?
+### Critical Errors Check
+14. Did we miss anything critical in our analysis?
+15. Are there any factual errors in our documentation?
+16. Would a Google/DeepMind senior engineer approve this plan, or would they flag issues?
+## Files to Review
+Please read these files in order:
+1. `/Users/ray/Desktop/CLARITY-DIGITAL-TWIN/DeepCritical-1/docs/brainstorming/magentic-pydantic/00_SITUATION_AND_PLAN.md`
+2. `/Users/ray/Desktop/CLARITY-DIGITAL-TWIN/DeepCritical-1/docs/brainstorming/magentic-pydantic/01_ARCHITECTURE_SPEC.md`
+3. `/Users/ray/Desktop/CLARITY-DIGITAL-TWIN/DeepCritical-1/docs/brainstorming/magentic-pydantic/02_IMPLEMENTATION_PHASES.md`
+4. `/Users/ray/Desktop/CLARITY-DIGITAL-TWIN/DeepCritical-1/docs/brainstorming/magentic-pydantic/03_IMMEDIATE_ACTIONS.md`
+And the architecture diagram:
+5. `/Users/ray/Desktop/CLARITY-DIGITAL-TWIN/DeepCritical-1/assets/magentic-pydantic.png`
+## Reference Repositories to Consult
+We have local clones of the source-of-truth repositories:
+- **Original DeepCritical:** `/Users/ray/Desktop/CLARITY-DIGITAL-TWIN/DeepCritical-1/reference_repos/DeepCritical/`
+- **Microsoft Agent Framework:** `/Users/ray/Desktop/CLARITY-DIGITAL-TWIN/DeepCritical-1/reference_repos/agent-framework/`
+- **Microsoft AutoGen:** `/Users/ray/Desktop/CLARITY-DIGITAL-TWIN/DeepCritical-1/reference_repos/autogen-microsoft/`
+Please cross-reference our hackathon fork against these to verify architectural alignment.
+## Codebase to Analyze
+Our hackathon fork is at:
+`/Users/ray/Desktop/CLARITY-DIGITAL-TWIN/DeepCritical-1/`
+Key files to examine:
+- `src/agents/` - Agent framework integration
+- `src/agent_factory/judges.py` - pydantic-ai integration
+- `src/orchestrator.py` - Simple mode orchestrator
+- `src/orchestrator_magentic.py` - Advanced mode orchestrator
+- `src/orchestrator_factory.py` - Mode selection
+- `pyproject.toml` - Dependencies
+## Expected Output
+Please provide:
+1. **Validation Summary:** Is our plan sound? (YES/NO with explanation)
+2. **Errors Found:** List any factual errors in our documentation
+3. **Missing Items:** What did we overlook?
+4. **Risk Assessment:** What could go wrong?
+5. **Recommended Changes:** Specific edits to our documentation or plan
+6. **Go/No-Go Recommendation:** Should we proceed with this plan?
+## Tone
+Be brutally honest. If our plan is flawed, say so directly. We would rather know now than after implementation. Don't soften criticism - we need accuracy.
+---
+END OF PROMPT

pyproject.toml CHANGED

@@ -44,7 +44,7 @@ dev = [
     "pre-commit>=3.7",
 ]
 magentic = [
-    "agent-framework-core>=1.0.0b251120,<2.0.0",  # Pin to avoid breaking changes
 ]
 embeddings = [
     "chromadb>=0.4.0",

     "pre-commit>=3.7",
 ]
 magentic = [
+    "agent-framework-core>=1.0.0b251120,<2.0.0",  # Microsoft Agent Framework (PyPI)
 ]
 embeddings = [
     "chromadb>=0.4.0",

src/agent_factory/judges.py CHANGED

@@ -8,8 +8,10 @@ import structlog
 from huggingface_hub import InferenceClient
 from pydantic_ai import Agent
 from pydantic_ai.models.anthropic import AnthropicModel
 from pydantic_ai.models.openai import OpenAIModel
 from pydantic_ai.providers.anthropic import AnthropicProvider
 from pydantic_ai.providers.openai import OpenAIProvider
 from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_exponential
@@ -36,6 +38,12 @@ def get_model() -> Any:
         provider = AnthropicProvider(api_key=settings.anthropic_api_key)
         return AnthropicModel(settings.anthropic_model, provider=provider)
     if llm_provider != "openai":
         logger.warning("Unknown LLM provider, defaulting to OpenAI", provider=llm_provider)

 from huggingface_hub import InferenceClient
 from pydantic_ai import Agent
 from pydantic_ai.models.anthropic import AnthropicModel
+from pydantic_ai.models.huggingface import HuggingFaceModel
 from pydantic_ai.models.openai import OpenAIModel
 from pydantic_ai.providers.anthropic import AnthropicProvider
+from pydantic_ai.providers.huggingface import HuggingFaceProvider
 from pydantic_ai.providers.openai import OpenAIProvider
 from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_exponential
         provider = AnthropicProvider(api_key=settings.anthropic_api_key)
         return AnthropicModel(settings.anthropic_model, provider=provider)
+    if llm_provider == "huggingface":
+        # Free tier - uses HF_TOKEN from environment if available
+        model_name = settings.huggingface_model or "meta-llama/Llama-3.1-70B-Instruct"
+        hf_provider = HuggingFaceProvider(api_key=settings.hf_token)
+        return HuggingFaceModel(model_name, provider=hf_provider)
     if llm_provider != "openai":
         logger.warning("Unknown LLM provider, defaulting to OpenAI", provider=llm_provider)

src/app.py CHANGED

@@ -31,7 +31,7 @@ def configure_orchestrator(
     Args:
         use_mock: If True, use MockJudgeHandler (no API key needed)
-        mode: Orchestrator mode ("simple" or "magentic")
         user_api_key: Optional user-provided API key (BYOK)
         api_provider: API provider ("openai" or "anthropic")
@@ -115,7 +115,7 @@ async def research_agent(
     Args:
         message: User's research question
         history: Chat history (Gradio format)
-        mode: Orchestrator mode ("simple" or "magentic")
         api_key: Optional user-provided API key (BYOK - Bring Your Own Key)
         api_provider: API provider ("openai" or "anthropic")
@@ -135,10 +135,11 @@ async def research_agent(
     has_user_key = bool(user_api_key)
     has_paid_key = has_openai or has_anthropic or has_user_key
-    # Magentic mode requires OpenAI specifically
-    if mode == "magentic" and not (has_openai or (has_user_key and api_provider == "openai")):
         yield (
-            "⚠️ **Warning**: Magentic mode requires OpenAI API key. Falling back to simple mode.\n\n"
         )
         mode = "simple"
@@ -227,10 +228,13 @@ def create_demo() -> gr.ChatInterface:
         additional_inputs_accordion=gr.Accordion(label="⚙️ Settings", open=False),
         additional_inputs=[
             gr.Radio(
-                choices=["simple", "magentic"],
                 value="simple",
                 label="Orchestrator Mode",
-                info="Simple: Linear | Magentic: Multi-Agent (OpenAI)",
             ),
             gr.Textbox(
                 label="🔑 API Key (Optional - BYOK)",

     Args:
         use_mock: If True, use MockJudgeHandler (no API key needed)
+        mode: Orchestrator mode ("simple" or "advanced")
         user_api_key: Optional user-provided API key (BYOK)
         api_provider: API provider ("openai" or "anthropic")
     Args:
         message: User's research question
         history: Chat history (Gradio format)
+        mode: Orchestrator mode ("simple" or "advanced")
         api_key: Optional user-provided API key (BYOK - Bring Your Own Key)
         api_provider: API provider ("openai" or "anthropic")
     has_user_key = bool(user_api_key)
     has_paid_key = has_openai or has_anthropic or has_user_key
+    # Advanced mode requires OpenAI specifically (due to agent-framework binding)
+    if mode == "advanced" and not (has_openai or (has_user_key and api_provider == "openai")):
         yield (
+            "⚠️ **Warning**: Advanced mode currently requires OpenAI API key. "
+            "Falling back to simple mode.\n\n"
         )
         mode = "simple"
         additional_inputs_accordion=gr.Accordion(label="⚙️ Settings", open=False),
         additional_inputs=[
             gr.Radio(
+                choices=["simple", "advanced"],
                 value="simple",
                 label="Orchestrator Mode",
+                info=(
+                    "Simple: Linear (Free Tier Friendly) | "
+                    "Advanced: Multi-Agent (Requires OpenAI)"
+                ),
             ),
             gr.Textbox(
                 label="🔑 API Key (Optional - BYOK)",
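
The fallback guard in `research_agent` above can be read as a small pure function. The sketch below restates that condition outside Gradio; `resolve_ui_mode` is a hypothetical helper name, not a function in `src/app.py`.

```python
def resolve_ui_mode(
    mode: str, has_openai: bool, has_user_key: bool, api_provider: str
) -> tuple:
    """Return (effective_mode, fell_back) for the requested UI mode."""
    # Advanced mode needs an OpenAI key, from the environment or supplied by the user.
    openai_available = has_openai or (has_user_key and api_provider == "openai")
    if mode == "advanced" and not openai_available:
        return "simple", True
    return mode, False


# A user-supplied Anthropic key does not unlock advanced mode.
anthropic_case = resolve_ui_mode("advanced", False, True, "anthropic")
# A user-supplied OpenAI key does.
openai_case = resolve_ui_mode("advanced", False, True, "openai")
```

Keeping the check in a function like this would make the warning path unit-testable without starting the UI.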

src/orchestrator_factory.py CHANGED

@@ -2,15 +2,34 @@
 from typing import Any, Literal
 from src.orchestrator import JudgeHandlerProtocol, Orchestrator, SearchHandlerProtocol
 from src.utils.models import OrchestratorConfig
 def create_orchestrator(
     search_handler: SearchHandlerProtocol | None = None,
     judge_handler: JudgeHandlerProtocol | None = None,
     config: OrchestratorConfig | None = None,
-    mode: Literal["simple", "magentic"] = "simple",
 ) -> Any:
     """
     Create an orchestrator instance.
@@ -19,25 +38,19 @@ def create_orchestrator(
         search_handler: The search handler (required for simple mode)
         judge_handler: The judge handler (required for simple mode)
         config: Optional configuration
-        mode: "simple" for Phase 4 loop, "magentic" for ChatAgent-based multi-agent
     Returns:
         Orchestrator instance
-    Note:
-        Magentic mode does NOT use search_handler/judge_handler.
-        It creates ChatAgent instances with internal LLMs that call tools directly.
     """
-    if mode == "magentic":
-        try:
-            from src.orchestrator_magentic import MagenticOrchestrator
-            return MagenticOrchestrator(
-                max_rounds=config.max_iterations if config else 10,
-            )
-        except ImportError:
-            # Fallback to simple if agent-framework not installed
-            pass
     # Simple mode requires handlers
     if search_handler is None or judge_handler is None:
@@ -48,3 +61,17 @@ def create_orchestrator(
         judge_handler=judge_handler,
         config=config,
     )

 from typing import Any, Literal
+import structlog
 from src.orchestrator import JudgeHandlerProtocol, Orchestrator, SearchHandlerProtocol
+from src.utils.config import settings
 from src.utils.models import OrchestratorConfig
+logger = structlog.get_logger()
+def _get_magentic_orchestrator_class() -> Any:
+    """Import MagenticOrchestrator lazily to avoid hard dependency."""
+    try:
+        from src.orchestrator_magentic import MagenticOrchestrator
+        return MagenticOrchestrator
+    except ImportError as e:
+        logger.error("Failed to import MagenticOrchestrator", error=str(e))
+        raise ValueError(
+            "Advanced mode requires agent-framework-core. "
+            "Please install it or use mode='simple'."
+        ) from e
 def create_orchestrator(
     search_handler: SearchHandlerProtocol | None = None,
     judge_handler: JudgeHandlerProtocol | None = None,
     config: OrchestratorConfig | None = None,
+    mode: Literal["simple", "magentic", "advanced"] | None = None,
 ) -> Any:
     """
     Create an orchestrator instance.
         search_handler: The search handler (required for simple mode)
         judge_handler: The judge handler (required for simple mode)
         config: Optional configuration
+        mode: "simple", "magentic", "advanced" or None (auto-detect)
     Returns:
         Orchestrator instance
     """
+    effective_mode = _determine_mode(mode)
+    logger.info("Creating orchestrator", mode=effective_mode)
+    if effective_mode == "advanced":
+        orchestrator_cls = _get_magentic_orchestrator_class()
+        return orchestrator_cls(
+            max_rounds=config.max_iterations if config else 10,
+        )
     # Simple mode requires handlers
     if search_handler is None or judge_handler is None:
         judge_handler=judge_handler,
         config=config,
     )
+def _determine_mode(explicit_mode: str | None) -> str:
+    """Determine which mode to use."""
+    if explicit_mode:
+        if explicit_mode in ("magentic", "advanced"):
+            return "advanced"
+        return "simple"
+    # Auto-detect: advanced only if an OpenAI API key is available
+    if settings.has_openai_key:
+        return "advanced"
+    return "simple"
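
The resolution rules in `_determine_mode` can be tabulated with a self-contained copy of the logic. In this sketch the `settings.has_openai_key` lookup is replaced by an explicit `has_openai_key` parameter (an assumption made so the example runs without the project's config).

```python
from typing import Optional


def determine_mode(explicit_mode: Optional[str], has_openai_key: bool) -> str:
    """Mirror of the factory's rules, with the settings lookup made explicit."""
    if explicit_mode:
        # "magentic" is kept as a legacy alias for "advanced".
        return "advanced" if explicit_mode in ("magentic", "advanced") else "simple"
    # No explicit mode: auto-detect from the OpenAI key only.
    return "advanced" if has_openai_key else "simple"


table = {
    (mode, key): determine_mode(mode, key)
    for mode in (None, "simple", "magentic", "advanced", "typo")
    for key in (False, True)
}
```

Two behaviors follow from these rules: an explicit mode always wins over key detection, and any unrecognized string (such as `"typo"`) silently resolves to simple mode. The `Literal` annotation on `create_orchestrator` only guards the second case at type-check time.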

src/utils/config.py CHANGED

@@ -23,13 +23,20 @@ class Settings(BaseSettings):
     # LLM Configuration
     openai_api_key: str | None = Field(default=None, description="OpenAI API key")
     anthropic_api_key: str | None = Field(default=None, description="Anthropic API key")
-    llm_provider: Literal["openai", "anthropic"] = Field(
         default="openai", description="Which LLM provider to use"
     )
     openai_model: str = Field(default="gpt-5.1", description="OpenAI model name")
     anthropic_model: str = Field(
         default="claude-sonnet-4-5-20250929", description="Anthropic model"
     )
     # Embedding Configuration
     # Note: OpenAI embeddings require OPENAI_API_KEY (Anthropic has no embeddings API)
@@ -97,10 +104,15 @@ class Settings(BaseSettings):
         """Check if Anthropic API key is available."""
         return bool(self.anthropic_api_key)
     @property
     def has_any_llm_key(self) -> bool:
         """Check if any LLM API key is available."""
-        return self.has_openai_key or self.has_anthropic_key
 def get_settings() -> Settings:

     # LLM Configuration
     openai_api_key: str | None = Field(default=None, description="OpenAI API key")
     anthropic_api_key: str | None = Field(default=None, description="Anthropic API key")
+    llm_provider: Literal["openai", "anthropic", "huggingface"] = Field(
         default="openai", description="Which LLM provider to use"
     )
     openai_model: str = Field(default="gpt-5.1", description="OpenAI model name")
     anthropic_model: str = Field(
         default="claude-sonnet-4-5-20250929", description="Anthropic model"
     )
+    # HuggingFace (free tier)
+    huggingface_model: str | None = Field(
+        default="meta-llama/Llama-3.1-70B-Instruct", description="HuggingFace model name"
+    )
+    hf_token: str | None = Field(
+        default=None, alias="HF_TOKEN", description="HuggingFace API token"
+    )
     # Embedding Configuration
     # Note: OpenAI embeddings require OPENAI_API_KEY (Anthropic has no embeddings API)
         """Check if Anthropic API key is available."""
         return bool(self.anthropic_api_key)
+    @property
+    def has_huggingface_key(self) -> bool:
+        """Check if HuggingFace token is available."""
+        return bool(self.hf_token)
     @property
     def has_any_llm_key(self) -> bool:
         """Check if any LLM API key is available."""
+        return self.has_openai_key or self.has_anthropic_key or self.has_huggingface_key
 def get_settings() -> Settings:

tests/integration/test_dual_mode_e2e.py ADDED

	@@ -0,0 +1,82 @@

+"""End-to-End Integration Tests for Dual-Mode Architecture."""
+from unittest.mock import AsyncMock, MagicMock, patch
+import pytest
+pytestmark = [pytest.mark.integration, pytest.mark.slow]
+from src.orchestrator_factory import create_orchestrator
+from src.utils.models import Citation, Evidence, OrchestratorConfig
+@pytest.fixture
+def mock_search_handler():
+    handler = MagicMock()
+    handler.execute = AsyncMock(
+        return_value=[
+            Evidence(
+                citation=Citation(
+                    title="Test Paper", url="http://test", date="2024", source="pubmed"
+                ),
+                content="Metformin increases lifespan in mice.",
+            )
+        ]
+    )
+    return handler
+@pytest.fixture
+def mock_judge_handler():
+    handler = MagicMock()
+    # Mock return value of assess
+    assessment = MagicMock()
+    assessment.sufficient = True
+    assessment.recommendation = "synthesize"
+    handler.assess = AsyncMock(return_value=assessment)
+    return handler
+@pytest.mark.asyncio
+async def test_simple_mode_e2e(mock_search_handler, mock_judge_handler):
+    """Test Simple Mode Orchestration flow."""
+    orch = create_orchestrator(
+        search_handler=mock_search_handler,
+        judge_handler=mock_judge_handler,
+        mode="simple",
+        config=OrchestratorConfig(max_iterations=1),
+    )
+    # Run
+    results = []
+    async for event in orch.run("Test query"):
+        results.append(event)
+    assert len(results) > 0
+    assert mock_search_handler.execute.called
+    assert mock_judge_handler.assess.called
+@pytest.mark.asyncio
+async def test_advanced_mode_explicit_instantiation():
+    """Test explicit Advanced Mode instantiation (not auto-detect).
+    This tests the explicit mode="advanced" path, verifying that
+    MagenticOrchestrator can be instantiated when explicitly requested.
+    The settings patch ensures any internal checks pass.
+    """
+    with patch("src.orchestrator_factory.settings") as mock_settings:
+        # Settings patch ensures factory checks pass (even though mode is explicit)
+        mock_settings.has_openai_key = True
+        with patch("src.agents.magentic_agents.OpenAIChatClient"):
+            # Mock agent creation to avoid real API calls during init
+            with (
+                patch("src.orchestrator_magentic.create_search_agent"),
+                patch("src.orchestrator_magentic.create_judge_agent"),
+                patch("src.orchestrator_magentic.create_hypothesis_agent"),
+                patch("src.orchestrator_magentic.create_report_agent"),
+            ):
+                # Explicit mode="advanced" - tests the explicit path, not auto-detect
+                orch = create_orchestrator(mode="advanced")
+                assert orch is not None

tests/unit/agent_factory/test_judges_factory.py ADDED

	@@ -0,0 +1,64 @@

+"""Unit tests for Judge Factory and Model Selection."""
+from unittest.mock import patch
+import pytest
+pytestmark = pytest.mark.unit
+from pydantic_ai.models.anthropic import AnthropicModel
+# We expect this import to exist after we implement it, or we mock it if it's not there yet
+# For TDD, we assume we will use the library class
+from pydantic_ai.models.huggingface import HuggingFaceModel
+from pydantic_ai.models.openai import OpenAIModel
+from src.agent_factory.judges import get_model
+@pytest.fixture
+def mock_settings():
+    with patch("src.agent_factory.judges.settings", autospec=True) as mock_settings:
+        yield mock_settings
+def test_get_model_openai(mock_settings):
+    """Test that OpenAI model is returned when provider is openai."""
+    mock_settings.llm_provider = "openai"
+    mock_settings.openai_api_key = "sk-test"
+    mock_settings.openai_model = "gpt-4o"
+    model = get_model()
+    assert isinstance(model, OpenAIModel)
+    assert model.model_name == "gpt-4o"
+def test_get_model_anthropic(mock_settings):
+    """Test that Anthropic model is returned when provider is anthropic."""
+    mock_settings.llm_provider = "anthropic"
+    mock_settings.anthropic_api_key = "sk-ant-test"
+    mock_settings.anthropic_model = "claude-3-5-sonnet"
+    model = get_model()
+    assert isinstance(model, AnthropicModel)
+    assert model.model_name == "claude-3-5-sonnet"
+def test_get_model_huggingface(mock_settings):
+    """Test that HuggingFace model is returned when provider is huggingface."""
+    mock_settings.llm_provider = "huggingface"
+    mock_settings.hf_token = "hf_test_token"
+    mock_settings.huggingface_model = "meta-llama/Llama-3.1-70B-Instruct"
+    model = get_model()
+    assert isinstance(model, HuggingFaceModel)
+    assert model.model_name == "meta-llama/Llama-3.1-70B-Instruct"
+def test_get_model_default_fallback(mock_settings):
+    """Test fallback to OpenAI if provider is unknown."""
+    mock_settings.llm_provider = "unknown_provider"
+    mock_settings.openai_api_key = "sk-test"
+    mock_settings.openai_model = "gpt-4o"
+    model = get_model()
+    assert isinstance(model, OpenAIModel)

tests/unit/agents/test_agent_imports.py ADDED

	@@ -0,0 +1,32 @@

+"""Test that agent framework dependencies are importable and usable."""
+from unittest.mock import MagicMock
+import pytest
+pytestmark = pytest.mark.unit
+# Import conditional on package availability, but for this test we expect it to be there
+try:
+    from agent_framework import ChatAgent
+    from agent_framework.openai import OpenAIChatClient
+except ImportError:
+    ChatAgent = None
+    OpenAIChatClient = None
+@pytest.mark.skipif(ChatAgent is None, reason="agent-framework-core not installed")
+def test_agent_framework_import():
+    """Test that agent_framework can be imported."""
+    assert ChatAgent is not None
+    assert OpenAIChatClient is not None  # Verify both imports work
+@pytest.mark.skipif(ChatAgent is None, reason="agent-framework-core not installed")
+def test_chat_agent_instantiation():
+    """Test that ChatAgent can be instantiated with a mock client."""
+    mock_client = MagicMock()
+    # We assume ChatAgent takes chat_client as first argument based on _agents.py source
+    agent = ChatAgent(chat_client=mock_client, name="TestAgent")
+    assert agent.name == "TestAgent"
+    assert agent.chat_client == mock_client

tests/unit/test_orchestrator_factory.py ADDED

	@@ -0,0 +1,66 @@

+"""Unit tests for Orchestrator Factory."""
+from unittest.mock import MagicMock, patch
+import pytest
+pytestmark = pytest.mark.unit
+from src.orchestrator import Orchestrator
+from src.orchestrator_factory import create_orchestrator
+@pytest.fixture
+def mock_settings():
+    with patch("src.orchestrator_factory.settings", autospec=True) as mock_settings:
+        yield mock_settings
+@pytest.fixture
+def mock_magentic_cls():
+    with patch("src.orchestrator_factory._get_magentic_orchestrator_class") as mock:
+        # The mock returns a class (callable), which returns an instance
+        mock_class = MagicMock()
+        mock.return_value = mock_class
+        yield mock_class
+@pytest.fixture
+def mock_handlers():
+    return MagicMock(), MagicMock()
+def test_create_orchestrator_simple_explicit(mock_settings, mock_handlers):
+    """Test explicit simple mode."""
+    search, judge = mock_handlers
+    orch = create_orchestrator(search_handler=search, judge_handler=judge, mode="simple")
+    assert isinstance(orch, Orchestrator)
+def test_create_orchestrator_advanced_explicit(mock_settings, mock_handlers, mock_magentic_cls):
+    """Test explicit advanced mode."""
+    # Ensure has_openai_key is True so it doesn't error if we add checks
+    mock_settings.has_openai_key = True
+    orch = create_orchestrator(mode="advanced")
+    # verify instantiated
+    mock_magentic_cls.assert_called_once()
+    assert orch == mock_magentic_cls.return_value
+def test_create_orchestrator_auto_advanced(mock_settings, mock_magentic_cls):
+    """Test auto-detect advanced mode when OpenAI key exists."""
+    mock_settings.has_openai_key = True
+    orch = create_orchestrator()
+    mock_magentic_cls.assert_called_once()
+    assert orch == mock_magentic_cls.return_value
+def test_create_orchestrator_auto_simple(mock_settings, mock_handlers):
+    """Test auto-detect simple mode when no OpenAI key is set."""
+    mock_settings.has_openai_key = False
+    search, judge = mock_handlers
+    orch = create_orchestrator(search_handler=search, judge_handler=judge)
+    assert isinstance(orch, Orchestrator)