Joseph Pollack committed
Commit 731a241 · unverified · 1 Parent(s): 1515e72

adds the initial iterative and deep research workflows

This view is limited to 50 files because it contains too many changes. See the raw diff for the full changeset.

Files changed (50)
  1. .gitignore +3 -0
  2. .pre-commit-config.yaml +1 -0
  3. AGENTS.md +0 -118
  4. CLAUDE.md +0 -111
  5. GEMINI.md +0 -98
  6. docs/CONFIGURATION.md +291 -0
  7. docs/architecture/graph_orchestration.md +141 -0
  8. docs/examples/writer_agents_usage.md +415 -0
  9. docs/implementation/02_phase_search.md +31 -19
  10. pyproject.toml +11 -0
  11. src/agent_factory/agents.py +339 -0
  12. src/agent_factory/graph_builder.py +608 -0
  13. src/agent_factory/judges.py +9 -0
  14. src/agents/input_parser.py +178 -0
  15. src/agents/judge_agent.py +1 -1
  16. src/agents/knowledge_gap.py +156 -0
  17. src/agents/long_writer.py +431 -0
  18. src/agents/proofreader.py +205 -0
  19. src/agents/search_agent.py +1 -1
  20. src/agents/state.py +27 -5
  21. src/agents/thinking.py +148 -0
  22. src/agents/tool_selector.py +168 -0
  23. src/agents/writer.py +209 -0
  24. src/{orchestrator.py → legacy_orchestrator.py} +0 -0
  25. src/middleware/__init__.py +33 -0
  26. src/middleware/budget_tracker.py +390 -0
  27. src/middleware/state_machine.py +129 -0
  28. src/middleware/workflow_manager.py +322 -0
  29. src/orchestrator/__init__.py +48 -0
  30. src/orchestrator/graph_orchestrator.py +953 -0
  31. src/orchestrator/planner_agent.py +174 -0
  32. src/orchestrator/research_flow.py +999 -0
  33. src/orchestrator_factory.py +1 -1
  34. src/tools/__init__.py +8 -1
  35. src/tools/crawl_adapter.py +58 -0
  36. src/tools/rag_tool.py +183 -0
  37. src/tools/search_handler.py +67 -5
  38. src/tools/tool_executor.py +193 -0
  39. src/tools/web_search_adapter.py +63 -0
  40. src/utils/citation_validator.py +91 -0
  41. src/utils/config.py +98 -0
  42. src/utils/models.py +267 -1
  43. tests/integration/test_deep_research.py +352 -0
  44. tests/integration/test_middleware_integration.py +245 -0
  45. tests/integration/test_parallel_loops_judge.py +396 -0
  46. tests/integration/test_rag_integration.py +343 -0
  47. tests/integration/test_research_flows.py +584 -0
  48. tests/unit/agent_factory/test_graph_builder.py +439 -0
  49. tests/unit/agents/test_input_parser.py +325 -0
  50. tests/unit/agents/test_long_writer.py +509 -0
.gitignore CHANGED
@@ -1,3 +1,6 @@
+ folder/
+ .cursor/
+ .ruff_cache/
  # Python
  __pycache__/
  *.py[cod]
.pre-commit-config.yaml CHANGED
@@ -13,6 +13,7 @@ repos:
      hooks:
        - id: mypy
          files: ^src/
+         exclude: ^folder
          additional_dependencies:
            - pydantic>=2.7
            - pydantic-settings>=2.2
AGENTS.md DELETED
@@ -1,118 +0,0 @@
- # AGENTS.md
-
- This file provides guidance to AI agents when working with code in this repository.
-
- ## Project Overview
-
- DeepCritical is an AI-native drug repurposing research agent for a HuggingFace hackathon. It uses a search-and-judge loop to autonomously search biomedical databases (PubMed, ClinicalTrials.gov, bioRxiv) and synthesize evidence for queries like "What existing drugs might help treat long COVID fatigue?".
-
- **Current Status:** Phases 1-13 COMPLETE (Foundation through Modal sandbox integration).
-
- ## Development Commands
-
- ```bash
- # Install all dependencies (including dev)
- make install   # or: uv sync --all-extras && uv run pre-commit install
-
- # Run all quality checks (lint + typecheck + test) - MUST PASS BEFORE COMMIT
- make check
-
- # Individual commands
- make test      # uv run pytest tests/unit/ -v
- make lint      # uv run ruff check src tests
- make format    # uv run ruff format src tests
- make typecheck # uv run mypy src
- make test-cov  # uv run pytest --cov=src --cov-report=term-missing
-
- # Run single test
- uv run pytest tests/unit/utils/test_config.py::TestSettings::test_default_max_iterations -v
-
- # Integration tests (real APIs)
- uv run pytest -m integration
- ```
-
- ## Architecture
-
- **Pattern**: Search-and-judge loop with multi-tool orchestration.
-
- ```text
- User Question → Orchestrator
-
- Search Loop:
-   1. Query PubMed, ClinicalTrials.gov, bioRxiv
-   2. Gather evidence
-   3. Judge quality ("Do we have enough?")
-   4. If NO → Refine query, search more
-   5. If YES → Synthesize findings (+ optional Modal analysis)
-
- Research Report with Citations
- ```
-
- **Key Components**:
-
- - `src/orchestrator.py` - Main agent loop
- - `src/tools/pubmed.py` - PubMed E-utilities search
- - `src/tools/clinicaltrials.py` - ClinicalTrials.gov API
- - `src/tools/biorxiv.py` - bioRxiv/medRxiv preprint search
- - `src/tools/code_execution.py` - Modal sandbox execution
- - `src/tools/search_handler.py` - Scatter-gather orchestration
- - `src/services/embeddings.py` - Semantic search & deduplication (ChromaDB)
- - `src/services/statistical_analyzer.py` - Statistical analysis via Modal
- - `src/agent_factory/judges.py` - LLM-based evidence assessment
- - `src/agents/` - Magentic multi-agent mode (SearchAgent, JudgeAgent, etc.)
- - `src/mcp_tools.py` - MCP tool wrappers for Claude Desktop
- - `src/utils/config.py` - Pydantic Settings (loads from `.env`)
- - `src/utils/models.py` - Evidence, Citation, SearchResult models
- - `src/utils/exceptions.py` - Exception hierarchy
- - `src/app.py` - Gradio UI with MCP server (HuggingFace Spaces)
-
- **Break Conditions**: Judge approval, token budget (50K max), or max iterations (default 10).
-
- ## Configuration
-
- Settings via pydantic-settings from `.env`:
-
- - `LLM_PROVIDER`: "openai" or "anthropic"
- - `OPENAI_API_KEY` / `ANTHROPIC_API_KEY`: LLM keys
- - `NCBI_API_KEY`: Optional, for higher PubMed rate limits
- - `MODAL_TOKEN_ID` / `MODAL_TOKEN_SECRET`: For Modal sandbox (optional)
- - `MAX_ITERATIONS`: 1-50, default 10
- - `LOG_LEVEL`: DEBUG, INFO, WARNING, ERROR
-
- ## Exception Hierarchy
-
- ```text
- DeepCriticalError (base)
- ├── SearchError
- │   └── RateLimitError
- ├── JudgeError
- └── ConfigurationError
- ```
-
- ## Testing
-
- - **TDD**: Write tests first in `tests/unit/`, implement in `src/`
- - **Markers**: `unit`, `integration`, `slow`
- - **Mocking**: `respx` for httpx, `pytest-mock` for general mocking
- - **Fixtures**: `tests/conftest.py` has `mock_httpx_client`, `mock_llm_response`
-
- ## Coding Standards
-
- - Python 3.11+, strict mypy, ruff (100-char lines)
- - Type all functions, use Pydantic models for data
- - Use `structlog` for logging, not print
- - Conventional commits: `feat(scope):`, `fix:`, `docs:`
-
- ## Git Workflow
-
- - `main`: Production-ready (GitHub)
- - `dev`: Development integration (GitHub)
- - Remote `origin`: GitHub (source of truth for PRs/code review)
- - Remote `huggingface-upstream`: HuggingFace Spaces (deployment target)
-
- **HuggingFace Spaces Collaboration:**
-
- - Each contributor should use their own dev branch: `yourname-dev` (e.g., `vcms-dev`, `mario-dev`)
- - **DO NOT push directly to `main` or `dev` on HuggingFace** - these can be overwritten easily
- - GitHub is the source of truth; HuggingFace is for deployment/demo
- - Consider using git hooks to prevent accidental pushes to protected branches
CLAUDE.md DELETED
@@ -1,111 +0,0 @@
- # CLAUDE.md
-
- This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
-
- ## Project Overview
-
- DeepCritical is an AI-native drug repurposing research agent for a HuggingFace hackathon. It uses a search-and-judge loop to autonomously search biomedical databases (PubMed, ClinicalTrials.gov, bioRxiv) and synthesize evidence for queries like "What existing drugs might help treat long COVID fatigue?".
-
- **Current Status:** Phases 1-13 COMPLETE (Foundation through Modal sandbox integration).
-
- ## Development Commands
-
- ```bash
- # Install all dependencies (including dev)
- make install   # or: uv sync --all-extras && uv run pre-commit install
-
- # Run all quality checks (lint + typecheck + test) - MUST PASS BEFORE COMMIT
- make check
-
- # Individual commands
- make test      # uv run pytest tests/unit/ -v
- make lint      # uv run ruff check src tests
- make format    # uv run ruff format src tests
- make typecheck # uv run mypy src
- make test-cov  # uv run pytest --cov=src --cov-report=term-missing
-
- # Run single test
- uv run pytest tests/unit/utils/test_config.py::TestSettings::test_default_max_iterations -v
-
- # Integration tests (real APIs)
- uv run pytest -m integration
- ```
-
- ## Architecture
-
- **Pattern**: Search-and-judge loop with multi-tool orchestration.
-
- ```text
- User Question → Orchestrator
-
- Search Loop:
-   1. Query PubMed, ClinicalTrials.gov, bioRxiv
-   2. Gather evidence
-   3. Judge quality ("Do we have enough?")
-   4. If NO → Refine query, search more
-   5. If YES → Synthesize findings (+ optional Modal analysis)
-
- Research Report with Citations
- ```
-
- **Key Components**:
-
- - `src/orchestrator.py` - Main agent loop
- - `src/tools/pubmed.py` - PubMed E-utilities search
- - `src/tools/clinicaltrials.py` - ClinicalTrials.gov API
- - `src/tools/biorxiv.py` - bioRxiv/medRxiv preprint search
- - `src/tools/code_execution.py` - Modal sandbox execution
- - `src/tools/search_handler.py` - Scatter-gather orchestration
- - `src/services/embeddings.py` - Semantic search & deduplication (ChromaDB)
- - `src/services/statistical_analyzer.py` - Statistical analysis via Modal
- - `src/agent_factory/judges.py` - LLM-based evidence assessment
- - `src/agents/` - Magentic multi-agent mode (SearchAgent, JudgeAgent, etc.)
- - `src/mcp_tools.py` - MCP tool wrappers for Claude Desktop
- - `src/utils/config.py` - Pydantic Settings (loads from `.env`)
- - `src/utils/models.py` - Evidence, Citation, SearchResult models
- - `src/utils/exceptions.py` - Exception hierarchy
- - `src/app.py` - Gradio UI with MCP server (HuggingFace Spaces)
-
- **Break Conditions**: Judge approval, token budget (50K max), or max iterations (default 10).
-
- ## Configuration
-
- Settings via pydantic-settings from `.env`:
-
- - `LLM_PROVIDER`: "openai" or "anthropic"
- - `OPENAI_API_KEY` / `ANTHROPIC_API_KEY`: LLM keys
- - `NCBI_API_KEY`: Optional, for higher PubMed rate limits
- - `MODAL_TOKEN_ID` / `MODAL_TOKEN_SECRET`: For Modal sandbox (optional)
- - `MAX_ITERATIONS`: 1-50, default 10
- - `LOG_LEVEL`: DEBUG, INFO, WARNING, ERROR
-
- ## Exception Hierarchy
-
- ```text
- DeepCriticalError (base)
- ├── SearchError
- │   └── RateLimitError
- ├── JudgeError
- └── ConfigurationError
- ```
-
- ## Testing
-
- - **TDD**: Write tests first in `tests/unit/`, implement in `src/`
- - **Markers**: `unit`, `integration`, `slow`
- - **Mocking**: `respx` for httpx, `pytest-mock` for general mocking
- - **Fixtures**: `tests/conftest.py` has `mock_httpx_client`, `mock_llm_response`
-
- ## Git Workflow
-
- - `main`: Production-ready (GitHub)
- - `dev`: Development integration (GitHub)
- - Remote `origin`: GitHub (source of truth for PRs/code review)
- - Remote `huggingface-upstream`: HuggingFace Spaces (deployment target)
-
- **HuggingFace Spaces Collaboration:**
-
- - Each contributor should use their own dev branch: `yourname-dev` (e.g., `vcms-dev`, `mario-dev`)
- - **DO NOT push directly to `main` or `dev` on HuggingFace** - these can be overwritten easily
- - GitHub is the source of truth; HuggingFace is for deployment/demo
- - Consider using git hooks to prevent accidental pushes to protected branches
GEMINI.md DELETED
@@ -1,98 +0,0 @@
- # DeepCritical Context
-
- ## Project Overview
-
- **DeepCritical** is an AI-native Medical Drug Repurposing Research Agent.
- **Goal:** To accelerate the discovery of new uses for existing drugs by intelligently searching biomedical literature (PubMed, ClinicalTrials.gov, bioRxiv), evaluating evidence, and hypothesizing potential applications.
-
- **Architecture:**
- The project follows a **Vertical Slice Architecture** (Search -> Judge -> Orchestrator) and adheres to **Strict TDD** (Test-Driven Development).
-
- **Current Status:**
-
- - **Phases 1-9:** COMPLETE. Foundation, Search, Judge, UI, Orchestrator, Embeddings, Hypothesis, Report, Cleanup.
- - **Phases 10-11:** COMPLETE. ClinicalTrials.gov and bioRxiv integration.
- - **Phase 12:** COMPLETE. MCP Server integration (Gradio MCP at `/gradio_api/mcp/`).
- - **Phase 13:** COMPLETE. Modal sandbox for statistical analysis.
-
- ## Tech Stack & Tooling
-
- - **Language:** Python 3.11 (Pinned)
- - **Package Manager:** `uv` (Rust-based, extremely fast)
- - **Frameworks:** `pydantic`, `pydantic-ai`, `httpx`, `gradio[mcp]`
- - **Vector DB:** `chromadb` with `sentence-transformers` for semantic search
- - **Code Execution:** `modal` for secure sandboxed Python execution
- - **Testing:** `pytest`, `pytest-asyncio`, `respx` (for mocking)
- - **Quality:** `ruff` (linting/formatting), `mypy` (strict type checking), `pre-commit`
-
- ## Building & Running
-
- | Command | Description |
- | :--- | :--- |
- | `make install` | Install dependencies and pre-commit hooks. |
- | `make test` | Run unit tests. |
- | `make lint` | Run Ruff linter. |
- | `make format` | Run Ruff formatter. |
- | `make typecheck` | Run Mypy static type checker. |
- | `make check` | **The Golden Gate:** Runs lint, typecheck, and test. Must pass before committing. |
- | `make clean` | Clean up cache and artifacts. |
-
- ## Directory Structure
-
- - `src/`: Source code
-   - `utils/`: Shared utilities (`config.py`, `exceptions.py`, `models.py`)
-   - `tools/`: Search tools (`pubmed.py`, `clinicaltrials.py`, `biorxiv.py`, `code_execution.py`)
-   - `services/`: Services (`embeddings.py`, `statistical_analyzer.py`)
-   - `agents/`: Magentic multi-agent mode agents
-   - `agent_factory/`: Agent definitions (judges, prompts)
-   - `mcp_tools.py`: MCP tool wrappers for Claude Desktop integration
-   - `app.py`: Gradio UI with MCP server
- - `tests/`: Test suite
-   - `unit/`: Isolated unit tests (Mocked)
-   - `integration/`: Real API tests (Marked as slow/integration)
- - `docs/`: Documentation and Implementation Specs
- - `examples/`: Working demos for each phase
-
- ## Key Components
-
- - `src/orchestrator.py` - Main agent loop
- - `src/tools/pubmed.py` - PubMed E-utilities search
- - `src/tools/clinicaltrials.py` - ClinicalTrials.gov API
- - `src/tools/biorxiv.py` - bioRxiv/medRxiv preprint search
- - `src/tools/code_execution.py` - Modal sandbox execution
- - `src/services/statistical_analyzer.py` - Statistical analysis via Modal
- - `src/mcp_tools.py` - MCP tool wrappers
- - `src/app.py` - Gradio UI (HuggingFace Spaces) with MCP server
-
- ## Configuration
-
- Settings via pydantic-settings from `.env`:
-
- - `LLM_PROVIDER`: "openai" or "anthropic"
- - `OPENAI_API_KEY` / `ANTHROPIC_API_KEY`: LLM keys
- - `NCBI_API_KEY`: Optional, for higher PubMed rate limits
- - `MODAL_TOKEN_ID` / `MODAL_TOKEN_SECRET`: For Modal sandbox (optional)
- - `MAX_ITERATIONS`: 1-50, default 10
- - `LOG_LEVEL`: DEBUG, INFO, WARNING, ERROR
-
- ## Development Conventions
-
- 1. **Strict TDD:** Write failing tests in `tests/unit/` *before* implementing logic in `src/`.
- 2. **Type Safety:** All code must pass `mypy --strict`. Use Pydantic models for data exchange.
- 3. **Linting:** Zero tolerance for Ruff errors.
- 4. **Mocking:** Use `respx` or `unittest.mock` for all external API calls in unit tests.
- 5. **Vertical Slices:** Implement features end-to-end rather than layer-by-layer.
-
- ## Git Workflow
-
- - `main`: Production-ready (GitHub)
- - `dev`: Development integration (GitHub)
- - Remote `origin`: GitHub (source of truth for PRs/code review)
- - Remote `huggingface-upstream`: HuggingFace Spaces (deployment target)
-
- **HuggingFace Spaces Collaboration:**
-
- - Each contributor should use their own dev branch: `yourname-dev` (e.g., `vcms-dev`, `mario-dev`)
- - **DO NOT push directly to `main` or `dev` on HuggingFace** - these can be overwritten easily
- - GitHub is the source of truth; HuggingFace is for deployment/demo
- - Consider using git hooks to prevent accidental pushes to protected branches
docs/CONFIGURATION.md ADDED
@@ -0,0 +1,291 @@
+ # Configuration Guide
+
+ ## Overview
+
+ DeepCritical uses **Pydantic Settings** for centralized configuration management. All settings are defined in `src/utils/config.py` and can be configured via environment variables or a `.env` file.
+
+ ## Quick Start
+
+ 1. Copy the example environment file (if available) or create a `.env` file in the project root
+ 2. Set at least one LLM API key (`OPENAI_API_KEY` or `ANTHROPIC_API_KEY`)
+ 3. Optionally configure other services as needed
+
+ ## Configuration System
+
+ ### How It Works
+
+ - **Settings Class**: `Settings` class in `src/utils/config.py` extends `BaseSettings` from `pydantic_settings`
+ - **Environment File**: Automatically loads from `.env` file (if present)
+ - **Environment Variables**: Reads from environment variables (case-insensitive)
+ - **Type Safety**: Strongly-typed fields with validation
+ - **Singleton Pattern**: Global `settings` instance for easy access
+
+ ### Usage
+
+ ```python
+ from src.utils.config import settings
+
+ # Check if API keys are available
+ if settings.has_openai_key:
+     # Use OpenAI
+     pass
+
+ # Access configuration values
+ max_iterations = settings.max_iterations
+ web_search_provider = settings.web_search_provider
+ ```
+
+ ## Required Configuration
+
+ ### At Least One LLM Provider
+
+ You must configure at least one LLM provider:
+
+ **OpenAI:**
+ ```bash
+ LLM_PROVIDER=openai
+ OPENAI_API_KEY=your_openai_api_key_here
+ OPENAI_MODEL=gpt-5.1
+ ```
+
+ **Anthropic:**
+ ```bash
+ LLM_PROVIDER=anthropic
+ ANTHROPIC_API_KEY=your_anthropic_api_key_here
+ ANTHROPIC_MODEL=claude-sonnet-4-5-20250929
+ ```
+
+ ## Optional Configuration
+
+ ### Embedding Configuration
+
+ ```bash
+ # Embedding Provider: "openai", "local", or "huggingface"
+ EMBEDDING_PROVIDER=local
+
+ # OpenAI Embedding Model (used by LlamaIndex RAG)
+ OPENAI_EMBEDDING_MODEL=text-embedding-3-small
+
+ # Local Embedding Model (sentence-transformers)
+ LOCAL_EMBEDDING_MODEL=all-MiniLM-L6-v2
+
+ # HuggingFace Embedding Model
+ HUGGINGFACE_EMBEDDING_MODEL=sentence-transformers/all-MiniLM-L6-v2
+ ```
+
+ ### HuggingFace Configuration
+
+ ```bash
+ # HuggingFace API Token (for inference API)
+ HUGGINGFACE_API_KEY=your_huggingface_api_key_here
+ # Or use HF_TOKEN (alternative name)
+
+ # Default HuggingFace Model ID
+ HUGGINGFACE_MODEL=meta-llama/Llama-3.1-8B-Instruct
+ ```
+
+ ### Web Search Configuration
+
+ ```bash
+ # Web Search Provider: "serper", "searchxng", "brave", "tavily", or "duckduckgo"
+ # Default: "duckduckgo" (no API key required)
+ WEB_SEARCH_PROVIDER=duckduckgo
+
+ # Serper API Key (for Google search via Serper)
+ SERPER_API_KEY=your_serper_api_key_here
+
+ # SearchXNG Host URL
+ SEARCHXNG_HOST=http://localhost:8080
+
+ # Brave Search API Key
+ BRAVE_API_KEY=your_brave_api_key_here
+
+ # Tavily API Key
+ TAVILY_API_KEY=your_tavily_api_key_here
+ ```
+
+ ### PubMed Configuration
+
+ ```bash
+ # NCBI API Key (optional, for higher rate limits: 10 req/sec vs 3 req/sec)
+ NCBI_API_KEY=your_ncbi_api_key_here
+ ```
+
+ ### Agent Configuration
+
+ ```bash
+ # Maximum iterations per research loop
+ MAX_ITERATIONS=10
+
+ # Search timeout in seconds
+ SEARCH_TIMEOUT=30
+
+ # Use graph-based execution for research flows
+ USE_GRAPH_EXECUTION=false
+ ```
+
+ ### Budget & Rate Limiting Configuration
+
+ ```bash
+ # Default token budget per research loop
+ DEFAULT_TOKEN_LIMIT=100000
+
+ # Default time limit per research loop (minutes)
+ DEFAULT_TIME_LIMIT_MINUTES=10
+
+ # Default iterations limit per research loop
+ DEFAULT_ITERATIONS_LIMIT=10
+ ```
+
+ ### RAG Service Configuration
+
+ ```bash
+ # ChromaDB collection name for RAG
+ RAG_COLLECTION_NAME=deepcritical_evidence
+
+ # Number of top results to retrieve from RAG
+ RAG_SIMILARITY_TOP_K=5
+
+ # Automatically ingest evidence into RAG
+ RAG_AUTO_INGEST=true
+ ```
+
+ ### ChromaDB Configuration
+
+ ```bash
+ # ChromaDB storage path
+ CHROMA_DB_PATH=./chroma_db
+
+ # Whether to persist ChromaDB to disk
+ CHROMA_DB_PERSIST=true
+
+ # ChromaDB server host (for remote ChromaDB, optional)
+ # CHROMA_DB_HOST=localhost
+
+ # ChromaDB server port (for remote ChromaDB, optional)
+ # CHROMA_DB_PORT=8000
+ ```
+
+ ### External Services
+
+ ```bash
+ # Modal Token ID (for Modal sandbox execution)
+ MODAL_TOKEN_ID=your_modal_token_id_here
+
+ # Modal Token Secret
+ MODAL_TOKEN_SECRET=your_modal_token_secret_here
+ ```
+
+ ### Logging Configuration
+
+ ```bash
+ # Log Level: "DEBUG", "INFO", "WARNING", or "ERROR"
+ LOG_LEVEL=INFO
+ ```
+
+ ## Configuration Properties
+
+ The `Settings` class provides helpful properties for checking configuration:
+
+ ```python
+ from src.utils.config import settings
+
+ # Check API key availability
+ settings.has_openai_key       # bool
+ settings.has_anthropic_key    # bool
+ settings.has_huggingface_key  # bool
+ settings.has_any_llm_key      # bool
+
+ # Check service availability
+ settings.modal_available       # bool
+ settings.web_search_available  # bool
+ ```
+
+ ## Environment Variables Reference
+
+ ### Required (at least one LLM)
+ - `OPENAI_API_KEY` or `ANTHROPIC_API_KEY` - At least one LLM provider key
+
+ ### Optional LLM Providers
+ - `DEEPSEEK_API_KEY` (Phase 2)
+ - `OPENROUTER_API_KEY` (Phase 2)
+ - `GEMINI_API_KEY` (Phase 2)
+ - `PERPLEXITY_API_KEY` (Phase 2)
+ - `HUGGINGFACE_API_KEY` or `HF_TOKEN`
+ - `AZURE_OPENAI_ENDPOINT` (Phase 2)
+ - `AZURE_OPENAI_DEPLOYMENT` (Phase 2)
+ - `AZURE_OPENAI_API_KEY` (Phase 2)
+ - `AZURE_OPENAI_API_VERSION` (Phase 2)
+ - `LOCAL_MODEL_URL` (Phase 2)
+
+ ### Web Search
+ - `WEB_SEARCH_PROVIDER` (default: "duckduckgo")
+ - `SERPER_API_KEY`
+ - `SEARCHXNG_HOST`
+ - `BRAVE_API_KEY`
+ - `TAVILY_API_KEY`
+
+ ### Embeddings
+ - `EMBEDDING_PROVIDER` (default: "local")
+ - `HUGGINGFACE_EMBEDDING_MODEL` (optional)
+
+ ### RAG
+ - `RAG_COLLECTION_NAME` (default: "deepcritical_evidence")
+ - `RAG_SIMILARITY_TOP_K` (default: 5)
+ - `RAG_AUTO_INGEST` (default: true)
+
+ ### ChromaDB
+ - `CHROMA_DB_PATH` (default: "./chroma_db")
+ - `CHROMA_DB_PERSIST` (default: true)
+ - `CHROMA_DB_HOST` (optional)
+ - `CHROMA_DB_PORT` (optional)
+
+ ### Budget
+ - `DEFAULT_TOKEN_LIMIT` (default: 100000)
+ - `DEFAULT_TIME_LIMIT_MINUTES` (default: 10)
+ - `DEFAULT_ITERATIONS_LIMIT` (default: 10)
+
+ ### Other
+ - `LLM_PROVIDER` (default: "openai")
+ - `NCBI_API_KEY` (optional)
+ - `MODAL_TOKEN_ID` (optional)
+ - `MODAL_TOKEN_SECRET` (optional)
+ - `MAX_ITERATIONS` (default: 10)
+ - `LOG_LEVEL` (default: "INFO")
+ - `USE_GRAPH_EXECUTION` (default: false)
+
+ ## Validation
+
+ Settings are validated on load using Pydantic validation:
+
+ - **Type checking**: All fields are strongly typed
+ - **Range validation**: Numeric fields have min/max constraints
+ - **Literal validation**: Enum fields only accept specific values
+ - **Required fields**: API keys are checked when accessed via `get_api_key()`
+
+ ## Error Handling
+
+ Configuration errors raise `ConfigurationError`:
+
+ ```python
+ from src.utils.config import settings
+ from src.utils.exceptions import ConfigurationError
+
+ try:
+     api_key = settings.get_api_key()
+ except ConfigurationError as e:
+     print(f"Configuration error: {e}")
+ ```
+
+ ## Future Enhancements (Phase 2)
+
+ The following configurations are planned for Phase 2:
+
+ 1. **Additional LLM Providers**: DeepSeek, OpenRouter, Gemini, Perplexity, Azure OpenAI, Local models
+ 2. **Model Selection**: Reasoning/main/fast model configuration
+ 3. **Service Integration**: Migrate `folder/llm_config.py` to centralized config
+
+ See `CONFIGURATION_ANALYSIS.md` for the complete implementation plan.
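To make the validation rules described in the guide concrete, here is a minimal sketch of how such a `Settings` class is typically declared with pydantic-settings. It is illustrative rather than the `src/utils/config.py` added by this commit; field names mirror the environment variables documented above, while the exact defaults and property names are assumptions.

```python
# Illustrative sketch only - the real class lives in src/utils/config.py.
from typing import Literal

from pydantic import Field
from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    # Loads from a .env file and case-insensitive environment variables.
    model_config = SettingsConfigDict(env_file=".env", case_sensitive=False)

    # Literal validation: only these providers are accepted.
    llm_provider: Literal["openai", "anthropic"] = "openai"
    openai_api_key: str | None = None
    anthropic_api_key: str | None = None

    # Range validation: 1-50 with a default of 10, as documented above.
    max_iterations: int = Field(default=10, ge=1, le=50)
    log_level: Literal["DEBUG", "INFO", "WARNING", "ERROR"] = "INFO"

    @property
    def has_openai_key(self) -> bool:
        return self.openai_api_key is not None


settings = Settings()  # module-level singleton, as the guide describes
```

With this shape, an out-of-range `MAX_ITERATIONS=99` or an unknown `LLM_PROVIDER` fails at load time with a Pydantic `ValidationError` rather than surfacing later inside a research loop.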
docs/architecture/graph_orchestration.md ADDED
@@ -0,0 +1,141 @@
+ # Graph Orchestration Architecture
+
+ ## Overview
+
+ Phase 4 implements a graph-based orchestration system for research workflows using Pydantic AI agents as nodes. This enables better parallel execution, conditional routing, and state management compared to simple agent chains.
+
+ ## Graph Structure
+
+ ### Nodes
+
+ Graph nodes represent different stages in the research workflow:
+
+ 1. **Agent Nodes**: Execute Pydantic AI agents
+    - Input: Prompt/query
+    - Output: Structured or unstructured response
+    - Examples: `KnowledgeGapAgent`, `ToolSelectorAgent`, `ThinkingAgent`
+
+ 2. **State Nodes**: Update or read workflow state
+    - Input: Current state
+    - Output: Updated state
+    - Examples: Update evidence, update conversation history
+
+ 3. **Decision Nodes**: Make routing decisions based on conditions
+    - Input: Current state/results
+    - Output: Next node ID
+    - Examples: Continue research vs. complete research
+
+ 4. **Parallel Nodes**: Execute multiple nodes concurrently
+    - Input: List of node IDs
+    - Output: Aggregated results
+    - Examples: Parallel iterative research loops
+
+ ### Edges
+
+ Edges define transitions between nodes:
+
+ 1. **Sequential Edges**: Always traversed (no condition)
+    - From: Source node
+    - To: Target node
+    - Condition: None (always True)
+
+ 2. **Conditional Edges**: Traversed based on condition
+    - From: Source node
+    - To: Target node
+    - Condition: Callable that returns bool
+    - Example: If research complete → go to writer, else → continue loop
+
+ 3. **Parallel Edges**: Used for parallel execution branches
+    - From: Parallel node
+    - To: Multiple target nodes
+    - Execution: All targets run concurrently
+
+ ## Graph Patterns
+
+ ### Iterative Research Graph
+
+ ```
+ [Input] → [Thinking] → [Knowledge Gap] → [Decision: Complete?]
+                                            ↓ No          ↓ Yes
+                                     [Tool Selector]    [Writer]
+                                            ↓
+                                    [Execute Tools] → [Loop Back]
+ ```
+
+ ### Deep Research Graph
+
+ ```
+ [Input] → [Planner] → [Parallel Iterative Loops] → [Synthesizer]
+                           ↓         ↓         ↓
+                        [Loop1]   [Loop2]   [Loop3]
+ ```
+
+ ## State Management
+
+ State is managed via `WorkflowState` using `ContextVar` for thread-safe isolation:
+
+ - **Evidence**: Collected evidence from searches
+ - **Conversation**: Iteration history (gaps, tool calls, findings, thoughts)
+ - **Embedding Service**: For semantic search
+
+ State transitions occur at state nodes, which update the global workflow state.
+
+ ## Execution Flow
+
+ 1. **Graph Construction**: Build graph from nodes and edges
+ 2. **Graph Validation**: Ensure graph is valid (no cycles, all nodes reachable)
+ 3. **Graph Execution**: Traverse graph from entry node
+ 4. **Node Execution**: Execute each node based on type
+ 5. **Edge Evaluation**: Determine next node(s) based on edges
+ 6. **Parallel Execution**: Use `asyncio.gather()` for parallel nodes
+ 7. **State Updates**: Update state at state nodes
+ 8. **Event Streaming**: Yield events during execution for UI
+
+ ## Conditional Routing
+
+ Decision nodes evaluate conditions and return next node IDs:
+
+ - **Knowledge Gap Decision**: If `research_complete` → writer, else → tool selector
+ - **Budget Decision**: If budget exceeded → exit, else → continue
+ - **Iteration Decision**: If max iterations → exit, else → continue
+
+ ## Parallel Execution
+
+ Parallel nodes execute multiple nodes concurrently:
+
+ - Each parallel branch runs independently
+ - Results are aggregated after all branches complete
+ - State is synchronized after parallel execution
+ - Errors in one branch don't stop other branches
+
+ ## Budget Enforcement
+
+ Budget constraints are enforced at decision nodes:
+
+ - **Token Budget**: Track LLM token usage
+ - **Time Budget**: Track elapsed time
+ - **Iteration Budget**: Track iteration count
+
+ If any budget is exceeded, execution routes to the exit node.
+
+ ## Error Handling
+
+ Errors are handled at multiple levels:
+
+ 1. **Node Level**: Catch errors in individual node execution
+ 2. **Graph Level**: Handle errors during graph traversal
+ 3. **State Level**: Rollback state changes on error
+
+ Errors are logged and yield error events for the UI.
+
+ ## Backward Compatibility
+
+ Graph execution is optional via feature flag:
+
+ - `USE_GRAPH_EXECUTION=true`: Use graph-based execution
+ - `USE_GRAPH_EXECUTION=false`: Use agent chain execution (existing)
+
+ This allows gradual migration and fallback if needed.
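The node/edge and parallel-execution ideas in this document can be sketched in a few lines of Python. The names below (`Node`, `Edge`, `run_parallel`) are hypothetical; the commit's real implementation lives in `src/agent_factory/graph_builder.py` and `src/orchestrator/graph_orchestrator.py`, but the sketch shows the two mechanisms the document relies on: a conditional edge as a predicate over state, and `asyncio.gather()` with error isolation for parallel nodes.

```python
# Minimal sketch, not the committed implementation.
import asyncio
from collections.abc import Awaitable, Callable
from dataclasses import dataclass


@dataclass
class Node:
    node_id: str
    run: Callable[[dict], Awaitable[dict]]  # agent/state/decision behavior


@dataclass
class Edge:
    source: str
    target: str
    # Conditional edge: traversed only when the predicate returns True.
    # A sequential edge is simply the default always-True condition.
    condition: Callable[[dict], bool] = lambda state: True


def next_targets(edges: list[Edge], source: str, state: dict) -> list[str]:
    """Edge evaluation: pick the targets whose conditions hold."""
    return [e.target for e in edges if e.source == source and e.condition(state)]


async def run_parallel(nodes: list[Node], state: dict) -> list[dict]:
    """Parallel node: run branches concurrently and aggregate results.

    return_exceptions=True keeps one failing branch from cancelling the
    others, matching the error-isolation behavior described above.
    """
    results = await asyncio.gather(
        *(n.run(dict(state)) for n in nodes), return_exceptions=True
    )
    return [r for r in results if not isinstance(r, BaseException)]
```

A "Decision: Complete?" node then reduces to an edge pair whose conditions are `lambda s: s["research_complete"]` and its negation.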
docs/examples/writer_agents_usage.md ADDED
@@ -0,0 +1,415 @@
+ # Writer Agents Usage Examples
+
+ This document provides examples of how to use the writer agents in DeepCritical for generating research reports.
+
+ ## Overview
+
+ DeepCritical provides three writer agents for different report generation scenarios:
+
+ 1. **WriterAgent** - Basic writer for simple reports from findings
+ 2. **LongWriterAgent** - Iterative writer for long-form multi-section reports
+ 3. **ProofreaderAgent** - Finalizes and polishes report drafts
+
+ ## WriterAgent
+
+ The `WriterAgent` generates final reports from research findings. It's used in iterative research flows.
+
+ ### Basic Usage
+
+ ```python
+ from src.agent_factory.agents import create_writer_agent
+
+ # Create writer agent
+ writer = create_writer_agent()
+
+ # Generate report
+ query = "What is the capital of France?"
+ findings = """
+ Paris is the capital of France [1].
+ It is located in the north-central part of the country [2].
+
+ [1] https://example.com/france-info
+ [2] https://example.com/paris-info
+ """
+
+ report = await writer.write_report(
+     query=query,
+     findings=findings,
+ )
+
+ print(report)
+ ```
+
+ ### With Output Length Specification
+
+ ```python
+ report = await writer.write_report(
+     query="Explain machine learning",
+     findings=findings,
+     output_length="500 words",
+ )
+ ```
+
+ ### With Additional Instructions
+
+ ```python
+ report = await writer.write_report(
+     query="Explain machine learning",
+     findings=findings,
+     output_length="A comprehensive overview",
+     output_instructions="Use formal academic language and include examples",
+ )
+ ```
+
+ ### Integration with IterativeResearchFlow
+
+ The `WriterAgent` is automatically used by `IterativeResearchFlow`:
+
+ ```python
+ from src.agent_factory.agents import create_iterative_flow
+
+ flow = create_iterative_flow(max_iterations=5, max_time_minutes=10)
+ report = await flow.run(
+     query="What is quantum computing?",
+     output_length="A detailed explanation",
+     output_instructions="Include practical applications",
+ )
+ ```
+
+ ## LongWriterAgent
+
+ The `LongWriterAgent` iteratively writes report sections with proper citation management. It's used in deep research flows.
+
+ ### Basic Usage
+
+ ```python
+ from src.agent_factory.agents import create_long_writer_agent
+ from src.utils.models import ReportDraft, ReportDraftSection
+
+ # Create long writer agent
+ long_writer = create_long_writer_agent()
+
+ # Create report draft with sections
+ report_draft = ReportDraft(
+     sections=[
+         ReportDraftSection(
+             section_title="Introduction",
+             section_content="Draft content for introduction with [1].",
+         ),
+         ReportDraftSection(
+             section_title="Methods",
+             section_content="Draft content for methods with [2].",
+         ),
+         ReportDraftSection(
+             section_title="Results",
+             section_content="Draft content for results with [3].",
+         ),
+     ]
+ )
+
+ # Generate full report
+ report = await long_writer.write_report(
+     original_query="What are the main features of Python?",
+     report_title="Python Programming Language Overview",
+     report_draft=report_draft,
+ )
+
+ print(report)
+ ```
+
+ ### Writing Individual Sections
+
+ You can also write sections one at a time:
+
+ ```python
+ # Write first section
+ section_output = await long_writer.write_next_section(
+     original_query="What is Python?",
+     report_draft="",  # No existing draft
+     next_section_title="Introduction",
+     next_section_draft="Python is a programming language...",
+ )
+
+ print(section_output.next_section_markdown)
+ print(section_output.references)
+
+ # Write second section with existing draft
+ section_output = await long_writer.write_next_section(
+     original_query="What is Python?",
+     report_draft="# Report\n\n## Introduction\n\nContent...",
+     next_section_title="Features",
+     next_section_draft="Python features include...",
+ )
+ ```
+
+ ### Integration with DeepResearchFlow
+
+ The `LongWriterAgent` is automatically used by `DeepResearchFlow`:
+
+ ```python
+ from src.agent_factory.agents import create_deep_flow
+
+ flow = create_deep_flow(
+     max_iterations=5,
+     max_time_minutes=10,
+     use_long_writer=True,  # Use long writer (default)
+ )
+
+ report = await flow.run("What are the main features of Python programming language?")
+ ```
+
+ ## ProofreaderAgent
+
+ The `ProofreaderAgent` finalizes and polishes report drafts by removing duplicates, adding summaries, and refining wording.
+
+ ### Basic Usage
+
+ ```python
+ from src.agent_factory.agents import create_proofreader_agent
+ from src.utils.models import ReportDraft, ReportDraftSection
+
+ # Create proofreader agent
+ proofreader = create_proofreader_agent()
+
+ # Create report draft
+ report_draft = ReportDraft(
+     sections=[
+         ReportDraftSection(
+             section_title="Introduction",
+             section_content="Python is a programming language [1].",
+         ),
+         ReportDraftSection(
+             section_title="Features",
+             section_content="Python has many features [2].",
+         ),
+     ]
+ )
+
+ # Proofread and finalize
+ final_report = await proofreader.proofread(
+     query="What is Python?",
+     report_draft=report_draft,
+ )
+
+ print(final_report)
+ ```
+
+ ### Integration with DeepResearchFlow
+
+ Use `ProofreaderAgent` instead of `LongWriterAgent`:
+
+ ```python
+ from src.agent_factory.agents import create_deep_flow
+
+ flow = create_deep_flow(
+     max_iterations=5,
+     max_time_minutes=10,
+     use_long_writer=False,  # Use proofreader instead
+ )
+
+ report = await flow.run("What are the main features of Python?")
+ ```
+
+ ## Error Handling
+
+ All writer agents include robust error handling:
+
+ ### Handling Empty Inputs
+
+ ```python
+ # WriterAgent handles empty findings gracefully
+ report = await writer.write_report(
+     query="Test query",
+     findings="",  # Empty findings
+ )
+ # Returns a fallback report
+
+ # LongWriterAgent handles empty sections
+ report = await long_writer.write_report(
+     original_query="Test",
+     report_title="Test Report",
+     report_draft=ReportDraft(sections=[]),  # Empty draft
+ )
+ # Returns minimal report
+
+ # ProofreaderAgent handles empty drafts
+ report = await proofreader.proofread(
+     query="Test",
+     report_draft=ReportDraft(sections=[]),
+ )
+ # Returns minimal report
+ ```
+
+ ### Retry Logic
+
+ All agents automatically retry on transient errors (timeouts, connection errors):
+
+ ```python
+ # Automatically retries up to 3 times on transient failures
+ report = await writer.write_report(
+     query="Test query",
+     findings=findings,
+ )
+ ```
+
+ ### Fallback Reports
+
+ If all retries fail, agents return fallback reports:
+
+ ```python
+ # Returns fallback report with query and findings
+ report = await writer.write_report(
+     query="Test query",
+     findings=findings,
+ )
+ # Fallback includes: "# Research Report\n\n## Query\n...\n\n## Findings\n..."
+ ```
+
+ ## Citation Validation
+
+ ### For Markdown Reports
+
+ Use the markdown citation validator:
+
+ ```python
+ from src.utils.citation_validator import validate_markdown_citations
+ from src.utils.models import Evidence, Citation
+
+ # Collect evidence during research
+ evidence = [
+     Evidence(
+         content="Paris is the capital of France",
+         citation=Citation(
+             source="web",
+             title="France Information",
+             url="https://example.com/france",
+             date="2024-01-01",
+         ),
+     ),
+ ]
+
+ # Generate report
+ report = await writer.write_report(query="What is the capital of France?", findings=findings)
+
+ # Validate citations
+ validated_report, removed_count = validate_markdown_citations(report, evidence)
+
+ if removed_count > 0:
+     print(f"Removed {removed_count} invalid citations")
+ ```
+
+ ### For ResearchReport Objects
+
+ Use the structured citation validator:
+
+ ```python
+ from src.utils.citation_validator import validate_references
+
+ # For ResearchReport objects (from ReportAgent)
+ validated_report = validate_references(report, evidence)
+ ```
+
+ ## Custom Model Configuration
+
+ All writer agents support custom model configuration:
+
+ ```python
+ from pydantic_ai import Model
+
+ # Create custom model
+ custom_model = Model("openai", "gpt-4")
+
+ # Use with writer agents
+ writer = create_writer_agent(model=custom_model)
+ long_writer = create_long_writer_agent(model=custom_model)
+ proofreader = create_proofreader_agent(model=custom_model)
+ ```
+
+ ## Best Practices
+
+ 1. **Use WriterAgent for simple reports** - When you have findings as a string and need a quick report
+ 2. **Use LongWriterAgent for structured reports** - When you need multiple sections with proper citation management
+ 3. **Use ProofreaderAgent for final polish** - When you have draft sections and need a polished final report
+ 4. **Validate citations** - Always validate citations against collected evidence
+ 5. **Handle errors gracefully** - All agents return fallback reports on failure
+ 6. **Specify output length** - Use `output_length` parameter to control report size
+ 7. **Provide instructions** - Use `output_instructions` for specific formatting requirements
+
+ ## Integration Examples
+
+ ### Full Iterative Research Flow
+
+ ```python
+ from src.agent_factory.agents import create_iterative_flow
+
+ flow = create_iterative_flow(
+     max_iterations=5,
+     max_time_minutes=10,
+ )
+
+ report = await flow.run(
+     query="What is machine learning?",
+     output_length="A comprehensive 1000-word explanation",
+     output_instructions="Include practical examples and use cases",
+ )
+ ```
+
+ ### Full Deep Research Flow with Long Writer
+
+ ```python
+ from src.agent_factory.agents import create_deep_flow
+
+ flow = create_deep_flow(
+     max_iterations=5,
+     max_time_minutes=10,
+     use_long_writer=True,
+ )
+
+ report = await flow.run("What are the main features of Python programming language?")
+ ```
+
+ ### Full Deep Research Flow with Proofreader
+
+ ```python
+ from src.agent_factory.agents import create_deep_flow
+
+ flow = create_deep_flow(
+     max_iterations=5,
+     max_time_minutes=10,
+     use_long_writer=False,  # Use proofreader
+ )
+
+ report = await flow.run("Explain quantum computing basics")
+ ```
+
+ ## Troubleshooting
+
+ ### Empty Reports
+
+ If you get empty reports, check:
+ - Input validation logs (agents log warnings for empty inputs)
+ - LLM API key configuration
+ - Network connectivity
+
+ ### Citation Issues
+
+ If citations are missing or invalid:
+ - Use `validate_markdown_citations()` to check citations
+ - Ensure Evidence objects are properly collected during research
+ - Check that URLs in findings match Evidence URLs
+
+ ### Performance Issues
+
+ For large reports:
+ - Use `LongWriterAgent` for better section management
+ - Consider truncating very long findings (agents do this automatically)
+ - Use appropriate `max_time_minutes` settings
+
+ ## See Also
+
+ - [Research Flows Documentation](../orchestrator/research_flows.md)
+ - [Citation Validation](../utils/citation_validation.md)
+ - [Agent Factory](../agent_factory/agents.md)
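The document describes `validate_markdown_citations()` as checking report citations against collected `Evidence` and removing the invalid ones. A simplified sketch of that idea follows; the function name and regex here are hypothetical stand-ins, not the logic of the committed `src/utils/citation_validator.py`, and they assume a reference-list format like `[1] https://example.com/france-info` as used in the examples above.

```python
# Simplified, hypothetical sketch of citation validation against Evidence.
import re

from src.utils.models import Evidence


def strip_unbacked_citations(report: str, evidence: list[Evidence]) -> tuple[str, int]:
    """Drop reference-list entries whose URL is not backed by collected Evidence."""
    valid_urls = {e.citation.url for e in evidence}
    removed = 0

    def check(match: re.Match[str]) -> str:
        nonlocal removed
        if match.group("url") in valid_urls:
            return match.group(0)
        removed += 1
        return ""  # remove entries that cite URLs we never collected

    # Matches lines like "[1] https://example.com/france-info"
    pattern = re.compile(r"^\[(?P<n>\d+)\]\s+(?P<url>\S+)\s*$", re.MULTILINE)
    return pattern.sub(check, report), removed
```

The real validator presumably also renumbers or removes the matching inline `[n]` markers; the sketch only shows the core URL-membership check.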
docs/implementation/02_phase_search.md CHANGED
@@ -4,6 +4,8 @@
  **Philosophy**: "Real data, mocked connections."
  **Prerequisite**: Phase 1 complete (all tests passing)
 
+ > **⚠️ Implementation Note (2025-01-27)**: The DuckDuckGo WebTool specified in this phase was removed in favor of the Europe PMC tool (see Phase 11). Europe PMC provides better coverage for biomedical research by including preprints, peer-reviewed articles, and patents. The current implementation uses PubMed, ClinicalTrials.gov, and Europe PMC as search sources.
+
  ---
 
  ## 1. The Slice Definition
@@ -12,17 +14,20 @@ This slice covers:
  1. **Input**: A string query (e.g., "metformin Alzheimer's disease").
  2. **Process**:
     - Fetch from PubMed (E-utilities API).
-    - Fetch from Web (DuckDuckGo).
+    - ~~Fetch from Web (DuckDuckGo).~~ **REMOVED** - Replaced by Europe PMC in Phase 11
     - Normalize results into `Evidence` models.
  3. **Output**: A list of `Evidence` objects.
 
  **Files to Create**:
  - `src/utils/models.py` - Pydantic models (Evidence, Citation, SearchResult)
  - `src/tools/pubmed.py` - PubMed E-utilities tool
- - `src/tools/websearch.py` - DuckDuckGo search tool
+ - ~~`src/tools/websearch.py` - DuckDuckGo search tool~~ **REMOVED** - See Phase 11 for Europe PMC replacement
  - `src/tools/search_handler.py` - Orchestrates multiple tools
  - `src/tools/__init__.py` - Exports
 
+ **Additional Files (Post-Phase 2 Enhancements)**:
+ - `src/tools/query_utils.py` - Query preprocessing (removes question words, expands medical synonyms)
+
  ---
 
  ## 2. PubMed E-utilities API Reference
@@ -767,17 +772,23 @@ async def test_pubmed_live_search():
 
  ## 8. Implementation Checklist
 
- - [ ] Create `src/utils/models.py` with all Pydantic models (Evidence, Citation, SearchResult)
- - [ ] Create `src/tools/__init__.py` with SearchTool Protocol and exports
- - [ ] Implement `src/tools/pubmed.py` with PubMedTool class
- - [ ] Implement `src/tools/websearch.py` with WebTool class
- - [ ] Create `src/tools/search_handler.py` with SearchHandler class
- - [ ] Write tests in `tests/unit/tools/test_pubmed.py`
- - [ ] Write tests in `tests/unit/tools/test_websearch.py`
- - [ ] Write tests in `tests/unit/tools/test_search_handler.py`
- - [ ] Run `uv run pytest tests/unit/tools/ -v` — **ALL TESTS MUST PASS**
+ - [x] Create `src/utils/models.py` with all Pydantic models (Evidence, Citation, SearchResult) - **COMPLETE**
+ - [x] Create `src/tools/__init__.py` with SearchTool Protocol and exports - **COMPLETE**
+ - [x] Implement `src/tools/pubmed.py` with PubMedTool class - **COMPLETE**
+ - [ ] ~~Implement `src/tools/websearch.py` with WebTool class~~ - **REMOVED** (replaced by Europe PMC in Phase 11)
+ - [x] Create `src/tools/search_handler.py` with SearchHandler class - **COMPLETE**
+ - [x] Write tests in `tests/unit/tools/test_pubmed.py` - **COMPLETE** (basic tests)
+ - [ ] Write tests in `tests/unit/tools/test_websearch.py` - **N/A** (WebTool removed)
+ - [x] Write tests in `tests/unit/tools/test_search_handler.py` - **COMPLETE** (basic tests)
+ - [x] Run `uv run pytest tests/unit/tools/ -v` — **ALL TESTS MUST PASS** - **PASSING**
  - [ ] (Optional) Run integration test: `uv run pytest -m integration`
- - [ ] Commit: `git commit -m "feat: phase 2 search slice complete"`
+ - [ ] Add edge case tests (rate limiting, error handling, timeouts) - **PENDING**
+ - [ ] Commit: `git commit -m "feat: phase 2 search slice complete"` - **DONE**
+
+ **Post-Phase 2 Enhancements**:
+ - [x] Query preprocessing (`src/tools/query_utils.py`) - **ADDED**
+ - [x] Europe PMC tool (Phase 11) - **ADDED**
+ - [x] ClinicalTrials tool (Phase 10) - **ADDED**
 
  ---
@@ -785,20 +796,19 @@ async def test_pubmed_live_search():
 
  Phase 2 is **COMPLETE** when:
 
- 1. All unit tests pass: `uv run pytest tests/unit/tools/ -v`
- 2. `SearchHandler` can execute with both tools
- 3. Graceful degradation: if PubMed fails, WebTool results still return
- 4. Rate limiting is enforced (verify no 429 errors)
- 5. Can run this in Python REPL:
+ 1. All unit tests pass: `uv run pytest tests/unit/tools/ -v` - **PASSING**
+ 2. `SearchHandler` can execute with search tools - **WORKING**
+ 3. Graceful degradation: if one tool fails, other tools still return results - **IMPLEMENTED**
+ 4. Rate limiting is enforced (verify no 429 errors) - **IMPLEMENTED**
+ 5. Can run this in Python REPL:
 
  ```python
  import asyncio
  from src.tools.pubmed import PubMedTool
- from src.tools.websearch import WebTool
  from src.tools.search_handler import SearchHandler
 
  async def test():
-     handler = SearchHandler([PubMedTool(), WebTool()])
+     handler = SearchHandler([PubMedTool()])
      result = await handler.execute("metformin alzheimer")
      print(f"Found {result.total_found} results")
      for e in result.evidence[:3]:
@@ -807,4 +817,6 @@ async def test():
  asyncio.run(test())
  ```
 
+ **Note**: WebTool was removed in favor of Europe PMC (Phase 11). The current implementation uses PubMed as the primary Phase 2 tool, with Europe PMC and ClinicalTrials added in later phases.
+
  **Proceed to Phase 3 ONLY after all checkboxes are complete.**
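The diff above adds `src/tools/query_utils.py`, described only as "removes question words, expands medical synonyms." A hypothetical sketch of that kind of preprocessing is shown below; the stop-word list and synonym map are illustrative assumptions, not the committed implementation.

```python
# Hypothetical sketch of the query preprocessing query_utils.py is
# described as providing; word lists here are assumptions.
QUESTION_WORDS = {"what", "which", "how", "does", "can", "might", "help"}
SYNONYMS = {"alzheimer": ["alzheimer's disease", "AD"]}


def preprocess_query(query: str) -> str:
    """Strip question words and expand known medical synonyms."""
    terms = [t for t in query.lower().split() if t not in QUESTION_WORDS]
    expanded: list[str] = []
    for term in terms:
        expanded.append(term)
        expanded.extend(SYNONYMS.get(term, []))  # append known synonyms
    return " ".join(expanded)


# e.g. "What drugs might help treat alzheimer" ->
#      "drugs treat alzheimer alzheimer's disease AD"
```

Feeding the expanded string to `SearchHandler.execute()` keeps the natural-language question out of the PubMed term query while broadening recall.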
pyproject.toml CHANGED
@@ -24,6 +24,7 @@ dependencies = [
      "tenacity>=8.2",    # Retry logic
      "structlog>=24.1",  # Structured logging
      "requests>=2.32.5", # ClinicalTrials.gov (httpx blocked by WAF)
+     "pydantic-graph>=1.22.0",
  ]
 
  [project.optional-dependencies]
@@ -91,6 +92,7 @@ ignore = [
      "PLW0603", # Global statement (singleton pattern for Modal)
      "PLC0415", # Lazy imports for optional dependencies
      "E402",    # Module level import not at top (needed for pytest.importorskip)
+     "E501",    # Line too long (ignore line length violations)
      "RUF100",  # Unused noqa (version differences between local/CI)
  ]
@@ -105,9 +107,12 @@ ignore_missing_imports = true
  disallow_untyped_defs = true
  warn_return_any = true
  warn_unused_ignores = false
+ explicit_package_bases = true
+ mypy_path = "."
  exclude = [
      "^reference_repos/",
      "^examples/",
+     "^folder/",
  ]
 
  # ============== PYTEST CONFIG ==============
@@ -137,5 +142,11 @@ exclude_lines = [
      "raise NotImplementedError",
  ]
 
+ [dependency-groups]
+ dev = [
+     "structlog>=25.5.0",
+     "ty>=0.0.1a28",
+ ]
+
  # Note: agent-framework-core is optional for magentic mode (multi-agent orchestration)
  # Version pinned to 1.0.0b* to avoid breaking changes. CI skips tests via pytest.importorskip
src/agent_factory/agents.py CHANGED
@@ -0,0 +1,339 @@
+ """Agent factory functions for creating research agents.
+
+ Provides factory functions for creating all Pydantic AI agents used in
+ the research workflows, following the pattern from judges.py.
+ """
+
+ from typing import TYPE_CHECKING, Any
+
+ import structlog
+
+ from src.utils.config import settings
+ from src.utils.exceptions import ConfigurationError
+
+ if TYPE_CHECKING:
+     from src.agent_factory.graph_builder import GraphBuilder
+     from src.agents.input_parser import InputParserAgent
+     from src.agents.knowledge_gap import KnowledgeGapAgent
+     from src.agents.long_writer import LongWriterAgent
+     from src.agents.proofreader import ProofreaderAgent
+     from src.agents.thinking import ThinkingAgent
+     from src.agents.tool_selector import ToolSelectorAgent
+     from src.agents.writer import WriterAgent
+     from src.orchestrator.graph_orchestrator import GraphOrchestrator
+     from src.orchestrator.planner_agent import PlannerAgent
+     from src.orchestrator.research_flow import DeepResearchFlow, IterativeResearchFlow
+
+ logger = structlog.get_logger()
+
+
+ def create_input_parser_agent(model: Any | None = None) -> "InputParserAgent":
+     """
+     Create input parser agent for query analysis and research mode detection.
+
+     Args:
+         model: Optional Pydantic AI model. If None, uses settings default.
+
+     Returns:
+         Configured InputParserAgent instance
+
+     Raises:
+         ConfigurationError: If required API keys are missing
+     """
+     from src.agents.input_parser import create_input_parser_agent as _create_agent
+
+     try:
+         logger.debug("Creating input parser agent")
+         return _create_agent(model=model)
+     except Exception as e:
+         logger.error("Failed to create input parser agent", error=str(e))
+         raise ConfigurationError(f"Failed to create input parser agent: {e}") from e
+
+
+ def create_planner_agent(model: Any | None = None) -> "PlannerAgent":
+     """
+     Create planner agent with web search and crawl tools.
+
+     Args:
+         model: Optional Pydantic AI model. If None, uses settings default.
+
+     Returns:
+         Configured PlannerAgent instance
+
+     Raises:
+         ConfigurationError: If required API keys are missing
+     """
+     # Lazy import to avoid circular dependencies
+     from src.orchestrator.planner_agent import create_planner_agent as _create_planner_agent
+
+     try:
+         logger.debug("Creating planner agent")
+         return _create_planner_agent(model=model)
+     except Exception as e:
+         logger.error("Failed to create planner agent", error=str(e))
+         raise ConfigurationError(f"Failed to create planner agent: {e}") from e
+
+
+ def create_knowledge_gap_agent(model: Any | None = None) -> "KnowledgeGapAgent":
+     """
+     Create knowledge gap agent for evaluating research completeness.
+
+     Args:
+         model: Optional Pydantic AI model. If None, uses settings default.
+
+     Returns:
+         Configured KnowledgeGapAgent instance
+
+     Raises:
+         ConfigurationError: If required API keys are missing
+     """
+     from src.agents.knowledge_gap import create_knowledge_gap_agent as _create_agent
+
+     try:
+         logger.debug("Creating knowledge gap agent")
+         return _create_agent(model=model)
+     except Exception as e:
+         logger.error("Failed to create knowledge gap agent", error=str(e))
+         raise ConfigurationError(f"Failed to create knowledge gap agent: {e}") from e
+
+
+ def create_tool_selector_agent(model: Any | None = None) -> "ToolSelectorAgent":
+     """
+     Create tool selector agent for choosing tools to address gaps.
+
+     Args:
+         model: Optional Pydantic AI model. If None, uses settings default.
+
+     Returns:
+         Configured ToolSelectorAgent instance
+
+     Raises:
+         ConfigurationError: If required API keys are missing
+     """
+     from src.agents.tool_selector import create_tool_selector_agent as _create_agent
+
+     try:
+         logger.debug("Creating tool selector agent")
+         return _create_agent(model=model)
+     except Exception as e:
+         logger.error("Failed to create tool selector agent", error=str(e))
+         raise ConfigurationError(f"Failed to create tool selector agent: {e}") from e
+
+
+ def create_thinking_agent(model: Any | None = None) -> "ThinkingAgent":
+     """
+     Create thinking agent for generating observations.
+
+     Args:
+         model: Optional Pydantic AI model. If None, uses settings default.
+
+     Returns:
+         Configured ThinkingAgent instance
+
+     Raises:
+         ConfigurationError: If required API keys are missing
+     """
+     from src.agents.thinking import create_thinking_agent as _create_agent
+
+     try:
+         logger.debug("Creating thinking agent")
+         return _create_agent(model=model)
+     except Exception as e:
+         logger.error("Failed to create thinking agent", error=str(e))
143
+ raise ConfigurationError(f"Failed to create thinking agent: {e}") from e
144
+
145
+
146
+ def create_writer_agent(model: Any | None = None) -> "WriterAgent":
147
+ """
148
+ Create writer agent for generating final reports.
149
+
150
+ Args:
151
+ model: Optional Pydantic AI model. If None, uses settings default.
152
+
153
+ Returns:
154
+ Configured WriterAgent instance
155
+
156
+ Raises:
157
+ ConfigurationError: If required API keys are missing
158
+ """
159
+ from src.agents.writer import create_writer_agent as _create_agent
160
+
161
+ try:
162
+ logger.debug("Creating writer agent")
163
+ return _create_agent(model=model)
164
+ except Exception as e:
165
+ logger.error("Failed to create writer agent", error=str(e))
166
+ raise ConfigurationError(f"Failed to create writer agent: {e}") from e
167
+
168
+
169
+ def create_long_writer_agent(model: Any | None = None) -> "LongWriterAgent":
170
+ """
171
+ Create long writer agent for iteratively writing report sections.
172
+
173
+ Args:
174
+ model: Optional Pydantic AI model. If None, uses settings default.
175
+
176
+ Returns:
177
+ Configured LongWriterAgent instance
178
+
179
+ Raises:
180
+ ConfigurationError: If required API keys are missing
181
+ """
182
+ from src.agents.long_writer import create_long_writer_agent as _create_agent
183
+
184
+ try:
185
+ logger.debug("Creating long writer agent")
186
+ return _create_agent(model=model)
187
+ except Exception as e:
188
+ logger.error("Failed to create long writer agent", error=str(e))
189
+ raise ConfigurationError(f"Failed to create long writer agent: {e}") from e
190
+
191
+
192
+ def create_proofreader_agent(model: Any | None = None) -> "ProofreaderAgent":
193
+ """
194
+ Create proofreader agent for finalizing report drafts.
195
+
196
+ Args:
197
+ model: Optional Pydantic AI model. If None, uses settings default.
198
+
199
+ Returns:
200
+ Configured ProofreaderAgent instance
201
+
202
+ Raises:
203
+ ConfigurationError: If required API keys are missing
204
+ """
205
+ from src.agents.proofreader import create_proofreader_agent as _create_agent
206
+
207
+ try:
208
+ logger.debug("Creating proofreader agent")
209
+ return _create_agent(model=model)
210
+ except Exception as e:
211
+ logger.error("Failed to create proofreader agent", error=str(e))
212
+ raise ConfigurationError(f"Failed to create proofreader agent: {e}") from e
213
+
214
+
215
+ def create_iterative_flow(
216
+ max_iterations: int = 5,
217
+ max_time_minutes: int = 10,
218
+ verbose: bool = True,
219
+ use_graph: bool | None = None,
220
+ ) -> "IterativeResearchFlow":
221
+ """
222
+ Create iterative research flow.
223
+
224
+ Args:
225
+ max_iterations: Maximum number of iterations
226
+ max_time_minutes: Maximum time in minutes
227
+ verbose: Whether to log progress
228
+ use_graph: Whether to use graph execution. If None, reads from settings.use_graph_execution
229
+
230
+ Returns:
231
+ Configured IterativeResearchFlow instance
232
+ """
233
+ from src.orchestrator.research_flow import IterativeResearchFlow
234
+
235
+ try:
236
+ # Use settings default if not explicitly provided
237
+ if use_graph is None:
238
+ use_graph = settings.use_graph_execution
239
+
240
+ logger.debug("Creating iterative research flow", use_graph=use_graph)
241
+ return IterativeResearchFlow(
242
+ max_iterations=max_iterations,
243
+ max_time_minutes=max_time_minutes,
244
+ verbose=verbose,
245
+ use_graph=use_graph,
246
+ )
247
+ except Exception as e:
248
+ logger.error("Failed to create iterative flow", error=str(e))
249
+ raise ConfigurationError(f"Failed to create iterative flow: {e}") from e
250
+
251
+
252
+ def create_deep_flow(
253
+ max_iterations: int = 5,
254
+ max_time_minutes: int = 10,
255
+ verbose: bool = True,
256
+ use_long_writer: bool = True,
257
+ use_graph: bool | None = None,
258
+ ) -> "DeepResearchFlow":
259
+ """
260
+ Create deep research flow.
261
+
262
+ Args:
263
+ max_iterations: Maximum iterations per section
264
+ max_time_minutes: Maximum time per section
265
+ verbose: Whether to log progress
266
+ use_long_writer: Whether to use long writer (True) or proofreader (False)
267
+ use_graph: Whether to use graph execution. If None, reads from settings.use_graph_execution
268
+
269
+ Returns:
270
+ Configured DeepResearchFlow instance
271
+ """
272
+ from src.orchestrator.research_flow import DeepResearchFlow
273
+
274
+ try:
275
+ # Use settings default if not explicitly provided
276
+ if use_graph is None:
277
+ use_graph = settings.use_graph_execution
278
+
279
+ logger.debug("Creating deep research flow", use_graph=use_graph)
280
+ return DeepResearchFlow(
281
+ max_iterations=max_iterations,
282
+ max_time_minutes=max_time_minutes,
283
+ verbose=verbose,
284
+ use_long_writer=use_long_writer,
285
+ use_graph=use_graph,
286
+ )
287
+ except Exception as e:
288
+ logger.error("Failed to create deep flow", error=str(e))
289
+ raise ConfigurationError(f"Failed to create deep flow: {e}") from e
290
+
291
+
292
+ def create_graph_orchestrator(
293
+ mode: str = "auto",
294
+ max_iterations: int = 5,
295
+ max_time_minutes: int = 10,
296
+ use_graph: bool = True,
297
+ ) -> "GraphOrchestrator":
298
+ """
299
+ Create graph orchestrator.
300
+
301
+ Args:
302
+ mode: Research mode ("iterative", "deep", or "auto")
303
+ max_iterations: Maximum iterations per loop
304
+ max_time_minutes: Maximum time per loop
305
+ use_graph: Whether to use graph execution (True) or agent chains (False)
306
+
307
+ Returns:
308
+ Configured GraphOrchestrator instance
309
+ """
310
+ from src.orchestrator.graph_orchestrator import create_graph_orchestrator as _create
311
+
312
+ try:
313
+ logger.debug("Creating graph orchestrator", mode=mode, use_graph=use_graph)
314
+ return _create(
315
+ mode=mode, # type: ignore[arg-type]
316
+ max_iterations=max_iterations,
317
+ max_time_minutes=max_time_minutes,
318
+ use_graph=use_graph,
319
+ )
320
+ except Exception as e:
321
+ logger.error("Failed to create graph orchestrator", error=str(e))
322
+ raise ConfigurationError(f"Failed to create graph orchestrator: {e}") from e
323
+
324
+
325
+ def create_graph_builder() -> "GraphBuilder":
326
+ """
327
+ Create a graph builder instance.
328
+
329
+ Returns:
330
+ GraphBuilder instance
331
+ """
332
+ from src.agent_factory.graph_builder import GraphBuilder
333
+
334
+ try:
335
+ logger.debug("Creating graph builder")
336
+ return GraphBuilder()
337
+ except Exception as e:
338
+ logger.error("Failed to create graph builder", error=str(e))
339
+ raise ConfigurationError(f"Failed to create graph builder: {e}") from e
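A minimal usage sketch for the factories above (illustrative, not part of the diff). It assumes model credentials are configured behind `get_model()` and that `IterativeResearchFlow` exposes an async `run(query)` method; the flow class itself is defined in `src/orchestrator/research_flow.py` later in this commit, so the method name is an assumption here.

```python
import asyncio

from src.agent_factory.agents import create_iterative_flow


async def main() -> None:
    # use_graph=None defers to settings.use_graph_execution
    flow = create_iterative_flow(max_iterations=3, max_time_minutes=5, verbose=True)
    # run() is assumed from the flow's role; see research_flow.py in this commit
    report = await flow.run("What existing drugs might help treat long COVID fatigue?")
    print(report)


asyncio.run(main())
```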
src/agent_factory/graph_builder.py ADDED
@@ -0,0 +1,608 @@
1
+ """Graph builder utilities for constructing research workflow graphs.
2
+
3
+ Provides classes and utilities for building graph-based orchestration systems
4
+ using Pydantic AI agents as nodes.
5
+ """
6
+
7
+ from collections.abc import Callable
8
+ from typing import TYPE_CHECKING, Any, Literal
9
+
10
+ import structlog
11
+ from pydantic import BaseModel, Field
12
+
13
+ if TYPE_CHECKING:
14
+ from pydantic_ai import Agent
15
+
16
+ from src.middleware.state_machine import WorkflowState
17
+
18
+ logger = structlog.get_logger()
19
+
20
+
21
+ # ============================================================================
22
+ # Graph Node Models
23
+ # ============================================================================
24
+
25
+
26
+ class GraphNode(BaseModel):
27
+ """Base class for graph nodes."""
28
+
29
+ node_id: str = Field(description="Unique identifier for the node")
30
+ node_type: Literal["agent", "state", "decision", "parallel"] = Field(description="Type of node")
31
+ description: str = Field(default="", description="Human-readable description of the node")
32
+
33
+ model_config = {"frozen": True}
34
+
35
+
36
+ class AgentNode(GraphNode):
37
+ """Node that executes a Pydantic AI agent."""
38
+
39
+ node_type: Literal["agent"] = "agent"
40
+ agent: Any = Field(description="Pydantic AI agent to execute")
41
+ input_transformer: Callable[[Any], Any] | None = Field(
42
+ default=None, description="Transform input before passing to agent"
43
+ )
44
+ output_transformer: Callable[[Any], Any] | None = Field(
45
+ default=None, description="Transform output after agent execution"
46
+ )
47
+
48
+ model_config = {"arbitrary_types_allowed": True}
49
+
50
+
51
+ class StateNode(GraphNode):
52
+ """Node that updates or reads workflow state."""
53
+
54
+ node_type: Literal["state"] = "state"
55
+ state_updater: Callable[[Any, Any], Any] = Field(
56
+ description="Function to update workflow state"
57
+ )
58
+ state_reader: Callable[[Any], Any] | None = Field(
59
+ default=None, description="Function to read state (optional)"
60
+ )
61
+
62
+ model_config = {"arbitrary_types_allowed": True}
63
+
64
+
65
+ class DecisionNode(GraphNode):
66
+ """Node that makes routing decisions based on conditions."""
67
+
68
+ node_type: Literal["decision"] = "decision"
69
+ decision_function: Callable[[Any], str] = Field(
70
+ description="Function that returns next node ID based on input"
71
+ )
72
+ options: list[str] = Field(description="List of possible next node IDs", min_length=1)
73
+
74
+ model_config = {"arbitrary_types_allowed": True}
75
+
76
+
77
+ class ParallelNode(GraphNode):
78
+ """Node that executes multiple nodes in parallel."""
79
+
80
+ node_type: Literal["parallel"] = "parallel"
81
+ parallel_nodes: list[str] = Field(
82
+ description="List of node IDs to run in parallel", min_length=1
83
+ )
84
+ aggregator: Callable[[list[Any]], Any] | None = Field(
85
+ default=None, description="Function to aggregate parallel results"
86
+ )
87
+
88
+ model_config = {"arbitrary_types_allowed": True}
89
+
90
+
91
+ # ============================================================================
92
+ # Graph Edge Models
93
+ # ============================================================================
94
+
95
+
96
+ class GraphEdge(BaseModel):
97
+ """Base class for graph edges."""
98
+
99
+ from_node: str = Field(description="Source node ID")
100
+ to_node: str = Field(description="Target node ID")
101
+ condition: Callable[[Any], bool] | None = Field(
102
+ default=None, description="Optional condition function"
103
+ )
104
+ weight: float = Field(default=1.0, description="Edge weight for routing decisions")
105
+
106
+ model_config = {"arbitrary_types_allowed": True}
107
+
108
+
109
+ class SequentialEdge(GraphEdge):
110
+ """Edge that is always traversed (no condition)."""
111
+
112
+ condition: None = None
113
+
114
+
115
+ class ConditionalEdge(GraphEdge):
116
+ """Edge that is traversed based on a condition."""
117
+
118
+ condition: Callable[[Any], bool] = Field(description="Required condition function")
119
+ condition_description: str = Field(
120
+ default="", description="Human-readable description of condition"
121
+ )
122
+
123
+
124
+ class ParallelEdge(GraphEdge):
125
+ """Edge used for parallel execution branches."""
126
+
127
+ condition: None = None
128
+
129
+
130
+ # ============================================================================
131
+ # Research Graph Class
132
+ # ============================================================================
133
+
134
+
135
+ class ResearchGraph(BaseModel):
136
+ """Represents a research workflow graph with nodes and edges."""
137
+
138
+ nodes: dict[str, GraphNode] = Field(default_factory=dict, description="All nodes in the graph")
139
+ edges: dict[str, list[GraphEdge]] = Field(
140
+ default_factory=dict, description="Edges by source node ID"
141
+ )
142
+ entry_node: str = Field(description="Starting node ID")
143
+ exit_nodes: list[str] = Field(default_factory=list, description="Terminal node IDs")
144
+
145
+ model_config = {"arbitrary_types_allowed": True}
146
+
147
+ def add_node(self, node: GraphNode) -> None:
148
+ """Add a node to the graph.
149
+
150
+ Args:
151
+ node: The node to add
152
+
153
+ Raises:
154
+ ValueError: If node ID already exists
155
+ """
156
+ if node.node_id in self.nodes:
157
+ raise ValueError(f"Node {node.node_id} already exists in graph")
158
+ self.nodes[node.node_id] = node
159
+ logger.debug("Node added to graph", node_id=node.node_id, type=node.node_type)
160
+
161
+ def add_edge(self, edge: GraphEdge) -> None:
162
+ """Add an edge to the graph.
163
+
164
+ Args:
165
+ edge: The edge to add
166
+
167
+ Raises:
168
+ ValueError: If source or target node doesn't exist
169
+ """
170
+ if edge.from_node not in self.nodes:
171
+ raise ValueError(f"Source node {edge.from_node} not found in graph")
172
+ if edge.to_node not in self.nodes:
173
+ raise ValueError(f"Target node {edge.to_node} not found in graph")
174
+
175
+ if edge.from_node not in self.edges:
176
+ self.edges[edge.from_node] = []
177
+ self.edges[edge.from_node].append(edge)
178
+ logger.debug(
179
+ "Edge added to graph",
180
+ from_node=edge.from_node,
181
+ to_node=edge.to_node,
182
+ )
183
+
184
+ def get_node(self, node_id: str) -> GraphNode | None:
185
+ """Get a node by ID.
186
+
187
+ Args:
188
+ node_id: The node ID
189
+
190
+ Returns:
191
+ The node, or None if not found
192
+ """
193
+ return self.nodes.get(node_id)
194
+
195
+ def get_next_nodes(self, node_id: str, context: Any = None) -> list[tuple[str, GraphEdge]]:
196
+ """Get all possible next nodes from a given node.
197
+
198
+ Args:
199
+ node_id: The current node ID
200
+ context: Optional context for evaluating conditions
201
+
202
+ Returns:
203
+ List of (node_id, edge) tuples for valid next nodes
204
+ """
205
+ if node_id not in self.edges:
206
+ return []
207
+
208
+ next_nodes = []
209
+ for edge in self.edges[node_id]:
210
+ # Evaluate condition if present
211
+ if edge.condition is None or edge.condition(context):
212
+ next_nodes.append((edge.to_node, edge))
213
+
214
+ return next_nodes
215
+
216
+ def validate_structure(self) -> list[str]:
217
+ """Validate the graph structure.
218
+
219
+ Returns:
220
+ List of validation error messages (empty if valid)
221
+ """
222
+ errors = []
223
+
224
+ # Check entry node exists
225
+ if self.entry_node not in self.nodes:
226
+ errors.append(f"Entry node {self.entry_node} not found in graph")
227
+
228
+ # Check exit nodes exist and at least one is defined
229
+ if not self.exit_nodes:
230
+ errors.append("At least one exit node must be defined")
231
+ for exit_node in self.exit_nodes:
232
+ if exit_node not in self.nodes:
233
+ errors.append(f"Exit node {exit_node} not found in graph")
234
+
235
+ # Check all edges reference valid nodes
236
+ for from_node, edge_list in self.edges.items():
237
+ if from_node not in self.nodes:
238
+ errors.append(f"Edge source node {from_node} not found")
239
+ for edge in edge_list:
240
+ if edge.to_node not in self.nodes:
241
+ errors.append(f"Edge target node {edge.to_node} not found")
242
+
243
+ # Check all nodes are reachable from entry node (basic check)
244
+ if self.entry_node in self.nodes:
245
+ reachable = {self.entry_node}
246
+ queue = [self.entry_node]
247
+ while queue:
248
+ current = queue.pop(0)
249
+ for next_node, _ in self.get_next_nodes(current):
250
+ if next_node not in reachable:
251
+ reachable.add(next_node)
252
+ queue.append(next_node)
253
+
254
+ unreachable = set(self.nodes.keys()) - reachable
255
+ if unreachable:
256
+ errors.append(f"Unreachable nodes from entry node: {', '.join(unreachable)}")
257
+
258
+ return errors
259
+
260
+
261
+ # ============================================================================
262
+ # Graph Builder Class
263
+ # ============================================================================
264
+
265
+
266
+ class GraphBuilder:
267
+ """Builder for constructing research workflow graphs."""
268
+
269
+ def __init__(self) -> None:
270
+ """Initialize the graph builder."""
271
+ self.graph = ResearchGraph(entry_node="", exit_nodes=[])
272
+
273
+ def add_agent_node(
274
+ self,
275
+ node_id: str,
276
+ agent: "Agent[Any, Any]",
277
+ description: str = "",
278
+ input_transformer: Callable[[Any], Any] | None = None,
279
+ output_transformer: Callable[[Any], Any] | None = None,
280
+ ) -> "GraphBuilder":
281
+ """Add an agent node to the graph.
282
+
283
+ Args:
284
+ node_id: Unique identifier for the node
285
+ agent: Pydantic AI agent to execute
286
+ description: Human-readable description
287
+ input_transformer: Optional input transformation function
288
+ output_transformer: Optional output transformation function
289
+
290
+ Returns:
291
+ Self for method chaining
292
+ """
293
+ node = AgentNode(
294
+ node_id=node_id,
295
+ agent=agent,
296
+ description=description,
297
+ input_transformer=input_transformer,
298
+ output_transformer=output_transformer,
299
+ )
300
+ self.graph.add_node(node)
301
+ return self
302
+
303
+ def add_state_node(
304
+ self,
305
+ node_id: str,
306
+ state_updater: Callable[["WorkflowState", Any], "WorkflowState"],
307
+ description: str = "",
308
+ state_reader: Callable[["WorkflowState"], Any] | None = None,
309
+ ) -> "GraphBuilder":
310
+ """Add a state node to the graph.
311
+
312
+ Args:
313
+ node_id: Unique identifier for the node
314
+ state_updater: Function to update workflow state
315
+ description: Human-readable description
316
+ state_reader: Optional function to read state
317
+
318
+ Returns:
319
+ Self for method chaining
320
+ """
321
+ node = StateNode(
322
+ node_id=node_id,
323
+ state_updater=state_updater,
324
+ description=description,
325
+ state_reader=state_reader,
326
+ )
327
+ self.graph.add_node(node)
328
+ return self
329
+
330
+ def add_decision_node(
331
+ self,
332
+ node_id: str,
333
+ decision_function: Callable[[Any], str],
334
+ options: list[str],
335
+ description: str = "",
336
+ ) -> "GraphBuilder":
337
+ """Add a decision node to the graph.
338
+
339
+ Args:
340
+ node_id: Unique identifier for the node
341
+ decision_function: Function that returns next node ID
342
+ options: List of possible next node IDs
343
+ description: Human-readable description
344
+
345
+ Returns:
346
+ Self for method chaining
347
+ """
348
+ node = DecisionNode(
349
+ node_id=node_id,
350
+ decision_function=decision_function,
351
+ options=options,
352
+ description=description,
353
+ )
354
+ self.graph.add_node(node)
355
+ return self
356
+
357
+ def add_parallel_node(
358
+ self,
359
+ node_id: str,
360
+ parallel_nodes: list[str],
361
+ description: str = "",
362
+ aggregator: Callable[[list[Any]], Any] | None = None,
363
+ ) -> "GraphBuilder":
364
+ """Add a parallel node to the graph.
365
+
366
+ Args:
367
+ node_id: Unique identifier for the node
368
+ parallel_nodes: List of node IDs to run in parallel
369
+ description: Human-readable description
370
+ aggregator: Optional function to aggregate results
371
+
372
+ Returns:
373
+ Self for method chaining
374
+ """
375
+ node = ParallelNode(
376
+ node_id=node_id,
377
+ parallel_nodes=parallel_nodes,
378
+ description=description,
379
+ aggregator=aggregator,
380
+ )
381
+ self.graph.add_node(node)
382
+ return self
383
+
384
+ def connect_nodes(
385
+ self,
386
+ from_node: str,
387
+ to_node: str,
388
+ condition: Callable[[Any], bool] | None = None,
389
+ condition_description: str = "",
390
+ ) -> "GraphBuilder":
391
+ """Connect two nodes with an edge.
392
+
393
+ Args:
394
+ from_node: Source node ID
395
+ to_node: Target node ID
396
+ condition: Optional condition function
397
+ condition_description: Description of condition (if conditional)
398
+
399
+ Returns:
400
+ Self for method chaining
401
+ """
402
+ if condition is None:
403
+ edge: GraphEdge = SequentialEdge(from_node=from_node, to_node=to_node)
404
+ else:
405
+ edge = ConditionalEdge(
406
+ from_node=from_node,
407
+ to_node=to_node,
408
+ condition=condition,
409
+ condition_description=condition_description,
410
+ )
411
+ self.graph.add_edge(edge)
412
+ return self
413
+
414
+ def set_entry_node(self, node_id: str) -> "GraphBuilder":
415
+ """Set the entry node for the graph.
416
+
417
+ Args:
418
+ node_id: The entry node ID
419
+
420
+ Returns:
421
+ Self for method chaining
422
+ """
423
+ self.graph.entry_node = node_id
424
+ return self
425
+
426
+ def set_exit_nodes(self, node_ids: list[str]) -> "GraphBuilder":
427
+ """Set the exit nodes for the graph.
428
+
429
+ Args:
430
+ node_ids: List of exit node IDs
431
+
432
+ Returns:
433
+ Self for method chaining
434
+ """
435
+ self.graph.exit_nodes = node_ids
436
+ return self
437
+
438
+ def build(self) -> ResearchGraph:
439
+ """Finalize graph construction and validate.
440
+
441
+ Returns:
442
+ The constructed ResearchGraph
443
+
444
+ Raises:
445
+ ValueError: If graph validation fails
446
+ """
447
+ errors = self.graph.validate_structure()
448
+ if errors:
449
+ error_msg = "Graph validation failed:\n" + "\n".join(f" - {e}" for e in errors)
450
+ logger.error("Graph validation failed", errors=errors)
451
+ raise ValueError(error_msg)
452
+
453
+ logger.info(
454
+ "Graph built successfully",
455
+ nodes=len(self.graph.nodes),
456
+ edges=sum(len(edges) for edges in self.graph.edges.values()),
457
+ entry_node=self.graph.entry_node,
458
+ exit_nodes=self.graph.exit_nodes,
459
+ )
460
+ return self.graph
461
+
462
+
463
+ # ============================================================================
464
+ # Factory Functions
465
+ # ============================================================================
466
+
467
+
468
+ def create_iterative_graph(
469
+ knowledge_gap_agent: "Agent[Any, Any]",
470
+ tool_selector_agent: "Agent[Any, Any]",
471
+ thinking_agent: "Agent[Any, Any]",
472
+ writer_agent: "Agent[Any, Any]",
473
+ ) -> ResearchGraph:
474
+ """Create a graph for iterative research flow.
475
+
476
+ Args:
477
+ knowledge_gap_agent: Agent for evaluating knowledge gaps
478
+ tool_selector_agent: Agent for selecting tools
479
+ thinking_agent: Agent for generating observations
480
+ writer_agent: Agent for writing final report
481
+
482
+ Returns:
483
+ Constructed ResearchGraph for iterative research
484
+ """
485
+ builder = GraphBuilder()
486
+
487
+ # Add nodes
488
+ builder.add_agent_node("thinking", thinking_agent, "Generate observations")
489
+ builder.add_agent_node("knowledge_gap", knowledge_gap_agent, "Evaluate knowledge gaps")
490
+ builder.add_decision_node(
491
+ "continue_decision",
492
+ decision_function=lambda result: "writer"
493
+ if getattr(result, "research_complete", False)
494
+ else "tool_selector",
495
+ options=["tool_selector", "writer"],
496
+ description="Decide whether to continue research or write report",
497
+ )
498
+ builder.add_agent_node("tool_selector", tool_selector_agent, "Select tools to address gap")
499
+ builder.add_state_node(
500
+ "execute_tools",
501
+ state_updater=lambda state, tasks: state,
502
+ # Placeholder - actual execution handled separately
503
+ description="Execute selected tools",
504
+ )
505
+ builder.add_agent_node("writer", writer_agent, "Write final report")
506
+
507
+ # Add edges
508
+ builder.connect_nodes("thinking", "knowledge_gap")
509
+ builder.connect_nodes("knowledge_gap", "continue_decision")
510
+ builder.connect_nodes("continue_decision", "tool_selector")
511
+ builder.connect_nodes("continue_decision", "writer")
512
+ builder.connect_nodes("tool_selector", "execute_tools")
513
+ builder.connect_nodes("execute_tools", "thinking") # Loop back
514
+
515
+ # Set entry and exit
516
+ builder.set_entry_node("thinking")
517
+ builder.set_exit_nodes(["writer"])
518
+
519
+ return builder.build()
520
+
521
+
522
+ def create_deep_graph(
523
+ planner_agent: "Agent[Any, Any]",
524
+ knowledge_gap_agent: "Agent[Any, Any]",
525
+ tool_selector_agent: "Agent[Any, Any]",
526
+ thinking_agent: "Agent[Any, Any]",
527
+ writer_agent: "Agent[Any, Any]",
528
+ long_writer_agent: "Agent[Any, Any]",
529
+ ) -> ResearchGraph:
530
+ """Create a graph for deep research flow.
531
+
532
+ The graph structure: planner → store_plan → parallel_loops → collect_drafts → synthesizer
533
+
534
+ Args:
535
+ planner_agent: Agent for creating report plan
536
+ knowledge_gap_agent: Agent for evaluating knowledge gaps (not used directly, but needed for iterative flows)
537
+ tool_selector_agent: Agent for selecting tools (not used directly, but needed for iterative flows)
538
+ thinking_agent: Agent for generating observations (not used directly, but needed for iterative flows)
539
+ writer_agent: Agent for writing section reports (not used directly, but needed for iterative flows)
540
+ long_writer_agent: Agent for synthesizing final report
541
+
542
+ Returns:
543
+ Constructed ResearchGraph for deep research
544
+ """
545
+ from src.utils.models import ReportPlan
546
+
547
+ builder = GraphBuilder()
548
+
549
+ # Add nodes
550
+ # 1. Planner agent - creates report plan
551
+ builder.add_agent_node("planner", planner_agent, "Create report plan with sections")
552
+
553
+ # 2. State node - store report plan in workflow state
554
+ def store_plan(state: "WorkflowState", plan: ReportPlan) -> "WorkflowState":
555
+ """Store report plan in state for parallel loops to access."""
556
+ # Store plan in a custom attribute (we'll need to extend WorkflowState or use a dict)
557
+ # For now, we'll store it in the context's node_results
558
+ # The actual storage will happen in the graph execution
559
+ return state
560
+
561
+ builder.add_state_node(
562
+ "store_plan",
563
+ state_updater=store_plan,
564
+ description="Store report plan in state",
565
+ )
566
+
567
+ # 3. Parallel node - will execute iterative research flows for each section
568
+ # The actual execution will be handled dynamically in _execute_parallel_node()
569
+ # We use a special node ID that the executor will recognize
570
+ builder.add_parallel_node(
571
+ "parallel_loops",
572
+ parallel_nodes=[], # Will be populated dynamically based on report plan
573
+ description="Execute parallel iterative research loops for each section",
574
+ aggregator=lambda results: results, # Collect all section drafts
575
+ )
576
+
577
+ # 4. State node - collect section drafts into ReportDraft
578
+ def collect_drafts(state: "WorkflowState", section_drafts: list[str]) -> "WorkflowState":
579
+ """Collect section drafts into state for synthesizer."""
580
+ # Store drafts in state (will be accessed by synthesizer)
581
+ return state
582
+
583
+ builder.add_state_node(
584
+ "collect_drafts",
585
+ state_updater=collect_drafts,
586
+ description="Collect section drafts for synthesis",
587
+ )
588
+
589
+ # 5. Synthesizer agent - creates final report from drafts
590
+ builder.add_agent_node(
591
+ "synthesizer", long_writer_agent, "Synthesize final report from section drafts"
592
+ )
593
+
594
+ # Add edges
595
+ builder.connect_nodes("planner", "store_plan")
596
+ builder.connect_nodes("store_plan", "parallel_loops")
597
+ builder.connect_nodes("parallel_loops", "collect_drafts")
598
+ builder.connect_nodes("collect_drafts", "synthesizer")
599
+
600
+ # Set entry and exit
601
+ builder.set_entry_node("planner")
602
+ builder.set_exit_nodes(["synthesizer"])
603
+
604
+ return builder.build()
605
+
606
+
607
+ # No need to rebuild models since we're using Any types
608
+ # The models will work correctly with arbitrary_types_allowed=True
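To make the builder API concrete, here is a small self-contained sketch of the fluent interface above. It mirrors `create_iterative_graph`: edges leaving a decision node are left unconditional (the node's `decision_function` picks the branch at run time), which also keeps `validate_structure()` happy, since its reachability pass evaluates edge conditions against a `None` context.

```python
from src.agent_factory.graph_builder import GraphBuilder

builder = GraphBuilder()
builder.add_state_node("start", state_updater=lambda state, result: state, description="Seed state")
builder.add_decision_node(
    "route",
    decision_function=lambda result: "done" if result else "start",
    options=["start", "done"],
    description="Loop until a truthy result arrives",
)
builder.add_state_node("done", state_updater=lambda state, result: state, description="Terminal node")

# Unconditional fan-out from the decision node, as in create_iterative_graph
builder.connect_nodes("start", "route")
builder.connect_nodes("route", "done")
builder.connect_nodes("route", "start")

graph = builder.set_entry_node("start").set_exit_nodes(["done"]).build()
print(sorted(graph.nodes))  # ['done', 'route', 'start']
```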
src/agent_factory/judges.py CHANGED
@@ -351,6 +351,15 @@ IMPORTANT: Respond with ONLY valid JSON matching this schema:
351
  )
352
 
353
 
354
+ def create_judge_handler() -> JudgeHandler:
355
+ """Create a judge handler based on configuration.
356
+
357
+ Returns:
358
+ Configured JudgeHandler instance
359
+ """
360
+ return JudgeHandler()
361
+
362
+
363
  class MockJudgeHandler:
364
  """
365
  Mock JudgeHandler for demo mode without LLM calls.
src/agents/input_parser.py ADDED
@@ -0,0 +1,178 @@
1
+ """Input parser agent for analyzing and improving user queries.
2
+
3
+ Determines research mode (iterative vs deep) and extracts key information
4
+ from user queries to improve research quality.
5
+ """
6
+
7
+ from typing import TYPE_CHECKING, Any, Literal
8
+
9
+ import structlog
10
+ from pydantic_ai import Agent
11
+
12
+ from src.agent_factory.judges import get_model
13
+ from src.utils.exceptions import ConfigurationError, JudgeError
14
+ from src.utils.models import ParsedQuery
15
+
16
+ if TYPE_CHECKING:
17
+ pass
18
+
19
+ logger = structlog.get_logger()
20
+
21
+ # System prompt for the input parser agent
22
+ SYSTEM_PROMPT = """
23
+ You are an expert research query analyzer. Your job is to analyze user queries and determine:
24
+ 1. Whether the query requires iterative research (single focused question) or deep research (multiple sections/topics)
25
+ 2. Improve and refine the query for better research results
26
+ 3. Extract key entities (drugs, diseases, targets, companies, etc.)
27
+ 4. Extract specific research questions
28
+
29
+ Guidelines for determining research mode:
30
+ - **Iterative mode**: Single focused question, straightforward research goal, can be answered with a focused search loop
31
+ Examples: "What is the mechanism of metformin?", "Find clinical trials for drug X"
32
+
33
+ - **Deep mode**: Complex query requiring multiple sections, comprehensive report, multiple related topics
34
+ Examples: "Write a comprehensive report on diabetes treatment", "Analyze the market for quantum computing"
35
+ Indicators: words like "comprehensive", "report", "sections", "analyze", "market analysis", "overview"
36
+
37
+ Your output must be valid JSON matching the ParsedQuery schema. Always provide:
38
+ - original_query: The exact input query
39
+ - improved_query: A refined, clearer version of the query
40
+ - research_mode: Either "iterative" or "deep"
41
+ - key_entities: List of important entities (drugs, diseases, companies, etc.)
42
+ - research_questions: List of specific questions to answer
43
+
44
+ Only output JSON. Do not output anything else.
45
+ """
46
+
47
+
48
+ class InputParserAgent:
49
+ """
50
+ Input parser agent that analyzes queries and determines research mode.
51
+
52
+ Uses Pydantic AI to generate structured ParsedQuery output with research
53
+ mode detection, query improvement, and entity extraction.
54
+ """
55
+
56
+ def __init__(self, model: Any | None = None) -> None:
57
+ """
58
+ Initialize the input parser agent.
59
+
60
+ Args:
61
+ model: Optional Pydantic AI model. If None, uses config default.
62
+ """
63
+ self.model = model or get_model()
64
+ self.logger = logger
65
+
66
+ # Initialize Pydantic AI Agent
67
+ self.agent = Agent(
68
+ model=self.model,
69
+ output_type=ParsedQuery,
70
+ system_prompt=SYSTEM_PROMPT,
71
+ retries=3,
72
+ )
73
+
74
+ async def parse(self, query: str) -> ParsedQuery:
75
+ """
76
+ Parse and analyze a user query.
77
+
78
+ Args:
79
+ query: The user's research query
80
+
81
+ Returns:
82
+ ParsedQuery with research mode, improved query, entities, and questions
83
+
84
+ Raises:
85
+ JudgeError: If parsing fails after retries
86
+ ConfigurationError: If agent configuration is invalid
87
+ """
88
+ self.logger.info("Parsing user query", query=query[:100])
89
+
90
+ user_message = f"QUERY: {query}"
91
+
92
+ try:
93
+ # Run the agent
94
+ result = await self.agent.run(user_message)
95
+ parsed_query = result.output
96
+
97
+ # Validate parsed query
98
+ if not parsed_query.original_query:
99
+ self.logger.warning("Parsed query missing original_query", query=query[:100])
100
+ raise JudgeError("Parsed query must have original_query")
101
+
102
+ if not parsed_query.improved_query:
103
+ self.logger.warning("Parsed query missing improved_query", query=query[:100])
104
+ # Use original as fallback
105
+ parsed_query = ParsedQuery(
106
+ original_query=parsed_query.original_query,
107
+ improved_query=parsed_query.original_query,
108
+ research_mode=parsed_query.research_mode,
109
+ key_entities=parsed_query.key_entities,
110
+ research_questions=parsed_query.research_questions,
111
+ )
112
+
113
+ self.logger.info(
114
+ "Query parsed successfully",
115
+ mode=parsed_query.research_mode,
116
+ entities=len(parsed_query.key_entities),
117
+ questions=len(parsed_query.research_questions),
118
+ )
119
+
120
+ return parsed_query
121
+
122
+ except Exception as e:
123
+ self.logger.error("Query parsing failed", error=str(e), query=query[:100])
124
+
125
+ # Fallback: return basic parsed query with heuristic mode detection
126
+ if isinstance(e, JudgeError | ConfigurationError):
127
+ raise
128
+
129
+ # Heuristic fallback
130
+ query_lower = query.lower()
131
+ research_mode: Literal["iterative", "deep"] = "iterative"
132
+ if any(
133
+ keyword in query_lower
134
+ for keyword in [
135
+ "comprehensive",
136
+ "report",
137
+ "sections",
138
+ "analyze",
139
+ "analysis",
140
+ "overview",
141
+ "market",
142
+ ]
143
+ ):
144
+ research_mode = "deep"
145
+
146
+ return ParsedQuery(
147
+ original_query=query,
148
+ improved_query=query,
149
+ research_mode=research_mode,
150
+ key_entities=[],
151
+ research_questions=[],
152
+ )
153
+
154
+
155
+ def create_input_parser_agent(model: Any | None = None) -> InputParserAgent:
156
+ """
157
+ Factory function to create an input parser agent.
158
+
159
+ Args:
160
+ model: Optional Pydantic AI model. If None, uses settings default.
161
+
162
+ Returns:
163
+ Configured InputParserAgent instance
164
+
165
+ Raises:
166
+ ConfigurationError: If required API keys are missing
167
+ """
168
+ try:
169
+ # Get model from settings if not provided
170
+ if model is None:
171
+ model = get_model()
172
+
173
+ # Create and return input parser agent
174
+ return InputParserAgent(model=model)
175
+
176
+ except Exception as e:
177
+ logger.error("Failed to create input parser agent", error=str(e))
178
+ raise ConfigurationError(f"Failed to create input parser agent: {e}") from e
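A short sketch of the parser in use (assumes API keys are configured; if the LLM call fails, the keyword heuristic above still returns a usable `ParsedQuery`):

```python
import asyncio

from src.agents.input_parser import create_input_parser_agent


async def main() -> None:
    parser = create_input_parser_agent()
    parsed = await parser.parse("Write a comprehensive report on diabetes treatment")
    # "comprehensive" / "report" should steer research_mode toward "deep"
    print(parsed.research_mode, parsed.key_entities, parsed.research_questions)


asyncio.run(main())
```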
src/agents/judge_agent.py CHANGED
@@ -12,7 +12,7 @@ from agent_framework import (
12
  Role,
13
  )
14
 
15
- from src.orchestrator import JudgeHandlerProtocol
15
+ from src.legacy_orchestrator import JudgeHandlerProtocol
16
  from src.utils.models import Evidence, JudgeAssessment
17
 
18
 
src/agents/knowledge_gap.py ADDED
@@ -0,0 +1,156 @@
1
+ """Knowledge gap agent for evaluating research completeness.
2
+
3
+ Converts the folder/knowledge_gap_agent.py implementation to use Pydantic AI.
4
+ """
5
+
6
+ from datetime import datetime
7
+ from typing import Any
8
+
9
+ import structlog
10
+ from pydantic_ai import Agent
11
+
12
+ from src.agent_factory.judges import get_model
13
+ from src.utils.exceptions import ConfigurationError
14
+ from src.utils.models import KnowledgeGapOutput
15
+
16
+ logger = structlog.get_logger()
17
+
18
+
19
+ # System prompt for the knowledge gap agent
20
+ SYSTEM_PROMPT = f"""
21
+ You are a Research State Evaluator. Today's date is {datetime.now().strftime("%Y-%m-%d")}.
22
+ Your job is to critically analyze the current state of a research report,
23
+ identify what knowledge gaps still exist and determine the best next step to take.
24
+
25
+ You will be given:
26
+ 1. The original user query and any relevant background context to the query
27
+ 2. A full history of the tasks, actions, findings and thoughts you've made up until this point in the research process
28
+
29
+ Your task is to:
30
+ 1. Carefully review the findings and thoughts, particularly from the latest iteration, and assess their completeness in answering the original query
31
+ 2. Determine if the findings are sufficiently complete to end the research loop
32
+ 3. If not, identify up to 3 knowledge gaps that need to be addressed in sequence in order to continue with research - these should be relevant to the original query
33
+
34
+ Be specific in the gaps you identify and include relevant information as this will be passed on to another agent to process without additional context.
35
+
36
+ Only output JSON. Follow the JSON schema for KnowledgeGapOutput. Do not output anything else.
37
+ """
38
+
39
+
40
+ class KnowledgeGapAgent:
41
+ """
42
+ Agent that evaluates research state and identifies knowledge gaps.
43
+
44
+ Uses Pydantic AI to generate structured KnowledgeGapOutput indicating
45
+ whether research is complete and what gaps remain.
46
+ """
47
+
48
+ def __init__(self, model: Any | None = None) -> None:
49
+ """
50
+ Initialize the knowledge gap agent.
51
+
52
+ Args:
53
+ model: Optional Pydantic AI model. If None, uses config default.
54
+ """
55
+ self.model = model or get_model()
56
+ self.logger = logger
57
+
58
+ # Initialize Pydantic AI Agent
59
+ self.agent = Agent(
60
+ model=self.model,
61
+ output_type=KnowledgeGapOutput,
62
+ system_prompt=SYSTEM_PROMPT,
63
+ retries=3,
64
+ )
65
+
66
+ async def evaluate(
67
+ self,
68
+ query: str,
69
+ background_context: str = "",
70
+ conversation_history: str = "",
71
+ iteration: int = 0,
72
+ time_elapsed_minutes: float = 0.0,
73
+ max_time_minutes: int = 10,
74
+ ) -> KnowledgeGapOutput:
75
+ """
76
+ Evaluate research state and identify knowledge gaps.
77
+
78
+ Args:
79
+ query: The original research query
80
+ background_context: Optional background context
81
+ conversation_history: History of actions, findings, and thoughts
82
+ iteration: Current iteration number
83
+ time_elapsed_minutes: Time elapsed so far
84
+ max_time_minutes: Maximum time allowed
85
+
86
+ Returns:
87
+ KnowledgeGapOutput with research completeness and outstanding gaps
88
+
89
+ Raises:
90
+ JudgeError: If evaluation fails after retries
91
+ """
92
+ self.logger.info(
93
+ "Evaluating knowledge gaps",
94
+ query=query[:100],
95
+ iteration=iteration,
96
+ )
97
+
98
+ background = f"BACKGROUND CONTEXT:\n{background_context}" if background_context else ""
99
+
100
+ user_message = f"""
101
+ Current Iteration Number: {iteration}
102
+ Time Elapsed: {time_elapsed_minutes:.2f} minutes of maximum {max_time_minutes} minutes
103
+
104
+ ORIGINAL QUERY:
105
+ {query}
106
+
107
+ {background}
108
+
109
+ HISTORY OF ACTIONS, FINDINGS AND THOUGHTS:
110
+ {conversation_history or "No previous actions, findings or thoughts available."}
111
+ """
112
+
113
+ try:
114
+ # Run the agent
115
+ result = await self.agent.run(user_message)
116
+ evaluation = result.output
117
+
118
+ self.logger.info(
119
+ "Knowledge gap evaluation complete",
120
+ research_complete=evaluation.research_complete,
121
+ gaps_count=len(evaluation.outstanding_gaps),
122
+ )
123
+
124
+ return evaluation
125
+
126
+ except Exception as e:
127
+ self.logger.error("Knowledge gap evaluation failed", error=str(e))
128
+ # Return fallback: research not complete, suggest continuing
129
+ return KnowledgeGapOutput(
130
+ research_complete=False,
131
+ outstanding_gaps=[f"Continue research on: {query}"],
132
+ )
133
+
134
+
135
+ def create_knowledge_gap_agent(model: Any | None = None) -> KnowledgeGapAgent:
136
+ """
137
+ Factory function to create a knowledge gap agent.
138
+
139
+ Args:
140
+ model: Optional Pydantic AI model. If None, uses settings default.
141
+
142
+ Returns:
143
+ Configured KnowledgeGapAgent instance
144
+
145
+ Raises:
146
+ ConfigurationError: If required API keys are missing
147
+ """
148
+ try:
149
+ if model is None:
150
+ model = get_model()
151
+
152
+ return KnowledgeGapAgent(model=model)
153
+
154
+ except Exception as e:
155
+ logger.error("Failed to create knowledge gap agent", error=str(e))
156
+ raise ConfigurationError(f"Failed to create knowledge gap agent: {e}") from e
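Sketch of driving the gap evaluator inside a research loop (illustrative; the `conversation_history` string stands in for the real action/finding log that the flow accumulates):

```python
import asyncio

from src.agents.knowledge_gap import create_knowledge_gap_agent


async def main() -> None:
    agent = create_knowledge_gap_agent()
    evaluation = await agent.evaluate(
        query="What is the mechanism of metformin?",
        conversation_history="Iteration 0: searched PubMed; found AMPK activation papers.",
        iteration=1,
        time_elapsed_minutes=2.5,
        max_time_minutes=10,
    )
    if not evaluation.research_complete:
        print("Next gaps:", evaluation.outstanding_gaps)


asyncio.run(main())
```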
src/agents/long_writer.py ADDED
@@ -0,0 +1,431 @@
1
+ """Long writer agent for iteratively writing report sections.
2
+
3
+ Converts the folder/long_writer_agent.py implementation to use Pydantic AI.
4
+ """
5
+
6
+ import re
7
+ from datetime import datetime
8
+ from typing import Any
9
+
10
+ import structlog
11
+ from pydantic import BaseModel, Field
12
+ from pydantic_ai import Agent
13
+
14
+ from src.agent_factory.judges import get_model
15
+ from src.utils.exceptions import ConfigurationError
16
+ from src.utils.models import ReportDraft
17
+
18
+ logger = structlog.get_logger()
19
+
20
+
21
+ # LongWriterOutput model for structured output
22
+ class LongWriterOutput(BaseModel):
23
+ """Output from the long writer agent for a single section."""
24
+
25
+ next_section_markdown: str = Field(
26
+ description="The final draft of the next section in markdown format"
27
+ )
28
+ references: list[str] = Field(
29
+ description="A list of URLs and their corresponding reference numbers for the section"
30
+ )
31
+
32
+ model_config = {"frozen": True}
33
+
34
+
35
+ # System prompt for the long writer agent
36
+ SYSTEM_PROMPT = f"""
37
+ You are an expert report writer tasked with iteratively writing each section of a report.
38
+ Today's date is {datetime.now().strftime('%Y-%m-%d')}.
39
+ You will be provided with:
40
+ 1. The original research query
41
+ 2. A final draft of the report containing the table of contents and all sections written up until this point (in the first iteration there will be no sections written yet)
42
+ 3. A first draft of the next section of the report to be written
43
+
44
+ OBJECTIVE:
45
+ 1. Write a final draft of the next section of the report with numbered citations in square brackets in the body of the report
46
+ 2. Produce a list of references to be appended to the end of the report
47
+
48
+ CITATIONS/REFERENCES:
49
+ The citations should be in numerical order, written in numbered square brackets in the body of the report.
50
+ Separately, a list of all URLs and their corresponding reference numbers will be included at the end of the report.
51
+ Follow the example below for formatting.
52
+
53
+ LongWriterOutput(
54
+ next_section_markdown="The company specializes in IT consulting [1]. It operates in the software services market which is expected to grow at 10% per year [2].",
55
+ references=["[1] https://example.com/first-source-url", "[2] https://example.com/second-source-url"]
56
+ )
57
+
58
+ GUIDELINES:
59
+ - You can reformat and reorganize the flow of the content and headings within a section to flow logically, but DO NOT remove details that were included in the first draft
60
+ - Only remove text from the first draft if it is already mentioned earlier in the report, or if it should be covered in a later section per the table of contents
61
+ - Ensure the heading for the section matches the table of contents
62
+ - Format the final output and references section as markdown
63
+ - Do not include a title for the reference section, just a list of numbered references
64
+
65
+ Only output JSON. Follow the JSON schema for LongWriterOutput. Do not output anything else.
66
+ """
67
+
68
+
69
+ class LongWriterAgent:
70
+ """
71
+ Agent that iteratively writes report sections with proper citations.
72
+
73
+ Uses Pydantic AI to generate structured LongWriterOutput for each section.
74
+ """
75
+
76
+ def __init__(self, model: Any | None = None) -> None:
77
+ """
78
+ Initialize the long writer agent.
79
+
80
+ Args:
81
+ model: Optional Pydantic AI model. If None, uses config default.
82
+ """
83
+ self.model = model or get_model()
84
+ self.logger = logger
85
+
86
+ # Initialize Pydantic AI Agent
87
+ self.agent = Agent(
88
+ model=self.model,
89
+ output_type=LongWriterOutput,
90
+ system_prompt=SYSTEM_PROMPT,
91
+ retries=3,
92
+ )
93
+
94
+ async def write_next_section(
95
+ self,
96
+ original_query: str,
97
+ report_draft: str,
98
+ next_section_title: str,
99
+ next_section_draft: str,
100
+ ) -> LongWriterOutput:
101
+ """
102
+ Write the next section of the report.
103
+
104
+ Args:
105
+ original_query: The original research query
106
+ report_draft: Current report draft (all sections written so far)
107
+ next_section_title: Title of the section to write
108
+ next_section_draft: Draft content for the next section
109
+
110
+ Returns:
111
+ LongWriterOutput with formatted section and references
112
+
113
+ Raises:
114
+ ConfigurationError: If writing fails
115
+ """
116
+ # Input validation
117
+ if not original_query or not original_query.strip():
118
+ self.logger.warning("Empty query provided, using default")
119
+ original_query = "Research query"
120
+
121
+ if not next_section_title or not next_section_title.strip():
122
+ self.logger.warning("Empty section title provided, using default")
123
+ next_section_title = "Section"
124
+
125
+ if next_section_draft is None:
126
+ next_section_draft = ""
127
+
128
+ if report_draft is None:
129
+ report_draft = ""
130
+
131
+ # Truncate very long inputs
132
+ max_draft_length = 30000
133
+ if len(report_draft) > max_draft_length:
134
+ self.logger.warning(
135
+ "Report draft too long, truncating",
136
+ original_length=len(report_draft),
137
+ )
138
+ report_draft = report_draft[:max_draft_length] + "\n\n[Content truncated]"
139
+
140
+ if len(next_section_draft) > max_draft_length:
141
+ self.logger.warning(
142
+ "Section draft too long, truncating",
143
+ original_length=len(next_section_draft),
144
+ )
145
+ next_section_draft = next_section_draft[:max_draft_length] + "\n\n[Content truncated]"
146
+
147
+ self.logger.info(
148
+ "Writing next section",
149
+ section_title=next_section_title,
150
+ query=original_query[:100],
151
+ )
152
+
153
+ user_message = f"""
154
+ <ORIGINAL QUERY>
155
+ {original_query}
156
+ </ORIGINAL QUERY>
157
+
158
+ <CURRENT REPORT DRAFT>
159
+ {report_draft or "No draft yet"}
160
+ </CURRENT REPORT DRAFT>
161
+
162
+ <TITLE OF NEXT SECTION TO WRITE>
163
+ {next_section_title}
164
+ </TITLE OF NEXT SECTION TO WRITE>
165
+
166
+ <DRAFT OF NEXT SECTION>
167
+ {next_section_draft}
168
+ </DRAFT OF NEXT SECTION>
169
+ """
170
+
171
+ # Retry logic for transient failures
172
+ max_retries = 3
173
+ last_exception: Exception | None = None
174
+
175
+ for attempt in range(max_retries):
176
+ try:
177
+ # Run the agent
178
+ result = await self.agent.run(user_message)
179
+ output = result.output
180
+
181
+ # Validate output
182
+ if not output or not isinstance(output, LongWriterOutput):
183
+ raise ValueError("Invalid output format")
184
+
185
+ if not output.next_section_markdown or not output.next_section_markdown.strip():
186
+ self.logger.warning("Empty section generated, using fallback")
187
+ raise ValueError("Empty section generated")
188
+
189
+ self.logger.info(
190
+ "Section written",
191
+ section_title=next_section_title,
192
+ references_count=len(output.references),
193
+ attempt=attempt + 1,
194
+ )
195
+
196
+ return output
197
+
198
+ except (TimeoutError, ConnectionError) as e:
199
+ # Transient errors - retry
200
+ last_exception = e
201
+ if attempt < max_retries - 1:
202
+ self.logger.warning(
203
+ "Transient error, retrying",
204
+ error=str(e),
205
+ attempt=attempt + 1,
206
+ max_retries=max_retries,
207
+ )
208
+ continue
209
+ else:
210
+ self.logger.error("Max retries exceeded for transient error", error=str(e))
211
+ break
212
+
213
+ except Exception as e:
214
+ # Non-transient errors - don't retry
215
+ last_exception = e
216
+ self.logger.error(
217
+ "Section writing failed",
218
+ error=str(e),
219
+ error_type=type(e).__name__,
220
+ )
221
+ break
222
+
223
+ # Return fallback section if all attempts failed
224
+ self.logger.error(
225
+ "Section writing failed after all attempts",
226
+ error=str(last_exception) if last_exception else "Unknown error",
227
+ )
228
+ return LongWriterOutput(
229
+ next_section_markdown=f"## {next_section_title}\n\n{next_section_draft}",
230
+ references=[],
231
+ )
232
+
233
+ async def write_report(
234
+ self,
235
+ original_query: str,
236
+ report_title: str,
237
+ report_draft: ReportDraft,
238
+ ) -> str:
239
+ """
240
+ Write the final report by iteratively writing each section.
241
+
242
+ Args:
243
+ original_query: The original research query
244
+ report_title: Title of the report
245
+ report_draft: ReportDraft with all sections
246
+
247
+ Returns:
248
+ Complete markdown report string
249
+
250
+ Raises:
251
+ ConfigurationError: If writing fails
252
+ """
253
+ # Input validation
254
+ if not original_query or not original_query.strip():
255
+ self.logger.warning("Empty query provided, using default")
256
+ original_query = "Research query"
257
+
258
+ if not report_title or not report_title.strip():
259
+ self.logger.warning("Empty report title provided, using default")
260
+ report_title = "Research Report"
261
+
262
+ if not report_draft or not report_draft.sections:
263
+ self.logger.warning("Empty report draft provided, returning minimal report")
264
+ return f"# {report_title}\n\n## Query\n{original_query}\n\n*No sections available.*"
265
+
266
+ self.logger.info(
267
+ "Writing full report",
268
+ report_title=report_title,
269
+ sections_count=len(report_draft.sections),
270
+ )
271
+
272
+ # Initialize the final draft with title and table of contents
273
+ final_draft = (
274
+ f"# {report_title}\n\n## Table of Contents\n\n"
275
+ + "\n".join(
276
+ [
277
+ f"{i+1}. {section.section_title}"
278
+ for i, section in enumerate(report_draft.sections)
279
+ ]
280
+ )
281
+ + "\n\n"
282
+ )
283
+ all_references: list[str] = []
284
+
285
+ for section in report_draft.sections:
286
+ # Write each section
287
+ next_section_output = await self.write_next_section(
288
+ original_query,
289
+ final_draft,
290
+ section.section_title,
291
+ section.section_content,
292
+ )
293
+
294
+ # Reformat references and update section markdown
295
+ section_markdown, all_references = self._reformat_references(
296
+ next_section_output.next_section_markdown,
297
+ next_section_output.references,
298
+ all_references,
299
+ )
300
+
301
+ # Reformat section headings
302
+ section_markdown = self._reformat_section_headings(section_markdown)
303
+
304
+ # Add to final draft
305
+ final_draft += section_markdown + "\n\n"
306
+
307
+ # Add final references
308
+ final_draft += "## References:\n\n" + " \n".join(all_references)
309
+
310
+ self.logger.info("Full report written", length=len(final_draft))
311
+
312
+ return final_draft
313
+
314
+ def _reformat_references(
315
+ self,
316
+ section_markdown: str,
317
+ section_references: list[str],
318
+ all_references: list[str],
319
+ ) -> tuple[str, list[str]]:
320
+ """
321
+ Reformat references: re-number, de-duplicate, and update markdown.
322
+
323
+ Args:
324
+ section_markdown: Markdown content with inline references [1], [2]
325
+ section_references: List of references for this section
326
+ all_references: Accumulated references from previous sections
327
+
328
+ Returns:
329
+ Tuple of (updated markdown, updated all_references)
330
+ """
331
+
332
+ # Convert reference lists to maps (URL -> ref_num)
333
+ def convert_ref_list_to_map(ref_list: list[str]) -> dict[str, int]:
334
+ ref_map: dict[str, int] = {}
335
+ for ref in ref_list:
336
+ try:
337
+ # Parse "[1] https://example.com" format
338
+ parts = ref.split("]", 1)
339
+ if len(parts) == 2:
340
+ ref_num = int(parts[0].strip("["))
341
+ url = parts[1].strip()
342
+ ref_map[url] = ref_num
343
+ except (ValueError, IndexError):
344
+ logger.warning("Invalid reference format", ref=ref)
345
+ continue
346
+ return ref_map
347
+
348
+ section_ref_map = convert_ref_list_to_map(section_references)
349
+ report_ref_map = convert_ref_list_to_map(all_references)
350
+ section_to_report_ref_map: dict[int, int] = {}
351
+
352
+ report_urls = set(report_ref_map.keys())
353
+ ref_count = max(report_ref_map.values() or [0])
354
+
355
+ # Map section references to report references
356
+ for url, section_ref_num in section_ref_map.items():
357
+ if url in report_urls:
358
+ # URL already exists - reuse its reference number
359
+ section_to_report_ref_map[section_ref_num] = report_ref_map[url]
360
+ else:
361
+ # New URL - assign next reference number
362
+ ref_count += 1
363
+ section_to_report_ref_map[section_ref_num] = ref_count
364
+ all_references.append(f"[{ref_count}] {url}")
365
+
366
+ # Replace reference numbers in markdown
367
+ def replace_reference(match: re.Match[str]) -> str:
368
+ ref_num = int(match.group(1))
369
+ mapped_ref_num = section_to_report_ref_map.get(ref_num)
370
+ if mapped_ref_num:
371
+ return f"[{mapped_ref_num}]"
372
+ return ""
373
+
374
+ updated_markdown = re.sub(r"\[(\d+)\]", replace_reference, section_markdown)
375
+
376
+ return updated_markdown, all_references
377
+
378
+ def _reformat_section_headings(self, section_markdown: str) -> str:
379
+ """
380
+ Reformat section headings to be consistent (level-2 for main heading).
381
+
382
+ Args:
383
+ section_markdown: Markdown content with headings
384
+
385
+ Returns:
386
+ Updated markdown with adjusted heading levels
387
+ """
388
+ if not section_markdown.strip():
389
+ return section_markdown
390
+
391
+ # Find first heading level
392
+ first_heading_match = re.search(r"^(#+)\s", section_markdown, re.MULTILINE)
393
+ if not first_heading_match:
394
+ return section_markdown
395
+
396
+ # Calculate level adjustment needed (target is level 2)
397
+ first_heading_level = len(first_heading_match.group(1))
398
+ level_adjustment = 2 - first_heading_level
399
+
400
+ def adjust_heading_level(match: re.Match[str]) -> str:
401
+ hashes = match.group(1)
402
+ content = match.group(2)
403
+ new_level = max(2, len(hashes) + level_adjustment)
404
+ return "#" * new_level + " " + content
405
+
406
+ # Apply heading adjustment
407
+ return re.sub(r"^(#+)\s(.+)$", adjust_heading_level, section_markdown, flags=re.MULTILINE)
408
+
409
+
410
+ def create_long_writer_agent(model: Any | None = None) -> LongWriterAgent:
411
+ """
412
+ Factory function to create a long writer agent.
413
+
414
+ Args:
415
+ model: Optional Pydantic AI model. If None, uses settings default.
416
+
417
+ Returns:
418
+ Configured LongWriterAgent instance
419
+
420
+ Raises:
421
+ ConfigurationError: If required API keys are missing
422
+ """
423
+ try:
424
+ if model is None:
425
+ model = get_model()
426
+
427
+ return LongWriterAgent(model=model)
428
+
429
+ except Exception as e:
430
+ logger.error("Failed to create long writer agent", error=str(e))
431
+ raise ConfigurationError(f"Failed to create long writer agent: {e}") from e
src/agents/proofreader.py ADDED
@@ -0,0 +1,205 @@
1
+ """Proofreader agent for finalizing report drafts.
2
+
3
+ Converts the folder/proofreader_agent.py implementation to use Pydantic AI.
4
+ """
5
+
6
+ from datetime import datetime
7
+ from typing import Any
8
+
9
+ import structlog
10
+ from pydantic_ai import Agent
11
+
12
+ from src.agent_factory.judges import get_model
13
+ from src.utils.exceptions import ConfigurationError
14
+ from src.utils.models import ReportDraft
15
+
16
+ logger = structlog.get_logger()
17
+
18
+
19
+ # System prompt for the proofreader agent
20
+ SYSTEM_PROMPT = f"""
21
+ You are a research expert who proofreads and edits research reports.
22
+ Today's date is {datetime.now().strftime("%Y-%m-%d")}.
23
+
24
+ You are given:
25
+ 1. The original query topic for the report
26
+ 2. A first draft of the report in ReportDraft format containing each section in sequence
27
+
28
+ Your task is to:
29
+ 1. **Combine sections:** Concatenate the sections into a single string
30
+ 2. **Add section titles:** Add the section titles to the beginning of each section in markdown format, as well as a main title for the report
31
+ 3. **De-duplicate:** Remove duplicate content across sections to avoid repetition
32
+ 4. **Remove irrelevant sections:** If any sections or sub-sections are completely irrelevant to the query, remove them
33
+ 5. **Refine wording:** Edit the wording of the report to be polished, concise and punchy, but **without eliminating any detail** or large chunks of text
34
+ 6. **Add a summary:** Add a short report summary / outline to the beginning of the report to provide an overview of the sections and what is discussed
35
+ 7. **Preserve sources:** Preserve all sources / references - move the long list of references to the end of the report
36
+ 8. **Update reference numbers:** Continue to include reference numbers in square brackets ([1], [2], [3], etc.) in the main body of the report, but update the numbering to match the new order of references at the end of the report
37
+ 9. **Output final report:** Output the final report in markdown format (do not wrap it in a code block)
38
+
39
+ Guidelines:
40
+ - Do not add any new facts or data to the report
41
+ - Do not remove any content from the report unless it is very clearly wrong, contradictory or irrelevant
42
+ - Remove or reformat any redundant or excessive headings, and ensure that the final nesting of heading levels is correct
43
+ - Ensure that the final report flows well and has a logical structure
44
+ - Include all sources and references that are present in the final report
45
+ """
46
+
47
+
48
+ class ProofreaderAgent:
49
+ """
50
+ Agent that proofreads and finalizes report drafts.
51
+
52
+ Uses Pydantic AI to generate polished markdown reports from draft sections.
53
+ """
54
+
55
+ def __init__(self, model: Any | None = None) -> None:
56
+ """
57
+ Initialize the proofreader agent.
58
+
59
+ Args:
60
+ model: Optional Pydantic AI model. If None, uses config default.
61
+ """
62
+ self.model = model or get_model()
63
+ self.logger = logger
64
+
65
+ # Initialize Pydantic AI Agent (no structured output - returns markdown text)
66
+ self.agent = Agent(
67
+ model=self.model,
68
+ system_prompt=SYSTEM_PROMPT,
69
+ retries=3,
70
+ )
71
+
72
+ async def proofread(
73
+ self,
74
+ query: str,
75
+ report_draft: ReportDraft,
76
+ ) -> str:
77
+ """
78
+ Proofread and finalize a report draft.
79
+
80
+ Args:
81
+ query: The original research query
82
+ report_draft: ReportDraft with all sections
83
+
84
+ Returns:
85
+ Final polished markdown report string
86
+
87
+ Raises:
88
+ ConfigurationError: If proofreading fails
89
+ """
90
+ # Input validation
91
+ if not query or not query.strip():
92
+ self.logger.warning("Empty query provided, using default")
93
+ query = "Research query"
94
+
95
+ if not report_draft or not report_draft.sections:
96
+ self.logger.warning("Empty report draft provided, returning minimal report")
97
+ return f"# Research Report\n\n## Query\n{query}\n\n*No sections available.*"
98
+
99
+ # Validate section structure
100
+ valid_sections = []
101
+ for section in report_draft.sections:
102
+ if section.section_title and section.section_title.strip():
103
+ valid_sections.append(section)
104
+ else:
105
+ self.logger.warning("Skipping section with empty title")
106
+
107
+ if not valid_sections:
108
+ self.logger.warning("No valid sections in draft, returning minimal report")
109
+ return f"# Research Report\n\n## Query\n{query}\n\n*No valid sections available.*"
110
+
111
+ self.logger.info(
112
+ "Proofreading report",
113
+ query=query[:100],
114
+ sections_count=len(valid_sections),
115
+ )
116
+
117
+ # Create validated draft
118
+ validated_draft = ReportDraft(sections=valid_sections)
119
+
120
+ user_message = f"""
121
+ QUERY:
122
+ {query}
123
+
124
+ REPORT DRAFT:
125
+ {validated_draft.model_dump_json()}
126
+ """
127
+
128
+ # Retry logic for transient failures
129
+ max_retries = 3
130
+ last_exception: Exception | None = None
131
+
132
+ for attempt in range(max_retries):
133
+ try:
134
+ # Run the agent
135
+ result = await self.agent.run(user_message)
136
+ final_report = result.output
137
+
138
+ # Validate output
139
+ if not final_report or not final_report.strip():
140
+ self.logger.warning("Empty report generated, using fallback")
141
+ raise ValueError("Empty report generated")
142
+
143
+ self.logger.info("Report proofread", length=len(final_report), attempt=attempt + 1)
144
+
145
+ return final_report
146
+
147
+ except (TimeoutError, ConnectionError) as e:
148
+ # Transient errors - retry
149
+ last_exception = e
150
+ if attempt < max_retries - 1:
151
+ self.logger.warning(
152
+ "Transient error, retrying",
153
+ error=str(e),
154
+ attempt=attempt + 1,
155
+ max_retries=max_retries,
156
+ )
157
+ continue
158
+ else:
159
+ self.logger.error("Max retries exceeded for transient error", error=str(e))
160
+ break
161
+
162
+ except Exception as e:
163
+ # Non-transient errors - don't retry
164
+ last_exception = e
165
+ self.logger.error(
166
+ "Proofreading failed",
167
+ error=str(e),
168
+ error_type=type(e).__name__,
169
+ )
170
+ break
171
+
172
+ # Return fallback: combine sections manually
173
+ self.logger.error(
174
+ "Proofreading failed after all attempts",
175
+ error=str(last_exception) if last_exception else "Unknown error",
176
+ )
177
+ sections = [
178
+ f"## {section.section_title}\n\n{section.section_content or 'Content unavailable.'}"
179
+ for section in valid_sections
180
+ ]
181
+ return f"# Research Report\n\n## Query\n{query}\n\n" + "\n\n".join(sections)
182
+
183
+
184
+ def create_proofreader_agent(model: Any | None = None) -> ProofreaderAgent:
185
+ """
186
+ Factory function to create a proofreader agent.
187
+
188
+ Args:
189
+ model: Optional Pydantic AI model. If None, uses settings default.
190
+
191
+ Returns:
192
+ Configured ProofreaderAgent instance
193
+
194
+ Raises:
195
+ ConfigurationError: If required API keys are missing
196
+ """
197
+ try:
198
+ if model is None:
199
+ model = get_model()
200
+
201
+ return ProofreaderAgent(model=model)
202
+
203
+ except Exception as e:
204
+ logger.error("Failed to create proofreader agent", error=str(e))
205
+ raise ConfigurationError(f"Failed to create proofreader agent: {e}") from e
src/agents/search_agent.py CHANGED
@@ -10,7 +10,7 @@ from agent_framework import (
10
  Role,
11
  )
12
 
13
- from src.orchestrator import SearchHandlerProtocol
14
  from src.utils.models import Citation, Evidence, SearchResult
15
 
16
  if TYPE_CHECKING:
 
10
  Role,
11
  )
12
 
13
+ from src.legacy_orchestrator import SearchHandlerProtocol
14
  from src.utils.models import Citation, Evidence, SearchResult
15
 
16
  if TYPE_CHECKING:
src/agents/state.py CHANGED
@@ -1,9 +1,11 @@
1
  """Thread-safe state management for Magentic agents.
2
 
3
- Uses contextvars to ensure isolation between concurrent requests (e.g., multiple users
4
- searching simultaneously via Gradio).
 
5
  """
6
 
 
7
  from contextvars import ContextVar
8
  from typing import TYPE_CHECKING, Any
9
 
@@ -15,8 +17,20 @@ if TYPE_CHECKING:
15
  from src.services.embeddings import EmbeddingService
16
 
17
 
 
 
 
 
 
 
 
 
 
18
  class MagenticState(BaseModel):
19
- """Mutable state for a Magentic workflow session."""
 
 
 
20
 
21
  evidence: list[Evidence] = Field(default_factory=list)
22
  # Type as Any to avoid circular imports/runtime resolution issues
@@ -75,14 +89,22 @@ _magentic_state_var: ContextVar[MagenticState | None] = ContextVar("magentic_sta
75
 
76
 
77
  def init_magentic_state(embedding_service: "EmbeddingService | None" = None) -> MagenticState:
78
- """Initialize a new state for the current context."""
 
 
 
 
79
  state = MagenticState(embedding_service=embedding_service)
80
  _magentic_state_var.set(state)
81
  return state
82
 
83
 
84
  def get_magentic_state() -> MagenticState:
85
- """Get the current state. Raises RuntimeError if not initialized."""
 
 
 
 
86
  state = _magentic_state_var.get()
87
  if state is None:
88
  # Auto-initialize if missing (e.g. during tests or simple scripts)
 
1
  """Thread-safe state management for Magentic agents.
2
 
3
+ DEPRECATED: This module is deprecated. Use src.middleware.state_machine instead.
4
+
5
+ This file is kept for backward compatibility and will be removed in a future version.
6
  """
7
 
8
+ import warnings
9
  from contextvars import ContextVar
10
  from typing import TYPE_CHECKING, Any
11
 
 
17
  from src.services.embeddings import EmbeddingService
18
 
19
 
20
+ def _deprecation_warning() -> None:
21
+ """Emit deprecation warning for this module."""
22
+ warnings.warn(
23
+ "src.agents.state is deprecated. Use src.middleware.state_machine instead.",
24
+ DeprecationWarning,
25
+ stacklevel=3,
26
+ )
27
+
28
+
29
  class MagenticState(BaseModel):
30
+ """Mutable state for a Magentic workflow session.
31
+
32
+ DEPRECATED: Use WorkflowState from src.middleware.state_machine instead.
33
+ """
34
 
35
  evidence: list[Evidence] = Field(default_factory=list)
36
  # Type as Any to avoid circular imports/runtime resolution issues
 
89
 
90
 
91
  def init_magentic_state(embedding_service: "EmbeddingService | None" = None) -> MagenticState:
92
+ """Initialize a new state for the current context.
93
+
94
+ DEPRECATED: Use init_workflow_state from src.middleware.state_machine instead.
95
+ """
96
+ _deprecation_warning()
97
  state = MagenticState(embedding_service=embedding_service)
98
  _magentic_state_var.set(state)
99
  return state
100
 
101
 
102
  def get_magentic_state() -> MagenticState:
103
+ """Get the current state. Raises RuntimeError if not initialized.
104
+
105
+ DEPRECATED: Use get_workflow_state from src.middleware.state_machine instead.
106
+ """
107
+ _deprecation_warning()
108
  state = _magentic_state_var.get()
109
  if state is None:
110
  # Auto-initialize if missing (e.g. during tests or simple scripts)
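The shim keeps the old entry points working while steering callers to `src.middleware.state_machine`; a minimal sketch of the observable behavior:

```python
# Sketch: the legacy helper still returns a usable state but now warns.
import warnings

from src.agents.state import init_magentic_state

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    state = init_magentic_state()  # same behavior as before the refactor
    assert any(issubclass(w.category, DeprecationWarning) for w in caught)
```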
src/agents/thinking.py ADDED
@@ -0,0 +1,148 @@
1
+ """Thinking agent for generating observations and reflections.
2
+
3
+ Converts the folder/thinking_agent.py implementation to use Pydantic AI.
4
+ """
5
+
6
+ from datetime import datetime
7
+ from typing import Any
8
+
9
+ import structlog
10
+ from pydantic_ai import Agent
11
+
12
+ from src.agent_factory.judges import get_model
13
+ from src.utils.exceptions import ConfigurationError
14
+
15
+ logger = structlog.get_logger()
16
+
17
+
18
+ # System prompt for the thinking agent
19
+ SYSTEM_PROMPT = f"""
20
+ You are a research expert who is managing a research process in iterations. Today's date is {datetime.now().strftime("%Y-%m-%d")}.
21
+
22
+ You are given:
23
+ 1. The original research query along with some supporting background context
24
+ 2. A history of the tasks, actions, findings and thoughts you've made up until this point in the research process (on iteration 1 you will be at the start of the research process, so this will be empty)
25
+
26
+ Your objective is to reflect on the research process so far and share your latest thoughts.
27
+
28
+ Specifically, your thoughts should include reflections on questions such as:
29
+ - What have you learned from the last iteration?
30
+ - What new areas would you like to explore next, or existing topics you'd like to go deeper into?
31
+ - Were you able to retrieve the information you were looking for in the last iteration?
32
+ - If not, should we change our approach or move to the next topic?
33
+ - Is there any info that is contradictory or conflicting?
34
+
35
+ Guidelines:
36
+ - Share your stream of consciousness on the above questions as raw text
37
+ - Keep your response concise and informal
38
+ - Focus most of your thoughts on the most recent iteration and how that influences this next iteration
39
+ - Our aim is to do very deep and thorough research - bear this in mind when reflecting on the research process
40
+ - DO NOT produce a draft of the final report. This is not your job.
41
+ - If this is the first iteration (i.e. no data from prior iterations), provide thoughts on what info we need to gather in the first iteration to get started
42
+ """
43
+
44
+
45
+ class ThinkingAgent:
46
+ """
47
+ Agent that generates observations and reflections on the research process.
48
+
49
+ Uses Pydantic AI to generate unstructured text observations about
50
+ the current state of research and next steps.
51
+ """
52
+
53
+ def __init__(self, model: Any | None = None) -> None:
54
+ """
55
+ Initialize the thinking agent.
56
+
57
+ Args:
58
+ model: Optional Pydantic AI model. If None, uses config default.
59
+ """
60
+ self.model = model or get_model()
61
+ self.logger = logger
62
+
63
+ # Initialize Pydantic AI Agent (no structured output - returns text)
64
+ self.agent = Agent(
65
+ model=self.model,
66
+ system_prompt=SYSTEM_PROMPT,
67
+ retries=3,
68
+ )
69
+
70
+ async def generate_observations(
71
+ self,
72
+ query: str,
73
+ background_context: str = "",
74
+ conversation_history: str = "",
75
+ iteration: int = 1,
76
+ ) -> str:
77
+ """
78
+ Generate observations about the research process.
79
+
80
+ Args:
81
+ query: The original research query
82
+ background_context: Optional background context
83
+ conversation_history: History of actions, findings, and thoughts
84
+ iteration: Current iteration number
85
+
86
+ Returns:
87
+ String containing observations and reflections
88
+
89
+ Raises:
90
+ ConfigurationError: If generation fails
91
+ """
92
+ self.logger.info(
93
+ "Generating observations",
94
+ query=query[:100],
95
+ iteration=iteration,
96
+ )
97
+
98
+ background = f"BACKGROUND CONTEXT:\n{background_context}" if background_context else ""
99
+
100
+ user_message = f"""
101
+ You are starting iteration {iteration} of your research process.
102
+
103
+ ORIGINAL QUERY:
104
+ {query}
105
+
106
+ {background}
107
+
108
+ HISTORY OF ACTIONS, FINDINGS AND THOUGHTS:
109
+ {conversation_history or "No previous actions, findings or thoughts available."}
110
+ """
111
+
112
+ try:
113
+ # Run the agent
114
+ result = await self.agent.run(user_message)
115
+ observations = result.output
116
+
117
+ self.logger.info("Observations generated", length=len(observations))
118
+
119
+ return observations
120
+
121
+ except Exception as e:
122
+ self.logger.error("Observation generation failed", error=str(e))
123
+ # Return fallback observations
124
+ return f"Starting iteration {iteration}. Need to gather information about: {query}"
125
+
126
+
127
+ def create_thinking_agent(model: Any | None = None) -> ThinkingAgent:
128
+ """
129
+ Factory function to create a thinking agent.
130
+
131
+ Args:
132
+ model: Optional Pydantic AI model. If None, uses settings default.
133
+
134
+ Returns:
135
+ Configured ThinkingAgent instance
136
+
137
+ Raises:
138
+ ConfigurationError: If required API keys are missing
139
+ """
140
+ try:
141
+ if model is None:
142
+ model = get_model()
143
+
144
+ return ThinkingAgent(model=model)
145
+
146
+ except Exception as e:
147
+ logger.error("Failed to create thinking agent", error=str(e))
148
+ raise ConfigurationError(f"Failed to create thinking agent: {e}") from e
src/agents/tool_selector.py ADDED
@@ -0,0 +1,168 @@
1
+ """Tool selector agent for choosing which tools to use for knowledge gaps.
2
+
3
+ Converts the folder/tool_selector_agent.py implementation to use Pydantic AI.
4
+ """
5
+
6
+ from datetime import datetime
7
+ from typing import Any
8
+
9
+ import structlog
10
+ from pydantic_ai import Agent
11
+
12
+ from src.agent_factory.judges import get_model
13
+ from src.utils.exceptions import ConfigurationError
14
+ from src.utils.models import AgentSelectionPlan
15
+
16
+ logger = structlog.get_logger()
17
+
18
+
19
+ # System prompt for the tool selector agent
20
+ SYSTEM_PROMPT = f"""
21
+ You are a Tool Selector responsible for determining which specialized agents should address a knowledge gap in a research project.
22
+ Today's date is {datetime.now().strftime("%Y-%m-%d")}.
23
+
24
+ You will be given:
25
+ 1. The original user query
26
+ 2. A knowledge gap identified in the research
27
+ 3. A full history of the tasks, actions, findings and thoughts you've made up until this point in the research process
28
+
29
+ Your task is to decide:
30
+ 1. Which specialized agents are best suited to address the gap
31
+ 2. What specific queries should be given to the agents (keep this short - 3-6 words)
32
+
33
+ Available specialized agents:
34
+ - WebSearchAgent: General web search for broad topics (can be called multiple times with different queries)
35
+ - SiteCrawlerAgent: Crawl the pages of a specific website to retrieve information about it - use this if you want to find out something about a particular company, entity or product
36
+ - RAGAgent: Semantic search within previously collected evidence - use when you need to find information from evidence already gathered in this research session. Best for finding connections, summarizing collected evidence, or retrieving specific details from earlier findings.
37
+
38
+ Guidelines:
39
+ - Aim to call at most 3 agents at a time in your final output
40
+ - You can list the WebSearchAgent multiple times with different queries if needed to cover the full scope of the knowledge gap
41
+ - Be specific and concise (3-6 words) with the agent queries - they should target exactly what information is needed
42
+ - If you know the website or domain name of an entity being researched, always include it in the query
43
+ - Use RAGAgent when: (1) You need to search within evidence already collected, (2) You want to find connections between different findings, (3) You need to retrieve specific details from earlier research iterations
44
+ - Use WebSearchAgent or SiteCrawlerAgent when: (1) You need fresh information from the web, (2) You're starting a new research direction, (3) You need information not yet in the collected evidence
45
+ - If a gap doesn't clearly match any agent's capability, default to the WebSearchAgent
46
+ - Use the history of actions / tool calls as a guide - try not to repeat yourself if an approach didn't work previously
47
+
48
+ Only output JSON. Follow the JSON schema for AgentSelectionPlan. Do not output anything else.
49
+ """
50
+
51
+
52
+ class ToolSelectorAgent:
53
+ """
54
+ Agent that selects appropriate tools to address knowledge gaps.
55
+
56
+ Uses Pydantic AI to generate structured AgentSelectionPlan with
57
+ specific tasks for web search and crawl agents.
58
+ """
59
+
60
+ def __init__(self, model: Any | None = None) -> None:
61
+ """
62
+ Initialize the tool selector agent.
63
+
64
+ Args:
65
+ model: Optional Pydantic AI model. If None, uses config default.
66
+ """
67
+ self.model = model or get_model()
68
+ self.logger = logger
69
+
70
+ # Initialize Pydantic AI Agent
71
+ self.agent = Agent(
72
+ model=self.model,
73
+ output_type=AgentSelectionPlan,
74
+ system_prompt=SYSTEM_PROMPT,
75
+ retries=3,
76
+ )
77
+
78
+ async def select_tools(
79
+ self,
80
+ gap: str,
81
+ query: str,
82
+ background_context: str = "",
83
+ conversation_history: str = "",
84
+ ) -> AgentSelectionPlan:
85
+ """
86
+ Select tools to address a knowledge gap.
87
+
88
+ Args:
89
+ gap: The knowledge gap to address
90
+ query: The original research query
91
+ background_context: Optional background context
92
+ conversation_history: History of actions, findings, and thoughts
93
+
94
+ Returns:
95
+ AgentSelectionPlan with tasks for selected agents
96
+
97
+ Raises:
98
+ ConfigurationError: If selection fails
99
+ """
100
+ self.logger.info("Selecting tools for gap", gap=gap[:100], query=query[:100])
101
+
102
+ background = f"BACKGROUND CONTEXT:\n{background_context}" if background_context else ""
103
+
104
+ user_message = f"""
105
+ ORIGINAL QUERY:
106
+ {query}
107
+
108
+ KNOWLEDGE GAP TO ADDRESS:
109
+ {gap}
110
+
111
+ {background}
112
+
113
+ HISTORY OF ACTIONS, FINDINGS AND THOUGHTS:
114
+ {conversation_history or "No previous actions, findings or thoughts available."}
115
+ """
116
+
117
+ try:
118
+ # Run the agent
119
+ result = await self.agent.run(user_message)
120
+ selection_plan = result.output
121
+
122
+ self.logger.info(
123
+ "Tool selection complete",
124
+ tasks_count=len(selection_plan.tasks),
125
+ agents=[task.agent for task in selection_plan.tasks],
126
+ )
127
+
128
+ return selection_plan
129
+
130
+ except Exception as e:
131
+ self.logger.error("Tool selection failed", error=str(e))
132
+ # Return fallback: use web search
133
+ from src.utils.models import AgentTask
134
+
135
+ return AgentSelectionPlan(
136
+ tasks=[
137
+ AgentTask(
138
+ gap=gap,
139
+ agent="WebSearchAgent",
140
+ query=gap[:50], # Use gap as query
141
+ entity_website=None,
142
+ )
143
+ ]
144
+ )
145
+
146
+
147
+ def create_tool_selector_agent(model: Any | None = None) -> ToolSelectorAgent:
148
+ """
149
+ Factory function to create a tool selector agent.
150
+
151
+ Args:
152
+ model: Optional Pydantic AI model. If None, uses settings default.
153
+
154
+ Returns:
155
+ Configured ToolSelectorAgent instance
156
+
157
+ Raises:
158
+ ConfigurationError: If required API keys are missing
159
+ """
160
+ try:
161
+ if model is None:
162
+ model = get_model()
163
+
164
+ return ToolSelectorAgent(model=model)
165
+
166
+ except Exception as e:
167
+ logger.error("Failed to create tool selector agent", error=str(e))
168
+ raise ConfigurationError(f"Failed to create tool selector agent: {e}") from e
src/agents/writer.py ADDED
@@ -0,0 +1,209 @@
1
+ """Writer agent for generating final reports from findings.
2
+
3
+ Converts the folder/writer_agent.py implementation to use Pydantic AI.
4
+ """
5
+
6
+ from datetime import datetime
7
+ from typing import Any
8
+
9
+ import structlog
10
+ from pydantic_ai import Agent
11
+
12
+ from src.agent_factory.judges import get_model
13
+ from src.utils.exceptions import ConfigurationError
14
+
15
+ logger = structlog.get_logger()
16
+
17
+
18
+ # System prompt for the writer agent
19
+ SYSTEM_PROMPT = f"""
20
+ You are a senior researcher tasked with comprehensively answering a research query.
21
+ Today's date is {datetime.now().strftime('%Y-%m-%d')}.
22
+ You will be provided with the original query along with research findings put together by a research assistant.
23
+ Your objective is to generate the final response in markdown format.
24
+ The response should be as lengthy and detailed as possible with the information provided, focusing on answering the original query.
25
+ In your final output, include references to the source URLs for all information and data gathered.
26
+ This should be formatted in the form of a numbered square bracket next to the relevant information,
27
+ followed by a list of URLs at the end of the response, per the example below.
28
+
29
+ EXAMPLE REFERENCE FORMAT:
30
+ The company has XYZ products [1]. It operates in the software services market which is expected to grow at 10% per year [2].
31
+
32
+ References:
33
+ [1] https://example.com/first-source-url
34
+ [2] https://example.com/second-source-url
35
+
36
+ GUIDELINES:
37
+ * Answer the query directly, do not include unrelated or tangential information.
38
+ * Adhere to any instructions on the length of your final response if provided in the user prompt.
39
+ * If any additional guidelines are provided in the user prompt, follow them exactly and give them precedence over these system instructions.
40
+ """
41
+
42
+
43
+ class WriterAgent:
44
+ """
45
+ Agent that generates final reports from research findings.
46
+
47
+ Uses Pydantic AI to generate markdown reports with citations.
48
+ """
49
+
50
+ def __init__(self, model: Any | None = None) -> None:
51
+ """
52
+ Initialize the writer agent.
53
+
54
+ Args:
55
+ model: Optional Pydantic AI model. If None, uses config default.
56
+ """
57
+ self.model = model or get_model()
58
+ self.logger = logger
59
+
60
+ # Initialize Pydantic AI Agent (no structured output - returns markdown text)
61
+ self.agent = Agent(
62
+ model=self.model,
63
+ system_prompt=SYSTEM_PROMPT,
64
+ retries=3,
65
+ )
66
+
67
+ async def write_report(
68
+ self,
69
+ query: str,
70
+ findings: str,
71
+ output_length: str = "",
72
+ output_instructions: str = "",
73
+ ) -> str:
74
+ """
75
+ Write a final report from findings.
76
+
77
+ Args:
78
+ query: The original research query
79
+ findings: All findings collected during research
80
+ output_length: Optional description of desired output length
81
+ output_instructions: Optional additional instructions
82
+
83
+ Returns:
84
+ Markdown formatted report string
85
+
86
+ Raises:
87
+ ConfigurationError: If writing fails
88
+ """
89
+ # Input validation
90
+ if not query or not query.strip():
91
+ self.logger.warning("Empty query provided, using default")
92
+ query = "Research query"
93
+
94
+ if findings is None:
95
+ self.logger.warning("None findings provided, using empty string")
96
+ findings = "No findings available."
97
+
98
+ # Truncate very long inputs to prevent context overflow
99
+ max_findings_length = 50000 # ~12k tokens
100
+ if len(findings) > max_findings_length:
101
+ self.logger.warning(
102
+ "Findings too long, truncating",
103
+ original_length=len(findings),
104
+ truncated_length=max_findings_length,
105
+ )
106
+ findings = findings[:max_findings_length] + "\n\n[Content truncated due to length]"
107
+
108
+ self.logger.info("Writing final report", query=query[:100], findings_length=len(findings))
109
+
110
+ length_str = (
111
+ f"* The full response should be approximately {output_length}.\n"
112
+ if output_length
113
+ else ""
114
+ )
115
+ instructions_str = f"* {output_instructions}" if output_instructions else ""
116
+ guidelines_str = (
117
+ ("\n\nGUIDELINES:\n" + length_str + instructions_str).strip("\n")
118
+ if length_str or instructions_str
119
+ else ""
120
+ )
121
+
122
+ user_message = f"""
123
+ Provide a response based on the query and findings below with as much detail as possible. {guidelines_str}
124
+
125
+ QUERY: {query}
126
+
127
+ FINDINGS:
128
+ {findings}
129
+ """
130
+
131
+ # Retry logic for transient failures
132
+ max_retries = 3
133
+ last_exception: Exception | None = None
134
+
135
+ for attempt in range(max_retries):
136
+ try:
137
+ # Run the agent
138
+ result = await self.agent.run(user_message)
139
+ report = result.output
140
+
141
+ # Validate output
142
+ if not report or not report.strip():
143
+ self.logger.warning("Empty report generated, using fallback")
144
+ raise ValueError("Empty report generated")
145
+
146
+ self.logger.info("Report written", length=len(report), attempt=attempt + 1)
147
+
148
+ return report
149
+
150
+ except (TimeoutError, ConnectionError) as e:
151
+ # Transient errors - retry
152
+ last_exception = e
153
+ if attempt < max_retries - 1:
154
+ self.logger.warning(
155
+ "Transient error, retrying",
156
+ error=str(e),
157
+ attempt=attempt + 1,
158
+ max_retries=max_retries,
159
+ )
160
+ continue
161
+ else:
162
+ self.logger.error("Max retries exceeded for transient error", error=str(e))
163
+ break
164
+
165
+ except Exception as e:
166
+ # Non-transient errors - don't retry
167
+ last_exception = e
168
+ self.logger.error(
169
+ "Report writing failed", error=str(e), error_type=type(e).__name__
170
+ )
171
+ break
172
+
173
+ # Return fallback report if all attempts failed
174
+ self.logger.error(
175
+ "Report writing failed after all attempts",
176
+ error=str(last_exception) if last_exception else "Unknown error",
177
+ )
178
+ # Truncate findings in fallback if too long
179
+ fallback_findings = findings[:500] + "..." if len(findings) > 500 else findings
180
+ return (
181
+ f"# Research Report\n\n"
182
+ f"## Query\n{query}\n\n"
183
+ f"## Findings\n{fallback_findings}\n\n"
184
+ f"*Note: Report generation encountered an error. This is a fallback report.*"
185
+ )
186
+
187
+
188
+ def create_writer_agent(model: Any | None = None) -> WriterAgent:
189
+ """
190
+ Factory function to create a writer agent.
191
+
192
+ Args:
193
+ model: Optional Pydantic AI model. If None, uses settings default.
194
+
195
+ Returns:
196
+ Configured WriterAgent instance
197
+
198
+ Raises:
199
+ ConfigurationError: If required API keys are missing
200
+ """
201
+ try:
202
+ if model is None:
203
+ model = get_model()
204
+
205
+ return WriterAgent(model=model)
206
+
207
+ except Exception as e:
208
+ logger.error("Failed to create writer agent", error=str(e))
209
+ raise ConfigurationError(f"Failed to create writer agent: {e}") from e
src/{orchestrator.py → legacy_orchestrator.py} RENAMED
File without changes
src/middleware/__init__.py ADDED
@@ -0,0 +1,33 @@
1
+ """Middleware for workflow state management, parallel loop coordination, and budget tracking.
2
+
3
+ This module provides:
4
+ - WorkflowState: Thread-safe state management using ContextVar
5
+ - WorkflowManager: Coordination of parallel research loops
6
+ - BudgetTracker: Token, time, and iteration budget tracking
7
+ """
8
+
9
+ from src.middleware.budget_tracker import BudgetStatus, BudgetTracker
10
+ from src.middleware.state_machine import (
11
+ WorkflowState,
12
+ get_workflow_state,
13
+ init_workflow_state,
14
+ )
15
+ from src.middleware.workflow_manager import (
16
+ LoopStatus,
17
+ ResearchLoop,
18
+ WorkflowManager,
19
+ )
20
+
21
+ __all__ = [
22
+ # State management
23
+ "WorkflowState",
24
+ "init_workflow_state",
25
+ "get_workflow_state",
26
+ # Workflow management
27
+ "WorkflowManager",
28
+ "ResearchLoop",
29
+ "LoopStatus",
30
+ # Budget tracking
31
+ "BudgetTracker",
32
+ "BudgetStatus",
33
+ ]
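A minimal sketch of the three middleware pieces wired together through the package exports above; limits are illustrative.

```python
# Sketch: wiring the middleware exports together at a high level.
from src.middleware import BudgetTracker, WorkflowManager, init_workflow_state

state = init_workflow_state()   # ContextVar-backed, isolated per request
manager = WorkflowManager()     # coordinates parallel research loops
tracker = BudgetTracker()       # enforces token/time/iteration limits
tracker.create_budget("loop-1", tokens_limit=50_000, iterations_limit=5)
```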
src/middleware/budget_tracker.py ADDED
@@ -0,0 +1,390 @@
1
+ """Budget tracking for research loops.
2
+
3
+ Tracks token usage, time elapsed, and iteration counts per loop and globally.
4
+ Enforces budget constraints to prevent infinite loops and excessive resource usage.
5
+ """
6
+
7
+ import time
8
+
9
+ import structlog
10
+ from pydantic import BaseModel, Field
11
+
12
+ logger = structlog.get_logger()
13
+
14
+
15
+ class BudgetStatus(BaseModel):
16
+ """Status of a budget (tokens, time, iterations)."""
17
+
18
+ tokens_used: int = Field(default=0, description="Total tokens used")
19
+ tokens_limit: int = Field(default=100000, description="Token budget limit", ge=0)
20
+ time_elapsed_seconds: float = Field(default=0.0, description="Time elapsed", ge=0.0)
21
+ time_limit_seconds: float = Field(
22
+ default=600.0, description="Time budget limit (10 min default)", ge=0.0
23
+ )
24
+ iterations: int = Field(default=0, description="Number of iterations completed", ge=0)
25
+ iterations_limit: int = Field(default=10, description="Maximum iterations", ge=1)
26
+ iteration_tokens: dict[int, int] = Field(
27
+ default_factory=dict,
28
+ description="Tokens used per iteration (iteration number -> token count)",
29
+ )
30
+
31
+ def is_exceeded(self) -> bool:
32
+ """Check if any budget limit has been exceeded.
33
+
34
+ Returns:
35
+ True if any limit is exceeded, False otherwise.
36
+ """
37
+ return (
38
+ self.tokens_used >= self.tokens_limit
39
+ or self.time_elapsed_seconds >= self.time_limit_seconds
40
+ or self.iterations >= self.iterations_limit
41
+ )
42
+
43
+ def remaining_tokens(self) -> int:
44
+ """Get remaining token budget.
45
+
46
+ Returns:
47
+ Remaining tokens (may be negative if exceeded).
48
+ """
49
+ return self.tokens_limit - self.tokens_used
50
+
51
+ def remaining_time_seconds(self) -> float:
52
+ """Get remaining time budget.
53
+
54
+ Returns:
55
+ Remaining time in seconds (may be negative if exceeded).
56
+ """
57
+ return self.time_limit_seconds - self.time_elapsed_seconds
58
+
59
+ def remaining_iterations(self) -> int:
60
+ """Get remaining iteration budget.
61
+
62
+ Returns:
63
+ Remaining iterations (may be negative if exceeded).
64
+ """
65
+ return self.iterations_limit - self.iterations
66
+
67
+ def add_iteration_tokens(self, iteration: int, tokens: int) -> None:
68
+ """Add tokens for a specific iteration.
69
+
70
+ Args:
71
+ iteration: Iteration number (1-indexed).
72
+ tokens: Number of tokens to add.
73
+ """
74
+ if iteration not in self.iteration_tokens:
75
+ self.iteration_tokens[iteration] = 0
76
+ self.iteration_tokens[iteration] += tokens
77
+ # Also add to total tokens
78
+ self.tokens_used += tokens
79
+
80
+ def get_iteration_tokens(self, iteration: int) -> int:
81
+ """Get tokens used for a specific iteration.
82
+
83
+ Args:
84
+ iteration: Iteration number.
85
+
86
+ Returns:
87
+ Token count for the iteration, or 0 if not found.
88
+ """
89
+ return self.iteration_tokens.get(iteration, 0)
90
+
91
+
92
+ class BudgetTracker:
93
+ """Tracks budgets per loop and globally."""
94
+
95
+ def __init__(self) -> None:
96
+ """Initialize the budget tracker."""
97
+ self._budgets: dict[str, BudgetStatus] = {}
98
+ self._start_times: dict[str, float] = {}
99
+ self._global_budget: BudgetStatus | None = None
100
+
101
+ def create_budget(
102
+ self,
103
+ loop_id: str,
104
+ tokens_limit: int = 100000,
105
+ time_limit_seconds: float = 600.0,
106
+ iterations_limit: int = 10,
107
+ ) -> BudgetStatus:
108
+ """Create a budget for a specific loop.
109
+
110
+ Args:
111
+ loop_id: Unique identifier for the loop.
112
+ tokens_limit: Maximum tokens allowed.
113
+ time_limit_seconds: Maximum time allowed in seconds.
114
+ iterations_limit: Maximum iterations allowed.
115
+
116
+ Returns:
117
+ The created BudgetStatus instance.
118
+ """
119
+ budget = BudgetStatus(
120
+ tokens_limit=tokens_limit,
121
+ time_limit_seconds=time_limit_seconds,
122
+ iterations_limit=iterations_limit,
123
+ )
124
+ self._budgets[loop_id] = budget
125
+ logger.debug(
126
+ "Budget created",
127
+ loop_id=loop_id,
128
+ tokens_limit=tokens_limit,
129
+ time_limit=time_limit_seconds,
130
+ iterations_limit=iterations_limit,
131
+ )
132
+ return budget
133
+
134
+ def get_budget(self, loop_id: str) -> BudgetStatus | None:
135
+ """Get the budget for a specific loop.
136
+
137
+ Args:
138
+ loop_id: Unique identifier for the loop.
139
+
140
+ Returns:
141
+ The BudgetStatus instance, or None if not found.
142
+ """
143
+ return self._budgets.get(loop_id)
144
+
145
+ def add_tokens(self, loop_id: str, tokens: int) -> None:
146
+ """Add tokens to a loop's budget.
147
+
148
+ Args:
149
+ loop_id: Unique identifier for the loop.
150
+ tokens: Number of tokens to add (can be negative).
151
+ """
152
+ if loop_id not in self._budgets:
153
+ logger.warning("Budget not found for loop", loop_id=loop_id)
154
+ return
155
+ self._budgets[loop_id].tokens_used += tokens
156
+ logger.debug("Tokens added", loop_id=loop_id, tokens=tokens)
157
+
158
+ def add_iteration_tokens(self, loop_id: str, iteration: int, tokens: int) -> None:
159
+ """Add tokens for a specific iteration.
160
+
161
+ Args:
162
+ loop_id: Loop identifier.
163
+ iteration: Iteration number (1-indexed).
164
+ tokens: Number of tokens to add.
165
+ """
166
+ if loop_id not in self._budgets:
167
+ logger.warning("Budget not found for loop", loop_id=loop_id)
168
+ return
169
+
170
+ budget = self._budgets[loop_id]
171
+ budget.add_iteration_tokens(iteration, tokens)
172
+
173
+ logger.debug(
174
+ "Iteration tokens added",
175
+ loop_id=loop_id,
176
+ iteration=iteration,
177
+ tokens=tokens,
178
+ total_iteration=budget.get_iteration_tokens(iteration),
179
+ )
180
+
181
+ def get_iteration_tokens(self, loop_id: str, iteration: int) -> int:
182
+ """Get tokens used for a specific iteration.
183
+
184
+ Args:
185
+ loop_id: Loop identifier.
186
+ iteration: Iteration number.
187
+
188
+ Returns:
189
+ Token count for the iteration, or 0 if not found.
190
+ """
191
+ if loop_id not in self._budgets:
192
+ return 0
193
+
194
+ return self._budgets[loop_id].get_iteration_tokens(iteration)
195
+
196
+ def start_timer(self, loop_id: str) -> None:
197
+ """Start the timer for a loop.
198
+
199
+ Args:
200
+ loop_id: Unique identifier for the loop.
201
+ """
202
+ self._start_times[loop_id] = time.time()
203
+ logger.debug("Timer started", loop_id=loop_id)
204
+
205
+ def update_timer(self, loop_id: str) -> None:
206
+ """Update the elapsed time for a loop.
207
+
208
+ Args:
209
+ loop_id: Unique identifier for the loop.
210
+ """
211
+ if loop_id not in self._start_times:
212
+ logger.warning("Timer not started for loop", loop_id=loop_id)
213
+ return
214
+ if loop_id not in self._budgets:
215
+ logger.warning("Budget not found for loop", loop_id=loop_id)
216
+ return
217
+
218
+ elapsed = time.time() - self._start_times[loop_id]
219
+ self._budgets[loop_id].time_elapsed_seconds = elapsed
220
+ logger.debug("Timer updated", loop_id=loop_id, elapsed=elapsed)
221
+
222
+ def increment_iteration(self, loop_id: str) -> None:
223
+ """Increment the iteration count for a loop.
224
+
225
+ Args:
226
+ loop_id: Unique identifier for the loop.
227
+ """
228
+ if loop_id not in self._budgets:
229
+ logger.warning("Budget not found for loop", loop_id=loop_id)
230
+ return
231
+ self._budgets[loop_id].iterations += 1
232
+ logger.debug(
233
+ "Iteration incremented",
234
+ loop_id=loop_id,
235
+ iterations=self._budgets[loop_id].iterations,
236
+ )
237
+
238
+ def check_budget(self, loop_id: str) -> tuple[bool, str]:
239
+ """Check if a loop's budget has been exceeded.
240
+
241
+ Args:
242
+ loop_id: Unique identifier for the loop.
243
+
244
+ Returns:
245
+ Tuple of (exceeded: bool, reason: str). Reason is empty if not exceeded.
246
+ """
247
+ if loop_id not in self._budgets:
248
+ return False, ""
249
+
250
+ budget = self._budgets[loop_id]
251
+ self.update_timer(loop_id) # Update time before checking
252
+
253
+ if budget.is_exceeded():
254
+ reasons = []
255
+ if budget.tokens_used >= budget.tokens_limit:
256
+ reasons.append("tokens")
257
+ if budget.time_elapsed_seconds >= budget.time_limit_seconds:
258
+ reasons.append("time")
259
+ if budget.iterations >= budget.iterations_limit:
260
+ reasons.append("iterations")
261
+ reason = f"Budget exceeded: {', '.join(reasons)}"
262
+ logger.warning("Budget exceeded", loop_id=loop_id, reason=reason)
263
+ return True, reason
264
+
265
+ return False, ""
266
+
267
+ def can_continue(self, loop_id: str) -> bool:
268
+ """Check if a loop can continue based on budget.
269
+
270
+ Args:
271
+ loop_id: Unique identifier for the loop.
272
+
273
+ Returns:
274
+ True if the loop can continue, False if budget is exceeded.
275
+ """
276
+ exceeded, _ = self.check_budget(loop_id)
277
+ return not exceeded
278
+
279
+ def get_budget_summary(self, loop_id: str) -> str:
280
+ """Get a formatted summary of a loop's budget status.
281
+
282
+ Args:
283
+ loop_id: Unique identifier for the loop.
284
+
285
+ Returns:
286
+ Formatted string summary.
287
+ """
288
+ if loop_id not in self._budgets:
289
+ return f"Budget not found for loop: {loop_id}"
290
+
291
+ budget = self._budgets[loop_id]
292
+ self.update_timer(loop_id)
293
+
294
+ return (
295
+ f"Loop {loop_id}: "
296
+ f"Tokens: {budget.tokens_used}/{budget.tokens_limit} "
297
+ f"({budget.remaining_tokens()} remaining), "
298
+ f"Time: {budget.time_elapsed_seconds:.1f}/{budget.time_limit_seconds:.1f}s "
299
+ f"({budget.remaining_time_seconds():.1f}s remaining), "
300
+ f"Iterations: {budget.iterations}/{budget.iterations_limit} "
301
+ f"({budget.remaining_iterations()} remaining)"
302
+ )
303
+
304
+ def reset_budget(self, loop_id: str) -> None:
305
+ """Reset the budget for a loop.
306
+
307
+ Args:
308
+ loop_id: Unique identifier for the loop.
309
+ """
310
+ if loop_id in self._budgets:
311
+ old_budget = self._budgets[loop_id]
312
+ # Preserve iteration_tokens when resetting
313
+ old_iteration_tokens = old_budget.iteration_tokens
314
+ self._budgets[loop_id] = BudgetStatus(
315
+ tokens_limit=old_budget.tokens_limit,
316
+ time_limit_seconds=old_budget.time_limit_seconds,
317
+ iterations_limit=old_budget.iterations_limit,
318
+ iteration_tokens=old_iteration_tokens, # Restore old iteration tokens
319
+ )
320
+ if loop_id in self._start_times:
321
+ self._start_times[loop_id] = time.time()
322
+ logger.debug("Budget reset", loop_id=loop_id)
323
+
324
+ def set_global_budget(
325
+ self,
326
+ tokens_limit: int = 100000,
327
+ time_limit_seconds: float = 600.0,
328
+ iterations_limit: int = 10,
329
+ ) -> None:
330
+ """Set a global budget that applies to all loops.
331
+
332
+ Args:
333
+ tokens_limit: Maximum tokens allowed globally.
334
+ time_limit_seconds: Maximum time allowed in seconds.
335
+ iterations_limit: Maximum iterations allowed globally.
336
+ """
337
+ self._global_budget = BudgetStatus(
338
+ tokens_limit=tokens_limit,
339
+ time_limit_seconds=time_limit_seconds,
340
+ iterations_limit=iterations_limit,
341
+ )
342
+ logger.debug(
343
+ "Global budget set",
344
+ tokens_limit=tokens_limit,
345
+ time_limit=time_limit_seconds,
346
+ iterations_limit=iterations_limit,
347
+ )
348
+
349
+ def get_global_budget(self) -> BudgetStatus | None:
350
+ """Get the global budget.
351
+
352
+ Returns:
353
+ The global BudgetStatus instance, or None if not set.
354
+ """
355
+ return self._global_budget
356
+
357
+ def add_global_tokens(self, tokens: int) -> None:
358
+ """Add tokens to the global budget.
359
+
360
+ Args:
361
+ tokens: Number of tokens to add (can be negative).
362
+ """
363
+ if self._global_budget is None:
364
+ logger.warning("Global budget not set")
365
+ return
366
+ self._global_budget.tokens_used += tokens
367
+ logger.debug("Global tokens added", tokens=tokens)
368
+
369
+ def estimate_tokens(self, text: str) -> int:
370
+ """Estimate token count from text (rough estimate: ~4 chars per token).
371
+
372
+ Args:
373
+ text: Text to estimate tokens for.
374
+
375
+ Returns:
376
+ Estimated token count.
377
+ """
378
+ return len(text) // 4
379
+
380
+ def estimate_llm_call_tokens(self, prompt: str, response: str) -> int:
381
+ """Estimate token count for an LLM call.
382
+
383
+ Args:
384
+ prompt: The prompt text.
385
+ response: The response text.
386
+
387
+ Returns:
388
+ Estimated total token count (prompt + response).
389
+ """
390
+ return self.estimate_tokens(prompt) + self.estimate_tokens(response)
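A minimal sketch of enforcing a per-loop budget around an LLM call; the prompt/response strings are placeholders, and token counts come from the tracker's rough 4-chars-per-token estimate.

```python
# Sketch: one budgeted iteration. Loop ID and limits are illustrative.
from src.middleware.budget_tracker import BudgetTracker

tracker = BudgetTracker()
tracker.create_budget(
    "loop-1", tokens_limit=10_000, time_limit_seconds=120.0, iterations_limit=3
)
tracker.start_timer("loop-1")

prompt, response = "summarize evidence on CoQ10 ...", "Three pilot trials ..."
tracker.increment_iteration("loop-1")
tracker.add_iteration_tokens(
    "loop-1", iteration=1, tokens=tracker.estimate_llm_call_tokens(prompt, response)
)

if tracker.can_continue("loop-1"):
    print(tracker.get_budget_summary("loop-1"))
else:
    print("stopping:", tracker.check_budget("loop-1")[1])
```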
src/middleware/state_machine.py ADDED
@@ -0,0 +1,129 @@
1
+ """Thread-safe state management for workflow agents.
2
+
3
+ Uses contextvars to ensure isolation between concurrent requests (e.g., multiple users
4
+ searching simultaneously via Gradio). Refactored from MagenticState to support both
5
+ iterative and deep research patterns.
6
+ """
7
+
8
+ from contextvars import ContextVar
9
+ from typing import TYPE_CHECKING, Any
10
+
11
+ import structlog
12
+ from pydantic import BaseModel, Field
13
+
14
+ from src.utils.models import Citation, Conversation, Evidence
15
+
16
+ if TYPE_CHECKING:
17
+ from src.services.embeddings import EmbeddingService
18
+
19
+ logger = structlog.get_logger()
20
+
21
+
22
+ class WorkflowState(BaseModel):
23
+ """Mutable state for a workflow session.
24
+
25
+ Supports both iterative and deep research patterns by tracking evidence and
26
+ conversation history and by providing semantic search capabilities.
27
+ """
28
+
29
+ evidence: list[Evidence] = Field(default_factory=list)
30
+ conversation: Conversation = Field(default_factory=Conversation)
31
+ # Type as Any to avoid circular imports/runtime resolution issues
32
+ # The actual object injected will be an EmbeddingService instance
33
+ embedding_service: Any = Field(default=None)
34
+
35
+ model_config = {"arbitrary_types_allowed": True}
36
+
37
+ def add_evidence(self, new_evidence: list[Evidence]) -> int:
38
+ """Add new evidence, deduplicating by URL.
39
+
40
+ Args:
41
+ new_evidence: List of Evidence objects to add.
42
+
43
+ Returns:
44
+ Number of *new* items added (excluding duplicates).
45
+ """
46
+ existing_urls = {e.citation.url for e in self.evidence}
47
+ count = 0
48
+ for item in new_evidence:
49
+ if item.citation.url not in existing_urls:
50
+ self.evidence.append(item)
51
+ existing_urls.add(item.citation.url)
52
+ count += 1
53
+ return count
54
+
55
+ async def search_related(self, query: str, n_results: int = 5) -> list[Evidence]:
56
+ """Search for semantically related evidence using the embedding service.
57
+
58
+ Args:
59
+ query: Search query string.
60
+ n_results: Maximum number of results to return.
61
+
62
+ Returns:
63
+ List of Evidence objects, ordered by relevance.
64
+ """
65
+ if not self.embedding_service:
66
+ logger.warning("Embedding service not available, returning empty results")
67
+ return []
68
+
69
+ results = await self.embedding_service.search_similar(query, n_results=n_results)
70
+
71
+ # Convert dict results back to Evidence objects
72
+ evidence_list = []
73
+ for item in results:
74
+ meta = item.get("metadata", {})
75
+ authors_str = meta.get("authors", "")
76
+ authors = [a.strip() for a in authors_str.split(",") if a.strip()]
77
+
78
+ ev = Evidence(
79
+ content=item["content"],
80
+ citation=Citation(
81
+ title=meta.get("title", "Related Evidence"),
82
+ url=item["id"],
83
+ source="pubmed", # Defaulting to pubmed if unknown
84
+ date=meta.get("date", "n.d."),
85
+ authors=authors,
86
+ ),
87
+ relevance=max(0.0, 1.0 - item.get("distance", 0.5)),
88
+ )
89
+ evidence_list.append(ev)
90
+
91
+ return evidence_list
92
+
93
+
94
+ # The ContextVar holds the WorkflowState for the current execution context
95
+ _workflow_state_var: ContextVar[WorkflowState | None] = ContextVar("workflow_state", default=None)
96
+
97
+
98
+ def init_workflow_state(
99
+ embedding_service: "EmbeddingService | None" = None,
100
+ ) -> WorkflowState:
101
+ """Initialize a new state for the current context.
102
+
103
+ Args:
104
+ embedding_service: Optional embedding service for semantic search.
105
+
106
+ Returns:
107
+ The initialized WorkflowState instance.
108
+ """
109
+ state = WorkflowState(embedding_service=embedding_service)
110
+ _workflow_state_var.set(state)
111
+ logger.debug("Workflow state initialized", has_embeddings=embedding_service is not None)
112
+ return state
113
+
114
+
115
+ def get_workflow_state() -> WorkflowState:
116
+ """Get the current state. Auto-initializes if not set.
117
+
118
+ Returns:
119
+ The current WorkflowState instance.
120
+
121
+ Raises:
122
+ RuntimeError: If state is not initialized and auto-initialization fails.
123
+ """
124
+ state = _workflow_state_var.get()
125
+ if state is None:
126
+ # Auto-initialize if missing (e.g. during tests or simple scripts)
127
+ logger.debug("Workflow state not found, auto-initializing")
128
+ return init_workflow_state()
129
+ return state
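A minimal sketch of the URL-deduplicated evidence store; `Citation` field values are placeholders (field names follow the `search_related` construction above), and `search_related` itself would additionally need an embedding service.

```python
# Sketch: context-isolated state with URL-deduplicated evidence.
from src.middleware.state_machine import get_workflow_state
from src.utils.models import Citation, Evidence

state = get_workflow_state()  # auto-initializes for the current context
ev = Evidence(
    content="CoQ10 improved fatigue scores in a small pilot trial.",
    citation=Citation(
        title="CoQ10 pilot trial",
        url="https://example.com/trial",
        source="pubmed",
        date="n.d.",
        authors=["Doe J"],
    ),
    relevance=0.9,
)
assert state.add_evidence([ev, ev]) == 1  # duplicate URL is dropped
```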
src/middleware/workflow_manager.py ADDED
@@ -0,0 +1,322 @@
+"""Workflow manager for coordinating parallel research loops.
+
+Manages multiple research loops running in parallel, tracks their status,
+and synchronizes evidence between loops and the global state.
+"""
+
+import asyncio
+from collections.abc import Callable
+from typing import Any, Literal
+
+import structlog
+from pydantic import BaseModel, Field
+
+from src.middleware.state_machine import get_workflow_state
+from src.utils.models import Evidence
+
+logger = structlog.get_logger()
+
+LoopStatus = Literal["pending", "running", "completed", "failed", "cancelled"]
+
+
+class ResearchLoop(BaseModel):
+    """Represents a single research loop."""
+
+    loop_id: str = Field(description="Unique identifier for the loop")
+    query: str = Field(description="The research query for this loop")
+    status: LoopStatus = Field(default="pending")
+    evidence: list[Evidence] = Field(default_factory=list)
+    iteration_count: int = Field(default=0, ge=0)
+    error: str | None = Field(default=None)
+
+    model_config = {"frozen": False}  # Mutable for status updates
+
+
+class WorkflowManager:
+    """Manages parallel research loops and state synchronization."""
+
+    def __init__(self) -> None:
+        """Initialize the workflow manager."""
+        self._loops: dict[str, ResearchLoop] = {}
+
+    async def add_loop(self, loop_id: str, query: str) -> ResearchLoop:
+        """Add a new research loop.
+
+        Args:
+            loop_id: Unique identifier for the loop.
+            query: The research query for this loop.
+
+        Returns:
+            The created ResearchLoop instance.
+        """
+        loop = ResearchLoop(loop_id=loop_id, query=query, status="pending")
+        self._loops[loop_id] = loop
+        logger.info("Loop added", loop_id=loop_id, query=query)
+        return loop
+
+    async def get_loop(self, loop_id: str) -> ResearchLoop | None:
+        """Get a research loop by ID.
+
+        Args:
+            loop_id: Unique identifier for the loop.
+
+        Returns:
+            The ResearchLoop instance, or None if not found.
+        """
+        return self._loops.get(loop_id)
+
+    async def update_loop_status(
+        self, loop_id: str, status: LoopStatus, error: str | None = None
+    ) -> None:
+        """Update the status of a research loop.
+
+        Args:
+            loop_id: Unique identifier for the loop.
+            status: New status for the loop.
+            error: Optional error message if status is "failed".
+        """
+        if loop_id not in self._loops:
+            logger.warning("Loop not found", loop_id=loop_id)
+            return
+
+        self._loops[loop_id].status = status
+        if error:
+            self._loops[loop_id].error = error
+        logger.info("Loop status updated", loop_id=loop_id, status=status)
+
+    async def add_loop_evidence(self, loop_id: str, evidence: list[Evidence]) -> None:
+        """Add evidence to a research loop.
+
+        Args:
+            loop_id: Unique identifier for the loop.
+            evidence: List of Evidence objects to add.
+        """
+        if loop_id not in self._loops:
+            logger.warning("Loop not found", loop_id=loop_id)
+            return
+
+        self._loops[loop_id].evidence.extend(evidence)
+        logger.debug(
+            "Evidence added to loop",
+            loop_id=loop_id,
+            evidence_count=len(evidence),
+        )
+
+    async def increment_loop_iteration(self, loop_id: str) -> None:
+        """Increment the iteration count for a research loop.
+
+        Args:
+            loop_id: Unique identifier for the loop.
+        """
+        if loop_id not in self._loops:
+            logger.warning("Loop not found", loop_id=loop_id)
+            return
+
+        self._loops[loop_id].iteration_count += 1
+        logger.debug(
+            "Iteration incremented",
+            loop_id=loop_id,
+            iteration=self._loops[loop_id].iteration_count,
+        )
+
+    async def run_loops_parallel(
+        self,
+        loop_configs: list[dict[str, Any]],
+        loop_func: Callable[[dict[str, Any]], Any],
+        judge_handler: Any | None = None,
+        budget_tracker: Any | None = None,
+    ) -> list[Any]:
+        """Run multiple research loops in parallel.
+
+        Args:
+            loop_configs: List of configuration dicts, each must contain 'loop_id' and 'query'.
+            loop_func: Async function that takes a config dict and returns loop results.
+            judge_handler: Optional JudgeHandler for early termination based on evidence sufficiency.
+            budget_tracker: Optional BudgetTracker for budget enforcement.
+
+        Returns:
+            List of results from each loop (in order of completion, not original order).
+        """
+        logger.info("Starting parallel loops", loop_count=len(loop_configs))
+
+        # Create loops
+        for config in loop_configs:
+            loop_id = config.get("loop_id")
+            query = config.get("query", "")
+            if loop_id:
+                await self.add_loop(loop_id, query)
+                await self.update_loop_status(loop_id, "running")
+
+        # Run loops in parallel
+        async def run_single_loop(config: dict[str, Any]) -> Any:
+            loop_id = config.get("loop_id", "unknown")
+            query = config.get("query", "")
+            try:
+                # Check budget before starting
+                if budget_tracker:
+                    exceeded, reason = budget_tracker.check_budget(loop_id)
+                    if exceeded:
+                        await self.update_loop_status(loop_id, "cancelled", error=reason)
+                        logger.warning(
+                            "Loop cancelled due to budget", loop_id=loop_id, reason=reason
+                        )
+                        return None
+
+                # If loop_func supports periodic checkpoints, we could check judge here
+                # For now, the loop_func itself handles judge checks internally
+                result = await loop_func(config)
+
+                # Final check with judge if available
+                if judge_handler and query:
+                    should_complete, reason = await self.check_loop_completion(
+                        loop_id, query, judge_handler
+                    )
+                    if should_complete:
+                        logger.info(
+                            "Loop completed early based on judge assessment",
+                            loop_id=loop_id,
+                            reason=reason,
+                        )
+
+                await self.update_loop_status(loop_id, "completed")
+                return result
+            except Exception as e:
+                error_msg = str(e)
+                await self.update_loop_status(loop_id, "failed", error=error_msg)
+                logger.error("Loop failed", loop_id=loop_id, error=error_msg)
+                raise
+
+        results = await asyncio.gather(
+            *(run_single_loop(config) for config in loop_configs),
+            return_exceptions=True,
+        )
+
+        # Log completion
+        completed = sum(1 for r in results if not isinstance(r, Exception))
+        failed = len(results) - completed
+        logger.info(
+            "Parallel loops completed",
+            total=len(loop_configs),
+            completed=completed,
+            failed=failed,
+        )
+
+        return results
+
+    async def wait_for_loops(
+        self, loop_ids: list[str], timeout: float | None = None
+    ) -> list[ResearchLoop]:
+        """Wait for loops to complete.
+
+        Args:
+            loop_ids: List of loop IDs to wait for.
+            timeout: Optional timeout in seconds.
+
+        Returns:
+            List of ResearchLoop instances (may be incomplete if timeout occurs).
+        """
+        start_time = asyncio.get_event_loop().time()
+
+        while True:
+            loops = [self._loops.get(loop_id) for loop_id in loop_ids]
+            all_complete = all(
+                loop and loop.status in ("completed", "failed", "cancelled") for loop in loops
+            )
+
+            if all_complete:
+                return [loop for loop in loops if loop is not None]
+
+            if timeout is not None:
+                elapsed = asyncio.get_event_loop().time() - start_time
+                if elapsed >= timeout:
+                    logger.warning("Timeout waiting for loops", timeout=timeout)
+                    return [loop for loop in loops if loop is not None]
+
+            await asyncio.sleep(0.1)  # Small delay to avoid busy waiting
+
+    async def cancel_loop(self, loop_id: str) -> None:
+        """Cancel a research loop.
+
+        Args:
+            loop_id: Unique identifier for the loop.
+        """
+        await self.update_loop_status(loop_id, "cancelled")
+        logger.info("Loop cancelled", loop_id=loop_id)
+
+    async def get_all_loops(self) -> list[ResearchLoop]:
+        """Get all research loops.
+
+        Returns:
+            List of all ResearchLoop instances.
+        """
+        return list(self._loops.values())
+
+    async def sync_loop_evidence_to_state(self, loop_id: str) -> None:
+        """Synchronize evidence from a loop to the global state.
+
+        Args:
+            loop_id: Unique identifier for the loop.
+        """
+        if loop_id not in self._loops:
+            logger.warning("Loop not found", loop_id=loop_id)
+            return
+
+        loop = self._loops[loop_id]
+        state = get_workflow_state()
+        added_count = state.add_evidence(loop.evidence)
+        logger.debug(
+            "Loop evidence synced to state",
+            loop_id=loop_id,
+            evidence_count=len(loop.evidence),
+            added_count=added_count,
+        )
+
+    async def get_shared_evidence(self) -> list[Evidence]:
+        """Get evidence from the global state.
+
+        Returns:
+            List of Evidence objects from the global state.
+        """
+        state = get_workflow_state()
+        return state.evidence
+
+    async def get_loop_evidence(self, loop_id: str) -> list[Evidence]:
+        """Get evidence collected by a specific loop.
+
+        Args:
+            loop_id: Loop identifier.
+
+        Returns:
+            List of Evidence objects from the loop.
+        """
+        if loop_id not in self._loops:
+            return []
+
+        return self._loops[loop_id].evidence
+
+    async def check_loop_completion(
+        self, loop_id: str, query: str, judge_handler: Any
+    ) -> tuple[bool, str]:
+        """Check if a loop should complete using judge assessment.
+
+        Args:
+            loop_id: Loop identifier.
+            query: Research query.
+            judge_handler: JudgeHandler instance.
+
+        Returns:
+            Tuple of (should_complete: bool, reason: str).
+        """
+        evidence = await self.get_loop_evidence(loop_id)
+
+        if not evidence:
+            return False, "No evidence collected yet"
+
+        try:
+            assessment = await judge_handler.assess(query, evidence)
+            if assessment.sufficient:
+                return True, f"Judge assessment: {assessment.reasoning}"
+            return False, f"Judge assessment: {assessment.reasoning}"
+        except Exception as e:
+            logger.error("Judge assessment failed", error=str(e), loop_id=loop_id)
+            return False, f"Judge assessment failed: {e!s}"
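A hedged sketch of driving the manager above. `loop_body` is a hypothetical stand-in for a real research loop, but the config keys ('loop_id', 'query') and the run_loops_parallel signature match the code in this file:

import asyncio
from typing import Any

from src.middleware.workflow_manager import WorkflowManager

async def demo() -> None:
    manager = WorkflowManager()

    async def loop_body(config: dict[str, Any]) -> str:
        # Hypothetical loop body; a real one would search sources and judge evidence.
        return f"findings for {config['query']}"

    results = await manager.run_loops_parallel(
        loop_configs=[
            {"loop_id": "loop-1", "query": "metformin for long COVID fatigue"},
            {"loop_id": "loop-2", "query": "CoQ10 for long COVID fatigue"},
        ],
        loop_func=loop_body,
    )
    print(results)

asyncio.run(demo())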
src/orchestrator/__init__.py ADDED
@@ -0,0 +1,48 @@
+"""Orchestrator module for research flows and planner agent.
+
+This module provides:
+- PlannerAgent: Creates report plans with sections
+- IterativeResearchFlow: Single research loop pattern
+- DeepResearchFlow: Parallel research loops pattern
+- GraphOrchestrator: Stub for Phase 4 (uses agent chains for now)
+- Protocols: SearchHandlerProtocol, JudgeHandlerProtocol (re-exported from legacy_orchestrator)
+- Orchestrator: Legacy orchestrator class (re-exported from legacy_orchestrator)
+"""
+
+from typing import TYPE_CHECKING
+
+# Re-export protocols and Orchestrator from legacy_orchestrator for backward compatibility
+from src.legacy_orchestrator import (
+    JudgeHandlerProtocol,
+    Orchestrator,
+    SearchHandlerProtocol,
+)
+
+# Lazy imports to avoid circular dependencies
+if TYPE_CHECKING:
+    from src.orchestrator.graph_orchestrator import GraphOrchestrator
+    from src.orchestrator.planner_agent import PlannerAgent, create_planner_agent
+    from src.orchestrator.research_flow import (
+        DeepResearchFlow,
+        IterativeResearchFlow,
+    )
+
+# Public exports
+from src.orchestrator.graph_orchestrator import (
+    GraphOrchestrator,
+    create_graph_orchestrator,
+)
+from src.orchestrator.planner_agent import PlannerAgent, create_planner_agent
+from src.orchestrator.research_flow import DeepResearchFlow, IterativeResearchFlow
+
+__all__ = [
+    "PlannerAgent",
+    "create_planner_agent",
+    "IterativeResearchFlow",
+    "DeepResearchFlow",
+    "GraphOrchestrator",
+    "create_graph_orchestrator",
+    "SearchHandlerProtocol",
+    "JudgeHandlerProtocol",
+    "Orchestrator",
+]
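With these re-exports, existing callers can keep importing the legacy names from the package root; a sketch, assuming the imports above resolve:

from src.orchestrator import DeepResearchFlow, IterativeResearchFlow, Orchestrator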
src/orchestrator/graph_orchestrator.py ADDED
@@ -0,0 +1,953 @@
+"""Graph orchestrator for Phase 4.
+
+Implements graph-based orchestration using Pydantic AI agents as nodes.
+Supports both iterative and deep research patterns with parallel execution.
+"""
+
+import asyncio
+from collections.abc import AsyncGenerator, Callable
+from typing import TYPE_CHECKING, Any, Literal
+
+import structlog
+
+from src.agent_factory.agents import (
+    create_input_parser_agent,
+    create_knowledge_gap_agent,
+    create_long_writer_agent,
+    create_planner_agent,
+    create_thinking_agent,
+    create_tool_selector_agent,
+    create_writer_agent,
+)
+from src.agent_factory.graph_builder import (
+    AgentNode,
+    DecisionNode,
+    ParallelNode,
+    ResearchGraph,
+    StateNode,
+    create_deep_graph,
+    create_iterative_graph,
+)
+from src.middleware.budget_tracker import BudgetTracker
+from src.middleware.state_machine import WorkflowState, init_workflow_state
+from src.orchestrator.research_flow import DeepResearchFlow, IterativeResearchFlow
+from src.utils.models import AgentEvent
+
+if TYPE_CHECKING:
+    pass
+
+logger = structlog.get_logger()
+
+
+class GraphExecutionContext:
+    """Context for managing graph execution state."""
+
+    def __init__(self, state: WorkflowState, budget_tracker: BudgetTracker) -> None:
+        """Initialize execution context.
+
+        Args:
+            state: Current workflow state
+            budget_tracker: Budget tracker instance
+        """
+        self.current_node: str = ""
+        self.visited_nodes: set[str] = set()
+        self.node_results: dict[str, Any] = {}
+        self.state = state
+        self.budget_tracker = budget_tracker
+        self.iteration_count = 0
+
+    def set_node_result(self, node_id: str, result: Any) -> None:
+        """Store result from node execution.
+
+        Args:
+            node_id: The node ID
+            result: The execution result
+        """
+        self.node_results[node_id] = result
+
+    def get_node_result(self, node_id: str) -> Any:
+        """Get result from node execution.
+
+        Args:
+            node_id: The node ID
+
+        Returns:
+            The stored result, or None if not found
+        """
+        return self.node_results.get(node_id)
+
+    def has_visited(self, node_id: str) -> bool:
+        """Check if node was visited.
+
+        Args:
+            node_id: The node ID
+
+        Returns:
+            True if visited, False otherwise
+        """
+        return node_id in self.visited_nodes
+
+    def mark_visited(self, node_id: str) -> None:
+        """Mark node as visited.
+
+        Args:
+            node_id: The node ID
+        """
+        self.visited_nodes.add(node_id)
+
+    def update_state(
+        self, updater: Callable[[WorkflowState, Any], WorkflowState], data: Any
+    ) -> None:
+        """Update workflow state.
+
+        Args:
+            updater: Function to update state
+            data: Data to pass to updater
+        """
+        self.state = updater(self.state, data)
+
+
+class GraphOrchestrator:
+    """
+    Graph orchestrator using Pydantic AI Graphs.
+
+    Executes research workflows as graphs with nodes (agents) and edges (transitions).
+    Supports parallel execution, conditional routing, and state management.
+    """
+
+    def __init__(
+        self,
+        mode: Literal["iterative", "deep", "auto"] = "auto",
+        max_iterations: int = 5,
+        max_time_minutes: int = 10,
+        use_graph: bool = True,
+    ) -> None:
+        """
+        Initialize graph orchestrator.
+
+        Args:
+            mode: Research mode ("iterative", "deep", or "auto" to detect)
+            max_iterations: Maximum iterations per loop
+            max_time_minutes: Maximum time per loop
+            use_graph: Whether to use graph execution (True) or agent chains (False)
+        """
+        self.mode = mode
+        self.max_iterations = max_iterations
+        self.max_time_minutes = max_time_minutes
+        self.use_graph = use_graph
+        self.logger = logger
+
+        # Initialize flows (for backward compatibility)
+        self._iterative_flow: IterativeResearchFlow | None = None
+        self._deep_flow: DeepResearchFlow | None = None
+
+        # Graph execution components (lazy initialization)
+        self._graph: ResearchGraph | None = None
+        self._budget_tracker: BudgetTracker | None = None
+
+    async def run(self, query: str) -> AsyncGenerator[AgentEvent, None]:
+        """
+        Run the research workflow.
+
+        Args:
+            query: The user's research query
+
+        Yields:
+            AgentEvent objects for real-time UI updates
+        """
+        self.logger.info(
+            "Starting graph orchestrator",
+            query=query[:100],
+            mode=self.mode,
+            use_graph=self.use_graph,
+        )
+
+        yield AgentEvent(
+            type="started",
+            message=f"Starting research ({self.mode} mode): {query}",
+            iteration=0,
+        )
+
+        try:
+            # Determine research mode
+            research_mode = self.mode
+            if research_mode == "auto":
+                research_mode = await self._detect_research_mode(query)
+
+            # Use graph execution if enabled, otherwise fall back to agent chains
+            if self.use_graph:
+                async for event in self._run_with_graph(query, research_mode):
+                    yield event
+            else:
+                async for event in self._run_with_chains(query, research_mode):
+                    yield event
+
+        except Exception as e:
+            self.logger.error("Graph orchestrator failed", error=str(e), exc_info=True)
+            yield AgentEvent(
+                type="error",
+                message=f"Research failed: {e!s}",
+                iteration=0,
+            )
+
+    async def _run_with_graph(
+        self, query: str, research_mode: Literal["iterative", "deep"]
+    ) -> AsyncGenerator[AgentEvent, None]:
+        """Run workflow using graph execution.
+
+        Args:
+            query: The research query
+            research_mode: The research mode
+
+        Yields:
+            AgentEvent objects
+        """
+        # Initialize state and budget tracker
+        from src.services.embeddings import get_embedding_service
+
+        embedding_service = get_embedding_service()
+        state = init_workflow_state(embedding_service=embedding_service)
+        budget_tracker = BudgetTracker()
+        budget_tracker.create_budget(
+            loop_id="graph_execution",
+            tokens_limit=100000,
+            time_limit_seconds=self.max_time_minutes * 60,
+            iterations_limit=self.max_iterations,
+        )
+        budget_tracker.start_timer("graph_execution")
+
+        context = GraphExecutionContext(state, budget_tracker)
+
+        # Build graph
+        self._graph = await self._build_graph(research_mode)
+
+        # Execute graph
+        async for event in self._execute_graph(query, context):
+            yield event
+
+    async def _run_with_chains(
+        self, query: str, research_mode: Literal["iterative", "deep"]
+    ) -> AsyncGenerator[AgentEvent, None]:
+        """Run workflow using agent chains (backward compatibility).
+
+        Args:
+            query: The research query
+            research_mode: The research mode
+
+        Yields:
+            AgentEvent objects
+        """
+        if research_mode == "iterative":
+            yield AgentEvent(
+                type="searching",
+                message="Running iterative research flow...",
+                iteration=1,
+            )
+
+            if self._iterative_flow is None:
+                self._iterative_flow = IterativeResearchFlow(
+                    max_iterations=self.max_iterations,
+                    max_time_minutes=self.max_time_minutes,
+                )
+
+            final_report = await self._iterative_flow.run(query)
+
+            yield AgentEvent(
+                type="complete",
+                message=final_report,
+                data={"mode": "iterative"},
+                iteration=1,
+            )
+
+        elif research_mode == "deep":
+            yield AgentEvent(
+                type="searching",
+                message="Running deep research flow...",
+                iteration=1,
+            )
+
+            if self._deep_flow is None:
+                self._deep_flow = DeepResearchFlow(
+                    max_iterations=self.max_iterations,
+                    max_time_minutes=self.max_time_minutes,
+                )
+
+            final_report = await self._deep_flow.run(query)
+
+            yield AgentEvent(
+                type="complete",
+                message=final_report,
+                data={"mode": "deep"},
+                iteration=1,
+            )
+
+    async def _build_graph(self, mode: Literal["iterative", "deep"]) -> ResearchGraph:
+        """Build graph for the specified mode.
+
+        Args:
+            mode: Research mode
+
+        Returns:
+            Constructed ResearchGraph
+        """
+        if mode == "iterative":
+            # Get agents
+            knowledge_gap_agent = create_knowledge_gap_agent()
+            tool_selector_agent = create_tool_selector_agent()
+            thinking_agent = create_thinking_agent()
+            writer_agent = create_writer_agent()
+
+            # Create graph
+            graph = create_iterative_graph(
+                knowledge_gap_agent=knowledge_gap_agent.agent,
+                tool_selector_agent=tool_selector_agent.agent,
+                thinking_agent=thinking_agent.agent,
+                writer_agent=writer_agent.agent,
+            )
+        else:  # deep
+            # Get agents
+            planner_agent = create_planner_agent()
+            knowledge_gap_agent = create_knowledge_gap_agent()
+            tool_selector_agent = create_tool_selector_agent()
+            thinking_agent = create_thinking_agent()
+            writer_agent = create_writer_agent()
+            long_writer_agent = create_long_writer_agent()
+
+            # Create graph
+            graph = create_deep_graph(
+                planner_agent=planner_agent.agent,
+                knowledge_gap_agent=knowledge_gap_agent.agent,
+                tool_selector_agent=tool_selector_agent.agent,
+                thinking_agent=thinking_agent.agent,
+                writer_agent=writer_agent.agent,
+                long_writer_agent=long_writer_agent.agent,
+            )
+
+        return graph
+
+    def _emit_start_event(
+        self, node: Any, current_node_id: str, iteration: int, context: GraphExecutionContext
+    ) -> AgentEvent:
+        """Emit start event for a node.
+
+        Args:
+            node: The node being executed
+            current_node_id: Current node ID
+            iteration: Current iteration number
+            context: Execution context
+
+        Returns:
+            AgentEvent for the start of node execution
+        """
+        if node and node.node_id == "planner":
+            return AgentEvent(
+                type="searching",
+                message="Creating report plan...",
+                iteration=iteration,
+            )
+        elif node and node.node_id == "parallel_loops":
+            # Get report plan to show section count
+            report_plan = context.get_node_result("planner")
+            if report_plan and hasattr(report_plan, "report_outline"):
+                section_count = len(report_plan.report_outline)
+                return AgentEvent(
+                    type="looping",
+                    message=f"Running parallel research loops for {section_count} sections...",
+                    iteration=iteration,
+                    data={"sections": section_count},
+                )
+            return AgentEvent(
+                type="looping",
+                message="Running parallel research loops...",
+                iteration=iteration,
+            )
+        elif node and node.node_id == "synthesizer":
+            return AgentEvent(
+                type="synthesizing",
+                message="Synthesizing final report from section drafts...",
+                iteration=iteration,
+            )
+        return AgentEvent(
+            type="looping",
+            message=f"Executing node: {current_node_id}",
+            iteration=iteration,
+        )
+
+    def _emit_completion_event(
+        self, node: Any, current_node_id: str, result: Any, iteration: int
+    ) -> AgentEvent:
+        """Emit completion event for a node.
+
+        Args:
+            node: The node that was executed
+            current_node_id: Current node ID
+            result: Node execution result
+            iteration: Current iteration number
+
+        Returns:
+            AgentEvent for the completion of node execution
+        """
+        if not node:
+            return AgentEvent(
+                type="looping",
+                message=f"Completed node: {current_node_id}",
+                iteration=iteration,
+            )
+
+        if node.node_id == "planner":
+            if isinstance(result, dict) and "report_outline" in result:
+                section_count = len(result["report_outline"])
+                return AgentEvent(
+                    type="search_complete",
+                    message=f"Report plan created with {section_count} sections",
+                    iteration=iteration,
+                    data={"sections": section_count},
+                )
+            return AgentEvent(
+                type="search_complete",
+                message="Report plan created",
+                iteration=iteration,
+            )
+        elif node.node_id == "parallel_loops":
+            if isinstance(result, list):
+                return AgentEvent(
+                    type="search_complete",
+                    message=f"Completed parallel research for {len(result)} sections",
+                    iteration=iteration,
+                    data={"sections_completed": len(result)},
+                )
+            return AgentEvent(
+                type="search_complete",
+                message="Parallel research loops completed",
+                iteration=iteration,
+            )
+        elif node.node_id == "synthesizer":
+            return AgentEvent(
+                type="synthesizing",
+                message="Final report synthesis completed",
+                iteration=iteration,
+            )
+        return AgentEvent(
+            type="searching" if node.node_type == "agent" else "looping",
+            message=f"Completed {node.node_type} node: {current_node_id}",
+            iteration=iteration,
+        )
+
+    async def _execute_graph(
+        self, query: str, context: GraphExecutionContext
+    ) -> AsyncGenerator[AgentEvent, None]:
+        """Execute the graph from entry node.
+
+        Args:
+            query: The research query
+            context: Execution context
+
+        Yields:
+            AgentEvent objects
+        """
+        if not self._graph:
+            raise ValueError("Graph not built")
+
+        current_node_id = self._graph.entry_node
+        iteration = 0
+
+        while current_node_id and current_node_id not in self._graph.exit_nodes:
+            # Check budget
+            if not context.budget_tracker.can_continue("graph_execution"):
+                self.logger.warning("Budget exceeded, exiting graph execution")
+                break
+
+            # Execute current node
+            iteration += 1
+            context.current_node = current_node_id
+            node = self._graph.get_node(current_node_id)
+
+            # Emit start event
+            yield self._emit_start_event(node, current_node_id, iteration, context)
+
+            try:
+                result = await self._execute_node(current_node_id, query, context)
+                context.set_node_result(current_node_id, result)
+                context.mark_visited(current_node_id)
+
+                # Yield completion event
+                yield self._emit_completion_event(node, current_node_id, result, iteration)
+
+            except Exception as e:
+                self.logger.error("Node execution failed", node_id=current_node_id, error=str(e))
+                yield AgentEvent(
+                    type="error",
+                    message=f"Node {current_node_id} failed: {e!s}",
+                    iteration=iteration,
+                )
+                break
+
+            # Get next node(s)
+            next_nodes = self._get_next_node(current_node_id, context)
+
+            if not next_nodes:
+                # No more nodes, check if we're at exit
+                if current_node_id in self._graph.exit_nodes:
+                    break
+                # Otherwise, we've reached a dead end
+                self.logger.warning("Reached dead end in graph", node_id=current_node_id)
+                break
+
+            current_node_id = next_nodes[0]  # For now, take first next node (handle parallel later)
+
+        # Final event
+        final_result = context.get_node_result(current_node_id) if current_node_id else None
+        yield AgentEvent(
+            type="complete",
+            message=final_result if isinstance(final_result, str) else "Research completed",
+            data={"mode": self.mode, "iterations": iteration},
+            iteration=iteration,
+        )
+
+    async def _execute_node(self, node_id: str, query: str, context: GraphExecutionContext) -> Any:
+        """Execute a single node.
+
+        Args:
+            node_id: The node ID
+            query: The research query
+            context: Execution context
+
+        Returns:
+            Node execution result
+        """
+        if not self._graph:
+            raise ValueError("Graph not built")
+
+        node = self._graph.get_node(node_id)
+        if not node:
+            raise ValueError(f"Node {node_id} not found")
+
+        if isinstance(node, AgentNode):
+            return await self._execute_agent_node(node, query, context)
+        elif isinstance(node, StateNode):
+            return await self._execute_state_node(node, query, context)
+        elif isinstance(node, DecisionNode):
+            return await self._execute_decision_node(node, query, context)
+        elif isinstance(node, ParallelNode):
+            return await self._execute_parallel_node(node, query, context)
+        else:
+            raise ValueError(f"Unknown node type: {type(node)}")
+
+    async def _execute_agent_node(
+        self, node: AgentNode, query: str, context: GraphExecutionContext
+    ) -> Any:
+        """Execute an agent node.
+
+        Special handling for deep research nodes:
+        - "planner": Takes query string, returns ReportPlan
+        - "synthesizer": Takes query + ReportPlan + section drafts, returns final report
+
+        Args:
+            node: The agent node
+            query: The research query
+            context: Execution context
+
+        Returns:
+            Agent execution result
+        """
+        # Special handling for synthesizer node
+        if node.node_id == "synthesizer":
+            # Call LongWriterAgent.write_report() directly instead of using agent.run()
+            from src.agent_factory.agents import create_long_writer_agent
+            from src.utils.models import ReportDraft, ReportDraftSection, ReportPlan
+
+            report_plan = context.get_node_result("planner")
+            section_drafts = context.get_node_result("parallel_loops") or []
+
+            if not isinstance(report_plan, ReportPlan):
+                raise ValueError("ReportPlan not found for synthesizer")
+
+            if not section_drafts:
+                raise ValueError("Section drafts not found for synthesizer")
+
+            # Create ReportDraft from section drafts
+            report_draft = ReportDraft(
+                sections=[
+                    ReportDraftSection(
+                        section_title=section.title,
+                        section_content=draft,
+                    )
+                    for section, draft in zip(
+                        report_plan.report_outline, section_drafts, strict=False
+                    )
+                ]
+            )
+
+            # Get LongWriterAgent instance and call write_report directly
+            long_writer_agent = create_long_writer_agent()
+            final_report = await long_writer_agent.write_report(
+                original_query=query,
+                report_title=report_plan.report_title,
+                report_draft=report_draft,
+            )
+
+            # Estimate tokens (rough estimate)
+            estimated_tokens = len(final_report) // 4  # Rough token estimate
+            context.budget_tracker.add_tokens("graph_execution", estimated_tokens)
+
+            return final_report
+
+        # Standard agent execution
+        # Prepare input based on node type
+        if node.node_id == "planner":
+            # Planner takes the original query
+            input_data = query
+        else:
+            # Standard: use previous node result or query
+            prev_result = context.get_node_result(context.current_node)
+            input_data = prev_result if prev_result is not None else query
+
+        # Apply input transformer if provided
+        if node.input_transformer:
+            input_data = node.input_transformer(input_data)
+
+        # Execute agent
+        result = await node.agent.run(input_data)
+
+        # Transform output if needed
+        output = result.output
+        if node.output_transformer:
+            output = node.output_transformer(output)
+
+        # Estimate and track tokens
+        if hasattr(result, "usage") and result.usage:
+            tokens = result.usage.total_tokens if hasattr(result.usage, "total_tokens") else 0
+            context.budget_tracker.add_tokens("graph_execution", tokens)
+
+        return output
+
+    async def _execute_state_node(
+        self, node: StateNode, query: str, context: GraphExecutionContext
+    ) -> Any:
+        """Execute a state node.
+
+        Special handling for deep research state nodes:
+        - "store_plan": Stores ReportPlan in context for parallel loops
+        - "collect_drafts": Stores section drafts in context for synthesizer
+
+        Args:
+            node: The state node
+            query: The research query
+            context: Execution context
+
+        Returns:
+            State update result
+        """
+        # Get previous result for state update
+        # For "store_plan", get from planner node
+        # For "collect_drafts", get from parallel_loops node
+        if node.node_id == "store_plan":
+            prev_result = context.get_node_result("planner")
+        elif node.node_id == "collect_drafts":
+            prev_result = context.get_node_result("parallel_loops")
+        else:
+            prev_result = context.get_node_result(context.current_node)
+
+        # Update state
+        updated_state = node.state_updater(context.state, prev_result)
+        context.state = updated_state
+
+        # Store result in context for next nodes to access
+        context.set_node_result(node.node_id, prev_result)
+
+        # Read state if needed
+        if node.state_reader:
+            return node.state_reader(context.state)
+
+        return prev_result  # Return the stored result for next nodes
+
+    async def _execute_decision_node(
+        self, node: DecisionNode, query: str, context: GraphExecutionContext
+    ) -> str:
+        """Execute a decision node.
+
+        Args:
+            node: The decision node
+            query: The research query
+            context: Execution context
+
+        Returns:
+            Next node ID
+        """
+        # Get previous result for decision
+        prev_result = context.get_node_result(context.current_node)
+
+        # Make decision
+        next_node_id = node.decision_function(prev_result)
+
+        # Validate decision
+        if next_node_id not in node.options:
+            self.logger.warning(
+                "Decision function returned invalid node",
+                node_id=node.node_id,
+                returned=next_node_id,
+                options=node.options,
+            )
+            # Default to first option
+            next_node_id = node.options[0]
+
+        return next_node_id
+
+    async def _execute_parallel_node(
+        self, node: ParallelNode, query: str, context: GraphExecutionContext
+    ) -> list[Any]:
+        """Execute a parallel node.
+
+        Special handling for deep research "parallel_loops" node:
+        - Extracts report plan from previous node result
+        - Creates IterativeResearchFlow instances for each section
+        - Executes them in parallel
+        - Returns section drafts
+
+        Args:
+            node: The parallel node
+            query: The research query
+            context: Execution context
+
+        Returns:
+            List of results from parallel nodes
+        """
+        # Special handling for deep research parallel_loops node
+        if node.node_id == "parallel_loops":
+            return await self._execute_deep_research_parallel_loops(node, query, context)
+
+        # Standard parallel node execution
+        # Execute all parallel nodes concurrently
+        tasks = [
+            self._execute_node(parallel_node_id, query, context)
+            for parallel_node_id in node.parallel_nodes
+        ]
+
+        results = await asyncio.gather(*tasks, return_exceptions=True)
+
+        # Handle exceptions
+        for i, result in enumerate(results):
+            if isinstance(result, Exception):
+                self.logger.error(
+                    "Parallel node execution failed",
+                    node_id=node.parallel_nodes[i] if i < len(node.parallel_nodes) else "unknown",
+                    error=str(result),
+                )
+                results[i] = None
+
+        # Aggregate if needed
+        if node.aggregator:
+            aggregated = node.aggregator(results)
+            # Type cast: aggregator returns Any, but we expect list[Any]
+            return list(aggregated) if isinstance(aggregated, list) else [aggregated]
+
+        return results
+
+    async def _execute_deep_research_parallel_loops(
+        self, node: ParallelNode, query: str, context: GraphExecutionContext
+    ) -> list[str]:
+        """Execute parallel iterative research loops for deep research.
+
+        Args:
+            node: The parallel node (should be "parallel_loops")
+            query: The research query
+            context: Execution context
+
+        Returns:
+            List of section draft strings
+        """
+        from src.agent_factory.judges import create_judge_handler
+        from src.orchestrator.research_flow import IterativeResearchFlow
+        from src.utils.models import ReportPlan
+
+        # Get report plan from previous node (store_plan)
+        # The plan should be stored in context.node_results from the planner node
+        planner_result = context.get_node_result("planner")
+        if not isinstance(planner_result, ReportPlan):
+            self.logger.error(
+                "Planner result is not a ReportPlan",
+                type=type(planner_result),
+            )
+            raise ValueError("Planner must return ReportPlan for deep research")
+
+        report_plan: ReportPlan = planner_result
+        self.logger.info(
+            "Executing parallel loops for deep research",
+            sections=len(report_plan.report_outline),
+        )
+
+        # Create judge handler for iterative flows
+        judge_handler = create_judge_handler()
+
+        # Create and execute iterative research flows for each section
+        async def run_section_research(section_index: int) -> str:
+            """Run iterative research for a single section."""
+            section = report_plan.report_outline[section_index]
+
+            try:
+                # Create iterative research flow
+                flow = IterativeResearchFlow(
+                    max_iterations=self.max_iterations,
+                    max_time_minutes=self.max_time_minutes,
+                    verbose=False,  # Less verbose in parallel execution
+                    use_graph=False,  # Use agent chains for section research
+                    judge_handler=judge_handler,
+                )
+
+                # Run research for this section
+                section_draft = await flow.run(
+                    query=section.key_question,
+                    background_context=report_plan.background_context,
+                )
+
+                self.logger.info(
+                    "Section research completed",
+                    section_index=section_index,
+                    section_title=section.title,
+                    draft_length=len(section_draft),
+                )
+
+                return section_draft
+
+            except Exception as e:
+                self.logger.error(
+                    "Section research failed",
+                    section_index=section_index,
+                    section_title=section.title,
+                    error=str(e),
+                )
+                # Return empty string for failed sections
+                return f"# {section.title}\n\n[Research failed: {e!s}]"
+
+        # Execute all sections in parallel
+        section_drafts = await asyncio.gather(
+            *(run_section_research(i) for i in range(len(report_plan.report_outline))),
+            return_exceptions=True,
+        )
+
+        # Handle exceptions and filter None results
+        filtered_drafts: list[str] = []
+        for i, draft in enumerate(section_drafts):
+            if isinstance(draft, Exception):
+                self.logger.error(
+                    "Section research exception",
+                    section_index=i,
+                    error=str(draft),
+                )
+                filtered_drafts.append(
+                    f"# {report_plan.report_outline[i].title}\n\n[Research failed: {draft!s}]"
+                )
+            elif draft is not None:
+                # Type narrowing: after Exception check, draft is str | None
+                assert isinstance(draft, str), "Expected str after Exception check"
+                filtered_drafts.append(draft)
+
+        self.logger.info(
+            "Parallel loops completed",
+            sections=len(filtered_drafts),
+            total_sections=len(report_plan.report_outline),
+        )
+
+        return filtered_drafts
+
+    def _get_next_node(self, node_id: str, context: GraphExecutionContext) -> list[str]:
+        """Get next node(s) from current node.
+
+        Args:
+            node_id: Current node ID
+            context: Execution context
+
+        Returns:
+            List of next node IDs
+        """
+        if not self._graph:
+            return []
+
+        # Get node result for condition evaluation
+        node_result = context.get_node_result(node_id)
+
+        # Get next nodes
+        next_nodes = self._graph.get_next_nodes(node_id, context=node_result)
+
+        # If this was a decision node, use its result
+        node = self._graph.get_node(node_id)
+        if isinstance(node, DecisionNode):
+            decision_result = node_result
+            if isinstance(decision_result, str):
+                return [decision_result]
+
+        # Return next node IDs
+        return [next_node_id for next_node_id, _ in next_nodes]
+
+    async def _detect_research_mode(self, query: str) -> Literal["iterative", "deep"]:
+        """
+        Detect research mode from query using input parser agent.
+
+        Uses input parser agent to analyze query and determine research mode.
+        Falls back to heuristic if parser fails.
+
+        Args:
+            query: The research query
+
+        Returns:
+            Detected research mode
+        """
+        try:
+            # Use input parser agent for intelligent mode detection
+            input_parser = create_input_parser_agent()
+            parsed_query = await input_parser.parse(query)
+            self.logger.info(
+                "Research mode detected by input parser",
+                mode=parsed_query.research_mode,
+                query=query[:100],
+            )
+            return parsed_query.research_mode
+        except Exception as e:
+            # Fallback to heuristic if parser fails
+            self.logger.warning(
+                "Input parser failed, using heuristic",
+                error=str(e),
+                query=query[:100],
+            )
+            query_lower = query.lower()
+            if any(
+                keyword in query_lower
+                for keyword in [
+                    "section",
+                    "sections",
+                    "report",
+                    "outline",
+                    "structure",
+                    "comprehensive",
+                    "analyze",
+                    "analysis",
+                ]
+            ):
+                return "deep"
+            return "iterative"
+
+
+def create_graph_orchestrator(
+    mode: Literal["iterative", "deep", "auto"] = "auto",
+    max_iterations: int = 5,
+    max_time_minutes: int = 10,
+    use_graph: bool = True,
+) -> GraphOrchestrator:
+    """
+    Factory function to create a graph orchestrator.
+
+    Args:
+        mode: Research mode
+        max_iterations: Maximum iterations per loop
+        max_time_minutes: Maximum time per loop
+        use_graph: Whether to use graph execution (True) or agent chains (False)
+
+    Returns:
+        Configured GraphOrchestrator instance
+    """
+    return GraphOrchestrator(
+        mode=mode,
+        max_iterations=max_iterations,
+        max_time_minutes=max_time_minutes,
+        use_graph=use_graph,
+    )
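A sketch of consuming the orchestrator's event stream (assumes API keys and agent config are set up; the event fields match the AgentEvent usage above):

import asyncio

from src.orchestrator.graph_orchestrator import create_graph_orchestrator

async def demo() -> None:
    orchestrator = create_graph_orchestrator(mode="auto", use_graph=True)
    async for event in orchestrator.run("What existing drugs might help treat long COVID fatigue?"):
        # Stream progress to a UI or log; "complete" carries the final report.
        print(event.type, "-", event.message[:80])

asyncio.run(demo())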
src/orchestrator/planner_agent.py ADDED
@@ -0,0 +1,174 @@
+"""Planner agent for creating report plans with sections and background context.
+
+Converts the folder/planner_agent.py implementation to use Pydantic AI.
+"""
+
+from datetime import datetime
+from typing import Any
+
+import structlog
+from pydantic_ai import Agent
+
+from src.agent_factory.judges import get_model
+from src.tools.crawl_adapter import crawl_website
+from src.tools.web_search_adapter import web_search
+from src.utils.exceptions import ConfigurationError, JudgeError
+from src.utils.models import ReportPlan, ReportPlanSection
+
+logger = structlog.get_logger()
+
+
+# System prompt for the planner agent
+SYSTEM_PROMPT = f"""
+You are a research manager, managing a team of research agents. Today's date is {datetime.now().strftime("%Y-%m-%d")}.
+Given a research query, your job is to produce an initial outline of the report (section titles and key questions),
+as well as some background context. Each section will be assigned to a different researcher in your team who will then
+carry out research on the section.
+
+You will be given:
+- An initial research query
+
+Your task is to:
+1. Produce 1-2 paragraphs of initial background context (if needed) on the query by running web searches or crawling websites
+2. Produce an outline of the report that includes a list of section titles and the key question to be addressed in each section
+3. Provide a title for the report that will be used as the main heading
+
+Guidelines:
+- Each section should cover a single topic/question that is independent of other sections
+- The key question for each section should include both the NAME and DOMAIN NAME / WEBSITE (if available and applicable) if it is related to a company, product or similar
+- The background_context should not be more than 2 paragraphs
+- The background_context should be very specific to the query and include any information that is relevant for researchers across all sections of the report
+- The background_context should be drawn only from web search or crawl results rather than prior knowledge (i.e. it should only be included if you have called tools)
+- For example, if the query is about a company, the background context should include some basic information about what the company does
+- DO NOT do more than 2 tool calls
+
+Only output JSON. Follow the JSON schema for ReportPlan. Do not output anything else.
+"""
+
+
+class PlannerAgent:
+    """
+    Planner agent that creates report plans with sections and background context.
+
+    Uses Pydantic AI to generate structured ReportPlan output with optional
+    web search and crawl tool usage for background context.
+    """
+
+    def __init__(
+        self,
+        model: Any | None = None,
+        web_search_tool: Any | None = None,
+        crawl_tool: Any | None = None,
+    ) -> None:
+        """
+        Initialize the planner agent.
+
+        Args:
+            model: Optional Pydantic AI model. If None, uses config default.
+            web_search_tool: Optional web search tool function. If None, uses default.
+            crawl_tool: Optional crawl tool function. If None, uses default.
+        """
+        self.model = model or get_model()
+        self.web_search_tool = web_search_tool or web_search
+        self.crawl_tool = crawl_tool or crawl_website
+        self.logger = logger
+
+        # Validate tools are callable
+        if not callable(self.web_search_tool):
+            raise ConfigurationError("web_search_tool must be callable")
+        if not callable(self.crawl_tool):
+            raise ConfigurationError("crawl_tool must be callable")
+
+        # Initialize Pydantic AI Agent
+        self.agent = Agent(
+            model=self.model,
+            output_type=ReportPlan,
+            system_prompt=SYSTEM_PROMPT,
+            tools=[self.web_search_tool, self.crawl_tool],
+            retries=3,
+        )
+
+    async def run(self, query: str) -> ReportPlan:
+        """
+        Run the planner agent to generate a report plan.
+
+        Args:
+            query: The user's research query
+
+        Returns:
+            ReportPlan with sections, background context, and report title
+
+        Raises:
+            JudgeError: If planning fails after retries
+            ConfigurationError: If agent configuration is invalid
+        """
+        self.logger.info("Starting report planning", query=query[:100])
+
+        user_message = f"QUERY: {query}"
+
+        try:
+            # Run the agent
+            result = await self.agent.run(user_message)
+            report_plan = result.output
+
+            # Validate report plan
+            if not report_plan.report_outline:
+                self.logger.warning("Report plan has no sections", query=query[:100])
+                raise JudgeError("Report plan must have at least one section")
+
+            if not report_plan.report_title:
+                self.logger.warning("Report plan has no title", query=query[:100])
+                raise JudgeError("Report plan must have a title")
+
+            self.logger.info(
+                "Report plan created",
+                sections=len(report_plan.report_outline),
+                has_background=bool(report_plan.background_context),
+            )
+
+            return report_plan
+
+        except Exception as e:
+            self.logger.error("Planning failed", error=str(e), query=query[:100])
+
+            # Fallback: return minimal report plan
+            if isinstance(e, JudgeError | ConfigurationError):
+                raise
+
+            # For other errors, return a minimal plan
+            return ReportPlan(
+                background_context="",
+                report_outline=[
+                    ReportPlanSection(
+                        title="Research Findings",
+                        key_question=query,
+                    )
+                ],
+                report_title=f"Research Report: {query[:50]}",
+            )
+
+
+def create_planner_agent(model: Any | None = None) -> PlannerAgent:
+    """
+    Factory function to create a planner agent.
+
+    Args:
+        model: Optional Pydantic AI model. If None, uses settings default.
+
+    Returns:
+        Configured PlannerAgent instance
+
+    Raises:
+        ConfigurationError: If required API keys are missing
+    """
+    try:
+        # Get model from settings if not provided
+        if model is None:
+            model = get_model()
+
+        # Create and return planner agent
+        return PlannerAgent(model=model)
+
+    except Exception as e:
+        logger.error("Failed to create planner agent", error=str(e))
+        raise ConfigurationError(f"Failed to create planner agent: {e}") from e
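A sketch of running the planner on its own (assumes the default model and the search/crawl tools are configured; field names follow the ReportPlan model used above):

import asyncio

from src.orchestrator.planner_agent import create_planner_agent

async def demo() -> None:
    planner = create_planner_agent()
    plan = await planner.run("Drug repurposing candidates for long COVID fatigue")
    print(plan.report_title)
    for section in plan.report_outline:
        print("-", section.title, "->", section.key_question)

asyncio.run(demo())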
src/orchestrator/research_flow.py ADDED
@@ -0,0 +1,999 @@
+"""Research flow implementations for iterative and deep research patterns.
+
+Converts the folder/iterative_research.py and folder/deep_research.py
+implementations to use Pydantic AI agents.
+"""
+
+import asyncio
+import time
+from typing import Any
+
+import structlog
+
+from src.agent_factory.agents import (
+    create_graph_orchestrator,
+    create_knowledge_gap_agent,
+    create_long_writer_agent,
+    create_planner_agent,
+    create_proofreader_agent,
+    create_thinking_agent,
+    create_tool_selector_agent,
+    create_writer_agent,
+)
+from src.agent_factory.judges import create_judge_handler
+from src.middleware.budget_tracker import BudgetTracker
+from src.middleware.state_machine import get_workflow_state, init_workflow_state
+from src.middleware.workflow_manager import WorkflowManager
+from src.services.llamaindex_rag import LlamaIndexRAGService, get_rag_service
+from src.tools.tool_executor import execute_tool_tasks
+from src.utils.exceptions import ConfigurationError
+from src.utils.models import (
+    AgentSelectionPlan,
+    AgentTask,
+    Citation,
+    Conversation,
+    Evidence,
+    JudgeAssessment,
+    KnowledgeGapOutput,
+    ReportDraft,
+    ReportDraftSection,
+    ReportPlan,
+    SourceName,
+    ToolAgentOutput,
+)
+
+logger = structlog.get_logger()
+
+
+class IterativeResearchFlow:
+    """
+    Iterative research flow that runs a single research loop.
+
+    Pattern: Generate observations → Evaluate gaps → Select tools → Execute → Repeat
+    until research is complete or constraints are met.
+    """
+
+    def __init__(
+        self,
+        max_iterations: int = 5,
+        max_time_minutes: int = 10,
+        verbose: bool = True,
+        use_graph: bool = False,
+        judge_handler: Any | None = None,
+    ) -> None:
+        """
+        Initialize iterative research flow.
+
+        Args:
+            max_iterations: Maximum number of iterations
+            max_time_minutes: Maximum time in minutes
+            verbose: Whether to log progress
+            use_graph: Whether to use graph-based execution (True) or agent chains (False)
+            judge_handler: Optional judge handler; if None, a new one is created
+        """
+        self.max_iterations = max_iterations
+        self.max_time_minutes = max_time_minutes
+        self.verbose = verbose
+        self.use_graph = use_graph
+        self.logger = logger
+
+        # Initialize agents (only needed for agent chain execution)
+        if not use_graph:
+            self.knowledge_gap_agent = create_knowledge_gap_agent()
+            self.tool_selector_agent = create_tool_selector_agent()
+            self.thinking_agent = create_thinking_agent()
+            self.writer_agent = create_writer_agent()
+            # Initialize judge handler (use provided or create new)
+            self.judge_handler = judge_handler or create_judge_handler()
+
+        # Initialize state (only needed for agent chain execution)
+        if not use_graph:
+            self.conversation = Conversation()
+            self.iteration = 0
+            self.start_time: float | None = None
+            self.should_continue = True
+
+        # Initialize budget tracker
+        self.budget_tracker = BudgetTracker()
+        self.loop_id = "iterative_flow"
+        self.budget_tracker.create_budget(
+            loop_id=self.loop_id,
+            tokens_limit=100000,
+            time_limit_seconds=max_time_minutes * 60,
+            iterations_limit=max_iterations,
+        )
+        self.budget_tracker.start_timer(self.loop_id)
+
+        # Initialize RAG service (lazy, may be None if unavailable)
+        self._rag_service: LlamaIndexRAGService | None = None
+
+        # Graph orchestrator (lazy initialization)
+        self._graph_orchestrator: Any = None
+
+    async def run(
+        self,
+        query: str,
+        background_context: str = "",
+        output_length: str = "",
+        output_instructions: str = "",
+    ) -> str:
+        """
+        Run the iterative research flow.
+
+        Args:
+            query: The research query
+            background_context: Optional background context
+            output_length: Optional description of desired output length
+            output_instructions: Optional additional instructions
+
+        Returns:
+            Final report string
+        """
+        if self.use_graph:
+            return await self._run_with_graph(
+                query, background_context, output_length, output_instructions
+            )
+        else:
+            return await self._run_with_chains(
+                query, background_context, output_length, output_instructions
+            )
+
+    async def _run_with_chains(
+        self,
+        query: str,
+        background_context: str = "",
+        output_length: str = "",
+        output_instructions: str = "",
+    ) -> str:
+        """
+        Run the iterative research flow using agent chains.
+
+        Args:
+            query: The research query
+            background_context: Optional background context
+            output_length: Optional description of desired output length
+            output_instructions: Optional additional instructions
+
+        Returns:
+            Final report string
+        """
+        self.start_time = time.time()
+        self.logger.info("Starting iterative research (agent chains)", query=query[:100])
+
+        # Initialize conversation with first iteration
+        self.conversation.add_iteration()
+
+        # Main research loop
+        while self.should_continue and self._check_constraints():
+            self.iteration += 1
+            self.logger.info("Starting iteration", iteration=self.iteration)
+
+            # Add new iteration to conversation
+            self.conversation.add_iteration()
+
+            # 1. Generate observations
+            await self._generate_observations(query, background_context)
+
+            # 2. Evaluate gaps
+            evaluation = await self._evaluate_gaps(query, background_context)
+
+            # 3. Assess with judge (after tools execute, we'll assess again)
+            # For now, check knowledge gap evaluation
+            # After tool execution, we'll do a full judge assessment
+
+            # Check if research is complete (knowledge gap agent says complete)
+            if evaluation.research_complete:
+                self.should_continue = False
+                self.logger.info("Research marked as complete by knowledge gap agent")
+                break
+
+            # 4. Select tools for next gap
+            next_gap = evaluation.outstanding_gaps[0] if evaluation.outstanding_gaps else query
+            selection_plan = await self._select_agents(next_gap, query, background_context)
+
+            # 5. Execute tools
+            await self._execute_tools(selection_plan.tasks)
+
+            # 6. Assess evidence sufficiency with judge
+            judge_assessment = await self._assess_with_judge(query)
+
+            # Check if judge says evidence is sufficient
+            if judge_assessment.sufficient:
+                self.should_continue = False
+                self.logger.info(
+                    "Research marked as complete by judge",
+                    confidence=judge_assessment.confidence,
+                    reasoning=judge_assessment.reasoning[:100],
+                )
+                break
+
+            # Update budget tracker
+            self.budget_tracker.increment_iteration(self.loop_id)
+            self.budget_tracker.update_timer(self.loop_id)
+
+        # Create final report
+        report = await self._create_final_report(query, output_length, output_instructions)
+
+        elapsed = time.time() - (self.start_time or time.time())
+        self.logger.info(
+            "Iterative research completed",
+            iterations=self.iteration,
+            elapsed_minutes=elapsed / 60,
+        )
+
+        return report
+
+    async def _run_with_graph(
+        self,
+        query: str,
+        background_context: str = "",
+        output_length: str = "",
+        output_instructions: str = "",
+    ) -> str:
+        """
+        Run the iterative research flow using graph execution.
+
+        Args:
+            query: The research query
+            background_context: Optional background context (currently ignored in graph execution)
+            output_length: Optional description of desired output length (currently ignored in graph execution)
+            output_instructions: Optional additional instructions (currently ignored in graph execution)
+
+        Returns:
+            Final report string
+        """
+        self.logger.info("Starting iterative research (graph execution)", query=query[:100])
+
+        # Create graph orchestrator (lazy initialization)
+        if self._graph_orchestrator is None:
+            self._graph_orchestrator = create_graph_orchestrator(
+                mode="iterative",
+                max_iterations=self.max_iterations,
+                max_time_minutes=self.max_time_minutes,
+                use_graph=True,
+            )
+
+        # Run orchestrator and collect events
+        final_report = ""
+        async for event in self._graph_orchestrator.run(query):
+            if event.type == "complete":
+                final_report = event.message
+                break
+            elif event.type == "error":
+                self.logger.error("Graph execution error", error=event.message)
+                raise RuntimeError(f"Graph execution failed: {event.message}")
+
+        if not final_report:
+            self.logger.warning("No complete event received from graph orchestrator")
+            final_report = "Research completed but no report was generated."
+
+        self.logger.info("Iterative research completed (graph execution)")
+
+        return final_report
+
+    def _check_constraints(self) -> bool:
+        """Check if we've exceeded constraints."""
+        if self.iteration >= self.max_iterations:
+            self.logger.info("Max iterations reached", max=self.max_iterations)
+            return False
+
+        if self.start_time:
+            elapsed_minutes = (time.time() - self.start_time) / 60
+            if elapsed_minutes >= self.max_time_minutes:
+                self.logger.info("Max time reached", max=self.max_time_minutes)
+                return False
+
+        # Check budget tracker
+        self.budget_tracker.update_timer(self.loop_id)
287
+ exceeded, reason = self.budget_tracker.check_budget(self.loop_id)
288
+ if exceeded:
289
+ self.logger.info("Budget exceeded", reason=reason)
290
+ return False
291
+
292
+ return True
293
+
294
+ async def _generate_observations(self, query: str, background_context: str = "") -> str:
295
+ """Generate observations from current research state."""
296
+ # Build input prompt for token estimation
297
+ conversation_history = self.conversation.compile_conversation_history()
298
+ # Build background context section separately to avoid backslash in f-string
299
+ background_section = (
300
+ f"BACKGROUND CONTEXT:\n{background_context}\n\n" if background_context else ""
301
+ )
302
+ input_prompt = f"""
303
+ You are starting iteration {self.iteration} of your research process.
304
+
305
+ ORIGINAL QUERY:
306
+ {query}
307
+
308
+ {background_section}HISTORY OF ACTIONS, FINDINGS AND THOUGHTS:
309
+ {conversation_history or "No previous actions, findings or thoughts available."}
310
+ """
311
+
312
+ observations = await self.thinking_agent.generate_observations(
313
+ query=query,
314
+ background_context=background_context,
315
+ conversation_history=conversation_history,
316
+ iteration=self.iteration,
317
+ )
318
+
319
+ # Track tokens for this iteration
320
+ estimated_tokens = self.budget_tracker.estimate_llm_call_tokens(input_prompt, observations)
321
+ self.budget_tracker.add_iteration_tokens(self.loop_id, self.iteration, estimated_tokens)
322
+ self.logger.debug(
323
+ "Tokens tracked for thinking agent",
324
+ iteration=self.iteration,
325
+ tokens=estimated_tokens,
326
+ )
327
+
328
+ self.conversation.set_latest_thought(observations)
329
+ return observations
330
+
331
+ async def _evaluate_gaps(self, query: str, background_context: str = "") -> KnowledgeGapOutput:
332
+ """Evaluate knowledge gaps in current research."""
333
+ if self.start_time:
334
+ elapsed_minutes = (time.time() - self.start_time) / 60
335
+ else:
336
+ elapsed_minutes = 0.0
337
+
338
+ # Build input prompt for token estimation
339
+ conversation_history = self.conversation.compile_conversation_history()
340
+ background = f"BACKGROUND CONTEXT:\n{background_context}" if background_context else ""
341
+ input_prompt = f"""
342
+ Current Iteration Number: {self.iteration}
343
+ Time Elapsed: {elapsed_minutes:.2f} minutes of maximum {self.max_time_minutes} minutes
344
+
345
+ ORIGINAL QUERY:
346
+ {query}
347
+
348
+ {background}
349
+
350
+ HISTORY OF ACTIONS, FINDINGS AND THOUGHTS:
351
+ {conversation_history or "No previous actions, findings or thoughts available."}
352
+ """
353
+
354
+ evaluation = await self.knowledge_gap_agent.evaluate(
355
+ query=query,
356
+ background_context=background_context,
357
+ conversation_history=conversation_history,
358
+ iteration=self.iteration,
359
+ time_elapsed_minutes=elapsed_minutes,
360
+ max_time_minutes=self.max_time_minutes,
361
+ )
362
+
363
+ # Track tokens for this iteration
364
+ evaluation_text = f"research_complete={evaluation.research_complete}, gaps={len(evaluation.outstanding_gaps)}"
365
+ estimated_tokens = self.budget_tracker.estimate_llm_call_tokens(
366
+ input_prompt, evaluation_text
367
+ )
368
+ self.budget_tracker.add_iteration_tokens(self.loop_id, self.iteration, estimated_tokens)
369
+ self.logger.debug(
370
+ "Tokens tracked for knowledge gap agent",
371
+ iteration=self.iteration,
372
+ tokens=estimated_tokens,
373
+ )
374
+
375
+ if not evaluation.research_complete and evaluation.outstanding_gaps:
376
+ self.conversation.set_latest_gap(evaluation.outstanding_gaps[0])
377
+
378
+ return evaluation
379
+
380
+ async def _assess_with_judge(self, query: str) -> JudgeAssessment:
381
+ """Assess evidence sufficiency using JudgeHandler.
382
+
383
+ Args:
384
+ query: The research query
385
+
386
+ Returns:
387
+ JudgeAssessment with sufficiency evaluation
388
+ """
389
+ state = get_workflow_state()
390
+ evidence = state.evidence # Get all collected evidence
391
+
392
+ self.logger.info(
393
+ "Assessing evidence with judge",
394
+ query=query[:100],
395
+ evidence_count=len(evidence),
396
+ )
397
+
398
+ assessment = await self.judge_handler.assess(query, evidence)
399
+
400
+ # Track tokens for judge call
401
+ # Estimate tokens from query + evidence + assessment
402
+ evidence_text = "\n".join([e.content[:500] for e in evidence[:10]]) # Sample
403
+ estimated_tokens = self.budget_tracker.estimate_llm_call_tokens(
404
+ query + evidence_text, str(assessment.reasoning)
405
+ )
406
+ self.budget_tracker.add_iteration_tokens(self.loop_id, self.iteration, estimated_tokens)
407
+
408
+ self.logger.info(
409
+ "Judge assessment complete",
410
+ sufficient=assessment.sufficient,
411
+ confidence=assessment.confidence,
412
+ recommendation=assessment.recommendation,
413
+ )
414
+
415
+ return assessment
416
+
417
+ async def _select_agents(
418
+ self, gap: str, query: str, background_context: str = ""
419
+ ) -> AgentSelectionPlan:
420
+ """Select tools to address knowledge gap."""
421
+ # Build input prompt for token estimation
422
+ conversation_history = self.conversation.compile_conversation_history()
423
+ background = f"BACKGROUND CONTEXT:\n{background_context}" if background_context else ""
424
+ input_prompt = f"""
425
+ ORIGINAL QUERY:
426
+ {query}
427
+
428
+ KNOWLEDGE GAP TO ADDRESS:
429
+ {gap}
430
+
431
+ {background}
432
+
433
+ HISTORY OF ACTIONS, FINDINGS AND THOUGHTS:
434
+ {conversation_history or "No previous actions, findings or thoughts available."}
435
+ """
436
+
437
+ selection_plan = await self.tool_selector_agent.select_tools(
438
+ gap=gap,
439
+ query=query,
440
+ background_context=background_context,
441
+ conversation_history=conversation_history,
442
+ )
443
+
444
+ # Track tokens for this iteration
445
+ selection_text = f"tasks={len(selection_plan.tasks)}, agents={[task.agent for task in selection_plan.tasks]}"
446
+ estimated_tokens = self.budget_tracker.estimate_llm_call_tokens(
447
+ input_prompt, selection_text
448
+ )
449
+ self.budget_tracker.add_iteration_tokens(self.loop_id, self.iteration, estimated_tokens)
450
+ self.logger.debug(
451
+ "Tokens tracked for tool selector agent",
452
+ iteration=self.iteration,
453
+ tokens=estimated_tokens,
454
+ )
455
+
456
+ # Store tool calls in conversation
457
+ tool_calls = [
458
+ f"[Agent] {task.agent} [Query] {task.query} [Entity] {task.entity_website or 'null'}"
459
+ for task in selection_plan.tasks
460
+ ]
461
+ self.conversation.set_latest_tool_calls(tool_calls)
462
+
463
+ return selection_plan
464
+
465
+ def _get_rag_service(self) -> LlamaIndexRAGService | None:
466
+ """
467
+ Get or create RAG service instance.
468
+
469
+ Returns:
470
+ RAG service instance, or None if unavailable
471
+ """
472
+ if self._rag_service is None:
473
+ try:
474
+ self._rag_service = get_rag_service()
475
+ self.logger.info("RAG service initialized for research flow")
476
+ except (ConfigurationError, ImportError) as e:
477
+ self.logger.warning(
478
+ "RAG service unavailable", error=str(e), hint="OPENAI_API_KEY required"
479
+ )
480
+ return None
481
+ return self._rag_service
482
+
483
+ async def _execute_tools(self, tasks: list[AgentTask]) -> dict[str, ToolAgentOutput]:
484
+ """Execute selected tools concurrently."""
485
+ try:
486
+ results = await execute_tool_tasks(tasks)
487
+ except Exception as e:
488
+ # Handle tool execution errors gracefully
489
+ self.logger.error(
490
+ "Tool execution failed",
491
+ error=str(e),
492
+ task_count=len(tasks),
493
+ exc_info=True,
494
+ )
495
+ # Return empty results so the research flow can continue;
496
+ # a report can still be generated from earlier iterations.
497
+ results = {}
498
+
499
+ # Store findings in conversation (only if we have results)
500
+ evidence_list: list[Evidence] = []
501
+ if results:
502
+ findings = [result.output for result in results.values()]
503
+ self.conversation.set_latest_findings(findings)
504
+
505
+ # Convert tool outputs to Evidence objects and store in workflow state
506
+ evidence_list = self._convert_tool_outputs_to_evidence(results)
507
+
508
+ if evidence_list:
509
+ state = get_workflow_state()
510
+ added_count = state.add_evidence(evidence_list)
511
+ self.logger.info(
512
+ "Evidence added to workflow state",
513
+ count=added_count,
514
+ total_evidence=len(state.evidence),
515
+ )
516
+
517
+ # Ingest evidence into RAG if available (Phase 6 requirement)
518
+ rag_service = self._get_rag_service()
519
+ if rag_service is not None:
520
+ try:
521
+ # ingest_evidence is synchronous, run in executor to avoid blocking
522
+ loop = asyncio.get_running_loop()
523
+ await loop.run_in_executor(None, rag_service.ingest_evidence, evidence_list)
524
+ self.logger.info(
525
+ "Evidence ingested into RAG",
526
+ count=len(evidence_list),
527
+ )
528
+ except Exception as e:
529
+ # Don't fail the research loop if RAG ingestion fails
530
+ self.logger.warning(
531
+ "Failed to ingest evidence into RAG",
532
+ error=str(e),
533
+ count=len(evidence_list),
534
+ )
535
+
536
+ return results
537
+
538
+ def _convert_tool_outputs_to_evidence(
539
+ self, tool_results: dict[str, ToolAgentOutput]
540
+ ) -> list[Evidence]:
541
+ """Convert ToolAgentOutput to Evidence objects.
542
+
543
+ Args:
544
+ tool_results: Dictionary of tool execution results
545
+
546
+ Returns:
547
+ List of Evidence objects
548
+ """
549
+ evidence_list = []
550
+ for key, result in tool_results.items():
551
+ # Extract URLs from sources
552
+ if result.sources:
553
+ # Create one Evidence object per source URL
554
+ for url in result.sources:
555
+ # Determine source type from URL or tool name
556
+ # Default to "web" for unknown web sources
557
+ source_type: SourceName = "web"
558
+ if "pubmed" in url.lower() or "ncbi" in url.lower():
559
+ source_type = "pubmed"
560
+ elif "clinicaltrials" in url.lower():
561
+ source_type = "clinicaltrials"
562
+ elif "europepmc" in url.lower():
563
+ source_type = "europepmc"
564
+ elif "biorxiv" in url.lower():
565
+ source_type = "biorxiv"
566
+ elif "arxiv" in url.lower() or "preprint" in url.lower():
567
+ source_type = "preprint"
568
+ # Note: "web" is now a valid SourceName for general web sources
569
+
570
+ citation = Citation(
571
+ title=f"Tool Result: {key}",
572
+ url=url,
573
+ source=source_type,
574
+ date="n.d.",
575
+ authors=[],
576
+ )
577
+ # Truncate content to reasonable length for judge (1500 chars)
578
+ content = result.output[:1500]
579
+ if len(result.output) > 1500:
580
+ content += "... [truncated]"
581
+
582
+ evidence = Evidence(
583
+ content=content,
584
+ citation=citation,
585
+ relevance=0.5, # Default relevance
586
+ )
587
+ evidence_list.append(evidence)
588
+ else:
589
+ # No URLs, create a single Evidence object with tool output
590
+ # Use a placeholder URL based on the tool name
591
+ # Determine source type from tool name
592
+ tool_source_type: SourceName = "web" # Default for unknown sources
593
+ if "RAG" in key:
594
+ tool_source_type = "rag"
595
+ elif "WebSearch" in key or "SiteCrawler" in key:
596
+ tool_source_type = "web"
597
+ # "web" is now a valid SourceName for general web sources
598
+
599
+ citation = Citation(
600
+ title=f"Tool Result: {key}",
601
+ url=f"tool://{key}",
602
+ source=tool_source_type,
603
+ date="n.d.",
604
+ authors=[],
605
+ )
606
+ content = result.output[:1500]
607
+ if len(result.output) > 1500:
608
+ content += "... [truncated]"
609
+
610
+ evidence = Evidence(
611
+ content=content,
612
+ citation=citation,
613
+ relevance=0.5,
614
+ )
615
+ evidence_list.append(evidence)
616
+
617
+ return evidence_list
618
+
619
+ async def _create_final_report(
620
+ self, query: str, length: str = "", instructions: str = ""
621
+ ) -> str:
622
+ """Create final report from all findings."""
623
+ all_findings = "\n\n".join(self.conversation.get_all_findings())
624
+ if not all_findings:
625
+ all_findings = "No findings available yet."
626
+
627
+ # Build input prompt for token estimation
628
+ length_str = f"* The full response should be approximately {length}.\n" if length else ""
629
+ instructions_str = f"* {instructions}" if instructions else ""
630
+ guidelines_str = (
631
+ ("\n\nGUIDELINES:\n" + length_str + instructions_str).strip("\n")
632
+ if length or instructions
633
+ else ""
634
+ )
635
+ input_prompt = f"""
636
+ Provide a response based on the query and findings below with as much detail as possible. {guidelines_str}
637
+
638
+ QUERY: {query}
639
+
640
+ FINDINGS:
641
+ {all_findings}
642
+ """
643
+
644
+ report = await self.writer_agent.write_report(
645
+ query=query,
646
+ findings=all_findings,
647
+ output_length=length,
648
+ output_instructions=instructions,
649
+ )
650
+
651
+ # Track tokens for final report (not per iteration, just total)
652
+ estimated_tokens = self.budget_tracker.estimate_llm_call_tokens(input_prompt, report)
653
+ self.budget_tracker.add_tokens(self.loop_id, estimated_tokens)
654
+ self.logger.debug(
655
+ "Tokens tracked for writer agent (final report)",
656
+ tokens=estimated_tokens,
657
+ )
658
+
659
+ # Note: Citation validation for markdown reports would require Evidence objects
660
+ # Currently, findings are strings, not Evidence objects. For full validation,
661
+ # consider using ResearchReport format or passing Evidence objects separately.
662
+ # See src/utils/citation_validator.py for markdown citation validation utilities.
663
+
664
+ return report
665
+
666
+
667
+ class DeepResearchFlow:
668
+ """
669
+ Deep research flow that runs parallel iterative loops per section.
670
+
671
+ Pattern: Plan → Parallel Iterative Loops (one per section) → Synthesis
672
+ """
673
+
674
+ def __init__(
675
+ self,
676
+ max_iterations: int = 5,
677
+ max_time_minutes: int = 10,
678
+ verbose: bool = True,
679
+ use_long_writer: bool = True,
680
+ use_graph: bool = False,
681
+ ) -> None:
682
+ """
683
+ Initialize deep research flow.
684
+
685
+ Args:
686
+ max_iterations: Maximum iterations per section
687
+ max_time_minutes: Maximum time per section
688
+ verbose: Whether to log progress
689
+ use_long_writer: Whether to use long writer (True) or proofreader (False)
690
+ use_graph: Whether to use graph-based execution (True) or agent chains (False)
691
+ """
692
+ self.max_iterations = max_iterations
693
+ self.max_time_minutes = max_time_minutes
694
+ self.verbose = verbose
695
+ self.use_long_writer = use_long_writer
696
+ self.use_graph = use_graph
697
+ self.logger = logger
698
+
699
+ # Initialize agents (only needed for agent chain execution)
700
+ if not use_graph:
701
+ self.planner_agent = create_planner_agent()
702
+ self.long_writer_agent = create_long_writer_agent()
703
+ self.proofreader_agent = create_proofreader_agent()
704
+ # Initialize judge handler for section loop completion
705
+ self.judge_handler = create_judge_handler()
706
+ # Initialize budget tracker for token tracking
707
+ self.budget_tracker = BudgetTracker()
708
+ self.loop_id = "deep_research_flow"
709
+ self.budget_tracker.create_budget(
710
+ loop_id=self.loop_id,
711
+ tokens_limit=200000, # Higher limit for deep research
712
+ time_limit_seconds=max_time_minutes
713
+ * 60
714
+ * 2, # Allow more time for parallel sections
715
+ iterations_limit=max_iterations * 10, # Allow for multiple sections
716
+ )
717
+ self.budget_tracker.start_timer(self.loop_id)
718
+
719
+ # Graph orchestrator (lazy initialization)
720
+ self._graph_orchestrator: Any = None
721
+
722
+ async def run(self, query: str) -> str:
723
+ """
724
+ Run the deep research flow.
725
+
726
+ Args:
727
+ query: The research query
728
+
729
+ Returns:
730
+ Final report string
731
+ """
732
+ if self.use_graph:
733
+ return await self._run_with_graph(query)
734
+ else:
735
+ return await self._run_with_chains(query)
736
+
737
+ async def _run_with_chains(self, query: str) -> str:
738
+ """
739
+ Run the deep research flow using agent chains.
740
+
741
+ Args:
742
+ query: The research query
743
+
744
+ Returns:
745
+ Final report string
746
+ """
747
+ self.logger.info("Starting deep research (agent chains)", query=query[:100])
748
+
749
+ # Initialize workflow state for deep research
750
+ try:
751
+ from src.services.embeddings import get_embedding_service
752
+
753
+ embedding_service = get_embedding_service()
754
+ except Exception:  # Exception already subsumes ImportError
755
+ # If embedding service is unavailable, initialize without it
756
+ embedding_service = None
757
+ self.logger.debug("Embedding service unavailable, initializing state without it")
758
+
759
+ init_workflow_state(embedding_service=embedding_service)
760
+ self.logger.debug("Workflow state initialized for deep research")
761
+
762
+ # 1. Build report plan
763
+ report_plan = await self._build_report_plan(query)
764
+ self.logger.info(
765
+ "Report plan created",
766
+ sections=len(report_plan.report_outline),
767
+ title=report_plan.report_title,
768
+ )
769
+
770
+ # 2. Run parallel research loops with state synchronization
771
+ section_drafts = await self._run_research_loops(report_plan)
772
+
773
+ # Verify state synchronization - log evidence count
774
+ state = get_workflow_state()
775
+ self.logger.info(
776
+ "State synchronization complete",
777
+ total_evidence=len(state.evidence),
778
+ sections_completed=len(section_drafts),
779
+ )
780
+
781
+ # 3. Create final report
782
+ final_report = await self._create_final_report(query, report_plan, section_drafts)
783
+
784
+ self.logger.info(
785
+ "Deep research completed",
786
+ sections=len(section_drafts),
787
+ final_report_length=len(final_report),
788
+ )
789
+
790
+ return final_report
791
+
792
+ async def _run_with_graph(self, query: str) -> str:
793
+ """
794
+ Run the deep research flow using graph execution.
795
+
796
+ Args:
797
+ query: The research query
798
+
799
+ Returns:
800
+ Final report string
801
+ """
802
+ self.logger.info("Starting deep research (graph execution)", query=query[:100])
803
+
804
+ # Create graph orchestrator (lazy initialization)
805
+ if self._graph_orchestrator is None:
806
+ self._graph_orchestrator = create_graph_orchestrator(
807
+ mode="deep",
808
+ max_iterations=self.max_iterations,
809
+ max_time_minutes=self.max_time_minutes,
810
+ use_graph=True,
811
+ )
812
+
813
+ # Run orchestrator and collect events
814
+ final_report = ""
815
+ async for event in self._graph_orchestrator.run(query):
816
+ if event.type == "complete":
817
+ final_report = event.message
818
+ break
819
+ elif event.type == "error":
820
+ self.logger.error("Graph execution error", error=event.message)
821
+ raise RuntimeError(f"Graph execution failed: {event.message}")
822
+
823
+ if not final_report:
824
+ self.logger.warning("No complete event received from graph orchestrator")
825
+ final_report = "Research completed but no report was generated."
826
+
827
+ self.logger.info("Deep research completed (graph execution)")
828
+
829
+ return final_report
830
+
831
+ async def _build_report_plan(self, query: str) -> ReportPlan:
832
+ """Build the initial report plan."""
833
+ self.logger.info("Building report plan")
834
+
835
+ # Build input prompt for token estimation
836
+ input_prompt = f"QUERY: {query}"
837
+
838
+ report_plan = await self.planner_agent.run(query)
839
+
840
+ # Track tokens for planner agent
841
+ if not self.use_graph and hasattr(self, "budget_tracker"):
842
+ plan_text = (
843
+ f"title={report_plan.report_title}, sections={len(report_plan.report_outline)}"
844
+ )
845
+ estimated_tokens = self.budget_tracker.estimate_llm_call_tokens(input_prompt, plan_text)
846
+ self.budget_tracker.add_tokens(self.loop_id, estimated_tokens)
847
+ self.logger.debug(
848
+ "Tokens tracked for planner agent",
849
+ tokens=estimated_tokens,
850
+ )
851
+
852
+ self.logger.info(
853
+ "Report plan created",
854
+ sections=len(report_plan.report_outline),
855
+ has_background=bool(report_plan.background_context),
856
+ )
857
+
858
+ return report_plan
859
+
860
+ async def _run_research_loops(self, report_plan: ReportPlan) -> list[str]:
861
+ """Run parallel iterative research loops for each section."""
862
+ self.logger.info("Running research loops", sections=len(report_plan.report_outline))
863
+
864
+ # Create workflow manager for parallel execution
865
+ workflow_manager = WorkflowManager()
866
+
867
+ # Create loop configurations
868
+ loop_configs = [
869
+ {
870
+ "loop_id": f"section_{i}",
871
+ "query": section.key_question,
872
+ "section_title": section.title,
873
+ "background_context": report_plan.background_context,
874
+ }
875
+ for i, section in enumerate(report_plan.report_outline)
876
+ ]
877
+
878
+ async def run_research_for_section(config: dict[str, Any]) -> str:
879
+ """Run iterative research for a single section."""
880
+ loop_id = config.get("loop_id", "unknown")
881
+ query = config.get("query", "")
882
+ background_context = config.get("background_context", "")
883
+
884
+ try:
885
+ # Update loop status
886
+ await workflow_manager.update_loop_status(loop_id, "running")
887
+
888
+ # Create iterative research flow
889
+ flow = IterativeResearchFlow(
890
+ max_iterations=self.max_iterations,
891
+ max_time_minutes=self.max_time_minutes,
892
+ verbose=self.verbose,
893
+ use_graph=self.use_graph,
894
+ judge_handler=self.judge_handler if not self.use_graph else None,
895
+ )
896
+
897
+ # Run research
898
+ result = await flow.run(
899
+ query=query,
900
+ background_context=background_context,
901
+ )
902
+
903
+ # Sync evidence from flow to loop
904
+ state = get_workflow_state()
905
+ if state.evidence:
906
+ await workflow_manager.add_loop_evidence(loop_id, state.evidence)
907
+
908
+ # Update loop status
909
+ await workflow_manager.update_loop_status(loop_id, "completed")
910
+
911
+ return result
912
+
913
+ except Exception as e:
914
+ error_msg = str(e)
915
+ await workflow_manager.update_loop_status(loop_id, "failed", error=error_msg)
916
+ self.logger.error(
917
+ "Section research failed",
918
+ loop_id=loop_id,
919
+ error=error_msg,
920
+ )
921
+ raise
922
+
923
+ # Run all sections in parallel using workflow manager
924
+ section_drafts = await workflow_manager.run_loops_parallel(
925
+ loop_configs=loop_configs,
926
+ loop_func=run_research_for_section,
927
+ judge_handler=self.judge_handler if not self.use_graph else None,
928
+ budget_tracker=self.budget_tracker if not self.use_graph else None,
929
+ )
930
+
931
+ # Sync evidence from all loops to global state
932
+ for config in loop_configs:
933
+ loop_id = config.get("loop_id")
934
+ if loop_id:
935
+ await workflow_manager.sync_loop_evidence_to_state(loop_id)
936
+
937
+ # Filter out None results (failed loops)
938
+ section_drafts = [draft for draft in section_drafts if draft is not None]
939
+
940
+ self.logger.info(
941
+ "Research loops completed",
942
+ drafts=len(section_drafts),
943
+ total_sections=len(report_plan.report_outline),
944
+ )
945
+
946
+ return section_drafts
947
+
948
+ async def _create_final_report(
949
+ self, query: str, report_plan: ReportPlan, section_drafts: list[str]
950
+ ) -> str:
951
+ """Create final report from section drafts."""
952
+ self.logger.info("Creating final report")
953
+
954
+ # Create ReportDraft from section drafts
955
+ report_draft = ReportDraft(
956
+ sections=[
957
+ ReportDraftSection(
958
+ section_title=section.title,
959
+ section_content=draft,
960
+ )
961
+ for section, draft in zip(report_plan.report_outline, section_drafts, strict=False)
962
+ ]
963
+ )
964
+
965
+ # Build input prompt for token estimation
966
+ draft_text = "\n".join(
967
+ [s.section_content[:500] for s in report_draft.sections[:5]]
968
+ ) # Sample
969
+ input_prompt = f"QUERY: {query}\nTITLE: {report_plan.report_title}\nDRAFT: {draft_text}"
970
+
971
+ if self.use_long_writer:
972
+ # Use long writer agent
973
+ final_report = await self.long_writer_agent.write_report(
974
+ original_query=query,
975
+ report_title=report_plan.report_title,
976
+ report_draft=report_draft,
977
+ )
978
+ else:
979
+ # Use proofreader agent
980
+ final_report = await self.proofreader_agent.proofread(
981
+ query=query,
982
+ report_draft=report_draft,
983
+ )
984
+
985
+ # Track tokens for final report synthesis
986
+ if not self.use_graph and hasattr(self, "budget_tracker"):
987
+ estimated_tokens = self.budget_tracker.estimate_llm_call_tokens(
988
+ input_prompt, final_report
989
+ )
990
+ self.budget_tracker.add_tokens(self.loop_id, estimated_tokens)
991
+ self.logger.debug(
992
+ "Tokens tracked for final report synthesis",
993
+ tokens=estimated_tokens,
994
+ agent="long_writer" if self.use_long_writer else "proofreader",
995
+ )
996
+
997
+ self.logger.info("Final report created", length=len(final_report))
998
+
999
+ return final_report
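A minimal usage sketch for the two flows added above. The import path and the queries are illustrative assumptions, not taken from the commit; adjust the import to wherever `IterativeResearchFlow` and `DeepResearchFlow` are exported.

```python
import asyncio

# Assumed import path for this commit's research flows (illustrative).
from src.orchestrator.research_flow import DeepResearchFlow, IterativeResearchFlow


async def main() -> None:
    # Single search-and-judge loop over one query (agent chains by default).
    iterative = IterativeResearchFlow(max_iterations=3, max_time_minutes=5)
    report = await iterative.run("What existing drugs might help treat long COVID fatigue?")

    # Plan -> parallel per-section loops -> synthesis.
    deep = DeepResearchFlow(max_iterations=3, max_time_minutes=5, use_long_writer=True)
    final_report = await deep.run("Drug repurposing candidates for long COVID fatigue")

    print(report[:300])
    print(final_report[:300])


asyncio.run(main())
```

Setting `use_graph=True` on either flow delegates to the graph orchestrator instead of the agent chains shown in `_run_with_chains`.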
src/orchestrator_factory.py CHANGED
@@ -2,7 +2,7 @@
2
 
3
  from typing import Any, Literal
4
 
5
- from src.orchestrator import JudgeHandlerProtocol, Orchestrator, SearchHandlerProtocol
6
  from src.utils.models import OrchestratorConfig
7
 
8
 
 
2
 
3
  from typing import Any, Literal
4
 
5
+ from src.legacy_orchestrator import JudgeHandlerProtocol, Orchestrator, SearchHandlerProtocol
6
  from src.utils.models import OrchestratorConfig
7
 
8
 
src/tools/__init__.py CHANGED
@@ -2,7 +2,14 @@
2
 
3
  from src.tools.base import SearchTool
4
  from src.tools.pubmed import PubMedTool
 
5
  from src.tools.search_handler import SearchHandler
6
 
7
  # Re-export
8
- __all__ = ["PubMedTool", "SearchHandler", "SearchTool"]
2
 
3
  from src.tools.base import SearchTool
4
  from src.tools.pubmed import PubMedTool
5
+ from src.tools.rag_tool import RAGTool, create_rag_tool
6
  from src.tools.search_handler import SearchHandler
7
 
8
  # Re-export
9
+ __all__ = [
10
+ "PubMedTool",
11
+ "SearchHandler",
12
+ "SearchTool",
13
+ "RAGTool",
14
+ "create_rag_tool",
15
+ ]
src/tools/crawl_adapter.py ADDED
@@ -0,0 +1,58 @@
1
+ """Website crawl tool adapter for Pydantic AI agents.
2
+
3
+ Adapts the folder/tools/crawl_website.py implementation to work with Pydantic AI.
4
+ """
5
+
6
+ import structlog
7
+
8
+ logger = structlog.get_logger()
9
+
10
+
11
+ async def crawl_website(starting_url: str) -> str:
12
+ """
13
+ Crawl a website starting from the given URL and return formatted results.
14
+
15
+ Use this tool to crawl a website for information relevant to the query.
16
+ Provide a starting URL as input.
17
+
18
+ Args:
19
+ starting_url: The starting URL to crawl (e.g., "https://example.com")
20
+
21
+ Returns:
22
+ Formatted string with crawled content including titles, descriptions, and URLs
23
+ """
24
+ try:
25
+ # Lazy import to avoid requiring folder/ dependencies at import time
26
+ from folder.tools.crawl_website import crawl_website as crawl_tool
27
+
28
+ # Call the tool function
29
+ # The tool returns List[ScrapeResult] or str
30
+ results = await crawl_tool(starting_url)
31
+
32
+ if isinstance(results, str):
33
+ # Error message returned
34
+ logger.warning("Crawl returned error", error=results)
35
+ return results
36
+
37
+ if not results:
38
+ return f"No content found when crawling: {starting_url}"
39
+
40
+ # Format results for agent consumption
41
+ formatted = [f"Found {len(results)} pages from {starting_url}:\n"]
42
+ for i, result in enumerate(results[:10], 1): # Limit to 10 pages
43
+ formatted.append(f"{i}. **{result.title or 'Untitled'}**")
44
+ if result.description:
45
+ formatted.append(f" {result.description[:200]}...")
46
+ formatted.append(f" URL: {result.url}")
47
+ if result.text:
48
+ formatted.append(f" Content: {result.text[:500]}...")
49
+ formatted.append("")
50
+
51
+ return "\n".join(formatted)
52
+
53
+ except ImportError as e:
54
+ logger.error("Crawl tool not available", error=str(e))
55
+ return f"Crawl tool not available: {e!s}"
56
+ except Exception as e:
57
+ logger.error("Crawl failed", error=str(e), url=starting_url)
58
+ return f"Error crawling website: {e!s}"
src/tools/rag_tool.py ADDED
@@ -0,0 +1,183 @@
1
+ """RAG tool for semantic search within collected evidence.
2
+
3
+ Implements SearchTool protocol to enable RAG as a search option in the research workflow.
4
+ """
5
+
6
+ from typing import TYPE_CHECKING, Any
7
+
8
+ import structlog
9
+
10
+ from src.utils.exceptions import ConfigurationError
11
+ from src.utils.models import Citation, Evidence, SourceName
12
+
13
+ if TYPE_CHECKING:
14
+ from src.services.llamaindex_rag import LlamaIndexRAGService
15
+
16
+ logger = structlog.get_logger()
17
+
18
+
19
+ class RAGTool:
20
+ """Search tool that uses LlamaIndex RAG for semantic search within collected evidence.
21
+
22
+ Wraps LlamaIndexRAGService to implement the SearchTool protocol.
23
+ Returns Evidence objects from RAG retrieval results.
24
+ """
25
+
26
+ def __init__(self, rag_service: "LlamaIndexRAGService | None" = None) -> None:
27
+ """
28
+ Initialize RAG tool.
29
+
30
+ Args:
31
+ rag_service: Optional RAG service instance. If None, will be lazy-initialized.
32
+ """
33
+ self._rag_service = rag_service
34
+ self.logger = logger
35
+
36
+ @property
37
+ def name(self) -> str:
38
+ """Return the tool name."""
39
+ return "rag"
40
+
41
+ def _get_rag_service(self) -> "LlamaIndexRAGService":
42
+ """
43
+ Get or create RAG service instance.
44
+
45
+ Returns:
46
+ LlamaIndexRAGService instance
47
+
48
+ Raises:
49
+ ConfigurationError: If RAG service cannot be initialized
50
+ """
51
+ if self._rag_service is None:
52
+ try:
53
+ from src.services.llamaindex_rag import get_rag_service
54
+
55
+ self._rag_service = get_rag_service()
56
+ self.logger.info("RAG service initialized")
57
+ except (ConfigurationError, ImportError) as e:
58
+ self.logger.error("Failed to initialize RAG service", error=str(e))
59
+ raise ConfigurationError("RAG service unavailable. OPENAI_API_KEY required.") from e
60
+
61
+ return self._rag_service
62
+
63
+ async def search(self, query: str, max_results: int = 10) -> list[Evidence]:
64
+ """
65
+ Search RAG system and return evidence.
66
+
67
+ Args:
68
+ query: The search query string
69
+ max_results: Maximum number of results to return
70
+
71
+ Returns:
72
+ List of Evidence objects from RAG retrieval
73
+
74
+ Note:
75
+ Returns empty list on error (does not raise exceptions).
76
+ """
77
+ try:
78
+ rag_service = self._get_rag_service()
79
+ except ConfigurationError:
80
+ self.logger.warning("RAG service unavailable, returning empty results")
81
+ return []
82
+
83
+ try:
84
+ # Retrieve documents from RAG
85
+ retrieved_docs = rag_service.retrieve(query, top_k=max_results)
86
+
87
+ if not retrieved_docs:
88
+ self.logger.info("No RAG results found", query=query[:50])
89
+ return []
90
+
91
+ # Convert retrieved documents to Evidence objects
92
+ evidence_list: list[Evidence] = []
93
+ for doc in retrieved_docs:
94
+ try:
95
+ evidence = self._doc_to_evidence(doc)
96
+ evidence_list.append(evidence)
97
+ except Exception as e:
98
+ self.logger.warning(
99
+ "Failed to convert document to evidence",
100
+ error=str(e),
101
+ doc_text=doc.get("text", "")[:50],
102
+ )
103
+ continue
104
+
105
+ self.logger.info(
106
+ "RAG search completed",
107
+ query=query[:50],
108
+ results=len(evidence_list),
109
+ )
110
+ return evidence_list
111
+
112
+ except Exception as e:
113
+ self.logger.error("RAG search failed", error=str(e), query=query[:50])
114
+ # Return empty list on error (graceful degradation)
115
+ return []
116
+
117
+ def _doc_to_evidence(self, doc: dict[str, Any]) -> Evidence:
118
+ """
119
+ Convert RAG document to Evidence object.
120
+
121
+ Args:
122
+ doc: Document dict with keys: text, score, metadata
123
+
124
+ Returns:
125
+ Evidence object
126
+
127
+ Raises:
128
+ ValueError: If document is missing required fields
129
+ """
130
+ text = doc.get("text", "")
131
+ if not text:
132
+ raise ValueError("Document missing text content")
133
+
134
+ metadata = doc.get("metadata", {})
135
+ score = doc.get("score", 0.0)
136
+
137
+ # Extract citation information from metadata
138
+ source: SourceName = "rag" # RAG is the source
139
+ title = metadata.get("title", "Untitled")
140
+ url = metadata.get("url", "")
141
+ date = metadata.get("date", "Unknown")
142
+ authors_str = metadata.get("authors", "")
143
+ authors = [a.strip() for a in authors_str.split(",") if a.strip()] if authors_str else []
144
+
145
+ # Create citation
146
+ citation = Citation(
147
+ source=source,
148
+ title=title[:500], # Enforce max length
149
+ url=url,
150
+ date=date,
151
+ authors=authors,
152
+ )
153
+
154
+ # Create evidence with relevance score (normalize score to 0-1 if needed)
155
+ relevance = min(max(float(score), 0.0), 1.0) if score else 0.0
156
+
157
+ return Evidence(
158
+ content=text,
159
+ citation=citation,
160
+ relevance=relevance,
161
+ )
162
+
163
+
164
+ def create_rag_tool(
165
+ rag_service: "LlamaIndexRAGService | None" = None,
166
+ ) -> RAGTool:
167
+ """
168
+ Factory function to create a RAG tool.
169
+
170
+ Args:
171
+ rag_service: Optional RAG service instance. If None, will be lazy-initialized.
172
+
173
+ Returns:
174
+ Configured RAGTool instance
175
+
176
+ Raises:
177
+ ConfigurationError: If RAG service cannot be initialized and rag_service is None
178
+ """
179
+ try:
180
+ return RAGTool(rag_service=rag_service)
181
+ except Exception as e:
182
+ logger.error("Failed to create RAG tool", error=str(e))
183
+ raise ConfigurationError(f"Failed to create RAG tool: {e}") from e
src/tools/search_handler.py CHANGED
@@ -1,30 +1,74 @@
1
  """Search handler - orchestrates multiple search tools."""
2
 
3
  import asyncio
4
- from typing import cast
5
 
6
  import structlog
7
 
8
  from src.tools.base import SearchTool
9
- from src.utils.exceptions import SearchError
 
10
  from src.utils.models import Evidence, SearchResult, SourceName
11
 
 
 
 
12
  logger = structlog.get_logger()
13
 
14
 
15
  class SearchHandler:
16
  """Orchestrates parallel searches across multiple tools."""
17
 
18
- def __init__(self, tools: list[SearchTool], timeout: float = 30.0) -> None:
  """
20
  Initialize the search handler.
21
 
22
  Args:
23
  tools: List of search tools to use
24
  timeout: Timeout for each search in seconds
 
 
25
  """
26
- self.tools = tools
27
  self.timeout = timeout
28
 
29
  async def execute(self, query: str, max_results_per_tool: int = 10) -> SearchResult:
30
  """
@@ -66,7 +110,7 @@ class SearchHandler:
66
  sources_searched.append(tool_name)
67
  logger.info("Search tool succeeded", tool=tool.name, count=len(success_result))
68
 
69
- return SearchResult(
70
  query=query,
71
  evidence=all_evidence,
72
  sources_searched=sources_searched,
@@ -74,6 +118,24 @@ class SearchHandler:
74
  errors=errors,
75
  )
76
 
77
  async def _search_with_timeout(
78
  self,
79
  tool: SearchTool,
 
1
  """Search handler - orchestrates multiple search tools."""
2
 
3
  import asyncio
4
+ from typing import TYPE_CHECKING, cast
5
 
6
  import structlog
7
 
8
  from src.tools.base import SearchTool
9
+ from src.tools.rag_tool import create_rag_tool
10
+ from src.utils.exceptions import ConfigurationError, SearchError
11
  from src.utils.models import Evidence, SearchResult, SourceName
12
 
13
+ if TYPE_CHECKING:
14
+ from src.services.llamaindex_rag import LlamaIndexRAGService
15
+
16
  logger = structlog.get_logger()
17
 
18
 
19
  class SearchHandler:
20
  """Orchestrates parallel searches across multiple tools."""
21
 
22
+ def __init__(
23
+ self,
24
+ tools: list[SearchTool],
25
+ timeout: float = 30.0,
26
+ include_rag: bool = False,
27
+ auto_ingest_to_rag: bool = True,
28
+ ) -> None:
29
  """
30
  Initialize the search handler.
31
 
32
  Args:
33
  tools: List of search tools to use
34
  timeout: Timeout for each search in seconds
35
+ include_rag: Whether to include RAG tool in searches
36
+ auto_ingest_to_rag: Whether to automatically ingest results into RAG
37
  """
38
+ self.tools = list(tools) # Make a copy
39
  self.timeout = timeout
40
+ self.auto_ingest_to_rag = auto_ingest_to_rag
41
+ self._rag_service: "LlamaIndexRAGService | None" = None
42
+
43
+ if include_rag:
44
+ self.add_rag_tool()
45
+
46
+ def add_rag_tool(self) -> None:
47
+ """Add RAG tool to the tools list if available."""
48
+ try:
49
+ rag_tool = create_rag_tool()
50
+ self.tools.append(rag_tool)
51
+ logger.info("RAG tool added to search handler")
52
+ except ConfigurationError:
53
+ logger.warning(
54
+ "RAG tool unavailable, not adding to search handler",
55
+ hint="OPENAI_API_KEY required",
56
+ )
57
+ except Exception as e:
58
+ logger.error("Failed to add RAG tool", error=str(e))
59
+
60
+ def _get_rag_service(self) -> "LlamaIndexRAGService | None":
61
+ """Get or create RAG service for ingestion."""
62
+ if self._rag_service is None and self.auto_ingest_to_rag:
63
+ try:
64
+ from src.services.llamaindex_rag import get_rag_service
65
+
66
+ self._rag_service = get_rag_service()
67
+ logger.info("RAG service initialized for ingestion")
68
+ except (ConfigurationError, ImportError):
69
+ logger.warning("RAG service unavailable for ingestion")
70
+ return None
71
+ return self._rag_service
72
 
73
  async def execute(self, query: str, max_results_per_tool: int = 10) -> SearchResult:
74
  """
 
110
  sources_searched.append(tool_name)
111
  logger.info("Search tool succeeded", tool=tool.name, count=len(success_result))
112
 
113
+ search_result = SearchResult(
114
  query=query,
115
  evidence=all_evidence,
116
  sources_searched=sources_searched,
 
118
  errors=errors,
119
  )
120
 
121
+ # Ingest evidence into RAG if enabled and available
122
+ if self.auto_ingest_to_rag and all_evidence:
123
+ rag_service = self._get_rag_service()
124
+ if rag_service:
125
+ try:
126
+ # Filter out RAG-sourced evidence (avoid circular ingestion)
127
+ evidence_to_ingest = [e for e in all_evidence if e.citation.source != "rag"]
128
+ if evidence_to_ingest:
129
+ rag_service.ingest_evidence(evidence_to_ingest)
130
+ logger.info(
131
+ "Ingested evidence into RAG",
132
+ count=len(evidence_to_ingest),
133
+ )
134
+ except Exception as e:
135
+ logger.warning("Failed to ingest evidence into RAG", error=str(e))
136
+
137
+ return search_result
138
+
139
  async def _search_with_timeout(
140
  self,
141
  tool: SearchTool,
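A sketch of the new constructor options in use. `PubMedTool` stands in for any `SearchTool` and is assumed to construct without arguments; the query is illustrative.

```python
import asyncio

from src.tools.pubmed import PubMedTool
from src.tools.search_handler import SearchHandler

# include_rag=True appends the RAG tool (if OPENAI_API_KEY is set);
# auto_ingest_to_rag=True feeds non-RAG evidence back into the index.
handler = SearchHandler(
    tools=[PubMedTool()],
    timeout=30.0,
    include_rag=True,
    auto_ingest_to_rag=True,
)

result = asyncio.run(handler.execute("long COVID fatigue treatment", max_results_per_tool=10))
print(result.sources_searched, len(result.evidence))
```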
src/tools/tool_executor.py ADDED
@@ -0,0 +1,193 @@
1
+ """Tool executor for running AgentTask objects.
2
+
3
+ Executes tool tasks selected by the tool selector agent and returns ToolAgentOutput.
4
+ """
5
+
6
+ import structlog
7
+
8
+ from src.tools.crawl_adapter import crawl_website
9
+ from src.tools.rag_tool import RAGTool, create_rag_tool
10
+ from src.tools.web_search_adapter import web_search
11
+ from src.utils.exceptions import ConfigurationError
12
+ from src.utils.models import AgentTask, Evidence, ToolAgentOutput
13
+
14
+ logger = structlog.get_logger()
15
+
16
+ # Module-level RAG tool instance (lazy initialization)
17
+ _rag_tool: RAGTool | None = None
18
+
19
+
20
+ def _get_rag_tool() -> RAGTool | None:
21
+ """
22
+ Get or create RAG tool instance.
23
+
24
+ Returns:
25
+ RAGTool instance, or None if unavailable
26
+ """
27
+ global _rag_tool
28
+ if _rag_tool is None:
29
+ try:
30
+ _rag_tool = create_rag_tool()
31
+ logger.info("RAG tool initialized")
32
+ except ConfigurationError:
33
+ logger.warning("RAG tool unavailable (OPENAI_API_KEY required)")
34
+ return None
35
+ except Exception as e:
36
+ logger.error("Failed to initialize RAG tool", error=str(e))
37
+ return None
38
+ return _rag_tool
39
+
40
+
41
+ def _evidence_to_text(evidence_list: list[Evidence]) -> str:
42
+ """
43
+ Convert Evidence objects to formatted text.
44
+
45
+ Args:
46
+ evidence_list: List of Evidence objects
47
+
48
+ Returns:
49
+ Formatted text string with citations and content
50
+ """
51
+ if not evidence_list:
52
+ return "No evidence found."
53
+
54
+ formatted_parts = []
55
+ for i, evidence in enumerate(evidence_list, 1):
56
+ citation = evidence.citation
57
+ citation_str = f"{citation.formatted}"
58
+ if citation.url:
59
+ citation_str += f" [{citation.url}]"
60
+
61
+ formatted_parts.append(f"[{i}] {citation_str}\n\n{evidence.content}\n\n---\n")
62
+
63
+ return "\n".join(formatted_parts)
64
+
65
+
66
+ async def execute_agent_task(task: AgentTask) -> ToolAgentOutput:
67
+ """
68
+ Execute a single agent task and return ToolAgentOutput.
69
+
70
+ Args:
71
+ task: AgentTask specifying which tool to use and what query to run
72
+
73
+ Returns:
74
+ ToolAgentOutput with results and source URLs
75
+ """
76
+ logger.info(
77
+ "Executing agent task",
78
+ agent=task.agent,
79
+ query=task.query[:100] if task.query else "",
80
+ gap=task.gap[:100] if task.gap else "",
81
+ )
82
+
83
+ try:
84
+ if task.agent == "WebSearchAgent":
85
+ # Use web search adapter
86
+ result_text = await web_search(task.query)
87
+ # Extract URLs from result (simple heuristic - look for http/https)
88
+ import re
89
+
90
+ urls = re.findall(r"https?://[^\s\)]+", result_text)
91
+ sources = list(set(urls)) # Deduplicate
92
+
93
+ return ToolAgentOutput(output=result_text, sources=sources)
94
+
95
+ elif task.agent == "SiteCrawlerAgent":
96
+ # Use crawl adapter
97
+ if task.entity_website:
98
+ starting_url = task.entity_website
99
+ elif task.query.startswith(("http://", "https://")):
100
+ starting_url = task.query
101
+ else:
102
+ # Try to construct URL from query
103
+ starting_url = f"https://{task.query}"
104
+
105
+ result_text = await crawl_website(starting_url)
106
+ # Extract URLs from result
107
+ import re
108
+
109
+ urls = re.findall(r"https?://[^\s\)]+", result_text)
110
+ sources = list(set(urls)) # Deduplicate
111
+
112
+ return ToolAgentOutput(output=result_text, sources=sources)
113
+
114
+ elif task.agent == "RAGAgent":
115
+ # Use RAG tool for semantic search
116
+ rag_tool = _get_rag_tool()
117
+ if rag_tool is None:
118
+ return ToolAgentOutput(
119
+ output="RAG service unavailable. OPENAI_API_KEY required.",
120
+ sources=[],
121
+ )
122
+
123
+ # Search RAG and get Evidence objects
124
+ evidence_list = await rag_tool.search(task.query, max_results=10)
125
+
126
+ if not evidence_list:
127
+ return ToolAgentOutput(
128
+ output="No relevant evidence found in collected research.",
129
+ sources=[],
130
+ )
131
+
132
+ # Convert Evidence to formatted text
133
+ result_text = _evidence_to_text(evidence_list)
134
+
135
+ # Extract URLs from evidence citations
136
+ sources = [evidence.citation.url for evidence in evidence_list if evidence.citation.url]
137
+
138
+ return ToolAgentOutput(output=result_text, sources=sources)
139
+
140
+ else:
141
+ logger.warning("Unknown agent type", agent=task.agent)
142
+ return ToolAgentOutput(
143
+ output=f"Unknown agent type: {task.agent}. Available: WebSearchAgent, SiteCrawlerAgent, RAGAgent",
144
+ sources=[],
145
+ )
146
+
147
+ except Exception as e:
148
+ logger.error("Tool execution failed", error=str(e), agent=task.agent)
149
+ return ToolAgentOutput(
150
+ output=f"Error executing {task.agent} for gap '{task.gap}': {e!s}",
151
+ sources=[],
152
+ )
153
+
154
+
155
+ async def execute_tool_tasks(
156
+ tasks: list[AgentTask],
157
+ ) -> dict[str, ToolAgentOutput]:
158
+ """
159
+ Execute multiple agent tasks concurrently.
160
+
161
+ Args:
162
+ tasks: List of AgentTask objects to execute
163
+
164
+ Returns:
165
+ Dictionary mapping task keys to ToolAgentOutput results
166
+ """
167
+ import asyncio
168
+
169
+ logger.info("Executing tool tasks", count=len(tasks))
170
+
171
+ # Create async tasks
172
+ async_tasks = [execute_agent_task(task) for task in tasks]
173
+
174
+ # Run concurrently
175
+ results_list = await asyncio.gather(*async_tasks, return_exceptions=True)
176
+
177
+ # Build results dictionary
178
+ results: dict[str, ToolAgentOutput] = {}
179
+ for i, (task, result) in enumerate(zip(tasks, results_list, strict=False)):
180
+ if isinstance(result, Exception):
181
+ logger.error("Task execution failed", error=str(result), task_index=i)
182
+ results[f"{task.agent}_{i}"] = ToolAgentOutput(output=f"Error: {result!s}", sources=[])
183
+ else:
184
+ # Type narrowing: result is ToolAgentOutput after Exception check
185
+ assert isinstance(
186
+ result, ToolAgentOutput
187
+ ), "Expected ToolAgentOutput after Exception check"
188
+ key = f"{task.agent}_{task.gap or i}" if task.gap else f"{task.agent}_{i}"
189
+ results[key] = result
190
+
191
+ logger.info("Tool tasks completed", completed=len(results))
192
+
193
+ return results
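A sketch of executing a small task batch. The `AgentTask` field values are illustrative; the field names follow their use in `execute_agent_task` above, and any other fields are assumed to have defaults.

```python
import asyncio

from src.tools.tool_executor import execute_tool_tasks
from src.utils.models import AgentTask

tasks = [
    AgentTask(agent="WebSearchAgent", query="long COVID fatigue treatments", gap="current therapies"),
    AgentTask(agent="RAGAgent", query="mitochondrial dysfunction evidence", gap="mechanism support"),
]

# Tasks run concurrently; per-task failures become error ToolAgentOutput
# entries instead of aborting the whole batch.
results = asyncio.run(execute_tool_tasks(tasks))
for key, output in results.items():
    print(key, "->", len(output.sources), "sources")
```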
src/tools/web_search_adapter.py ADDED
@@ -0,0 +1,63 @@
1
+ """Web search tool adapter for Pydantic AI agents.
2
+
3
+ Adapts the folder/tools/web_search.py implementation to work with Pydantic AI.
4
+ """
5
+
6
+ import structlog
7
+
8
+ logger = structlog.get_logger()
9
+
10
+
11
+ async def web_search(query: str) -> str:
12
+ """
13
+ Perform a web search for a given query and return formatted results.
14
+
15
+ Use this tool to search the web for information relevant to the query.
16
+ Provide a query with 3-6 words as input.
17
+
18
+ Args:
19
+ query: The search query (3-6 words recommended)
20
+
21
+ Returns:
22
+ Formatted string with search results including titles, descriptions, and URLs
23
+ """
24
+ try:
25
+ # Lazy import to avoid requiring folder/ dependencies at import time
26
+ # This will use the existing web_search tool from folder/tools
27
+ from folder.llm_config import create_default_config
28
+ from folder.tools.web_search import create_web_search_tool
29
+
30
+ config = create_default_config()
31
+ web_search_tool = create_web_search_tool(config)
32
+
33
+ # Call the tool function
34
+ # The tool returns List[ScrapeResult] or str
35
+ results = await web_search_tool(query)
36
+
37
+ if isinstance(results, str):
38
+ # Error message returned
39
+ logger.warning("Web search returned error", error=results)
40
+ return results
41
+
42
+ if not results:
43
+ return f"No web search results found for: {query}"
44
+
45
+ # Format results for agent consumption
46
+ formatted = [f"Found {len(results)} web search results:\n"]
47
+ for i, result in enumerate(results[:5], 1): # Limit to 5 results
48
+ formatted.append(f"{i}. **{result.title}**")
49
+ if result.description:
50
+ formatted.append(f" {result.description[:200]}...")
51
+ formatted.append(f" URL: {result.url}")
52
+ if result.text:
53
+ formatted.append(f" Content: {result.text[:300]}...")
54
+ formatted.append("")
55
+
56
+ return "\n".join(formatted)
57
+
58
+ except ImportError as e:
59
+ logger.error("Web search tool not available", error=str(e))
60
+ return f"Web search tool not available: {e!s}"
61
+ except Exception as e:
62
+ logger.error("Web search failed", error=str(e), query=query)
63
+ return f"Error performing web search: {e!s}"
src/utils/citation_validator.py CHANGED
@@ -85,3 +85,94 @@ def build_reference_from_evidence(evidence: "Evidence") -> dict[str, str]:
85
  "date": evidence.citation.date or "n.d.",
86
  "url": evidence.citation.url,
87
  }
 
85
  "date": evidence.citation.date or "n.d.",
86
  "url": evidence.citation.url,
87
  }
88
+
89
+
90
+ def validate_markdown_citations(
91
+ markdown_report: str, evidence: list["Evidence"]
92
+ ) -> tuple[str, int]:
93
+ """Validate citations in a markdown report against collected evidence.
94
+
95
+ This function validates citations in markdown format (e.g., [1], [2]) by:
96
+ 1. Extracting URLs from the references section
97
+ 2. Matching them against Evidence objects
98
+ 3. Removing invalid citations from the report
99
+
100
+ Note:
101
+ This is a basic validation. For full validation, use ResearchReport
102
+ objects with validate_references().
103
+
104
+ Args:
105
+ markdown_report: The markdown report string with citations
106
+ evidence: List of Evidence objects collected during research
107
+
108
+ Returns:
109
+ Tuple of (validated_markdown, removed_count)
110
+ """
111
+ import re
112
+
113
+ # Build set of valid URLs from evidence
114
+ valid_urls = {e.citation.url for e in evidence}
115
+ valid_urls_lower = {url.lower() for url in valid_urls}
116
+
117
+ # Extract references section (everything after "## References" or "References:")
118
+ ref_section_pattern = r"(?i)(?:##\s*)?References:?\s*\n(.*?)(?=\n##|\Z)"
119
+ ref_match = re.search(ref_section_pattern, markdown_report, re.DOTALL)
120
+
121
+ if not ref_match:
122
+ # No references section found, return as-is
123
+ return markdown_report, 0
124
+
125
+ ref_section = ref_match.group(1)
126
+ ref_lines = ref_section.strip().split("\n")
127
+
128
+ # Parse references: [1] https://example.com or [1] https://example.com Title
129
+ valid_refs = []
130
+ removed_count = 0
131
+
132
+ for ref_line in ref_lines:
133
+ stripped_line = ref_line.strip()
134
+ if not stripped_line:
135
+ continue
136
+
137
+ # Extract URL from reference line
138
+ # Pattern: [N] URL or [N] URL Title
139
+ url_match = re.search(r"https?://[^\s\)]+", stripped_line)
140
+ if url_match:
141
+ url = url_match.group(0).rstrip(".,;")
142
+ url_lower = url.lower()
143
+
144
+ # Check if URL is valid
145
+ if url in valid_urls or url_lower in valid_urls_lower:
146
+ valid_refs.append(stripped_line)
147
+ else:
148
+ removed_count += 1
149
+ logger.warning(
150
+ f"Removed invalid citation from markdown: {url[:80]}"
151
+ + ("..." if len(url) > 80 else "")
152
+ )
153
+ else:
154
+ # No URL found, keep the line (might be formatted differently)
155
+ valid_refs.append(stripped_line)
156
+
157
+ # Rebuild references section
158
+ if valid_refs:
159
+ new_ref_section = "\n".join(valid_refs)
160
+ # Replace the old references section
161
+ validated_markdown = (
162
+ markdown_report[: ref_match.start(1)]
163
+ + new_ref_section
164
+ + markdown_report[ref_match.end(1) :]
165
+ )
166
+ else:
167
+ # No valid references, remove the entire section
168
+ validated_markdown = (
169
+ markdown_report[: ref_match.start()] + markdown_report[ref_match.end() :]
170
+ )
171
+
172
+ if removed_count > 0:
173
+ logger.info(
174
+ f"Citation validation removed {removed_count} invalid citations from markdown report. "
175
+ f"{len(valid_refs)} valid citations remain."
176
+ )
177
+
178
+ return validated_markdown, removed_count
src/utils/config.py CHANGED
@@ -41,15 +41,65 @@ class Settings(BaseSettings):
41
  default="all-MiniLM-L6-v2",
42
  description="Local sentence-transformers model (used by EmbeddingService)",
43
  )
44
 
45
  # PubMed Configuration
46
  ncbi_api_key: str | None = Field(
47
  default=None, description="NCBI API key for higher rate limits"
48
  )
49
 
50
  # Agent Configuration
51
  max_iterations: int = Field(default=10, ge=1, le=50)
52
  search_timeout: int = Field(default=30, description="Seconds to wait for search")
53
 
54
  # Logging
55
  log_level: Literal["DEBUG", "INFO", "WARNING", "ERROR"] = "INFO"
@@ -58,6 +108,34 @@ class Settings(BaseSettings):
58
  modal_token_id: str | None = Field(default=None, description="Modal token ID")
59
  modal_token_secret: str | None = Field(default=None, description="Modal token secret")
60
  chroma_db_path: str = Field(default="./chroma_db", description="ChromaDB storage path")
61
 
62
  @property
63
  def modal_available(self) -> bool:
@@ -102,6 +180,26 @@ class Settings(BaseSettings):
102
  """Check if any LLM API key is available."""
103
  return self.has_openai_key or self.has_anthropic_key
104
 
105
 
106
  def get_settings() -> Settings:
107
  """Factory function to get settings (allows mocking in tests)."""
 
41
  default="all-MiniLM-L6-v2",
42
  description="Local sentence-transformers model (used by EmbeddingService)",
43
  )
44
+ embedding_provider: Literal["openai", "local", "huggingface"] = Field(
45
+ default="local",
46
+ description="Embedding provider to use",
47
+ )
48
+ huggingface_embedding_model: str = Field(
49
+ default="sentence-transformers/all-MiniLM-L6-v2",
50
+ description="HuggingFace embedding model ID",
51
+ )
52
+
53
+ # HuggingFace Configuration
54
+ huggingface_api_key: str | None = Field(
55
+ default=None, description="HuggingFace API token (HF_TOKEN or HUGGINGFACE_API_KEY)"
56
+ )
57
+ huggingface_model: str = Field(
58
+ default="meta-llama/Llama-3.1-8B-Instruct",
59
+ description="Default HuggingFace model ID for inference",
60
+ )
61
 
62
  # PubMed Configuration
63
  ncbi_api_key: str | None = Field(
64
  default=None, description="NCBI API key for higher rate limits"
65
  )
66
 
67
+ # Web Search Configuration
68
+ web_search_provider: Literal["serper", "searchxng", "brave", "tavily", "duckduckgo"] = Field(
69
+ default="duckduckgo",
70
+ description="Web search provider to use",
71
+ )
72
+ serper_api_key: str | None = Field(default=None, description="Serper API key for Google search")
73
+ searchxng_host: str | None = Field(default=None, description="SearchXNG host URL")
74
+ brave_api_key: str | None = Field(default=None, description="Brave Search API key")
75
+ tavily_api_key: str | None = Field(default=None, description="Tavily API key")
76
+
77
  # Agent Configuration
78
  max_iterations: int = Field(default=10, ge=1, le=50)
79
  search_timeout: int = Field(default=30, description="Seconds to wait for search")
80
+ use_graph_execution: bool = Field(
81
+ default=False, description="Use graph-based execution for research flows"
82
+ )
83
+
84
+ # Budget & Rate Limiting Configuration
85
+ default_token_limit: int = Field(
86
+ default=100000,
87
+ ge=1000,
88
+ le=1000000,
89
+ description="Default token budget per research loop",
90
+ )
91
+ default_time_limit_minutes: int = Field(
92
+ default=10,
93
+ ge=1,
94
+ le=120,
95
+ description="Default time limit per research loop (minutes)",
96
+ )
97
+ default_iterations_limit: int = Field(
98
+ default=10,
99
+ ge=1,
100
+ le=50,
101
+ description="Default iterations limit per research loop",
102
+ )
103
 
104
  # Logging
105
  log_level: Literal["DEBUG", "INFO", "WARNING", "ERROR"] = "INFO"
 
108
  modal_token_id: str | None = Field(default=None, description="Modal token ID")
109
  modal_token_secret: str | None = Field(default=None, description="Modal token secret")
110
  chroma_db_path: str = Field(default="./chroma_db", description="ChromaDB storage path")
111
+ chroma_db_persist: bool = Field(
112
+ default=True,
113
+ description="Whether to persist ChromaDB to disk",
114
+ )
115
+ chroma_db_host: str | None = Field(
116
+ default=None,
117
+ description="ChromaDB server host (for remote ChromaDB)",
118
+ )
119
+ chroma_db_port: int | None = Field(
120
+ default=None,
121
+ description="ChromaDB server port (for remote ChromaDB)",
122
+ )
123
+
124
+ # RAG Service Configuration
125
+ rag_collection_name: str = Field(
126
+ default="deepcritical_evidence",
127
+ description="ChromaDB collection name for RAG",
128
+ )
129
+ rag_similarity_top_k: int = Field(
130
+ default=5,
131
+ ge=1,
132
+ le=50,
133
+ description="Number of top results to retrieve from RAG",
134
+ )
135
+ rag_auto_ingest: bool = Field(
136
+ default=True,
137
+ description="Automatically ingest evidence into RAG",
138
+ )
139
 
140
  @property
141
  def modal_available(self) -> bool:
 
180
  """Check if any LLM API key is available."""
181
  return self.has_openai_key or self.has_anthropic_key
182
 
183
+ @property
184
+ def has_huggingface_key(self) -> bool:
185
+ """Check if HuggingFace API key is available."""
186
+ return bool(self.huggingface_api_key)
187
+
188
+ @property
189
+ def web_search_available(self) -> bool:
190
+ """Check if web search is available (either no-key provider or API key present)."""
191
+ if self.web_search_provider == "duckduckgo":
192
+ return True # No API key required
193
+ if self.web_search_provider == "serper":
194
+ return bool(self.serper_api_key)
195
+ if self.web_search_provider == "searchxng":
196
+ return bool(self.searchxng_host)
197
+ if self.web_search_provider == "brave":
198
+ return bool(self.brave_api_key)
199
+ if self.web_search_provider == "tavily":
200
+ return bool(self.tavily_api_key)
201
+ return False
202
+
203
 
204
  def get_settings() -> Settings:
205
  """Factory function to get settings (allows mocking in tests)."""
src/utils/models.py CHANGED
@@ -6,7 +6,7 @@ from typing import Any, ClassVar, Literal
6
  from pydantic import BaseModel, Field
7
 
8
  # Centralized source type - add new sources here (e.g., "biorxiv" in Phase 11)
9
- SourceName = Literal["pubmed", "clinicaltrials", "biorxiv", "europepmc", "preprint"]
10
 
11
 
12
  class Citation(BaseModel):
@@ -303,3 +303,269 @@ class OrchestratorConfig(BaseModel):
303
  max_iterations: int = Field(default=10, ge=1, le=20)
304
  max_results_per_tool: int = Field(default=10, ge=1, le=50)
305
  search_timeout: float = Field(default=30.0, ge=5.0, le=120.0)
 
6
  from pydantic import BaseModel, Field
7
 
8
  # Centralized source type - add new sources here (e.g., "biorxiv" in Phase 11)
9
+ SourceName = Literal["pubmed", "clinicaltrials", "biorxiv", "europepmc", "preprint", "rag", "web"]
10
 
11
 
12
  class Citation(BaseModel):
 
303
  max_iterations: int = Field(default=10, ge=1, le=20)
304
  max_results_per_tool: int = Field(default=10, ge=1, le=50)
305
  search_timeout: float = Field(default=30.0, ge=5.0, le=120.0)
306
+
307
+
308
+ # Models for iterative/deep research patterns
309
+
310
+
311
+ class IterationData(BaseModel):
312
+ """Data for a single iteration of the research loop."""
313
+
314
+ gap: str = Field(description="The gap addressed in the iteration", default="")
315
+ tool_calls: list[str] = Field(description="The tool calls made", default_factory=list)
316
+ findings: list[str] = Field(
317
+ description="The findings collected from tool calls", default_factory=list
318
+ )
319
+ thought: str = Field(
320
+ description="The thinking done to reflect on the success of the iteration and next steps",
321
+ default="",
322
+ )
323
+
324
+ model_config = {"frozen": True}
325
+
326
+
327
+ class Conversation(BaseModel):
328
+ """A conversation between the user and the iterative researcher."""
329
+
330
+ history: list[IterationData] = Field(
331
+ description="The data for each iteration of the research loop",
332
+ default_factory=list,
333
+ )
334
+
335
+ def add_iteration(self, iteration_data: IterationData | None = None) -> None:
336
+ """Add a new iteration to the conversation history."""
337
+ if iteration_data is None:
338
+ iteration_data = IterationData()
339
+ self.history.append(iteration_data)
340
+
341
+ def set_latest_gap(self, gap: str) -> None:
342
+ """Set the gap for the latest iteration."""
343
+ if not self.history:
344
+ self.add_iteration()
345
+ # Use model_copy() since IterationData is frozen
346
+ self.history[-1] = self.history[-1].model_copy(update={"gap": gap})
347
+
348
+ def set_latest_tool_calls(self, tool_calls: list[str]) -> None:
349
+ """Set the tool calls for the latest iteration."""
350
+ if not self.history:
351
+ self.add_iteration()
352
+ # Use model_copy() since IterationData is frozen
353
+ self.history[-1] = self.history[-1].model_copy(update={"tool_calls": tool_calls})
354
+
355
+ def set_latest_findings(self, findings: list[str]) -> None:
356
+ """Set the findings for the latest iteration."""
357
+ if not self.history:
358
+ self.add_iteration()
359
+ # Use model_copy() since IterationData is frozen
360
+ self.history[-1] = self.history[-1].model_copy(update={"findings": findings})
361
+
362
+ def set_latest_thought(self, thought: str) -> None:
363
+ """Set the thought for the latest iteration."""
364
+ if not self.history:
365
+ self.add_iteration()
366
+ # Use model_copy() since IterationData is frozen
367
+ self.history[-1] = self.history[-1].model_copy(update={"thought": thought})
368
+
369
+ def get_latest_gap(self) -> str:
370
+ """Get the gap from the latest iteration."""
371
+ if not self.history:
372
+ return ""
373
+ return self.history[-1].gap
374
+
375
+ def get_latest_tool_calls(self) -> list[str]:
376
+ """Get the tool calls from the latest iteration."""
377
+ if not self.history:
378
+ return []
379
+ return self.history[-1].tool_calls
380
+
381
+ def get_latest_findings(self) -> list[str]:
382
+ """Get the findings from the latest iteration."""
383
+ if not self.history:
384
+ return []
385
+ return self.history[-1].findings
386
+
387
+ def get_latest_thought(self) -> str:
388
+ """Get the thought from the latest iteration."""
389
+ if not self.history:
390
+ return ""
391
+ return self.history[-1].thought
392
+
393
+ def get_all_findings(self) -> list[str]:
394
+ """Get all findings from all iterations."""
395
+ return [finding for iteration_data in self.history for finding in iteration_data.findings]
396
+
397
+ def compile_conversation_history(self) -> str:
398
+ """Compile the conversation history into a string."""
399
+ conversation = ""
400
+ for iteration_num, iteration_data in enumerate(self.history):
401
+ conversation += f"[ITERATION {iteration_num + 1}]\n\n"
402
+ if iteration_data.thought:
403
+ conversation += f"{self.get_thought_string(iteration_num)}\n\n"
404
+ if iteration_data.gap:
405
+ conversation += f"{self.get_task_string(iteration_num)}\n\n"
406
+ if iteration_data.tool_calls:
407
+ conversation += f"{self.get_action_string(iteration_num)}\n\n"
408
+ if iteration_data.findings:
409
+ conversation += f"{self.get_findings_string(iteration_num)}\n\n"
410
+
411
+ return conversation
412
+
413
+ def get_task_string(self, iteration_num: int) -> str:
414
+ """Get the task for the specified iteration."""
415
+ if iteration_num < len(self.history) and self.history[iteration_num].gap:
416
+ return (
417
+ f"<task>\nAddress this knowledge gap: "
418
+ f"{self.history[iteration_num].gap}\n</task>"
419
+ )
420
+ return ""
421
+
422
+ def get_action_string(self, iteration_num: int) -> str:
423
+ """Get the action for the specified iteration."""
424
+ if iteration_num < len(self.history) and self.history[iteration_num].tool_calls:
425
+ joined_calls = "\n".join(self.history[iteration_num].tool_calls)
426
+ return (
427
+ "<action>\nCalling the following tools to address the knowledge gap:\n"
428
+ f"{joined_calls}\n</action>"
429
+ )
430
+ return ""
431
+
432
+ def get_findings_string(self, iteration_num: int) -> str:
433
+ """Get the findings for the specified iteration."""
434
+ if iteration_num < len(self.history) and self.history[iteration_num].findings:
435
+ joined_findings = "\n\n".join(self.history[iteration_num].findings)
436
+ return f"<findings>\n{joined_findings}\n</findings>"
437
+ return ""
438
+
439
+ def get_thought_string(self, iteration_num: int) -> str:
440
+ """Get the thought for the specified iteration."""
441
+ if iteration_num < len(self.history) and self.history[iteration_num].thought:
442
+ return f"<thought>\n{self.history[iteration_num].thought}\n</thought>"
443
+ return ""
444
+
445
+ def latest_task_string(self) -> str:
446
+ """Get the latest task."""
447
+ if not self.history:
448
+ return ""
449
+ return self.get_task_string(len(self.history) - 1)
450
+
451
+ def latest_action_string(self) -> str:
452
+ """Get the latest action."""
453
+ if not self.history:
454
+ return ""
455
+ return self.get_action_string(len(self.history) - 1)
456
+
457
+ def latest_findings_string(self) -> str:
458
+ """Get the latest findings."""
459
+ if not self.history:
460
+ return ""
461
+ return self.get_findings_string(len(self.history) - 1)
462
+
463
+ def latest_thought_string(self) -> str:
464
+ """Get the latest thought."""
465
+ if not self.history:
466
+ return ""
467
+ return self.get_thought_string(len(self.history) - 1)
468
+
469
+
470
+ class ReportPlanSection(BaseModel):
471
+ """A section of the report that needs to be written."""
472
+
473
+ title: str = Field(description="The title of the section")
474
+ key_question: str = Field(description="The key question to be addressed in the section")
475
+
476
+ model_config = {"frozen": True}
477
+
478
+
479
+ class ReportPlan(BaseModel):
480
+ """Output from the Report Planner Agent."""
481
+
482
+ background_context: str = Field(
483
+ description="A summary of supporting context that can be passed onto the research agents"
484
+ )
485
+ report_outline: list[ReportPlanSection] = Field(
486
+ description="List of sections that need to be written in the report"
487
+ )
488
+ report_title: str = Field(description="The title of the report")
489
+
490
+ model_config = {"frozen": True}
491
+
492
+
493
+ class KnowledgeGapOutput(BaseModel):
494
+ """Output from the Knowledge Gap Agent."""
495
+
496
+ research_complete: bool = Field(
497
+ description="Whether the research and findings are complete enough to end the research loop"
498
+ )
499
+ outstanding_gaps: list[str] = Field(
500
+ description="List of knowledge gaps that still need to be addressed"
501
+ )
502
+
503
+ model_config = {"frozen": True}
504
+
505
+
506
+ class AgentTask(BaseModel):
507
+ """A task for a specific agent to address knowledge gaps."""
508
+
509
+ gap: str | None = Field(description="The knowledge gap being addressed", default=None)
510
+ agent: str = Field(description="The name of the agent to use")
511
+ query: str = Field(description="The specific query for the agent")
512
+ entity_website: str | None = Field(
513
+ description="The website of the entity being researched, if known",
514
+ default=None,
515
+ )
516
+
517
+ model_config = {"frozen": True}
518
+
519
+
520
+ class AgentSelectionPlan(BaseModel):
521
+ """Plan for which agents to use for knowledge gaps."""
522
+
523
+ tasks: list[AgentTask] = Field(description="List of agent tasks to address knowledge gaps")
524
+
525
+ model_config = {"frozen": True}
526
+
527
+
528
+ class ReportDraftSection(BaseModel):
529
+ """A section of the report that needs to be written."""
530
+
531
+ section_title: str = Field(description="The title of the section")
532
+ section_content: str = Field(description="The content of the section")
533
+
534
+ model_config = {"frozen": True}
535
+
536
+
537
+ class ReportDraft(BaseModel):
538
+ """Output from the Report Planner Agent."""
539
+
540
+ sections: list[ReportDraftSection] = Field(
541
+ description="List of sections that are in the report"
542
+ )
543
+
544
+ model_config = {"frozen": True}
545
+
546
+
547
+ class ToolAgentOutput(BaseModel):
548
+ """Standard output for all tool agents."""
549
+
550
+ output: str = Field(description="The output from the tool agent")
551
+ sources: list[str] = Field(description="List of source URLs", default_factory=list)
552
+
553
+ model_config = {"frozen": True}
554
+
555
+
556
+ class ParsedQuery(BaseModel):
557
+ """Parsed and improved user query with research mode detection."""
558
+
559
+ original_query: str = Field(description="The original user query")
560
+ improved_query: str = Field(description="Improved/refined query")
561
+ research_mode: Literal["iterative", "deep"] = Field(description="Detected research mode")
562
+ key_entities: list[str] = Field(
563
+ default_factory=list,
564
+ description="Key entities extracted from query",
565
+ )
566
+ research_questions: list[str] = Field(
567
+ default_factory=list,
568
+ description="Specific research questions extracted",
569
+ )
570
+
571
+ model_config = {"frozen": True}
tests/integration/test_deep_research.py ADDED
@@ -0,0 +1,352 @@
1
+ """Integration tests for deep research flow.
2
+
3
+ Tests the complete deep research pattern: plan → parallel loops → synthesis.
4
+ """
5
+
6
+ from unittest.mock import AsyncMock, patch
7
+
8
+ import pytest
9
+
10
+ from src.middleware.state_machine import init_workflow_state
11
+ from src.orchestrator.research_flow import DeepResearchFlow
12
+ from src.utils.models import ReportPlan, ReportPlanSection
13
+
14
+
15
+ @pytest.mark.integration
16
+ class TestDeepResearchFlow:
17
+ """Integration tests for DeepResearchFlow."""
18
+
19
+ @pytest.mark.asyncio
20
+ async def test_deep_research_creates_plan(self) -> None:
21
+ """Test that deep research creates a report plan."""
22
+ # Initialize workflow state
23
+ init_workflow_state()
24
+
25
+ flow = DeepResearchFlow(
26
+ max_iterations=2,
27
+ max_time_minutes=5,
28
+ verbose=False,
29
+ use_graph=False,
30
+ )
31
+
32
+ # Mock the planner agent to return a simple plan
33
+ mock_plan = ReportPlan(
34
+ background_context="Test background context",
35
+ report_outline=[
36
+ ReportPlanSection(
37
+ title="Section 1",
38
+ key_question="What is the first question?",
39
+ ),
40
+ ReportPlanSection(
41
+ title="Section 2",
42
+ key_question="What is the second question?",
43
+ ),
44
+ ],
45
+ report_title="Test Report",
46
+ )
47
+
48
+ flow.planner_agent.run = AsyncMock(return_value=mock_plan)
49
+
50
+ # Mock the iterative research flows to return simple drafts
51
+ async def mock_iterative_run(query: str, **kwargs: dict) -> str:
52
+ return f"# Draft for: {query}\n\nThis is a test draft."
53
+
54
+ # Mock the long writer to return a simple report
55
+ flow.long_writer_agent.write_report = AsyncMock(
56
+ return_value="# Test Report\n\n## Section 1\n\nDraft 1\n\n## Section 2\n\nDraft 2"
57
+ )
58
+
59
+ # We can't easily mock the IterativeResearchFlow.run() without more setup
60
+ # So we'll test the plan creation separately
61
+ plan = await flow._build_report_plan("Test query")
62
+
63
+ assert isinstance(plan, ReportPlan)
64
+ assert plan.report_title == "Test Report"
65
+ assert len(plan.report_outline) == 2
66
+ assert plan.report_outline[0].title == "Section 1"
67
+
68
+ @pytest.mark.asyncio
69
+ async def test_deep_research_parallel_loops_state_synchronization(self) -> None:
70
+ """Test that parallel loops properly synchronize state."""
71
+ # Initialize workflow state
72
+ state = init_workflow_state()
73
+
74
+ flow = DeepResearchFlow(
75
+ max_iterations=1,
76
+ max_time_minutes=2,
77
+ verbose=False,
78
+ use_graph=False,
79
+ )
80
+
81
+ # Create a simple report plan
82
+ report_plan = ReportPlan(
83
+ background_context="Test background",
84
+ report_outline=[
85
+ ReportPlanSection(
86
+ title="Section 1",
87
+ key_question="Question 1?",
88
+ ),
89
+ ReportPlanSection(
90
+ title="Section 2",
91
+ key_question="Question 2?",
92
+ ),
93
+ ],
94
+ report_title="Test Report",
95
+ )
96
+
97
+ # Mock iterative research flows to add evidence to state
98
+ from src.utils.models import Citation, Evidence
99
+
100
+ async def mock_iterative_run(query: str, **kwargs: dict) -> str:
101
+ # Add evidence to state to test synchronization
102
+ ev = Evidence(
103
+ content=f"Evidence for {query}",
104
+ citation=Citation(
105
+ source="pubmed",
106
+ title=f"Title for {query}",
107
+ url=f"https://example.com/{query.replace('?', '').replace(' ', '_')}",
108
+ date="2024-01-01",
109
+ ),
110
+ )
111
+ state.add_evidence([ev])
112
+ return f"# Draft: {query}\n\nTest content."
113
+
114
+ # Patch IterativeResearchFlow.run
115
+ with patch(
116
+ "src.orchestrator.research_flow.IterativeResearchFlow.run",
117
+ side_effect=mock_iterative_run,
118
+ ):
119
+ section_drafts = await flow._run_research_loops(report_plan)
120
+
121
+ # Verify parallel execution
122
+ assert len(section_drafts) == 2
123
+ assert "Question 1" in section_drafts[0]
124
+ assert "Question 2" in section_drafts[1]
125
+
126
+ # Verify state has evidence from both sections
127
+ # Note: In real execution, evidence would be synced via WorkflowManager
128
+ # This test verifies the structure works
129
+
130
+ @pytest.mark.asyncio
131
+ async def test_deep_research_synthesizes_final_report(self) -> None:
132
+ """Test that deep research synthesizes final report from section drafts."""
133
+ flow = DeepResearchFlow(
134
+ max_iterations=1,
135
+ max_time_minutes=2,
136
+ verbose=False,
137
+ use_graph=False,
138
+ use_long_writer=True,
139
+ )
140
+
141
+ # Create report plan
142
+ report_plan = ReportPlan(
143
+ background_context="Test background",
144
+ report_outline=[
145
+ ReportPlanSection(
146
+ title="Introduction",
147
+ key_question="What is the topic?",
148
+ ),
149
+ ReportPlanSection(
150
+ title="Conclusion",
151
+ key_question="What are the conclusions?",
152
+ ),
153
+ ],
154
+ report_title="Test Report",
155
+ )
156
+
157
+ # Create section drafts
158
+ section_drafts = [
159
+ "# Introduction\n\nThis is the introduction section.",
160
+ "# Conclusion\n\nThis is the conclusion section.",
161
+ ]
162
+
163
+ # Mock long writer
164
+ flow.long_writer_agent.write_report = AsyncMock(
165
+ return_value="# Test Report\n\n## Introduction\n\nContent\n\n## Conclusion\n\nContent"
166
+ )
167
+
168
+ final_report = await flow._create_final_report("Test query", report_plan, section_drafts)
169
+
170
+ assert isinstance(final_report, str)
171
+ assert "Test Report" in final_report
172
+ # Verify long writer was called with correct parameters
173
+ flow.long_writer_agent.write_report.assert_called_once()
174
+ call_args = flow.long_writer_agent.write_report.call_args
175
+ assert call_args.kwargs["original_query"] == "Test query"
176
+ assert call_args.kwargs["report_title"] == "Test Report"
177
+ assert len(call_args.kwargs["report_draft"].sections) == 2
178
+
179
+ @pytest.mark.asyncio
180
+ async def test_deep_research_agent_chains_full_flow(self) -> None:
181
+ """Test full deep research flow with agent chains (mocked)."""
182
+ # Initialize workflow state
183
+ init_workflow_state()
184
+
185
+ flow = DeepResearchFlow(
186
+ max_iterations=1,
187
+ max_time_minutes=2,
188
+ verbose=False,
189
+ use_graph=False,
190
+ )
191
+
192
+ # Mock all agents
193
+ mock_plan = ReportPlan(
194
+ background_context="Background",
195
+ report_outline=[
196
+ ReportPlanSection(
197
+ title="Section 1",
198
+ key_question="Question 1?",
199
+ ),
200
+ ],
201
+ report_title="Test Report",
202
+ )
203
+
204
+ flow.planner_agent.run = AsyncMock(return_value=mock_plan)
205
+
206
+ # Mock iterative research
207
+ async def mock_iterative_run(query: str, **kwargs: dict) -> str:
208
+ return f"# Draft\n\nAnswer to {query}"
209
+
210
+ with patch(
211
+ "src.orchestrator.research_flow.IterativeResearchFlow.run",
212
+ side_effect=mock_iterative_run,
213
+ ):
214
+ flow.long_writer_agent.write_report = AsyncMock(
215
+ return_value="# Test Report\n\n## Section 1\n\nDraft content"
216
+ )
217
+
218
+ # Run the full flow
219
+ result = await flow._run_with_chains("Test query")
220
+
221
+ assert isinstance(result, str)
222
+ assert "Test Report" in result
223
+ flow.planner_agent.run.assert_called_once()
224
+ flow.long_writer_agent.write_report.assert_called_once()
225
+
226
+ @pytest.mark.asyncio
227
+ async def test_deep_research_handles_multiple_sections(self) -> None:
228
+ """Test that deep research handles multiple sections correctly."""
229
+ flow = DeepResearchFlow(
230
+ max_iterations=1,
231
+ max_time_minutes=2,
232
+ verbose=False,
233
+ use_graph=False,
234
+ )
235
+
236
+ # Create plan with multiple sections
237
+ report_plan = ReportPlan(
238
+ background_context="Background",
239
+ report_outline=[
240
+ ReportPlanSection(
241
+ title=f"Section {i}",
242
+ key_question=f"Question {i}?",
243
+ )
244
+ for i in range(5) # 5 sections
245
+ ],
246
+ report_title="Multi-Section Report",
247
+ )
248
+
249
+ # Mock iterative research to return unique drafts
250
+ async def mock_iterative_run(query: str, **kwargs: dict) -> str:
251
+ section_num = query.split()[-1].replace("?", "")
252
+ return f"# Section {section_num} Draft\n\nContent for section {section_num}"
253
+
254
+ with patch(
255
+ "src.orchestrator.research_flow.IterativeResearchFlow.run",
256
+ side_effect=mock_iterative_run,
257
+ ):
258
+ section_drafts = await flow._run_research_loops(report_plan)
259
+
260
+ # Verify all sections were processed
261
+ assert len(section_drafts) == 5
262
+ for i, draft in enumerate(section_drafts):
263
+ assert f"Section {i}" in draft or f"section {i}" in draft.lower()
264
+
265
+ @pytest.mark.asyncio
266
+ async def test_deep_research_workflow_manager_integration(self) -> None:
267
+ """Test that deep research properly uses WorkflowManager."""
268
+
269
+ # Initialize workflow state
270
+ init_workflow_state()
271
+
272
+ flow = DeepResearchFlow(
273
+ max_iterations=1,
274
+ max_time_minutes=2,
275
+ verbose=False,
276
+ use_graph=False,
277
+ )
278
+
279
+ # Create report plan
280
+ report_plan = ReportPlan(
281
+ background_context="Background",
282
+ report_outline=[
283
+ ReportPlanSection(
284
+ title="Section 1",
285
+ key_question="Question 1?",
286
+ ),
287
+ ReportPlanSection(
288
+ title="Section 2",
289
+ key_question="Question 2?",
290
+ ),
291
+ ],
292
+ report_title="Test Report",
293
+ )
294
+
295
+ # Mock iterative research
296
+ async def mock_iterative_run(query: str, **kwargs: dict) -> str:
297
+ return f"# Draft: {query}"
298
+
299
+ with patch(
300
+ "src.orchestrator.research_flow.IterativeResearchFlow.run",
301
+ side_effect=mock_iterative_run,
302
+ ):
303
+ section_drafts = await flow._run_research_loops(report_plan)
304
+
305
+ # Verify WorkflowManager was used (section_drafts should be returned)
306
+ assert len(section_drafts) == 2
307
+ # Each draft should be a string
308
+ assert all(isinstance(draft, str) for draft in section_drafts)
309
+
310
+ @pytest.mark.asyncio
311
+ async def test_deep_research_state_initialization(self) -> None:
312
+ """Test that deep research properly initializes workflow state."""
313
+ flow = DeepResearchFlow(
314
+ max_iterations=1,
315
+ max_time_minutes=2,
316
+ verbose=False,
317
+ use_graph=False,
318
+ )
319
+
320
+ # Mock the planner
321
+ mock_plan = ReportPlan(
322
+ background_context="Background",
323
+ report_outline=[
324
+ ReportPlanSection(
325
+ title="Section 1",
326
+ key_question="Question 1?",
327
+ ),
328
+ ],
329
+ report_title="Test Report",
330
+ )
331
+
332
+ flow.planner_agent.run = AsyncMock(return_value=mock_plan)
333
+
334
+ # Mock iterative research
335
+ async def mock_iterative_run(query: str, **kwargs: dict) -> str:
336
+ return "# Draft"
337
+
338
+ with patch(
339
+ "src.orchestrator.research_flow.IterativeResearchFlow.run",
340
+ side_effect=mock_iterative_run,
341
+ ):
342
+ flow.long_writer_agent.write_report = AsyncMock(return_value="# Test Report\n\nContent")
343
+
344
+ # Run with chains - should initialize state
345
+ # Note: _run_with_chains handles missing embedding service gracefully
346
+ await flow._run_with_chains("Test query")
347
+
348
+ # Verify state was initialized (get_workflow_state should not raise)
349
+ from src.middleware.state_machine import get_workflow_state
350
+
351
+ state = get_workflow_state()
352
+ assert state is not None
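These tests drive DeepResearchFlow through its private helpers (`_build_report_plan`, `_run_research_loops`, `_create_final_report`). End to end, the flow would be invoked roughly like this — a sketch that assumes a public async `run()` entry point, which the diff does not show:

```python
import asyncio

from src.middleware.state_machine import init_workflow_state
from src.orchestrator.research_flow import DeepResearchFlow

async def main() -> None:
    init_workflow_state()
    flow = DeepResearchFlow(max_iterations=2, max_time_minutes=5, verbose=True, use_graph=False)
    # run() is assumed here; the tests above exercise the private helpers directly.
    report = await flow.run("What existing drugs might help treat long COVID fatigue?")
    print(report)

asyncio.run(main())
```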
tests/integration/test_middleware_integration.py ADDED
@@ -0,0 +1,245 @@
1
+ """Integration tests for middleware components.
2
+
3
+ Tests the interaction between WorkflowState, WorkflowManager, and BudgetTracker.
4
+ """
5
+
6
+ import pytest
7
+
8
+ from src.middleware.budget_tracker import BudgetTracker
9
+ from src.middleware.state_machine import init_workflow_state
10
+ from src.middleware.workflow_manager import WorkflowManager
11
+ from src.utils.models import Citation, Evidence
12
+
13
+
14
+ @pytest.mark.integration
15
+ class TestMiddlewareIntegration:
16
+ """Integration tests for middleware components."""
17
+
18
+ @pytest.mark.asyncio
19
+ async def test_state_manager_integration(self) -> None:
20
+ """Test WorkflowState and WorkflowManager integration."""
21
+ # Initialize state
22
+ state = init_workflow_state()
23
+ manager = WorkflowManager()
24
+
25
+ # Create a loop
26
+ loop = await manager.add_loop("test_loop", "Test query")
27
+
28
+ # Add evidence to loop
29
+ ev = Evidence(
30
+ content="Test evidence",
31
+ citation=Citation(
32
+ source="pubmed", title="Test Title", url="https://example.com/1", date="2024-01-01"
33
+ ),
34
+ )
35
+ await manager.add_loop_evidence("test_loop", [ev])
36
+
37
+ # Sync to global state
38
+ await manager.sync_loop_evidence_to_state("test_loop")
39
+
40
+ # Verify state has evidence
41
+ assert len(state.evidence) == 1
42
+ assert state.evidence[0].content == "Test evidence"
43
+
44
+ # Verify loop still has evidence
45
+ loop = await manager.get_loop("test_loop")
46
+ assert loop is not None
47
+ assert len(loop.evidence) == 1
48
+
49
+ @pytest.mark.asyncio
50
+ async def test_budget_tracker_with_workflow_manager(self) -> None:
51
+ """Test BudgetTracker integration with WorkflowManager."""
52
+ manager = WorkflowManager()
53
+ tracker = BudgetTracker()
54
+
55
+ # Create loop and budget
56
+ await manager.add_loop("budget_loop", "Test query")
57
+ tracker.create_budget("budget_loop", tokens_limit=1000, time_limit_seconds=60.0)
58
+ tracker.start_timer("budget_loop")
59
+
60
+ # Simulate some work
61
+ tracker.add_tokens("budget_loop", 500)
62
+ await manager.increment_loop_iteration("budget_loop")
63
+ tracker.increment_iteration("budget_loop")
64
+
65
+ # Check budget
66
+ can_continue = tracker.can_continue("budget_loop")
67
+ assert can_continue is True
68
+
69
+ # Exceed budget
70
+ tracker.add_tokens("budget_loop", 600) # Total: 1100 > 1000
71
+ can_continue = tracker.can_continue("budget_loop")
72
+ assert can_continue is False
73
+
74
+ # Update loop status based on budget
75
+ if not can_continue:
76
+ await manager.update_loop_status("budget_loop", "cancelled")
77
+
78
+ loop = await manager.get_loop("budget_loop")
79
+ assert loop is not None
80
+ assert loop.status == "cancelled"
81
+
82
+ @pytest.mark.asyncio
83
+ async def test_parallel_loops_with_budget_tracking(self) -> None:
84
+ """Test parallel loops with budget tracking."""
85
+
86
+ async def mock_research_loop(config: dict) -> str:
87
+ """Mock research loop function."""
88
+ loop_id = config.get("loop_id", "unknown")
89
+ tracker = BudgetTracker()
90
+ manager = WorkflowManager()
91
+
92
+ # Get or create budget
93
+ budget = tracker.get_budget(loop_id)
94
+ if not budget:
95
+ tracker.create_budget(loop_id, tokens_limit=500, time_limit_seconds=10.0)
96
+ tracker.start_timer(loop_id)
97
+
98
+ # Simulate work
99
+ tracker.add_tokens(loop_id, 100)
100
+ await manager.increment_loop_iteration(loop_id)
101
+ tracker.increment_iteration(loop_id)
102
+
103
+ # Check if can continue
104
+ if not tracker.can_continue(loop_id):
105
+ await manager.update_loop_status(loop_id, "cancelled")
106
+ return f"Cancelled: {loop_id}"
107
+
108
+ await manager.update_loop_status(loop_id, "completed")
109
+ return f"Completed: {loop_id}"
110
+
111
+ manager = WorkflowManager()
112
+ tracker = BudgetTracker()
113
+
114
+ # Create budgets for all loops
115
+ configs = [
116
+ {"loop_id": "loop1", "query": "Query 1"},
117
+ {"loop_id": "loop2", "query": "Query 2"},
118
+ {"loop_id": "loop3", "query": "Query 3"},
119
+ ]
120
+
121
+ for config in configs:
122
+ loop_id = config["loop_id"]
123
+ await manager.add_loop(loop_id, config["query"])
124
+ tracker.create_budget(loop_id, tokens_limit=500, time_limit_seconds=10.0)
125
+ tracker.start_timer(loop_id)
126
+
127
+ # Run loops in parallel
128
+ results = await manager.run_loops_parallel(configs, mock_research_loop)
129
+
130
+ # Verify all loops completed
131
+ assert len(results) == 3
132
+ for config in configs:
133
+ loop_id = config["loop_id"]
134
+ loop = await manager.get_loop(loop_id)
135
+ assert loop is not None
136
+ assert loop.status in ("completed", "cancelled")
137
+
138
+ @pytest.mark.asyncio
139
+ async def test_state_conversation_integration(self) -> None:
140
+ """Test WorkflowState conversation integration."""
141
+ state = init_workflow_state()
142
+
143
+ # Add iteration data
144
+ state.conversation.add_iteration()
145
+ state.conversation.set_latest_gap("Knowledge gap 1")
146
+ state.conversation.set_latest_tool_calls(["tool1", "tool2"])
147
+ state.conversation.set_latest_findings(["finding1", "finding2"])
148
+ state.conversation.set_latest_thought("Thought about findings")
149
+
150
+ # Verify conversation history
151
+ assert len(state.conversation.history) == 1
152
+ assert state.conversation.get_latest_gap() == "Knowledge gap 1"
153
+ assert len(state.conversation.get_latest_tool_calls()) == 2
154
+ assert len(state.conversation.get_latest_findings()) == 2
155
+
156
+ # Compile history
157
+ history_str = state.conversation.compile_conversation_history()
158
+ assert "Knowledge gap 1" in history_str
159
+ assert "tool1" in history_str
160
+ assert "finding1" in history_str
161
+ assert "Thought about findings" in history_str
162
+
163
+ @pytest.mark.asyncio
164
+ async def test_multiple_iterations_with_budget(self) -> None:
165
+ """Test multiple iterations with budget enforcement."""
166
+ manager = WorkflowManager()
167
+ tracker = BudgetTracker()
168
+
169
+ loop_id = "iterative_loop"
170
+ await manager.add_loop(loop_id, "Iterative query")
171
+ tracker.create_budget(loop_id, tokens_limit=1000, iterations_limit=5)
172
+ tracker.start_timer(loop_id)
173
+
174
+ # Simulate multiple iterations
175
+ for _ in range(7): # Try 7 iterations, but limit is 5
176
+ tracker.add_tokens(loop_id, 100)
177
+ await manager.increment_loop_iteration(loop_id)
178
+ tracker.increment_iteration(loop_id)
179
+
180
+ can_continue = tracker.can_continue(loop_id)
181
+ if not can_continue:
182
+ await manager.update_loop_status(loop_id, "cancelled")
183
+ break
184
+
185
+ loop = await manager.get_loop(loop_id)
186
+ assert loop is not None
187
+ # Should be cancelled after 5 iterations
188
+ assert loop.status == "cancelled"
189
+ assert loop.iteration_count == 5
190
+
191
+ @pytest.mark.asyncio
192
+ async def test_evidence_deduplication_across_loops(self) -> None:
193
+ """Test evidence deduplication when syncing from multiple loops."""
194
+ state = init_workflow_state()
195
+ manager = WorkflowManager()
196
+
197
+ # Create two loops with same evidence
198
+ ev1 = Evidence(
199
+ content="Same content",
200
+ citation=Citation(
201
+ source="pubmed", title="Title", url="https://example.com/1", date="2024"
202
+ ),
203
+ )
204
+ ev2 = Evidence(
205
+ content="Different content",
206
+ citation=Citation(
207
+ source="pubmed", title="Title 2", url="https://example.com/2", date="2024"
208
+ ),
209
+ )
210
+
211
+ # Add to loop1
212
+ await manager.add_loop("loop1", "Query 1")
213
+ await manager.add_loop_evidence("loop1", [ev1, ev2])
214
+ await manager.sync_loop_evidence_to_state("loop1")
215
+
216
+ # Add duplicate to loop2
217
+ await manager.add_loop("loop2", "Query 2")
218
+ ev1_duplicate = Evidence(
219
+ content="Same content (duplicate)",
220
+ citation=Citation(
221
+ source="pubmed", title="Title Duplicate", url="https://example.com/1", date="2024"
222
+ ),
223
+ )
224
+ await manager.add_loop_evidence("loop2", [ev1_duplicate])
225
+ await manager.sync_loop_evidence_to_state("loop2")
226
+
227
+ # State should have only 2 unique items (deduplicated by URL)
228
+ assert len(state.evidence) == 2
229
+
230
+ @pytest.mark.asyncio
231
+ async def test_global_budget_enforcement(self) -> None:
232
+ """Test global budget enforcement across all loops."""
233
+ tracker = BudgetTracker()
234
+ tracker.set_global_budget(tokens_limit=2000, time_limit_seconds=60.0)
235
+
236
+ # Simulate multiple loops consuming global budget
237
+ tracker.add_global_tokens(500) # Loop 1
238
+ tracker.add_global_tokens(600) # Loop 2
239
+ tracker.add_global_tokens(700) # Loop 3
240
+ tracker.add_global_tokens(300) # Loop 4 - pushes the total to 2100, past the limit
241
+
242
+ global_budget = tracker.get_global_budget()
243
+ assert global_budget is not None
244
+ assert global_budget.tokens_used == 2100
245
+ assert global_budget.is_exceeded() is True
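The pattern these tests verify — create a budget, start the timer, and gate every iteration on `can_continue` — distills to a loop like this, using only methods exercised above:

```python
from src.middleware.budget_tracker import BudgetTracker
from src.middleware.workflow_manager import WorkflowManager

async def bounded_loop(loop_id: str, query: str) -> None:
    manager = WorkflowManager()
    tracker = BudgetTracker()

    await manager.add_loop(loop_id, query)
    tracker.create_budget(loop_id, tokens_limit=1000, iterations_limit=5)
    tracker.start_timer(loop_id)

    while tracker.can_continue(loop_id):
        tracker.add_tokens(loop_id, 100)  # tokens spent this iteration
        await manager.increment_loop_iteration(loop_id)
        tracker.increment_iteration(loop_id)

    # Budget exhausted (tokens, time, or iterations) — mark the loop cancelled.
    await manager.update_loop_status(loop_id, "cancelled")
```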
tests/integration/test_parallel_loops_judge.py ADDED
@@ -0,0 +1,396 @@
1
+ """Integration tests for Phase 7: Parallel loops with judge-based completion.
2
+
3
+ These tests verify that WorkflowManager can coordinate parallel research loops
4
+ and use the judge to determine when loops should complete.
5
+ """
6
+
7
+ from unittest.mock import AsyncMock, MagicMock, patch
8
+
9
+ import pytest
10
+
11
+ from src.middleware.workflow_manager import WorkflowManager
12
+ from src.orchestrator.research_flow import IterativeResearchFlow
13
+ from src.utils.models import Citation, Evidence, JudgeAssessment
14
+
15
+
16
+ @pytest.fixture
17
+ def mock_judge_handler():
18
+ """Create a mock judge handler."""
19
+ judge = MagicMock()
20
+ judge.assess = AsyncMock()
21
+ return judge
22
+
23
+
24
+ @pytest.fixture
25
+ def mock_iterative_flow():
26
+ """Create a mock iterative research flow."""
27
+ flow = MagicMock(spec=IterativeResearchFlow)
28
+ flow.run = AsyncMock(return_value="# Test Report\n\nContent here.")
29
+ return flow
30
+
31
+
32
+ @pytest.mark.integration
33
+ @pytest.mark.asyncio
34
+ class TestParallelLoopsWithJudge:
35
+ """Tests for parallel loops with judge-based completion."""
36
+
37
+ async def test_get_loop_evidence(self):
38
+ """get_loop_evidence should return evidence from a loop."""
39
+ manager = WorkflowManager()
40
+ await manager.add_loop("loop1", "Test query")
41
+
42
+ # Add evidence to the loop
43
+ evidence = [
44
+ Evidence(
45
+ content="Test evidence",
46
+ citation=Citation(
47
+ source="rag", # Use valid SourceName
48
+ title="Test",
49
+ url="https://example.com",
50
+ date="2024-01-01",
51
+ authors=[],
52
+ ),
53
+ relevance=0.8,
54
+ )
55
+ ]
56
+ await manager.add_loop_evidence("loop1", evidence)
57
+
58
+ # Retrieve evidence
59
+ retrieved_evidence = await manager.get_loop_evidence("loop1")
60
+ assert len(retrieved_evidence) == 1
61
+ assert retrieved_evidence[0].content == "Test evidence"
62
+
63
+ async def test_get_loop_evidence_returns_empty_for_missing_loop(self):
64
+ """get_loop_evidence should return empty list for non-existent loop."""
65
+ manager = WorkflowManager()
66
+ evidence = await manager.get_loop_evidence("nonexistent")
67
+ assert evidence == []
68
+
69
+ async def test_check_loop_completion_with_sufficient_evidence(self, mock_judge_handler):
70
+ """check_loop_completion should return True when judge says sufficient."""
71
+ manager = WorkflowManager()
72
+ await manager.add_loop("loop1", "Test query")
73
+
74
+ # Add evidence
75
+ evidence = [
76
+ Evidence(
77
+ content="Comprehensive evidence",
78
+ citation=Citation(
79
+ source="rag", # Use valid SourceName
80
+ title="Test",
81
+ url="https://example.com",
82
+ date="2024-01-01",
83
+ authors=[],
84
+ ),
85
+ relevance=0.9,
86
+ )
87
+ ]
88
+ await manager.add_loop_evidence("loop1", evidence)
89
+
90
+ # Mock judge to say sufficient
91
+ from src.utils.models import AssessmentDetails
92
+
93
+ mock_judge_handler.assess = AsyncMock(
94
+ return_value=JudgeAssessment(
95
+ details=AssessmentDetails(
96
+ mechanism_score=5,
97
+ mechanism_reasoning="Test mechanism reasoning that is long enough",
98
+ clinical_evidence_score=5,
99
+ clinical_reasoning="Test clinical reasoning that is long enough",
100
+ drug_candidates=[],
101
+ key_findings=[],
102
+ ),
103
+ sufficient=True,
104
+ confidence=0.95,
105
+ recommendation="synthesize",
106
+ reasoning="Evidence is sufficient to provide a comprehensive answer.",
107
+ )
108
+ )
109
+
110
+ should_complete, reason = await manager.check_loop_completion(
111
+ "loop1", "Test query", mock_judge_handler
112
+ )
113
+
114
+ assert should_complete is True
115
+ assert "sufficient" in reason.lower() or "judge" in reason.lower()
116
+ assert mock_judge_handler.assess.called
117
+
118
+ async def test_check_loop_completion_with_insufficient_evidence(self, mock_judge_handler):
119
+ """check_loop_completion should return False when judge says insufficient."""
120
+ manager = WorkflowManager()
121
+ await manager.add_loop("loop1", "Test query")
122
+
123
+ # Add minimal evidence
124
+ evidence = [
125
+ Evidence(
126
+ content="Minimal evidence",
127
+ citation=Citation(
128
+ source="rag", # Use valid SourceName
129
+ title="Test",
130
+ url="https://example.com",
131
+ date="2024-01-01",
132
+ authors=[],
133
+ ),
134
+ relevance=0.3,
135
+ )
136
+ ]
137
+ await manager.add_loop_evidence("loop1", evidence)
138
+
139
+ # Mock judge to say insufficient
140
+ from src.utils.models import AssessmentDetails
141
+
142
+ mock_judge_handler.assess = AsyncMock(
143
+ return_value=JudgeAssessment(
144
+ details=AssessmentDetails(
145
+ mechanism_score=3,
146
+ mechanism_reasoning="Test mechanism reasoning that is long enough",
147
+ clinical_evidence_score=3,
148
+ clinical_reasoning="Test clinical reasoning that is long enough",
149
+ drug_candidates=[],
150
+ key_findings=[],
151
+ ),
152
+ sufficient=False,
153
+ confidence=0.4,
154
+ recommendation="continue",
155
+ reasoning="Need more evidence to provide a comprehensive answer.",
156
+ )
157
+ )
158
+
159
+ should_complete, reason = await manager.check_loop_completion(
160
+ "loop1", "Test query", mock_judge_handler
161
+ )
162
+
163
+ assert should_complete is False
164
+ assert "judge" in reason.lower() or "evidence" in reason.lower()
165
+ assert mock_judge_handler.assess.called
166
+
167
+ async def test_check_loop_completion_with_no_evidence(self, mock_judge_handler):
168
+ """check_loop_completion should return False when no evidence exists."""
169
+ manager = WorkflowManager()
170
+ await manager.add_loop("loop1", "Test query")
171
+
172
+ # Don't add any evidence
173
+
174
+ should_complete, reason = await manager.check_loop_completion(
175
+ "loop1", "Test query", mock_judge_handler
176
+ )
177
+
178
+ assert should_complete is False
179
+ assert "no evidence" in reason.lower() or "not" in reason.lower()
180
+ # Judge should not be called if no evidence
181
+ assert not mock_judge_handler.assess.called
182
+
183
+ async def test_check_loop_completion_handles_judge_error(self, mock_judge_handler):
184
+ """check_loop_completion should handle judge errors gracefully."""
185
+ manager = WorkflowManager()
186
+ await manager.add_loop("loop1", "Test query")
187
+
188
+ evidence = [
189
+ Evidence(
190
+ content="Test evidence",
191
+ citation=Citation(
192
+ source="rag", # Use valid SourceName
193
+ title="Test",
194
+ url="https://example.com",
195
+ date="2024-01-01",
196
+ authors=[],
197
+ ),
198
+ relevance=0.8,
199
+ )
200
+ ]
201
+ await manager.add_loop_evidence("loop1", evidence)
202
+
203
+ # Mock judge to raise error
204
+ mock_judge_handler.assess = AsyncMock(side_effect=Exception("Judge error"))
205
+
206
+ should_complete, reason = await manager.check_loop_completion(
207
+ "loop1", "Test query", mock_judge_handler
208
+ )
209
+
210
+ assert should_complete is False
211
+ assert "error" in reason.lower() or "failed" in reason.lower()
212
+
213
+ async def test_parallel_loops_with_judge_early_termination(
214
+ self, mock_judge_handler, mock_iterative_flow
215
+ ):
216
+ """Parallel loops should terminate early when judge says sufficient."""
217
+ manager = WorkflowManager()
218
+
219
+ # Create multiple loops
220
+ loop_configs = [
221
+ {"loop_id": "loop1", "query": "Query 1"},
222
+ {"loop_id": "loop2", "query": "Query 2"},
223
+ ]
224
+
225
+ # Define loop function that extracts loop_func from config if needed
226
+ async def loop_func(config: dict) -> str:
227
+ return await mock_iterative_flow.run(config.get("query", ""))
228
+
229
+ # Add evidence to loop1 that will trigger early completion
230
+ await manager.add_loop("loop1", "Query 1")
231
+ evidence = [
232
+ Evidence(
233
+ content="Comprehensive evidence for query 1",
234
+ citation=Citation(
235
+ source="rag", # Use valid SourceName
236
+ title="Test",
237
+ url="https://example.com",
238
+ date="2024-01-01",
239
+ authors=[],
240
+ ),
241
+ relevance=0.95,
242
+ )
243
+ ]
244
+ await manager.add_loop_evidence("loop1", evidence)
245
+
246
+ # Mock judge to say sufficient for loop1
247
+ call_count = {"count": 0}
248
+
249
+ def mock_assess(query: str, evidence_list: list[Evidence]) -> JudgeAssessment:
250
+ from src.utils.models import AssessmentDetails
251
+
252
+ call_count["count"] += 1
253
+ if "Query 1" in query or len(evidence_list) > 0:
254
+ return JudgeAssessment(
255
+ details=AssessmentDetails(
256
+ mechanism_score=5,
257
+ mechanism_reasoning="Test mechanism reasoning that is long enough",
258
+ clinical_evidence_score=5,
259
+ clinical_reasoning="Test clinical reasoning that is long enough",
260
+ drug_candidates=[],
261
+ key_findings=[],
262
+ ),
263
+ sufficient=True,
264
+ confidence=0.95,
265
+ recommendation="synthesize",
266
+ reasoning="Sufficient evidence has been collected to answer the query.",
267
+ )
268
+ return JudgeAssessment(
269
+ details=AssessmentDetails(
270
+ mechanism_score=3,
271
+ mechanism_reasoning="Test mechanism reasoning that is long enough",
272
+ clinical_evidence_score=3,
273
+ clinical_reasoning="Test clinical reasoning that is long enough",
274
+ drug_candidates=[],
275
+ key_findings=[],
276
+ ),
277
+ sufficient=False,
278
+ confidence=0.5,
279
+ recommendation="continue",
280
+ reasoning="Need more evidence to provide a comprehensive answer.",
281
+ )
282
+
283
+ mock_judge_handler.assess = AsyncMock(side_effect=mock_assess)
284
+
285
+ # Run loops in parallel
286
+ with patch("src.middleware.workflow_manager.get_workflow_state") as mock_state:
287
+ mock_state_obj = MagicMock()
288
+ mock_state_obj.evidence = []
289
+ mock_state.return_value = mock_state_obj
290
+
291
+ results = await manager.run_loops_parallel(
292
+ loop_configs, loop_func=loop_func, judge_handler=mock_judge_handler
293
+ )
294
+
295
+ # Both loops should complete
296
+ assert len(results) == 2
297
+ assert all(isinstance(r, str) for r in results)
298
+
299
+ async def test_parallel_loops_aggregate_evidence(self, mock_judge_handler):
300
+ """Parallel loops should aggregate evidence from all loops."""
301
+ manager = WorkflowManager()
302
+
303
+ # Create loops
304
+ await manager.add_loop("loop1", "Query 1")
305
+ await manager.add_loop("loop2", "Query 2")
306
+
307
+ # Add evidence to each loop
308
+ evidence1 = [
309
+ Evidence(
310
+ content="Evidence from loop 1",
311
+ citation=Citation(
312
+ source="rag", # Use valid SourceName
313
+ title="Test 1",
314
+ url="https://example.com/1",
315
+ date="2024-01-01",
316
+ authors=[],
317
+ ),
318
+ relevance=0.8,
319
+ )
320
+ ]
321
+ evidence2 = [
322
+ Evidence(
323
+ content="Evidence from loop 2",
324
+ citation=Citation(
325
+ source="rag", # Use valid SourceName
326
+ title="Test 2",
327
+ url="https://example.com/2",
328
+ date="2024-01-01",
329
+ authors=[],
330
+ ),
331
+ relevance=0.9,
332
+ )
333
+ ]
334
+
335
+ await manager.add_loop_evidence("loop1", evidence1)
336
+ await manager.add_loop_evidence("loop2", evidence2)
337
+
338
+ # Get evidence from both loops
339
+ evidence1_retrieved = await manager.get_loop_evidence("loop1")
340
+ evidence2_retrieved = await manager.get_loop_evidence("loop2")
341
+
342
+ assert len(evidence1_retrieved) == 1
343
+ assert len(evidence2_retrieved) == 1
344
+ assert evidence1_retrieved[0].content == "Evidence from loop 1"
345
+ assert evidence2_retrieved[0].content == "Evidence from loop 2"
346
+
347
+ async def test_loop_status_updated_on_completion(self, mock_judge_handler):
348
+ """Loop status should be updated when judge determines completion."""
349
+ manager = WorkflowManager()
350
+ await manager.add_loop("loop1", "Test query")
351
+
352
+ # Add sufficient evidence
353
+ evidence = [
354
+ Evidence(
355
+ content="Sufficient evidence",
356
+ citation=Citation(
357
+ source="rag", # Use valid SourceName
358
+ title="Test",
359
+ url="https://example.com",
360
+ date="2024-01-01",
361
+ authors=[],
362
+ ),
363
+ relevance=0.95,
364
+ )
365
+ ]
366
+ await manager.add_loop_evidence("loop1", evidence)
367
+
368
+ from src.utils.models import AssessmentDetails
369
+
370
+ mock_judge_handler.assess = AsyncMock(
371
+ return_value=JudgeAssessment(
372
+ details=AssessmentDetails(
373
+ mechanism_score=5,
374
+ mechanism_reasoning="Test mechanism reasoning that is long enough",
375
+ clinical_evidence_score=5,
376
+ clinical_reasoning="Test clinical reasoning that is long enough",
377
+ drug_candidates=[],
378
+ key_findings=[],
379
+ ),
380
+ sufficient=True,
381
+ confidence=0.95,
382
+ recommendation="synthesize",
383
+ reasoning="Complete evidence has been collected to answer the query.",
384
+ )
385
+ )
386
+
387
+ # Check completion (this should update status internally if implemented)
388
+ should_complete, _ = await manager.check_loop_completion(
389
+ "loop1", "Test query", mock_judge_handler
390
+ )
391
+
392
+ assert should_complete is True
393
+ # Status update would happen in run_loops_parallel, not in check_loop_completion
394
+ loop = await manager.get_loop("loop1")
395
+ assert loop is not None
396
+ # Status might still be "pending" or "running" until run_loops_parallel updates it
tests/integration/test_rag_integration.py ADDED
@@ -0,0 +1,343 @@
1
+ """Integration tests for RAG integration.
2
+
3
+ These tests require OPENAI_API_KEY and may make real API calls.
4
+ Marked with @pytest.mark.integration to skip in unit test runs.
5
+ """
6
+
7
+ import pytest
8
+
9
+ from src.services.llamaindex_rag import get_rag_service
10
+ from src.tools.rag_tool import create_rag_tool
11
+ from src.tools.search_handler import SearchHandler
12
+ from src.tools.tool_executor import execute_agent_task
13
+ from src.utils.config import settings
14
+ from src.utils.models import AgentTask, Citation, Evidence
15
+
16
+
17
+ @pytest.mark.integration
18
+ class TestRAGServiceIntegration:
19
+ """Integration tests for LlamaIndexRAGService."""
20
+
21
+ @pytest.mark.asyncio
22
+ async def test_rag_service_ingest_and_retrieve(self):
23
+ """RAG service should ingest and retrieve evidence."""
24
+ if not settings.openai_api_key:
25
+ pytest.skip("OPENAI_API_KEY required for RAG integration tests")
26
+
27
+ # Create RAG service
28
+ rag_service = get_rag_service(collection_name="test_integration")
29
+
30
+ # Create sample evidence
31
+ evidence_list = [
32
+ Evidence(
33
+ content="Metformin is a first-line treatment for type 2 diabetes. It works by reducing glucose production in the liver and improving insulin sensitivity.",
34
+ citation=Citation(
35
+ source="pubmed",
36
+ title="Metformin Mechanism of Action",
37
+ url="https://pubmed.ncbi.nlm.nih.gov/12345678/",
38
+ date="2024-01-15",
39
+ authors=["Smith J", "Johnson M"],
40
+ ),
41
+ relevance=0.9,
42
+ ),
43
+ Evidence(
44
+ content="Recent studies suggest metformin may have neuroprotective effects in Alzheimer's disease models.",
45
+ citation=Citation(
46
+ source="pubmed",
47
+ title="Metformin and Neuroprotection",
48
+ url="https://pubmed.ncbi.nlm.nih.gov/12345679/",
49
+ date="2024-02-20",
50
+ authors=["Brown K", "Davis L"],
51
+ ),
52
+ relevance=0.85,
53
+ ),
54
+ ]
55
+
56
+ # Ingest evidence
57
+ rag_service.ingest_evidence(evidence_list)
58
+
59
+ # Retrieve evidence
60
+ results = rag_service.retrieve("metformin diabetes", top_k=2)
61
+
62
+ # Assert
63
+ assert len(results) > 0
64
+ assert any("metformin" in r["text"].lower() for r in results)
65
+ assert all("text" in r for r in results)
66
+ assert all("metadata" in r for r in results)
67
+
68
+ # Cleanup
69
+ rag_service.clear_collection()
70
+
71
+ @pytest.mark.asyncio
72
+ async def test_rag_service_query(self):
73
+ """RAG service should synthesize responses from ingested evidence."""
74
+ if not settings.openai_api_key:
75
+ pytest.skip("OPENAI_API_KEY required for RAG integration tests")
76
+
77
+ rag_service = get_rag_service(collection_name="test_query")
78
+
79
+ # Ingest evidence
80
+ evidence_list = [
81
+ Evidence(
82
+ content="Python is a high-level programming language known for its simplicity and readability.",
83
+ citation=Citation(
84
+ source="pubmed",
85
+ title="Python Programming",
86
+ url="https://example.com/python",
87
+ date="2024",
88
+ authors=["Author"],
89
+ ),
90
+ )
91
+ ]
92
+ rag_service.ingest_evidence(evidence_list)
93
+
94
+ # Query
95
+ response = rag_service.query("What is Python?", top_k=1)
96
+
97
+ assert isinstance(response, str)
98
+ assert len(response) > 0
99
+ assert "python" in response.lower()
100
+
101
+ # Cleanup
102
+ rag_service.clear_collection()
103
+
104
+
105
+ @pytest.mark.integration
106
+ class TestRAGToolIntegration:
107
+ """Integration tests for RAGTool."""
108
+
109
+ @pytest.mark.asyncio
110
+ async def test_rag_tool_search(self):
111
+ """RAGTool should search RAG service and return Evidence objects."""
112
+ if not settings.openai_api_key:
113
+ pytest.skip("OPENAI_API_KEY required for RAG integration tests")
114
+
115
+ # Create RAG service and ingest evidence
116
+ rag_service = get_rag_service(collection_name="test_rag_tool")
117
+ evidence_list = [
118
+ Evidence(
119
+ content="Machine learning is a subset of artificial intelligence.",
120
+ citation=Citation(
121
+ source="pubmed",
122
+ title="ML Basics",
123
+ url="https://example.com/ml",
124
+ date="2024",
125
+ authors=["ML Expert"],
126
+ ),
127
+ )
128
+ ]
129
+ rag_service.ingest_evidence(evidence_list)
130
+
131
+ # Create RAG tool
132
+ tool = create_rag_tool(rag_service=rag_service)
133
+
134
+ # Search
135
+ results = await tool.search("machine learning", max_results=5)
136
+
137
+ # Assert
138
+ assert len(results) > 0
139
+ assert all(isinstance(e, Evidence) for e in results)
140
+ assert results[0].citation.source == "rag"
141
+ assert (
142
+ "machine learning" in results[0].content.lower()
143
+ or "artificial intelligence" in results[0].content.lower()
144
+ )
145
+
146
+ # Cleanup
147
+ rag_service.clear_collection()
148
+
149
+ @pytest.mark.asyncio
150
+ async def test_rag_tool_empty_collection(self):
151
+ """RAGTool should return empty list when collection is empty."""
152
+ if not settings.openai_api_key:
153
+ pytest.skip("OPENAI_API_KEY required for RAG integration tests")
154
+
155
+ rag_service = get_rag_service(collection_name="test_empty")
156
+ rag_service.clear_collection() # Ensure empty
157
+
158
+ tool = create_rag_tool(rag_service=rag_service)
159
+ results = await tool.search("any query")
160
+
161
+ assert results == []
162
+
163
+
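Driving `RAGTool` directly looks like the sketch below (same import assumptions as above); `search()` returns `Evidence` objects whose `citation.source` is `"rag"`, or an empty list for an empty collection.

```python
import asyncio

from src.tools.rag_tool import create_rag_tool, get_rag_service  # assumed module path


async def main() -> None:
    rag = get_rag_service(collection_name="scratch")  # pre-populated as in the sketch above
    tool = create_rag_tool(rag_service=rag)
    evidence = await tool.search("machine learning", max_results=5)
    for e in evidence:  # empty list when nothing has been ingested
        print(e.citation.source, e.content[:60])


asyncio.run(main())
```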
164
+ @pytest.mark.integration
165
+ class TestRAGAgentIntegration:
166
+ """Integration tests for RAGAgent in tool executor."""
167
+
168
+ @pytest.mark.asyncio
169
+ async def test_rag_agent_execution(self):
170
+ """RAGAgent should execute and return ToolAgentOutput."""
171
+ if not settings.openai_api_key:
172
+ pytest.skip("OPENAI_API_KEY required for RAG integration tests")
173
+
174
+ # Setup: Ingest evidence into RAG
175
+ rag_service = get_rag_service(collection_name="test_rag_agent")
176
+ evidence_list = [
177
+ Evidence(
178
+ content="Deep learning uses neural networks with multiple layers.",
179
+ citation=Citation(
180
+ source="pubmed",
181
+ title="Deep Learning",
182
+ url="https://example.com/dl",
183
+ date="2024",
184
+ authors=["DL Researcher"],
185
+ ),
186
+ )
187
+ ]
188
+ rag_service.ingest_evidence(evidence_list)
189
+
190
+ # Execute RAGAgent task
191
+ task = AgentTask(
192
+ agent="RAGAgent",
193
+ query="deep learning",
194
+ gap="Need information about deep learning",
195
+ )
196
+
197
+ result = await execute_agent_task(task)
198
+
199
+ # Assert
200
+ assert result.output
201
+ assert "deep learning" in result.output.lower() or "neural network" in result.output.lower()
202
+ assert len(result.sources) > 0
203
+
204
+ # Cleanup
205
+ rag_service.clear_collection()
206
+
207
+
208
+ @pytest.mark.integration
209
+ class TestRAGSearchHandlerIntegration:
210
+ """Integration tests for RAG in SearchHandler."""
211
+
212
+ @pytest.mark.asyncio
213
+ async def test_search_handler_with_rag(self):
214
+ """SearchHandler should work with RAG tool included."""
215
+ if not settings.openai_api_key:
216
+ pytest.skip("OPENAI_API_KEY required for RAG integration tests")
217
+
218
+ # Setup: Create RAG service and ingest some evidence
219
+ rag_service = get_rag_service(collection_name="test_search_handler")
220
+ evidence_list = [
221
+ Evidence(
222
+ content="Test evidence for search handler integration.",
223
+ citation=Citation(
224
+ source="pubmed",
225
+ title="Test Evidence",
226
+ url="https://example.com/test",
227
+ date="2024",
228
+ authors=["Tester"],
229
+ ),
230
+ )
231
+ ]
232
+ rag_service.ingest_evidence(evidence_list)
233
+
234
+ # Create SearchHandler with RAG
235
+ handler = SearchHandler(
236
+ tools=[], # No other tools
237
+ include_rag=True,
238
+ auto_ingest_to_rag=False, # Don't auto-ingest (already has data)
239
+ )
240
+
241
+ # Execute search
242
+ result = await handler.execute("test evidence", max_results_per_tool=5)
243
+
244
+ # Assert
245
+ assert result.total_found > 0
246
+ assert "rag" in result.sources_searched
247
+ assert any(e.citation.source == "rag" for e in result.evidence)
248
+
249
+ # Cleanup
250
+ rag_service.clear_collection()
251
+
252
+ @pytest.mark.asyncio
253
+ async def test_search_handler_auto_ingest(self):
254
+ """SearchHandler should auto-ingest evidence into RAG."""
255
+ if not settings.openai_api_key:
256
+ pytest.skip("OPENAI_API_KEY required for RAG integration tests")
257
+
258
+ # Create empty RAG service
259
+ rag_service = get_rag_service(collection_name="test_auto_ingest")
260
+ rag_service.clear_collection()
261
+
262
+ # Create mock tool that returns evidence
263
+ from unittest.mock import AsyncMock
264
+
265
+ mock_tool = AsyncMock()
266
+ mock_tool.name = "pubmed"
267
+ mock_tool.search = AsyncMock(
268
+ return_value=[
269
+ Evidence(
270
+ content="Evidence to be ingested",
271
+ citation=Citation(
272
+ source="pubmed",
273
+ title="Test",
274
+ url="https://example.com",
275
+ date="2024",
276
+ authors=[],
277
+ ),
278
+ )
279
+ ]
280
+ )
281
+
282
+ # Create handler with auto-ingest enabled
283
+ handler = SearchHandler(
284
+ tools=[mock_tool],
285
+ include_rag=False, # Don't include RAG as search tool
286
+ auto_ingest_to_rag=True,
287
+ )
288
+ handler._rag_service = rag_service # Inject RAG service
289
+
290
+ # Execute search
291
+ await handler.execute("test query")
292
+
293
+ # Verify evidence was ingested
294
+ rag_results = rag_service.retrieve("Evidence to be ingested", top_k=1)
295
+ assert len(rag_results) > 0
296
+
297
+ # Cleanup
298
+ rag_service.clear_collection()
299
+
300
+
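The handler wiring under test, sketched end to end: `include_rag` exposes the collection as one more search tool, while `auto_ingest_to_rag` writes fresh results back into it. The `SearchHandler` import path is an assumption from this commit's file list.

```python
import asyncio

from src.tools.pubmed import PubMedTool
from src.tools.search_handler import SearchHandler  # assumed path: src/tools/search_handler.py


async def main() -> None:
    handler = SearchHandler(
        tools=[PubMedTool()],
        include_rag=True,  # the RAG collection participates as a search tool
        auto_ingest_to_rag=True,  # new evidence is persisted back into the collection
    )
    result = await handler.execute("metformin long covid", max_results_per_tool=5)
    print(result.total_found, result.sources_searched)


asyncio.run(main())
```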
301
+ @pytest.mark.integration
302
+ class TestRAGHybridSearchIntegration:
303
+ """Integration tests for hybrid search (RAG + database)."""
304
+
305
+ @pytest.mark.asyncio
306
+ async def test_hybrid_search_rag_and_pubmed(self):
307
+ """SearchHandler should support RAG + PubMed hybrid search."""
308
+ if not settings.openai_api_key:
309
+ pytest.skip("OPENAI_API_KEY required for RAG integration tests")
310
+
311
+ # Setup: Ingest evidence into RAG
312
+ rag_service = get_rag_service(collection_name="test_hybrid")
313
+ evidence_list = [
314
+ Evidence(
315
+ content="Previously collected evidence about metformin.",
316
+ citation=Citation(
317
+ source="pubmed",
318
+ title="Previous Research",
319
+ url="https://example.com/prev",
320
+ date="2024",
321
+ authors=[],
322
+ ),
323
+ )
324
+ ]
325
+ rag_service.ingest_evidence(evidence_list)
326
+
327
+ # Note: This test would require real PubMed API access
328
+ # For now, we'll just test that the handler can be created with both tools
329
+ from src.tools.pubmed import PubMedTool
330
+
331
+ handler = SearchHandler(
332
+ tools=[PubMedTool()],
333
+ include_rag=True,
334
+ auto_ingest_to_rag=True,
335
+ )
336
+
337
+ # Verify handler has both tools
338
+ tool_names = [t.name for t in handler.tools]
339
+ assert "pubmed" in tool_names
340
+ assert "rag" in tool_names
341
+
342
+ # Cleanup
343
+ rag_service.clear_collection()
tests/integration/test_research_flows.py ADDED
@@ -0,0 +1,584 @@
1
+ """Integration tests for research flows.
2
+
3
+ These tests require API keys and may make real API calls.
4
+ Marked with @pytest.mark.integration to skip in unit test runs.
5
+ """
6
+
7
+ import pytest
8
+
9
+ from src.agent_factory.agents import (
10
+ create_deep_flow,
11
+ create_iterative_flow,
12
+ create_planner_agent,
13
+ )
14
+ from src.orchestrator.graph_orchestrator import create_graph_orchestrator
15
+ from src.utils.config import settings
16
+
17
+
18
+ @pytest.mark.integration
19
+ class TestPlannerAgentIntegration:
20
+ """Integration tests for PlannerAgent with real API calls."""
21
+
22
+ @pytest.mark.asyncio
23
+ async def test_planner_agent_creates_plan(self):
24
+ """PlannerAgent should create a valid report plan with real API."""
25
+ if not settings.has_openai_key and not settings.has_anthropic_key:
26
+ pytest.skip("No OpenAI or Anthropic API key available")
27
+
28
+ planner = create_planner_agent()
29
+ result = await planner.run("What are the main features of Python programming language?")
30
+
31
+ assert result.report_title
32
+ assert len(result.report_outline) > 0
33
+ assert result.report_outline[0].title
34
+ assert result.report_outline[0].key_question
35
+
36
+ @pytest.mark.asyncio
37
+ async def test_planner_agent_includes_background_context(self):
38
+ """PlannerAgent should include background context in plan."""
39
+ if not settings.has_openai_key and not settings.has_anthropic_key:
40
+ pytest.skip("No OpenAI or Anthropic API key available")
41
+
42
+ planner = create_planner_agent()
43
+ result = await planner.run("Explain quantum computing basics")
44
+
45
+ assert result.background_context
46
+ assert len(result.background_context) > 50 # Should have substantial context
47
+
48
+
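As a reference point for these assertions, a minimal planner invocation (attribute names taken from the assertions above; running it needs one of the API keys checked in the skips):

```python
import asyncio

from src.agent_factory.agents import create_planner_agent


async def main() -> None:
    planner = create_planner_agent()
    plan = await planner.run("Explain quantum computing basics")
    print(plan.report_title)
    print(plan.background_context[:100])
    for section in plan.report_outline:
        print("-", section.title, "::", section.key_question)


asyncio.run(main())
```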
49
+ @pytest.mark.integration
50
+ class TestIterativeResearchFlowIntegration:
51
+ """Integration tests for IterativeResearchFlow with real API calls."""
52
+
53
+ @pytest.mark.asyncio
54
+ async def test_iterative_flow_completes_simple_query(self):
55
+ """IterativeResearchFlow should complete a simple research query."""
56
+ if not settings.has_openai_key and not settings.has_anthropic_key:
57
+ pytest.skip("No OpenAI or Anthropic API key available")
58
+
59
+ flow = create_iterative_flow(max_iterations=2, max_time_minutes=2)
60
+ result = await flow.run(
61
+ query="What is the capital of France?",
62
+ output_length="A short paragraph",
63
+ )
64
+
65
+ assert isinstance(result, str)
66
+ assert len(result) > 0
67
+ # Should mention Paris
68
+ assert "paris" in result.lower() or "france" in result.lower()
69
+
70
+ @pytest.mark.asyncio
71
+ async def test_iterative_flow_respects_max_iterations(self):
72
+ """IterativeResearchFlow should respect max_iterations limit."""
73
+ if not settings.has_openai_key and not settings.has_anthropic_key:
74
+ pytest.skip("No OpenAI or Anthropic API key available")
75
+
76
+ flow = create_iterative_flow(max_iterations=1, max_time_minutes=5)
77
+ result = await flow.run(query="What are the main features of Python?")
78
+
79
+ assert isinstance(result, str)
80
+ # Should complete within 1 iteration (or hit max)
81
+ assert flow.iteration <= 1
82
+
83
+ @pytest.mark.asyncio
84
+ async def test_iterative_flow_with_background_context(self):
85
+ """IterativeResearchFlow should use background context."""
86
+ if not settings.has_openai_key and not settings.has_anthropic_key:
87
+ pytest.skip("No OpenAI or Anthropic API key available")
88
+
89
+ flow = create_iterative_flow(max_iterations=2, max_time_minutes=2)
90
+ result = await flow.run(
91
+ query="What is machine learning?",
92
+ background_context="Machine learning is a subset of artificial intelligence.",
93
+ )
94
+
95
+ assert isinstance(result, str)
96
+ assert len(result) > 0
97
+
98
+
99
+ @pytest.mark.integration
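The flow API these tests cover, condensed into one sketch; every keyword argument below appears in the tests above.

```python
import asyncio

from src.agent_factory.agents import create_iterative_flow


async def main() -> None:
    flow = create_iterative_flow(max_iterations=2, max_time_minutes=2)
    report = await flow.run(
        query="What is machine learning?",
        background_context="Optional prior context seeded into the loop.",
        output_length="A short paragraph",
    )
    print(report)  # markdown report string
    print(flow.iteration)  # bounded by max_iterations


asyncio.run(main())
```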
100
+ class TestDeepResearchFlowIntegration:
101
+ """Integration tests for DeepResearchFlow with real API calls."""
102
+
103
+ @pytest.mark.asyncio
104
+ async def test_deep_flow_creates_multi_section_report(self):
105
+ """DeepResearchFlow should create a report with multiple sections."""
106
+ if not settings.has_openai_key and not settings.has_anthropic_key:
107
+ pytest.skip("No OpenAI or Anthropic API key available")
108
+
109
+ flow = create_deep_flow(
110
+ max_iterations=1, # Keep it short for testing
111
+ max_time_minutes=3,
112
+ )
113
+ result = await flow.run("What are the main features of Python programming language?")
114
+
115
+ assert isinstance(result, str)
116
+ assert len(result) > 100 # Should have substantial content
117
+ # Should have section structure
118
+ assert "#" in result  # any "##" header also contains "#", so a single check suffices
119
+
120
+ @pytest.mark.asyncio
121
+ async def test_deep_flow_uses_long_writer(self):
122
+ """DeepResearchFlow should use long writer by default."""
123
+ if not settings.has_openai_key and not settings.has_anthropic_key:
124
+ pytest.skip("No OpenAI or Anthropic API key available")
125
+
126
+ flow = create_deep_flow(
127
+ max_iterations=1,
128
+ max_time_minutes=3,
129
+ use_long_writer=True,
130
+ )
131
+ result = await flow.run("Explain the basics of quantum computing")
132
+
133
+ assert isinstance(result, str)
134
+ assert len(result) > 0
135
+
136
+ @pytest.mark.asyncio
137
+ async def test_deep_flow_uses_proofreader_when_specified(self):
138
+ """DeepResearchFlow should use proofreader when use_long_writer=False."""
139
+ if not settings.has_openai_key and not settings.has_anthropic_key:
140
+ pytest.skip("No OpenAI or Anthropic API key available")
141
+
142
+ flow = create_deep_flow(
143
+ max_iterations=1,
144
+ max_time_minutes=3,
145
+ use_long_writer=False,
146
+ )
147
+ result = await flow.run("What is artificial intelligence?")
148
+
149
+ assert isinstance(result, str)
150
+ assert len(result) > 0
151
+
152
+
153
+ @pytest.mark.integration
154
+ class TestGraphOrchestratorIntegration:
155
+ """Integration tests for GraphOrchestrator with real API calls."""
156
+
157
+ @pytest.mark.asyncio
158
+ async def test_graph_orchestrator_iterative_mode(self):
159
+ """GraphOrchestrator should run in iterative mode."""
160
+ if not settings.has_openai_key and not settings.has_anthropic_key:
161
+ pytest.skip("No OpenAI or Anthropic API key available")
162
+
163
+ orchestrator = create_graph_orchestrator(
164
+ mode="iterative",
165
+ max_iterations=1,
166
+ max_time_minutes=2,
167
+ )
168
+
169
+ events = []
170
+ async for event in orchestrator.run("What is Python?"):
171
+ events.append(event)
172
+
173
+ assert len(events) > 0
174
+ event_types = [e.type for e in events]
175
+ assert "started" in event_types
176
+ assert "complete" in event_types
177
+
178
+ @pytest.mark.asyncio
179
+ async def test_graph_orchestrator_deep_mode(self):
180
+ """GraphOrchestrator should run in deep mode."""
181
+ if not settings.has_openai_key and not settings.has_anthropic_key:
182
+ pytest.skip("No OpenAI or Anthropic API key available")
183
+
184
+ orchestrator = create_graph_orchestrator(
185
+ mode="deep",
186
+ max_iterations=1,
187
+ max_time_minutes=3,
188
+ )
189
+
190
+ events = []
191
+ async for event in orchestrator.run("What are the main features of Python?"):
192
+ events.append(event)
193
+
194
+ assert len(events) > 0
195
+ event_types = [e.type for e in events]
196
+ assert "started" in event_types
197
+ assert "complete" in event_types
198
+
199
+ @pytest.mark.asyncio
200
+ async def test_graph_orchestrator_auto_mode(self):
201
+ """GraphOrchestrator should auto-detect research mode."""
202
+ if not settings.has_openai_key and not settings.has_anthropic_key:
203
+ pytest.skip("No OpenAI or Anthropic API key available")
204
+
205
+ orchestrator = create_graph_orchestrator(
206
+ mode="auto",
207
+ max_iterations=1,
208
+ max_time_minutes=2,
209
+ )
210
+
211
+ events = []
212
+ async for event in orchestrator.run("What is Python?"):
213
+ events.append(event)
214
+
215
+ assert len(events) > 0
216
+ # Should complete successfully regardless of mode
217
+ event_types = [e.type for e in events]
218
+ assert "complete" in event_types
219
+
220
+
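The event-streaming contract asserted here, as a sketch: `run()` is an async generator of events carrying `type` and `message`, and the final report arrives on the `complete` event.

```python
import asyncio

from src.orchestrator.graph_orchestrator import create_graph_orchestrator


async def main() -> None:
    orchestrator = create_graph_orchestrator(
        mode="auto",  # "iterative", "deep", or auto-detected from the query
        max_iterations=1,
        max_time_minutes=2,
    )
    final_report = ""
    async for event in orchestrator.run("What is Python?"):
        if event.type == "complete":
            final_report = event.message
    print(final_report)


asyncio.run(main())
```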
221
+ @pytest.mark.integration
222
+ class TestGraphOrchestrationIntegration:
223
+ """Integration tests for graph-based orchestration with real API calls."""
224
+
225
+ @pytest.mark.asyncio
226
+ async def test_iterative_flow_with_graph_execution(self):
227
+ """IterativeResearchFlow should work with graph execution enabled."""
228
+ if not settings.has_openai_key and not settings.has_anthropic_key:
229
+ pytest.skip("No OpenAI or Anthropic API key available")
230
+
231
+ flow = create_iterative_flow(
232
+ max_iterations=1,
233
+ max_time_minutes=2,
234
+ use_graph=True,
235
+ )
236
+ result = await flow.run(query="What is the capital of France?")
237
+
238
+ assert isinstance(result, str)
239
+ assert len(result) > 0
240
+ # Should mention Paris
241
+ assert "paris" in result.lower() or "france" in result.lower()
242
+
243
+ @pytest.mark.asyncio
244
+ async def test_deep_flow_with_graph_execution(self):
245
+ """DeepResearchFlow should work with graph execution enabled."""
246
+ if not settings.has_openai_key and not settings.has_anthropic_key:
247
+ pytest.skip("No OpenAI or Anthropic API key available")
248
+
249
+ flow = create_deep_flow(
250
+ max_iterations=1,
251
+ max_time_minutes=3,
252
+ use_graph=True,
253
+ )
254
+ result = await flow.run("What are the main features of Python programming language?")
255
+
256
+ assert isinstance(result, str)
257
+ assert len(result) > 100 # Should have substantial content
258
+
259
+ @pytest.mark.asyncio
260
+ async def test_graph_orchestrator_with_graph_execution(self):
261
+ """GraphOrchestrator should work with graph execution enabled."""
262
+ if not settings.has_openai_key and not settings.has_anthropic_key:
263
+ pytest.skip("No OpenAI or Anthropic API key available")
264
+
265
+ orchestrator = create_graph_orchestrator(
266
+ mode="iterative",
267
+ max_iterations=1,
268
+ max_time_minutes=2,
269
+ use_graph=True,
270
+ )
271
+
272
+ events = []
273
+ async for event in orchestrator.run("What is Python?"):
274
+ events.append(event)
275
+
276
+ assert len(events) > 0
277
+ event_types = [e.type for e in events]
278
+ assert "started" in event_types
279
+ assert "complete" in event_types
280
+
281
+ # Extract final report from complete event
282
+ complete_events = [e for e in events if e.type == "complete"]
283
+ assert len(complete_events) > 0
284
+ final_report = complete_events[0].message
285
+ assert isinstance(final_report, str)
286
+ assert len(final_report) > 0
287
+
288
+ @pytest.mark.asyncio
289
+ async def test_graph_orchestrator_parallel_execution(self):
290
+ """GraphOrchestrator should support parallel execution in deep mode."""
291
+ if not settings.has_openai_key and not settings.has_anthropic_key:
292
+ pytest.skip("No OpenAI or Anthropic API key available")
293
+
294
+ orchestrator = create_graph_orchestrator(
295
+ mode="deep",
296
+ max_iterations=1,
297
+ max_time_minutes=3,
298
+ use_graph=True,
299
+ )
300
+
301
+ events = []
302
+ async for event in orchestrator.run("What are the main features of Python?"):
303
+ events.append(event)
304
+
305
+ assert len(events) > 0
306
+ event_types = [e.type for e in events]
307
+ assert "started" in event_types
308
+ assert "complete" in event_types
309
+
310
+ @pytest.mark.asyncio
311
+ async def test_graph_vs_chain_execution_comparison(self):
312
+ """Both graph and chain execution should produce similar results."""
313
+ if not settings.has_openai_key and not settings.has_anthropic_key:
314
+ pytest.skip("No OpenAI or Anthropic API key available")
315
+
316
+ query = "What is the capital of France?"
317
+
318
+ # Run with graph execution
319
+ flow_graph = create_iterative_flow(
320
+ max_iterations=1,
321
+ max_time_minutes=2,
322
+ use_graph=True,
323
+ )
324
+ result_graph = await flow_graph.run(query=query)
325
+
326
+ # Run with agent chains
327
+ flow_chains = create_iterative_flow(
328
+ max_iterations=1,
329
+ max_time_minutes=2,
330
+ use_graph=False,
331
+ )
332
+ result_chains = await flow_chains.run(query=query)
333
+
334
+ # Both should produce valid results
335
+ assert isinstance(result_graph, str)
336
+ assert isinstance(result_chains, str)
337
+ assert len(result_graph) > 0
338
+ assert len(result_chains) > 0
339
+
340
+ # Both should mention the answer (Paris)
341
+ assert "paris" in result_graph.lower() or "france" in result_graph.lower()
342
+ assert "paris" in result_chains.lower() or "france" in result_chains.lower()
343
+
344
+
345
+ @pytest.mark.integration
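The equivalence being tested: the same flow runs either over the node graph or as chained agent calls, toggled by `use_graph`.

```python
import asyncio

from src.agent_factory.agents import create_iterative_flow


async def main() -> None:
    for use_graph in (True, False):
        flow = create_iterative_flow(max_iterations=1, max_time_minutes=2, use_graph=use_graph)
        report = await flow.run(query="What is the capital of France?")
        print(f"use_graph={use_graph}: {len(report)} chars")  # both paths should answer


asyncio.run(main())
```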
346
+ class TestReportSynthesisIntegration:
347
+ """Integration tests for report synthesis with writer agents."""
348
+
349
+ @pytest.mark.asyncio
350
+ async def test_iterative_flow_generates_report(self):
351
+ """IterativeResearchFlow should generate a report with writer agent."""
352
+ if not settings.has_openai_key and not settings.has_anthropic_key:
353
+ pytest.skip("No OpenAI or Anthropic API key available")
354
+
355
+ flow = create_iterative_flow(max_iterations=1, max_time_minutes=2)
356
+ result = await flow.run(
357
+ query="What is the capital of France?",
358
+ output_length="A short paragraph",
359
+ )
360
+
361
+ assert isinstance(result, str)
362
+ assert len(result) > 0
363
+ # Should be a formatted report
364
+ assert "paris" in result.lower() or "france" in result.lower()
365
+ # Should have some structure (markdown headers or content)
366
+ assert len(result) > 50
367
+
368
+ @pytest.mark.asyncio
369
+ async def test_iterative_flow_includes_citations(self):
370
+ """IterativeResearchFlow should include citations in the report."""
371
+ if not settings.has_openai_key and not settings.has_anthropic_key:
372
+ pytest.skip("No OpenAI or Anthropic API key available")
373
+
374
+ flow = create_iterative_flow(max_iterations=1, max_time_minutes=2)
375
+ result = await flow.run(
376
+ query="What is machine learning?",
377
+ output_length="A short paragraph",
378
+ )
379
+
380
+ assert isinstance(result, str)
381
+ # Should have some form of citations or references
382
+ # (either [1], [2] format or References section)
383
+ # Note: Citations may not always be present depending on findings
384
+ # This is a soft check - just verify report was generated
385
+ assert len(result) > 0
386
+
387
+ @pytest.mark.asyncio
388
+ async def test_iterative_flow_handles_empty_findings(self):
389
+ """IterativeResearchFlow should handle empty findings gracefully."""
390
+ if not settings.has_openai_key and not settings.has_anthropic_key:
391
+ pytest.skip("No OpenAI or Anthropic API key available")
392
+
393
+ flow = create_iterative_flow(max_iterations=1, max_time_minutes=1)
394
+ # Use a query that might not return findings quickly
395
+ result = await flow.run(
396
+ query="Test query with no findings",
397
+ output_length="A short paragraph",
398
+ )
399
+
400
+ # Should still return a report (even if minimal)
401
+ assert isinstance(result, str)
402
+ # Writer agent should handle empty findings with fallback
403
+
404
+ @pytest.mark.asyncio
405
+ async def test_deep_flow_with_long_writer(self):
406
+ """DeepResearchFlow should use long writer to create sections."""
407
+ if not settings.has_openai_key and not settings.has_anthropic_key:
408
+ pytest.skip("No OpenAI or Anthropic API key available")
409
+
410
+ flow = create_deep_flow(
411
+ max_iterations=1,
412
+ max_time_minutes=3,
413
+ use_long_writer=True,
414
+ )
415
+ result = await flow.run("What are the main features of Python programming language?")
416
+
417
+ assert isinstance(result, str)
418
+ assert len(result) > 100 # Should have substantial content
419
+ # Should have section structure (table of contents or sections)
420
+ has_structure = (
421
+ "##" in result
422
+ or "#" in result
423
+ or "table of contents" in result.lower()
424
+ or "introduction" in result.lower()
425
+ )
426
+ # Long writer should create structured report
427
+ assert has_structure or len(result) > 200
428
+
429
+ @pytest.mark.asyncio
430
+ async def test_deep_flow_creates_sections(self):
431
+ """DeepResearchFlow should create multiple sections in the report."""
432
+ if not settings.has_openai_key and not settings.has_anthropic_key:
433
+ pytest.skip("No OpenAI or Anthropic API key available")
434
+
435
+ flow = create_deep_flow(
436
+ max_iterations=1,
437
+ max_time_minutes=3,
438
+ use_long_writer=True,
439
+ )
440
+ result = await flow.run("Explain the basics of quantum computing")
441
+
442
+ assert isinstance(result, str)
443
+ # Should have multiple sections (indicated by headers)
444
+ # Should have at least some structure
445
+ assert len(result) > 100
446
+
447
+ @pytest.mark.asyncio
448
+ async def test_deep_flow_aggregates_references(self):
449
+ """DeepResearchFlow should aggregate references from all sections."""
450
+ if not settings.has_openai_key and not settings.has_anthropic_key:
451
+ pytest.skip("No OpenAI or Anthropic API key available")
452
+
453
+ flow = create_deep_flow(
454
+ max_iterations=1,
455
+ max_time_minutes=3,
456
+ use_long_writer=True,
457
+ )
458
+ result = await flow.run("What are the main features of Python programming language?")
459
+
460
+ assert isinstance(result, str)
461
+ # Long writer should aggregate references at the end
462
+ # Check for references section or citation format
463
+ # Note: References may not always be present
464
+ # Just verify report structure is correct
465
+ assert len(result) > 100
466
+
467
+ @pytest.mark.asyncio
468
+ async def test_deep_flow_with_proofreader(self):
469
+ """DeepResearchFlow should use proofreader to finalize report."""
470
+ if not settings.has_openai_key and not settings.has_anthropic_key:
471
+ pytest.skip("No OpenAI or Anthropic API key available")
472
+
473
+ flow = create_deep_flow(
474
+ max_iterations=1,
475
+ max_time_minutes=3,
476
+ use_long_writer=False, # Use proofreader instead
477
+ )
478
+ result = await flow.run("What is artificial intelligence?")
479
+
480
+ assert isinstance(result, str)
481
+ assert len(result) > 0
482
+ # Proofreader should create polished report
483
+ # Should have some structure
484
+ assert len(result) > 50
485
+
486
+ @pytest.mark.asyncio
487
+ async def test_proofreader_removes_duplicates(self):
488
+ """Proofreader should remove duplicate content from report."""
489
+ if not settings.has_openai_key and not settings.has_anthropic_key:
490
+ pytest.skip("No OpenAI or Anthropic API key available")
491
+
492
+ flow = create_deep_flow(
493
+ max_iterations=1,
494
+ max_time_minutes=3,
495
+ use_long_writer=False,
496
+ )
497
+ result = await flow.run("Explain machine learning basics")
498
+
499
+ assert isinstance(result, str)
500
+ # Proofreader should create polished, non-repetitive content
501
+ # This is a soft check - just verify report was generated
502
+ assert len(result) > 0
503
+
504
+ @pytest.mark.asyncio
505
+ async def test_proofreader_adds_summary(self):
506
+ """Proofreader should add a summary to the report."""
507
+ if not settings.has_openai_key and not settings.has_anthropic_key:
508
+ pytest.skip("No OpenAI or Anthropic API key available")
509
+
510
+ flow = create_deep_flow(
511
+ max_iterations=1,
512
+ max_time_minutes=3,
513
+ use_long_writer=False,
514
+ )
515
+ result = await flow.run("What is Python programming language?")
516
+
517
+ assert isinstance(result, str)
518
+ # Proofreader should add summary/outline
519
+ # Check for summary indicators
520
+ # Note: Summary format may vary
521
+ # Just verify report was generated
522
+ assert len(result) > 0
523
+
524
+ @pytest.mark.asyncio
525
+ async def test_graph_orchestrator_uses_writer_agents(self):
526
+ """GraphOrchestrator should use writer agents in iterative mode."""
527
+ if not settings.has_openai_key and not settings.has_anthropic_key:
528
+ pytest.skip("No OpenAI or Anthropic API key available")
529
+
530
+ orchestrator = create_graph_orchestrator(
531
+ mode="iterative",
532
+ max_iterations=1,
533
+ max_time_minutes=2,
534
+ use_graph=False, # Use agent chains to test writer integration
535
+ )
536
+
537
+ events = []
538
+ async for event in orchestrator.run("What is the capital of France?"):
539
+ events.append(event)
540
+
541
+ assert len(events) > 0
542
+ event_types = [e.type for e in events]
543
+ assert "started" in event_types
544
+ assert "complete" in event_types
545
+
546
+ # Extract final report from complete event
547
+ complete_events = [e for e in events if e.type == "complete"]
548
+ assert len(complete_events) > 0
549
+ final_report = complete_events[0].message
550
+ assert isinstance(final_report, str)
551
+ assert len(final_report) > 0
552
+ # Should have content from writer agent
553
+ assert "paris" in final_report.lower() or "france" in final_report.lower()
554
+
555
+ @pytest.mark.asyncio
556
+ async def test_graph_orchestrator_uses_long_writer_in_deep_mode(self):
557
+ """GraphOrchestrator should use long writer in deep mode."""
558
+ if not settings.has_openai_key and not settings.has_anthropic_key:
559
+ pytest.skip("No OpenAI or Anthropic API key available")
560
+
561
+ orchestrator = create_graph_orchestrator(
562
+ mode="deep",
563
+ max_iterations=1,
564
+ max_time_minutes=3,
565
+ use_graph=False, # Use agent chains
566
+ )
567
+
568
+ events = []
569
+ async for event in orchestrator.run("What are the main features of Python?"):
570
+ events.append(event)
571
+
572
+ assert len(events) > 0
573
+ event_types = [e.type for e in events]
574
+ assert "started" in event_types
575
+ assert "complete" in event_types
576
+
577
+ # Extract final report
578
+ complete_events = [e for e in events if e.type == "complete"]
579
+ assert len(complete_events) > 0
580
+ final_report = complete_events[0].message
581
+ assert isinstance(final_report, str)
582
+ assert len(final_report) > 0
583
+ # Should have structured content from long writer
584
+ assert len(final_report) > 100
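For completeness, the deep-mode entry point exercised throughout this file: `use_long_writer=True` assembles the report section by section, while `False` routes the draft through the proofreader instead.

```python
import asyncio

from src.agent_factory.agents import create_deep_flow


async def main() -> None:
    flow = create_deep_flow(
        max_iterations=1,
        max_time_minutes=3,
        use_long_writer=True,  # False -> the proofreader finalizes the report
    )
    report = await flow.run("What are the main features of Python?")
    print(report)  # multi-section markdown report


asyncio.run(main())
```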
tests/unit/agent_factory/test_graph_builder.py ADDED
@@ -0,0 +1,439 @@
1
+ """Unit tests for graph builder utilities."""
2
+
3
+ from typing import Any
4
+ from unittest.mock import MagicMock
5
+
6
+ import pytest
7
+ from pydantic_ai import Agent
8
+
9
+ from src.agent_factory.graph_builder import (
10
+ AgentNode,
11
+ ConditionalEdge,
12
+ DecisionNode,
13
+ GraphBuilder,
14
+ GraphNode,
15
+ ParallelNode,
16
+ ResearchGraph,
17
+ SequentialEdge,
18
+ StateNode,
19
+ create_deep_graph,
20
+ create_iterative_graph,
21
+ )
22
+ from src.middleware.state_machine import WorkflowState
23
+
24
+
25
+ class TestGraphNode:
26
+ """Tests for GraphNode models."""
27
+
28
+ def test_graph_node_creation(self):
29
+ """Test creating a base GraphNode."""
30
+ node = GraphNode(node_id="test_node", node_type="agent", description="Test")
31
+ assert node.node_id == "test_node"
32
+ assert node.node_type == "agent"
33
+ assert node.description == "Test"
34
+
35
+ def test_agent_node_creation(self):
36
+ """Test creating an AgentNode."""
37
+ mock_agent = MagicMock(spec=Agent)
38
+ node = AgentNode(
39
+ node_id="agent_1",
40
+ agent=mock_agent,
41
+ description="Test agent",
42
+ )
43
+ assert node.node_id == "agent_1"
44
+ assert node.node_type == "agent"
45
+ assert node.agent == mock_agent
46
+ assert node.input_transformer is None
47
+ assert node.output_transformer is None
48
+
49
+ def test_agent_node_with_transformers(self):
50
+ """Test creating an AgentNode with transformers."""
51
+ mock_agent = MagicMock(spec=Agent)
52
+
53
+ def input_transformer(x):
54
+ return f"input_{x}"
55
+
56
+ def output_transformer(x):
57
+ return f"output_{x}"
58
+
59
+ node = AgentNode(
60
+ node_id="agent_1",
61
+ agent=mock_agent,
62
+ input_transformer=input_transformer,
63
+ output_transformer=output_transformer,
64
+ )
65
+ assert node.input_transformer is not None
66
+ assert node.output_transformer is not None
67
+
68
+ def test_state_node_creation(self):
69
+ """Test creating a StateNode."""
70
+
71
+ def state_updater(state: WorkflowState, data: Any) -> WorkflowState:
72
+ return state
73
+
74
+ node = StateNode(
75
+ node_id="state_1",
76
+ state_updater=state_updater,
77
+ description="Test state",
78
+ )
79
+ assert node.node_id == "state_1"
80
+ assert node.node_type == "state"
81
+ assert node.state_updater is not None
82
+ assert node.state_reader is None
83
+
84
+ def test_decision_node_creation(self):
85
+ """Test creating a DecisionNode."""
86
+
87
+ def decision_func(data: Any) -> str:
88
+ return "next_node"
89
+
90
+ node = DecisionNode(
91
+ node_id="decision_1",
92
+ decision_function=decision_func,
93
+ options=["next_node", "other_node"],
94
+ description="Test decision",
95
+ )
96
+ assert node.node_id == "decision_1"
97
+ assert node.node_type == "decision"
98
+ assert len(node.options) == 2
99
+ assert "next_node" in node.options
100
+
101
+ def test_parallel_node_creation(self):
102
+ """Test creating a ParallelNode."""
103
+ node = ParallelNode(
104
+ node_id="parallel_1",
105
+ parallel_nodes=["node1", "node2", "node3"],
106
+ description="Test parallel",
107
+ )
108
+ assert node.node_id == "parallel_1"
109
+ assert node.node_type == "parallel"
110
+ assert len(node.parallel_nodes) == 3
111
+ assert node.aggregator is None
112
+
113
+
114
+ class TestGraphEdge:
115
+ """Tests for GraphEdge models."""
116
+
117
+ def test_sequential_edge_creation(self):
118
+ """Test creating a SequentialEdge."""
119
+ edge = SequentialEdge(from_node="node1", to_node="node2")
120
+ assert edge.from_node == "node1"
121
+ assert edge.to_node == "node2"
122
+ assert edge.condition is None
123
+ assert edge.weight == 1.0
124
+
125
+ def test_conditional_edge_creation(self):
126
+ """Test creating a ConditionalEdge."""
127
+
128
+ def condition(data: Any) -> bool:
129
+ return True
130
+
131
+ edge = ConditionalEdge(
132
+ from_node="node1",
133
+ to_node="node2",
134
+ condition=condition,
135
+ condition_description="Test condition",
136
+ )
137
+ assert edge.from_node == "node1"
138
+ assert edge.to_node == "node2"
139
+ assert edge.condition is not None
140
+ assert edge.condition_description == "Test condition"
141
+
142
+
143
+ class TestResearchGraph:
144
+ """Tests for ResearchGraph class."""
145
+
146
+ def test_graph_creation(self):
147
+ """Test creating an empty graph."""
148
+ graph = ResearchGraph(entry_node="start", exit_nodes=["end"])
149
+ assert graph.entry_node == "start"
150
+ assert len(graph.exit_nodes) == 1
151
+ assert graph.exit_nodes[0] == "end"
152
+ assert len(graph.nodes) == 0
153
+ assert len(graph.edges) == 0
154
+
155
+ def test_add_node(self):
156
+ """Test adding a node to the graph."""
157
+ graph = ResearchGraph(entry_node="start", exit_nodes=["end"])
158
+ node = GraphNode(node_id="node1", node_type="agent", description="Test")
159
+ graph.add_node(node)
160
+ assert "node1" in graph.nodes
161
+ assert graph.get_node("node1") == node
162
+
163
+ def test_add_node_duplicate_raises_error(self):
164
+ """Test that adding duplicate node raises ValueError."""
165
+ graph = ResearchGraph(entry_node="start", exit_nodes=["end"])
166
+ node = GraphNode(node_id="node1", node_type="agent", description="Test")
167
+ graph.add_node(node)
168
+ with pytest.raises(ValueError, match="already exists"):
169
+ graph.add_node(node)
170
+
171
+ def test_add_edge(self):
172
+ """Test adding an edge to the graph."""
173
+ graph = ResearchGraph(entry_node="start", exit_nodes=["end"])
174
+ node1 = GraphNode(node_id="node1", node_type="agent", description="Test")
175
+ node2 = GraphNode(node_id="node2", node_type="agent", description="Test")
176
+ graph.add_node(node1)
177
+ graph.add_node(node2)
178
+
179
+ edge = SequentialEdge(from_node="node1", to_node="node2")
180
+ graph.add_edge(edge)
181
+ assert "node1" in graph.edges
182
+ assert len(graph.edges["node1"]) == 1
183
+ assert graph.edges["node1"][0] == edge
184
+
185
+ def test_add_edge_invalid_source_raises_error(self):
186
+ """Test that adding edge with invalid source raises ValueError."""
187
+ graph = ResearchGraph(entry_node="start", exit_nodes=["end"])
188
+ edge = SequentialEdge(from_node="nonexistent", to_node="node2")
189
+ with pytest.raises(ValueError, match="Source node.*not found"):
190
+ graph.add_edge(edge)
191
+
192
+ def test_add_edge_invalid_target_raises_error(self):
193
+ """Test that adding edge with invalid target raises ValueError."""
194
+ graph = ResearchGraph(entry_node="start", exit_nodes=["end"])
195
+ node1 = GraphNode(node_id="node1", node_type="agent", description="Test")
196
+ graph.add_node(node1)
197
+ edge = SequentialEdge(from_node="node1", to_node="nonexistent")
198
+ with pytest.raises(ValueError, match="Target node.*not found"):
199
+ graph.add_edge(edge)
200
+
201
+ def test_get_next_nodes(self):
202
+ """Test getting next nodes from a node."""
203
+ graph = ResearchGraph(entry_node="start", exit_nodes=["end"])
204
+ node1 = GraphNode(node_id="node1", node_type="agent", description="Test")
205
+ node2 = GraphNode(node_id="node2", node_type="agent", description="Test")
206
+ graph.add_node(node1)
207
+ graph.add_node(node2)
208
+ graph.add_edge(SequentialEdge(from_node="node1", to_node="node2"))
209
+
210
+ next_nodes = graph.get_next_nodes("node1")
211
+ assert len(next_nodes) == 1
212
+ assert next_nodes[0][0] == "node2"
213
+
214
+ def test_get_next_nodes_with_condition(self):
215
+ """Test getting next nodes with conditional edge."""
216
+ graph = ResearchGraph(entry_node="start", exit_nodes=["end"])
217
+ node1 = GraphNode(node_id="node1", node_type="agent", description="Test")
218
+ node2 = GraphNode(node_id="node2", node_type="agent", description="Test")
219
+ node3 = GraphNode(node_id="node3", node_type="agent", description="Test")
220
+ graph.add_node(node1)
221
+ graph.add_node(node2)
222
+ graph.add_node(node3)
223
+
224
+ # Add conditional edge that only passes when data is True
225
+ def condition(data: Any) -> bool:
226
+ return data is True
227
+
228
+ graph.add_edge(SequentialEdge(from_node="node1", to_node="node2"))
229
+ graph.add_edge(ConditionalEdge(from_node="node1", to_node="node3", condition=condition))
230
+
231
+ # With condition True, should get both
232
+ next_nodes = graph.get_next_nodes("node1", context=True)
233
+ assert len(next_nodes) == 2
234
+
235
+ # With condition False, should only get sequential edge
236
+ next_nodes = graph.get_next_nodes("node1", context=False)
237
+ assert len(next_nodes) == 1
238
+ assert next_nodes[0][0] == "node2"
239
+
240
+ def test_validate_empty_graph(self):
241
+ """Test validating an empty graph."""
242
+ graph = ResearchGraph(entry_node="start", exit_nodes=["end"])
243
+ errors = graph.validate()
244
+ assert len(errors) > 0 # Should have errors for missing entry/exit nodes
245
+
246
+ def test_validate_valid_graph(self):
247
+ """Test validating a valid graph."""
248
+ graph = ResearchGraph(entry_node="start", exit_nodes=["end"])
249
+ start_node = GraphNode(node_id="start", node_type="agent", description="Start")
250
+ end_node = GraphNode(node_id="end", node_type="agent", description="End")
251
+ graph.add_node(start_node)
252
+ graph.add_node(end_node)
253
+ graph.add_edge(SequentialEdge(from_node="start", to_node="end"))
254
+
255
+ errors = graph.validate()
256
+ assert len(errors) == 0
257
+
258
+ def test_validate_unreachable_nodes(self):
259
+ """Test that validation detects unreachable nodes."""
260
+ graph = ResearchGraph(entry_node="start", exit_nodes=["end"])
261
+ start_node = GraphNode(node_id="start", node_type="agent", description="Start")
262
+ end_node = GraphNode(node_id="end", node_type="agent", description="End")
263
+ unreachable = GraphNode(node_id="unreachable", node_type="agent", description="Unreachable")
264
+ graph.add_node(start_node)
265
+ graph.add_node(end_node)
266
+ graph.add_node(unreachable)
267
+ graph.add_edge(SequentialEdge(from_node="start", to_node="end"))
268
+
269
+ errors = graph.validate()
270
+ assert len(errors) > 0
271
+ assert any("unreachable" in error.lower() for error in errors)
272
+
273
+
274
+ class TestGraphBuilder:
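The happy path condensed: build a two-node graph by hand and validate it.

```python
from src.agent_factory.graph_builder import GraphNode, ResearchGraph, SequentialEdge

graph = ResearchGraph(entry_node="start", exit_nodes=["end"])
graph.add_node(GraphNode(node_id="start", node_type="agent", description="Start"))
graph.add_node(GraphNode(node_id="end", node_type="agent", description="End"))
graph.add_edge(SequentialEdge(from_node="start", to_node="end"))
assert graph.validate() == []  # a connected entry -> exit path yields no errors
assert graph.get_next_nodes("start")[0][0] == "end"
```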
275
+ """Tests for GraphBuilder class."""
276
+
277
+ def test_builder_initialization(self):
278
+ """Test initializing a GraphBuilder."""
279
+ builder = GraphBuilder()
280
+ assert builder.graph is not None
281
+ assert builder.graph.entry_node == ""
282
+ assert len(builder.graph.exit_nodes) == 0
283
+
284
+ def test_add_agent_node(self):
285
+ """Test adding an agent node."""
286
+ builder = GraphBuilder()
287
+ mock_agent = MagicMock(spec=Agent)
288
+ builder.add_agent_node("agent1", mock_agent, "Test agent")
289
+ assert "agent1" in builder.graph.nodes
290
+ node = builder.graph.get_node("agent1")
291
+ assert isinstance(node, AgentNode)
292
+ assert node.agent == mock_agent
293
+
294
+ def test_add_state_node(self):
295
+ """Test adding a state node."""
296
+ builder = GraphBuilder()
297
+
298
+ def updater(state: WorkflowState, data: Any) -> WorkflowState:
299
+ return state
300
+
301
+ builder.add_state_node("state1", updater, "Test state")
302
+ assert "state1" in builder.graph.nodes
303
+ node = builder.graph.get_node("state1")
304
+ assert isinstance(node, StateNode)
305
+
306
+ def test_add_decision_node(self):
307
+ """Test adding a decision node."""
308
+ builder = GraphBuilder()
309
+
310
+ def decision_func(data: Any) -> str:
311
+ return "next"
312
+
313
+ builder.add_decision_node("decision1", decision_func, ["next", "other"], "Test")
314
+ assert "decision1" in builder.graph.nodes
315
+ node = builder.graph.get_node("decision1")
316
+ assert isinstance(node, DecisionNode)
317
+
318
+ def test_add_parallel_node(self):
319
+ """Test adding a parallel node."""
320
+ builder = GraphBuilder()
321
+ builder.add_parallel_node("parallel1", ["node1", "node2"], "Test")
322
+ assert "parallel1" in builder.graph.nodes
323
+ node = builder.graph.get_node("parallel1")
324
+ assert isinstance(node, ParallelNode)
325
+ assert len(node.parallel_nodes) == 2
326
+
327
+ def test_connect_nodes(self):
328
+ """Test connecting nodes."""
329
+ builder = GraphBuilder()
330
+ builder.add_agent_node("node1", MagicMock(spec=Agent), "Node 1")
331
+ builder.add_agent_node("node2", MagicMock(spec=Agent), "Node 2")
332
+ builder.connect_nodes("node1", "node2")
333
+ assert "node1" in builder.graph.edges
334
+ assert len(builder.graph.edges["node1"]) == 1
335
+
336
+ def test_connect_nodes_with_condition(self):
337
+ """Test connecting nodes with a condition."""
338
+ builder = GraphBuilder()
339
+ builder.add_agent_node("node1", MagicMock(spec=Agent), "Node 1")
340
+ builder.add_agent_node("node2", MagicMock(spec=Agent), "Node 2")
341
+
342
+ def condition(data: Any) -> bool:
343
+ return True
344
+
345
+ builder.connect_nodes("node1", "node2", condition=condition, condition_description="Test")
346
+ edge = builder.graph.edges["node1"][0]
347
+ assert isinstance(edge, ConditionalEdge)
348
+ assert edge.condition is not None
349
+
350
+ def test_set_entry_node(self):
351
+ """Test setting entry node."""
352
+ builder = GraphBuilder()
353
+ builder.add_agent_node("start", MagicMock(spec=Agent), "Start")
354
+ builder.set_entry_node("start")
355
+ assert builder.graph.entry_node == "start"
356
+
357
+ def test_set_exit_nodes(self):
358
+ """Test setting exit nodes."""
359
+ builder = GraphBuilder()
360
+ builder.add_agent_node("end1", MagicMock(spec=Agent), "End 1")
361
+ builder.add_agent_node("end2", MagicMock(spec=Agent), "End 2")
362
+ builder.set_exit_nodes(["end1", "end2"])
363
+ assert len(builder.graph.exit_nodes) == 2
364
+
365
+ def test_build_validates_graph(self):
366
+ """Test that build() validates the graph."""
367
+ builder = GraphBuilder()
368
+ builder.add_agent_node("start", MagicMock(spec=Agent), "Start")
369
+ builder.set_entry_node("start")
370
+ # Missing exit node - should fail validation
371
+ with pytest.raises(ValueError, match="validation failed"):
372
+ builder.build()
373
+
374
+ def test_build_returns_valid_graph(self):
375
+ """Test that build() returns a valid graph."""
376
+ builder = GraphBuilder()
377
+ mock_agent = MagicMock(spec=Agent)
378
+ builder.add_agent_node("start", mock_agent, "Start")
379
+ builder.add_agent_node("end", mock_agent, "End")
380
+ builder.connect_nodes("start", "end")
381
+ builder.set_entry_node("start")
382
+ builder.set_exit_nodes(["end"])
383
+
384
+ graph = builder.build()
385
+ assert isinstance(graph, ResearchGraph)
386
+ assert graph.entry_node == "start"
387
+ assert "end" in graph.exit_nodes
388
+
389
+
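The builder's surface in one pass; `build()` runs the same validation tested above and raises `ValueError` on failure. The `MagicMock(spec=Agent)` stand-in mirrors the fixtures here; real agents plug in the same way.

```python
from unittest.mock import MagicMock

from pydantic_ai import Agent

from src.agent_factory.graph_builder import GraphBuilder

builder = GraphBuilder()
stub = MagicMock(spec=Agent)  # stand-in for a real pydantic_ai Agent
builder.add_agent_node("start", stub, "Entry agent")
builder.add_agent_node("end", stub, "Exit agent")
builder.connect_nodes("start", "end")
builder.set_entry_node("start")
builder.set_exit_nodes(["end"])
graph = builder.build()  # validates; raises ValueError if the graph is malformed
```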
390
+ class TestFactoryFunctions:
391
+ """Tests for factory functions."""
392
+
393
+ def test_create_iterative_graph(self):
394
+ """Test creating an iterative research graph."""
395
+ mock_kg_agent = MagicMock(spec=Agent)
396
+ mock_ts_agent = MagicMock(spec=Agent)
397
+ mock_thinking_agent = MagicMock(spec=Agent)
398
+ mock_writer_agent = MagicMock(spec=Agent)
399
+
400
+ graph = create_iterative_graph(
401
+ knowledge_gap_agent=mock_kg_agent,
402
+ tool_selector_agent=mock_ts_agent,
403
+ thinking_agent=mock_thinking_agent,
404
+ writer_agent=mock_writer_agent,
405
+ )
406
+
407
+ assert isinstance(graph, ResearchGraph)
408
+ assert graph.entry_node == "thinking"
409
+ assert "writer" in graph.exit_nodes
410
+ assert "thinking" in graph.nodes
411
+ assert "knowledge_gap" in graph.nodes
412
+ assert "continue_decision" in graph.nodes
413
+ assert "tool_selector" in graph.nodes
414
+ assert "writer" in graph.nodes
415
+
416
+ def test_create_deep_graph(self):
417
+ """Test creating a deep research graph."""
418
+ mock_planner_agent = MagicMock(spec=Agent)
419
+ mock_kg_agent = MagicMock(spec=Agent)
420
+ mock_ts_agent = MagicMock(spec=Agent)
421
+ mock_thinking_agent = MagicMock(spec=Agent)
422
+ mock_writer_agent = MagicMock(spec=Agent)
423
+ mock_long_writer_agent = MagicMock(spec=Agent)
424
+
425
+ graph = create_deep_graph(
426
+ planner_agent=mock_planner_agent,
427
+ knowledge_gap_agent=mock_kg_agent,
428
+ tool_selector_agent=mock_ts_agent,
429
+ thinking_agent=mock_thinking_agent,
430
+ writer_agent=mock_writer_agent,
431
+ long_writer_agent=mock_long_writer_agent,
432
+ )
433
+
434
+ assert isinstance(graph, ResearchGraph)
435
+ assert graph.entry_node == "planner"
436
+ assert "synthesizer" in graph.exit_nodes
437
+ assert "planner" in graph.nodes
438
+ assert "parallel_loops_placeholder" in graph.nodes
439
+ assert "synthesizer" in graph.nodes
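And the prebuilt topologies, per the assertions above: the iterative graph enters at `thinking` and exits at `writer`; the deep graph enters at `planner` and exits at `synthesizer`.

```python
from unittest.mock import MagicMock

from pydantic_ai import Agent

from src.agent_factory.graph_builder import create_iterative_graph

stub = MagicMock(spec=Agent)  # real agents come from src.agent_factory.agents in practice
graph = create_iterative_graph(
    knowledge_gap_agent=stub,
    tool_selector_agent=stub,
    thinking_agent=stub,
    writer_agent=stub,
)
print(graph.entry_node)  # "thinking"
print(graph.exit_nodes)  # ["writer"]
print(sorted(graph.nodes))  # includes knowledge_gap, continue_decision, tool_selector
```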
tests/unit/agents/test_input_parser.py ADDED
@@ -0,0 +1,325 @@
1
+ """Unit tests for InputParserAgent."""
2
+
3
+ from unittest.mock import AsyncMock, MagicMock, patch
4
+
5
+ import pytest
6
+ from pydantic_ai import AgentRunResult
7
+
8
+ from src.agents.input_parser import InputParserAgent, create_input_parser_agent
9
+ from src.utils.exceptions import ConfigurationError
10
+ from src.utils.models import ParsedQuery
11
+
12
+
13
+ @pytest.fixture
14
+ def mock_model() -> MagicMock:
15
+ """Create a mock Pydantic AI model."""
16
+ model = MagicMock()
17
+ model.name = "test-model"
18
+ return model
19
+
20
+
21
+ @pytest.fixture
22
+ def mock_parsed_query_iterative() -> ParsedQuery:
23
+ """Create a mock ParsedQuery for iterative mode."""
24
+ return ParsedQuery(
25
+ original_query="What is the mechanism of metformin?",
26
+ improved_query="What is the molecular mechanism of action of metformin in diabetes treatment?",
27
+ research_mode="iterative",
28
+ key_entities=["metformin", "diabetes"],
29
+ research_questions=["What is metformin's mechanism of action?"],
30
+ )
31
+
32
+
33
+ @pytest.fixture
34
+ def mock_parsed_query_deep() -> ParsedQuery:
35
+ """Create a mock ParsedQuery for deep mode."""
36
+ return ParsedQuery(
37
+ original_query="Write a comprehensive report on diabetes treatment",
38
+ improved_query="Provide a comprehensive analysis of diabetes treatment options, including mechanisms, clinical evidence, and market analysis",
39
+ research_mode="deep",
40
+ key_entities=["diabetes", "treatment"],
41
+ research_questions=[
42
+ "What are the main treatment options for diabetes?",
43
+ "What is the clinical evidence for each treatment?",
44
+ "What is the market size for diabetes treatments?",
45
+ ],
46
+ )
47
+
48
+
49
+ @pytest.fixture
50
+ def mock_agent_result_iterative(
51
+ mock_parsed_query_iterative: ParsedQuery,
52
+ ) -> AgentRunResult[ParsedQuery]:
53
+ """Create a mock agent result for iterative mode."""
54
+ result = MagicMock(spec=AgentRunResult)
55
+ result.output = mock_parsed_query_iterative
56
+ return result
57
+
58
+
59
+ @pytest.fixture
60
+ def mock_agent_result_deep(
61
+ mock_parsed_query_deep: ParsedQuery,
62
+ ) -> AgentRunResult[ParsedQuery]:
63
+ """Create a mock agent result for deep mode."""
64
+ result = MagicMock(spec=AgentRunResult)
65
+ result.output = mock_parsed_query_deep
66
+ return result
67
+
68
+
69
+ @pytest.fixture
70
+ def input_parser_agent(mock_model: MagicMock) -> InputParserAgent:
71
+ """Create an InputParserAgent instance with mocked model."""
72
+ return InputParserAgent(model=mock_model)
73
+
74
+
75
+ class TestInputParserAgentInit:
76
+ """Test InputParserAgent initialization."""
77
+
78
+ def test_input_parser_agent_init_with_model(self, mock_model: MagicMock) -> None:
79
+ """Test InputParserAgent initialization with provided model."""
80
+ agent = InputParserAgent(model=mock_model)
81
+ assert agent.model == mock_model
82
+ assert agent.agent is not None
83
+
84
+ @patch("src.agents.input_parser.get_model")
85
+ def test_input_parser_agent_init_without_model(
86
+ self, mock_get_model: MagicMock, mock_model: MagicMock
87
+ ) -> None:
88
+ """Test InputParserAgent initialization without model (uses default)."""
89
+ mock_get_model.return_value = mock_model
90
+ agent = InputParserAgent()
91
+ assert agent.model == mock_model
92
+ mock_get_model.assert_called_once()
93
+
94
+ def test_input_parser_agent_has_correct_system_prompt(
95
+ self, input_parser_agent: InputParserAgent
96
+ ) -> None:
97
+ """Test that the underlying agent is constructed with its system prompt set (structure check only)."""
98
+ # System prompt should contain key instructions
99
+ # In pydantic_ai, system_prompt is a property that returns the prompt string
100
+ # For mocked agents, we check that the agent was created with a system prompt
101
+ assert input_parser_agent.agent is not None
102
+ # The actual system prompt is set during agent creation
103
+ # We verify the agent exists and was properly initialized
104
+ # Note: Direct access to system_prompt may not work with mocks
105
+ # This test verifies the agent structure is correct
106
+
107
+
108
+ class TestParse:
109
+ """Test parse() method."""
110
+
111
+ @pytest.mark.asyncio
112
+ async def test_parse_iterative_query(
113
+ self,
114
+ input_parser_agent: InputParserAgent,
115
+ mock_agent_result_iterative: AgentRunResult[ParsedQuery],
116
+ ) -> None:
117
+ """Test parsing a simple query that should return iterative mode."""
118
+ input_parser_agent.agent.run = AsyncMock(return_value=mock_agent_result_iterative)
119
+
120
+ query = "What is the mechanism of metformin?"
121
+ result = await input_parser_agent.parse(query)
122
+
123
+ assert isinstance(result, ParsedQuery)
124
+ assert result.research_mode == "iterative"
125
+ assert result.original_query == query
126
+ assert "metformin" in result.key_entities
127
+ assert input_parser_agent.agent.run.called
128
+
129
+ @pytest.mark.asyncio
130
+ async def test_parse_deep_query(
131
+ self,
132
+ input_parser_agent: InputParserAgent,
133
+ mock_agent_result_deep: AgentRunResult[ParsedQuery],
134
+ ) -> None:
135
+ """Test parsing a complex query that should return deep mode."""
136
+ input_parser_agent.agent.run = AsyncMock(return_value=mock_agent_result_deep)
137
+
138
+ query = "Write a comprehensive report on diabetes treatment"
139
+ result = await input_parser_agent.parse(query)
140
+
141
+ assert isinstance(result, ParsedQuery)
142
+ assert result.research_mode == "deep"
143
+ assert result.original_query == query
144
+ assert len(result.research_questions) > 0
145
+ assert input_parser_agent.agent.run.called
146
+
147
+ @pytest.mark.asyncio
148
+ async def test_parse_improves_query(
149
+ self,
150
+ input_parser_agent: InputParserAgent,
151
+ mock_agent_result_iterative: AgentRunResult[ParsedQuery],
152
+ ) -> None:
153
+ """Test that parse() improves the query."""
154
+ input_parser_agent.agent.run = AsyncMock(return_value=mock_agent_result_iterative)
155
+
156
+ query = "metformin mechanism"
157
+ result = await input_parser_agent.parse(query)
158
+
159
+ assert isinstance(result, ParsedQuery)
160
+ assert result.improved_query != result.original_query
161
+ assert len(result.improved_query) >= len(result.original_query)
162
+
163
+ @pytest.mark.asyncio
164
+ async def test_parse_extracts_entities(
165
+ self,
166
+ input_parser_agent: InputParserAgent,
167
+ mock_agent_result_iterative: AgentRunResult[ParsedQuery],
168
+ ) -> None:
169
+ """Test that parse() extracts key entities."""
170
+ input_parser_agent.agent.run = AsyncMock(return_value=mock_agent_result_iterative)
171
+
172
+ query = "What is the mechanism of metformin?"
173
+ result = await input_parser_agent.parse(query)
174
+
175
+ assert isinstance(result, ParsedQuery)
176
+ assert len(result.key_entities) > 0
177
+ assert "metformin" in result.key_entities
178
+
179
+ @pytest.mark.asyncio
180
+ async def test_parse_extracts_research_questions(
181
+ self,
182
+ input_parser_agent: InputParserAgent,
183
+ mock_agent_result_deep: AgentRunResult[ParsedQuery],
184
+ ) -> None:
185
+ """Test that parse() extracts research questions."""
186
+ input_parser_agent.agent.run = AsyncMock(return_value=mock_agent_result_deep)
187
+
188
+ query = "Write a comprehensive report on diabetes treatment"
189
+ result = await input_parser_agent.parse(query)
190
+
191
+ assert isinstance(result, ParsedQuery)
192
+ assert len(result.research_questions) > 0
193
+
194
+ @pytest.mark.asyncio
195
+ async def test_parse_handles_missing_improved_query(
196
+ self,
197
+ input_parser_agent: InputParserAgent,
198
+ mock_model: MagicMock,
199
+ ) -> None:
200
+ """Test that parse() handles missing improved_query gracefully."""
201
+ # Create a result with missing improved_query
202
+ mock_result = MagicMock(spec=AgentRunResult)
203
+ mock_parsed = ParsedQuery(
204
+ original_query="test query",
205
+ improved_query="", # Empty improved query
206
+ research_mode="iterative",
207
+ key_entities=[],
208
+ research_questions=[],
209
+ )
210
+ mock_result.output = mock_parsed
211
+ input_parser_agent.agent.run = AsyncMock(return_value=mock_result)
212
+
213
+ query = "test query"
214
+ result = await input_parser_agent.parse(query)
215
+
216
+ # Should use original_query as fallback
217
+ assert isinstance(result, ParsedQuery)
218
+ assert result.improved_query == result.original_query
219
+
220
+ @pytest.mark.asyncio
221
+ async def test_parse_fallback_to_heuristic_on_error(
222
+ self, input_parser_agent: InputParserAgent
223
+ ) -> None:
224
+ """Test that parse() falls back to heuristic when agent fails."""
225
+ # Make agent.run raise an exception
226
+ input_parser_agent.agent.run = AsyncMock(side_effect=Exception("Agent failed"))
227
+
228
+ # Query with "comprehensive" should trigger deep mode heuristic
229
+ query = "Write a comprehensive report on diabetes"
230
+ result = await input_parser_agent.parse(query)
231
+
232
+ assert isinstance(result, ParsedQuery)
233
+ assert result.research_mode == "deep" # Heuristic should detect "comprehensive"
234
+ assert result.original_query == query
235
+ assert result.improved_query == query # No improvement on fallback
236
+
237
+ @pytest.mark.asyncio
238
+ async def test_parse_heuristic_iterative_mode(
239
+ self, input_parser_agent: InputParserAgent
240
+ ) -> None:
241
+ """Test that parse() heuristic correctly identifies iterative mode."""
242
+ # Make agent.run raise an exception
243
+ input_parser_agent.agent.run = AsyncMock(side_effect=Exception("Agent failed"))
244
+
245
+ # Simple query should trigger iterative mode heuristic
246
+ query = "What is metformin?"
247
+ result = await input_parser_agent.parse(query)
248
+
249
+ assert isinstance(result, ParsedQuery)
250
+ assert result.research_mode == "iterative"
251
+ assert result.original_query == query
252
+
253
+
254
+ class TestCreateInputParserAgent:
255
+ """Test create_input_parser_agent() factory function."""
256
+
257
+ @patch("src.agents.input_parser.get_model")
258
+ def test_create_input_parser_agent_with_model(
259
+ self, mock_get_model: MagicMock, mock_model: MagicMock
260
+ ) -> None:
261
+ """Test factory function with provided model."""
262
+ agent = create_input_parser_agent(model=mock_model)
263
+ assert isinstance(agent, InputParserAgent)
264
+ assert agent.model == mock_model
265
+ mock_get_model.assert_not_called()
266
+
267
+ @patch("src.agents.input_parser.get_model")
268
+ def test_create_input_parser_agent_without_model(
269
+ self, mock_get_model: MagicMock, mock_model: MagicMock
270
+ ) -> None:
271
+ """Test factory function without model (uses default)."""
272
+ mock_get_model.return_value = mock_model
273
+ agent = create_input_parser_agent()
274
+ assert isinstance(agent, InputParserAgent)
275
+ assert agent.model == mock_model
276
+ mock_get_model.assert_called_once()
277
+
278
+ @patch("src.agents.input_parser.get_model")
279
+ def test_create_input_parser_agent_handles_error(self, mock_get_model: MagicMock) -> None:
280
+ """Test factory function handles errors gracefully."""
281
+ mock_get_model.side_effect = Exception("Model creation failed")
282
+ with pytest.raises(ConfigurationError, match="Failed to create input parser agent"):
283
+ create_input_parser_agent()
284
+
285
+
286
+ class TestResearchModeDetection:
287
+ """Test research mode detection logic."""
288
+
289
+ @pytest.mark.asyncio
290
+ async def test_detects_iterative_mode_for_simple_queries(
291
+ self,
292
+ input_parser_agent: InputParserAgent,
293
+ mock_agent_result_iterative: AgentRunResult[ParsedQuery],
294
+ ) -> None:
295
+ """Test that simple queries are detected as iterative."""
296
+ input_parser_agent.agent.run = AsyncMock(return_value=mock_agent_result_iterative)
297
+
298
+ simple_queries = [
299
+ "What is the mechanism of metformin?",
300
+ "Find clinical trials for drug X",
301
+ "What is the capital of France?",
302
+ ]
303
+
304
+ for query in simple_queries:
305
+ result = await input_parser_agent.parse(query)
306
+ assert result.research_mode == "iterative", f"Query '{query}' should be iterative"
307
+
308
+ @pytest.mark.asyncio
309
+ async def test_detects_deep_mode_for_complex_queries(
310
+ self,
311
+ input_parser_agent: InputParserAgent,
312
+ mock_agent_result_deep: AgentRunResult[ParsedQuery],
313
+ ) -> None:
314
+ """Test that complex queries are detected as deep."""
315
+ input_parser_agent.agent.run = AsyncMock(return_value=mock_agent_result_deep)
316
+
317
+ complex_queries = [
318
+ "Write a comprehensive report on diabetes treatment",
319
+ "Analyze the market for quantum computing",
320
+ "Provide a detailed analysis of AI trends",
321
+ ]
322
+
323
+ for query in complex_queries:
324
+ result = await input_parser_agent.parse(query)
325
+ assert result.research_mode == "deep", f"Query '{query}' should be deep"
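
The tests above pin down the `InputParserAgent` contract: structured LLM parsing when `agent.run` succeeds, and a keyword heuristic fallback when it raises. A minimal sketch of the fallback these tests exercise, assuming a trigger-word list like the one below (the actual keywords in `src/agents/input_parser.py` may differ):

```python
# Hedged sketch of the heuristic fallback; DEEP_MODE_KEYWORDS is an assumption,
# not the shipped list.
DEEP_MODE_KEYWORDS = ("comprehensive", "detailed analysis", "report")

def heuristic_parse(query: str) -> ParsedQuery:
    """Fallback when the LLM call fails: no query rewriting, mode by keywords."""
    mode = "deep" if any(kw in query.lower() for kw in DEEP_MODE_KEYWORDS) else "iterative"
    return ParsedQuery(
        original_query=query,
        improved_query=query,  # fallback keeps the query unchanged
        research_mode=mode,
        key_entities=[],
        research_questions=[],
    )
```

Any fallback of this shape satisfies `test_parse_fallback_to_heuristic_on_error` ("comprehensive" triggers deep mode) and `test_parse_heuristic_iterative_mode` ("What is metformin?" stays iterative).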
tests/unit/agents/test_long_writer.py ADDED
@@ -0,0 +1,509 @@
1
+ """Unit tests for LongWriterAgent."""
2
+
3
+ from unittest.mock import AsyncMock, MagicMock, patch
4
+
5
+ import pytest
6
+ from pydantic_ai.agent import AgentRunResult
7
+
8
+ from src.agents.long_writer import LongWriterAgent, LongWriterOutput, create_long_writer_agent
9
+ from src.utils.models import ReportDraft, ReportDraftSection
10
+
11
+
12
+ @pytest.fixture
13
+ def mock_model() -> MagicMock:
14
+ """Create a mock Pydantic AI model."""
15
+ model = MagicMock()
16
+ model.name = "test-model"
17
+ return model
18
+
19
+
20
+ @pytest.fixture
21
+ def mock_long_writer_output() -> LongWriterOutput:
22
+ """Create a mock LongWriterOutput."""
23
+ return LongWriterOutput(
24
+ next_section_markdown="## Test Section\n\nContent with citation [1].",
25
+ references=["[1] https://example.com"],
26
+ )
27
+
28
+
29
+ @pytest.fixture
30
+ def mock_agent_result(mock_long_writer_output: LongWriterOutput) -> AgentRunResult[LongWriterOutput]:
31
+ """Create a mock agent result."""
32
+ result = MagicMock(spec=AgentRunResult)
33
+ result.output = mock_long_writer_output
34
+ return result
35
+
36
+
37
+ @pytest.fixture
38
+ def long_writer_agent(mock_model: MagicMock) -> LongWriterAgent:
39
+ """Create a LongWriterAgent instance with mocked model."""
40
+ return LongWriterAgent(model=mock_model)
41
+
42
+
43
+ @pytest.fixture
44
+ def sample_report_draft() -> ReportDraft:
45
+ """Create a sample ReportDraft for testing."""
46
+ return ReportDraft(
47
+ sections=[
48
+ ReportDraftSection(
49
+ section_title="Introduction",
50
+ section_content="Introduction content with [1].",
51
+ ),
52
+ ReportDraftSection(
53
+ section_title="Methods",
54
+ section_content="Methods content with [2].",
55
+ ),
56
+ ]
57
+ )
58
+
59
+
60
+ class TestLongWriterAgentInit:
61
+ """Test LongWriterAgent initialization."""
62
+
63
+ def test_long_writer_agent_init_with_model(self, mock_model: MagicMock) -> None:
64
+ """Test LongWriterAgent initialization with provided model."""
65
+ agent = LongWriterAgent(model=mock_model)
66
+ assert agent.model == mock_model
67
+ assert agent.agent is not None
68
+
69
+ @patch("src.agents.long_writer.get_model")
70
+ def test_long_writer_agent_init_without_model(
71
+ self, mock_get_model: MagicMock, mock_model: MagicMock
72
+ ) -> None:
73
+ """Test LongWriterAgent initialization without model (uses default)."""
74
+ mock_get_model.return_value = mock_model
75
+ agent = LongWriterAgent()
76
+ assert agent.model == mock_model
77
+ mock_get_model.assert_called_once()
78
+
79
+ def test_long_writer_agent_has_structured_output(
80
+ self, long_writer_agent: LongWriterAgent
81
+ ) -> None:
82
+ """Test that LongWriterAgent uses structured output."""
83
+ assert long_writer_agent.agent.output_type == LongWriterOutput
84
+
85
+
86
+ class TestWriteNextSection:
87
+ """Test write_next_section() method."""
88
+
89
+ @pytest.mark.asyncio
90
+ async def test_write_next_section_basic(
91
+ self,
92
+ long_writer_agent: LongWriterAgent,
93
+ mock_agent_result: AgentRunResult[LongWriterOutput],
94
+ ) -> None:
95
+ """Test basic section writing."""
96
+ long_writer_agent.agent.run = AsyncMock(return_value=mock_agent_result)
97
+
98
+ original_query = "Test query"
99
+ report_draft = "## Existing Section\n\nContent"
100
+ next_section_title = "New Section"
101
+ next_section_draft = "Draft content"
102
+
103
+ result = await long_writer_agent.write_next_section(
104
+ original_query=original_query,
105
+ report_draft=report_draft,
106
+ next_section_title=next_section_title,
107
+ next_section_draft=next_section_draft,
108
+ )
109
+
110
+ assert isinstance(result, LongWriterOutput)
111
+ assert result.next_section_markdown is not None
112
+ assert isinstance(result.references, list)
113
+ assert long_writer_agent.agent.run.called
114
+
115
+ @pytest.mark.asyncio
116
+ async def test_write_next_section_first_section(
117
+ self,
118
+ long_writer_agent: LongWriterAgent,
119
+ mock_agent_result: AgentRunResult[LongWriterOutput],
120
+ ) -> None:
121
+ """Test writing the first section (no existing draft)."""
122
+ long_writer_agent.agent.run = AsyncMock(return_value=mock_agent_result)
123
+
124
+ original_query = "Test query"
125
+ report_draft = "" # No existing draft
126
+ next_section_title = "First Section"
127
+ next_section_draft = "Draft content"
128
+
129
+ result = await long_writer_agent.write_next_section(
130
+ original_query=original_query,
131
+ report_draft=report_draft,
132
+ next_section_title=next_section_title,
133
+ next_section_draft=next_section_draft,
134
+ )
135
+
136
+ assert isinstance(result, LongWriterOutput)
137
+ # Check that "No draft yet" was included in prompt
138
+ call_args = long_writer_agent.agent.run.call_args[0][0]
139
+ assert "No draft yet" in call_args or report_draft in call_args
140
+
141
+ @pytest.mark.asyncio
142
+ async def test_write_next_section_with_existing_draft(
143
+ self,
144
+ long_writer_agent: LongWriterAgent,
145
+ mock_agent_result: AgentRunResult[LongWriterOutput],
146
+ ) -> None:
147
+ """Test writing section with existing draft."""
148
+ long_writer_agent.agent.run = AsyncMock(return_value=mock_agent_result)
149
+
150
+ original_query = "Test query"
151
+ report_draft = "## Previous Section\n\nPrevious content"
152
+ next_section_title = "Next Section"
153
+ next_section_draft = "Next draft"
154
+
155
+ result = await long_writer_agent.write_next_section(
156
+ original_query=original_query,
157
+ report_draft=report_draft,
158
+ next_section_title=next_section_title,
159
+ next_section_draft=next_section_draft,
160
+ )
161
+
162
+ assert isinstance(result, LongWriterOutput)
163
+ # Check that existing draft was included in prompt
164
+ call_args = long_writer_agent.agent.run.call_args[0][0]
165
+ assert "Previous Section" in call_args
166
+
167
+ @pytest.mark.asyncio
168
+ async def test_write_next_section_returns_references(
169
+ self,
170
+ long_writer_agent: LongWriterAgent,
171
+ mock_agent_result: AgentRunResult[LongWriterOutput],
172
+ ) -> None:
173
+ """Test that write_next_section returns references."""
174
+ long_writer_agent.agent.run = AsyncMock(return_value=mock_agent_result)
175
+
176
+ result = await long_writer_agent.write_next_section(
177
+ original_query="Test",
178
+ report_draft="",
179
+ next_section_title="Test",
180
+ next_section_draft="Test",
181
+ )
182
+
183
+ assert isinstance(result.references, list)
184
+ assert len(result.references) > 0
185
+
186
+ @pytest.mark.asyncio
187
+ async def test_write_next_section_handles_empty_draft(
188
+ self,
189
+ long_writer_agent: LongWriterAgent,
190
+ mock_agent_result: AgentRunResult[LongWriterOutput],
191
+ ) -> None:
192
+ """Test writing section with empty draft."""
193
+ long_writer_agent.agent.run = AsyncMock(return_value=mock_agent_result)
194
+
195
+ result = await long_writer_agent.write_next_section(
196
+ original_query="Test",
197
+ report_draft="",
198
+ next_section_title="Test",
199
+ next_section_draft="",
200
+ )
201
+
202
+ assert isinstance(result, LongWriterOutput)
203
+
204
+ @pytest.mark.asyncio
205
+ async def test_write_next_section_llm_failure(self, long_writer_agent: LongWriterAgent) -> None:
206
+ """Test write_next_section handles LLM failures gracefully."""
207
+ long_writer_agent.agent.run = AsyncMock(side_effect=Exception("LLM error"))
208
+
209
+ result = await long_writer_agent.write_next_section(
210
+ original_query="Test",
211
+ report_draft="",
212
+ next_section_title="Test",
213
+ next_section_draft="Test",
214
+ )
215
+
216
+ # Should return fallback section
217
+ assert isinstance(result, LongWriterOutput)
218
+ assert "Test" in result.next_section_markdown
219
+ assert result.references == []
220
+
221
+
222
+ class TestWriteReport:
223
+ """Test write_report() method."""
224
+
225
+ @pytest.mark.asyncio
226
+ async def test_write_report_complete_flow(
227
+ self,
228
+ long_writer_agent: LongWriterAgent,
229
+ mock_agent_result: AgentRunResult[LongWriterOutput],
230
+ sample_report_draft: ReportDraft,
231
+ ) -> None:
232
+ """Test complete report writing flow."""
233
+ long_writer_agent.agent.run = AsyncMock(return_value=mock_agent_result)
234
+
235
+ original_query = "Test query"
236
+ report_title = "Test Report"
237
+
238
+ result = await long_writer_agent.write_report(
239
+ original_query=original_query,
240
+ report_title=report_title,
241
+ report_draft=sample_report_draft,
242
+ )
243
+
244
+ assert isinstance(result, str)
245
+ assert report_title in result
246
+ assert "Table of Contents" in result
247
+ assert "Introduction" in result
248
+ assert "Methods" in result
249
+ # Should have called write_next_section for each section
250
+ assert long_writer_agent.agent.run.call_count == len(sample_report_draft.sections)
251
+
252
+ @pytest.mark.asyncio
253
+ async def test_write_report_single_section(
254
+ self,
255
+ long_writer_agent: LongWriterAgent,
256
+ mock_agent_result: AgentRunResult[LongWriterOutput],
257
+ ) -> None:
258
+ """Test writing report with single section."""
259
+ long_writer_agent.agent.run = AsyncMock(return_value=mock_agent_result)
260
+
261
+ report_draft = ReportDraft(
262
+ sections=[
263
+ ReportDraftSection(
264
+ section_title="Single Section",
265
+ section_content="Content",
266
+ )
267
+ ]
268
+ )
269
+
270
+ result = await long_writer_agent.write_report(
271
+ original_query="Test",
272
+ report_title="Test Report",
273
+ report_draft=report_draft,
274
+ )
275
+
276
+ assert isinstance(result, str)
277
+ assert "Single Section" in result
278
+ assert long_writer_agent.agent.run.call_count == 1
279
+
280
+ @pytest.mark.asyncio
281
+ async def test_write_report_multiple_sections(
282
+ self,
283
+ long_writer_agent: LongWriterAgent,
284
+ mock_agent_result: AgentRunResult[LongWriterOutput],
285
+ sample_report_draft: ReportDraft,
286
+ ) -> None:
287
+ """Test writing report with multiple sections."""
288
+ long_writer_agent.agent.run = AsyncMock(return_value=mock_agent_result)
289
+
290
+ result = await long_writer_agent.write_report(
291
+ original_query="Test",
292
+ report_title="Test Report",
293
+ report_draft=sample_report_draft,
294
+ )
295
+
296
+ assert isinstance(result, str)
297
+ assert sample_report_draft.sections[0].section_title in result
298
+ assert sample_report_draft.sections[1].section_title in result
299
+ assert long_writer_agent.agent.run.call_count == len(sample_report_draft.sections)
300
+
301
+ @pytest.mark.asyncio
302
+ async def test_write_report_creates_table_of_contents(
303
+ self,
304
+ long_writer_agent: LongWriterAgent,
305
+ mock_agent_result: AgentRunResult[LongWriterOutput],
306
+ sample_report_draft: ReportDraft,
307
+ ) -> None:
308
+ """Test that write_report creates table of contents."""
309
+ long_writer_agent.agent.run = AsyncMock(return_value=mock_agent_result)
310
+
311
+ result = await long_writer_agent.write_report(
312
+ original_query="Test",
313
+ report_title="Test Report",
314
+ report_draft=sample_report_draft,
315
+ )
316
+
317
+ assert "Table of Contents" in result
318
+ assert "1. Introduction" in result
319
+ assert "2. Methods" in result
320
+
321
+ @pytest.mark.asyncio
322
+ async def test_write_report_aggregates_references(
323
+ self,
324
+ long_writer_agent: LongWriterAgent,
325
+ sample_report_draft: ReportDraft,
326
+ ) -> None:
327
+ """Test that write_report aggregates references from all sections."""
328
+ # Create different outputs for each section
329
+ output1 = LongWriterOutput(
330
+ next_section_markdown="## Introduction\n\nContent [1].",
331
+ references=["[1] https://example.com/1"],
332
+ )
333
+ output2 = LongWriterOutput(
334
+ next_section_markdown="## Methods\n\nContent [1].",
335
+ references=["[1] https://example.com/2"],
336
+ )
337
+
338
+ results = [MagicMock(spec=AgentRunResult, output=output1), MagicMock(spec=AgentRunResult, output=output2)]
339
+ long_writer_agent.agent.run = AsyncMock(side_effect=results)
340
+
341
+ result = await long_writer_agent.write_report(
342
+ original_query="Test",
343
+ report_title="Test Report",
344
+ report_draft=sample_report_draft,
345
+ )
346
+
347
+ assert "References:" in result
348
+ # Should have both references (reformatted)
349
+ assert "example.com/1" in result or "[1]" in result
350
+ assert "example.com/2" in result or "[2]" in result
351
+
352
+
353
+ class TestReformatReferences:
354
+ """Test _reformat_references() method."""
355
+
356
+ def test_reformat_references_deduplicates(self, long_writer_agent: LongWriterAgent) -> None:
357
+ """Test that reference reformatting deduplicates URLs."""
358
+ section_markdown = "Content [1] and [2]."
359
+ section_references = [
360
+ "[1] https://example.com",
361
+ "[2] https://example.com", # Duplicate URL
362
+ ]
363
+ all_references = []
364
+
365
+ updated_markdown, updated_refs = long_writer_agent._reformat_references(
366
+ section_markdown, section_references, all_references
367
+ )
368
+
369
+ # Should only have one reference
370
+ assert len(updated_refs) == 1
371
+ assert "example.com" in updated_refs[0]
372
+
373
+ def test_reformat_references_renumbers(self, long_writer_agent: LongWriterAgent) -> None:
374
+ """Test that reference reformatting renumbers correctly."""
375
+ section_markdown = "Content [1] and [2]."
376
+ section_references = [
377
+ "[1] https://example.com/1",
378
+ "[2] https://example.com/2",
379
+ ]
380
+ all_references = ["[1] https://example.com/0"] # Existing reference
381
+
382
+ updated_markdown, updated_refs = long_writer_agent._reformat_references(
383
+ section_markdown, section_references, all_references
384
+ )
385
+
386
+ # Should have 3 references total (0, 1, 2)
387
+ assert len(updated_refs) == 3
388
+ # Markdown should have updated reference numbers
389
+ assert "[2]" in updated_markdown or "[3]" in updated_markdown
390
+
391
+ def test_reformat_references_handles_malformed(
392
+ self, long_writer_agent: LongWriterAgent
393
+ ) -> None:
394
+ """Test that reference reformatting handles malformed references."""
395
+ section_markdown = "Content [1]."
396
+ section_references = [
397
+ "[1] https://example.com",
398
+ "invalid reference", # Malformed
399
+ ]
400
+ all_references = []
401
+
402
+ updated_markdown, updated_refs = long_writer_agent._reformat_references(
403
+ section_markdown, section_references, all_references
404
+ )
405
+
406
+ # Should still work, just skip invalid references
407
+ assert isinstance(updated_markdown, str)
408
+ assert isinstance(updated_refs, list)
409
+
410
+ def test_reformat_references_empty_list(self, long_writer_agent: LongWriterAgent) -> None:
411
+ """Test reference reformatting with empty reference list."""
412
+ section_markdown = "Content without citations."
413
+ section_references = []
414
+ all_references = []
415
+
416
+ updated_markdown, updated_refs = long_writer_agent._reformat_references(
417
+ section_markdown, section_references, all_references
418
+ )
419
+
420
+ assert updated_markdown == section_markdown
421
+ assert updated_refs == []
422
+
423
+ def test_reformat_references_preserves_markdown(
424
+ self, long_writer_agent: LongWriterAgent
425
+ ) -> None:
426
+ """Test that reference reformatting preserves markdown content."""
427
+ section_markdown = "## Section\n\nContent [1] with **bold** text."
428
+ section_references = ["[1] https://example.com"]
429
+ all_references = []
430
+
431
+ updated_markdown, _ = long_writer_agent._reformat_references(
432
+ section_markdown, section_references, all_references
433
+ )
434
+
435
+ assert "## Section" in updated_markdown
436
+ assert "**bold**" in updated_markdown
437
+
438
+
439
+ class TestReformatSectionHeadings:
440
+ """Test _reformat_section_headings() method."""
441
+
442
+ def test_reformat_section_headings_level_2(self, long_writer_agent: LongWriterAgent) -> None:
443
+ """Test that headings are reformatted to level 2."""
444
+ section_markdown = "## Section Title\n\nContent"
445
+ result = long_writer_agent._reformat_section_headings(section_markdown)
446
+ assert "## Section Title" in result
447
+
448
+ def test_reformat_section_headings_level_3(self, long_writer_agent: LongWriterAgent) -> None:
449
+ """Test that level 3 headings are adjusted correctly."""
450
+ section_markdown = "### Section Title\n\nContent"
451
+ result = long_writer_agent._reformat_section_headings(section_markdown)
452
+ # Should be adjusted to level 2
453
+ assert "## Section Title" in result
454
+
455
+ def test_reformat_section_headings_no_headings(
456
+ self, long_writer_agent: LongWriterAgent
457
+ ) -> None:
458
+ """Test reformatting with no headings."""
459
+ section_markdown = "Just content without headings."
460
+ result = long_writer_agent._reformat_section_headings(section_markdown)
461
+ assert result == section_markdown
462
+
463
+ def test_reformat_section_headings_preserves_content(
464
+ self, long_writer_agent: LongWriterAgent
465
+ ) -> None:
466
+ """Test that content is preserved during heading reformatting."""
467
+ section_markdown = "# Section\n\nImportant content here."
468
+ result = long_writer_agent._reformat_section_headings(section_markdown)
469
+ assert "Important content here" in result
470
+
471
+
472
+ class TestCreateLongWriterAgent:
473
+ """Test create_long_writer_agent factory function."""
474
+
475
+ @patch("src.agents.long_writer.get_model")
476
+ @patch("src.agents.long_writer.LongWriterAgent")
477
+ def test_create_long_writer_agent_success(
478
+ self,
479
+ mock_long_writer_agent_class: MagicMock,
480
+ mock_get_model: MagicMock,
481
+ mock_model: MagicMock,
482
+ ) -> None:
483
+ """Test successful long writer agent creation."""
484
+ mock_get_model.return_value = mock_model
485
+ mock_agent_instance = MagicMock()
486
+ mock_long_writer_agent_class.return_value = mock_agent_instance
487
+
488
+ result = create_long_writer_agent()
489
+
490
+ assert result == mock_agent_instance
491
+ mock_long_writer_agent_class.assert_called_once_with(model=mock_model)
492
+
493
+ @patch("src.agents.long_writer.get_model")
494
+ @patch("src.agents.long_writer.LongWriterAgent")
495
+ def test_create_long_writer_agent_with_custom_model(
496
+ self,
497
+ mock_long_writer_agent_class: MagicMock,
498
+ mock_get_model: MagicMock,
499
+ mock_model: MagicMock,
500
+ ) -> None:
501
+ """Test long writer agent creation with custom model."""
502
+ mock_agent_instance = MagicMock()
503
+ mock_long_writer_agent_class.return_value = mock_agent_instance
504
+
505
+ result = create_long_writer_agent(model=mock_model)
506
+
507
+ assert result == mock_agent_instance
508
+ mock_long_writer_agent_class.assert_called_once_with(model=mock_model)
509
+ mock_get_model.assert_not_called()
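
For context on the `_reformat_references` expectations above: per-section references arrive as `"[n] <url>"` strings and must be merged into a report-wide list, with URLs deduplicated, in-text markers renumbered to global indices, and malformed entries skipped. A rough sketch that satisfies those assertions, assuming that string format (the shipped logic in `src/agents/long_writer.py` may differ in detail):

```python
import re

def reformat_references(
    section_markdown: str,
    section_references: list[str],
    all_references: list[str],
) -> tuple[str, list[str]]:
    merged = list(all_references)
    # Map already-known URLs to their global citation number.
    url_to_num = {ref.split(" ", 1)[1]: i + 1 for i, ref in enumerate(merged)}
    renumber: dict[str, int] = {}
    for ref in section_references:
        match = re.match(r"\[(\d+)\]\s+(\S+)", ref)
        if not match:
            continue  # skip malformed entries, as the tests require
        old_num, url = match.groups()
        if url not in url_to_num:
            merged.append(f"[{len(merged) + 1}] {url}")
            url_to_num[url] = len(merged)
        renumber[old_num] = url_to_num[url]
    # Two-pass replacement via sentinels so renumbering never collides
    # (e.g. [1] -> [2] must not later be caught by [2] -> [3]).
    for old in renumber:
        section_markdown = section_markdown.replace(f"[{old}]", f"\x00{old}\x00")
    for old, new in renumber.items():
        section_markdown = section_markdown.replace(f"\x00{old}\x00", f"[{new}]")
    return section_markdown, merged
```

Run against the cases above, this deduplicates the repeated `https://example.com`, renumbers `[1]`/`[2]` to `[2]`/`[3]` when one reference already exists, and leaves citation-free markdown untouched.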