# Agents

This document describes each autonomous agent in Tinman's research system.
## Table of Contents
- Overview
- BaseAgent Framework
- HypothesisEngine
- ExperimentArchitect
- ExperimentExecutor
- FailureDiscoveryAgent
- InterventionEngine
- SimulationEngine
- Agent Orchestration
## Overview
Tinman uses six autonomous agents that work together to conduct failure research:
```
                 AGENT PIPELINE

┌───────────────┐      ┌───────────────┐
│  Hypothesis   │─────▶│  Experiment   │
│    Engine     │      │   Architect   │
└───────────────┘      └───────┬───────┘
                               │
                               ▼
                       ┌───────────────┐
                       │  Experiment   │
                       │   Executor    │
                       └───────┬───────┘
                               │
                               ▼
                       ┌───────────────┐
                       │    Failure    │
                       │   Discovery   │
                       └───────┬───────┘
                               │
                               ▼
┌───────────────┐      ┌───────────────┐
│  Simulation   │◀─────│ Intervention  │
│    Engine     │      │    Engine     │
└───────────────┘      └───────────────┘
```
### Agent Summary
| Agent | Purpose | Inputs | Outputs |
|---|---|---|---|
| HypothesisEngine | Generate failure hypotheses | Observations, prior failures | Hypotheses |
| ExperimentArchitect | Design experiments | Hypotheses | Experiment designs |
| ExperimentExecutor | Run experiments | Experiment designs | Experiment results |
| FailureDiscoveryAgent | Classify failures | Experiment results | Discovered failures |
| InterventionEngine | Design fixes | Discovered failures | Interventions |
| SimulationEngine | Validate fixes | Interventions | Simulation results |
## BaseAgent Framework
All agents inherit from BaseAgent, which provides common functionality.
### Agent States

```python
from enum import Enum


class AgentState(str, Enum):
    """Agent lifecycle states."""

    IDLE = "idle"            # Not running
    RUNNING = "running"      # Executing
    PAUSED = "paused"        # Temporarily stopped
    COMPLETED = "completed"  # Finished successfully
    FAILED = "failed"        # Encountered error
```
### AgentContext

Context passed to all agent operations:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Optional


@dataclass
class AgentContext:
    """Context passed to agent operations."""

    mode: OperatingMode       # LAB / SHADOW / PRODUCTION
    session_id: str           # Unique session ID
    parent_id: Optional[str]  # Parent agent ID
    metadata: dict[str, Any]  # Additional context
    started_at: datetime      # When the session started
```
### AgentResult

Standard result structure returned by all agents:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Optional


@dataclass
class AgentResult:
    """Result from an agent operation."""

    agent_id: str          # ID of the agent
    agent_type: str        # Type name
    success: bool          # Whether the operation succeeded
    data: dict[str, Any]   # Output data
    error: Optional[str]   # Error message if failed
    duration_ms: int       # Execution duration in milliseconds
    created_at: datetime   # When the result was created
```
### Common Methods

```python
from abc import ABC, abstractmethod


class BaseAgent(ABC):
    """Base class for all Tinman agents."""

    @property
    @abstractmethod
    def agent_type(self) -> str:
        """Unique identifier for this agent type."""

    @abstractmethod
    async def execute(self, context: AgentContext, **kwargs) -> AgentResult:
        """Execute the agent's primary function."""

    async def run(self, context: AgentContext, **kwargs) -> AgentResult:
        """Run agent with lifecycle management."""
        # Handles state transitions, event publishing, and error handling.

    def pause(self) -> None:
        """Pause agent execution."""

    def resume(self) -> None:
        """Resume a paused agent."""
```
## HypothesisEngine

**Purpose:** Generate testable hypotheses about potential failure modes.

**Agent Type:** `hypothesis_engine`
### Description

The HypothesisEngine does not rely on templates; it uses genuine LLM reasoning to:
- Analyze past failures and find patterns
- Identify unexplored attack surfaces
- Generate novel hypotheses based on observations
- Prioritize based on learned priors
### Inputs
| Parameter | Type | Required | Description |
|---|---|---|---|
| None required | - | - | Uses internal sources (memory graph, adaptive memory) |
### Outputs

```python
{
    "hypothesis_count": int,      # Number generated
    "hypotheses": [               # List of hypotheses
        {
            "id": str,
            "target_surface": str,    # What to test
            "expected_failure": str,  # What might fail
            "failure_class": str,     # Taxonomy class
            "confidence": float,      # 0-1
            "priority": str,          # low/medium/high/critical
            "rationale": str,         # Why this hypothesis
            "suggested_experiment": str,
        }
    ],
    "used_llm_reasoning": bool,   # Whether the LLM was used
}
```
### Hypothesis Structure

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Hypothesis:
    """A hypothesis about potential failure modes."""

    id: str
    target_surface: str    # What we're testing
    expected_failure: str  # What failure we expect
    failure_class: FailureClass
    confidence: float      # 0-1 confidence
    priority: str          # low, medium, high, critical
    rationale: str
    suggested_experiment: str
    evidence: list[str]
    metadata: dict[str, Any]
```
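For orientation, a hypothetical populated instance (all values are illustrative, not output from Tinman):

```python
hypothesis = Hypothesis(
    id="hyp-001",  # illustrative ID scheme
    target_surface="tool-calling loop",
    expected_failure="agent retries a failing tool indefinitely",
    failure_class=FailureClass.TOOL_USE,
    confidence=0.6,
    priority="high",
    rationale="Past traces show retry storms under tool timeouts.",
    suggested_experiment="Inject tool timeouts and count retries.",
    evidence=["run-42", "run-57"],  # prior run IDs, illustrative
    metadata={},
)
```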
### Sources of Hypotheses
- LLM Reasoning - When LLM backbone is available, generates creative hypotheses
- Prior Failures - Patterns from past discoveries
- Adaptive Memory - What has worked before
- Attack Surface Analysis - Systematic enumeration of surfaces
- Failure Taxonomy - Derivation from known failure classes
### Example Usage

```python
from tinman.agents.hypothesis_engine import HypothesisEngine, Hypothesis
from tinman.agents.base import AgentContext
from tinman.config.modes import Mode

engine = HypothesisEngine(
    graph=memory_graph,
    llm_backbone=llm,
    adaptive_memory=adaptive_memory,
)

context = AgentContext(mode=Mode.LAB)
result = await engine.run(context)

for hypothesis in result.data["hypotheses"]:
    print(f"Hypothesis: {hypothesis['expected_failure']}")
    print(f"Confidence: {hypothesis['confidence']}")
    print(f"Priority: {hypothesis['priority']}")
```
## ExperimentArchitect

**Purpose:** Design experiments to test hypotheses.

**Agent Type:** `experiment_architect`
### Description
Converts hypotheses into concrete experiment designs with:
- Specific test cases and prompts
- Stress parameters tailored to the failure class
- Success/failure criteria
- Resource estimates
### Inputs

| Parameter | Type | Required | Description |
|---|---|---|---|
| hypotheses | list[Hypothesis] | Yes | Hypotheses to design experiments for |
### Outputs

```python
{
    "experiment_count": int,
    "experiments": [
        {
            "id": str,
            "hypothesis_id": str,
            "name": str,
            "description": str,
            "stress_type": str,
            "mode": str,  # single, iterative, adversarial
            "parameters": dict,
            "estimated_runs": int,
            "estimated_tokens": int,
        }
    ],
    "used_llm_design": bool,
}
```
### ExperimentDesign Structure

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class ExperimentDesign:
    """Design specification for an experiment."""

    id: str
    hypothesis_id: str
    name: str
    description: str
    stress_type: str                  # prompt_injection, context_overflow, etc.
    mode: str                         # single, iterative, adversarial
    parameters: dict[str, Any]
    constraints: dict[str, Any]
    success_criteria: list[str]
    failure_indicators: list[str]
    test_cases: list[dict[str, Any]]  # Actual prompts to run
    estimated_runs: int
    estimated_tokens: int
    timeout_seconds: int
```
### Stress Types by Failure Class
| Failure Class | Stress Types |
|---|---|
| REASONING | logical_chain, goal_conflict |
| LONG_CONTEXT | context_overflow, attention_dilution |
| TOOL_USE | tool_injection, tool_chain |
| FEEDBACK_LOOP | output_recursion, amplification |
| DEPLOYMENT | state_desync, resource_exhaustion |
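The table maps naturally onto a lookup; a minimal sketch (the pairs come straight from the table above, while the dict itself is illustrative rather than Tinman's internal representation):

```python
# Stress types per failure class, per the table above.
STRESS_TYPES: dict[str, list[str]] = {
    "REASONING": ["logical_chain", "goal_conflict"],
    "LONG_CONTEXT": ["context_overflow", "attention_dilution"],
    "TOOL_USE": ["tool_injection", "tool_chain"],
    "FEEDBACK_LOOP": ["output_recursion", "amplification"],
    "DEPLOYMENT": ["state_desync", "resource_exhaustion"],
}
```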
### Example Usage

```python
from tinman.agents.experiment_architect import ExperimentArchitect

architect = ExperimentArchitect(
    graph=memory_graph,
    llm_backbone=llm,
)

result = await architect.run(
    context,
    hypotheses=hypotheses,  # From the HypothesisEngine
)

for experiment in result.data["experiments"]:
    print(f"Experiment: {experiment['name']}")
    print(f"Stress type: {experiment['stress_type']}")
    print(f"Estimated runs: {experiment['estimated_runs']}")
```
## ExperimentExecutor

**Purpose:** Run experiments by actually probing models.

**Agent Type:** `experiment_executor`
### Description

This is the core research capability: the executor takes experiment designs, runs them against target models, and collects real behavioral data.

Key capabilities:

- Run test cases against models via the ModelClient
- Analyze responses for failure indicators using the LLM
- Collect traces, timing, and behavioral data
- Detect failures through pattern matching and LLM analysis (sketched below)
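A minimal sketch of that two-stage check, assuming a generic `llm.complete()` text interface (the method name and prompt are illustrative, not Tinman's actual API):

```python
async def detect_failure(llm, response: str, indicators: list[str]) -> bool:
    """Cheap pattern matching first, LLM judgment second (illustrative)."""
    # Stage 1: match the design's failure_indicators directly
    if any(ind.lower() in response.lower() for ind in indicators):
        return True
    # Stage 2: ask the LLM to judge borderline responses
    verdict = await llm.complete(
        f"Does this response exhibit any of {indicators}? Answer YES or NO.\n\n{response}"
    )
    return verdict.strip().upper().startswith("YES")
```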
### Inputs

| Parameter | Type | Required | Description |
|---|---|---|---|
| experiments | list[ExperimentDesign] | Yes | Experiments to run |
| skip_approval | bool | No | Skip the approval check |
### Outputs

```python
{
    "experiment_count": int,
    "skipped_count": int,
    "skipped_experiments": list[str],
    "total_runs": int,
    "failures_found": int,
    "results": [
        {
            "id": str,
            "experiment_id": str,
            "hypothesis_id": str,
            "total_runs": int,
            "failures_triggered": int,
            "reproduction_rate": float,
            "hypothesis_validated": bool,
            "confidence": float,
            "notes": str,
            "total_tokens": int,
            "total_duration_ms": int,
        }
    ],
}
```
### Mode-Based Run Limits
| Mode | Runs | Description |
|---|---|---|
| LAB | Full | Run all estimated runs |
| SHADOW | 50% | Run half the estimated runs (min 3) |
| PRODUCTION | Min | Run minimal runs (max 3) |
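Expressed as a helper, the budget rule looks roughly like this (the function and the exact rounding are assumptions; only the full/half-with-min-3/max-3 limits come from the table):

```python
def effective_runs(mode: str, estimated_runs: int) -> int:
    """Run budget per operating mode (illustrative; rounding assumed)."""
    if mode == "lab":
        return estimated_runs               # full budget
    if mode == "shadow":
        return max(3, estimated_runs // 2)  # half the estimate, at least 3
    return min(3, estimated_runs)           # production: at most 3
```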
### Approval Integration

In PRODUCTION mode, experiments require human approval before execution:

```python
# Approval is requested via the ApprovalHandler
approved = await self.approval_handler.approve_experiment(
    experiment_name=experiment.name,
    hypothesis=experiment.hypothesis_id,
    estimated_runs=experiment.estimated_runs,
    estimated_cost_usd=estimated_cost,
    stress_type=experiment.stress_type,
)
```
### Example Usage

```python
from tinman.agents.experiment_executor import ExperimentExecutor

executor = ExperimentExecutor(
    graph=memory_graph,
    model_client=model_client,
    llm_backbone=llm,
    approval_handler=approval_handler,
)

result = await executor.run(
    context,
    experiments=experiments,  # From the ExperimentArchitect
)

print(f"Total runs: {result.data['total_runs']}")
print(f"Failures found: {result.data['failures_found']}")
```
## FailureDiscoveryAgent

**Purpose:** Discover and classify failures from experiment results.

**Agent Type:** `failure_discovery`
### Description

Analyzes experiment results to extract, classify, and understand failures. Uses the LLM for deep analysis when available.

Analysis capabilities:

- What actually went wrong (deep analysis)
- Why it went wrong (root cause)
- What it means (implications)
- What to do about it (recommendations)
### Inputs

| Parameter | Type | Required | Description |
|---|---|---|---|
| results | list[ExperimentResult] | Yes | Experiment results to analyze |
### Outputs

```python
{
    "failures_discovered": int,
    "novel_failures": int,
    "failures": [
        {
            "id": str,
            "primary_class": str,
            "secondary_class": str,
            "severity": str,  # S0-S4
            "description": str,
            "trigger_signature": list[str],
            "reproducibility": float,
            "is_novel": bool,
            "classification_confidence": float,
            "llm_analysis": str,
            "contributing_factors": list[str],
            "key_insight": str,
        }
    ],
    "used_llm_analysis": bool,
}
```
### DiscoveredFailure Structure

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class DiscoveredFailure:
    """A newly discovered failure mode."""

    id: str

    # Classification
    primary_class: FailureClass
    secondary_class: Optional[str]
    severity: Severity

    # Details
    description: str
    trigger_signature: list[str]
    reproducibility: float

    # Source
    experiment_id: str
    run_ids: list[str]

    # Analysis
    classification_confidence: float
    causal_analysis: Optional[dict[str, Any]]

    # LLM-generated insights
    llm_analysis: str
    contributing_factors: list[str]
    key_insight: str

    # Status
    is_novel: bool
    parent_failure_id: Optional[str]
```
### LLM Analysis Modes

When an LLM is available, the agent performs two analysis passes:
- Failure Analysis - Classify and understand what went wrong
- Root Cause Analysis - Identify underlying causes
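A minimal sketch of the two passes, again assuming a generic `llm.complete()` interface (prompts and method names are illustrative):

```python
async def analyze(llm, run_trace: str) -> dict[str, str]:
    """Two-pass failure analysis (illustrative)."""
    # Pass 1: classify and describe what went wrong
    failure_analysis = await llm.complete(
        f"Classify this failure and describe what went wrong:\n{run_trace}"
    )
    # Pass 2: root-cause the failure, conditioned on the first pass
    root_cause = await llm.complete(
        f"Given this analysis:\n{failure_analysis}\n"
        "Identify the most likely underlying causes."
    )
    return {"llm_analysis": failure_analysis, "root_cause": root_cause}
```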
### Example Usage

```python
from tinman.agents.failure_discovery import FailureDiscoveryAgent

discovery = FailureDiscoveryAgent(
    graph=memory_graph,
    llm_backbone=llm,
    adaptive_memory=adaptive_memory,
)

result = await discovery.run(
    context,
    results=experiment_results,  # From the ExperimentExecutor
)

for failure in result.data["failures"]:
    print(f"[{failure['severity']}] {failure['primary_class']}")
    print(f"Description: {failure['description']}")
    print(f"Novel: {failure['is_novel']}")
```
## InterventionEngine

**Purpose:** Design fixes for discovered failures.

**Agent Type:** `intervention_engine`
### Description

Proposes concrete interventions to address discovered failures. Uses the LLM for creative intervention design when available.

Intervention types:

- Prompt patches
- Guardrails
- Parameter tuning
- Tool restrictions
- Circuit breakers
- Human escalation
### Inputs

| Parameter | Type | Required | Description |
|---|---|---|---|
| failures | list[DiscoveredFailure] | Yes | Failures to address |
### Outputs

```python
{
    "intervention_count": int,
    "by_risk_tier": {
        "safe": int,
        "review": int,
        "block": int,
    },
    "interventions": [
        {
            "id": str,
            "failure_id": str,
            "type": str,
            "name": str,
            "description": str,
            "risk_tier": str,
            "requires_approval": bool,
            "expected_gains": dict,
            "expected_regressions": dict,
        }
    ],
    "used_llm_design": bool,
}
```
### Intervention Types

```python
from enum import Enum


class InterventionType(str, Enum):
    """Types of interventions."""

    PROMPT_PATCH = "prompt_patch"          # Modify system prompt
    GUARDRAIL = "guardrail"                # Add input/output filter
    PARAMETER_TUNE = "parameter_tune"      # Adjust model parameters
    TOOL_RESTRICTION = "tool_restriction"  # Restrict tool access
    CONTEXT_LIMIT = "context_limit"        # Limit context window
    RETRY_POLICY = "retry_policy"          # Change retry behavior
    CIRCUIT_BREAKER = "circuit_breaker"    # Add failure circuit breaker
    HUMAN_REVIEW = "human_review"          # Route to human review
```
### Intervention Structure

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Intervention:
    """A proposed intervention to address a failure."""

    id: str
    failure_id: str
    intervention_type: InterventionType
    name: str
    description: str
    payload: dict[str, Any]  # The actual fix
    expected_gains: dict[str, float]
    expected_regressions: dict[str, float]
    risk_tier: RiskTier
    risk_factors: list[str]
    rationale: str
    requires_approval: bool
    reversible: bool
```
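For orientation, a hypothetical prompt-patch intervention (the values and the payload shape are illustrative; `RiskTier.SAFE` assumes a member name matching the "safe" tier from the Outputs section):

```python
intervention = Intervention(
    id="int-001",  # illustrative ID scheme
    failure_id="fail-042",
    intervention_type=InterventionType.PROMPT_PATCH,
    name="Cap tool retries in the system prompt",
    description="Add an explicit retry cap to the system prompt.",
    payload={"prompt_append": "Never retry a failing tool more than twice."},
    expected_gains={"failure_rate": 0.3},
    expected_regressions={"latency_ms": 0.0},
    risk_tier=RiskTier.SAFE,  # assumed member name, for illustration
    risk_factors=[],
    rationale="Retry storms were the root cause of fail-042.",
    requires_approval=False,
    reversible=True,
)
```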
### Deployment with Approval

```python
# Deploy an intervention with HITL approval
result = await engine.deploy_intervention(
    context,
    intervention=intervention,
    skip_approval=False,
)

if result["deployed"]:
    print("Intervention deployed successfully")
else:
    print(f"Deployment failed: {result['status']}")
```
### Example Usage

```python
from tinman.agents.intervention_engine import InterventionEngine

intervention_engine = InterventionEngine(
    graph=memory_graph,
    llm_backbone=llm,
    approval_handler=approval_handler,
)

result = await intervention_engine.run(
    context,
    failures=discovered_failures,  # From the FailureDiscoveryAgent
)

for intervention in result.data["interventions"]:
    print(f"Intervention: {intervention['name']}")
    print(f"Type: {intervention['type']}")
    print(f"Risk: {intervention['risk_tier']}")
```
## SimulationEngine

**Purpose:** Validate interventions via counterfactual replay.

**Agent Type:** `simulation_engine`
### Description

Simulates interventions before deployment by replaying historical failure traces with the intervention applied.

Capabilities:

- Replay prompts through the model with the intervention applied
- Use the LLM to analyze whether the intervention improved behavior
- Build statistical confidence in intervention effectiveness
### Inputs

| Parameter | Type | Required | Description |
|---|---|---|---|
| interventions | list[Intervention] | Yes | Interventions to simulate |
| num_runs | int | No | Number of simulation runs (default: 5) |
| skip_approval | bool | No | Skip the approval check |
### Outputs

```python
{
    "simulations_run": int,
    "skipped_count": int,
    "skipped_interventions": list[str],
    "improved": int,
    "deploy_recommended": int,
    "results": [
        {
            "id": str,
            "intervention_id": str,
            "outcome": str,  # improved/no_change/degraded/side_effect/inconclusive
            "confidence": float,
            "avg_failure_rate_improvement": float,
            "avg_latency_impact": float,
            "side_effects": list[str],
            "regressions": list[str],
            "deploy_recommended": bool,
            "recommendation_reason": str,
            "run_count": int,
        }
    ],
}
```
### Simulation Outcomes

```python
from enum import Enum


class SimulationOutcome(str, Enum):
    """Possible simulation outcomes."""

    IMPROVED = "improved"          # Intervention helps
    NO_CHANGE = "no_change"        # No effect
    DEGRADED = "degraded"          # Made things worse
    SIDE_EFFECT = "side_effect"    # Unintended consequence
    INCONCLUSIVE = "inconclusive"  # Can't determine
```
### Simulation Modes
| Scenario | Behavior |
|---|---|
| ModelClient + LLM | Full replay with real model, LLM analyzes results |
| LLM only | LLM estimates intervention effect without replay |
| Neither | Heuristic simulation based on intervention type |
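The fallback order in the table could be dispatched roughly like this (the private method names are invented for illustration; only the three scenarios come from the table):

```python
async def simulate(self, intervention: Intervention) -> "SimulationResult":
    """Pick the richest available simulation strategy (illustrative)."""
    if self.model_client and self.llm_backbone:
        return await self._replay_with_model(intervention)  # full replay
    if self.llm_backbone:
        return await self._estimate_with_llm(intervention)  # LLM-only estimate
    return self._heuristic_simulation(intervention)         # type-based heuristic
```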
### Deployment Recommendations

Production mode applies strict criteria (expressed as a predicate below):

- Outcome must be IMPROVED
- Confidence must be >= 0.7
- No regressions observed
- Latency impact < 500ms
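Expressed against the simulation result schema from the Outputs section (the function itself is a sketch; the thresholds are from the list above):

```python
def recommend_deploy(sim: dict) -> bool:
    """Production deploy gate (illustrative; field names per Outputs)."""
    return (
        sim["outcome"] == "improved"
        and sim["confidence"] >= 0.7
        and not sim["regressions"]           # no regressions observed
        and sim["avg_latency_impact"] < 500  # milliseconds
    )
```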
### Example Usage

```python
from tinman.agents.simulation_engine import SimulationEngine

simulation = SimulationEngine(
    graph=memory_graph,
    model_client=model_client,
    llm_backbone=llm,
)

result = await simulation.run(
    context,
    interventions=interventions,  # From the InterventionEngine
    num_runs=10,
)

for sim in result.data["results"]:
    print(f"Intervention: {sim['intervention_id']}")
    print(f"Outcome: {sim['outcome']}")
    print(f"Deploy recommended: {sim['deploy_recommended']}")
    print(f"Reason: {sim['recommendation_reason']}")
```
## Agent Orchestration

The `Tinman` class orchestrates all agents through a research cycle.
Research Cycle Flow¶
async def research_cycle(
self,
focus: str = "",
max_hypotheses: int = 10,
max_experiments: int = 5,
max_interventions: int = 5,
) -> ResearchCycleResult:
"""Run a complete research cycle."""
# Phase 1: Generate hypotheses
hyp_result = await self.hypothesis_engine.run(context)
hypotheses = hyp_result.data["hypotheses"][:max_hypotheses]
# Phase 2: Design experiments
exp_design_result = await self.experiment_architect.run(
context, hypotheses=hypotheses
)
experiments = exp_design_result.data["experiments"][:max_experiments]
# Phase 3: Execute experiments
exec_result = await self.experiment_executor.run(
context, experiments=experiments
)
# Phase 4: Discover failures
discovery_result = await self.failure_discovery.run(
context, results=exec_result.data["results"]
)
failures = discovery_result.data["failures"]
# Phase 5: Design interventions
intervention_result = await self.intervention_engine.run(
context, failures=failures
)
interventions = intervention_result.data["interventions"][:max_interventions]
# Phase 6: Simulate interventions
sim_result = await self.simulation_engine.run(
context, interventions=interventions
)
# Phase 7: Learn (update adaptive memory)
if self.adaptive_memory:
self._update_learning(cycle_results)
return ResearchCycleResult(...)
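The excerpt above elides context construction and bookkeeping. Assuming a configured `Tinman` instance, a cycle is kicked off like this (the `tinman` variable is illustrative):

```python
result = await tinman.research_cycle(
    focus="tool use under timeouts",  # optional steering hint
    max_hypotheses=5,
    max_experiments=3,
)
```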
### Agent Dependencies

```
                 AGENT DEPENDENCIES

                  ┌───────────────┐
                  │ Memory Graph  │
                  └───────┬───────┘
                          │
    ┌────────┬────────┬───┴────┬────────┬────────┐
    ▼        ▼        ▼        ▼        ▼        ▼
   Hyp      Exp      Exp     Fail      Int      Sim
 Engine    Arch.    Exec.    Disc.   Engine   Engine
    │        │        │        │        │        │
    └────────┴────────┴───┬────┴────────┴────────┘
                          │
                          ▼
                  ┌───────────────┐
                  │ LLM Backbone  │
                  └───────────────┘

Approval Handler: ExperimentExecutor, InterventionEngine,
                  SimulationEngine

Model Client: ExperimentExecutor, SimulationEngine
```
### Creating Agents

```python
from tinman.agents import (
    HypothesisEngine,
    ExperimentArchitect,
    ExperimentExecutor,
    FailureDiscoveryAgent,
    InterventionEngine,
    SimulationEngine,
)

# Create with all dependencies
hypothesis_engine = HypothesisEngine(
    graph=memory_graph,
    llm_backbone=llm_backbone,
    adaptive_memory=adaptive_memory,
    event_bus=event_bus,
)

experiment_architect = ExperimentArchitect(
    graph=memory_graph,
    llm_backbone=llm_backbone,
    event_bus=event_bus,
)

experiment_executor = ExperimentExecutor(
    graph=memory_graph,
    model_client=model_client,
    llm_backbone=llm_backbone,
    approval_handler=approval_handler,
    event_bus=event_bus,
)

failure_discovery = FailureDiscoveryAgent(
    graph=memory_graph,
    llm_backbone=llm_backbone,
    adaptive_memory=adaptive_memory,
    event_bus=event_bus,
)

intervention_engine = InterventionEngine(
    graph=memory_graph,
    llm_backbone=llm_backbone,
    approval_handler=approval_handler,
    event_bus=event_bus,
)

simulation_engine = SimulationEngine(
    graph=memory_graph,
    model_client=model_client,
    llm_backbone=llm_backbone,
    approval_handler=approval_handler,
    event_bus=event_bus,
)
```
## Summary
| Agent | Role in Cycle | LLM-Enhanced |
|---|---|---|
| HypothesisEngine | Generate testable hypotheses | Creative hypothesis generation |
| ExperimentArchitect | Design experiments | Custom test case generation |
| ExperimentExecutor | Run experiments | Response analysis for failures |
| FailureDiscoveryAgent | Classify failures | Deep failure analysis, RCA |
| InterventionEngine | Design fixes | Creative intervention design |
| SimulationEngine | Validate fixes | Replay analysis |
All agents follow the same patterns:

- Inherit from BaseAgent
- Use AgentContext for mode-aware operation
- Return a standardized AgentResult
- Integrate with the MemoryGraph for knowledge persistence
- Support HITL via the ApprovalHandler where appropriate
## Next Steps
- MEMORY.md - Memory graph used by agents
- HITL.md - Approval integration details
- CONFIGURATION.md - Agent configuration options