Architecture¶
This document describes Tinman's system architecture, component interactions, and data flow. It's intended for contributors and users who need to understand how the system works internally.
Table of Contents¶
- System Overview
- Component Architecture
- Data Flow
- Agent System
- Memory Graph
- HITL Infrastructure
- Event System
- Extension Points
- Configuration
- Performance Considerations
- Service Mode Architecture
- Cost Tracking
- Metrics & Observability
- Trace Ingestion
- Audit Trail
- Risk Policy
- Guarded Tool Execution
System Overview¶
High-Level Architecture¶
┌─────────────────────────────────────────────────────────────────────────┐
│ TINMAN │
│ │
│ ┌────────────────────────────────────────────────────────────────────┐ │
│ │ User Interface │ │
│ │ ┌─────────┐ ┌─────────┐ ┌─────────────────┐ │ │
│ │ │ CLI │ │ TUI │ │ Python API │ │ │
│ │ └────┬────┘ └────┬────┘ └────────┬────────┘ │ │
│ └──────────┼───────────────┼─────────────────────┼───────────────────┘ │
│ │ │ │ │
│ └───────────────┴──────────┬──────────┘ │
│ │ │
│ ┌─────────────────────────────────────▼────────────────────────────┐ │
│ │ Tinman Orchestrator │ │
│ │ (tinman.py) │ │
│ └─────────────────────────────────────┬────────────────────────────┘ │
│ │ │
│ ┌────────────────────────────────┼────────────────────────────┐ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌─────────────┐ ┌─────────────────┐ ┌──────────┐│
│ │ Agents │ │ Infrastructure │ │ Reasoning ││
│ │ │ │ │ │ ││
│ │ Hypothesis │◀────────────▶│ Memory Graph │◀────────▶│ LLM ││
│ │ Architect │ │ Event Bus │ │ Backbone ││
│ │ Executor │ │ Approval │ │ Adaptive ││
│ │ Discovery │ │ Risk Eval │ │ Memory ││
│ │ Intervene │ │ Control Plane │ │ ││
│ │ Simulate │ │ │ │ ││
│ └─────────────┘ └─────────────────┘ └──────────┘│
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────────────────┐ │
│ │ Persistence Layer │ │
│ │ ┌──────────────┐ ┌──────────────┐ │ │
│ │ │ PostgreSQL │ │ Model Clients │ │ │
│ │ │ (Knowledge) │ │ (OpenAI/etc) │ │ │
│ │ └──────────────┘ └──────────────┘ │ │
│ └──────────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────────┘
Key Design Principles¶
- Separation of Concerns
  - Agents handle domain logic (research tasks)
  - Infrastructure handles cross-cutting concerns (approval, events, persistence)
  - Reasoning handles intelligence (LLM interaction, pattern learning)
- Event-Driven Communication
  - Components communicate via the EventBus
  - Loose coupling enables extensibility
  - Audit trail via event history
- Mode-Aware Behavior
  - All components respect the operating mode
  - Same code, different permissions (see the sketch after this list)
  - Progressive rollout supported
- Human-in-the-Loop by Design
  - Approval gates at critical points
  - Risk evaluation for all actions
  - Configurable autonomy levels
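For example, mode-aware gating can be as simple as branching on the current mode. A minimal sketch: Mode comes from tinman/config/modes.py per the directory structure below, while the helper itself is hypothetical:
from tinman.config.modes import Mode

# Hypothetical helper: the same code path runs in every mode,
# but autonomous application is only permitted in LAB.
def can_auto_apply(mode: Mode) -> bool:
    return mode == Mode.LAB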
Component Architecture¶
Directory Structure¶
tinman/
├── __init__.py # Package exports
├── tinman.py # Main orchestrator class
│
├── agents/ # Autonomous research agents
│ ├── __init__.py
│ ├── base.py # BaseAgent abstract class
│ ├── hypothesis_engine.py # Generates failure hypotheses
│ ├── experiment_architect.py # Designs experiments
│ ├── experiment_executor.py # Runs experiments
│ ├── failure_discovery.py # Classifies failures
│ ├── intervention_engine.py # Proposes fixes
│ └── simulation_engine.py # Validates interventions
│
├── config/ # Configuration management
│ ├── __init__.py
│ ├── modes.py # Operating modes (LAB/SHADOW/PRODUCTION)
│ └── settings.py # Settings dataclasses
│
├── core/ # Infrastructure components
│ ├── __init__.py
│ ├── approval_gate.py # Tracks approval requests
│ ├── approval_handler.py # Coordinates HITL approvals
│ ├── control_plane.py # System state management
│ ├── event_bus.py # Pub/sub event system
│ ├── risk_evaluator.py # Risk tier assessment
│ ├── tools.py # Guarded tool execution
│ ├── risk_policy.py # Policy-driven risk matrix
│ ├── cost_tracker.py # Budget enforcement
│ └── metrics.py # Prometheus metrics
│
├── db/ # Database layer
│ ├── __init__.py
│ ├── connection.py # SQLAlchemy connection
│ ├── models.py # ORM models
│ └── audit.py # Durable audit trail (AuditLog, ApprovalDecision)
│
├── ingest/ # Trace ingestion adapters
│ ├── __init__.py
│ ├── base.py # TraceAdapter base class
│ ├── otlp.py # OpenTelemetry OTLP adapter
│ ├── datadog.py # Datadog APM adapter
│ ├── xray.py # AWS X-Ray adapter
│ ├── json_adapter.py # Generic JSON adapter
│ └── registry.py # Auto-detection and registry
│
├── service/ # HTTP API service
│ ├── __init__.py
│ ├── app.py # FastAPI application
│ └── models.py # Pydantic request/response models
│
├── integrations/ # External integrations
│ ├── __init__.py
│ ├── model_client.py # Base model client
│ ├── openai_client.py # OpenAI integration
│ ├── anthropic_client.py # Anthropic integration
│ └── pipeline_adapter.py # Hook into existing pipelines
│
├── memory/ # Knowledge graph
│ ├── __init__.py
│ ├── graph.py # MemoryGraph implementation
│ ├── models.py # Node/Edge data models
│ └── repository.py # Graph persistence
│
├── reasoning/ # LLM-powered reasoning
│ ├── __init__.py
│ ├── llm_backbone.py # Core LLM reasoning
│ ├── adaptive_memory.py # Pattern learning
│ ├── insight_synthesizer.py # Report generation
│ └── prompts.py # Prompt templates
│
├── reporting/ # Partner-facing reports
│ ├── __init__.py
│ ├── base.py # BaseReport class
│ ├── executive.py # ExecutiveSummaryReport
│ ├── technical.py # TechnicalAnalysisReport
│ ├── compliance.py # ComplianceReport
│ └── export.py # Multi-format export (JSON, MD, HTML, PDF, CSV)
│
├── taxonomy/ # Failure classification
│ ├── __init__.py
│ ├── failure_types.py # FailureClass, Severity enums
│ ├── classifiers.py # Classification logic
│ └── causal_linker.py # Root cause analysis
│
├── tui/ # Terminal UI
│ ├── __init__.py
│ ├── app.py # Textual app
│ └── styles.tcss # CSS styling
│
└── utils/ # Utilities
├── __init__.py
├── id_gen.py # ID generation
├── logging_setup.py # Logging configuration
└── time_utils.py # Time utilities
Component Responsibilities¶
| Component | Responsibility | Key Classes |
|---|---|---|
| Orchestrator | Coordinates research cycles | Tinman |
| Agents | Execute research tasks | BaseAgent, HypothesisEngine, etc. |
| Memory | Persistent knowledge store | MemoryGraph, Node, Edge |
| Reasoning | LLM-powered intelligence | LLMBackbone, AdaptiveMemory |
| Core | Infrastructure services | ApprovalHandler, EventBus, RiskEvaluator, CostTracker, Metrics |
| Service | HTTP API | FastAPI app, Pydantic models |
| Ingest | Trace ingestion | OTLPAdapter, DatadogAdapter, XRayAdapter |
| Config | Settings and modes | Settings, Mode |
| Taxonomy | Failure classification | FailureClass, Severity |
| Integrations | External systems | ModelClient, PipelineAdapter |
Data Flow¶
Research Cycle Data Flow¶
┌──────────────────────────────────────────────────────────────────────────┐
│ RESEARCH CYCLE DATA FLOW │
└──────────────────────────────────────────────────────────────────────────┘
┌─────────────┐ observations ┌─────────────────┐
│ Memory │◀─────────────────────│ Hypothesis │
│ Graph │ │ Engine │
│ │─────hypotheses───────▶│ │
└─────────────┘ └────────┬────────┘
▲ │
│ List[Hypothesis]
│ │
│ ▼
│ ┌─────────────────┐
│ │ Experiment │
│◀──────experiments────────────│ Architect │
│ │ │
│ └────────┬────────┘
│ │
│ List[ExperimentDesign]
│ │
│ ▼
│ ┌─────────────────┐
│ ┌─────│ Experiment │
│ │ │ Executor │──────┐
│ │ │ │ │
│ │ └────────┬────────┘ │
│ │ │ │
│ [APPROVAL] List[Result] [LLM CALLS]
│ │ │ │
│ ▼ ▼ ▼
│ ┌─────────────┐ ┌─────────────────┐ ┌─────────┐
│ │ Approval │ │ Failure │ │ Model │
│ │ Handler │ │ Discovery │ │ Client │
│ └─────────────┘ └────────┬────────┘ └─────────┘
│ │
│◀─────────failures────────List[DiscoveredFailure]
│ │
│ ▼
│ ┌─────────────────┐
│ ┌─────│ Intervention │
│ │ │ Engine │
│ │ └────────┬────────┘
│ │ │
│ [APPROVAL] List[Intervention]
│ │ │
│◀──────interventions────┘ ▼
│ ┌─────────────────┐
│ │ Simulation │
│ │ Engine │──────[APPROVAL]
│ └────────┬────────┘
│ │
│◀────────simulations─────List[SimulationResult]
│
▼
┌─────────────┐
│ Updated │
│ Graph │
└─────────────┘
Data Transformations¶
Each agent transforms data:
| Agent | Input | Output |
|---|---|---|
| HypothesisEngine | Prior knowledge, observations | List[Hypothesis] |
| ExperimentArchitect | List[Hypothesis] | List[ExperimentDesign] |
| ExperimentExecutor | List[ExperimentDesign] | List[ExperimentResult] |
| FailureDiscovery | List[ExperimentResult] | List[DiscoveredFailure] |
| InterventionEngine | List[DiscoveredFailure] | List[Intervention] |
| SimulationEngine | List[Intervention] | List[SimulationResult] |
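Taken together, the orchestrator chains these transformations stage by stage. The following is a minimal sketch of that chaining, assuming the attribute and result-key names shown in this document; the real research_cycle() also inserts approval gates and publishes events:
# Sketch only: agent attributes and data keys are assumptions based on the tables above.
async def research_cycle(self, context) -> list:
    hypotheses = (await self.hypothesis_engine.run(context)).data["hypotheses"]
    designs = (await self.experiment_architect.run(context, hypotheses=hypotheses)).data["experiments"]
    results = (await self.experiment_executor.run(context, experiments=designs)).data["results"]
    failures = (await self.failure_discovery.run(context, results=results)).data["failures"]
    interventions = (await self.intervention_engine.run(context, failures=failures)).data["interventions"]
    return (await self.simulation_engine.run(context, interventions=interventions)).data["simulations"]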
Data Models¶
Key data structures:
@dataclass
class Hypothesis:
    id: str
    target_surface: str        # e.g., "reasoning", "tool_use"
    expected_failure: str      # Predicted failure type
    failure_class: FailureClass
    confidence: float          # 0.0 - 1.0
    priority: int
    rationale: str
    suggested_experiment: str

@dataclass
class ExperimentDesign:
    id: str
    hypothesis_id: str
    stress_type: str
    parameters: dict
    expected_outcome: str
    estimated_runs: int
    estimated_cost_usd: float
    timeout_seconds: int

@dataclass
class DiscoveredFailure:
    id: str
    experiment_id: str
    failure_class: FailureClass
    subtype: str
    severity: Severity
    description: str
    root_cause: str
    reproducibility: float
    evidence: list[dict]

@dataclass
class Intervention:
    id: str
    failure_id: str
    type: InterventionType
    description: str
    implementation: dict
    estimated_effectiveness: float
    risk_tier: RiskTier
    rollback_plan: str
Agent System¶
Base Agent Design¶
All agents inherit from BaseAgent:
class BaseAgent(ABC):
    """Abstract base class for all research agents."""

    def __init__(
        self,
        llm: Optional[LLMBackbone] = None,
        graph: Optional[MemoryGraph] = None,
        event_bus: Optional[EventBus] = None,
        approval_handler: Optional[ApprovalHandler] = None,
        config: Optional[dict] = None,
    ):
        self.llm = llm
        self.graph = graph
        self.event_bus = event_bus
        self.approval_handler = approval_handler
        self.config = config or {}
        self.state = AgentState.IDLE

    @property
    @abstractmethod
    def agent_type(self) -> str:
        """Return agent type identifier."""

    async def run(self, context: AgentContext, **kwargs) -> AgentResult:
        """Execute the agent with lifecycle management."""
        self.state = AgentState.RUNNING
        try:
            result = await self.execute(context, **kwargs)
            self.state = AgentState.COMPLETED
            return result
        except Exception:
            self.state = AgentState.FAILED
            raise

    @abstractmethod
    async def execute(self, context: AgentContext, **kwargs) -> AgentResult:
        """Implement agent-specific logic."""
Agent Context¶
Agents receive context for execution:
@dataclass
class AgentContext:
    mode: Mode                  # Operating mode
    session_id: str             # Research session
    cycle_id: Optional[str]     # Current cycle
    focus: Optional[str]        # Research focus area
    prior_results: dict         # Results from previous agents
    metadata: dict              # Additional context
Agent Result¶
Agents return standardized results:
@dataclass
class AgentResult:
    agent_type: str
    success: bool
    data: dict          # Agent-specific output
    errors: list[str]
    warnings: list[str]
    metrics: dict       # Performance metrics
    duration_ms: int
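Putting context and result together, invoking an agent looks roughly like this (field values are illustrative):
context = AgentContext(
    mode=Mode.LAB,
    session_id="sess-001",
    cycle_id=None,
    focus="tool_use",
    prior_results={},
    metadata={},
)

result = await hypothesis_engine.run(context)
if result.success:
    hypotheses = result.data["hypotheses"]  # agent-specific payload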
Agent Communication¶
Agents communicate via:
- Direct data passing - Orchestrator passes results between agents
- Memory Graph - Persistent shared state
- Event Bus - Async notifications
# In research_cycle()
hypothesis_result = await self.hypothesis_engine.run(context)
hypotheses = hypothesis_result.data["hypotheses"]

# Pass to next agent
architect_result = await self.experiment_architect.run(
    context,
    hypotheses=hypotheses,
)
Memory Graph¶
Graph Model¶
┌─────────────────────────────────────────────────────────────────┐
│ MEMORY GRAPH MODEL │
│ │
│ Node Types: │
│ ┌────────────┐ ┌────────────┐ ┌────────────┐ │
│ │ HYPOTHESIS │ │ EXPERIMENT │ │ FAILURE │ │
│ └────────────┘ └────────────┘ └────────────┘ │
│ ┌────────────┐ ┌────────────┐ ┌────────────┐ │
│ │INTERVENTION│ │ DEPLOYMENT │ │ ROLLBACK │ │
│ └────────────┘ └────────────┘ └────────────┘ │
│ │
│ Edge Relations: │
│ ───────────────────────────────────────────────── │
│ TESTED_IN: Hypothesis → Experiment │
│ OBSERVED_IN: Failure → Experiment │
│ ADDRESSED_BY: Failure → Intervention │
│ EVOLVED_INTO: Failure → Failure │
│ DEPLOYED_AS: Intervention → Deployment │
│ ROLLED_BACK_BY: Deployment → Rollback │
│ REGRESSED_AS: Intervention → Failure │
│ │
└─────────────────────────────────────────────────────────────────┘
Temporal Semantics¶
Each node has temporal validity:
@dataclass
class Node:
    id: str
    type: NodeType
    data: dict
    valid_from: datetime          # When knowledge became valid
    valid_to: Optional[datetime]  # When invalidated (None = still valid)
    created_at: datetime
    session_id: str
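A node is valid at time t when valid_from <= t and valid_to is either None or later than t. A sketch of the predicate that snapshot_at() applies (illustrative, not the actual implementation):
from datetime import datetime

def is_valid_at(node: Node, t: datetime) -> bool:
    # valid_to stays None while the knowledge is still current.
    return node.valid_from <= t and (node.valid_to is None or node.valid_to > t)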
Graph Operations¶
class MemoryGraph:
    def add_node(self, type: NodeType, data: dict) -> str:
        """Add a node to the graph."""

    def add_edge(self, source: str, target: str, relation: EdgeRelation) -> str:
        """Add an edge between nodes."""

    def get_node(self, node_id: str) -> Optional[Node]:
        """Get node by ID."""

    def get_neighbors(self, node_id: str, relation: Optional[EdgeRelation] = None) -> list[Node]:
        """Get connected nodes."""

    def snapshot_at(self, timestamp: datetime, node_type: Optional[NodeType] = None) -> list[Node]:
        """Get all valid nodes at a point in time."""

    def get_lineage(self, node_id: str, direction: str = "both") -> list[Node]:
        """Trace causal chain from a node."""

    def invalidate_node(self, node_id: str) -> None:
        """Soft-delete by setting valid_to."""
Example Queries¶
# Get all unresolved failures
failures = graph.get_nodes(
    type=NodeType.FAILURE,
    filter=lambda n: n.data.get("status") != "resolved",
)

# Find what addressed a failure
interventions = graph.get_neighbors(
    failure_id,
    relation=EdgeRelation.ADDRESSED_BY,
)

# Get historical state at deployment
deployment_snapshot = graph.snapshot_at(
    deployment_time,
    node_type=NodeType.FAILURE,
)

# Trace failure lineage
lineage = graph.get_lineage(failure_id, direction="ancestors")
HITL Infrastructure¶
Approval Flow Architecture¶
┌─────────────────────────────────────────────────────────────────┐
│ APPROVAL ARCHITECTURE │
│ │
│ Agent Request │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ ApprovalHandler │ │
│ │ │ │
│ │ ┌─────────────┐ ┌─────────────┐ │ │
│ │ │ Risk │───▶│ Approval │ │ │
│ │ │ Evaluator │ │ Gate │ │ │
│ │ └─────────────┘ └─────────────┘ │ │
│ │ │ │ │ │
│ │ ▼ ▼ │ │
│ │ ┌─────────────┐ ┌─────────────┐ │ │
│ │ │ Risk Tier │ │ Pending │ │ │
│ │ │ SAFE/REVIEW │ │ Requests │ │ │
│ │ │ /BLOCK │ │ Queue │ │ │
│ │ └──────┬──────┘ └─────────────┘ │ │
│ │ │ │ │
│ │ ▼ │ │
│ │ ┌─────────────────────────────────────────────────┐ │ │
│ │ │ UI Callback Router │ │ │
│ │ │ │ │ │
│ │ │ Primary: TUI Modal │ │ │
│ │ │ Fallback: CLI Prompt │ │ │
│ │ │ Custom: User-registered callback │ │ │
│ │ └─────────────────────────────────────────────────┘ │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ Approved / Rejected │
│ │ │
│ ▼ │
│ Agent Proceeds / Aborts │
│ │
└─────────────────────────────────────────────────────────────────┘
Risk Evaluation¶
class RiskEvaluator:
    def evaluate(self, action: Action, mode: Mode) -> RiskAssessment:
        """Evaluate risk and return tier."""
        # Check hard blocks
        if action.type in BLOCKED_ACTIONS:
            return RiskAssessment(tier=RiskTier.BLOCK, ...)

        # Check mode-specific rules
        if mode == Mode.PRODUCTION:
            if action.type in REVIEW_REQUIRED_IN_PROD:
                return RiskAssessment(tier=RiskTier.REVIEW, ...)

        # Evaluate based on factors
        score = self._compute_risk_score(action, mode)
        tier = self._score_to_tier(score)

        return RiskAssessment(
            tier=tier,
            severity=action.predicted_severity,
            reasoning=self._explain_decision(action, tier),
            auto_approve=(tier == RiskTier.SAFE),
        )
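Call sites branch on the returned tier; a sketch, assuming the fields shown above:
assessment = evaluator.evaluate(action, mode=Mode.PRODUCTION)

if assessment.tier == RiskTier.BLOCK:
    raise PermissionError(assessment.reasoning)  # hard stop
if assessment.tier == RiskTier.REVIEW:
    approved = await approval_handler.request_approval(
        action_type=action.type,
        description=assessment.reasoning,
        details={},
    )
# RiskTier.SAFE auto-approves and proceeds without human review.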
Approval Context¶
@dataclass
class ApprovalContext:
    id: str
    action_type: ActionType
    action_description: str
    action_details: dict
    risk_tier: RiskTier
    severity: Severity
    estimated_cost_usd: Optional[float]
    is_reversible: bool
    rollback_plan: str
    requester_agent: str
    timeout_seconds: int
    status: ApprovalStatus
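When testing a custom approval UI (see Extension Points), you can construct this context directly. The values below are illustrative, and ApprovalStatus.PENDING is an assumption based on the pending-requests queue above:
ctx = ApprovalContext(
    id="apr-001",
    action_type=ActionType.CONFIG_CHANGE,
    action_description="Raise experiment timeout",
    action_details={"timeout_seconds": 600},
    risk_tier=RiskTier.REVIEW,
    severity=Severity.S2,
    estimated_cost_usd=0.50,
    is_reversible=True,
    rollback_plan="Restore the previous timeout",
    requester_agent="experiment_architect",
    timeout_seconds=300,
    status=ApprovalStatus.PENDING,
)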
Event System¶
Event Bus Architecture¶
class EventBus:
    """Pub/sub event system for component communication."""

    def __init__(self):
        self._subscribers: dict[str, list[Callable]] = {}
        self._history: list[Event] = []

    def subscribe(self, topic: str, handler: Callable) -> None:
        """Subscribe to a topic."""

    def publish(self, topic: str, payload: dict) -> None:
        """Publish an event to a topic."""

    def get_history(self, topic: Optional[str] = None) -> list[Event]:
        """Get event history."""
Standard Topics¶
class Topics:
    # Hypothesis events
    HYPOTHESIS_CREATED = "hypothesis.created"

    # Experiment events
    EXPERIMENT_DESIGNED = "experiment.designed"
    EXPERIMENT_STARTED = "experiment.started"
    EXPERIMENT_COMPLETED = "experiment.completed"

    # Failure events
    FAILURE_DISCOVERED = "failure.discovered"
    FAILURE_CLASSIFIED = "failure.classified"

    # Intervention events
    INTERVENTION_PROPOSED = "intervention.proposed"
    INTERVENTION_APPROVED = "intervention.approved"
    INTERVENTION_REJECTED = "intervention.rejected"
    INTERVENTION_DEPLOYED = "intervention.deployed"

    # Approval events
    APPROVAL_REQUESTED = "approval.requested"
    APPROVAL_GRANTED = "approval.granted"
    APPROVAL_DENIED = "approval.denied"

    # System events
    CYCLE_STARTED = "cycle.started"
    CYCLE_COMPLETED = "cycle.completed"
    MODE_CHANGED = "mode.changed"
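Wiring a handler to one of these topics is a one-liner each way. A minimal sketch; it assumes handlers receive the full Event (one plausible reading of the interface above):
bus = EventBus()

def on_failure(event: Event) -> None:
    print(f"failure discovered: {event.payload.get('failure_id')}")

bus.subscribe(Topics.FAILURE_DISCOVERED, on_failure)
bus.publish(Topics.FAILURE_DISCOVERED, {"failure_id": "f-123"})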
Event Payload¶
@dataclass
class Event:
    id: str
    topic: str
    payload: dict
    timestamp: datetime
    source: str                      # Publishing component
    correlation_id: Optional[str]    # For tracing
Extension Points¶
Custom Model Client¶
import json

from tinman.integrations.model_client import ModelClient

class MyCustomClient(ModelClient):
    """Custom LLM provider integration."""

    async def complete(self, messages: list[dict], **kwargs) -> str:
        """Implement completion logic."""
        response = await my_api.chat(messages)
        return response.content

    async def complete_structured(
        self,
        messages: list[dict],
        schema: dict,
        **kwargs,
    ) -> dict:
        """Implement structured output."""
        response = await my_api.chat(messages, response_format=schema)
        return json.loads(response.content)
Custom Agent¶
from tinman.agents.base import BaseAgent, AgentContext, AgentResult

class MyCustomAgent(BaseAgent):
    """Custom research agent."""

    @property
    def agent_type(self) -> str:
        return "my_custom_agent"

    async def execute(self, context: AgentContext, **kwargs) -> AgentResult:
        # Access shared resources
        observations = self.graph.get_nodes(type=NodeType.FAILURE)

        # Use LLM reasoning
        if self.llm:
            analysis = await self.llm.reason(
                mode=ReasoningMode.ANALYZE,
                context={"observations": observations},
            )

        # Request approval if needed
        if self.approval_handler:
            approved = await self.approval_handler.request_approval(
                action_type=ActionType.CONFIG_CHANGE,
                description="My custom action",
                details={...},
            )
            if not approved:
                return AgentResult(
                    agent_type=self.agent_type,
                    success=False,
                    data={},
                    errors=["Not approved"],
                    warnings=[],
                    metrics={},
                    duration_ms=0,
                )

        # Publish events
        if self.event_bus:
            self.event_bus.publish("my_agent.completed", {...})

        return AgentResult(
            agent_type=self.agent_type,
            success=True,
            data={"result": ...},
            errors=[],
            warnings=[],
            metrics={},
            duration_ms=100,
        )
Custom Approval UI¶
from tinman.core.approval_handler import ApprovalContext

async def my_approval_callback(context: ApprovalContext) -> bool:
    """Custom approval UI implementation."""
    # Display approval request (e.g., Slack, web UI, etc.)
    # estimated_cost_usd is Optional, so guard before formatting.
    cost = (
        f"${context.estimated_cost_usd:.2f}"
        if context.estimated_cost_usd is not None
        else "n/a"
    )
    message = f"""
    Approval Required:
    Action: {context.action_description}
    Risk: {context.risk_tier.value}
    Cost: {cost}
    """

    # Wait for human decision
    decision = await my_ui.prompt_user(message)
    return decision == "approve"

# Register with Tinman
tinman.register_approval_ui(my_approval_callback)
Custom Failure Classifier¶
from tinman.taxonomy.classifiers import BaseClassifier

class MyDomainClassifier(BaseClassifier):
    """Domain-specific failure classification."""

    def classify(self, failure_data: dict) -> tuple[FailureClass, str, Severity]:
        """Classify a failure."""
        # Your classification logic
        if "my_domain_pattern" in failure_data.get("description", ""):
            return (
                FailureClass.REASONING,
                "MY_DOMAIN_SPECIFIC_SUBTYPE",
                Severity.S2,
            )

        # Fall back to the base classifier
        return super().classify(failure_data)
Configuration¶
Settings Hierarchy¶
@dataclass
class Settings:
    mode: Mode
    database: DatabaseSettings
    models: ModelSettings
    research: ResearchSettings
    experiments: ExperimentSettings
    risk: RiskSettings
    approval: ApprovalSettings
    shadow: ShadowSettings
    logging: LoggingSettings
Loading Priority¶
1. Defaults (in code)
2. Config file (.tinman/config.yaml)
3. Environment variables (${VAR} substitution)
4. CLI arguments
5. Programmatic overrides
# Config file with env var substitution
models:
  providers:
    openai:
      api_key: ${OPENAI_API_KEY}  # From environment
Performance Considerations¶
Async Execution¶
All agent operations are async:
# Agents can run concurrently where appropriate
results = await asyncio.gather(
agent1.run(context),
agent2.run(context),
)
Database Pooling¶
Database connections are pooled for efficiency.
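A minimal sketch of the kind of SQLAlchemy engine configuration involved (values are illustrative; the actual setup lives in tinman/db/connection.py and is driven by DatabaseSettings):
from sqlalchemy.ext.asyncio import create_async_engine

engine = create_async_engine(
    "postgresql+asyncpg://tinman:***@localhost/tinman",
    pool_size=10,        # persistent connections kept open
    max_overflow=5,      # extra connections allowed under burst load
    pool_pre_ping=True,  # validate connections before handing them out
)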
LLM Cost Control¶
Built-in cost tracking and limits:
experiments:
  cost_limit_usd: 10.0  # Per-cycle limit

# Tracked in results
result.metrics["llm_cost_usd"]
Service Mode Architecture¶
FastAPI Service¶
Tinman can run as an HTTP service for production deployments:
┌─────────────────────────────────────────────────────────────────┐
│ TINMAN SERVICE MODE │
│ │
│ HTTP Clients │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ FastAPI App │ │
│ │ │ │
│ │ /health - Health checks │ │
│ │ /ready - Kubernetes readiness │ │
│ │ /live - Kubernetes liveness │ │
│ │ /status - Current Tinman state │ │
│ │ /research/cycle - Run research cycle │ │
│ │ /approvals/* - Manage approvals │ │
│ │ /discuss - Interactive discussion │ │
│ │ /mode - Mode management │ │
│ │ /metrics - Prometheus metrics │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ Tinman Orchestrator │
│ │
└─────────────────────────────────────────────────────────────────┘
Starting the Service¶
# CLI command
tinman serve --host 0.0.0.0 --port 8000
# Or directly with uvicorn
uvicorn tinman.service.app:app --host 0.0.0.0 --port 8000
Cost Tracking¶
Budget Enforcement Architecture¶
from tinman.core.cost_tracker import CostTracker, BudgetConfig, BudgetPeriod

# Configure budget
config = BudgetConfig(
    limit_usd=100.0,            # Max spend
    period=BudgetPeriod.DAILY,  # Reset daily
    warn_threshold=0.8,         # Warn at 80%
    hard_limit=True,            # Block when exceeded
)
tracker = CostTracker(budget_config=config)

# Enforce before operations
tracker.enforce_budget(estimated_cost=5.0)

# Record actual costs
tracker.record_cost(
    amount_usd=4.50,
    source="llm_call",
    model="claude-3-opus",
    operation="research",
)
Cost Categories¶
| Source | Description |
|---|---|
| llm_call | LLM API calls |
| research_cycle | Full research cycle |
| experiment | Individual experiment |
| tool_call | External tool invocation |
Metrics & Observability¶
Prometheus Metrics¶
Tinman exposes comprehensive Prometheus metrics:
from tinman.core.metrics import start_metrics_server, get_metrics
# Start metrics server
start_metrics_server(port=9090)
Key Metrics:
| Metric | Type | Description |
|---|---|---|
| tinman_research_cycles_total | Counter | Total research cycles |
| tinman_failures_discovered_total | Counter | Failures by severity/class |
| tinman_approval_decisions_total | Counter | Approvals by decision/tier |
| tinman_cost_usd_total | Counter | Costs by source/model |
| tinman_llm_requests_total | Counter | LLM requests by model/status |
| tinman_llm_latency_seconds | Histogram | LLM request latency |
| tinman_tool_executions_total | Counter | Tool calls by status |
| tinman_pending_approvals | Gauge | Current pending approvals |
| tinman_current_mode | Gauge | Active operating mode |
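To collect these, point Prometheus at the metrics server. A minimal scrape config, assuming the port used above:
scrape_configs:
  - job_name: tinman
    static_configs:
      - targets: ["localhost:9090"]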
Trace Ingestion¶
Supported Formats¶
Tinman can ingest traces from multiple observability systems:
┌─────────────────────────────────────────────────────────────────┐
│ TRACE INGESTION │
│ │
│ ┌───────────┐ ┌───────────┐ ┌───────────┐ ┌───────────┐ │
│ │ OTLP │ │ Datadog │ │ X-Ray │ │ JSON │ │
│ │ Traces │ │ APM │ │ Traces │ │ Generic │ │
│ └─────┬─────┘ └─────┬─────┘ └─────┬─────┘ └─────┬─────┘ │
│ │ │ │ │ │
│ └──────────────┴──────────────┴──────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────┐ │
│ │ Auto-Detect │ │
│ │ Registry │ │
│ └────────┬────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────┐ │
│ │ Unified Trace │ │
│ │ Model │ │
│ └────────┬────────┘ │
│ │ │
│ ▼ │
│ Tinman Analysis │
│ │
└─────────────────────────────────────────────────────────────────┘
Usage¶
from tinman.ingest import parse_traces, OTLPAdapter

# Auto-detect format
traces = parse_traces(raw_data)

# Or use a specific adapter
adapter = OTLPAdapter()
traces = list(adapter.parse(otlp_data))

# Analyze traces
for trace in traces:
    for span in trace.error_spans:
        print(f"Error: {span.name} - {span.status_message}")
Audit Trail¶
Durable Audit Logging¶
All consequential actions are persisted:
from datetime import datetime, timedelta

from tinman.db.audit import AuditLogger

audit = AuditLogger(session)

# Query activity
logs = audit.query(
    event_types=["approval_decision", "mode_transition"],
    since=datetime.now() - timedelta(hours=24),
)
Audit Tables:
| Table | Purpose |
|---|---|
| audit_logs | Immutable event log |
| approval_decisions | Human approval records |
| mode_transitions | Mode change history |
| tool_executions | Tool call records |
Risk Policy¶
Policy-Driven Risk Evaluation¶
Risk decisions are configurable via YAML:
# risk_policy.yaml
base_matrix:
  lab:
    S0: safe
    S1: safe
    S2: review
    S3: review
    S4: block
  shadow:
    S0: safe
    S1: review
    S2: review
    S3: block
    S4: block
  production:
    S0: review
    S1: review
    S2: block
    S3: block
    S4: block

action_overrides:
  DEPLOY_INTERVENTION:
    production: block
  DESTRUCTIVE_TEST:
    shadow: block
    production: block
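Resolution is a lookup of severity within mode, with action overrides taking precedence. A sketch of the semantics (illustrative, not the actual risk_policy.py implementation):
def resolve_tier(policy: dict, mode: str, severity: str, action: str) -> str:
    # Action-specific overrides win, e.g. DEPLOY_INTERVENTION in production -> block.
    override = policy.get("action_overrides", {}).get(action, {})
    if mode in override:
        return override[mode]
    # Otherwise fall back to the severity-by-mode base matrix.
    return policy["base_matrix"][mode][severity]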
Guarded Tool Execution¶
All tool calls go through the safety pipeline:
from tinman.core.tools import guarded_call, ToolRegistry

@ToolRegistry.register(
    name="search",
    risk_level=ToolRiskLevel.LOW,
)
async def search_tool(query: str) -> list[str]:
    return await do_search(query)

# Execution is automatically guarded
result = await guarded_call(
    search_tool,
    action_type=ActionType.TOOL_CALL,
    description="Search for relevant documents",
    approval_handler=handler,
    mode=Mode.PRODUCTION,
    query="AI safety",
)
Next Steps¶
- AGENTS.md - Detailed agent documentation
- MEMORY.md - Memory graph deep dive
- HITL.md - Approval system details
- INTEGRATION.md - Integration patterns
- PRODUCTION.md - Production deployment guide