Human-in-the-Loop (HITL) Approval System¶
This document provides complete documentation for Tinman's human-in-the-loop approval system, which ensures appropriate oversight of AI research and intervention decisions.
Table of Contents¶
- Overview
- Risk Tiers
- Risk Evaluation
- Approval Flow
- Approval Handler
- Approval Modes
- UI Integration
- Custom Approval Workflows
- Best Practices
Overview¶
The HITL system is Tinman's safety mechanism that ensures humans remain in control of consequential decisions while allowing low-risk actions to proceed autonomously.
Core Principles¶
- Risk-Proportionate Oversight - Riskier actions require more oversight
- Mode-Aware - Approval requirements vary by operating mode
- Transparent - All decisions are logged with reasoning
- Configurable - Teams can customize approval thresholds
- Fail-Safe - When in doubt, require approval
System Components¶
┌──────────────────────────────────────────────────────────────────┐
│                     HITL SYSTEM ARCHITECTURE                      │
│                                                                  │
│  ┌──────────┐      ┌───────────────┐      ┌─────────────────┐    │
│  │  Agent   │─────▶│   Approval    │─────▶│  Risk Evaluator │    │
│  │ Request  │      │   Handler     │      │                 │    │
│  └──────────┘      └───────┬───────┘      └────────┬────────┘    │
│                            │                       │             │
│                            │                       ▼             │
│                            │              ┌─────────────────┐    │
│                            │              │ RiskAssessment  │    │
│                            │              │ (tier, severity)│    │
│                            │              └────────┬────────┘    │
│                            │                       │             │
│                            ▼                       ▼             │
│                    ┌───────────────┐      ┌─────────────────┐    │
│                    │ Approval Gate │◀─────│ Decision Logic  │    │
│                    │  (tracking)   │      │ (SAFE/REVIEW/   │    │
│                    └───────┬───────┘      │     BLOCK)      │    │
│                            │              └─────────────────┘    │
│                            ▼                                     │
│                    ┌───────────────┐                             │
│                    │  UI Callback  │                             │
│                    │ (TUI/CLI/API) │                             │
│                    └───────────────┘                             │
└──────────────────────────────────────────────────────────────────┘
Risk Tiers¶
Every action in Tinman is classified into one of three risk tiers:
SAFE¶
Definition: Low-risk actions that can proceed autonomously.
Behavior:
- Auto-approved without human interaction
- Logged for audit trail
- Event published for monitoring

Examples:
- Running read-only experiments
- Generating hypotheses
- Creating reports
- S0-S1 severity findings
REVIEW¶
Definition: Medium-risk actions requiring human approval.
Behavior:
- Presented to human via UI
- Waits for explicit approval/rejection
- Timeout defaults to rejection
- Full context provided for decision

Examples:
- Deploying prompt mutations
- Running costly experiments
- S2-S3 severity findings
- Any action in PRODUCTION mode
BLOCK¶
Definition: High-risk actions that are never allowed.
Behavior:
- Automatically rejected
- No human override available
- Logged with warning
- Alert generated

Examples:
- Destructive tool calls
- Safety filter modifications
- S4 severity findings
- Mode transitions that skip a stage (e.g., LAB → PRODUCTION)
Tier Decision Matrix¶
| Severity | LAB | SHADOW | PRODUCTION |
|---|---|---|---|
| S0 | SAFE | SAFE | SAFE |
| S1 | SAFE | SAFE | SAFE |
| S2 | SAFE | SAFE | REVIEW |
| S3 | REVIEW | REVIEW | REVIEW |
| S4 | REVIEW | BLOCK | BLOCK |
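Read as data, the matrix is a lookup from (severity, mode) to a baseline tier. A minimal sketch (`TIER_MATRIX`, `MODE_COLUMNS`, and `baseline_tier` are illustrative names, and the `RiskTier` import path is an assumption; this is not Tinman's actual implementation):

```python
from tinman.config.modes import Mode
from tinman.core.risk_evaluator import RiskTier, Severity

# Columns follow the table above: (LAB, SHADOW, PRODUCTION).
TIER_MATRIX = {
    Severity.S0: (RiskTier.SAFE,   RiskTier.SAFE,   RiskTier.SAFE),
    Severity.S1: (RiskTier.SAFE,   RiskTier.SAFE,   RiskTier.SAFE),
    Severity.S2: (RiskTier.SAFE,   RiskTier.SAFE,   RiskTier.REVIEW),
    Severity.S3: (RiskTier.REVIEW, RiskTier.REVIEW, RiskTier.REVIEW),
    Severity.S4: (RiskTier.REVIEW, RiskTier.BLOCK,  RiskTier.BLOCK),
}

MODE_COLUMNS = (Mode.LAB, Mode.SHADOW, Mode.PRODUCTION)

def baseline_tier(severity: Severity, mode: Mode) -> RiskTier:
    """Look up the baseline tier for a severity in a given mode."""
    return TIER_MATRIX[severity][MODE_COLUMNS.index(mode)]
```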
Risk Evaluation¶
The RiskEvaluator component assesses actions and assigns risk tiers.
Evaluation Process¶
Action
│
├── Is action type blocked?
│   └── Yes → BLOCK (S4)
│
├── Affects safety filters?
│   └── Yes → BLOCK (S4)
│
├── Mode-specific evaluation
│   ├── LAB → Usually SAFE
│   ├── SHADOW → Review S3+
│   └── PRODUCTION → Review S2+, certain types
│
└── Return RiskAssessment
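In code, the cascade might look like the following sketch (illustrative only, not the real `RiskEvaluator` internals; it reuses `baseline_tier` from the matrix sketch above, assumes `Severity` values are ordered, and uses `REVIEW_REQUIRED_IN_PROD` as defined below):

```python
from tinman.core.risk_evaluator import Action, ActionType, RiskTier

def evaluate_tier(action: Action, mode: Mode) -> RiskTier:
    """Sketch of the cascade: hard blocks first, then mode-aware thresholds."""
    # Hard blocks: destructive tools and safety-filter changes are never allowed.
    if action.action_type in (ActionType.DESTRUCTIVE_TOOL_CALL,
                              ActionType.SAFETY_FILTER_CHANGE):
        return RiskTier.BLOCK
    # In PRODUCTION, certain action types always require review (see below).
    if mode == Mode.PRODUCTION and action.action_type in REVIEW_REQUIRED_IN_PROD:
        return RiskTier.REVIEW
    # Otherwise fall back to the severity/mode decision matrix.
    return baseline_tier(action.predicted_severity, mode)
```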
Action Types¶
class ActionType(str, Enum):
    """Types of actions that can be risk-evaluated."""

    PROMPT_MUTATION = "prompt_mutation"              # Modify prompts
    TOOL_POLICY_CHANGE = "tool_policy_change"        # Change tool permissions
    MEMORY_GATING = "memory_gating"                  # Modify memory access
    FINE_TUNE = "fine_tune"                          # Fine-tuning operations
    CONFIG_CHANGE = "config_change"                  # Configuration changes
    DESTRUCTIVE_TOOL_CALL = "destructive_tool_call"  # Dangerous operations
    SAFETY_FILTER_CHANGE = "safety_filter_change"    # Safety modifications
Blocked Actions¶
These action types are always blocked:
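Sketched below from the BLOCK tier examples above (destructive tool calls and safety filter modifications); the constant name is an assumption, and the authoritative set lives in `tinman.core.risk_evaluator`:

```python
# Hypothetical name; see tinman.core.risk_evaluator for the real definition.
ALWAYS_BLOCKED = {
    ActionType.DESTRUCTIVE_TOOL_CALL,
    ActionType.SAFETY_FILTER_CHANGE,
}
```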
Production-Review Actions¶
These action types always require review in PRODUCTION mode:
REVIEW_REQUIRED_IN_PROD = {
    ActionType.PROMPT_MUTATION,
    ActionType.TOOL_POLICY_CHANGE,
    ActionType.SAFETY_FILTER_CHANGE,
    ActionType.FINE_TUNE,
}
RiskAssessment Structure¶
@dataclass
class RiskAssessment:
    """Result of risk evaluation."""

    tier: RiskTier             # SAFE, REVIEW, or BLOCK
    severity: Severity         # S0-S4
    reasoning: str             # Why this tier was assigned
    requires_approval: bool    # Whether human approval needed
    auto_approve: bool         # Whether to auto-approve if SAFE
    details: dict[str, Any]    # Additional context
Using the RiskEvaluator¶
from tinman.core.risk_evaluator import (
RiskEvaluator, Action, ActionType, Severity
)
from tinman.config.modes import Mode
evaluator = RiskEvaluator(
detailed_mode=False, # Simple 3-tier model
auto_approve_safe=True, # Auto-approve SAFE actions
block_on_destructive=True, # Block destructive actions
)
# Create action to evaluate
action = Action(
action_type=ActionType.PROMPT_MUTATION,
target_surface="production",
payload={"prompt": "New system prompt"},
predicted_severity=Severity.S2,
estimated_cost=0.50,
is_reversible=True,
)
# Evaluate risk
assessment = evaluator.evaluate(action, Mode.PRODUCTION)
print(f"Tier: {assessment.tier}") # RiskTier.REVIEW
print(f"Severity: {assessment.severity}") # Severity.S2
print(f"Reasoning: {assessment.reasoning}") # "Action type prompt_mutation requires review in production"
Severity Computation¶
For detailed severity scoring:
severity = evaluator.compute_severity(
failure_class="GOAL_DRIFT",
reproducibility=0.8, # 80% reproducible
impact_scope=["chat", "api", "webhook"],
is_safety_related=False,
)
# Returns appropriate S0-S4 based on criteria
Approval Flow¶
Complete Flow Diagram¶
Agent Action Request
          │
          ▼
┌───────────────────┐
│  ApprovalHandler  │
│ request_approval()│
└─────────┬─────────┘
          │
          ▼
┌───────────────────┐
│   RiskEvaluator   │
│    evaluate()     │
└─────────┬─────────┘
          │
          ▼
   ┌─────────────┐
   │ Risk Tier?  │
   └──────┬──────┘
          │
   ┌──────┼──────┬──────────┐
   │      │      │          │
   ▼      ▼      ▼          ▼
 BLOCK   SAFE  REVIEW   LAB+safe?
   │      │      │          │
   ▼      ▼      │          ▼
Reject   Auto    │    Auto-approve
 (log) Approve   │        (lab)
   │      │      ▼          │
   │      │ ┌──────────┐    │
   │      │ │ Present  │    │
   │      │ │ to Human │    │
   │      │ └────┬─────┘    │
   │      │      │          │
   │      │   ┌──┴──┐       │
   │      │   ▼     ▼       │
   │      │ Approve Reject  │
   │      │   │     │       │
   └──────┴───┼─────┼───────┘
              │     │
              ▼     ▼
           Execute Abort
Flow Steps¶
- Agent Request - Agent calls `request_approval()` with action details
- Risk Evaluation - RiskEvaluator assesses the action
- Tier Dispatch - Different handling based on tier
- Human Presentation - For REVIEW, present to UI
- Decision - Human approves or rejects
- Execution - If approved, proceed; if rejected, abort
Request Lifecycle¶
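Each request is tracked as an `ApprovalContext` that moves through a small set of states (sketched from the `ApprovalStatus` values and timeout behavior described in this document; the exact enum lives in `tinman.core.approval_handler`):

PENDING ──▶ APPROVED   (auto-approved SAFE tier, or human/auto approval)
   │
   ├──────▶ REJECTED   (human rejection, auto-reject, or BLOCK tier)
   │
   └──────▶ TIMED_OUT  (no decision within timeout_seconds; treated as rejection)

Once a terminal state is reached, `decided_at`, `decided_by`, and `decision_reason` are filled in on the context.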
Approval Handler¶
The ApprovalHandler is the central coordination point for all HITL approvals.
Initialization¶
from tinman.core.approval_handler import ApprovalHandler, ApprovalMode
from tinman.config.modes import Mode
handler = ApprovalHandler(
mode=Mode.PRODUCTION,
approval_mode=ApprovalMode.INTERACTIVE,
auto_approve_in_lab=True,
cost_threshold_usd=5.0,
)
Constructor Parameters¶
| Parameter | Type | Default | Description |
|---|---|---|---|
| mode | Mode | LAB | Operating mode |
| approval_mode | ApprovalMode | INTERACTIVE | How approvals are handled |
| risk_evaluator | RiskEvaluator | None | Custom evaluator (created if None) |
| approval_gate | ApprovalGate | None | Custom gate (created if None) |
| event_bus | EventBus | None | For publishing events |
| auto_approve_in_lab | bool | True | Auto-approve REVIEW in LAB |
| cost_threshold_usd | float | 5.0 | Cost threshold for review |
Request Approval¶
approved = await handler.request_approval(
action_type=ActionType.PROMPT_MUTATION,
description="Inject safety prefix into system prompt",
details={"prefix": "Always be helpful..."},
estimated_cost_usd=0.50,
estimated_duration_ms=100,
affected_systems=["chat_api"],
is_reversible=True,
rollback_plan="Remove prefix from prompt",
requester_agent="intervention_engine",
predicted_severity=Severity.S2,
timeout_seconds=300,
)
if approved:
    # Proceed with action
    pass
else:
    # Abort or use fallback
    pass
Convenience Methods¶
# Approve experiment execution
approved = await handler.approve_experiment(
experiment_name="goal_drift_test",
hypothesis="System loses track of goals in long conversations",
estimated_runs=10,
estimated_cost_usd=2.50,
stress_type="CONTEXT_INJECTION",
)
# Approve intervention deployment
approved = await handler.approve_intervention(
intervention_type="PROMPT_MUTATION",
target_failure="goal_drift",
description="Add periodic goal reinforcement",
is_reversible=True,
rollback_plan="Remove injection logic",
estimated_effect=0.75,
)
# Approve simulation run
approved = await handler.approve_simulation(
failure_id="fail_001",
intervention_id="int_001",
trace_count=50,
estimated_cost_usd=1.00,
)
# Approve tool policy change
approved = await handler.approve_tool_policy_change(
tool_name="database_query",
change_description="Add rate limiting",
is_reversible=True,
)
Statistics¶
stats = handler.get_stats()
print(stats)
# {
# "total_requests": 100,
# "auto_approved": 75,
# "human_approved": 15,
# "human_rejected": 5,
# "auto_rejected": 0,
# "timed_out": 3,
# "blocked": 2,
# "pending_count": 1,
# "gate_stats": {...}
# }
Approval Modes¶
The ApprovalMode enum controls how the handler processes approvals:
INTERACTIVE¶
Description: Block and wait for human decision via UI.
Use When: TUI is running or CLI prompts are acceptable.
Behavior:
- Calls registered UI callback
- Blocks until decision or timeout
- Default behavior for PRODUCTION mode
ASYNC¶
Description: Non-blocking approval via callbacks.
Use When: Integrating with external approval systems.
Behavior:
- Returns immediately with pending status
- Callback invoked when decision made
- Good for Slack/email approval workflows
AUTO_APPROVE¶
Description: Automatically approve all requests.
Use When: Testing, development, or trusted environments.
Behavior:
- All REVIEW tier requests are approved
- BLOCK tier still rejected
- ⚠️ Dangerous in production!
AUTO_REJECT¶
Description: Automatically reject all requests.
Use When: Read-only mode or emergency lockdown.
Behavior:
- All REVIEW tier requests are rejected
- SAFE tier still approved
- BLOCK tier still rejected
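A small factory can pin these choices per environment (a hypothetical helper following the Mode Selection table under Best Practices; `make_handler` and `APPROVAL_MODE_BY_ENV` are not part of Tinman):

```python
from tinman.config.modes import Mode
from tinman.core.approval_handler import ApprovalHandler, ApprovalMode

# Illustrative environment → approval-mode map (see Mode Selection below).
APPROVAL_MODE_BY_ENV = {
    "dev": ApprovalMode.AUTO_APPROVE,     # local development
    "ci": ApprovalMode.AUTO_REJECT,       # CI/CD testing
    "staging": ApprovalMode.INTERACTIVE,
    "prod": ApprovalMode.INTERACTIVE,
}

def make_handler(env: str, mode: Mode) -> ApprovalHandler:
    """Build a handler; unknown environments fail safe to INTERACTIVE."""
    return ApprovalHandler(
        mode=mode,
        approval_mode=APPROVAL_MODE_BY_ENV.get(env, ApprovalMode.INTERACTIVE),
    )
```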
UI Integration¶
Registering UI Callback¶
The approval handler needs a UI callback to present approvals to humans:
import asyncio

from tinman.core.approval_handler import ApprovalContext

async def my_approval_ui(context: ApprovalContext) -> bool:
    """Custom approval UI implementation."""
    # Display approval request to user
    print(f"Approve: {context.action_description}?")
    # Get user decision; swap in your own UI here. A blocking stdin prompt
    # runs in a worker thread so the event loop stays responsive.
    raw = await asyncio.to_thread(input, "[y/N]: ")
    response = raw.strip().lower() == "y"
    # Record reason
    context.decision_reason = "User approved" if response else "User rejected"
    return response
# Register the callback
handler.register_ui(my_approval_ui)
ApprovalContext Structure¶
@dataclass
class ApprovalContext:
    """Full context for an approval request."""

    id: str                            # Unique request ID

    # What's being requested
    action_type: ActionType            # Type of action
    action_description: str            # Human-readable description
    action_details: dict[str, Any]     # Additional details

    # Risk assessment
    risk_assessment: RiskAssessment    # Full assessment
    risk_tier: RiskTier                # SAFE/REVIEW/BLOCK
    severity: Severity                 # S0-S4

    # Cost/impact estimates
    estimated_cost_usd: float          # Estimated cost
    estimated_duration_ms: int         # Estimated duration
    affected_systems: list[str]        # Systems affected

    # Rollback info
    is_reversible: bool                # Can be undone?
    rollback_plan: str                 # How to undo

    # Source
    requester_agent: str               # Which agent requested
    requester_session: str             # Session ID

    # Timing
    created_at: datetime               # When created
    timeout_seconds: int               # Approval timeout

    # Result (filled after decision)
    status: ApprovalStatus             # PENDING/APPROVED/REJECTED
    decided_at: datetime               # When decided
    decided_by: str                    # Who decided
    decision_reason: str               # Why decided
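For log lines or list views, it can help to collapse a context into one line. A hypothetical helper (`summarize` is not part of Tinman):

```python
def summarize(context: ApprovalContext) -> str:
    """One-line summary of an approval request for logs or list UIs."""
    return (
        f"[{context.risk_tier}/{context.severity}] "
        f"{context.action_type}: {context.action_description} "
        f"(~${context.estimated_cost_usd:.2f}, reversible={context.is_reversible})"
    )
```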
TUI Approval Dialog¶
The TUI provides a modal approval dialog:
┌────────────────────────────────────────────────────────────┐
│                     APPROVAL REQUIRED                      │
├────────────────────────────────────────────────────────────┤
│                                                            │
│  Action: Deploy intervention: Add goal reinforcement      │
│  Type: prompt_mutation                                     │
│  Risk: REVIEW (Severity: S2)                               │
│                                                            │
│  Estimated Cost: $0.50                                     │
│  Affected Systems: chat_api, webhook                       │
│                                                            │
│  Reasoning: Action type prompt_mutation requires review    │
│             in production                                  │
│                                                            │
│  Rollback Plan: Remove injection logic                     │
│                                                            │
├────────────────────────────────────────────────────────────┤
│            [Y] Approve        [N] Reject                   │
└────────────────────────────────────────────────────────────┘
CLI Approval Prompt¶
For CLI-based approval (when TUI is not available):
from tinman.core.approval_handler import cli_approval_callback
handler.register_fallback(cli_approval_callback)
Output:
============================================================
APPROVAL REQUIRED
============================================================
Action: Deploy intervention: Add goal reinforcement
Type: prompt_mutation
Risk: REVIEW (Severity: S2)
Estimated Cost: $0.50
Reasoning: Action type prompt_mutation requires review in production
Details: {'intervention_type': 'PROMPT_MUTATION', ...}
Rollback: Remove injection logic
------------------------------------------------------------
Approve? [y/N]: _
Custom Approval Workflows¶
External Approval System¶
Integrate with external approval systems (Slack, email, ticketing):
import asyncio

from tinman.core.approval_handler import ApprovalContext

class SlackApprovalHandler:
    """Approval via Slack.

    Message formatting and transport (_format_message, _send_to_slack)
    are left to the integrator.
    """

    def __init__(self, webhook_url: str, channel: str):
        self.webhook_url = webhook_url
        self.channel = channel
        self._pending: dict[str, asyncio.Future] = {}

    async def approval_callback(self, context: ApprovalContext) -> bool:
        """Send approval request to Slack and wait for response."""
        # Send to Slack
        message = self._format_message(context)
        await self._send_to_slack(message)

        # Wait for response (via webhook)
        future = asyncio.get_running_loop().create_future()
        self._pending[context.id] = future
        try:
            return await asyncio.wait_for(future, timeout=context.timeout_seconds)
        except asyncio.TimeoutError:
            # No decision in time: fail safe and reject
            return False

    def handle_slack_callback(self, request_id: str, approved: bool):
        """Called when user clicks approve/reject in Slack."""
        if request_id in self._pending:
            self._pending[request_id].set_result(approved)

# Usage
slack_handler = SlackApprovalHandler(
    webhook_url="https://hooks.slack.com/...",
    channel="#approvals",
)
handler.register_ui(slack_handler.approval_callback)
Multi-Approver Workflow¶
Require multiple approvals for high-severity actions:
from tinman.core.risk_evaluator import Severity

class MultiApproverHandler:
    """Require multiple approvers for critical decisions."""

    def __init__(self, required_approvals: int = 2):
        self.required_approvals = required_approvals

    async def approval_callback(self, context: ApprovalContext) -> bool:
        # High severity needs one extra approver
        required = self.required_approvals
        if context.severity >= Severity.S3:
            required += 1

        approvals = 0
        for i in range(required):
            response = await self._get_approval(context, approver_num=i + 1)
            if not response:
                # Early exit on first rejection
                context.decision_reason = f"Rejected by approver {i + 1}"
                return False
            approvals += 1

        context.decision_reason = f"Approved by {approvals} approvers"
        return True
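As with the Slack integration, the callback plugs into the handler; `_get_approval` (how each approver is actually asked) is left to the integrator:

```python
multi = MultiApproverHandler(required_approvals=2)
handler.register_ui(multi.approval_callback)
```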
Audit Trail Enhancement¶
Add enhanced audit logging:
import json
from datetime import datetime, timezone

class AuditingApprovalHandler(ApprovalHandler):
    """Approval handler with enhanced audit logging."""

    def __init__(self, audit_log_path: str, **kwargs):
        super().__init__(**kwargs)
        self.audit_log_path = audit_log_path

    async def request_approval(self, **kwargs) -> bool:
        # Log request
        self._log_audit_event("REQUEST", kwargs)
        # Get decision
        result = await super().request_approval(**kwargs)
        # Log decision
        self._log_audit_event("APPROVED" if result else "REJECTED", kwargs)
        return result

    def _log_audit_event(self, event_type: str, data: dict):
        entry = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "event_type": event_type,
            "data": data,
        }
        with open(self.audit_log_path, "a") as f:
            f.write(json.dumps(entry) + "\n")
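Because it subclasses `ApprovalHandler`, it is a drop-in replacement (the constructor arguments below are illustrative):

```python
audit_handler = AuditingApprovalHandler(
    audit_log_path="approvals.jsonl",
    mode=Mode.PRODUCTION,
    approval_mode=ApprovalMode.INTERACTIVE,
)
```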
Best Practices¶
Approval Request Design¶
- Clear Descriptions - Make `action_description` understandable by non-engineers
- Include Rollback - Always provide a `rollback_plan` for reversible actions
- Accurate Estimates - Provide realistic cost and duration estimates
- Appropriate Severity - Don't over- or under-estimate severity
# Good
await handler.request_approval(
action_type=ActionType.PROMPT_MUTATION,
description="Add safety prefix to prevent harmful outputs in chat",
details={
"prefix": "You are a helpful assistant...",
"target": "chat_system_prompt",
},
estimated_cost_usd=0.00, # No direct cost
is_reversible=True,
rollback_plan="Remove prefix via config update",
predicted_severity=Severity.S2,
)
# Bad
await handler.request_approval(
action_type=ActionType.PROMPT_MUTATION,
description="Change prompt", # Too vague
# Missing details, estimates, rollback plan
)
Mode Selection¶
| Scenario | Recommended Mode |
|---|---|
| Local development | AUTO_APPROVE |
| CI/CD testing | AUTO_REJECT |
| Staging environment | INTERACTIVE |
| Production | INTERACTIVE |
| External integration | ASYNC |
Timeout Configuration¶
approval:
  timeout_seconds: 300  # 5 minutes default
  # Per-severity timeouts
  timeouts:
    S0: 60    # 1 minute
    S1: 120   # 2 minutes
    S2: 300   # 5 minutes
    S3: 600   # 10 minutes
    S4: 1800  # 30 minutes (though usually blocked)
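If this YAML is loaded into a plain dict (for example with `yaml.safe_load`), per-severity resolution is a small lookup with the default as fallback; a sketch under that assumption (`timeout_for` is a hypothetical helper):

```python
def timeout_for(config: dict, severity: str) -> int:
    """Resolve a per-severity timeout, falling back to the default."""
    approval = config["approval"]
    return approval.get("timeouts", {}).get(severity, approval["timeout_seconds"])
```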
Handling Rejections¶
approved = await handler.request_approval(...)

if not approved:
    # Log the rejection
    logger.warning(f"Action rejected: {description}")
    # Check if we have a fallback
    if has_fallback_action:
        await execute_fallback()
    else:
        # Inform the orchestrator
        raise ApprovalRejectedError(f"Action rejected: {description}")
Testing Approval Flows¶
import pytest
from unittest.mock import AsyncMock

from tinman.config.modes import Mode
from tinman.core.approval_handler import ApprovalHandler, ApprovalMode
from tinman.core.risk_evaluator import ActionType, Severity

@pytest.fixture
def approval_handler():
    handler = ApprovalHandler(
        mode=Mode.PRODUCTION,
        approval_mode=ApprovalMode.INTERACTIVE,
    )
    return handler

# The async tests below assume the pytest-asyncio plugin.
@pytest.mark.asyncio
async def test_safe_action_auto_approved(approval_handler):
    """SAFE tier actions should auto-approve."""
    result = await approval_handler.request_approval(
        action_type=ActionType.CONFIG_CHANGE,
        description="Minor config update",
        predicted_severity=Severity.S0,
    )
    assert result is True

@pytest.mark.asyncio
async def test_review_action_needs_human(approval_handler):
    """REVIEW tier actions should call the registered UI."""
    mock_ui = AsyncMock(return_value=True)
    approval_handler.register_ui(mock_ui)

    result = await approval_handler.request_approval(
        action_type=ActionType.PROMPT_MUTATION,
        description="Modify system prompt",
        predicted_severity=Severity.S2,
    )
    assert mock_ui.called
    assert result is True

@pytest.mark.asyncio
async def test_blocked_action_rejected(approval_handler):
    """BLOCK tier actions should be rejected."""
    result = await approval_handler.request_approval(
        action_type=ActionType.DESTRUCTIVE_TOOL_CALL,
        description="Delete database",
        predicted_severity=Severity.S4,
    )
    assert result is False
Summary¶
The HITL approval system provides:
- Three Risk Tiers: SAFE (auto), REVIEW (human), BLOCK (never)
- Severity-Based Gating: S0-S4 severity drives approval requirements
- Mode-Aware: Different behavior per operating mode
- Pluggable UI: TUI, CLI, or custom approval interfaces
- Audit Trail: Complete logging of all decisions
- Extensible: Custom approval workflows supported
Use this system to maintain appropriate human oversight while allowing Tinman to operate autonomously where safe.
Next Steps¶
- MODES.md - How modes affect approval requirements
- TAXONOMY.md - Severity level definitions
- CONFIGURATION.md - Approval configuration options