
Operating Modes

Tinman operates in three distinct modes that control safety boundaries, approval requirements, and available capabilities. This document provides a complete reference for each mode and the transitions between them.


Table of Contents

  1. Overview
  2. Mode Comparison
  3. LAB Mode
  4. SHADOW Mode
  5. PRODUCTION Mode
  6. Mode Transitions
  7. Configuration
  8. Best Practices

Overview

Operating modes are the primary safety mechanism in Tinman. They define:

  • What experiments can be run - Destructive tests, stress tests, etc.
  • What approval is required - Automatic vs. human-in-the-loop
  • What data can be accessed - Synthetic vs. shadow vs. live
  • What interventions can be deployed - Test vs. staging vs. production

The Progressive Model

Modes follow a progressive deployment model:

LAB ──────────▶ SHADOW ──────────▶ PRODUCTION
 │                 │                    │
 │                 │                    │
 ▼                 ▼                    ▼
Development     Validation          Deployment
Environment     Environment         Environment

This progression ensures:

  1. Hypotheses are validated in LAB before testing against real patterns
  2. Interventions are proven in SHADOW before deployment
  3. Production impact is minimized through human oversight

Mode Properties

Each mode has three key properties:

Property                      LAB   SHADOW   PRODUCTION
allows_destructive_testing    Yes   No       No
requires_approval_gate        No    No       Yes
is_autonomous                 Yes   Yes      No
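
These properties suggest a simple enum-backed model. The sketch below is a hypothetical illustration of how such a mode type could be expressed, assuming only the three properties in the table above; the actual Mode class in tinman.config.modes may be implemented differently.

from enum import Enum

class OperatingMode(Enum):
    """Illustrative sketch only - not Tinman's actual Mode class."""
    LAB = "lab"
    SHADOW = "shadow"
    PRODUCTION = "production"

    @property
    def allows_destructive_testing(self) -> bool:
        # Only LAB may run destructive experiments
        return self is OperatingMode.LAB

    @property
    def requires_approval_gate(self) -> bool:
        # Only PRODUCTION forces human-in-the-loop approval
        return self is OperatingMode.PRODUCTION

    @property
    def is_autonomous(self) -> bool:
        # LAB and SHADOW run autonomously; PRODUCTION is supervised
        return self is not OperatingMode.PRODUCTION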

Mode Comparison

Complete Behavior Matrix

Behavior                      LAB           SHADOW        PRODUCTION

Experimentation
  Run stress tests            ✅ Auto       ✅ Auto       ⚠️ Review
  Run destructive tests       ✅ Auto       ❌ Blocked    ❌ Blocked
  Generate hypotheses         ✅ Auto       ✅ Auto       ⚠️ Review

Data Access
  Use synthetic data
  Mirror production traffic
  Access live traffic

Interventions
  Test in isolation           ✅ Auto       ✅ Auto       ⚠️ Review
  Deploy to staging           ⚠️ Review     ✅ Auto       ⚠️ Review
  Deploy to production        ❌ Blocked    ❌ Blocked    ⚠️ Review

Approval
  S0-S1 findings              Auto          Auto          Auto
  S2 findings                 Auto          Auto          Review
  S3 findings                 Review        Review        Review
  S4 findings                 Review        Blocked       Blocked

Autonomy
  Research cycles             Autonomous    Autonomous    Supervised
  Memory updates              Auto          Auto          Review
  Report generation           Auto          Auto          Auto

Legend:
  • ✅ Auto = Proceeds automatically
  • ⚠️ Review = Requires human approval
  • ❌ Blocked = Not allowed in this mode


LAB Mode

Purpose: Unrestricted research and experimentation in isolated environments.

Characteristics

Mode.LAB.allows_destructive_testing  # True
Mode.LAB.requires_approval_gate      # False
Mode.LAB.is_autonomous               # True

When to Use

  • Initial hypothesis exploration
  • Testing new experiment designs
  • Validating detection algorithms
  • Development and debugging
  • Stress testing intervention logic

Capabilities

Experiments:
  • All experiment types allowed
  • Destructive stress tests permitted
  • No cost limits by default (configurable)
  • Maximum parallelism

Interventions:
  • All intervention types can be tested
  • Simulation against synthetic traces
  • No deployment to real systems

Approval:
  • Most actions auto-approved
  • Only S3+ findings require review
  • Cost-based approval for expensive experiments (sketched below)
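
The cost-based approval mentioned above can be pictured as one extra condition in an otherwise permissive LAB policy. A minimal sketch, assuming an integer S0-S4 severity scale and a cost limit like the cost_limit_usd value in the configuration example below; this is illustrative, not Tinman's actual approval logic.

def lab_approval(severity: int, estimated_cost_usd: float,
                 cost_limit_usd: float = 50.0) -> str:
    """Hypothetical LAB policy: auto-approve everything except S3+ findings
    and experiments whose estimated cost exceeds the configured limit."""
    if severity >= 3 or estimated_cost_usd > cost_limit_usd:
        return "review"
    return "auto_approve"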

Restrictions

  • No access to production data
  • No access to shadow traffic
  • Cannot deploy interventions externally
  • Results are for internal use only

Example Configuration

mode: lab

experiments:
  max_parallel: 10
  default_timeout_seconds: 600
  cost_limit_usd: 50.0  # Higher limit for exploration
  allow_destructive: true

risk:
  auto_approve_safe: true
  auto_approve_review_in_lab: true  # Lab-specific setting

Example Usage

from tinman import create_tinman
from tinman.config.modes import Mode

async def lab_research():
    tinman = await create_tinman(
        mode=Mode.LAB,
        db_url="postgresql://localhost/tinman_lab"
    )

    # Run aggressive research cycle
    results = await tinman.research_cycle(
        focus="edge case failures",
        max_hypotheses=20,  # Explore broadly
        max_experiments=10,
        allow_destructive=True
    )

    # All findings are internal - can be aggressive
    print(f"Discovered {len(results.failures)} failures")

SHADOW Mode

Purpose: Observe production patterns without affecting users.

Characteristics

Mode.SHADOW.allows_destructive_testing  # False
Mode.SHADOW.requires_approval_gate      # False
Mode.SHADOW.is_autonomous               # True

When to Use

  • Validating LAB findings against real patterns
  • Discovering production-specific failures
  • Testing detection on real traffic
  • Calibrating sensitivity thresholds
  • Building confidence before PRODUCTION

Capabilities

Experiments:
  • Non-destructive tests only
  • Mirror of production traffic
  • Real pattern analysis
  • No synthetic data mixing

Interventions:
  • Counterfactual simulation against real traces (see the sketch after this list)
  • Effectiveness estimation
  • No actual deployment

Approval:
  • Most actions auto-approved
  • S3+ findings require review
  • Cannot modify production behavior
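
Counterfactual simulation and effectiveness estimation can be thought of as replaying mirrored traces with and without a candidate intervention and measuring how many observed failures would have been prevented. The sketch below uses invented callables rather than Tinman's real trace or intervention types, so treat it purely as an illustration of the idea.

from typing import Callable, Sequence

def estimate_effectiveness(
    traces: Sequence[dict],
    detects_failure: Callable[[dict], bool],
    apply_intervention: Callable[[dict], dict],
) -> float:
    """Hypothetical counterfactual estimate: the fraction of observed failures
    that the intervention would have prevented, computed entirely offline."""
    failures = [trace for trace in traces if detects_failure(trace)]
    if not failures:
        return 0.0
    prevented = sum(
        1 for trace in failures if not detects_failure(apply_intervention(trace))
    )
    return prevented / len(failures)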

Restrictions

  • No destructive experiments
  • No intervention deployment
  • Read-only access to traffic
  • Cannot affect user experience

Traffic Mirroring

Shadow mode mirrors production traffic for analysis:

Production Traffic ─────────────────────────▶ Users
        │
        │ (mirror)
        ▼
   Shadow System ──▶ Analysis ──▶ Findings
        └── No response sent to users
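
One way to picture the mirroring decision is a per-request sampler driven by the traffic_mirror_percent and sampling_strategy settings shown in the configuration below. This is a simplified, assumed implementation; the actual mirroring mechanism is outside the scope of this document.

import random

def should_mirror(request: dict, mirror_percent: float = 10.0,
                  strategy: str = "random") -> bool:
    """Hypothetical shadow sampler: decide whether to copy a production request
    into the shadow system. The copy never influences the user-facing response."""
    if strategy == "error_biased" and request.get("is_error"):
        return True  # bias toward failed requests
    if strategy == "latency_biased" and request.get("latency_ms", 0) > 2000:
        return True  # bias toward unusually slow requests
    return random.random() * 100 < mirror_percent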

Example Configuration

mode: shadow

experiments:
  max_parallel: 5
  default_timeout_seconds: 300
  cost_limit_usd: 20.0
  allow_destructive: false  # Always false in shadow

shadow:
  traffic_mirror_percent: 10  # Mirror 10% of production
  sampling_strategy: random   # or: error_biased, latency_biased
  retention_hours: 24

risk:
  auto_approve_safe: true
  review_s3_plus: true

Example Usage

from tinman import create_tinman
from tinman.config.modes import Mode

async def shadow_validation():
    tinman = await create_tinman(
        mode=Mode.SHADOW,
        db_url="postgresql://localhost/tinman_shadow"
    )

    # Validate LAB findings against real patterns
    results = await tinman.research_cycle(
        focus="goal drift",  # Specific focus from LAB
        max_hypotheses=5,
        max_experiments=3
    )

    # Check if LAB findings hold in production patterns
    for failure in results.failures:
        print(f"Confirmed in production: {failure.failure_class}")

PRODUCTION Mode

Purpose: Active protection with human oversight.

Characteristics

Mode.PRODUCTION.allows_destructive_testing  # False
Mode.PRODUCTION.requires_approval_gate      # True
Mode.PRODUCTION.is_autonomous               # False

When to Use

  • Deploying validated interventions
  • Active failure detection
  • Real-time protection
  • Compliance and audit requirements

Capabilities

Experiments:
  • Careful, approved experiments only
  • Focus on monitoring, not exploration
  • Minimal-impact testing

Interventions:
  • Approved interventions can be deployed
  • Full rollback support
  • Audit trail required

Approval:
  • Human approval for all significant actions
  • S4 findings always blocked until review
  • Change management integration

Restrictions

  • No destructive experiments
  • No speculative research
  • All interventions require approval
  • Strict audit requirements

Approval Flow in Production

Action Request
     │
     ▼
Risk Evaluation
     ├── SAFE (S0-S1) ──▶ Auto-approve ──▶ Execute
     ├── REVIEW (S2-S3) ──▶ Human Review ──┬──▶ Approved ──▶ Execute
     │                                      │
     │                                      └──▶ Rejected ──▶ Log & Skip
     └── BLOCK (S4) ──▶ Blocked ──▶ Log & Alert
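
The same routing can be written as a small decision function. The sketch below mirrors the flow diagram, assuming findings carry an integer S0-S4 severity; Tinman's real risk evaluator may be more granular than this.

from enum import Enum

class Decision(Enum):
    AUTO_APPROVE = "auto_approve"
    HUMAN_REVIEW = "human_review"
    BLOCK = "block"

def production_gate(severity: int) -> Decision:
    """Hypothetical PRODUCTION gate matching the diagram: S0-S1 auto-approve,
    S2-S3 go to human review, S4 is blocked (and should be logged and alerted)."""
    if severity <= 1:
        return Decision.AUTO_APPROVE
    if severity <= 3:
        return Decision.HUMAN_REVIEW
    return Decision.BLOCK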

Example Configuration

mode: production

experiments:
  max_parallel: 2
  default_timeout_seconds: 60
  cost_limit_usd: 5.0
  allow_destructive: false

risk:
  auto_approve_safe: true
  detailed_mode: true  # More granular risk assessment

approval:
  mode: interactive  # TUI approval dialog
  timeout_seconds: 300
  require_comment: true  # Approver must explain decision
  notify_on_block: true

audit:
  enabled: true
  log_all_decisions: true
  retention_days: 90

Example Usage

from tinman import create_tinman
from tinman.config.modes import Mode

async def production_protection():
    tinman = await create_tinman(
        mode=Mode.PRODUCTION,
        db_url="postgresql://localhost/tinman_prod"
    )

    # Run focused research with approval gates
    results = await tinman.research_cycle(
        focus="known failure patterns",  # Focus on validated patterns
        max_hypotheses=3,
        max_experiments=2
    )

    # Deploy approved interventions
    for intervention in results.interventions:
        if intervention.status == "approved":
            await tinman.deploy_intervention(intervention.id)
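
Because PRODUCTION guarantees full rollback support, a deployment loop is usually paired with monitoring and an explicit rollback path. The sketch below extends the example above with hypothetical monitor_intervention and rollback_intervention helpers; those names and the threshold are assumptions, not part of the documented API.

async def deploy_with_rollback(tinman, intervention_id: str) -> None:
    """Sketch only: deploy an approved intervention, then roll it back if it
    underperforms. monitor_intervention / rollback_intervention are assumed names."""
    await tinman.deploy_intervention(intervention_id)
    effectiveness = await tinman.monitor_intervention(intervention_id)  # hypothetical
    if effectiveness < 0.5:  # illustrative threshold, not a documented default
        await tinman.rollback_intervention(intervention_id)  # hypothetical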

Mode Transitions

Allowed Transitions

   LAB ────────────▶ SHADOW ────────────▶ PRODUCTION
    ▲                  │  ▲                   │
    │    (fallback)    │  │   (regression)    │
    └──────────────────┘  └───────────────────┘

From          To            Allowed   Use Case
LAB           SHADOW        ✅        Promote validated hypotheses
SHADOW        PRODUCTION    ✅        Deploy proven interventions
SHADOW        LAB           ✅        Return for more exploration
PRODUCTION    SHADOW        ✅        Regression fallback
LAB           PRODUCTION    ❌        Not allowed - must validate
PRODUCTION    LAB           ❌        Not allowed - go through SHADOW

Transition Requirements

LAB → SHADOW:
  • At least one successful research cycle in LAB
  • No critical errors in LAB runs
  • Database migration completed (if the schema changed)

SHADOW → PRODUCTION:
  • Validated findings in SHADOW mode
  • Intervention effectiveness above the configured threshold
  • No unaddressed S4 findings
  • Approval from authorized personnel

PRODUCTION → SHADOW:
  • Allowed at any time (regression fallback)
  • Recommended after incidents
  • Audit log of the reason required
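
These requirements can also be checked programmatically before attempting a transition. A sketch under assumed attribute names (severity, addressed, effectiveness) that may not match Tinman's actual data model:

def ready_for_production(findings, interventions,
                         effectiveness_threshold: float = 0.8) -> bool:
    """Hypothetical SHADOW -> PRODUCTION readiness check: no unaddressed S4
    findings, and every candidate intervention beats the effectiveness threshold."""
    if any(f.severity >= 4 and not f.addressed for f in findings):
        return False
    return all(i.effectiveness >= effectiveness_threshold for i in interventions)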

Code Example

from tinman.config.modes import Mode

# Check if transition is allowed
can_promote = Mode.can_transition(Mode.LAB, Mode.SHADOW)
print(f"LAB → SHADOW allowed: {can_promote}")  # True

can_skip = Mode.can_transition(Mode.LAB, Mode.PRODUCTION)
print(f"LAB → PRODUCTION allowed: {can_skip}")  # False

# Programmatic transition
async def promote_to_shadow(tinman):
    if Mode.can_transition(tinman.mode, Mode.SHADOW):
        await tinman.transition_mode(Mode.SHADOW)
    else:
        raise ValueError(f"Cannot transition from {tinman.mode} to SHADOW")

Configuration

Setting the Mode

Via Configuration File:

# .tinman/config.yaml
mode: lab  # or: shadow, production

Via Environment Variable:

export TINMAN_MODE=shadow

Via Python API:

from tinman import create_tinman
from tinman.config.modes import Mode

tinman = await create_tinman(mode=Mode.SHADOW)

Via CLI:

# Override config file
tinman --mode shadow research

# Or
tinman research --mode production

Mode-Specific Configuration

Each mode can have specific configuration overrides:

mode: shadow

# Global defaults
experiments:
  max_parallel: 5
  cost_limit_usd: 20.0

# Mode-specific overrides
modes:
  lab:
    experiments:
      max_parallel: 10
      cost_limit_usd: 100.0
      allow_destructive: true

  shadow:
    experiments:
      max_parallel: 5
      cost_limit_usd: 20.0
    shadow:
      traffic_mirror_percent: 10

  production:
    experiments:
      max_parallel: 2
      cost_limit_usd: 5.0
    approval:
      require_comment: true
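
One way to think about these overrides is a deep merge of the mode-specific section onto the global defaults. The helper below is an assumed sketch of that merge; the precedence rules Tinman actually applies are not spelled out in this document.

def effective_config(config: dict) -> dict:
    """Hypothetical merge: layer the overrides under modes.<active mode>
    on top of the global defaults in the same file."""
    def deep_merge(base: dict, override: dict) -> dict:
        merged = dict(base)
        for key, value in override.items():
            if isinstance(value, dict) and isinstance(merged.get(key), dict):
                merged[key] = deep_merge(merged[key], value)
            else:
                merged[key] = value
        return merged

    overrides = config.get("modes", {}).get(config.get("mode", ""), {})
    return deep_merge(config, overrides)

For the YAML above with mode: shadow, this sketch would reaffirm the global experiment limits and pull in the traffic_mirror_percent setting for the active run.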

Best Practices

Development Workflow

  1. Start in LAB mode
       • Explore broadly
       • Test aggressive hypotheses
       • Iterate quickly

  2. Validate in SHADOW mode
       • Test against real patterns
       • Calibrate detection thresholds
       • Build confidence

  3. Deploy in PRODUCTION mode
       • Human oversight
       • Careful rollout
       • Continuous monitoring

Mode-Specific Tips

LAB Mode:
  • Use synthetic data that represents edge cases
  • Don't be afraid to test destructive scenarios
  • Document all findings for SHADOW validation
  • Set high cost limits for exploration

SHADOW Mode:
  • Monitor for LAB findings that don't reproduce
  • Pay attention to patterns unique to production
  • Use findings to refine detection sensitivity
  • Prepare intervention validation data

PRODUCTION Mode:
  • Start with conservative interventions
  • Have rollback plans ready
  • Monitor intervention effectiveness
  • Maintain audit trails

Anti-Patterns to Avoid

Anti-Pattern                       Why It's Bad                         Better Approach
Skipping SHADOW                    Unvalidated interventions may fail   Always validate in SHADOW
LAB with production data           Privacy risk, contamination          Use synthetic or anonymized data
Auto-approve in PRODUCTION         Bypasses safety gates                Use proper approval workflow
Permanent SHADOW                   Findings are never deployed          Progress to PRODUCTION when validated
Running all modes simultaneously   Confusion, data mixing               One mode per environment

Environment Isolation

┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│   LAB ENV       │     │  SHADOW ENV     │     │  PROD ENV       │
│                 │     │                 │     │                 │
│ - Synthetic DB  │     │ - Mirror DB     │     │ - Prod DB       │
│ - Test models   │     │ - Prod model    │     │ - Prod model    │
│ - No traffic    │     │ - Shadow traffic│     │ - Live traffic  │
│ - Full autonomy │     │ - Read-only     │     │ - HITL approval │
└─────────────────┘     └─────────────────┘     └─────────────────┘
         │                       │                       │
         └───────────────────────┴───────────────────────┘
                    Findings flow forward
                    Regressions flow backward

Summary

Mode         Purpose       Autonomy     Destructive   Approval
LAB          Exploration   Full         Allowed       Minimal
SHADOW       Validation    Full         Blocked       S3+ review
PRODUCTION   Protection    Supervised   Blocked       Required

The mode system ensures:

  • Progressive validation before production impact
  • Appropriate oversight at each stage
  • Clear boundaries for safe operation
  • Audit trail for compliance


Next Steps