Quickstart Guide¶
Get Tinman running and discover your first failure in under 5 minutes.
Prerequisites¶
- Python 3.10 or higher
- SQLite (default, no setup) or PostgreSQL for production
- An LLM API key (see options below)
Model Provider Options¶
| Provider | Cost | Setup |
|---|---|---|
| Ollama | Free (local) | ollama pull llama3.1 |
| Groq | Free tier (14,400 requests/day) | Get key at groq.com |
| OpenRouter | Many free models | Get key at openrouter.ai |
| Together | $25 free credits | Get key at together.xyz |
| OpenAI | Paid | Get key at openai.com |
| Anthropic | Paid | Get key at anthropic.com |
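If you go the Ollama route, confirm the model is pulled and the local server is reachable before pointing Tinman at it; Ollama's HTTP API listens on port 11434 by default:

```bash
ollama pull llama3.1
ollama serve &                            # skip if Ollama is already running
curl -s http://localhost:11434/api/tags   # lists locally available models
```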
Installation¶
Step 1: Install Tinman¶
# Basic installation
pip install AgentTinman
# With OpenAI support (quotes stop zsh from globbing the brackets)
pip install "AgentTinman[openai]"
# With Anthropic support
pip install "AgentTinman[anthropic]"
# With all providers
pip install "AgentTinman[all]"
# Development installation (from source)
git clone https://github.com/oliveskin/agent_tinman.git
cd agent_tinman
pip install -e ".[dev]"
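To verify the install: pip show works everywhere; tinman --help assumes the CLI follows the usual help-flag convention, which this guide doesn't state outright:

```bash
pip show AgentTinman   # prints version and install location
tinman --help          # should list the subcommands used below
```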
Step 2: Set Up Database¶
Tinman defaults to SQLite (sqlite:///tinman.db) for quick starts. If you want PostgreSQL, set database.url in .tinman/config.yaml and create the role/database first.
Option A: PostgreSQL (Recommended for production)
# Create database (PostgreSQL)
createdb tinman
# Tinman will create tables automatically on first run
Option B: SQLite (Quick testing)
# No setup needed - Tinman will create the file
# Just use: sqlite:///tinman.db as your database URL
Step 3: Set Environment Variables¶
# Pick ONE of these based on your provider:
# For Ollama (local, free) - no key needed!
# Just make sure Ollama is running: ollama serve
# For Groq (fast, free tier)
export GROQ_API_KEY="gsk_..."
# For OpenRouter (many free models)
export OPENROUTER_API_KEY="sk-or-..."
# For Together ($25 free credits)
export TOGETHER_API_KEY="..."
# For OpenAI (paid)
export OPENAI_API_KEY="sk-..."
# For Anthropic (paid)
export ANTHROPIC_API_KEY="sk-ant-..."
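A quick way to confirm the key is actually exported in your current shell (OPENAI_API_KEY shown; substitute whichever variable you set):

```bash
test -n "$OPENAI_API_KEY" && echo "OPENAI_API_KEY is set" || echo "OPENAI_API_KEY is missing"
```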
Initialize Your Project¶
Step 4: Create Configuration¶
Create .tinman/config.yaml with the following contents:
mode: lab

database:
  url: sqlite:///tinman.db  # Default (no setup required)
  # For PostgreSQL: url: postgresql://localhost:5432/tinman

models:
  default: openai
  providers:
    openai:
      api_key: ${OPENAI_API_KEY}
      model: gpt-4-turbo-preview
      temperature: 0.7

research:
  max_hypotheses_per_run: 5
  max_experiments_per_hypothesis: 2
  default_runs_per_experiment: 3

experiments:
  max_parallel: 3
  default_timeout_seconds: 120
  cost_limit_usd: 5.0
Edit the configuration to match your setup.
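Before moving on, it's worth checking that the file still parses. A minimal sketch, assuming PyYAML is importable in your environment (it often arrives as a transitive dependency; if not, pip install pyyaml):

```bash
python -c "import yaml; yaml.safe_load(open('.tinman/config.yaml')); print('config OK')"
```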
Run Your First Research Cycle¶
Option 1: Command Line¶
# Run a basic research cycle
tinman research
# With a specific focus area
tinman research --focus "reasoning errors"
# With custom parameters
tinman research --focus "tool use" --hypotheses 3 --experiments 2
Option 2: Interactive TUI¶
Launch it with tinman tui (listed under Common Commands below); this brings up the retro terminal interface:
████████╗██╗███╗ ██╗███╗ ███╗ █████╗ ███╗ ██╗
╚══██╔══╝██║████╗ ██║████╗ ████║██╔══██╗████╗ ██║
██║ ██║██╔██╗ ██║██╔████╔██║███████║██╔██╗ ██║
██║ ██║██║╚██╗██║██║╚██╔╝██║██╔══██║██║╚██╗██║
██║ ██║██║ ╚████║██║ ╚═╝ ██║██║ ██║██║ ╚████║
╚═╝ ╚═╝╚═╝ ╚═══╝╚═╝ ╚═╝╚═╝ ╚═╝╚═╝ ╚═══╝
[F1] Research [F2] Hypotheses [F3] Failures [F4] Intervene [F5] Discuss
═══ RESEARCH CONTROL ═══
[▶ Start Research] [⏸ Pause] [⏹ Stop]
Focus Area: [tool use failures_______________]
─── Activity Log ───
Ready. Press [F1] or click 'Start Research' to begin.
Hypotheses: 0 │ Experiments: 0 │ Failures: 0
Controls:
- F1-F5: Navigate tabs
- F10 or Ctrl+C: Exit
- Y/N: Approve/Reject in approval dialogs
Option 3: Python API¶
import asyncio

from tinman import create_tinman
from tinman.config.modes import Mode


async def main():
    # Create Tinman instance
    tinman = await create_tinman(
        mode=Mode.LAB,
        db_url="postgresql://localhost/tinman"
    )

    # Run a research cycle
    results = await tinman.research_cycle(
        focus="reasoning errors in multi-step tasks",
        max_hypotheses=5,
        max_experiments=3
    )

    # Print summary
    print(f"\n{'='*50}")
    print("RESEARCH CYCLE COMPLETE")
    print(f"{'='*50}")
    print(f"Hypotheses generated: {len(results.hypotheses)}")
    print(f"Experiments run: {len(results.experiments)}")
    print(f"Failures discovered: {len(results.failures)}")
    print(f"Interventions proposed: {len(results.interventions)}")

    # Show discovered failures
    if results.failures:
        print(f"\n{'─'*50}")
        print("DISCOVERED FAILURES:")
        for f in results.failures:
            print(f"\n  [{f.severity.value}] {f.failure_class.value}")
            print(f"  Description: {f.description}")
            print(f"  Root cause: {f.root_cause}")

    # Generate report
    report = await tinman.generate_report(format="markdown")
    print(f"\n{'─'*50}")
    print("FULL REPORT:")
    print(report)

    await tinman.close()


if __name__ == "__main__":
    asyncio.run(main())
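Save the script as, say, quickstart.py and run it with python quickstart.py. Note that this example points db_url at PostgreSQL; if you kept the SQLite default from Step 2, pass db_url="sqlite:///tinman.db" instead.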
Understanding the Output¶
Hypotheses¶
Hypothesis: HYP-001
Target: reasoning
Expected Failure: goal_drift
Confidence: 0.7
Rationale: System may lose track of original objectives in long conversations
Hypotheses are educated guesses about what might fail. Higher confidence means more evidence supports the hypothesis.
Experiments¶
Experiment: EXP-001
Testing: HYP-001
Stress Type: CONTEXT_INJECTION
Parameters: {"distractor_count": 3, "injection_point": "mid"}
Runs: 5
Experiments are designed to confirm or refute hypotheses. Multiple runs per experiment show whether a failure reproduces consistently rather than appearing once by chance.
Failures¶
Failure: FAIL-001
Class: REASONING
Subtype: GOAL_DRIFT
Severity: S2 (Business Risk)
Reproducibility: 4/5 runs
Description: Model abandoned original task after distractor injection
Root Cause: Attention mechanism prioritizes recent context over instructions
Failures are classified using the failure taxonomy (see TAXONOMY.md). Severity (S0-S4) indicates impact.
Interventions¶
Intervention: INT-001
Target: FAIL-001
Type: PROMPT_MUTATION
Description: Add periodic instruction reinforcement
Estimated Effectiveness: 75%
Risk Tier: REVIEW
Rollback Plan: Remove injection logic
Interventions are proposed fixes. They require approval before deployment.
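Interventions at the REVIEW risk tier wait in an approval queue until a human signs off. If you're running the HTTP service (see Service Mode below), you can list what's pending:

```bash
curl http://localhost:8000/approvals/pending
```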
Common Commands¶
Research¶
# Basic research
tinman research
# Focused research
tinman research --focus "tool use"
tinman research --focus "long context handling"
tinman research --focus "feedback loops"
# Control parameters
tinman research --hypotheses 10 --experiments 5 --runs 10
# Different mode
tinman --mode shadow research
Reporting¶
# Generate markdown report
tinman report --format markdown
# Generate JSON report
tinman report --format json
# Exclude synthetic demo failures
tinman report --format markdown --exclude-demo-failures
# Save to file
tinman report --format markdown > report.md
# Reset the local SQLite demo database
tinman demo-reset-db
Status¶
# Show current status
tinman status
# Show failure summary
tinman failures
# Show intervention status
tinman interventions
Interactive¶
# Launch TUI
tinman tui
# Chat with Tinman
tinman discuss "What's the most critical failure?"
tinman discuss "How should I prioritize fixes?"
Service Mode¶
# Start HTTP service
tinman serve --host 0.0.0.0 --port 8000
# Or with specific mode
TINMAN_MODE=shadow tinman serve
# Health check
curl http://localhost:8000/health
# Run research cycle via API
curl -X POST http://localhost:8000/research/cycle \
-H "Content-Type: application/json" \
-d '{"focus": "tool_use", "max_hypotheses": 5}'
# Get status
curl http://localhost:8000/status
# Get pending approvals
curl http://localhost:8000/approvals/pending
What's Next?¶
Learn the Concepts¶
Read CONCEPTS.md to understand:
- What a forward-deployed research agent does
- The hypothesis-driven methodology
- How knowledge compounds over time
Understand Operating Modes¶
Read MODES.md to learn about:
- LAB mode for unrestricted research
- SHADOW mode for production observation
- PRODUCTION mode for active protection
Explore the Taxonomy¶
Read TAXONOMY.md to understand:
- Failure classes (REASONING, LONG_CONTEXT, etc.)
- Severity levels (S0-S4)
- How to interpret and act on findings
Configure for Production¶
Read CONFIGURATION.md for:
- All configuration options
- Environment variable setup
- Cost and safety controls
Integrate with Your Pipeline¶
Read INTEGRATION.md for:
- Embedding Tinman in existing systems
- Using the PipelineAdapter
- Custom approval workflows
Deploy to Production¶
Read PRODUCTION.md for:
- Docker/Kubernetes deployment
- Database migrations with Alembic
- Prometheus metrics setup
- Cost controls and budget enforcement
- Operational runbook
Development & Testing¶
Running Tests¶
# Run all tests
pytest
# Run with coverage report
pytest --cov=tinman
# Run specific test file
pytest tests/test_risk_evaluation.py
# Run specific test
pytest tests/test_risk_evaluation.py::TestRiskEvaluator::test_destructive_always_blocked
# Verbose output
pytest -v
# Stop on first failure
pytest -x
Test Categories¶
| Test File | Coverage |
|---|---|
| test_risk_evaluation.py | Risk evaluator, severity, policy-driven evaluation |
| test_guarded_tools.py | Tool registry, guarded_call, blocking, approval flow |
| test_audit.py | AuditLogger, approval decisions, mode transitions |
| test_approval_flow.py | Full HITL flow including timeouts |
| test_ingest.py | OTLP, Datadog, X-Ray, JSON trace adapters |
| test_agents.py | Hypothesis engine, experiment executor |
| test_memory.py | Memory graph operations |
Code Quality¶
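This guide doesn't document which quality tools ship with the dev extras, so the commands below are a sketch assuming the common ruff + mypy combination; check pyproject.toml for what pip install -e ".[dev]" actually pulls in.

```bash
# Hypothetical: assumes ruff and mypy are part of the dev extras
ruff check .     # lint
mypy tinman      # type-check the package
```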
Database Migrations (Development)¶
# Create new migration after model changes
alembic revision --autogenerate -m "description"
# Apply migrations
alembic upgrade head
# Rollback one version
alembic downgrade -1
# Check current version
alembic current
Troubleshooting¶
Database Connection Error¶
Fix: Check your database URL and ensure the database exists. If you want to skip PostgreSQL for now, set database.url to sqlite:///tinman.db:
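```yaml
# .tinman/config.yaml
database:
  url: sqlite:///tinman.db
```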
API Key Error¶
Fix: Ensure environment variable is set:
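```bash
# Using OpenAI as the example; substitute your provider's variable
export OPENAI_API_KEY="sk-..."
test -n "$OPENAI_API_KEY" && echo "key is set"
```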
Permission Error (Windows)¶
Fix: Run terminal as administrator or check file permissions.
TUI Not Displaying Correctly¶
Fix: Ensure your terminal supports Unicode and has sufficient size:
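```bash
# Check locale (should report UTF-8) and terminal dimensions; the exact
# minimum isn't documented, but something around 100x30 is a safe target
echo $LANG
tput cols; tput lines
```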
Cost Limit Reached¶
Fix: Increase limit in config or wait for next cycle:
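```yaml
# .tinman/config.yaml -- raise the per-cycle budget (default shown in Step 4 is 5.0)
experiments:
  cost_limit_usd: 10.0
```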
Getting Help¶
- Documentation: See the docs/ directory
- Issues: Report bugs at GitHub Issues
- Discussions: Ask questions in GitHub Discussions