Memory Graph

This document provides complete documentation for Tinman's Research Memory Graph—the temporal knowledge store that tracks all discoveries, relationships, and behavioral lineages.

Table of Contents

Overview
Graph Model
Node Types
Edge Relations
Temporal Versioning
Core Operations
Query Operations
Lineage Tracking
Recording Findings
Examples

Overview

The Memory Graph is Tinman's central knowledge store. It tracks:

Model behavior evolution - How behavior changes over time
Failure emergence and evolution - New failures and their mutations
Intervention effects - What fixes worked and what didn't
Causal relationships - What caused what

Key Capabilities

┌─────────────────────────────────────────────────────────────────┐
│                     MEMORY GRAPH CAPABILITIES                    │
│                                                                  │
│  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐  │
│  │    Temporal     │  │    Lineage      │  │     Search      │  │
│  │    Versioning   │  │    Tracking     │  │                 │  │
│  │                 │  │                 │  │                 │  │
│  │ "What did we    │  │ "What caused    │  │ "Find all S3+   │  │
│  │  know at T?"    │  │  this failure?" │  │  failures"      │  │
│  └─────────────────┘  └─────────────────┘  └─────────────────┘  │
│                                                                  │
│  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐  │
│  │   Persistence   │  │   Evolution     │  │  Relationships  │  │
│  │                 │  │   Tracking      │  │                 │  │
│  │                 │  │                 │  │                 │  │
│  │ PostgreSQL or   │  │ "How did this   │  │ Hypothesis ->   │  │
│  │ SQLite backed   │  │  failure        │  │ Experiment ->   │  │
│  │                 │  │  evolve?"       │  │ Failure -> Fix  │  │
│  └─────────────────┘  └─────────────────┘  └─────────────────┘  │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

Design Principles

Temporal - Every node has a validity period
Relational - Edges capture causal and logical relationships
Persistent - Survives restarts, accumulates knowledge
Queryable - Find patterns, lineage, evolution

Graph Model

The Memory Graph is a directed graph with typed nodes and edges.

┌─────────────────────────────────────────────────────────────────┐
│                       GRAPH STRUCTURE                            │
│                                                                  │
│   ┌────────────┐   TESTED_IN    ┌────────────┐                  │
│   │ Hypothesis ├───────────────▶│ Experiment │                  │
│   └────────────┘                └──────┬─────┘                  │
│                                        │ EXECUTED_AS            │
│                                        ▼                        │
│                                 ┌────────────┐                  │
│                                 │    Run     │                  │
│                                 └──────▲─────┘                  │
│                                        │ OBSERVED_IN            │
│                                        │                        │
│   ┌────────────┐                ┌──────┴─────┐                  │
│   │  Failure   ├───────────────▶│  Failure   │                  │
│   │ (parent)   │   EVOLVED_INTO │            │                  │
│   └────────────┘                └──────┬─────┘                  │
│                                        │ ADDRESSED_BY           │
│                                        ▼                        │
│   ┌────────────┐   DEPLOYED_AS  ┌────────────┐                  │
│   │ Deployment │◀───────────────│Intervention│                  │
│   └──────┬─────┘                └────────────┘                  │
│          │ ROLLED_BACK_BY                                       │
│          ▼                                                      │
│   ┌────────────┐                                                │
│   │  Rollback  │                                                │
│   └────────────┘                                                │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

Node Types

Nodes represent entities in the research process.

from enum import Enum

class NodeType(str, Enum):
    """Types of nodes in the memory graph."""
    MODEL_VERSION = "model_version"   # Target model version
    HYPOTHESIS = "hypothesis"         # Failure hypothesis
    EXPERIMENT = "experiment"         # Experiment design
    RUN = "run"                       # Single experiment run
    FAILURE_MODE = "failure_mode"     # Discovered failure
    INTERVENTION = "intervention"     # Proposed fix
    SIMULATION = "simulation"         # Counterfactual simulation
    DEPLOYMENT = "deployment"         # Deployed intervention
    ROLLBACK = "rollback"             # Rolled back deployment
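Because `NodeType` mixes in `str`, each member compares equal to its stored string value, so a type read back from the database as plain text can be turned back into a member. A minimal sketch, re-declaring the enum locally so it runs on its own:

```python
from enum import Enum


class NodeType(str, Enum):
    """Local copy of the node types listed above."""
    MODEL_VERSION = "model_version"
    HYPOTHESIS = "hypothesis"
    EXPERIMENT = "experiment"
    RUN = "run"
    FAILURE_MODE = "failure_mode"
    INTERVENTION = "intervention"
    SIMULATION = "simulation"
    DEPLOYMENT = "deployment"
    ROLLBACK = "rollback"


# A value stored as plain text (e.g. in a database column) maps back to a member
stored = "failure_mode"
node_type = NodeType(stored)
assert node_type is NodeType.FAILURE_MODE

# The str mixin makes members compare equal to their raw values
assert NodeType.FAILURE_MODE == "failure_mode"

# Use .value for display; f"{member}" output differs across Python versions
print(node_type.value)  # failure_mode

# Unknown strings raise ValueError rather than passing through silently
try:
    NodeType("failure")
except ValueError:
    print("not a node type")
```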

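Node Structure

from dataclasses import dataclass
from datetime import datetime
from typing import Any, Optional

@dataclass
class Node:
    """A node in the Research Memory Graph."""
    # Field listing only: the Core Operations examples build a Node from
    # node_type and data alone, so id and the timestamps presumably have
    # defaults.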
    id: str                        # Unique identifier
    node_type: NodeType            # Type of node
    created_at: datetime           # When created
    valid_from: datetime           # When validity starts
    valid_to: Optional[datetime]   # When validity ends (null = still valid)
    data: dict[str, Any]           # Node-specific data

Node Type Details

HYPOTHESIS

Represents a testable hypothesis about potential failures.

{
    "node_type": "hypothesis",
    "data": {
        "target_surface": "tool_use",
        "expected_failure": "parameter_injection",
        "confidence": 0.7,
        "priority": "high",
        "rationale": "Tool parameters not validated",
    }
}

EXPERIMENT

Represents an experiment design to test a hypothesis.

{
    "node_type": "experiment",
    "data": {
        "hypothesis_id": "hyp_123",
        "stress_type": "tool_injection",
        "mode": "single",
        "constraints": {
            "max_tokens": 100000,
            "timeout_seconds": 300,
        },
    }
}

RUN

Represents a single experiment run with results.

{
    "node_type": "run",
    "data": {
        "experiment_id": "exp_456",
        "run_number": 3,
        "failure_triggered": True,
        "tokens_used": 15000,
        "latency_ms": 2500,
        "trace": {...},
    }
}

FAILURE_MODE

Represents a discovered failure.

{
    "node_type": "failure_mode",
    "data": {
        "primary_class": "tool_use",
        "secondary_class": "parameter_error",
        "severity": "S2",
        "trigger_signature": ["stress:tool_injection", "error:validation"],
        "reproducibility": 0.8,
        "is_resolved": False,
    }
}

INTERVENTION

Represents a proposed fix for a failure.

{
    "node_type": "intervention",
    "data": {
        "intervention_type": "guardrail",
        "payload": {
            "validation_type": "schema",
            "reject_on_fail": True,
        },
        "expected_gains": {"failure_reduction": 0.6},
        "expected_regressions": {"latency_increase": 0.2},
        "risk_tier": "review",
    }
}

DEPLOYMENT

Represents a deployed intervention.

{
    "node_type": "deployment",
    "data": {
        "intervention_id": "int_789",
        "mode": "production",
        "rollback_state": {...},
        "status": "active",
    }
}

Edge Relations

Edges represent relationships between nodes.

from enum import Enum

class EdgeRelation(str, Enum):
    """Types of edges in the memory graph."""
    TESTED_IN = "tested_in"           # Hypothesis -> Experiment
    EXECUTED_AS = "executed_as"       # Experiment -> Run
    OBSERVED_IN = "observed_in"       # Failure -> Run
    CAUSED_BY = "caused_by"           # Effect -> Cause
    ADDRESSED_BY = "addressed_by"     # Failure -> Intervention
    SIMULATED_BY = "simulated_by"     # Intervention -> Simulation
    DEPLOYED_AS = "deployed_as"       # Intervention -> Deployment
    ROLLED_BACK_BY = "rolled_back_by" # Deployment -> Rollback
    REGRESSED_AS = "regressed_as"     # Deployment -> Failure
    EVOLVED_INTO = "evolved_into"     # Failure -> Failure (evolution)

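Edge Structure

from dataclasses import dataclass
from datetime import datetime
from typing import Any

@dataclass
class Edge:
    """An edge in the Research Memory Graph."""
    # Field listing only: the Core Operations examples build an Edge from
    # src_id, dst_id, and relation alone, so the other fields presumably
    # have defaults.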
    id: str                        # Unique identifier
    src_id: str                    # Source node ID
    dst_id: str                    # Destination node ID
    relation: EdgeRelation         # Relationship type
    created_at: datetime           # When created
    metadata: dict[str, Any]       # Edge-specific data

Relationship Semantics

Relation	From	To	Meaning
`TESTED_IN`	Hypothesis	Experiment	Hypothesis is tested by this experiment
`EXECUTED_AS`	Experiment	Run	Experiment was executed as this run
`OBSERVED_IN`	Failure	Run	Failure was observed in this run
`CAUSED_BY`	Effect	Cause	Effect was caused by this cause
`ADDRESSED_BY`	Failure	Intervention	Failure is addressed by intervention
`SIMULATED_BY`	Intervention	Simulation	Intervention was simulated
`DEPLOYED_AS`	Intervention	Deployment	Intervention was deployed
`ROLLED_BACK_BY`	Deployment	Rollback	Deployment was rolled back
`REGRESSED_AS`	Deployment	Failure	Deployment caused this regression
`EVOLVED_INTO`	Failure	Failure	Failure evolved into a new form
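The table above pins down which node types each relation connects. One way to enforce that when adding edges is a lookup of allowed (source, destination) pairs. The sketch below is a hypothetical check built only from this table, not part of the documented API; it assumes "Failure" in the table means `NodeType.FAILURE_MODE`, and it leaves `CAUSED_BY` unconstrained because its endpoints are described only as effect and cause.

```python
from enum import Enum
from typing import Optional


class NodeType(str, Enum):
    HYPOTHESIS = "hypothesis"
    EXPERIMENT = "experiment"
    RUN = "run"
    FAILURE_MODE = "failure_mode"
    INTERVENTION = "intervention"
    SIMULATION = "simulation"
    DEPLOYMENT = "deployment"
    ROLLBACK = "rollback"


# relation value -> (source type, destination type); None means "any type"
ALLOWED_ENDPOINTS: dict[str, tuple[Optional[NodeType], Optional[NodeType]]] = {
    "tested_in": (NodeType.HYPOTHESIS, NodeType.EXPERIMENT),
    "executed_as": (NodeType.EXPERIMENT, NodeType.RUN),
    "observed_in": (NodeType.FAILURE_MODE, NodeType.RUN),
    "caused_by": (None, None),
    "addressed_by": (NodeType.FAILURE_MODE, NodeType.INTERVENTION),
    "simulated_by": (NodeType.INTERVENTION, NodeType.SIMULATION),
    "deployed_as": (NodeType.INTERVENTION, NodeType.DEPLOYMENT),
    "rolled_back_by": (NodeType.DEPLOYMENT, NodeType.ROLLBACK),
    "regressed_as": (NodeType.DEPLOYMENT, NodeType.FAILURE_MODE),
    "evolved_into": (NodeType.FAILURE_MODE, NodeType.FAILURE_MODE),
}


def check_edge(relation: str, src: NodeType, dst: NodeType) -> None:
    """Raise ValueError if src/dst do not match the documented endpoints."""
    want_src, want_dst = ALLOWED_ENDPOINTS[relation]
    if want_src is not None and src is not want_src:
        raise ValueError(f"{relation}: source must be {want_src.value}, got {src.value}")
    if want_dst is not None and dst is not want_dst:
        raise ValueError(f"{relation}: destination must be {want_dst.value}, got {dst.value}")


check_edge("observed_in", NodeType.FAILURE_MODE, NodeType.RUN)  # passes silently
try:
    # Reversed direction: the failure points at the run, not the other way round
    check_edge("observed_in", NodeType.RUN, NodeType.FAILURE_MODE)
except ValueError as err:
    print(err)  # observed_in: source must be failure_mode, got run
```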

Temporal Versioning

Every node has a validity period, enabling temporal queries.

Validity Period

node.valid_from  # When the node became valid
node.valid_to    # When the node stopped being valid (null = still valid)
node.is_valid    # Check if currently valid

Invalidating Nodes

When findings are superseded or corrected:

# Mark node as no longer valid
graph.invalidate_node(node_id)

# This sets valid_to = now
# Node is still queryable but marked as historical
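The validity fields and the soft-delete behavior can be modeled with a small dataclass. In this sketch `is_valid` is assumed to mean "no `valid_to` set yet", and `invalidate` mirrors the documented behavior of setting `valid_to` to the current time while keeping the node. The class name `VersionedNode`, its defaults, and the use of timezone-aware UTC timestamps are hypothetical choices for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Optional


def utc_now() -> datetime:
    return datetime.now(timezone.utc)


@dataclass
class VersionedNode:
    """Hypothetical stand-in for Node, reduced to the validity fields."""
    id: str
    data: dict[str, Any] = field(default_factory=dict)
    valid_from: datetime = field(default_factory=utc_now)
    valid_to: Optional[datetime] = None  # None = still valid

    @property
    def is_valid(self) -> bool:
        # Assumed semantics: valid until valid_to has been set
        return self.valid_to is None

    def invalidate(self, when: Optional[datetime] = None) -> None:
        """Soft delete: close the validity window but keep the node."""
        if self.valid_to is None:  # invalidating twice keeps the first timestamp
            self.valid_to = when or utc_now()


node = VersionedNode(id="fail_001", data={"severity": "S2"})
print(node.is_valid)  # True
node.invalidate()
print(node.is_valid)  # False
print(node.data)      # {'severity': 'S2'}  (history is preserved)
```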

Temporal Queries

Query the graph state at any point in time:

# What failures did we know about on Jan 15?
deployment_time = datetime(2024, 1, 15, 10, 30)
known_failures = graph.snapshot_at(
    deployment_time,
    node_type=NodeType.FAILURE_MODE
)

# What was the graph state when we deployed?
graph_state = graph.snapshot_at(deployment_time)
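Conceptually, a snapshot at time T keeps the nodes whose validity window contains T. The sketch below filters an in-memory list that way. It assumes a half-open window (`valid_from <= T < valid_to`), which is an assumption about how `snapshot_at` treats the boundaries, and it uses a hypothetical `Snap` tuple instead of `Node`.

```python
from datetime import datetime
from typing import NamedTuple, Optional


class Snap(NamedTuple):
    """Hypothetical minimal record: id, type, and validity window."""
    id: str
    node_type: str
    valid_from: datetime
    valid_to: Optional[datetime]  # None = still valid


def snapshot_at(nodes: list[Snap], ts: datetime, node_type: Optional[str] = None) -> list[Snap]:
    """Return nodes valid at ts, optionally restricted to one node type."""
    return [
        n for n in nodes
        if n.valid_from <= ts
        and (n.valid_to is None or ts < n.valid_to)
        and (node_type is None or n.node_type == node_type)
    ]


nodes = [
    Snap("f1", "failure_mode", datetime(2024, 1, 10), None),
    Snap("f2", "failure_mode", datetime(2024, 1, 12), datetime(2024, 1, 14)),  # superseded
    Snap("f3", "failure_mode", datetime(2024, 1, 20), None),                   # found later
    Snap("h1", "hypothesis", datetime(2024, 1, 5), None),
]

deployment_time = datetime(2024, 1, 15, 10, 30)
known = snapshot_at(nodes, deployment_time, node_type="failure_mode")
print([n.id for n in known])  # ['f1']
```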

Use Cases for Temporal Queries

Forensic Analysis: "What did we know when the incident occurred?"
Deployment Auditing: "Were known issues addressed before release?"
Trend Analysis: "Are failures increasing or decreasing?"
Compliance: "What was our risk posture at audit time?"

Core Operations

Node Operations

from tinman.memory.graph import MemoryGraph
from tinman.memory.models import Node, NodeType

# Add a node
node = Node(
    node_type=NodeType.HYPOTHESIS,
    data={"target_surface": "reasoning", "expected_failure": "goal_drift"}
)
node_id = graph.add_node(node)

# Get a node
node = graph.get_node(node_id)

# Invalidate a node (soft delete)
graph.invalidate_node(node_id)

Edge Operations

from tinman.memory.models import Edge, EdgeRelation

# Add an edge
edge = Edge(
    src_id=hypothesis_id,
    dst_id=experiment_id,
    relation=EdgeRelation.TESTED_IN
)
edge_id = graph.add_edge(edge)

# Or use the convenience method
edge = graph.link(
    src_id=hypothesis_id,
    dst_id=experiment_id,
    relation=EdgeRelation.TESTED_IN,
    metadata={"design_version": 1}
)

# Get an edge
edge = graph.get_edge(edge_id)
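`link` reads as a shorthand for building an `Edge` and passing it to `add_edge`. A toy in-memory version makes that relationship concrete. `ToyGraph`, `ToyEdge`, the dict storage, the uuid-based ids, and the refusal of dangling edges are all illustrative assumptions, not Tinman's implementation, which persists to a database.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Optional


@dataclass
class ToyEdge:
    src_id: str
    dst_id: str
    relation: str
    metadata: dict[str, Any] = field(default_factory=dict)
    id: str = field(default_factory=lambda: f"edge_{uuid.uuid4().hex[:8]}")
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class ToyGraph:
    def __init__(self) -> None:
        self.nodes: dict[str, dict[str, Any]] = {}
        self.edges: dict[str, ToyEdge] = {}

    def add_node(self, node_id: str, data: dict[str, Any]) -> str:
        self.nodes[node_id] = data
        return node_id

    def add_edge(self, edge: ToyEdge) -> str:
        # Refuse dangling edges: both endpoints must already exist
        for endpoint in (edge.src_id, edge.dst_id):
            if endpoint not in self.nodes:
                raise KeyError(f"unknown node {endpoint}")
        self.edges[edge.id] = edge
        return edge.id

    def link(self, src_id: str, dst_id: str, relation: str,
             metadata: Optional[dict[str, Any]] = None) -> ToyEdge:
        """Convenience wrapper: build the edge, store it, return it."""
        edge = ToyEdge(src_id, dst_id, relation, metadata or {})
        self.add_edge(edge)
        return edge

    def get_edge(self, edge_id: str) -> Optional[ToyEdge]:
        return self.edges.get(edge_id)


g = ToyGraph()
g.add_node("hyp_1", {"target_surface": "reasoning"})
g.add_node("exp_1", {"stress_type": "tool_injection"})
edge = g.link("hyp_1", "exp_1", "tested_in", metadata={"design_version": 1})
print(g.get_edge(edge.id).metadata)  # {'design_version': 1}
```

Returning the edge from `link` (rather than just its id, as `add_edge` does) matches how the two calls are used above.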

Query Operations

Get by Type

# Get all hypotheses
hypotheses = graph.get_hypotheses(valid_only=True, limit=100)

# Get all experiments
experiments = graph.get_experiments(valid_only=True)

# Get all failures
failures = graph.get_failures(valid_only=True)

# Get all interventions
interventions = graph.get_interventions(valid_only=True)

Get Neighbors

# Get nodes connected by outgoing edges
related = graph.get_neighbors(
    node_id=failure_id,
    relation=EdgeRelation.ADDRESSED_BY,
    direction="outgoing"
)

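# Get nodes connected by incoming edges. CAUSED_BY points from effect to
# cause, so incoming CAUSED_BY edges come from the nodes this failure caused.
effects = graph.get_neighbors(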
    node_id=failure_id,
    relation=EdgeRelation.CAUSED_BY,
    direction="incoming"
)
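Direction is relative to the stored edge. With `CAUSED_BY` stored as effect -> cause, a failure's causes are its outgoing `CAUSED_BY` neighbors, and the nodes it caused are its incoming ones. The sketch below implements that lookup over a plain list of `(src, relation, dst)` triples; the triple representation, the `neighbors` helper, and ids such as `cfg_change_7` are hypothetical.

```python
from typing import Literal

Triple = tuple[str, str, str]  # (src_id, relation, dst_id)


def neighbors(edges: list[Triple], node_id: str, relation: str,
              direction: Literal["outgoing", "incoming"]) -> list[str]:
    """Follow `relation` edges away from (outgoing) or into (incoming) node_id."""
    if direction == "outgoing":
        return [dst for src, rel, dst in edges if src == node_id and rel == relation]
    return [src for src, rel, dst in edges if dst == node_id and rel == relation]


edges = [
    ("fail_1", "addressed_by", "int_1"),
    ("fail_1", "caused_by", "cfg_change_7"),  # fail_1 was caused by a config change
    ("fail_2", "caused_by", "fail_1"),        # fail_2 was caused by fail_1
]

print(neighbors(edges, "fail_1", "addressed_by", "outgoing"))  # ['int_1']
print(neighbors(edges, "fail_1", "caused_by", "outgoing"))     # ['cfg_change_7'] (causes)
print(neighbors(edges, "fail_1", "caused_by", "incoming"))     # ['fail_2'] (effects)
```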

Search by Data

# Find all S3 severity failures
critical_failures = graph.search(
    data_filter={"severity": "S3"},
    node_type=NodeType.FAILURE_MODE
)

# Find interventions with block risk tier
blocked = graph.search(
    data_filter={"risk_tier": "block"},
    node_type=NodeType.INTERVENTION
)

# Find unresolved failures
unresolved = graph.find_unresolved_failures()

# Find high-severity failures
severe = graph.find_failures_by_severity(min_severity="S3")

# Find interventions by risk
risky = graph.find_interventions_by_risk(risk_tier="review")
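Two behaviors in this group are worth spelling out. `data_filter` appears to act as an exact match on keys in a node's `data`, and `min_severity="S3"` needs an ordering over severity labels. The sketch assumes labels of the form `S<number>` where a larger number is more severe, consistent with "S3+" and "high-severity" above; both helpers, `matches` and `severity_rank`, are hypothetical.

```python
from typing import Any


def matches(data: dict[str, Any], data_filter: dict[str, Any]) -> bool:
    """True if every filter key is present in data with an equal value."""
    return all(key in data and data[key] == value for key, value in data_filter.items())


def severity_rank(label: str) -> int:
    """Map labels such as 'S1' or 'S3' to integers (higher = more severe)."""
    if not (label.startswith("S") and label[1:].isdigit()):
        raise ValueError(f"unexpected severity label: {label!r}")
    return int(label[1:])


failures = [
    {"id": "f1", "severity": "S1", "is_resolved": True},
    {"id": "f2", "severity": "S3", "is_resolved": False},
    {"id": "f3", "severity": "S2", "is_resolved": False},
]

# Analogue of search(data_filter={"is_resolved": False}) over this list
unresolved = [f["id"] for f in failures if matches(f, {"is_resolved": False})]
# Analogue of min_severity="S3"
severe = [f["id"] for f in failures if severity_rank(f["severity"]) >= severity_rank("S3")]
print(unresolved)  # ['f2', 'f3']
print(severe)      # ['f2']
```

Parsing the number, rather than comparing the strings directly, keeps a hypothetical "S10" above "S3".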

Temporal Queries

from datetime import datetime

# Get state at a specific time
past_failures = graph.snapshot_at(
    timestamp=datetime(2024, 1, 15),
    node_type=NodeType.FAILURE_MODE
)

# Track failure evolution
evolution = graph.get_failure_evolution(failure_class="goal_drift")

Lineage Tracking

Track the causal chain from effect to root cause.

Get Lineage

# Get full causal lineage
lineage = graph.get_lineage(failure_id, max_depth=10)

# Returns: [(node, edge), (node, edge), ...]
# Starting from the failure back to root causes

for node, edge in lineage:
    print(f"{node.node_type}: {node.data}")
    print(f"  via: {edge.relation}")
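`get_lineage` is described as walking from a failure back toward its root causes. Below is a breadth-first sketch of that walk over in-memory edges, capped at `max_depth` hops and returning `(node_id, edge)` pairs. It assumes that walking back follows incoming edges for relations such as `EVOLVED_INTO`, `EXECUTED_AS`, and `TESTED_IN`, but outgoing edges for `OBSERVED_IN` and `CAUSED_BY`, whose stored direction already points at the earlier entity. That split is inferred from the relation table and reproduces the lineage in the "Example: Full Lineage" section; it is not taken from Tinman's source. The `E` tuple and all ids are hypothetical.

```python
from collections import deque
from typing import NamedTuple


class E(NamedTuple):
    src: str
    relation: str
    dst: str


# Relations stored from the later entity to the earlier one; walking back
# follows these edges forwards.
POINTS_BACKWARD = {"observed_in", "caused_by"}


def get_lineage(edges: list[E], start: str, max_depth: int = 10) -> list[tuple[str, E]]:
    """Breadth-first walk from `start` toward its origins, up to max_depth hops."""
    lineage: list[tuple[str, E]] = []
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        current, depth = queue.popleft()
        if depth == max_depth:
            continue
        for e in edges:
            if e.relation in POINTS_BACKWARD and e.src == current:
                prev = e.dst
            elif e.relation not in POINTS_BACKWARD and e.dst == current:
                prev = e.src
            else:
                continue
            if prev not in seen:  # guard against cycles
                seen.add(prev)
                lineage.append((prev, e))
                queue.append((prev, depth + 1))
    return lineage


edges = [
    E("hyp_goal_drift", "tested_in", "exp_001"),
    E("exp_001", "executed_as", "run_exp_001_5"),
    E("goal_drift_v1", "observed_in", "run_exp_001_5"),
    E("goal_drift_v1", "evolved_into", "goal_drift_v2"),
    E("goal_drift_v2", "evolved_into", "goal_drift_v3"),
]

for node_id, edge in get_lineage(edges, "goal_drift_v3"):
    print(f"{node_id} (via {edge.relation})")
# goal_drift_v2 (via evolved_into)
# goal_drift_v1 (via evolved_into)
# run_exp_001_5 (via observed_in)
# exp_001 (via executed_as)
# hyp_goal_drift (via tested_in)
```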

Failure Evolution

Track how failures mutate over time:

# Get evolution of a failure class
evolution = graph.get_failure_evolution("goal_drift")

# Shows: Original failure -> Mutation 1 -> Mutation 2 -> ...
for failure in evolution:
    print(f"{failure.created_at}: {failure.data['severity']}")
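Given the `EVOLVED_INTO` semantics (parent -> child), the evolution of a failure family can be rebuilt by starting at failures with no incoming `EVOLVED_INTO` edge and following outgoing ones. A sketch over `(parent, child)` pairs; the `evolution_chains` helper is hypothetical, and it assumes each failure evolves into at most one child, so every chain is linear. Branching families would need a tree walk instead.

```python
def evolution_chains(evolved_into: list[tuple[str, str]]) -> list[list[str]]:
    """Turn (parent, child) EVOLVED_INTO pairs into root-first chains."""
    child_of = dict(evolved_into)  # parent -> child; assumes one child per parent
    has_parent = set(child_of.values())
    roots = [p for p in child_of if p not in has_parent]
    chains = []
    for root in roots:
        chain = [root]
        # Follow children until the chain ends; the membership check guards against cycles
        while chain[-1] in child_of and child_of[chain[-1]] not in chain:
            chain.append(child_of[chain[-1]])
        chains.append(chain)
    return chains


pairs = [("goal_drift_v2", "goal_drift_v3"), ("goal_drift_v1", "goal_drift_v2")]
print(evolution_chains(pairs))  # [['goal_drift_v1', 'goal_drift_v2', 'goal_drift_v3']]
```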

Example: Full Lineage

Failure: goal_drift_v3 (S3)
    │
    ├── EVOLVED_INTO from: goal_drift_v2 (S2)
    │   │
    │   └── EVOLVED_INTO from: goal_drift_v1 (S1)
    │       │
    │       └── OBSERVED_IN: run_exp_001_5
    │           │
    │           └── EXECUTED_AS: experiment_001
    │               │
    │               └── TESTED_IN: hypothesis_goal_drift

Recording Findings

Convenience methods for recording research findings.

Record Hypothesis

node = graph.record_hypothesis(
    target_surface="context_window",
    expected_failure="attention_dilution",
    confidence=0.7,
    priority="high"
)

Record Experiment

node = graph.record_experiment(
    hypothesis_id=hypothesis.id,
    stress_type="context_overflow",
    mode="iterative",
    constraints={
        "max_tokens": 200000,
        "timeout_seconds": 600,
    }
)
# Automatically links to hypothesis via TESTED_IN edge

Record Failure

node = graph.record_failure(
    run_id=run.id,
    primary_class="long_context",
    secondary_class="attention_dilution",
    severity="S2",
    trigger_signature=["stress:context_overflow", "high_tool_usage"],
    reproducibility=0.8,
    parent_failure_id=None  # Set if this evolved from another
)
# Automatically links to run via OBSERVED_IN edge

Record Intervention

node = graph.record_intervention(
    failure_id=failure.id,
    intervention_type="context_limit",
    payload={
        "max_tokens": 100000,
        "strategy": "recency_weighted"
    },
    expected_gains={"failure_reduction": 0.5},
    expected_regressions={"capability_reduction": 0.2},
    risk_tier="review"
)
# Automatically links to failure via ADDRESSED_BY edge

Record Deployment

node = graph.record_deployment(
    intervention_id=intervention.id,
    mode="production",
    rollback_state={"previous_config": {...}}
)
# Automatically links to intervention via DEPLOYED_AS edge

Record Rollback

node = graph.record_rollback(
    deployment_id=deployment.id,
    reason="Latency regression detected",
    regression_failure_id=new_failure.id  # Optional
)
# Links deployment -> rollback and deployment -> regression failure
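Each `record_*` helper appears to follow the same pattern: create a node of the matching type, then add the edges named in its comment. The sketch below spells that out for a rollback using toy module-level storage. The function name `record_rollback_sketch`, the dict/list storage, and the id scheme are hypothetical; only the two relations, `rolled_back_by` and `regressed_as`, come from this page.

```python
import itertools
from typing import Any, Optional

_ids = itertools.count(1)
nodes: dict[str, dict[str, Any]] = {}
edges: list[tuple[str, str, str]] = []  # (src_id, relation, dst_id)


def record_rollback_sketch(deployment_id: str, reason: str,
                           regression_failure_id: Optional[str] = None) -> str:
    """Create a rollback node and link it the way record_rollback is described."""
    rollback_id = f"rollback_{next(_ids)}"
    nodes[rollback_id] = {"node_type": "rollback", "reason": reason}
    edges.append((deployment_id, "rolled_back_by", rollback_id))
    if regression_failure_id is not None:
        # Only recorded when a known regression triggered the rollback
        edges.append((deployment_id, "regressed_as", regression_failure_id))
    return rollback_id


rb = record_rollback_sketch("dep_1", "Latency regression detected", regression_failure_id="fail_9")
print(rb)     # rollback_1
print(edges)  # [('dep_1', 'rolled_back_by', 'rollback_1'), ('dep_1', 'regressed_as', 'fail_9')]
```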

Examples

Complete Research Cycle Recording

from tinman.memory.graph import MemoryGraph
from tinman.memory.models import NodeType, EdgeRelation
from sqlalchemy.orm import Session

# Initialize graph with an existing database session (an open
# sqlalchemy.orm.Session; see CONFIGURATION.md for database settings)
graph = MemoryGraph(session)

# 1. Record hypothesis
hypothesis = graph.record_hypothesis(
    target_surface="tool_use",
    expected_failure="parameter_injection",
    confidence=0.7,
    priority="high"
)
print(f"Hypothesis: {hypothesis.id}")

# 2. Record experiment
experiment = graph.record_experiment(
    hypothesis_id=hypothesis.id,
    stress_type="tool_injection",
    mode="adversarial",
    constraints={"max_retries": 3}
)
print(f"Experiment: {experiment.id}")

# 3. Record failure discovered in experiment
failure = graph.record_failure(
    run_id=experiment.id,  # Simplified - normally would link to Run
    primary_class="tool_use",
    secondary_class="parameter_error",
    severity="S2",
    trigger_signature=["injection:path_traversal"],
    reproducibility=0.6
)
print(f"Failure: {failure.id}")

# 4. Record intervention
intervention = graph.record_intervention(
    failure_id=failure.id,
    intervention_type="guardrail",
    payload={"validation": "strict_schema"},
    expected_gains={"failure_reduction": 0.7},
    expected_regressions={"latency_increase": 0.1},
    risk_tier="safe"
)
print(f"Intervention: {intervention.id}")

# 5. Record deployment
deployment = graph.record_deployment(
    intervention_id=intervention.id,
    mode="production",
    rollback_state={"previous_validation": "none"}
)
print(f"Deployment: {deployment.id}")

# Query the complete lineage
lineage = graph.get_lineage(failure.id)
print("\nFailure Lineage:")
for node, edge in lineage:
    print(f"  {node.node_type.value}: {node.id}")

Temporal Analysis

from datetime import datetime

# Get failures known at deployment time
deployment_time = datetime(2024, 1, 15, 10, 0)
known_failures = graph.snapshot_at(
    deployment_time,
    node_type=NodeType.FAILURE_MODE
)

print(f"Failures known at deployment: {len(known_failures)}")

# Get failures discovered after deployment by filtering on creation time
# (created_at is a Node field, while search() filters on keys in node.data)
all_failures = graph.get_failures(valid_only=False)
recent_failures = [f for f in all_failures if f.created_at > deployment_time]

print(f"New failures since deployment: {len(recent_failures)}")

# Check whether any production deployment caused a regression
deployments = graph.search(
    {"mode": "production"},
    node_type=NodeType.DEPLOYMENT
)
for dep in deployments:
    regressions = graph.get_neighbors(
        dep.id,
        relation=EdgeRelation.REGRESSED_AS,
        direction="outgoing"
    )
    if regressions:
        print(f"Deployment {dep.id} caused {len(regressions)} regressions")

Finding Related Failures

# Find all failures in the goal_drift family
goal_drift_failures = graph.search(
    {"primary_class": "reasoning", "secondary_class": "goal_drift"},
    node_type=NodeType.FAILURE_MODE
)

# Track their evolution
for failure in goal_drift_failures:
    # Find what this failure evolved from
    parents = graph.get_neighbors(
        failure.id,
        relation=EdgeRelation.EVOLVED_INTO,
        direction="incoming"
    )

    # Find what this failure evolved into
    children = graph.get_neighbors(
        failure.id,
        relation=EdgeRelation.EVOLVED_INTO,
        direction="outgoing"
    )

    print(f"Failure {failure.id}:")
    print(f"  Evolved from: {[p.id for p in parents]}")
    print(f"  Evolved into: {[c.id for c in children]}")

Graph Statistics

stats = graph.get_stats()
print("Memory Graph Statistics:")
for node_type, count in stats.items():
    print(f"  {node_type}: {count}")

# Example output (counts depend on your graph):
# Memory Graph Statistics:
#   hypothesis: 45
#   experiment: 120
#   run: 600
#   failure_mode: 32
#   intervention: 28
#   deployment: 15
#   rollback: 3
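`get_stats` returns per-type node counts. A sketch of the same summary built with `collections.Counter`, assuming you want types with no nodes to appear with a count of zero (whether Tinman's `get_stats` does this is not stated here). The `get_stats` function below is a hypothetical stand-in that takes a list of type strings.

```python
from collections import Counter
from typing import Iterable


def get_stats(node_types: Iterable[str], all_types: Iterable[str] = ()) -> dict[str, int]:
    """Count nodes per type, most common first."""
    counts = Counter({t: 0 for t in all_types})  # types with no nodes still appear
    counts.update(node_types)
    return dict(counts.most_common())


stats = get_stats(
    ["run", "run", "hypothesis", "failure_mode", "run", "hypothesis"],
    all_types=["hypothesis", "run", "failure_mode", "rollback"],
)
for node_type, count in stats.items():
    print(f"  {node_type}: {count}")
#   run: 3
#   hypothesis: 2
#   failure_mode: 1
#   rollback: 0
```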

Summary

The Memory Graph provides:

Capability	Description
Node Storage	Typed nodes for all research entities
Edge Relations	Causal and logical relationships
Temporal Versioning	Query state at any point in time
Lineage Tracking	Trace from effect to root cause
Search	Find nodes by data attributes
Evolution Tracking	See how failures mutate

This enables:

Forensic Analysis - What did we know when?
Knowledge Accumulation - Learn from all research
Pattern Recognition - Find recurring failures
Audit Trail - Complete history of findings

Next Steps

AGENTS.md - How agents use the memory graph
ARCHITECTURE.md - System design overview
CONFIGURATION.md - Database configuration
