Memory Graph

This document describes Tinman's Research Memory Graph, the temporal knowledge store that tracks discoveries, the relationships between them, and behavioral lineages.


Table of Contents

  1. Overview
  2. Graph Model
  3. Node Types
  4. Edge Relations
  5. Temporal Versioning
  6. Core Operations
  7. Query Operations
  8. Lineage Tracking
  9. Recording Findings
  10. Examples

Overview

The Memory Graph is Tinman's central knowledge store. It tracks:

  • Model behavior evolution - How behavior changes over time
  • Failure emergence and evolution - New failures and their mutations
  • Intervention effects - What fixes worked and what didn't
  • Causal relationships - What caused what

Key Capabilities

┌─────────────────────────────────────────────────────────────────┐
│                     MEMORY GRAPH CAPABILITIES                    │
│                                                                  │
│  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐  │
│  │    Temporal     │  │    Lineage      │  │     Search      │  │
│  │    Versioning   │  │    Tracking     │  │                 │  │
│  │                 │  │                 │  │                 │  │
│  │ "What did we    │  │ "What caused    │  │ "Find all S3+   │  │
│  │  know at T?"    │  │  this failure?" │  │  failures"      │  │
│  └─────────────────┘  └─────────────────┘  └─────────────────┘  │
│                                                                  │
│  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐  │
│  │   Persistence   │  │   Evolution     │  │  Relationships  │  │
│  │                 │  │   Tracking      │  │                 │  │
│  │                 │  │                 │  │                 │  │
│  │ PostgreSQL or   │  │ "How did this   │  │ Hypothesis ->   │  │
│  │ SQLite backed   │  │ failure evolve?"│  │ Experiment ->   │  │
│  │                 │  │                 │  │ Failure -> Fix  │  │
│  └─────────────────┘  └─────────────────┘  └─────────────────┘  │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

Design Principles

  1. Temporal - Every node has a validity period
  2. Relational - Edges capture causal and logical relationships
  3. Persistent - Survives restarts, accumulates knowledge
  4. Queryable - Find patterns, lineage, evolution

Graph Model

The Memory Graph is a directed graph with typed nodes and edges.

┌─────────────────────────────────────────────────────────────────┐
│                       GRAPH STRUCTURE                            │
│                                                                  │
│   ┌────────────┐   TESTED_IN    ┌────────────┐                  │
│   │ Hypothesis ├───────────────▶│ Experiment │                  │
│   └────────────┘                └──────┬─────┘                  │
│                                        │ EXECUTED_AS            │
│                                        ▼                        │
│                                 ┌────────────┐                  │
│                                 │    Run     │                  │
│                                 └──────┬─────┘                  │
│                                        │ OBSERVED_IN            │
│                                        ▼                        │
│   ┌────────────┐                ┌────────────┐                  │
│   │  Failure   ├───────────────▶│  Failure   │                  │
│   │ (parent)   │   EVOLVED_INTO │            │                  │
│   └────────────┘                └──────┬─────┘                  │
│                                        │ ADDRESSED_BY           │
│                                        ▼                        │
│   ┌────────────┐   DEPLOYED_AS  ┌────────────┐                  │
│   │ Deployment │◀───────────────│Intervention│                  │
│   └──────┬─────┘                └────────────┘                  │
│          │ ROLLED_BACK_BY                                       │
│          ▼                                                      │
│   ┌────────────┐                                                │
│   │  Rollback  │                                                │
│   └────────────┘                                                │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

Node Types

Nodes represent entities in the research process.

from enum import Enum


class NodeType(str, Enum):
    """Types of nodes in the memory graph."""
    MODEL_VERSION = "model_version"   # Target model version
    HYPOTHESIS = "hypothesis"         # Failure hypothesis
    EXPERIMENT = "experiment"         # Experiment design
    RUN = "run"                       # Single experiment run
    FAILURE_MODE = "failure_mode"     # Discovered failure
    INTERVENTION = "intervention"     # Proposed fix
    SIMULATION = "simulation"         # Counterfactual simulation
    DEPLOYMENT = "deployment"         # Deployed intervention
    ROLLBACK = "rollback"             # Rolled back deployment

Node Structure

from dataclasses import dataclass
from datetime import datetime
from typing import Any, Optional


@dataclass
class Node:
    """A node in the Research Memory Graph."""
    id: str                        # Unique identifier
    node_type: NodeType            # Type of node
    created_at: datetime           # When created
    valid_from: datetime           # When validity starts
    valid_to: Optional[datetime]   # When validity ends (None = still valid)
    data: dict[str, Any]           # Node-specific data

Node Type Details

HYPOTHESIS

Represents a testable hypothesis about potential failures.

{
    "node_type": "hypothesis",
    "data": {
        "target_surface": "tool_use",
        "expected_failure": "parameter_injection",
        "confidence": 0.7,
        "priority": "high",
        "rationale": "Tool parameters not validated",
    }
}

EXPERIMENT

Represents an experiment design to test a hypothesis.

{
    "node_type": "experiment",
    "data": {
        "hypothesis_id": "hyp_123",
        "stress_type": "tool_injection",
        "mode": "single",
        "constraints": {
            "max_tokens": 100000,
            "timeout_seconds": 300,
        },
    }
}

RUN

Represents a single experiment run with results.

{
    "node_type": "run",
    "data": {
        "experiment_id": "exp_456",
        "run_number": 3,
        "failure_triggered": True,
        "tokens_used": 15000,
        "latency_ms": 2500,
        "trace": {...},
    }
}

FAILURE_MODE

Represents a discovered failure.

{
    "node_type": "failure_mode",
    "data": {
        "primary_class": "tool_use",
        "secondary_class": "parameter_error",
        "severity": "S2",
        "trigger_signature": ["stress:tool_injection", "error:validation"],
        "reproducibility": 0.8,
        "is_resolved": False,
    }
}

INTERVENTION

Represents a proposed fix for a failure.

{
    "node_type": "intervention",
    "data": {
        "intervention_type": "guardrail",
        "payload": {
            "validation_type": "schema",
            "reject_on_fail": True,
        },
        "expected_gains": {"failure_reduction": 0.6},
        "expected_regressions": {"latency_increase": 0.2},
        "risk_tier": "review",
    }
}

DEPLOYMENT

Represents a deployed intervention.

{
    "node_type": "deployment",
    "data": {
        "intervention_id": "int_789",
        "mode": "production",
        "rollback_state": {...},
        "status": "active",
    }
}

Edge Relations

Edges represent relationships between nodes.

class EdgeRelation(str, Enum):
    """Types of edges in the memory graph."""
    TESTED_IN = "tested_in"           # Hypothesis -> Experiment
    EXECUTED_AS = "executed_as"       # Experiment -> Run
    OBSERVED_IN = "observed_in"       # Failure -> Run
    CAUSED_BY = "caused_by"           # Effect -> Cause
    ADDRESSED_BY = "addressed_by"     # Failure -> Intervention
    SIMULATED_BY = "simulated_by"     # Intervention -> Simulation
    DEPLOYED_AS = "deployed_as"       # Intervention -> Deployment
    ROLLED_BACK_BY = "rolled_back_by" # Deployment -> Rollback
    REGRESSED_AS = "regressed_as"     # Deployment -> Failure
    EVOLVED_INTO = "evolved_into"     # Failure -> Failure (evolution)

Edge Structure

@dataclass
class Edge:
    """An edge in the Research Memory Graph."""
    id: str                        # Unique identifier
    src_id: str                    # Source node ID
    dst_id: str                    # Destination node ID
    relation: EdgeRelation         # Relationship type
    created_at: datetime           # When created
    metadata: dict[str, Any]       # Edge-specific data

Relationship Semantics

Relation        From          To            Meaning
--------------  ------------  ------------  ---------------------------------------
TESTED_IN       Hypothesis    Experiment    Hypothesis is tested by this experiment
EXECUTED_AS     Experiment    Run           Experiment was executed as this run
OBSERVED_IN     Failure       Run           Failure was observed in this run
CAUSED_BY       Effect        Cause         Effect was caused by this cause
ADDRESSED_BY    Failure       Intervention  Failure is addressed by intervention
SIMULATED_BY    Intervention  Simulation    Intervention was simulated
DEPLOYED_AS     Intervention  Deployment    Intervention was deployed
ROLLED_BACK_BY  Deployment    Rollback      Deployment was rolled back
REGRESSED_AS    Deployment    Failure       Deployment caused this regression
EVOLVED_INTO    Failure       Failure       Failure evolved into a new form
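
The From and To columns above can be read as a type constraint on each relation. The following is a minimal, self-contained sketch of such a check. `ENDPOINTS` and `check_edge` are hypothetical names used for illustration and are not part of Tinman's API; whether Tinman enforces these constraints is not stated here. `CAUSED_BY` is left unconstrained because the table describes its endpoints only as Effect and Cause.

```python
from typing import Optional

# (source node type, destination node type) per relation, transcribed from the table.
# None means "any node type". This mapping is an illustrative assumption, not Tinman code.
ENDPOINTS: dict[str, tuple[Optional[str], Optional[str]]] = {
    "tested_in": ("hypothesis", "experiment"),
    "executed_as": ("experiment", "run"),
    "observed_in": ("failure_mode", "run"),
    "caused_by": (None, None),
    "addressed_by": ("failure_mode", "intervention"),
    "simulated_by": ("intervention", "simulation"),
    "deployed_as": ("intervention", "deployment"),
    "rolled_back_by": ("deployment", "rollback"),
    "regressed_as": ("deployment", "failure_mode"),
    "evolved_into": ("failure_mode", "failure_mode"),
}


def check_edge(relation: str, src_type: str, dst_type: str) -> bool:
    """Return True if the endpoint types match the relation's From/To columns."""
    expected_src, expected_dst = ENDPOINTS[relation]
    return expected_src in (None, src_type) and expected_dst in (None, dst_type)


print(check_edge("tested_in", "hypothesis", "experiment"))  # correct direction
print(check_edge("tested_in", "experiment", "hypothesis"))  # reversed endpoints
```

A check like this is most useful at write time, since a reversed edge would otherwise only show up later as an empty query result.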

Temporal Versioning

Every node has a validity period, enabling temporal queries.

Validity Period

node.valid_from  # When the node became valid
node.valid_to    # When the node stopped being valid (None = still valid)
node.is_valid    # Check if currently valid
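
One way to read these fields: a node is valid at time t when `valid_from <= t` and `valid_to` is either unset or later than t. The sketch below assumes a half-open interval, so a node is no longer valid at the exact instant of `valid_to`. `VersionedNode` and `is_valid_at` are hypothetical names, and Tinman's own `is_valid` may handle the boundary differently.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class VersionedNode:
    """Stand-in for a graph node; only the validity fields are modeled."""
    valid_from: datetime
    valid_to: Optional[datetime] = None  # None = still valid

    def is_valid_at(self, t: datetime) -> bool:
        # Half-open interval: valid from valid_from up to, but not including, valid_to.
        return self.valid_from <= t and (self.valid_to is None or t < self.valid_to)


node = VersionedNode(valid_from=datetime(2024, 1, 1), valid_to=datetime(2024, 2, 1))
print(node.is_valid_at(datetime(2024, 1, 15)))  # inside the validity period
print(node.is_valid_at(datetime(2024, 3, 1)))   # after valid_to
```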

Invalidating Nodes

When findings are superseded or corrected:

# Mark node as no longer valid
graph.invalidate_node(node_id)

# This sets valid_to = now
# Node is still queryable but marked as historical

Temporal Queries

Query the graph state at any point in time:

from datetime import datetime

# What failures did we know about on January 15, 2024?
deployment_time = datetime(2024, 1, 15, 10, 30)
known_failures = graph.snapshot_at(
    deployment_time,
    node_type=NodeType.FAILURE_MODE
)

# What was the graph state when we deployed?
graph_state = graph.snapshot_at(deployment_time)
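
Conceptually, a snapshot is a filter over validity periods. The sketch below is a standalone, in-memory stand-in and not Tinman's implementation: the `Record` tuple layout, the sample data, and the half-open interval rule are all assumptions made for the example.

```python
from datetime import datetime
from typing import Optional

# Each record: (node_id, node_type, valid_from, valid_to); valid_to None = still valid.
Record = tuple[str, str, datetime, Optional[datetime]]

NODES: list[Record] = [
    ("f1", "failure_mode", datetime(2024, 1, 1), datetime(2024, 1, 10)),
    ("f2", "failure_mode", datetime(2024, 1, 5), None),
    ("h1", "hypothesis", datetime(2024, 1, 2), None),
    ("f3", "failure_mode", datetime(2024, 1, 20), None),
]


def snapshot_at(nodes: list[Record], t: datetime, node_type: Optional[str] = None) -> list[str]:
    """Return IDs of nodes whose validity period contains t, optionally filtered by type."""
    return [
        node_id
        for node_id, kind, valid_from, valid_to in nodes
        if valid_from <= t
        and (valid_to is None or t < valid_to)
        and (node_type is None or kind == node_type)
    ]


t = datetime(2024, 1, 15, 10, 30)
print(snapshot_at(NODES, t, node_type="failure_mode"))
print(snapshot_at(NODES, t))
```

In this data, `f1` was invalidated before the query time and `f3` did not exist yet, so only `f2` appears in the failure snapshot.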

Use Cases for Temporal Queries

  1. Forensic Analysis: "What did we know when the incident occurred?"
  2. Deployment Auditing: "Were known issues addressed before release?"
  3. Trend Analysis: "Are failures increasing or decreasing?"
  4. Compliance: "What was our risk posture at audit time?"
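
As an illustration of the deployment-auditing question, the sketch below lists failures that were already known at release time but had no intervention linked to them by then. It is a self-contained approximation over made-up data. `unaddressed_at` is a hypothetical helper, and using the edge's `created_at` to decide when a failure counts as addressed is an assumption.

```python
from datetime import datetime

# Failure IDs mapped to valid_from; made-up data for the example.
FAILURES = {
    "f1": datetime(2024, 1, 2),
    "f2": datetime(2024, 1, 10),
    "f3": datetime(2024, 1, 20),
}

# ADDRESSED_BY edges as (failure_id, intervention_id, created_at).
ADDRESSED_BY = [
    ("f1", "int_1", datetime(2024, 1, 5)),
    ("f2", "int_2", datetime(2024, 1, 18)),
]


def unaddressed_at(t: datetime) -> list[str]:
    """Failures known at time t that had no intervention linked by then."""
    known = [fid for fid, valid_from in FAILURES.items() if valid_from <= t]
    addressed = {fid for fid, _intervention, created_at in ADDRESSED_BY if created_at <= t}
    return [fid for fid in known if fid not in addressed]


release_time = datetime(2024, 1, 15)
print(unaddressed_at(release_time))
```

Here `f2` is reported: it was known five days before the release, but its intervention was only linked three days after it.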

Core Operations

Node Operations

from tinman.memory.graph import MemoryGraph
from tinman.memory.models import Node, NodeType

# Assumes `session` is an existing database session (see Examples)
graph = MemoryGraph(session)

# Add a node
node = Node(
    node_type=NodeType.HYPOTHESIS,
    data={"target_surface": "reasoning", "expected_failure": "goal_drift"}
)
node_id = graph.add_node(node)

# Get a node
node = graph.get_node(node_id)

# Invalidate a node (soft delete)
graph.invalidate_node(node_id)

Edge Operations

from tinman.memory.models import Edge, EdgeRelation

# Add an edge
edge = Edge(
    src_id=hypothesis_id,
    dst_id=experiment_id,
    relation=EdgeRelation.TESTED_IN
)
edge_id = graph.add_edge(edge)

# Or use the convenience method
edge = graph.link(
    src_id=hypothesis_id,
    dst_id=experiment_id,
    relation=EdgeRelation.TESTED_IN,
    metadata={"design_version": 1}
)

# Get an edge
edge = graph.get_edge(edge_id)

Query Operations

Get by Type

# Get all hypotheses
hypotheses = graph.get_hypotheses(valid_only=True, limit=100)

# Get all experiments
experiments = graph.get_experiments(valid_only=True)

# Get all failures
failures = graph.get_failures(valid_only=True)

# Get all interventions
interventions = graph.get_interventions(valid_only=True)

Get Neighbors

# Get nodes connected by outgoing edges
related = graph.get_neighbors(
    node_id=failure_id,
    relation=EdgeRelation.ADDRESSED_BY,
    direction="outgoing"
)

# Get nodes connected by incoming edges
# (CAUSED_BY points Effect -> Cause, so these are the effects of this failure)
effects = graph.get_neighbors(
    node_id=failure_id,
    relation=EdgeRelation.CAUSED_BY,
    direction="incoming"
)
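
To make the direction argument concrete, here is a small in-memory sketch of a neighbor lookup. It is not Tinman's implementation. The edge triples and IDs are invented, and the function only mirrors the call shape used above: "outgoing" follows edges where the node is the source, and "incoming" follows edges where it is the destination.

```python
from typing import Optional

# Edges as (src_id, dst_id, relation) triples; IDs are made up for the example.
EDGES = [
    ("fail_1", "int_1", "addressed_by"),
    ("fail_1", "int_2", "addressed_by"),
    ("fail_1", "run_7", "observed_in"),
    ("dep_1", "fail_1", "regressed_as"),
]


def get_neighbors(node_id: str, relation: Optional[str] = None, direction: str = "outgoing") -> list[str]:
    """Return IDs reachable over one edge in the given direction."""
    if direction not in ("outgoing", "incoming"):
        raise ValueError(f"unknown direction: {direction!r}")
    result = []
    for src, dst, rel in EDGES:
        if relation is not None and rel != relation:
            continue
        if direction == "outgoing" and src == node_id:
            result.append(dst)
        elif direction == "incoming" and dst == node_id:
            result.append(src)
    return result


print(get_neighbors("fail_1", "addressed_by", "outgoing"))
print(get_neighbors("fail_1", "regressed_as", "incoming"))
```

Because every relation has a fixed From and To side, choosing the wrong direction silently returns an empty list rather than raising an error.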

Search by Data

# Find all S3 severity failures
critical_failures = graph.search(
    data_filter={"severity": "S3"},
    node_type=NodeType.FAILURE_MODE
)

# Find interventions with block risk tier
blocked = graph.search(
    data_filter={"risk_tier": "block"},
    node_type=NodeType.INTERVENTION
)

# Find unresolved failures
unresolved = graph.find_unresolved_failures()

# Find high-severity failures
severe = graph.find_failures_by_severity(min_severity="S3")

# Find interventions by risk
risky = graph.find_interventions_by_risk(risk_tier="review")
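
The difference between an exact-match data filter and a minimum-severity query can be sketched as follows. This is a standalone illustration, not Tinman's code. It assumes severity labels have the form `S<number>` with higher numbers being more severe, and the `S4` value in the sample data is invented to show the "S3 and above" case. `matches`, `severity_rank`, and `at_least` are hypothetical helpers.

```python
FAILURES = [
    {"id": "f1", "severity": "S1", "is_resolved": False},
    {"id": "f2", "severity": "S3", "is_resolved": False},
    {"id": "f3", "severity": "S4", "is_resolved": True},
]


def matches(data: dict, data_filter: dict) -> bool:
    """Exact-match filter: every key in data_filter must equal the value in data."""
    return all(data.get(key) == value for key, value in data_filter.items())


def severity_rank(label: str) -> int:
    """Assumes labels of the form 'S<number>' where a higher number is more severe."""
    return int(label.lstrip("S"))


def at_least(failures: list[dict], min_severity: str) -> list[str]:
    """IDs of failures whose severity is at or above min_severity."""
    floor = severity_rank(min_severity)
    return [f["id"] for f in failures if severity_rank(f["severity"]) >= floor]


exact = [f["id"] for f in FAILURES if matches(f, {"severity": "S3"})]
print(exact)
print(at_least(FAILURES, "S3"))
```

The exact filter returns only the failure labeled `S3`, while the threshold query also picks up the more severe one.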

Temporal Queries

from datetime import datetime

# Get state at a specific time
past_failures = graph.snapshot_at(
    timestamp=datetime(2024, 1, 15),
    node_type=NodeType.FAILURE_MODE
)

# Track failure evolution
evolution = graph.get_failure_evolution(failure_class="goal_drift")

Lineage Tracking

Track the causal chain from effect to root cause.

Get Lineage

# Get full causal lineage
lineage = graph.get_lineage(failure_id, max_depth=10)

# Returns: [(node, edge), (node, edge), ...]
# Starting from the failure back to root causes

for node, edge in lineage:
    print(f"{node.node_type}: {node.data}")
    print(f"  via: {edge.relation}")
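
A lineage walk of this kind can be approximated as a bounded breadth-first traversal toward upstream nodes. The sketch below is a self-contained stand-in, not Tinman's algorithm. It returns (node ID, relation) pairs instead of node and edge objects, the edge data mirrors the chain shown in the Full Lineage example in this section, and the `UPSTREAM_IS_DST` rule for deciding which end of an edge is upstream is an assumption.

```python
from collections import deque

# (src_id, dst_id, relation) triples mirroring the example lineage; IDs are illustrative.
EDGES = [
    ("hypothesis_goal_drift", "experiment_001", "tested_in"),
    ("experiment_001", "run_exp_001_5", "executed_as"),
    ("goal_drift_v1", "run_exp_001_5", "observed_in"),
    ("goal_drift_v1", "goal_drift_v2", "evolved_into"),
    ("goal_drift_v2", "goal_drift_v3", "evolved_into"),
]

# Assumption: for these relations the upstream (earlier) node is the edge destination;
# for every other relation it is the edge source.
UPSTREAM_IS_DST = {"observed_in", "caused_by"}


def get_lineage(start: str, max_depth: int = 10) -> list[tuple[str, str]]:
    """Breadth-first walk toward upstream nodes, returning (node_id, relation) pairs."""
    lineage: list[tuple[str, str]] = []
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        current, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand nodes at the depth limit
        for src, dst, rel in EDGES:
            if rel in UPSTREAM_IS_DST:
                upstream = dst if src == current else None
            else:
                upstream = src if dst == current else None
            if upstream is not None and upstream not in seen:
                seen.add(upstream)
                lineage.append((upstream, rel))
                queue.append((upstream, depth + 1))
    return lineage


for node_id, relation in get_lineage("goal_drift_v3"):
    print(f"{node_id} via {relation}")
```

The `seen` set keeps the walk from looping if the graph contains a cycle, and `max_depth` bounds how far back the walk goes.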

Failure Evolution

Track how failures mutate over time:

# Get evolution of a failure class
evolution = graph.get_failure_evolution("goal_drift")

# Shows: Original failure -> Mutation 1 -> Mutation 2 -> ...
for failure in evolution:
    print(f"{failure.created_at}: {failure.data['severity']}")
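
A simplified view of an evolution query is "all failures of one class, oldest first". The sketch below illustrates that reading over made-up records. `failure_evolution` is a hypothetical helper, and matching on `secondary_class` is an assumption, since this section does not say which field `get_failure_evolution` compares against.

```python
from datetime import datetime

FAILURES = [
    {"id": "goal_drift_v2", "secondary_class": "goal_drift", "severity": "S2",
     "created_at": datetime(2024, 2, 1)},
    {"id": "goal_drift_v1", "secondary_class": "goal_drift", "severity": "S1",
     "created_at": datetime(2024, 1, 5)},
    {"id": "param_err_1", "secondary_class": "parameter_error", "severity": "S2",
     "created_at": datetime(2024, 1, 20)},
    {"id": "goal_drift_v3", "secondary_class": "goal_drift", "severity": "S3",
     "created_at": datetime(2024, 3, 10)},
]


def failure_evolution(failures: list[dict], failure_class: str) -> list[dict]:
    """Failures of one class, ordered from oldest to newest."""
    members = [f for f in failures if f["secondary_class"] == failure_class]
    return sorted(members, key=lambda f: f["created_at"])


for failure in failure_evolution(FAILURES, "goal_drift"):
    print(f"{failure['created_at'].date()}: {failure['id']} ({failure['severity']})")
```

Sorting by creation time shows the severity trend directly; following `EVOLVED_INTO` edges instead would give the exact parent-child chain.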

Example: Full Lineage

Failure: goal_drift_v3 (S3)
    └── EVOLVED_INTO from: goal_drift_v2 (S2)
        └── EVOLVED_INTO from: goal_drift_v1 (S1)
            └── OBSERVED_IN: run_exp_001_5
                └── EXECUTED_AS: experiment_001
                    └── TESTED_IN: hypothesis_goal_drift

Recording Findings

Convenience methods for recording research findings.

Record Hypothesis

node = graph.record_hypothesis(
    target_surface="context_window",
    expected_failure="attention_dilution",
    confidence=0.7,
    priority="high"
)

Record Experiment

node = graph.record_experiment(
    hypothesis_id=hypothesis.id,
    stress_type="context_overflow",
    mode="iterative",
    constraints={
        "max_tokens": 200000,
        "timeout_seconds": 600,
    }
)
# Automatically links to hypothesis via TESTED_IN edge

Record Failure

node = graph.record_failure(
    run_id=run.id,
    primary_class="long_context",
    secondary_class="attention_dilution",
    severity="S2",
    trigger_signature=["stress:context_overflow", "high_tool_usage"],
    reproducibility=0.8,
    parent_failure_id=None  # Set if this evolved from another
)
# Automatically links to run via OBSERVED_IN edge
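
A convenience method of this kind bundles node creation with the edges that belong to it. The sketch below shows that pattern with plain in-memory structures. It is not Tinman's implementation: `add_node`, the ID format, and the reduced parameter list are invented for the example, and creating an `EVOLVED_INTO` edge from `parent_failure_id` is an assumption based on the relation table.

```python
import uuid
from typing import Optional

nodes: dict[str, dict] = {}
edges: list[tuple[str, str, str]] = []  # (src_id, dst_id, relation)


def add_node(node_type: str, data: dict) -> str:
    """Store a node under a freshly generated ID and return that ID."""
    node_id = f"{node_type}_{uuid.uuid4().hex[:8]}"
    nodes[node_id] = {"node_type": node_type, "data": data}
    return node_id


def record_failure(run_id: str, severity: str, parent_failure_id: Optional[str] = None) -> str:
    """Create a failure node, link it to its run, and link it from its parent if given."""
    failure_id = add_node("failure_mode", {"severity": severity, "is_resolved": False})
    edges.append((failure_id, run_id, "observed_in"))  # Failure -> Run
    if parent_failure_id is not None:
        edges.append((parent_failure_id, failure_id, "evolved_into"))  # parent -> child
    return failure_id


run_id = add_node("run", {"run_number": 1})
first = record_failure(run_id, "S1")
second = record_failure(run_id, "S2", parent_failure_id=first)
print(len(nodes), len(edges))
```

Keeping node creation and linking in one call means a failure can never exist in the graph without its `OBSERVED_IN` edge.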

Record Intervention

node = graph.record_intervention(
    failure_id=failure.id,
    intervention_type="context_limit",
    payload={
        "max_tokens": 100000,
        "strategy": "recency_weighted"
    },
    expected_gains={"failure_reduction": 0.5},
    expected_regressions={"capability_reduction": 0.2},
    risk_tier="review"
)
# Automatically links to failure via ADDRESSED_BY edge

Record Deployment

node = graph.record_deployment(
    intervention_id=intervention.id,
    mode="production",
    rollback_state={"previous_config": {...}}
)
# Automatically links to intervention via DEPLOYED_AS edge

Record Rollback

node = graph.record_rollback(
    deployment_id=deployment.id,
    reason="Latency regression detected",
    regression_failure_id=new_failure.id  # Optional
)
# Links deployment -> rollback and deployment -> regression failure

Examples

Complete Research Cycle Recording

from tinman.memory.graph import MemoryGraph
from tinman.memory.models import NodeType, EdgeRelation
from sqlalchemy.orm import Session

# Initialize graph with a database session
# (`session` is an existing sqlalchemy.orm.Session; creating it is not shown here)
graph = MemoryGraph(session)

# 1. Record hypothesis
hypothesis = graph.record_hypothesis(
    target_surface="tool_use",
    expected_failure="parameter_injection",
    confidence=0.7,
    priority="high"
)
print(f"Hypothesis: {hypothesis.id}")

# 2. Record experiment
experiment = graph.record_experiment(
    hypothesis_id=hypothesis.id,
    stress_type="tool_injection",
    mode="adversarial",
    constraints={"max_retries": 3}
)
print(f"Experiment: {experiment.id}")

# 3. Record failure discovered in experiment
failure = graph.record_failure(
    run_id=experiment.id,  # Simplified - normally would link to Run
    primary_class="tool_use",
    secondary_class="parameter_error",
    severity="S2",
    trigger_signature=["injection:path_traversal"],
    reproducibility=0.6
)
print(f"Failure: {failure.id}")

# 4. Record intervention
intervention = graph.record_intervention(
    failure_id=failure.id,
    intervention_type="guardrail",
    payload={"validation": "strict_schema"},
    expected_gains={"failure_reduction": 0.7},
    expected_regressions={"latency_increase": 0.1},
    risk_tier="safe"
)
print(f"Intervention: {intervention.id}")

# 5. Record deployment
deployment = graph.record_deployment(
    intervention_id=intervention.id,
    mode="production",
    rollback_state={"previous_validation": "none"}
)
print(f"Deployment: {deployment.id}")

# Query the complete lineage
lineage = graph.get_lineage(failure.id)
print("\nFailure Lineage:")
for node, edge in lineage:
    print(f"  {node.node_type.value}: {node.id}")

Temporal Analysis

from datetime import datetime

# Get failures known at deployment time
deployment_time = datetime(2024, 1, 15, 10, 0)
known_failures = graph.snapshot_at(
    deployment_time,
    node_type=NodeType.FAILURE_MODE
)

print(f"Failures known at deployment: {len(known_failures)}")

# Get failures discovered after deployment
now = datetime.utcnow()
recent_failures = graph.search(
    {"created_after": deployment_time.isoformat()},
    node_type=NodeType.FAILURE_MODE
)

print(f"New failures since deployment: {len(recent_failures)}")

# Check if any deployment caused a regression
# (REGRESSED_AS edges start at Deployment nodes, so query deployments, not interventions)
deployments = graph.snapshot_at(now, node_type=NodeType.DEPLOYMENT)
for dep in deployments:
    regressions = graph.get_neighbors(
        dep.id,
        relation=EdgeRelation.REGRESSED_AS,
        direction="outgoing"
    )
    if regressions:
        print(f"Deployment {dep.id} caused {len(regressions)} regressions")

Evolution Tracking

# Find all failures in the goal_drift family
goal_drift_failures = graph.search(
    {"primary_class": "reasoning", "secondary_class": "goal_drift"},
    node_type=NodeType.FAILURE_MODE
)

# Track their evolution
for failure in goal_drift_failures:
    # Find what this failure evolved from
    parents = graph.get_neighbors(
        failure.id,
        relation=EdgeRelation.EVOLVED_INTO,
        direction="incoming"
    )

    # Find what this failure evolved into
    children = graph.get_neighbors(
        failure.id,
        relation=EdgeRelation.EVOLVED_INTO,
        direction="outgoing"
    )

    print(f"Failure {failure.id}:")
    print(f"  Evolved from: {[p.id for p in parents]}")
    print(f"  Evolved into: {[c.id for c in children]}")

Graph Statistics

stats = graph.get_stats()
print("Memory Graph Statistics:")
for node_type, count in stats.items():
    print(f"  {node_type}: {count}")

# Output:
# Memory Graph Statistics:
#   hypothesis: 45
#   experiment: 120
#   run: 600
#   failure_mode: 32
#   intervention: 28
#   deployment: 15
#   rollback: 3
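
Statistics of this shape amount to counting nodes per type. A minimal sketch using `collections.Counter` over a made-up list of node types follows. `count_by_type` is a hypothetical helper, and whether Tinman's `get_stats` counts invalidated nodes is not stated here.

```python
from collections import Counter

# Node types of every stored node; made-up data for the example.
node_types = ["hypothesis", "run", "run", "failure_mode", "run", "hypothesis"]


def count_by_type(types: list[str]) -> dict[str, int]:
    """Count nodes per node type, keyed in order of first appearance."""
    return dict(Counter(types))


stats = count_by_type(node_types)
for node_type, count in stats.items():
    print(f"  {node_type}: {count}")
```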

Summary

The Memory Graph provides:

Capability           Description
-------------------  -------------------------------------
Node Storage         Typed nodes for all research entities
Edge Relations       Causal and logical relationships
Temporal Versioning  Query state at any point in time
Lineage Tracking     Trace from effect to root cause
Search               Find nodes by data attributes
Evolution Tracking   See how failures mutate

This enables:

  • Forensic Analysis - What did we know when?
  • Knowledge Accumulation - Learn from all research
  • Pattern Recognition - Find recurring failures
  • Audit Trail - Complete history of findings


Next Steps