How to See Inside Your AI Model in 3 Lines of Python


I built a tool that makes any PyTorch model inspectable with one line of code. No retraining, no architecture changes, no extra memory. Here's how it works.

The Problem

You train a model. It works. But why does it work? Which layers matter? Are any neurons dead? What are the attention heads actually doing?

Most interpretability tools try to answer these questions after the fact -- approximations bolted onto a black box. I wanted something different: exact traces of what actually happened inside the model during inference.

The Solution: 3 Lines

pip install hdna-workbench[pytorch]
import workbench

model = workbench.inspect(model)   # swap layers for inspectable versions
output = model(input)              # same math, same output
traces = workbench.trace(model)    # see what every layer did

That's it. workbench.inspect() walks your model and replaces each layer with a subclass that records what happens during forward passes. nn.Linear becomes InspectableLinear, nn.MultiheadAttention becomes InspectableMultiheadAttention, etc.

Because they're subclasses:

  • isinstance(layer, nn.Linear) is still True
  • model.state_dict() works unchanged
  • torch.save(model) works unchanged
  • Output is numerically identical
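The swap itself is simple in principle. Here's a minimal sketch of the idea with stock PyTorch -- `RecordingLinear` and `swap_linears` are illustrative names, not Workbench internals. Reassigning `__class__` to a state-free subclass keeps the weights, the `isinstance` checks, and the math, while adding recording:

```python
import torch
import torch.nn as nn

class RecordingLinear(nn.Linear):
    """nn.Linear subclass that counts calls and remembers output shapes."""
    def forward(self, x):
        out = super().forward(x)       # exact same math as nn.Linear
        self.call_count = getattr(self, "call_count", 0) + 1
        self.last_shape = tuple(out.shape)
        return out

def swap_linears(module):
    """Recursively retag nn.Linear children as RecordingLinear, in place."""
    for child in module.children():
        if type(child) is nn.Linear:
            child.__class__ = RecordingLinear   # keeps weights; adds recording
        swap_linears(child)
    return module

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
swap_linears(model)
model(torch.randn(3, 4))
print(isinstance(model[0], nn.Linear))      # True -- still a Linear
print(model[0].call_count, model[0].last_shape)   # 1 (3, 8)
```

The `__class__` reassignment only works because the subclass adds no constructor state; attributes like `call_count` are created lazily on the first forward pass.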

What You Get

Here's real output from a small transformer:

embedding                   calls=1  shape=  (2, 32, 128)  time=0.08ms
layers.0.self_attn          calls=1  shape=  (2, 32, 128)  time=2.29ms
  Head entropy:  ['2.922', '2.987', '2.984', '2.970']
  Head sharpness: ['0.173', '0.159', '0.152', '0.156']
  Head redundancy: 0.5618
layers.0.linear1            calls=1  shape=  (2, 32, 256)  time=0.07ms
  Weights: mean=0.0001 std=0.0511 sparsity=0.0%
layers.1.self_attn          calls=1  shape=  (2, 32, 128)  time=0.66ms
  Head redundancy: 0.8213
norm                        calls=1  shape=  (2, 32, 128)  time=0.03ms
head                        calls=1  shape= (2, 32, 1000)  time=0.12ms

Per-layer timing. Attention head entropy (how spread out the attention is). Head redundancy (how similar heads are to each other). Weight statistics. All automatic.

Going Deeper

Find Dead Neurons and Anomalies

anomalies = workbench.anomalies(model)
for a in anomalies:
    print(f"WARNING: {a['layer']} -- {a['issue']}")
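The post doesn't spell out which checks the anomaly scan runs, but a "dead neuron" check is typically something like the following: a unit whose post-activation output never moves away from zero across a probe batch. `dead_neurons` here is a hypothetical helper, not the Workbench API:

```python
import torch
import torch.nn as nn

def dead_neurons(activations, tol=0.0):
    """Indices of output units whose recorded post-activation values
    never exceed tol in absolute value; activations: (batch, features)."""
    alive = activations.abs().amax(dim=0) > tol
    return (~alive).nonzero(as_tuple=True)[0].tolist()

# Probe a tiny ReLU layer: a strongly negative bias kills a unit outright
torch.manual_seed(0)
layer = nn.Sequential(nn.Linear(8, 4), nn.ReLU())
layer[0].bias.data[2] = -100.0           # unit 2 can never fire on small inputs
acts = layer(torch.randn(256, 8))
print(dead_neurons(acts))                # [2]
```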

Inspect Attention Patterns

for name, module in model.named_modules():
    if hasattr(module, 'attention_weights'):
        heads = module.head_summary()
        for h in heads:
            print(f"Head {h['head']}: entropy={h['entropy']:.3f}")

Track Embedding Usage

for name, module in model.named_modules():
    if hasattr(module, 'most_accessed'):
        print(f"Top tokens: {module.most_accessed(10)}")
        print(f"Never accessed: {len(module.never_accessed())} tokens")
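`most_accessed` and `never_accessed` suggest the inspectable embedding keeps per-row lookup counts. A hypothetical `CountingEmbedding` with the same two methods could look like this -- a sketch, not the library's implementation:

```python
import torch
import torch.nn as nn

class CountingEmbedding(nn.Embedding):
    """nn.Embedding subclass that tallies how often each row is looked up."""
    def forward(self, idx):
        if getattr(self, "access_counts", None) is None:
            self.access_counts = torch.zeros(self.num_embeddings, dtype=torch.long)
        self.access_counts += torch.bincount(
            idx.flatten(), minlength=self.num_embeddings)
        return super().forward(idx)

    def most_accessed(self, k):
        return self.access_counts.topk(k).indices.tolist()

    def never_accessed(self):
        return (self.access_counts == 0).nonzero(as_tuple=True)[0].tolist()

emb = CountingEmbedding(10, 4)
emb(torch.tensor([[1, 1, 3], [1, 3, 5]]))
print(emb.most_accessed(2))          # [1, 3]
print(len(emb.never_accessed()))     # 7
```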

Set Breakpoints

# Halt when output magnitude exceeds threshold
layer.add_breakpoint(lambda l, inp, out: out.abs().max() > 100)
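`add_breakpoint` is Workbench API; on stock PyTorch you can approximate the same behavior with a forward hook that raises when the predicate fires. `ActivationBreakpoint` and `add_breakpoint` below are illustrative names:

```python
import torch
import torch.nn as nn

class ActivationBreakpoint(Exception):
    """Raised to halt the forward pass when a predicate fires."""

def add_breakpoint(layer, predicate):
    """Attach a hook that raises when predicate(layer, inputs, output) is true."""
    def hook(module, inputs, output):
        if predicate(module, inputs, output):
            raise ActivationBreakpoint(f"hit in {module.__class__.__name__}")
    return layer.register_forward_hook(hook)

torch.manual_seed(0)
layer = nn.Linear(4, 4)
handle = add_breakpoint(layer, lambda l, inp, out: out.abs().max() > 100)

layer(torch.randn(2, 4))                 # small activations pass silently
try:
    layer(torch.full((2, 4), 1e6))       # huge input trips the predicate
except ActivationBreakpoint as e:
    print("halted:", e)
handle.remove()                          # detach the hook when done
```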

Control Trace Depth

from workbench import TraceDepth

workbench.set_depth(model, TraceDepth.FULL)    # activations + gradients + history
workbench.set_depth(model, TraceDepth.STATS)   # running statistics only
workbench.set_depth(model, TraceDepth.OFF)     # disable for benchmarking
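One plausible way such a depth switch works: an enum checked inside `forward`, so `OFF` skips all bookkeeping and only `FULL` pays the memory cost of activation history. A sketch with assumed semantics (`DepthAwareLinear` is not a Workbench class):

```python
import enum
import torch
import torch.nn as nn

class TraceDepth(enum.Enum):
    OFF = 0     # record nothing (benchmarking)
    STATS = 1   # running statistics only
    FULL = 2    # full activation history

class DepthAwareLinear(nn.Linear):
    depth = TraceDepth.STATS

    def forward(self, x):
        out = super().forward(x)
        if self.depth is TraceDepth.OFF:
            return out
        self.last_mean = out.mean().item()   # cheap stats above OFF
        if self.depth is TraceDepth.FULL:
            # history grows with every call -- this is the expensive mode
            self.history = getattr(self, "history", []) + [out.detach()]
        return out

layer = DepthAwareLinear(4, 4)
layer.depth = TraceDepth.FULL
layer(torch.randn(2, 4))
print(len(layer.history))   # 1
```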

Revert When Done

model = workbench.revert(model)    # back to standard PyTorch
torch.save(model, "clean.pt")     # no workbench dependency in saved model
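If the wrapper works by reassigning `__class__`, revert is the mirror image: restore the stock class and delete the recording attributes so the saved model carries no reference to the tooling. A sketch under that assumption:

```python
import torch
import torch.nn as nn

class RecordingLinear(nn.Linear):
    def forward(self, x):
        self.calls = getattr(self, "calls", 0) + 1
        return super().forward(x)

def revert(module):
    """Undo the in-place __class__ swap: restore the stock class and drop
    recording attributes so the saved model has no tooling references."""
    for m in module.modules():
        if isinstance(m, RecordingLinear):
            m.__class__ = nn.Linear
            m.__dict__.pop("calls", None)
    return module

model = nn.Sequential(nn.Linear(4, 2))
model[0].__class__ = RecordingLinear     # "inspect"
model(torch.randn(1, 4))
revert(model)                            # back to plain nn.Linear
print(type(model[0]).__name__)           # Linear
```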

It's More Than a Wrapper

The inspection wrapper is one part of a larger platform called HDNA Workbench. HDNA stands for Highly Dynamic Neural Architecture -- it includes:

  • An open-box AI engine where every neuron has persistent memory, mutable routing tables, and semantic tags. Not a black box with explanations bolted on -- transparent by design. The core runs on NumPy alone.
  • Universal adapters that connect any model (PyTorch, HuggingFace, ONNX, or API) to the same research tools
  • 6 research tools: Inspector, Decision Replay, Daemon Studio, Experiment Forge, Model Comparison, and Exporter
  • 3 built-in curricula: Math (14 phases), Language (sentiment/topic/emotion/intent), Spatial (grid pattern recognition)
  • Compliance mapping to EU AI Act, NIST AI RMF, and ISO/IEC 42001

If you just want the PyTorch inspection, pip install hdna-workbench[pytorch] and use the 3 lines above. If you want to study how AI learns from the ground up, the HDNA core is there too.

14 Supported Layer Types

Category        Layers
Core            Linear, Embedding, Sequential
Transformer     MultiheadAttention, TransformerEncoderLayer, TransformerDecoderLayer
Normalization   LayerNorm, BatchNorm1d, BatchNorm2d
Convolution     Conv1d, Conv2d
Activation      ReLU, GELU, Softmax

Custom layers: workbench.register(MyLayer, InspectableMyLayer)

Links

Feedback welcome -- especially from anyone working on model interpretability or AI compliance.

Source: dev.to
