pip install provedex: a tamper-evident black box for your Python AI agent

Your AI agent's audit log lives in a database you control. Which means you can edit it. When a regulator, an auditor, or a court later asks what the agent actually did, all anyone has is your word.

Provedex is an open-source fix for that. pip install provedex gives your Python backend a black box recorder: every agent action is cryptographically signed at the moment it happens, chained to the action before it, and written to an append-only file. Anyone with the public key can verify the whole thing offline, with no call back to you. Edit or drop a single event and the chain visibly breaks.
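To make "chained to the action before it" concrete, here is a minimal sketch of a hash chain using only the standard library. It is an illustration of the general technique, not provedex's actual ledger format: the field names (`prev`, `event`, `hash`), the all-zero genesis value, and the helper names are assumptions made for this example, and the Ed25519 signing step is left out entirely.

```python
from __future__ import annotations

import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, event: dict) -> str:
    """Hash an event together with the hash of the entry before it."""
    body = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(chain: list[dict], event: dict) -> None:
    """Link a new entry to the current tail of the chain."""
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"prev": prev, "event": event, "hash": entry_hash(prev, event)})


def first_broken_index(chain: list[dict]) -> int | None:
    """Return the index of the first entry that fails to verify, or None."""
    prev = GENESIS
    for i, entry in enumerate(chain):
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["event"]):
            return i
        prev = entry["hash"]
    return None


chain: list[dict] = []
for name in ("session_started", "model_invoked", "session_ended"):
    append(chain, {"type": name})

assert first_broken_index(chain) is None    # untouched chain verifies

chain[1]["event"]["type"] = "tool_called"   # edit one event in place
assert first_broken_index(chain) == 1       # the edited entry no longer matches its hash
```

Removing an entry from the middle is caught the same way, because the next entry's `prev` no longer matches its new neighbour. In a real ledger each entry is also signed, so an attacker cannot simply recompute the hashes after an edit.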

The native SDK is a PyO3 binding over the same Rust core the reference CLI uses, so a ledger you sign from Python verifies byte-for-byte with the Rust verifier. It is not a reimplementation; it is the same primitive.

Install

pip install provedex

Pre-built wheels ship for CPython 3.11+ on Linux x86_64, Linux aarch64, and macOS arm64. No Rust toolchain is needed to install. Add it to the backend service that runs your agents.

How it fits your backend

provedex is a library you embed, not a service you run.

your backend (agents + automations)          an auditor, months later
  pip install provedex                          (only needs the public key)
  session.record(event)  --->  ledger.ndjson  --->  provedex verify  ->  VALID / BROKEN
  (signing key stays here)     (the evidence)       (offline, no trust in you)

Sign in-process. Wherever the agent does something worth proving, you call session.record(...). The event is signed and appended as it happens, no network hop.
The key and the ledger live on the backend host. The key is read once at startup from a path you control. The ledger is an append-only NDJSON file.
Verify anywhere, later, by anyone, with only the public key. That separation is the point: the operator never has to be trusted for the integrity of the log.

Quickstart

Events mostly carry SHA-256 digests of content rather than the raw content itself; a few fields, such as transcript and text, can hold clear text. What you hash versus keep in the clear is your call.

import hashlib
import os

import provedex


def sha256_hex(data: str | bytes) -> str:
    if isinstance(data, str):
        data = data.encode("utf-8")
    return hashlib.sha256(data).hexdigest()


# Once at startup. The key is created on first run, then reused (mode 0600 on Unix).
keypair = provedex.SigningKeypair.load_or_create(
    os.path.expanduser("~/.provedex/keys/ed25519.key")
)

# One session per conversation / agent run. Resumes if the ledger exists.
session = provedex.Session.open(
    keypair=keypair,
    ledger_path=os.path.expanduser("~/.provedex/ledger.ndjson"),
    session_id="conversation-42",
)

session.record(
    provedex.events.session_started(
        agent_id="intake-bot", model_id="gpt-4o", session_id="conversation-42"
    )
)

prompt = "Summarize the patient's chief complaint."
response = call_your_model(prompt)  # your code

session.record(
    provedex.events.model_invoked(
        model_id="gpt-4o",
        prompt_sha256=sha256_hex(prompt),
        response_sha256=sha256_hex(response),
        prompt_tokens=18,
        response_tokens=42,
    )
)

session.record(
    provedex.events.session_ended(
        reason="completed", summary_sha256=sha256_hex(response)
    )
)

That is the whole integration: open a session, record events at the points worth proving, done. The signing and chaining happen for you.

Verifying, offline, with only the public key

import os

import provedex

# Anyone with the public key can verify this ledger, offline, later.
report = provedex.verify_file(os.path.expanduser("~/.provedex/ledger.ndjson"))
assert report.ok

Or from the Rust CLI, against the same file:

provedex verify ~/.provedex/ledger.ndjson

Both paths go through one canonical-JSON encoder and one Ed25519 signature scheme, each published as a spec with byte-level test vectors. That is why the Python-signed ledger and the Rust verifier agree to the byte.
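Why a canonical encoder matters: a signature covers bytes, so two implementations must serialize the same event to the same bytes. The sketch below shows the idea with the standard library. The rules used here (sorted keys, no insignificant whitespace, UTF-8) are a common choice and an assumption for this example; provedex's published spec is the authority on its real encoding, and `canonical_bytes` is a hypothetical helper.

```python
import json


def canonical_bytes(obj) -> bytes:
    """Serialize to one deterministic byte string.

    Assumed rules: keys sorted, compact separators, UTF-8 output.
    """
    text = json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


# The same data, built with keys in a different order.
a = {"tool_name": "lookup", "latency_ms": 12, "success": True}
b = {"success": True, "latency_ms": 12, "tool_name": "lookup"}

# A default encoder preserves insertion order, so the bytes differ...
assert json.dumps(a).encode("utf-8") != json.dumps(b).encode("utf-8")

# ...while the canonical form is identical, so a signature over it is stable.
assert canonical_bytes(a) == canonical_bytes(b)
assert canonical_bytes(a) == b'{"latency_ms":12,"success":true,"tool_name":"lookup"}'
```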

The seven things you can record

The event schema is fixed and small. Seven factories cover an agent's lifecycle:

Factory	Signs
`events.session_started(agent_id, model_id, session_id)`	session open
`events.utterance_captured(audio_sha256, transcript, lang, duration_ms)`	inbound speech
`events.tool_called(tool_name, args_sha256, args_redacted)`	tool invocation
`events.tool_returned(tool_name, result_sha256, latency_ms, success)`	tool result
`events.model_invoked(model_id, prompt_sha256, response_sha256, prompt_tokens, response_tokens)`	LLM call
`events.utterance_spoken(text_sha256, text, audio_sha256)`	outbound speech
`events.session_ended(reason, summary_sha256)`	session close

A fixed schema is deliberate: a verifier in any language knows exactly what shapes to expect.
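As a rough illustration of what "knows exactly what shapes to expect" buys a verifier, the sketch below checks a decoded event against a table of expected field names. The field names are taken from the factory parameters listed above; everything else is an assumption for this example, including that every parameter is required, that the on-disk keys match the parameter names, and the `shape_problems` helper itself.

```python
# Expected fields per event type, copied from the factory parameter lists.
EXPECTED_FIELDS = {
    "session_started": {"agent_id", "model_id", "session_id"},
    "utterance_captured": {"audio_sha256", "transcript", "lang", "duration_ms"},
    "tool_called": {"tool_name", "args_sha256", "args_redacted"},
    "tool_returned": {"tool_name", "result_sha256", "latency_ms", "success"},
    "model_invoked": {
        "model_id", "prompt_sha256", "response_sha256",
        "prompt_tokens", "response_tokens",
    },
    "utterance_spoken": {"text_sha256", "text", "audio_sha256"},
    "session_ended": {"reason", "summary_sha256"},
}


def shape_problems(kind: str, payload: dict) -> list[str]:
    """Return a list of human-readable shape problems; empty means OK."""
    expected = EXPECTED_FIELDS.get(kind)
    if expected is None:
        return [f"unknown event type: {kind}"]
    present = set(payload)
    problems = [f"missing field: {name}" for name in sorted(expected - present)]
    problems += [f"unexpected field: {name}" for name in sorted(present - expected)]
    return problems


assert shape_problems("session_ended", {"reason": "completed", "summary_sha256": "ab"}) == []
assert shape_problems("session_ended", {"reason": "completed"}) == ["missing field: summary_sha256"]
```

With a closed set of seven shapes, a check like this stays a small lookup table in any language rather than a schema-negotiation problem.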

Native SDK vs the sidecar

There are two ways to integrate; pick by constraint:

Native SDK (this post): in-process, sub-millisecond signing, no extra process. Best when you can add a compiled wheel to the backend that runs your agents.
Sidecar (provedex-agent): a localhost HTTP daemon you POST events to. Best when you do not want a native extension in your runtime, or you are not in Python. It is the default integration.

Honest numbers, M4 Pro: in-process sign with no I/O is ~11 µs. Full cycle with append plus fsync on every event is ~3.8 ms, so ~261 events/sec if you fsync every write. Batch the flush if you need more throughput and can accept a wider crash window.
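The arithmetic and the trade-off can be sketched in a few lines. At roughly 3.8 ms per fsynced event, throughput is 1000 / 3.8, or about 260 events/sec, in line with the figure above. The `BatchedAppender` class below is a hypothetical, standalone illustration of flushing every N lines; it is not provedex's API, and whether and how the SDK exposes batching is not covered here.

```python
import os
import tempfile


def events_per_second(per_event_ms: float) -> float:
    """Throughput when each event costs `per_event_ms` end to end."""
    return 1000.0 / per_event_ms


class BatchedAppender:
    """Append lines to a file, calling fsync once every `batch_size` lines.

    Lines written since the last fsync may be lost on a crash: that is the
    wider crash window traded for throughput.
    """

    def __init__(self, path: str, batch_size: int) -> None:
        self._fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
        self._batch_size = batch_size
        self._pending = 0
        self.fsync_calls = 0

    def append(self, line: str) -> None:
        os.write(self._fd, line.encode("utf-8") + b"\n")
        self._pending += 1
        if self._pending >= self._batch_size:
            self.flush()

    def flush(self) -> None:
        if self._pending:
            os.fsync(self._fd)
            self.fsync_calls += 1
            self._pending = 0

    def close(self) -> None:
        self.flush()  # do not leave a partial batch unsynced
        os.close(self._fd)


path = os.path.join(tempfile.mkdtemp(), "demo.ndjson")
writer = BatchedAppender(path, batch_size=10)
for i in range(25):
    writer.append('{"n":%d}' % i)
writer.close()

# 25 lines with batch_size=10: two full batches plus one final flush on close.
assert writer.fsync_calls == 3
```

With `batch_size=1` this degenerates to fsync-per-event, the durable but slowest setting. Larger batches amortize the fsync cost across more events, at the price of losing up to `batch_size - 1` unsynced lines if the host dies.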

What this is not

Not observability, not PII redaction, not a compliance dashboard, not a blockchain. One primitive: signed, chained, third-party-verifiable evidence of what your agent did. Apache-2.0 for the core, forever.

Repo, specs, and architecture decision records: https://github.com/provedex/provedex

If you run AI agents anywhere a regulator or a court might one day ask "prove what it did," try it on one session and run the verifier. Feedback and issues welcome.