Your AI agent's audit log lives in a database you control. Which means you can edit it. When a regulator, an auditor, or a court later asks what the agent actually did, all anyone has is your word.
Provedex is an open-source fix for that. pip install provedex gives your Python backend a black box recorder: every agent action is cryptographically signed at the moment it happens, chained to the action before it, and written to an append-only file. Anyone with the public key can verify the whole thing offline, with no call back to you. Edit or drop a single event and the chain visibly breaks.
The native SDK is a PyO3 binding over the same Rust core the reference CLI uses, so a ledger you sign from Python verifies byte-for-byte with the Rust verifier. Not a reimplementation, the same primitive.
Install
pip install provedex
Pre-built wheels ship for CPython 3.11+ on Linux x86_64, Linux aarch64, and macOS arm64. No Rust toolchain is needed to install. Add it to the backend service that runs your agents.
How it fits your backend
provedex is a library you embed, not a service you run.
your backend (agents + automations)                        an auditor, months later
pip install provedex                                       (only needs the public key)
session.record(event)     --->   ledger.ndjson    --->     provedex verify -> VALID / BROKEN
(signing key stays here)         (the evidence)            (offline, no trust in you)
- Sign in-process. Wherever the agent does something worth proving, you call session.record(...). The event is signed and appended as it happens, no network hop.
- The key and the ledger live on the backend host. The key is read once at startup from a path you control. The ledger is an append-only NDJSON file.
- Verify anywhere, later, by anyone, with only the public key. That separation is the point: the operator never has to be trusted for the integrity of the log.
Quickstart
Events carry SHA-256 digests of content, not raw content. What you hash versus keep in clear is your call.
import hashlib
import os

import provedex


def sha256_hex(data: str | bytes) -> str:
    """Return the SHA-256 hex digest of a string (UTF-8 encoded) or bytes."""
    if isinstance(data, str):
        data = data.encode("utf-8")
    return hashlib.sha256(data).hexdigest()


# Once at startup. Key is created on first run, then reused (0600 on unix).
keypair = provedex.SigningKeypair.load_or_create(
    os.path.expanduser("~/.provedex/keys/ed25519.key")
)

# One session per conversation / agent run. Resumes if the ledger exists.
session = provedex.Session.open(
    keypair=keypair,
    ledger_path=os.path.expanduser("~/.provedex/ledger.ndjson"),
    session_id="conversation-42",
)

session.record(
    provedex.events.session_started(
        agent_id="intake-bot", model_id="gpt-4o", session_id="conversation-42"
    )
)

prompt = "Summarize the patient's chief complaint."
response = call_your_model(prompt)  # your code

# Only the digests of the prompt and response go into the ledger.
session.record(
    provedex.events.model_invoked(
        model_id="gpt-4o",
        prompt_sha256=sha256_hex(prompt),
        response_sha256=sha256_hex(response),
        prompt_tokens=18,
        response_tokens=42,
    )
)

session.record(
    provedex.events.session_ended(
        reason="completed", summary_sha256=sha256_hex(response)
    )
)
That is the whole integration: open a session, record events at the points worth proving, done. The signing and chaining happen for you.
Verifying, offline, with only the public key
# Anyone with the public key can verify this ledger, offline, later.
report = provedex.verify_file(os.path.expanduser("~/.provedex/ledger.ndjson"))
assert report.ok
Or from the Rust CLI, against the same file:
provedex verify ~/.provedex/ledger.ndjson
Both go through one canonical-JSON encoder and one Ed25519 signature scheme, both published as specs with byte-level test vectors. That is why the Python-signed ledger and the Rust verifier agree to the byte.
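Why a single canonical encoder matters: a signature covers bytes, not "the same JSON". The sketch below uses sorted keys and compact separators from the standard `json` module as a stand-in canonical form. That choice is an assumption for illustration only; Provedex's actual canonical encoding is defined by its published spec and may differ in its details.

```python
import hashlib
import json


def canonical_bytes(event: dict) -> bytes:
    """Stand-in canonical form: sorted keys, no extra whitespace, UTF-8."""
    return json.dumps(
        event, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


# The same logical event, built with keys in a different order.
a = {"model_id": "gpt-4o", "prompt_tokens": 18, "response_tokens": 42}
b = {"response_tokens": 42, "model_id": "gpt-4o", "prompt_tokens": 18}

# Naive serialization preserves insertion order, so the two strings differ.
naive_match = json.dumps(a) == json.dumps(b)

# The canonical form is identical, so a digest or signature over it is too.
canonical_match = canonical_bytes(a) == canonical_bytes(b)
digest_a = hashlib.sha256(canonical_bytes(a)).hexdigest()
digest_b = hashlib.sha256(canonical_bytes(b)).hexdigest()
```

Without a shared canonical form, a signer and a verifier written in different languages could disagree on the bytes of an event that both consider identical, and a valid signature would fail to verify.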
The seven things you can record
The event schema is fixed and small. Seven factories cover an agent's lifecycle:
| Factory | Signs |
|---|---|
| events.session_started(agent_id, model_id, session_id) | session open |
| events.utterance_captured(audio_sha256, transcript, lang, duration_ms) | inbound speech |
| events.tool_called(tool_name, args_sha256, args_redacted) | tool invocation |
| events.tool_returned(tool_name, result_sha256, latency_ms, success) | tool result |
| events.model_invoked(model_id, prompt_sha256, response_sha256, prompt_tokens, response_tokens) | LLM call |
| events.utterance_spoken(text_sha256, text, audio_sha256) | outbound speech |
| events.session_ended(reason, summary_sha256) | session close |
A fixed schema is deliberate: a verifier in any language knows exactly what shapes to expect.
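As one way to use the two tool factories together, here is a hedged sketch of a wrapper that records tool_called before a tool runs and tool_returned after it. The helper names `sha256_json` and `record_tool_call` are hypothetical, not part of Provedex. The sketch assumes the factories accept the keyword arguments listed in the table, and that args_redacted accepts a dict with sensitive values masked; check the spec for the real argument types. The session and events objects are passed in, so the helper has no import-time dependency on the SDK.

```python
import hashlib
import json
import time
from typing import Any, Callable


def sha256_json(value: Any) -> str:
    """SHA-256 hex digest of a value serialized as sorted-key, compact JSON."""
    body = json.dumps(value, sort_keys=True, separators=(",", ":"), default=str)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def record_tool_call(
    session: Any,
    events: Any,
    tool_name: str,
    tool: Callable[..., Any],
    args: dict,
    secret_keys: frozenset[str] = frozenset(),
) -> Any:
    """Run tool(**args), recording tool_called before and tool_returned after."""
    # Assumption: args_redacted takes a dict with sensitive values masked.
    redacted = {k: ("[REDACTED]" if k in secret_keys else v) for k, v in args.items()}
    session.record(
        events.tool_called(
            tool_name=tool_name,
            args_sha256=sha256_json(args),
            args_redacted=redacted,
        )
    )
    started = time.perf_counter()
    result, success = None, False
    try:
        result = tool(**args)
        success = True
        return result
    finally:
        # Runs on success and on exception, so a failed call is recorded too.
        latency_ms = int((time.perf_counter() - started) * 1000)
        session.record(
            events.tool_returned(
                tool_name=tool_name,
                result_sha256=sha256_json(result),
                latency_ms=latency_ms,
                success=success,
            )
        )
```

With the real SDK you would pass the session from the quickstart and provedex.events as the first two arguments. If the tool raises, the exception still propagates, but a tool_returned event with success=False is recorded first.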
Native SDK vs the sidecar
Two ways to integrate, pick by constraint:
- Native SDK (this post): in-process, sub-millisecond signing, no extra process. Best when you can add a compiled wheel to the backend that runs your agents.
- Sidecar (provedex-agent): a localhost HTTP daemon you POST events to. Best when you do not want a native extension in your runtime, or you are not in Python. It is the default integration.
Honest numbers, measured on an M4 Pro: in-process signing with no I/O is ~11 µs. The full cycle with append plus fsync on every event is ~3.8 ms, so ~261 events/sec if you fsync every write. Batch the flush if you need more throughput and can accept a wider crash window.
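The arithmetic behind that ceiling: at about 3.8 ms per fsynced event, throughput is bounded at roughly 1 / 0.0038 s, or about 260 events per second. The sketch below shows the batching trade-off in generic terms, using only the standard library. `BatchedAppender` and its `flush_every` parameter are hypothetical names for this example, not Provedex's API; how the SDK exposes flush control is not covered here.

```python
import os


class BatchedAppender:
    """Append lines to a file, calling fsync once every `flush_every` lines."""

    def __init__(self, path: str, flush_every: int = 32) -> None:
        if flush_every < 1:
            raise ValueError("flush_every must be >= 1")
        self._file = open(path, "ab")
        self._flush_every = flush_every
        self._pending = 0
        self.fsync_count = 0  # exposed so the cost of durability is visible

    def append(self, line: bytes) -> None:
        """Write one newline-terminated line; fsync when the batch is full."""
        self._file.write(line.rstrip(b"\n") + b"\n")
        self._pending += 1
        if self._pending >= self._flush_every:
            self.sync()

    def sync(self) -> None:
        """Push buffered lines to the OS and ask it to persist them to disk."""
        if self._pending == 0:
            return
        self._file.flush()
        os.fsync(self._file.fileno())
        self.fsync_count += 1
        self._pending = 0

    def close(self) -> None:
        """Persist anything still pending, then close the file."""
        self.sync()
        self._file.close()
```

With flush_every=32 the fsync cost is paid once per 32 events instead of once per event. The price is that a crash can lose up to 31 events that were written but not yet persisted, which is the wider crash window mentioned above.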
What this is not
Not observability, not PII redaction, not a compliance dashboard, not a blockchain. One primitive: signed, chained, third-party-verifiable evidence of what your agent did. Apache-2.0 for the core, forever.
Repo, specs, and architecture decision records: https://github.com/provedex/provedex
If you run AI agents anywhere a regulator or a court might one day ask "prove what it did," try it on one session and run the verifier. Feedback and issues welcome.