TL;DR: Spring AI, LangChain4j, Koog (Kotlin), the Python OpenLLMetry-style instrumentations, and the Go OTel SDKs all emit gen_ai.* spans natively now. So you don't need a vendor SDK to make your agent observable — you need an OTLP endpoint that knows what to do with the spans your framework is already throwing on the wire. Here's what that looks like in a few lines of YAML or one Kotlin extension.
Someone on your team shipped an LLM agent two months ago. Today it ran up a $400 bill in twenty minutes, hallucinated a refund policy to a real customer, and got stuck in a tool-calling loop that retried the same broken payments.create call seventeen times before the rate limiter caught it.
You'd like to know which of those things happened first, which agent was responsible (you're up to four now), what the user typed, and why the planner decided that calling the payments API was a reasonable response to "how do I unsubscribe."
If you're observing your agents the same way you observe your other services, you can answer maybe two of those questions, and only after a long Slack thread with the engineer who wrote the prompt. The trace span called POST /chat doesn't help you. Neither does the p99 latency metric on /v1/agent/run.
This post is about why that gap exists, why it's about to close, and what to do about it today.
The agent observability gap
Two existing approaches sort of work and mostly don't:
Generic APM (Datadog, New Relic, Honeycomb-as-default-config) treats your agent like any other HTTP service. You get latency histograms, error rates, and a top-level span. You don't get the prompt, the model, the token counts, the tool calls, or the cost. The signal is buried under "request body" or not captured at all.
Vendor LLM-observability SDKs (Langfuse, Helicone, Phoenix, the proprietary ones) capture all the right signals but ship as a heavy SDK that you bolt onto your service. Every framework upgrade is now a coordination problem. Every backend switch is a rewrite. And the more frameworks your stack uses (Spring AI for the orchestrator, LangChain4j for the RAG service, Koog for that one Kotlin pilot), the more SDKs you carry.
Neither is the right shape. The right shape is: your framework emits standard signal, your backend understands standard signal, you change zero application code.
Until recently that wasn't possible. The OpenTelemetry community had been working on the gen_ai.* semantic conventions for a year, but framework support was uneven and the conventions kept shifting.
That changed in the last six months. Concretely:
- Spring AI 1.0 emits gen_ai.client.chat, gen_ai.tool.execute, and friends via Micrometer Observations.
- LangChain4j emits the same via its ChatModelListener API.
- Koog 0.8 ships a first-class OpenTelemetry feature with addDatadogExporter, addLangfuseExporter, addWeaveExporter.
- Python OpenLLMetry / OpenInference instrumentations (Anthropic, OpenAI, LangChain, LlamaIndex) emit the same conventions and stream through standard OTel exporters.
- Go's otel-instrumentation-genai is in alpha with the same shape.
In other words: the signal is on the wire, in standard form, regardless of which framework your team picked. What's missing is a backend that does something useful with it.
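For reference, here's roughly what one of those spans looks like if you build it by hand with the plain OTel API. This is a sketch, not any framework's actual emitter: the attribute keys are the real gen_ai semantic-convention names, but the span name, model, and token counts are invented for illustration.

// A hand-rolled span in the shape the frameworks above emit automatically.
// Attribute keys follow the OTel gen_ai semantic conventions; the values
// (model name, token counts) are made up for this example.
import io.opentelemetry.api.GlobalOpenTelemetry

fun emitDemoChatSpan() {
    val tracer = GlobalOpenTelemetry.getTracer("genai-demo")
    val span = tracer.spanBuilder("chat claude-sonnet-4") // semconv span name: "{operation} {model}"
        .setAttribute("gen_ai.operation.name", "chat")
        .setAttribute("gen_ai.system", "anthropic")
        .setAttribute("gen_ai.request.model", "claude-sonnet-4")
        .startSpan()
    // Usage attributes get attached once the response comes back.
    span.setAttribute("gen_ai.usage.input_tokens", 812L)
    span.setAttribute("gen_ai.usage.output_tokens", 233L)
    span.end()
}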
The four-line Spring AI version
Stock Spring Boot 3.5 + Spring AI 1.0. No vendor SDK on the classpath. Just the standard OTel pieces (micrometer-tracing-bridge-otel, opentelemetry-exporter-otlp).
In application.yml:
management:
  otlp:
    tracing:
      endpoint: ${JAMJET_API_URL}/v1/otlp/v1/traces
      headers:
        authorization: "Bearer ${JAMJET_API_KEY}"
  tracing:
    sampling:
      probability: 1.0
That's it. Spring AI's gen_ai.client.chat spans get serialized as standard OTLP/HTTP-protobuf and posted to a JamJet Cloud project. Demo: jamjet-runtime-java/examples/spring-ai-engram-cloud-demo.
The same pattern works against any OTLP-aware backend. The endpoint URL is the only thing that varies.
The Kotlin Koog one-liner
AIAgent(...) {
    install(OpenTelemetry) {
        setServiceInfo(serviceName = "memory-agent", serviceVersion = "1.0")
        addJamjetCloudExporter() // reads JAMJET_API_KEY from env
    }
}
About 20 lines wrapping the standard OtlpHttpSpanExporter. Demo: jamjet-runtime-java/examples/kotlin-koog-engram-cloud-demo. We've filed an upstream YouTrack issue proposing this lands in agents-features-opentelemetry-jvm directly.
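If you're curious what those 20 lines are, here's a minimal sketch. It assumes Koog's OpenTelemetry feature config is a receiver type named OpenTelemetryConfig that exposes an addSpanExporter(...) hook, and the default endpoint is an assumption too; everything else is the stock OTLP exporter builder.

// A minimal sketch of the extension, under the assumptions stated above.
import io.opentelemetry.exporter.otlp.http.trace.OtlpHttpSpanExporter

fun OpenTelemetryConfig.addJamjetCloudExporter(
    apiKey: String = System.getenv("JAMJET_API_KEY") ?: error("JAMJET_API_KEY is not set"),
    baseUrl: String = System.getenv("JAMJET_API_URL") ?: "https://api.jamjet.dev",
) {
    addSpanExporter(
        OtlpHttpSpanExporter.builder()
            .setEndpoint("$baseUrl/v1/otlp/v1/traces")
            .addHeader("Authorization", "Bearer $apiKey")
            .build()
    )
}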
And for Python / Go folks
The same shape works:
Python:
export OTEL_EXPORTER_OTLP_TRACES_ENDPOINT="https://api.jamjet.dev/v1/otlp/v1/traces"
export OTEL_EXPORTER_OTLP_TRACES_HEADERS="authorization=Bearer ${JAMJET_API_KEY}"
Plus pip install opentelemetry-instrumentation-anthropic (or -openai, -langchain, etc.) and a one-line instrument() call. The OpenInference and OpenLLMetry projects each ship instrumentations for the major frameworks.
Go:
Standard otelhttp for the LLM client wrapper, plus otlptracehttp.New(...) configured to point at /v1/otlp/v1/traces. The instrumentation surface is younger but moving fast.
The point: once your framework speaks gen_ai.* OTel, the only language-specific code you write is the exporter setup. And that's stock OTel boilerplate, not vendor-specific.
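On the JVM, that boilerplate looks like this. A minimal sketch, assuming the same endpoint and env vars as the YAML above; every import comes from the standard opentelemetry-sdk and opentelemetry-exporter-otlp artifacts.

// Stock OTel SDK wiring: no vendor SDK anywhere on the classpath.
import io.opentelemetry.exporter.otlp.http.trace.OtlpHttpSpanExporter
import io.opentelemetry.sdk.OpenTelemetrySdk
import io.opentelemetry.sdk.trace.SdkTracerProvider
import io.opentelemetry.sdk.trace.export.BatchSpanProcessor

val exporter = OtlpHttpSpanExporter.builder()
    .setEndpoint("${System.getenv("JAMJET_API_URL")}/v1/otlp/v1/traces")
    .addHeader("Authorization", "Bearer ${System.getenv("JAMJET_API_KEY")}")
    .build()

// Batch spans in-process and register the SDK as the global OTel instance.
val openTelemetry = OpenTelemetrySdk.builder()
    .setTracerProvider(
        SdkTracerProvider.builder()
            .addSpanProcessor(BatchSpanProcessor.builder(exporter).build())
            .build()
    )
    .buildAndRegisterGlobal()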
What you get on the other side
The interesting question, which generic OTel backends won't help you with, is what the receiver does with these spans. The signals an agent owner actually wants:
- Multi-agent network graph — every cross-agent call is a node, edges show who-called-whom with cost and latency rolled up per edge. (W3C traceparent plus a jj tracestate segment links agents across HTTP hops.)
- Cost rollups per agent / model / end-user — computed server-side from gen_ai.usage.*_tokens against current vendor pricing. No pricing table in your app. (See the sketch after this list.)
- Failure-mode pie chart — typed exception classification, not just "HTTP 5xx" buckets.
- Cross-agent identity — when a user request fans out across three agents, you see the same end-user ID stitched across all of them.
- Policy enforcement + audit export — Ed25519-signed JSON+CSV+PDF audit packages for the SOC 2-ish surface, OTLP-formatted exports for SIEM tools (Splunk, Datadog Logs).
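To make that cost rollup concrete, here's the shape of the arithmetic, as a sketch. The per-million-token rates are placeholders, not real vendor prices; the real thing keys off gen_ai.request.model and a pricing table the backend keeps current.

// Illustrative cost math over gen_ai.usage.* token counts.
// Rates are USD per million tokens as (input, output) pairs -- placeholders only.
data class ChatUsage(val model: String, val inputTokens: Long, val outputTokens: Long)

val ratesPerMillion = mapOf(
    "claude-sonnet-4" to Pair(3.00, 15.00), // placeholder rates
    "gpt-4o" to Pair(2.50, 10.00),          // placeholder rates
)

fun costUsd(usage: ChatUsage): Double {
    val (inputRate, outputRate) = ratesPerMillion[usage.model] ?: return 0.0
    return (usage.inputTokens * inputRate + usage.outputTokens * outputRate) / 1_000_000
}

// costUsd(ChatUsage("gpt-4o", 812, 233)) = (812 * 2.50 + 233 * 10.00) / 1e6 ≈ $0.0044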
If "the safety layer behind your AI agents" is something you've been trying to articulate to your CTO, that's the shape we're building for.
Why this architecture is the durable bet
Three reasons stock-OTel-plus-LLM-aware-backend beats vendor SDKs over the next two years:
- No SDK on the classpath / requirements.txt / go.mod. The exporter is already in your app for HTTP tracing — you change one URL.
- Backend-portable. Every line in those demos works against Honeycomb, Tempo, Jaeger, Datadog, or a self-hosted OTel collector. That's a real CTO-pitch argument when "vendor lock-in" comes up in the procurement review.
- The frameworks are doing the work. Spring AI's Observation handlers, LangChain4j's listeners, Koog's OpenTelemetry feature, Python OpenLLMetry instrumentations — these aren't vendor projects. They're the framework's own contracts. Every release brings new signals for free.
Try it
- Demos: github.com/jamjet-labs/jamjet-runtime-java
- Cloud sign-up: jamjet.dev (free tier)
- Spring Boot starter on Maven Central (for users who want a richer in-process path than stock OTLP):
dev.jamjet:jamjet-cloud-spring-boot-starter:0.2.0
If you're at Devoxx this week and want to see this running against a real Spring AI or Koog agent, drop me a line — happy to do a five-minute hallway walkthrough. If your stack is Python or Go and you'd like the equivalent demo for your language, that's the next post — let me know in the comments which framework so I can pick the right starting point.
If you've got opinions on what AI-agent observability should look like, especially the bits I've glossed over (multi-tenancy, on-prem, BYO collector), the comments are open.