Episode 291: Reassessing the LLM Landscape & Summoning Ghosts
The Real Python Podcast
What are the current techniques being employed to improve the performance of LLM-based systems? How is the industry shifting from post-training towards context engineering and multi-agent orchestration? This week on the show, Jodie Burchell, data scientist and Python Advocacy Team Lead at JetBrains, returns to discuss the current AI coding landscape.
In our last conversation, Jodie covered how LLMs were approaching the limits of scaling laws. This time, we recap last year’s big focus on reasoning models and a post-training method called “reinforcement learning from verifiable rewards” (RLVR). We also cover test-time compute, where models spend more time reasoning through steps and considering multiple approaches to solve a problem.
We touch on Agent Context Protocol (ACP), agent orchestration layers, and context engineering. We also share some concerns about the hype cycle, maintaining all that code being generated, and running local models.
Course Spotlight: Vector Databases and Embeddings With ChromaDB
Learn how to use ChromaDB, an open-source vector database, to store embeddings and give context to large language models in Python.
Topics:
- 00:00:00 – Introduction
- 00:02:02 – Build a Language-Learning Agent course
- 00:02:55 – Update on the past six months of LLMs
- 00:05:32 – Reinforcement Learning From Verifiable Rewards
- 00:07:32 – Test-Time Compute
- 00:08:36 – 2025 and the rise of agents
- 00:14:24 – Benchmarks shifting
- 00:15:23 – Andrej Karpathy and jagged intelligence
- 00:19:16 – Not evolving or growing animals but summoning ghosts
- 00:23:34 – Diminishing gains in newer models
- 00:24:23 – Context Engineering
- 00:35:01 – Multi-agent systems and diversity of models
- 00:36:56 – Video Course Spotlight
- 00:38:34 – Current generation of coding agents
- 00:44:00 – Fast vs deep reasoning
- 00:45:18 – Agent Context Protocol
- 00:50:19 – Working through the hype cycle
- 00:55:43 – Open-source contribution pollution
- 00:57:21 – Local models
- 00:58:36 – Rick Beato comparing AI to how the music industry failed
- 01:08:41 – LLMs are an amazing development
- 01:11:33 – Keynote talk on AI summers and winters
- 01:12:45 – PyCon US and EuroPython
- 01:14:11 – Thanks and goodbye
Show Links:
- AI Agent Course - Build a Language-Learning Agent with OpenAI, LangGraph, Ollama & MCP - YouTube
- Episode #264: Large Language Models on the Edge of the Scaling Laws
- Reinforcement Learning with Verifiable Rewards Implicitly Incentivizes Correct Reasoning in Base LLMs
- Reinforcement learning with verifiable rewards (RLVR)
- What is test-time compute and how to scale it?
- Overfitting - Wikipedia
- 2025 LLM Year in Review - karpathy
- Animals vs Ghosts - karpathy
- Agent Context Protocols Enhance Collective Inference
- Open source AI we use to work on Wagtail - Wagtail CMS
- LLMs for Devs: Model Selection, Hallucinations, Agents, AGI – Jodie Burchell - The Marco Show
- Keynote - Can you trust your (large language) model? - Standard error
- The Human-in-the-Loop is Tired
- How AI Will Fail Like The Music Industry - YouTube
- “Yes, AI Is a Bubble. There Is No Question.” - The Ringer
- Keynote: AI is having its moment … again - Jodie Burchell - NDC Copenhagen 2025
- PyCon US 2026
- EuroPython 2026 - July 13th-19th 2026 - Kraków, Poland
- Jodie Burchell (@t-redactyl.bsky.social) — Bluesky
- Standard error