Python 3.15 Preview: Upgraded JIT Compiler

Python 3.15 sharpens the experimental Just-In-Time (JIT) compiler. It adds a new tracing frontend, basic register allocation, in-place int and float operations, tighter machine code, and more. In this tutorial, you’ll learn how to enable and benchmark the 3.15 JIT on your own machine and explore what makes it faster.

By the end of this tutorial, you’ll understand that:

Python 3.15 improved the experimental JIT compiler that you opt into with PYTHON_JIT=1.
The new tracing frontend records actual bytecode paths instead of estimating them, and the optimizer covers a wider set of operations.
Basic register allocation and tighter codegen reduce stack traffic and memory use.
Reference-count elimination and in-place ops speed up float-heavy numeric loops.
The official pyperformance suite reports an 8–9 percent geometric mean speedup on x86-64 Linux.

First, you’ll set up a Python 3.15 build with the JIT compiled in and run a quick benchmark to see the speedup on your own machine. Then you’ll walk through each optimizer upgrade and verify the numbers against the official pyperformance suite. The JIT is still experimental, so treat the numbers as a snapshot of where things stood in the 3.15 beta rather than a guarantee for the final release.



Set Up a Python 3.15 Build With the JIT Compiler

To follow along, you’ll need a pre-release of Python 3.15 with the JIT compiled in. At the time of this writing, the latest pre-release is 3.15.0b2. Once you have a JIT-capable build, you’ll opt into the JIT at runtime by setting the PYTHON_JIT environment variable to 1.

Since Python 3.14, the official python.org installers for Windows and macOS have shipped with the JIT compiled in, so you don’t need any manual configuration step on those platforms.

A handy path to a JIT-capable build is to use pyenv, which accepts the JIT configure flag through an environment variable. If you have pyenv installed, then you can run the following commands:

$ export PYTHON_CONFIGURE_OPTS="--enable-experimental-jit=yes-off"
$ pyenv install 3.15.0b2

With these commands, pyenv builds and installs Python 3.15.0b2 with the JIT included but switched off by default with the yes-off value. You can enable the JIT manually with PYTHON_JIT=1. If you’ve never installed a Python pre-release before, the guide How Can You Install a Pre-Release Version of Python? can walk you through the process.

Note: If you’d rather build CPython yourself, then you can pass --enable-experimental-jit=yes-off directly to ./configure and run make:

$ ./configure --enable-experimental-jit=yes-off
$ make -j

You’ll need a matching version of the LLVM compiler infrastructure installed locally because the build uses it to generate JIT stencils. The 3.15 build pins to LLVM 21.

With the 3.15.0b2 build in place, you can confirm that the JIT is active by checking sys._jit.is_enabled() with and without PYTHON_JIT set:

$ pyenv shell 3.15.0b2

$ python -c "import sys; print(sys._jit.is_enabled())"
False
$ PYTHON_JIT=1 python -c "import sys; print(sys._jit.is_enabled())"
True

In this example, you run the same one-liner twice—once without the environment variable and once with it. The sys._jit namespace is an experimental implementation detail signaled by the leading underscore, and it may change between minor releases. Treat any code that reads it as throwaway debugging code rather than production introspection.

If the command with PYTHON_JIT=1 still prints False, then the JIT wasn’t compiled into this build. Recheck your configure flags or fall back to the python.org installer.
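To tell those two failure modes apart, you can query a little more of the private namespace. The sketch below assumes the is_available() and is_enabled() functions that sys._jit exposes in 3.14 and later, and it falls back gracefully on interpreters that lack the namespace. Because sys._jit is private, treat the function names as an assumption to verify against your own build. The jit_status() helper is a hypothetical name.

```python
"""Report what JIT support the running interpreter has (debugging aid only)."""

import sys


def jit_status():
    """Return a dict describing JIT support, read from the private sys._jit namespace."""
    jit = getattr(sys, "_jit", None)  # Absent on interpreters without the namespace.
    status = {"available": False, "enabled": False}
    if jit is None:
        return status
    # is_available(): the JIT was compiled into this build.
    # is_enabled(): the JIT is switched on for this process, for example via PYTHON_JIT=1.
    status["available"] = bool(jit.is_available())
    status["enabled"] = bool(jit.is_enabled())
    return status


if __name__ == "__main__":
    status = jit_status()
    if not status["available"]:
        print("No JIT in this build: rebuild with --enable-experimental-jit.")
    elif not status["enabled"]:
        print("JIT compiled in but switched off: set PYTHON_JIT=1.")
    else:
        print("JIT compiled in and enabled.")
```

Run it with and without PYTHON_JIT=1. An "available but off" result means your build is fine and only the environment variable is missing.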

Before you spend time building 3.15, it’s worth knowing that the JIT’s long-term future is under active discussion: the Python Steering Council recently paused new JIT development. The compiler still ships in 3.15 and keeps receiving bug and security fixes, and you’ll find the full story in Decide Whether to Enable the JIT at the end of this tutorial.


Run a Quick JIT Benchmark

Before diving into what’s new in 3.15’s JIT, you can see the speedup in action. The following quick_bench.py script runs a tight, float-heavy loop wrapped in timeit, then prints whether the JIT was enabled and the average execution time per repeat:

"""Quick benchmark: compare CPython 3.15 with the JIT off vs on.

Run twice against the same 3.15 build and compare the two timings:

    PYTHON_JIT=0 python quick_bench.py
    PYTHON_JIT=1 python quick_bench.py
"""

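import sys
from timeit import timeit

ITERATIONS = 20_000_000
REPEATS = 5


def workload():
    # A tight, float-heavy loop: the best case for the 3.15 JIT.
    x = 1.0
    for _ in range(ITERATIONS):
        x = x * 1.0001
    return x


def jit_enabled():
    # sys._jit is private and may be missing on some builds, so look it up defensively.
    jit = getattr(sys, "_jit", None)
    return bool(jit and jit.is_enabled())


def main():
    # timeit() returns the total for all REPEATS calls; divide to get the average per call.
    seconds = timeit(workload, number=REPEATS) / REPEATS
    label = "JIT on" if jit_enabled() else "JIT off"
    print(f"{label}: {seconds:.2f} s")


if __name__ == "__main__":
    main()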

In this script, you wrap workload() in timeit() and average across REPEATS runs. The function is intentionally minimal and runs a tight loop that multiplies a float by a small constant. That operation is where the new JIT’s float optimization pays off, because the JIT can reuse the same float object across iterations instead of allocating a fresh one each time.

Note: A tight loop is a loop that performs minimal work on each pass and carries almost no overhead between iterations. The CPU spends nearly all its time running the loop body over and over.

You can run the script twice against the same Python 3.15 build, toggling PYTHON_JIT each time:

$ PYTHON_JIT=0 python quick_bench.py
JIT off: 1.54 s

$ PYTHON_JIT=1 python quick_bench.py
JIT on: 0.96 s

You have two numbers from the same machine and the same build: JIT off versus JIT on. In this example, dividing 1.54 by 0.96 gives about 1.6, so the JIT-on run is roughly 1.6 times faster than the JIT-off one. Cool!
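If you’d rather not copy the timings by hand, a small driver can run the benchmark under both settings and do the division for you. The sketch below is hypothetical. The names compare_jit.py, run_with_jit(), parse_seconds(), speedup(), and compare() are illustrative, and parse_seconds() assumes the "label: N.NN s" format that quick_bench.py prints.

```python
"""compare_jit.py: a hypothetical driver that runs a script with the JIT off and on."""

import os
import subprocess
import sys


def run_with_jit(args, jit):
    """Run the current interpreter with PYTHON_JIT set to 1 or 0 and return its stdout."""
    env = {**os.environ, "PYTHON_JIT": "1" if jit else "0"}
    result = subprocess.run(
        [sys.executable, *args],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def parse_seconds(line):
    """Extract 1.54 from a line such as 'JIT off: 1.54 s'."""
    return float(line.rsplit(":", 1)[1].split()[0])


def speedup(seconds_off, seconds_on):
    """How many times faster the JIT-on run was (a value above 1 means faster)."""
    return seconds_off / seconds_on


def compare(script="quick_bench.py"):
    """Run `script` twice, print both timings, and return the speedup ratio."""
    off = parse_seconds(run_with_jit([script], jit=False))
    on = parse_seconds(run_with_jit([script], jit=True))
    ratio = speedup(off, on)
    print(f"JIT off {off:.2f} s, JIT on {on:.2f} s, speedup {ratio:.1f}x")
    return ratio
```

Because the environment variable has to be set before the interpreter starts, each measurement runs in a fresh subprocess rather than toggling anything inside one process. With the numbers above, speedup(1.54, 0.96) works out to about 1.6.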


Note that this is a best-case workload because the loop is small, tight, and numeric. Most real-world Python code looks nothing like this, so your speedup will likely land much closer to the pyperformance geometric mean of 8–9 percent than to the 1.6-times speedup you just measured.

Understand the JIT-Powered Speedup

Now that you’ve seen the JIT speed up a tight loop, you can explore what’s actually doing the work in 3.15. Python 3.13 introduced the JIT as a stripped-down proof of concept. Python 3.14 widened the platform support and shipped it in the official installers. Python 3.15 is the release where the optimizer itself gets serious upgrades.

The optimizer is the part of the JIT that reworks the code it records to make it run faster. Improving it speeds up many different kinds of programs. The Upgraded JIT compiler section of the Python 3.15 documentation lists each change with its tracking issue.

Note: If you want a deeper look at the interpreter all of this builds on, Real Python’s CPython Internals book walks through the compiler, the bytecode evaluation loop, and the memory model underneath.

The reported aggregate performance gains, all preliminary at 3.15 beta, are the following:

About 8–9 percent geometric mean over the standard CPython interpreter on x86-64 Linux
About 12–13 percent geometric mean over the tail-calling interpreter on AArch64 macOS

Benchmark suites summarize results with a geometric mean rather than a plain arithmetic average because each benchmark contributes a speedup ratio. The geometric mean is the right way to average ratios, since it keeps a single extreme result from dominating the overall figure.
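A quick calculation with made-up ratios shows the difference. The numbers below are illustrative, not pyperformance results. In the first case, one benchmark runs twice as fast and another runs twice as slow, so on balance nothing changed. In the second case, a single outlier is ten times faster.

```python
from statistics import fmean, geometric_mean

# Illustrative speedup ratios (old time / new time), not real benchmark data.
# 2.0 means twice as fast, and 0.5 means twice as slow.
balanced = [2.0, 0.5]
one_outlier = [1.0, 1.0, 1.0, 10.0]

for name, ratios in [("balanced", balanced), ("one_outlier", one_outlier)]:
    print(
        f"{name}: arithmetic={fmean(ratios):.3f}, "
        f"geometric={geometric_mean(ratios):.3f}"
    )
```

For the balanced case, the arithmetic mean claims a 25 percent gain (1.250), while the geometric mean correctly reports 1.000, meaning no net change. For the outlier case, the arithmetic mean jumps to 3.250, while the geometric mean stays at about 1.778. In the same terms, an "8 percent speedup" corresponds to a geometric mean ratio of about 1.08.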

The 3.15 documentation is explicit that these numbers may shift before the final 3.15 release in October, so it’s best to treat them as a moving target.

In the following sections, you’ll walk through each user-visible upgrade that 3.15 has made to the JIT compiler, grouped into three areas: the code the JIT can trace, the machine code it emits, and the redundant work it now skips.


Trace More of Your Code

Before the optimizer can speed anything up, the JIT has to trace your code and keep it in compiled form, so the more bytecode and control flow the frontend follows, the less often your program drops back to the slower interpreter loop. The following two upgrades widen what the JIT can capture in the first place.

A New Tracing Frontend

The JIT compiler now has a tracing frontend that records the execution paths through code instead of estimating them as previous versions did. That brings a much wider set of bytecode and control flow under JIT optimization, including simple Python object creation and partial support for overloaded operations and generators.

This work is tracked in the trace-recording JIT issue.

Ken Jin’s write-up on the JIT plan reports a modest geometric mean with a wide best- to worst-case spread:

“Roughly 1.5% faster geometric mean average on pyperformance. 100% faster on the most improved benchmark (richards), and 15% slower on the slowest benchmark.”

The wins and losses cancel out to a small overall gain today, but the rewritten frontend is the change that unlocks the rest of the optimizations in this section. You’ll feel it most in code that mixes object creation, generators, or overloaded operators with hot loops—exactly the code patterns the 3.14 frontend couldn’t trace through.
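To make "records actual execution paths" concrete, here’s a toy model with hypothetical names (record_trace() and PROGRAM). It has nothing to do with CPython’s real internals. It executes one iteration of a tiny loop, appends only the operations that actually ran, and turns each branch into a guard that a compiled trace would check before trusting the recorded path.

```python
import operator

ARITH = {"add": operator.add, "mul": operator.mul}

# A toy loop body: if x > 100, halve x; then add 1; repeat.
PROGRAM = [
    ("gt", "big", "x", 100),      # 0: big = x > 100
    ("jump_if_false", "big", 3),  # 1: if not big, skip to 3
    ("mul", "x", 0.5),            # 2: x *= 0.5 (only runs when x is large)
    ("add", "x", 1),              # 3: x += 1
    ("jump", 0),                  # 4: back to the loop header
]


def record_trace(program, env, start=0, max_len=100):
    """Run from `start` until control returns to `start`, recording what executed."""
    trace = []
    pc = start
    while len(trace) < max_len:
        instr = program[pc]
        op = instr[0]
        if op in ARITH:
            _, var, const = instr
            env[var] = ARITH[op](env[var], const)
            trace.append(instr)
            pc += 1
        elif op == "gt":
            _, dst, src, const = instr
            env[dst] = env[src] > const
            trace.append(instr)
            pc += 1
        elif op == "jump_if_false":
            _, var, target = instr
            # Record the direction actually taken as a guard, not the branch itself.
            if env[var]:
                trace.append(("guard_true", var))
                pc += 1
            else:
                trace.append(("guard_false", var))
                pc = target
        elif op == "jump":
            _, target = instr
            if target == start:  # Back at the loop header: the trace is closed.
                trace.append(("loop",))
                return trace
            pc = target
        else:
            raise ValueError(f"unknown op: {op!r}")
    raise RuntimeError("trace too long; a real JIT would abandon it")
```

Calling record_trace(PROGRAM, {"x": 1}) records gt, guard_false, add, and loop. The mul never appears because that branch didn’t run. Starting with {"x": 200} records guard_true and the mul instead.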

Extended Bytecode Operation Support

Separate from the frontend rewrite, the JIT optimizer now covers more micro-operations (uops) than it did in 3.14. A uop is one of the smaller, lower-level steps that CPython breaks each bytecode instruction into, so the optimizer has finer-grained pieces to work with. More of what the interpreter executes stays in JIT-compiled machine code instead of returning to the interpreter on an unsupported uop.

You can follow this upgrade in the better uop coverage issue.

This pays off most in hot loops that use operations the 3.14 JIT had to bounce back to the interpreter for. Each uop the optimizer now handles is one fewer place where the trace falls back to the slow interpreter path.

Emit Tighter Machine Code

The following two upgrades improve the quality of the machine code the JIT produces, rather than how much code it covers. One determines where values live, in fast CPU registers or slower memory. The other sharpens which instructions get emitted, so each compiled trace ends up smaller and quicker to execute.

Basic Register Allocation

The JIT optimizer now performs a basic form of register allocation, which lets it skip certain stack operations and operate on CPU registers instead. In 3.14’s JIT, each operation in a trace wrote its result to the in-memory evaluation stack for the next operation and read its inputs back from the stack, making a register-to-memory-to-register round trip on every step.

With the new register allocation, values stay in CPU registers across consecutive operations, so the trace doesn’t have to touch memory between them. The diagram below contrasts the two approaches:

The register-allocation work is tracked in the top-of-stack caching issue.

The overall gain is small for now, but numeric workloads already show a noticeable speedup:

“Roughly a 0.5% geometric mean speedup on pyperformance, with the highest speedup on nbody at 16%.” (Source)

That figure stays modest because only a handful of bytecodes have been ported to use the new allocator. The visible win shows up in tight numeric inner loops, like those in physics simulations and signal-processing kernels, where stack traffic dominates the trace.
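The following toy model sketches the top-of-stack caching idea that the issue refers to. It uses hypothetical names throughout and is not CPython’s allocator. It counts how many pushes and pops hit an in-memory stack when every result goes through memory, compared with when the top value stays in a single simulated register.

```python
import operator


class ToyStack:
    """A value stack that counts memory traffic, optionally caching the top value."""

    def __init__(self, cache_top):
        self.cache_top = cache_top
        self.memory = []      # Simulated in-memory evaluation stack.
        self.register = None  # Simulated CPU register holding the top value.
        self.memory_ops = 0   # Reads plus writes that touch self.memory.

    def push(self, value):
        if not self.cache_top:
            self.memory.append(value)
            self.memory_ops += 1
            return
        if self.register is not None:
            # Register already busy: spill its value to memory first.
            self.memory.append(self.register)
            self.memory_ops += 1
        self.register = value

    def pop(self):
        if self.cache_top and self.register is not None:
            value, self.register = self.register, None
            return value
        self.memory_ops += 1
        return self.memory.pop()


def run_chain(start, steps, cache_top):
    """Apply (op, constant) steps to a running value, like a tight numeric trace."""
    stack = ToyStack(cache_top)
    stack.push(start)
    for op, constant in steps:
        stack.push(op(stack.pop(), constant))
    return stack.pop(), stack.memory_ops


STEPS = [(operator.mul, 2.0), (operator.add, 3.0), (operator.mul, 0.5)]
```

Both run_chain(1.0, STEPS, cache_top=False) and run_chain(1.0, STEPS, cache_top=True) compute 2.5. The first touches memory eight times, and the second touches it zero times. A real JIT still pays for loading locals and storing results, which this toy ignores.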


Better Machine Code Generation

The JIT now emits tighter machine code on both x86-64 and AArch64. The change reduces the memory footprint of generated code and improves general code quality. This is distinct from register allocation, which decides where values live rather than what instructions get emitted.

Two pull requests track this change: macOS performance and AArch64 assembler optimization.

Smaller machine code puts less pressure on the instruction cache. That benefit compounds when a workload has several hot traces sharing the same cache space.

Skip Redundant Work

A hot loop runs the same code over and over, so the interpreter does the same work on every pass. The three upgrades in this section let the JIT skip the following repeated work: recomputing values that never change, updating reference counts it doesn’t need to, and, where it’s safe, creating a new object for every result.

More Constant Propagation

The JIT now performs more constant propagation. When the optimizer detects that user code produces constant values, it can simplify the dependent code along the trace instead of recomputing on every iteration.

This optimization is tracked in the automatic constant propagation issue.

You’ll see the benefit most in code with constant configuration values, scaling factors, or arithmetic on literal constants inside inner loops, where the same computation runs every iteration but produces the same answer.
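Here’s a minimal sketch of the idea on a toy three-address code, not the JIT’s real intermediate representation. The propagate_constants() helper and the instruction format are hypothetical. Whenever both operands of an instruction are known constants, the pass computes the result once and substitutes it into later instructions, so that work isn’t repeated on every iteration.

```python
import operator

OPS = {"add": operator.add, "mul": operator.mul}


def propagate_constants(code):
    """Fold instructions whose operands are constants and rewrite later uses.

    Each instruction is (dest, op, a, b), where a and b are variable names
    (strings) or numeric literals.
    """
    known = {}     # Variable name -> constant value discovered so far.
    residual = []  # Instructions that still have to run at runtime.
    for dest, op, a, b in code:
        a = known.get(a, a) if isinstance(a, str) else a
        b = known.get(b, b) if isinstance(b, str) else b
        if not isinstance(a, str) and not isinstance(b, str):
            known[dest] = OPS[op](a, b)  # Computed once, at optimization time.
        else:
            known.pop(dest, None)        # dest is no longer a known constant.
            residual.append((dest, op, a, b))
    return residual, known


LOOP_BODY = [
    ("scale", "mul", 2, 0.5),       # Constant configuration value.
    ("offset", "add", "scale", 3),  # Depends only on constants.
    ("y", "mul", "x", "scale"),     # Depends on x, which changes every iteration.
    ("z", "add", "y", "offset"),
]
```

For LOOP_BODY, only two instructions survive: y = x * 1.0 and z = y + 4.0. The scale and offset computations disappear from the loop entirely.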

Fewer Reference Counts

CPython tracks object lifetimes with reference counts, and every operation that touches an object normally has to bump and decrement them. The 3.15 JIT now skips those updates whenever it’s safe to do so, which reduces the cost of most operations across the board.

You can follow the details in the issue on eliminating redundant refcounting.

Ken Jin reports a small but real improvement on the microbenchmark side:

“About 6% improvement on microbenchmarks such as nbody.” (Source)

In Python 3.14, the interpreter already skipped 90 percent of LOAD_FAST refcount updates, which gave a 2–3 percent speedup on general Python code. The 3.15 JIT now does the same inside compiled traces, so any tight loop that touches Python objects benefits, and numeric loops see the biggest gains.
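You can watch CPython maintain these counts from Python code with sys.getrefcount(). The absolute numbers are implementation details that vary between versions: the call itself holds a temporary reference, and newer interpreters may borrow some references. For that reason, the sketch only compares counts before and after creating a second reference. The helper name is hypothetical.

```python
import sys


def extra_references_from_alias():
    """Show that binding one more name to an object bumps its reference count."""
    data = [1.0, 2.0, 3.0]
    before = sys.getrefcount(data)
    alias = data  # A second strong reference to the same list.
    after = sys.getrefcount(data)
    del alias     # Dropping the name decrements the count again.
    restored = sys.getrefcount(data)
    return after - before, restored - before


print(extra_references_from_alias())
```

On CPython, this should report a difference of one after the alias and zero after del. Every one of those increments and decrements is work that the 3.15 JIT tries to prove unnecessary inside a compiled trace.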

In-Place Operations on Integers and Floats

By tracking unique references to objects, the JIT optimizer can now perform in-place operations on int and float values. Instead of allocating a fresh object for each arithmetic result, the JIT updates the existing object in place when nothing else holds a reference to it.

This work spans three issues:

Unique reference tracking in Tier 2: the underlying infrastructure
In-place float operations: the float-specific work
In-place int operations: the int-specific work

Float-heavy numeric loops are the most visible winners. That’s exactly the shape of workload() in the quick_bench.py script you ran earlier, where the tight x = x * 1.0001 loop updates one float object in place instead of allocating a new one each pass.
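The key safety rule is that an object can be reused only when nothing else refers to it. The toy model below sketches that rule with a hypothetical ToyFloat class that carries an explicit reference count. CPython’s real float objects and the JIT’s uniqueness analysis differ in the details.

```python
class ToyFloat:
    """A boxed float with an explicit reference count (toy model only)."""

    def __init__(self, value):
        self.value = value
        self.refcount = 1


def multiply(obj, factor, stats):
    """Return obj * factor, consuming the caller's reference to obj."""
    if obj.refcount == 1:
        obj.value *= factor  # Unique reference: nobody else can observe the change.
        return obj
    obj.refcount -= 1        # Shared: drop our reference and leave obj untouched.
    stats["allocations"] += 1
    return ToyFloat(obj.value * factor)


def run_loop(iterations, shared):
    """Multiply by 2.0 repeatedly and return (result, other holder's value, allocations)."""
    stats = {"allocations": 0}
    x = ToyFloat(1.0)
    other = None
    if shared:
        other = x
        x.refcount += 1      # A second holder now refers to the same object.
    for _ in range(iterations):
        x = multiply(x, 2.0, stats)
    other_value = other.value if other is not None else None
    return x.value, other_value, stats["allocations"]
```

With a unique reference, run_loop(3, shared=False) reaches 8.0 without a single allocation. With a shared one, run_loop(3, shared=True) allocates once. The other holder keeps seeing 1.0, and every later iteration updates the new, unique object in place.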


Benchmark With the Official `pyperformance` Suite

The quick benchmark at the beginning of this tutorial gave you an idea of the speedup on a tight numeric loop. For a fuller picture across realistic workloads, you can point your build at the canonical pyperformance suite, which is pip-installable.

Note that the full suite will take 30 to 60 minutes to complete on an average laptop. If that’s too long to wait, then skip ahead to the end of this section, where you’ll find live results from the core developers’ continuous benchmarking dashboard.

You can install pyperformance into your 3.15 environment and run it once with the JIT off and once with it on, saving the raw results to JSON:

$ python -m pip install pyperformance
$ PYTHON_JIT=0 pyperformance run -o jit-off.json
$ PYTHON_JIT=1 pyperformance run -o jit-on.json
$ pyperformance compare jit-off.json jit-on.json

In this example, you install pyperformance, capture two JSON files with the timing data, and ask pyperformance compare to diff them. The compare step prints a table of per-benchmark deltas and a geometric mean at the bottom. That geometric mean is what the core developers quote when they report the 8–9 percent figure.
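If you want the per-benchmark ratios in your own scripts, for example to plot them, you can load the two JSON files with pyperf, the library that pyperformance uses to write them. The sketch assumes pyperf’s BenchmarkSuite.load(), get_benchmarks(), get_name(), and mean() methods. The helper names are hypothetical, and a geometric mean computed from mean timings may differ slightly from the figure that pyperformance compare prints.

```python
from statistics import geometric_mean


def load_means(path):
    """Map benchmark name -> mean seconds from a pyperf/pyperformance JSON file."""
    import pyperf  # Imported lazily: only needed when reading real result files.

    suite = pyperf.BenchmarkSuite.load(path)
    return {bench.get_name(): bench.mean() for bench in suite.get_benchmarks()}


def compare_means(off, on):
    """Return per-benchmark speedups (off / on) and their geometric mean."""
    common = sorted(off.keys() & on.keys())  # Skip benchmarks missing from either run.
    ratios = {name: off[name] / on[name] for name in common}
    return ratios, geometric_mean(ratios.values())
```

A typical call is compare_means(load_means("jit-off.json"), load_means("jit-on.json")). A ratio above 1 means that the JIT-on run was faster for that benchmark.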

Note: You don’t have to install pyperformance into the build you’re testing. You can install it once in any Python environment and point it at the interpreter you want to benchmark with the -p option:

$ PYTHON_JIT=0 pyperformance run \
    -p ~/.pyenv/versions/3.15.0b2/bin/python -o jit-off.json

The pyperformance tool spins up an isolated virtual environment for each benchmark anyway.

If you’d rather skip the local run entirely, then the same data is published in two places:

doesjitgobrrr.com hosts a live dashboard with per-day runs across hardware and build configurations.
faster-cpython/benchmarking-public commits raw pyperformance JSON, Markdown comparison tables, and violin plots straight into the repository.

Both sources stay current with the in-development 3.15 builds, so you can see how the numbers move week to week as the optimizer changes land.

Decide Whether to Enable the JIT

Now that you’ve seen the optimizer upgrades and the benchmarks, the practical question is whether to turn the JIT on for your own workloads. The table below summarizes where the JIT helps the most:

Use Case	JIT Benefit	Reason
CPU-bound hot loops with predictable types (numerics, parsers, encoders)	Strong	The new optimizer upgrades target exactly this shape of code.
I/O-bound code such as web requests and async network handlers	Marginal	Most time goes to waiting on the network, not to the inner Python loop.
NumPy or Cython workloads	Minimal	The numeric work already runs in compiled C, so the JIT has little Python code to optimize.

As a pragmatic rule of thumb, turn the JIT on for CPU-bound batch jobs, and leave it off for production web services until the optimizer stabilizes.

The JIT is still flagged as experimental, the numbers still shift between pre-releases, and the sys._jit interface is explicitly marked as non-public. None of that is a reason to avoid the JIT, but it’s a reason to keep a PYTHON_JIT=0 rollback handy.

Note: Shortly after the 3.15 beta, the Python Steering Council paused new JIT development on CPython’s main branch. The JIT stays in 3.15 and will still receive bug and security fixes, but no new features will land until a PEP resolves open questions about its maintenance, compatibility, and long-term future.

If no PEP is accepted within six months, then the JIT code may be removed from the main branch, and any further work on it would continue outside CPython.

The pause halts new optimizer work, not the JIT you already have—everything you measured here still runs in 3.15 and keeps receiving fixes. The open question is how much further JIT development goes, not whether it works today.

Conclusion

Python 3.15 adds another round of JIT improvements, this time focused on the optimizer itself. A new tracing frontend, basic register allocation, tighter codegen, more constant propagation, fewer reference counts, and in-place numeric ops all stack together. The result is an 8–9 percent pyperformance geometric mean on x86-64 Linux and 12–13 percent on AArch64 macOS, with float-heavy loops seeing much more.

In this tutorial, you’ve learned that:

The Python 3.15 JIT is opt-in, and you turn it on by setting PYTHON_JIT=1.
The tracing frontend records real execution paths, and the optimizer handles more bytecode operations than before.
Register allocation keeps values in CPU registers, and better codegen trims the emitted machine code.
Reference-count elimination and in-place arithmetic give the biggest gains on float-heavy loops.
On x86-64 Linux, the official pyperformance suite measures an 8–9 percent geometric mean speedup.

The JIT is still experimental and a moving target until the 3.15 final release. However, if you have CPU-bound Python code, then this is a good release to start benchmarking your own workloads. Real Python’s Profiling Performance in Python course can help you find the functions that dominate your runtime, so you know exactly where the JIT will help most.



Frequently Asked Questions

Now that you have some experience with the upgraded Python 3.15 JIT, you can use the questions and answers below to check your understanding and recap what you’ve learned.

These FAQs are related to the most important concepts you’ve covered in this tutorial.

How do you enable the JIT compiler in Python 3.15?

Set the PYTHON_JIT=1 environment variable when running a JIT-capable build. The python.org installers ship with the JIT compiled in on Windows and macOS, while Linux builds need the --enable-experimental-jit=yes-off configure flag.

Is the JIT turned on by default in Python 3.15?

No. The JIT is still flagged experimental and stays off unless you opt in with PYTHON_JIT=1. The flag may change once the JIT stabilizes in a future release.

How much faster is Python 3.15 with the JIT enabled?

The official pyperformance suite reports an 8–9 percent geometric mean speedup on x86-64 Linux and 12–13 percent on AArch64 macOS. Tight numeric loops can see much larger gains, while I/O-bound code sees almost none.

Will the JIT speed up a web application?

Probably only marginally. Web request handlers spend most of their time waiting on the database, the network, or the template engine, so the JIT’s gains on inner Python loops don’t move the overall response time much.

Can you use the JIT together with free threading?

The JIT and free threading are independent build options that can be combined, but their combined performance is still in its early days. Check the build flags on your installation to confirm which features are active.

