Introduction
- In this post I’ll discuss the differences between statically and dynamically typed languages and between interpreted and compiled languages, and what the languages in each group have in common. I’ll also talk about JIT compilation and, at the end, compare the V8 engine (developed by Google and used in Chrome and Node.js to run JavaScript) with the Java Virtual Machine (JVM).
Concepts before the comparison
Classifying programming languages by typing
- Dynamically typed: types are determined at runtime.
- Statically typed: types are checked at compile time, before execution.
- Strongly typed: the language strictly enforces type rules and limits implicit type conversions.
- Strongly typed isn’t a formal term in CS. It is more of a spectrum describing how strict a language is with type rules. Languages with fewer implicit conversions are considered “stronger,” while those with more relaxed conversions are considered “weaker.”
- C is statically typed but generally considered less strictly typed compared to Java due to more permissive casts and lower-level type operations.
- Given this, we could say that Python is more strongly typed than JavaScript, since JavaScript performs many unusual implicit type coercions, such as 1 - "4" evaluating to -3, whereas the same expression raises a TypeError in Python.
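To make the two ideas above concrete, here is a small sketch in JavaScript. The variable names are only illustrative; the first part shows dynamic typing (the type belongs to the value and is discovered at runtime), and the second part shows weak typing (operands are implicitly converted instead of the operation failing).

```javascript
// Dynamic typing: the same variable can hold values of different types,
// and the type is only known at runtime.
let value = 42;
const typeBefore = typeof value; // "number"
value = "forty-two";
const typeAfter = typeof value; // "string"

// Weak typing: JavaScript implicitly converts operands instead of failing.
const difference = 1 - "4"; // "4" is converted to the number 4
const concatenated = 1 + "4"; // 1 is converted to the string "1"

console.log(typeBefore, typeAfter); // number string
console.log(difference, concatenated); // -3 14
```

Note how the same string operand is treated as a number by the subtraction and as text by the addition, which is the kind of coercion a more strongly typed language would reject.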
Classifying programming languages by execution model
- Compiled language: the source code is translated into machine code or some lower-level code before execution. This is also called ahead-of-time (AOT) compilation.
- Interpreted language: the source code is executed directly by another program (the interpreter), without a separate compilation step.
General characteristics of each language based on their "classification"
- Generally, compiled languages have characteristics such as static typing, strict type checking rules, and fewer type coercions. On the other hand, interpreted languages are generally dynamically typed.
- Furthermore, compiled languages are generally faster than interpreted ones because interpreted languages add runtime overhead by doing type checks (dynamic typing) and other runtime checks that are done ahead of time in compiled languages. There are also other compile-time strategies to achieve performance optimizations.
- There are advantages for both statically/dynamically typed languages:
- compiled and statically typed language: robustness because of type checks, performance.
- interpreted and dynamically typed language: productivity because of dynamic types, and also portability, since the code doesn’t have to be compiled for each architecture. If the machine has the interpreter, the code will run.
Hybrid model (just-in-time compilation, JIT)
- Today it is hard to give a strict binary classification for a language’s execution model (compiled or interpreted). We generally have a “hybrid” approach; even languages such as JavaScript that were historically interpreted are not purely interpreted anymore.
- Reference: NodeJS Documentation
- JIT is a technique for implementing a language’s execution model that achieves better runtime speed. Instead of only compiling or only interpreting, the runtime first interprets the code (JVM bytecode in the case of Java, V8 bytecode in the case of JavaScript) and profiles its initial execution. From that profile it identifies hot paths: the most-called functions and the kinds of values they are called with. It then compiles these specific parts of the code to machine code to achieve better performance (both the JVM and V8 do this).
- This is a good example of JIT performance improvement: Just In Time (JIT) Compilers - Computerphile (watch time 5:49)
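The idea of "interpret first, compile what turns out to be hot" can be sketched with a toy example. This is not how V8 or the JVM are implemented: the threshold, the operation list format, and the function names (interpret, compile, makeTiered) are all assumptions made up for illustration, and "compiling" here just means generating a specialized JavaScript function with the built-in Function constructor.

```javascript
// Toy tiered executor: only the idea behind JIT, not a real engine design.
const HOT_THRESHOLD = 3; // hypothetical call count before code counts as hot

// A tiny "program": operations applied in order to an input number.
const steps = [
  { op: "mul", arg: 2 },
  { op: "add", arg: 10 },
];

// Slow path: walk the operation list on every call, like an interpreter.
function interpret(ops, x) {
  let acc = x;
  for (const { op, arg } of ops) {
    if (op === "add") acc += arg;
    else if (op === "mul") acc *= arg;
    else throw new Error(`unknown op: ${op}`);
  }
  return acc;
}

// "Compile" step: build one specialized function for this exact program,
// so later calls skip the per-operation dispatch loop.
function compile(ops) {
  const body = ops.reduce((expr, { op, arg }) => {
    if (op === "add") return `(${expr} + ${arg})`;
    if (op === "mul") return `(${expr} * ${arg})`;
    throw new Error(`unknown op: ${op}`);
  }, "x");
  return new Function("x", `return ${body};`);
}

// Wrap a program so it starts interpreted and is compiled once it is hot.
function makeTiered(ops) {
  let calls = 0;
  let compiled = null;
  const run = (x) => {
    if (compiled) return compiled(x); // fast path after promotion
    calls += 1;
    if (calls >= HOT_THRESHOLD) compiled = compile(ops);
    return interpret(ops, x);
  };
  run.isCompiled = () => compiled !== null;
  return run;
}

const run = makeTiered(steps);
const results = [1, 2, 3, 4].map((x) => run(x));
console.log(results, run.isCompiled()); // [ 12, 14, 16, 18 ] true
```

The first three calls go through the interpreter; the fourth uses the generated function. Real engines additionally record type feedback and can fall back to the interpreter when their assumptions stop holding, which this sketch leaves out.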
Virtual Machine
- Virtual machine: software that acts as an emulator of a physical computer system.
- System virtual machine: replicates everything required to run an entire system, with its own OS.
- Process virtual machine (the category the JVM belongs to): also called an application virtual machine, because it runs as a single process in the OS. It provides a high-level abstraction that lets compatible languages run on any computer architecture, since the process VM takes care of translating the code for that specific architecture. One example is the JVM: developers don’t need to worry about the underlying hardware platform their code runs on, which provides portability (runs anywhere).
What is the Java Virtual Machine (JVM)
- The technology responsible for making Java both OS and hardware independent by abstracting away architecture-specific machine code.
- An abstract computing machine, with its own instruction set (like a real machine that contains move, load, add…). The JVM knows nothing about the Java programming language; actually, it understands only the class file format, which contains JVM instructions (bytecodes), a symbol table, and more auxiliary information.
- The JVM contains a public spec of its bytecode (the class file format - example.class), and this JVM bytecode is generated by javac during the compilation of Java source code.
- Note: Since it contains a public spec of its bytecode and is intended to guarantee long-term support, other languages can also compile to JVM bytecode and enjoy the advantages of not worrying about the underlying computer architecture, having much more portability than with AOT compilation.
- Image reference: Wikipedia
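To illustrate what "an abstract machine with its own instruction set" means, here is a minimal sketch of a stack machine. The JVM is also stack-based, but the opcodes below (push, load, add, mul, ret) are invented for this example and are not real JVM bytecode; the execute function and the program layout are likewise assumptions.

```javascript
// Toy stack machine with a made-up instruction set (hypothetical opcodes,
// not the real JVM instruction set or class file format).
function execute(bytecode, locals) {
  const stack = [];
  for (const [opcode, operand] of bytecode) {
    switch (opcode) {
      case "push": // push a constant onto the stack
        stack.push(operand);
        break;
      case "load": // push the local variable at index `operand`
        stack.push(locals[operand]);
        break;
      case "add": { // pop two values, push their sum
        const b = stack.pop();
        const a = stack.pop();
        stack.push(a + b);
        break;
      }
      case "mul": { // pop two values, push their product
        const b = stack.pop();
        const a = stack.pop();
        stack.push(a * b);
        break;
      }
      case "ret": // return the value on top of the stack
        return stack.pop();
      default:
        throw new Error(`unknown opcode: ${opcode}`);
    }
  }
  throw new Error("program ended without ret");
}

// "Bytecode" for the expression (a + b) * 2, with a and b as locals 0 and 1.
const program = [
  ["load", 0],
  ["load", 1],
  ["add"],
  ["push", 2],
  ["mul"],
  ["ret"],
];

const result = execute(program, [3, 4]);
console.log(result); // 14
```

Any compiler that emits this instruction format could target the machine, regardless of the source language, which is the same property that lets languages other than Java run on the JVM.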
What is the V8 JavaScript engine
- An engine that implements the ECMAScript and WebAssembly specifications and handles memory allocation, garbage collection, and more. It contains AST parsing logic, an interpreter (Ignition), an internal bytecode, JIT compilers, and other logic needed to run the source code.
- Why is V8 an engine and not a process VM like the JVM? The reason is that, unlike the JVM, which has a public spec for the bytecode it runs (generated by javac and other compilers), V8 takes JavaScript source code (not lower-level code like bytecode) and uses Ignition to generate its bytecode. It doesn’t have a public spec of its bytecode and doesn’t guarantee long-term support for it. Although V8 works similarly to the JVM to execute code, there are important differences.
- JVM: abstract definition of a machine that contains its bytecode spec, instruction set, and more.
- V8: real implementation of the execution model of ECMAScript and WebAssembly specs.
Simplified execution model
JavaScript with V8
- image reference: Franziska Hinkelmann
JS source code -> AST -> Ignition (interpreter) generates bytecode -> Ignition executes the bytecode -> hot paths are JIT-compiled by TurboFan (Crankshaft in older V8 versions)
image reference: V8 documentation
Java with JVM
Java source code -> javac compiles to bytecode (class files) -> JVM interprets the bytecode -> hot paths are JIT-compiled to native machine code
So why do people say JS is interpreted and Java is compiled if in practice both use JIT?
- Although we saw that languages actually use a hybrid approach, we may ask ourselves: “Why do people call these languages interpreted or compiled if in reality it is not even true anymore?”
- Because they actually have strong reasons to do it.
1st: Traditional definitions vs real execution model
- Code runs without an explicit compilation step = interpreted.
- In this case, JS source code is executed by V8, which first compiles the code into Ignition bytecode, then runs it through an interpreter, and uses JIT to compile JS hot paths (most used code) into machine code and execute them (hybrid approach). Even though it is a hybrid approach, the fact that execution starts immediately in a runtime is why we put it in the “interpreted languages bag.”
- Code is translated to a lower level before execution = compiled.
- Java is compiled to bytecode using javac, and then the bytecode is executed in the JVM, which uses an interpreter and also JIT to compile hot paths into native machine code. It is also a hybrid approach because it runs bytecode and compiles hot paths to native machine code, but since there is a clear step where the source code is compiled into bytecode before execution, we put it in the “compiled languages bag.”
2nd: Historical design of these languages (Java and JS)
- Because they were historically created like this, with these “style decisions” (compiled or interpreted) reflected in their syntax and way of working.
- For example, Java was designed as a statically and strongly typed language with a clear compilation step, focused on large-scale systems where correctness and structure before execution are important.
- JavaScript, on the other hand, was created as a lightweight scripting language for the browser. It is dynamically typed, making it more flexible and permissive, which was ideal for quick scripting and web interactivity, even if it can introduce more runtime ambiguity and errors.
- JavaScript is extremely permissive: it tolerates unusual operations instead of raising errors, so that web pages don’t break (type coercion is a good example).
Which nomenclature to use for JS and Java and other languages (interpreted or compiled)?
- Based on what I pointed out, the traditional nomenclatures (JS interpreted and Java compiled) are good choices.
- But it’s important to understand how their execution models really work to achieve modern-day performance, and to keep in mind that programming languages’ execution models keep evolving.
- Note: Obviously, there are “pure” compilers or interpreters without this hybrid approach. The GNU Compiler Collection (GCC), for example, is an ahead-of-time (AOT) compiler for C and other languages.
References
- https://docs.oracle.com/javase/specs/jvms/se26/html/jvms-1.html
- https://docs.oracle.com/cd/E57471_01/bigData.100/extensions_bdd/src/cext_transform_typing.html
- https://v8.dev/docs
- https://nodejs.org/learn/getting-started/the-v8-javascript-engine
- https://www.youtube.com/watch?v=d7KHAVaX_Rs
- https://medium.com/dailyjs/understanding-v8s-bytecode-317d46c94775
- https://www.youtube.com/watch?v=p-iiEDtpy6I
- https://en.wikipedia.org/wiki/Virtual_machine