In this article, I will show you how to write a minimal, bare-bones x86-64 JIT compiler in C++ that runs on macOS and Linux, and could potentially run on Windows through WSL.
For our purposes, JIT (just-in-time) compilation is a technique through which a program generates machine code at runtime, based on user input. A C++ program is AOT (ahead-of-time) compiled, which typically means that once the original code has been compiled for a particular machine, it can’t be changed at runtime (from a security point of view, this is a desirable feature). A simple, useful application of a C++ JIT compiler is on-the-fly compilation of a new function that is based on other functions already defined in the original code.
Let’s start with an even simpler example: a C++ program that asks the user for their name and generates, at runtime, a function that simply prints a greeting. While not a very practical program (you really don’t need to compile this to a separate function), this example shows how to create and execute code at runtime.
Since machine code is inherently non-portable between processors, we need to choose a particular processor to run our example. For this article, we’ll use the x86-64 Intel processor as our target machine. Even more restrictively, machine code that runs on one operating system is not portable to another. For example, if your machine runs both Windows and Linux, a piece of code compiled for Linux won’t run on Windows and vice versa, at least not without a translation layer. As the target OS we’ll use Linux, but the code is easy to port to macOS, and I will show you how to do it. In theory, you should be able to follow along on Windows 10 if you use the Windows Subsystem for Linux (WSL).
You can find the complete source code for the next examples on the GitHub repo for this article.
All the programs from this article were checked with GCC 5.4 on Linux, and with Apple Clang 9.0.0 and GCC 7.2 on macOS. For example, on macOS:
Note, if you want to use C++17, check my article about compiling GCC 7.2 on macOS or, if you prefer to use Clang, pass -std=c++1z to the compiler.
So, what we want is to generate machine code for the equivalent C++ code, particularly for the highlighted line, that uses the system write function to print a string:
To better illustrate what we need to do, let’s use the OS write function to print the greeting:
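As a minimal sketch of such a call, assuming a POSIX system (Linux or macOS), the greeting can be sent to standard output with `write`. The helper name `print_greeting` is hypothetical and is only used here for illustration.

```cpp
#include <string>
#include <unistd.h>   // write, STDOUT_FILENO, ssize_t

// Hypothetical helper: builds the greeting and prints it with the POSIX
// write() call. Returns the number of bytes written, or -1 on error.
ssize_t print_greeting(const std::string &name) {
    const std::string greeting = "Hello, " + name + "!\n";
    return write(STDOUT_FILENO, greeting.c_str(), greeting.length());
}
```

Calling `print_greeting("World")` prints the greeting without going through `std::cout`, which is close to what the generated machine code will have to do.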
Basically, what we want to do is to generate at runtime a function greeting() that will replace line 14 from the above code:
Let’s write in x86-64 assembly the body of greeting from the above code:
Next, assemble the code and disassemble the result. Assuming you saved the above code in a file named chunk.s, this is how you can see the generated code on Linux:
On macOS, the syntax for objdump is a bit different:
This is what I see on my Linux machine (I kept only the disassembled machine code for brevity):
Please note the 0x1f in the first highlighted line: it is the address you reach when you add 0xa to rip, which at that point already holds the address of the next instruction.
Let’s store the above machine code in a vector of 8-bit unsigned integers. For brevity, I will show only the code that is relevant for each step. You can find the complete code, as mentioned, on the GitHub repository for this article.
The first line of the above code, line 0:, can be stored as follows (note the different system call numbers stored in rax, depending on the OS):
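The arithmetic behind 0x1f can be checked with a few lines of C++. This is only a sketch: the helper `rip_target` is hypothetical, and I am assuming that the instruction using `rip+0xa` starts at offset 0x0e and is 7 bytes long, which is consistent with the 0x1f target mentioned above.

```cpp
#include <cstdint>

// Target of a rip-relative operand. When the displacement is applied, rip
// already points at the *next* instruction, so the instruction length has
// to be added to the instruction's own offset first.
constexpr std::uint64_t rip_target(std::uint64_t instruction_offset,
                                   std::uint64_t instruction_length,
                                   std::int32_t displacement) {
    return instruction_offset + instruction_length + displacement;
}

// Assumed layout: instruction at 0x0e, 7 bytes long, displacement 0x0a.
static_assert(rip_target(0x0e, 7, 0x0a) == 0x1f,
              "rip + 0xa should land on offset 0x1f");
```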
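Here is a sketch of how that first instruction could be encoded. It assumes the usual 7-byte `mov rax, imm32` encoding (`48 c7 c0` followed by a little-endian 32-bit value) and the `write` system call numbers 1 on Linux and 0x2000004 on macOS. The function name is hypothetical.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: bytes of `mov rax, <write system call number>`.
std::vector<std::uint8_t> mov_rax_write_syscall() {
    return {
        0x48, 0xc7, 0xc0,          // mov rax, imm32
#if defined(__APPLE__)
        0x04, 0x00, 0x00, 0x02     // 0x02000004, write on macOS (little-endian)
#else
        0x01, 0x00, 0x00, 0x00     // 0x00000001, write on Linux (little-endian)
#endif
    };
}
```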
The remaining code is the same for both systems:
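A sketch of the OS-independent part follows. The byte values assume the standard encodings of `mov rdi, imm32`, `lea rsi, [rip+disp32]`, `mov rdx, imm32`, `syscall` and `ret`, and the function name is hypothetical. The message size is left as zero because it is only known at runtime.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: the instructions that follow `mov rax, ...`.
std::vector<std::uint8_t> common_instructions() {
    return {
        0x48, 0xc7, 0xc7, 0x01, 0x00, 0x00, 0x00,  // mov rdi, 1 (stdout file descriptor)
        0x48, 0x8d, 0x35, 0x0a, 0x00, 0x00, 0x00,  // lea rsi, [rip+0xa] (address of the message)
        0x48, 0xc7, 0xc2, 0x00, 0x00, 0x00, 0x00,  // mov rdx, 0 (message size, patched later)
        0x0f, 0x05,                                // syscall
        0xc3                                       // ret
    };
}
```

Together with the 7 bytes of the first instruction this gives 31 bytes (0x1f), so the message has to start right after the `ret`.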
Next, we need to store the message size from index 24 to 27 in the machine code vector:
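x86-64 stores immediate values in little-endian order, so the least significant byte of the size goes to index 24 and the most significant one to index 27. A minimal sketch of the byte split, with a hypothetical helper name:

```cpp
#include <array>
#include <cstdint>

// Splits a 32-bit value into four bytes, least significant byte first.
std::array<std::uint8_t, 4> to_little_endian(std::uint32_t value) {
    return {{
        static_cast<std::uint8_t>(value & 0xff),
        static_cast<std::uint8_t>((value >> 8) & 0xff),
        static_cast<std::uint8_t>((value >> 16) & 0xff),
        static_cast<std::uint8_t>((value >> 24) & 0xff)
    }};
}
```

For a 17-byte message (an arbitrary example value), the four bytes are 0x11, 0x00, 0x00, 0x00.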
We can abstract the above piece of code in a separate function:
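One possible shape for such a function is sketched below. The name `store_message_size` and the bounds check are my own additions for illustration; the only thing taken from the text is that the size lives at indices 24 to 27.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: writes the message length as a 32-bit little-endian
// value into indices 24..27 of the machine code vector.
void store_message_size(std::vector<std::uint8_t> &machine_code,
                        const std::string &message) {
    constexpr std::size_t size_index = 24;  // position described in the text
    assert(machine_code.size() >= size_index + 4);

    const auto size = static_cast<std::uint32_t>(message.length());
    for (std::size_t i = 0; i < 4; ++i) {
        machine_code[size_index + i] =
            static_cast<std::uint8_t>((size >> (8 * i)) & 0xff);
    }
}
```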
Now, the message body can be appended to the end of the vector:
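A minimal sketch of the append step, using a hypothetical helper name:

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: copies the characters of the message to the end of
// the machine code, so that they sit right after the last instruction.
void append_message(std::vector<std::uint8_t> &machine_code,
                    const std::string &message) {
    machine_code.insert(machine_code.end(), message.begin(), message.end());
}
```

The message must follow the instructions immediately, because the rip-relative address computed in the generated code points at the first byte after them.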
At this point, we are done with the code generation part for our toy example. All we have to do now is transfer the code to an executable memory region and call it. You can find the complete code of the last example in hello_2.cpp on the GitHub repository for this article.
On Linux and macOS, you allocate memory pages with mmap as a multiple of:
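The unit in question is the system page size, which can be queried at runtime with `sysconf`. Here is a small sketch; the helper name and the 4096 fallback are assumptions I made for the example.

```cpp
#include <cstddef>
#include <unistd.h>   // sysconf, _SC_PAGESIZE

// Hypothetical helper: returns the page size reported by the OS.
std::size_t system_page_size() {
    const long size = sysconf(_SC_PAGESIZE);
    // Assumption: fall back to 4096 bytes if the query fails.
    return size > 0 ? static_cast<std::size_t>(size) : 4096;
}
```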
Here is a utility function that estimates the memory required for a given piece of generated machine code:
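A sketch of one way to write it: round the code size up to the next multiple of the page size. The function name and the choice to always return at least one page are my assumptions.

```cpp
#include <cstddef>

// Hypothetical helper: smallest multiple of page_size that can hold
// code_size bytes. An empty buffer still gets one full page, because
// mmap rejects a zero length.
std::size_t estimate_memory_size(std::size_t code_size, std::size_t page_size) {
    const std::size_t pages =
        code_size == 0 ? 1 : (code_size + page_size - 1) / page_size;
    return pages * page_size;
}
```

With a 4096-byte page, a few dozen bytes of machine code need a single page, while 4097 bytes already need two.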
Next, we allocate the required memory size and transfer the generated machine code to the executable memory:
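The sketch below shows one way to do this with `mmap` and `memcpy`. I am assuming an anonymous, private mapping that is readable, writable and executable at the same time, which is the simplest option but may be refused on hardened systems. The function name is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>
#include <sys/mman.h>   // mmap, PROT_*, MAP_*

// Hypothetical helper: maps `length` bytes of executable memory and copies
// the machine code into it. Returns nullptr on failure.
void *copy_to_executable_memory(const std::vector<std::uint8_t> &machine_code,
                                std::size_t length) {
    if (machine_code.size() > length) {
        return nullptr;  // the code would not fit in the mapping
    }
    void *memory = mmap(nullptr, length,
                        PROT_READ | PROT_WRITE | PROT_EXEC,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (memory == MAP_FAILED) {
        return nullptr;
    }
    std::memcpy(memory, machine_code.data(), machine_code.size());
    return memory;
}
```

When the generated function is no longer needed, the region should be released with `munmap(memory, length)`.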
All we have to do at this point is get a pointer to the beginning of our executable code and cast it to a function pointer, after which we can use the generated function:
You can find the complete code for the last example on GitHub in the file hello_3.cpp. Here is the result of running hello_3.cpp on a macOS machine: