What is symbolic execution you might ask, and I would say execution of a symbolic nature.

This is the first tutorial in my series on Symbolic Execution.

Basic Theory
Before we really dive into symbolic execution I want to define some terms that I will be using.

Basic Block:
A basic block is a sequence of instructions that will _always_ execute together (excluding some 'exception' such as a divide by zero might trigger some exception handling behavior etc etc, don't consider that a branch.) Basically, there is no branching between the set of instructions. So for example, the below snippet of code is ONE basic block:
Code:
push eax
mov eax, 10
call bobs_shitty_subroutine
sub eax, 3
int 3
BUT the below code has three basic blocks (as denoted by instruction grouping):
Code:
push eax
sub eax, 2
jz is_zero

sub eax, eax
push eax

is_zero:
pop eax
int3
Constraint:
A constraint is essentially a limit or rule that is observed when we assert a particular fact. So for example, if we assert that x > 3: we can say that there is a constraint on x that says it must be greater than 3.

Additionally, if we assert that fact that x + y = 4, there is a constraint on x that, when added to y it must be 4. Likewise, we also have a constraint on y of similar nature.

From a collection of constraints, we can derive possible solutions. For example, if we observe the following facts about x:
1. x is an integer
2. x > 0
3. x < 10

We can derive from these constraints a set of possible solutions, which are {1,2,3,4,5,6,7,8,9}

A variable without many or sufficient constraints can have a virtually infinite number of solutions (I say virtually because in instance of, for example, a 32 bit integer, there are ofcourse bounds)

Requirements
Before we get started, you will need to get some things setup on your system. If you already have Python 3 setup and know how to create a venv and install libraries with pip, then you don't need to read this. Just create a python3 venv and install angr (pip install angr)

For the rest of anyone reading this:
1. I am going to be using debian linux in these examples So you need to have access to either a linux machine or (if on windows) you can enable the Ubuntu Subsystem for Windows (learn how to do that here: https://docs.microsof*****m/en-us/win.../install-win10 )

Now, open your terminal window and type the following:
0. "sudo apt update"
0.5. "sudo apt install build-essential"
1. "sudo apt install python3 python3-venv python3-dev" < This will install python3 & python3 venv
2. "mkdir ~/phones/"
3. "cd ~/phones/"
4. "python3 -m venv venv"
5. "source ./venv/bin/activate"

And thats it; I suggest you read up on venvs and how they work here, it is very important as a python programmer to understand venvs: https://docs.python.org/3/library/ven*****ml

6. Now install the "angr" package. Use the command "pip install angr"

7. Annd you are all setup good job. Remember, if you return to your terminal and don't see (venv) prefix, you need to reactivate the venv before you run any of your python files (step 5.) (Read up on venv using the provided link above if you want to know why...)


What is a Z3 solver?
As a precursor to this section, I suggest you read: https://docs.angr.io/core-concepts/solver

Remember in high-school or university when you did algebra? You were given a system of equations and were then asked to solve for possible solutions or constraints on the input. For example

x + y = 10
x - y = 8

This is a simple system of equations, we can solve it to identify that {y = 1, x = 9}

This is basically all that a Z3 engine does - but it supports various types that computers commonly work with. They are all derived from bit-vectors, which is just a fancy name for "collection of bits." Anyways, you don't really need to work with the Z3 solver directly in most cases since our "system of equations" will be the code we "symbolically execute" rather than mathematical equations.

Basic Example using a Z3 Solver
How do we solve systems of equations in programming? Well, if you've ever tried, you probably used a matrix with some identities, calculating the inverse etc.

Now, I will show you how the cool kids do it using a Z3 solver.

Lets explore the Z3 solver a little bit more using claripy, which is an abstraction layer for Z3 solvers, it comes with angr.

From the terminal you setup in the previous step (make sure there is a venv prefix) spin up a python interactive terminal by typing the command "python"

Next, run the following code:
Code:
import claripy
x = claripy.BVS("x", 32)
y = claripy.BVS("y", 32)

s = claripy.Solver()
s.add(x + y == claripy.BVV(10, 32))
s.add(x - y == claripy.BVV(8, 32))

s.batch_eval([x, y], 100)
Okay, so the first two lines (excluding the import) is just create two symbolic variables (BVS = Bit Vector Symbolic.) The first parameter is just a name we give them, the second is the number of bits. It is important to note, these values are unsigned bit vectors. They cannot store floating point numbers. Basically, this two lines are saying "Here are two variables, I don't know what is in them, but they are 32-bit unsigned integers."

The next thing we do is instantiate a solver. A solver will basically take (as input) a series of constraints and spit out possible solutions. In this case, we add two constraints. We say x + y is equal to a 32bit BVV (Bit Vector Value) of 10. And we add another constraint as well.

Next, we ask the solver to evaluate x & y - which means, try to solve these variables given the provided constraints. Where does "100" come from? 100 is the number of possible solutions we want the solver to find. The reason we specify a limit is that in some instances, the number of solutions could be infinite.

In this case, there are two solutions for x & y. Now, you math instructor will probably disagree which is fine, but they would be mistaken. Since we are dealing with Bit Vectors, we must acknowledge the effect of wrap-around. If you take a 32 bit unsigned int that is equal to 0xFFFFFFFF and you add 1, what do you get? 0, with an overflow flag set in a CPU register somewhere ofcourse.

The solutions claripy found are:
Code:
[
    (9, 1), 
    (2147483657, 2147483649)
]
So the solutions are {x=9, y=1} and {x=2147483657, y=2147483649}. The second solution is from the overflow nature of unsigned ints (cool story, this is actually how people use symbolic execution to invoke unexpected behavior and utilize exploits in software.)

What is symbolic execution
A symbolic executor BUILDS onto a Z3 solver. A symbolic execution engine will derive constraints from the code it executes. Each step along the way, defining more and more constraints.

Imagine we have the following
Code:
mov eax, ecx
add eax, 10
sub eax, 3
cmp eax, 0
je block_b:
block_a:
...
ret
block_b
...
ret
At the end of executing the code up-until the branch (the first basic block), the constraint on the eax register will be that it must be 7 more than ECX.

Conclusion
I will end this first section with an important description snippet of information; please tune in for the next tutorial in the series.

When people write programs, the encode the desired behavior INTO code by using tools like a compiler. We can essentially decompose a PROGRAM into a series of possible STATES. Symbolic execution lets us try and examine how we get INTO a state. What are the REAL constraints of entering a state? If you enter a state in unexpected circumstances, you will more or less unlock an EXPLOIT or learn lots about how a program works without having to read the code or sift through obfuscation.

Challenge Question
When we look at the system of equations:

x + y = 7
x - y = 4

We can try to solve for these variables using the following code:
Code:
import claripy
x = claripy.BVS("x", 32)
y = claripy.BVS("y", 32)

s = claripy.Solver()
s.add(x + y == claripy.BVV(7, 32))
s.add(x - y == claripy.BVV(4, 32))

s.batch_eval([x, y], 100)
But, we get the error:
Code:
claripy.errors.UnsatError: unsat
Why is that?