Symbolic Execution and You - Tutorial 1 - Z3 Solvers

**PhoneKeyboard** · 10-16-2020

What is symbolic execution you might ask, and I would say execution of a symbolic nature.

This is the first tutorial in my series on Symbolic Execution.

Basic Theory
Before we really dive into symbolic execution I want to define some terms that I will be using.

Basic Block:
A basic block is a sequence of instructions that will _always_ execute together (excluding some 'exception' such as a divide by zero might trigger some exception handling behavior etc etc, don't consider that a branch.) Basically, there is no branching between the set of instructions. So for example, the below snippet of code is ONE basic block:

Code:

push eax
mov eax, 10
call bobs_shitty_subroutine
sub eax, 3
int 3

BUT the below code has three basic blocks (as denoted by instruction grouping):

Code:

push eax
sub eax, 2
jz is_zero

sub eax, eax
push eax

is_zero:
pop eax
int3

Constraint:
A constraint is essentially a limit or rule that is observed when we assert a particular fact. So for example, if we assert that x > 3: we can say that there is a constraint on x that says it must be greater than 3.

Additionally, if we assert that fact that x + y = 4, there is a constraint on x that, when added to y it must be 4. Likewise, we also have a constraint on y of similar nature.

From a collection of constraints, we can derive possible solutions. For example, if we observe the following facts about x:
1. x is an integer
2. x > 0
3. x < 10

We can derive from these constraints a set of possible solutions, which are {1,2,3,4,5,6,7,8,9}

A variable without many or sufficient constraints can have a virtually infinite number of solutions (I say virtually because in instance of, for example, a 32 bit integer, there are ofcourse bounds)

Requirements
Before we get started, you will need to get some things setup on your system. If you already have Python 3 setup and know how to create a venv and install libraries with pip, then you don't need to read this. Just create a python3 venv and install angr (pip install angr)

For the rest of anyone reading this:
1. I am going to be using debian linux in these examples So you need to have access to either a linux machine or (if on windows) you can enable the Ubuntu Subsystem for Windows (learn how to do that here: https://docs.microsof*****m/en-us/win.../install-win10 )

Now, open your terminal window and type the following:
0. "sudo apt update"
0.5. "sudo apt install build-essential"
1. "sudo apt install python3 python3-venv python3-dev" < This will install python3 & python3 venv
2. "mkdir ~/phones/"
3. "cd ~/phones/"
4. "python3 -m venv venv"
5. "source ./venv/bin/activate"

And thats it; I suggest you read up on venvs and how they work here, it is very important as a python programmer to understand venvs: https://docs.python.org/3/library/ven*****ml

6. Now install the "angr" package. Use the command "pip install angr"

7. Annd you are all setup good job. Remember, if you return to your terminal and don't see (venv) prefix, you need to reactivate the venv before you run any of your python files (step 5.) (Read up on venv using the provided link above if you want to know why...)

What is a Z3 solver?
As a precursor to this section, I suggest you read: https://docs.angr.io/core-concepts/solver

Remember in high-school or university when you did algebra? You were given a system of equations and were then asked to solve for possible solutions or constraints on the input. For example

x + y = 10
x - y = 8

This is a simple system of equations, we can solve it to identify that {y = 1, x = 9}

This is basically all that a Z3 engine does - but it supports various types that computers commonly work with. They are all derived from bit-vectors, which is just a fancy name for "collection of bits." Anyways, you don't really need to work with the Z3 solver directly in most cases since our "system of equations" will be the code we "symbolically execute" rather than mathematical equations.

Basic Example using a Z3 Solver
How do we solve systems of equations in programming? Well, if you've ever tried, you probably used a matrix with some identities, calculating the inverse etc.

Now, I will show you how the cool kids do it using a Z3 solver.

Lets explore the Z3 solver a little bit more using claripy, which is an abstraction layer for Z3 solvers, it comes with angr.

From the terminal you setup in the previous step (make sure there is a venv prefix) spin up a python interactive terminal by typing the command "python"

Next, run the following code:

Code:

import claripy
x = claripy.BVS("x", 32)
y = claripy.BVS("y", 32)

s = claripy.Solver()
s.add(x + y == claripy.BVV(10, 32))
s.add(x - y == claripy.BVV(8, 32))

s.batch_eval([x, y], 100)

Okay, so the first two lines (excluding the import) is just create two symbolic variables (BVS = Bit Vector Symbolic.) The first parameter is just a name we give them, the second is the number of bits. It is important to note, these values are unsigned bit vectors. They cannot store floating point numbers. Basically, this two lines are saying "Here are two variables, I don't know what is in them, but they are 32-bit unsigned integers."

The next thing we do is instantiate a solver. A solver will basically take (as input) a series of constraints and spit out possible solutions. In this case, we add two constraints. We say x + y is equal to a 32bit BVV (Bit Vector Value) of 10. And we add another constraint as well.

Next, we ask the solver to evaluate x & y - which means, try to solve these variables given the provided constraints. Where does "100" come from? 100 is the number of possible solutions we want the solver to find. The reason we specify a limit is that in some instances, the number of solutions could be infinite.

In this case, there are two solutions for x & y. Now, you math instructor will probably disagree which is fine, but they would be mistaken. Since we are dealing with Bit Vectors, we must acknowledge the effect of wrap-around. If you take a 32 bit unsigned int that is equal to 0xFFFFFFFF and you add 1, what do you get? 0, with an overflow flag set in a CPU register somewhere ofcourse.

The solutions claripy found are:

Code:

[
    (9, 1), 
    (2147483657, 2147483649)
]

So the solutions are {x=9, y=1} and {x=2147483657, y=2147483649}. The second solution is from the overflow nature of unsigned ints (cool story, this is actually how people use symbolic execution to invoke unexpected behavior and utilize exploits in software.)

What is symbolic execution
A symbolic executor BUILDS onto a Z3 solver. A symbolic execution engine will derive constraints from the code it executes. Each step along the way, defining more and more constraints.

Imagine we have the following

Code:

mov eax, ecx
add eax, 10
sub eax, 3
cmp eax, 0
je block_b:
block_a:
...
ret
block_b
...
ret

At the end of executing the code up-until the branch (the first basic block), the constraint on the eax register will be that it must be 7 more than ECX.

Conclusion
I will end this first section with an important description snippet of information; please tune in for the next tutorial in the series.

When people write programs, the encode the desired behavior INTO code by using tools like a compiler. We can essentially decompose a PROGRAM into a series of possible STATES. Symbolic execution lets us try and examine how we get INTO a state. What are the REAL constraints of entering a state? If you enter a state in unexpected circumstances, you will more or less unlock an EXPLOIT or learn lots about how a program works without having to read the code or sift through obfuscation.

Challenge Question
When we look at the system of equations:

x + y = 7
x - y = 4

We can try to solve for these variables using the following code:

Code:

import claripy
x = claripy.BVS("x", 32)
y = claripy.BVS("y", 32)

s = claripy.Solver()
s.add(x + y == claripy.BVV(7, 32))
s.add(x - y == claripy.BVV(4, 32))

s.batch_eval([x, y], 100)

But, we get the error:

Code:

claripy.errors.UnsatError: unsat

Why is that?

**Matthew** · 10-17-2020

I love someone else using this section and I have the answer to your question. I took the system of equations given, solved it using a calculator and got 2 float values. Now you already claimed that these are unassigned 32 bit integers we are working with so that is what is throwing an exception. I actually did further research to find out that it is throwing an unsatisfactory error, assuming because the answer to this system of equations is 2 float values. Now to further iterate on this, you would have to assign a satisfactory float bit value to both of the instantiated variables which in Python is 64.

Code:

import claripy
x = claripy.BVS("x", 64)
y = claripy.BVS("y", 64)

s = claripy.Solver()
s.add(x + y == claripy.BVV(7, 32))
s.add(x - y == claripy.BVV(4, 32))

s.batch_eval([x, y], 100)

This edit to the code should take care of the exception in theory and should output 2 float values; 11/2 and 3/2 respectively. Interested if I am correct with my assertion @PhoneKeyboard

**PhoneKeyboard** · 10-17-2020

Originally Posted by Sheep Brain

I love someone else using this section and I have the answer to your question. I took the system of equations given, solved it using a calculator and got 2 float values. Now you already claimed that these are unassigned 32 bit integers we are working with so that is what is throwing an exception. I actually did further research to find out that it is throwing an unsatisfactory error, assuming because the answer to this system of equations is 2 float values. Now to further iterate on this, you would have to assign a satisfactory float bit value to both of the instantiated variables which in Python is 64.

Code:

import claripy
x = claripy.BVS("x", 64)
y = claripy.BVS("y", 64)

s = claripy.Solver()
s.add(x + y == claripy.BVV(7, 32))
s.add(x - y == claripy.BVV(4, 32))

s.batch_eval([x, y], 100)

This edit to the code should take care of the exception in theory and should output 2 float values; 11/2 and 3/2 respectively. Interested if I am correct with my assertion @PhoneKeyboard

Thanks for taking the time to reading my tutorial.

Your explanation is exactly correct, but the solution actually requires you to look up some items in the claripy documentation.

It has less to do with the data size and more to do with how claripy interprets the bitvectors. Everything is a bitvector, an int, signed or unsigned, a float or a double. They are all composed of a vector of bits. However, there are further subclasses, specifically for floating point numbers. It comes down to how claripy will interpret the vector of bits & thus how it performs algorithmic operations.

Code:

import claripy
X = claripy.FPS("X", claripy.FSORT_DOUBLE)
Y = claripy.FPS("Y", claripy.FSORT_DOUBLE)

S = claripy.Solver()
S.add(X + Y == 7)
S.add(X - Y == 4)

S.batch_eval([X, Y], 100)

Which actually gives you more than one solution. The reason being is that the way computers store doubles / floats is somewhat approximate.

Anyways, thanks for reading!

*EDIT*

Also, one very important misunderstanding you have that I want to clear up: this has nothing to do with how python stores data. The only description of a bitvector is that it is x bits in length. There is otherwise no association between them and any type system (including C, C++, python, x86, x64 or ARM.) < Once we get to symbolic execution engines you will see how Angr interprets the x86 instructions to derive some typing. But strictly speaking, in a Z3 solver we just look at the as a collection of bits.

**Matthew** · 10-17-2020

Originally Posted by PhoneKeyboard

Thanks for taking the time to reading my tutorial.

Your explanation is exactly correct, but the solution actually requires you to look up some items in the claripy documentation.

It has less to do with the data size and more to do with how claripy interprets the bitvectors. Everything is a bitvector, an int, signed or unsigned, a float or a double. They are all composed of a vector of bits. However, there are further subclasses, specifically for floating point numbers. It comes down to how claripy will interpret the vector of bits & thus how it performs algorithmic operations.

Code:

import claripy
X = claripy.FPS("X", claripy.FSORT_DOUBLE)
Y = claripy.FPS("Y", claripy.FSORT_DOUBLE)

S = claripy.Solver()
S.add(X + Y == 7)
S.add(X - Y == 4)

S.batch_eval([X, Y], 100)

Which actually gives you more than one solution. The reason being is that the way computers store doubles / floats is somewhat approximate.

Anyways, thanks for reading!

I see, I'm actually just a beginner at Python as it was one of my classes I had to take for my Cyber Security degree. Obviously we never touched any advance topics, although threading was pretty confusing to me (my last topic learned). Every post I made in this section thus far was code from my assignments/projects I had to make. I actually ended up making a disc server for my peers as they didn't all grasp Python with ease like me and made some tutorial videos on my private youtube channel

**PhoneKeyboard** · 10-17-2020

Originally Posted by Sheep Brain

I see, I'm actually just a beginner at Python as it was one of my classes I had to take for my Cyber Security degree. Obviously we never touched any advance topics, although threading was pretty confusing to me (my last topic learned). Every post I made in this section thus far was code from my assignments/projects I had to make. I actually ended up making a disc server for my peers as they didn't all grasp Python with ease like me and made some tutorial videos on my private youtube channel

Cool! I work in cyber security so if you ever have any questions or would like me to cover any topic etc just let me know. I am going to cover binary lifting after symbolic execution.

I'm not sure where your cyber security degree focuses, but this is more reverse engineering and binary analysis.

**Matthew** · 10-17-2020

Originally Posted by PhoneKeyboard

Cool! I work in cyber security so if you ever have any questions or would like me to cover any topic etc just let me know. I am going to cover binary lifting after symbolic execution.

I'm not sure where your cyber security degree focuses, but this is more reverse engineering and binary analysis.

Actually I'm curious what important topics I should know going into the field, mainly topics that you'd see in a networking class (haven't paid the most attention in that class). I haven't really started any security classes yet, most of them will be next semester like security basics, firewalls, computer forensics, etc. Also, I doubt my degree will go in depth on reverse engineering if they even touch that topic

Thread: Symbolic Execution and You - Tutorial 1 - Z3 Solvers

Thread Tools

Display

Symbolic Execution and You - Tutorial 1 - Z3 Solvers

The Following User Says Thank You to Matthew For This Useful Post:

The Following User Says Thank You to PhoneKeyboard For This Useful Post:

Similar Threads

The end to mpgh and you?

MPGH, its just me and you.

my day with mpgh and you

Visual Basic 6 - Download and Install Tutorial

[Trade] How to make OPK and Invi Tutorial