It's about time for me to make something useful out of my nearly nonexistent existence.
Since this is the first of a set of tutorials/walkthroughs I plan to write and release here (yes, I write them on the fly), let me first go over their general content, who they'll target and their goal, for those who decide to follow my activity.
First off, my intention is to make it easier for willing people, no matter what their skills or knowledge are, to better understand concepts I've had either issues with, or judged to be poorly documented/explained.
In other words, I will try to fill some gaps as well as possible, and give ethic tips whenever I can. (You could say I'm a perfectionist)
Second, the range of people my releases will target will vary from beginner to experienced, as I plan to cover various more or less complex topics.
Though I'd like everyone to take note that I will focus on 32bit (x86) C++ development and reverse engineering, both under Windows.
Third, I have to admit that it is very hard to get through an entire wall of text that isn't entertaining at all and is just plain boring, therefore I will try to entertain you the best I can. And even if technical content and entertainment don't get along too well, I will move heaven and earth to make it possible.
Plus, it is easier to remember something if you feel high emotions at the time you learn, which is a wonderful natural defense mechanism.
And last but not least, I will try to keep my releases as clear and as easily readable as possible for the better understanding of everyone. All of them will be constructed under a very strict scheme, which is as follows :
- An introduction, mainly to get a rough idea (more detailed than the title) about what it will all be about. (aboutception!)
- The prerequisites, because there are things that need to be fully understood before jumping onto more complex concepts.
- Sections, each release will be divided into distinct sections, where each of them will cover a specific concept, an issue, et cetera.. In most cases, a section will use concepts that were explained in the previous sections, so it is important to start reading the release at its beginning, unless you want to confuse yourself.
- A conclusion, to finish in beauty and serenity. This will vary from a little summary or a quick overview, to my own opinion about the matter or about what people think about it in general.
- A FAQ, I will frequently answer questions asked, and pick the most relevant ones (where the concept could be approached from a different point of view, or where my explanations weren't detailed enough) to add them to this mini-FAQ, so that browsing through the entire thread remains unnecessary.
So here we go.
For starters, this tutorial will give an overview of pure and raw virtual memory, what it is and what it implies. Then I will describe how it is accessed and how it can be used, how it is handled, and of course how memory relates to C++ programming.
The goal of this tutorial is to grant anyone the ability to perfectly visualize virtual memory and to mentally compute memory operations in advance with great ease.
- Numeral System : Hexadecimal (I suggest being familiar with it to avoid unnecessary confusion, because it is used heavily when it comes to computing)
- C/C++ Data Types (it is preferable to know these for the pointer section)
- Calculator (calc.exe, for the sake of being lazy asses !)
I don't think anything else is really necessary.. This tutorial is pretty basic after all.
Let's get to it, what's memory ?
Memory is, quite simply, a temporary data container. That's it.
Any data that needs to be processed, whatever it might be or wherever it might come from, inevitably has to be loaded into memory first.
It can be a picture, a song, simple numbers, text, instructions, et cetera..
Why, you might ask ?
Well, there are a bunch of reasons, but I'll only state the most relevant ones.
First, as you probably know, doing memory operations is a lot faster than doing hard drive operations, thus, if data chunks were to be read directly from hard drives every single time they needed to be accessed, computers would run extremely slowly, no matter what the CPU (Central Processing Unit, aka. processor) specs might be.
Second, CPUs can only execute instructions that they fetch (and cache) from memory, so there is no way around it.
And so, when you start a program, whichever it might be, it will first be loaded into memory, and only then is its code executed.
As you might have noticed a bit above, I mentioned "virtual" memory, which pretty much implies there is also some kind of "real" memory.
Well yes, that would be the physical memory, where data is physically stored in a computer's RAM (Random Access Memory) module(s).
But that's not what we're interested in. Whereas physical memory is entirely handled by the OS (Operating System) alone*, virtual memory is a virtual address space created separately for each process to use at will (or almost), acting as an easy, ready-to-use interface between our data and its physical location.
Now you don't have to remember all of this, I went a little out of the topic to roughly explain the why and the how, but all you need to know is the difference between virtual and physical memory, and that we only care about the virtual one.
* This is not always the case, but hell this is irrelevant here.
Now that's nice and all, but what does memory actually look like ?
All right, I'm not gonna try to get around it, it has to be faced, so I'll shove it right in your faces.
All numbers here are represented in hex (abbreviation for hexadecimal).
The blue ones vertically aligned on the left are addresses, telling us where the data (all the black numbers) is located.
This is the most common and generic way to represent a block of raw memory.
Why ? Because we don't know how to read or represent it, so we fall back on a fail-safe representation, which is this.
I will explain it in more detail later.
For those who wouldn't know how to read this or who think "what the fuck am I looking at", it's pretty simple.
Each black number (number, not digit !) represents a byte, which is the smallest addressable unit of memory. One byte can hold a value from 0x00 to 0xFF (0 - 255 in decimal). Also note that I prefix hex numbers with 0x, which is one of the most common ways to tell that the following number is in hex. For the sake of readability, the bytes have all been separated by spaces.
On the picture, each row contains (from left to right) 16 (0x10) bytes, and you might have noticed that at each row, the starting address is increased by 0x10.
This means that in this representation, each address annotated refers only to the first byte of each row.
Thus, 0x00f23b60 refers to 0x46, 0x00f23b61 refers to 0x4c, 0x00f23b62 refers to 0x41, ..., 0x00f23b70 refers to 0x39, et cetera..
So, that's cool and stuff, but we ain't getting far with just little numbers, right ?
Right. Let's take a look at the same block of data, but with an additional representation.
OMAGAD - NO WAI!!1
No way ? Ya way.
In memory, everything depends on how a block of data is viewed and interpreted, and since one byte can represent any character of the ASCII table, it is pretty common to systematically have the data block's ASCII representation nearby.
Thus, it becomes easier to distinguish text strings from unreadable data. In fact, it seems we have one string here : "FLAC 1.2.1 20070917".
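For those who'd like to produce this kind of view themselves, here is a minimal sketch in C++ of a hex + ASCII dump. The function name hexDump and its parameters are my own invention for illustration, and the base address is only a label printed on the left, nothing is actually read from it. Unprintable bytes are shown as a dot, which is a common convention but still a choice.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Formats `len` bytes as rows of 16: address, hex bytes, then an ASCII column.
// `baseAddr` is a hypothetical label for the first byte; it is never dereferenced.
std::string hexDump(const unsigned char* data, std::size_t len, std::uint32_t baseAddr)
{
    assert(data != nullptr || len == 0);
    std::string out;
    char buf[16];
    for (std::size_t row = 0; row < len; row += 16) {
        // Address of the first byte of this row.
        std::snprintf(buf, sizeof buf, "0x%08x  ", static_cast<unsigned>(baseAddr + row));
        out += buf;
        std::string ascii;
        for (std::size_t i = 0; i < 16; ++i) {
            if (row + i < len) {
                unsigned char b = data[row + i];
                std::snprintf(buf, sizeof buf, "%02x ", static_cast<unsigned>(b));
                out += buf;
                // Printable ASCII is shown as-is, everything else as '.'.
                ascii += (b >= 0x20 && b <= 0x7e) ? static_cast<char>(b) : '.';
            } else {
                out += "   "; // pad a short last row so the ASCII column lines up
            }
        }
        out += " " + ascii + "\n";
    }
    return out;
}
```

Calling hexDump on the 19 bytes of "FLAC 1.2.1 20070917" with 0x00f23b60 as the base address gives two rows, the first starting with 46 4c 41 43 and the second with 39 31 37.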
Also, for those who wouldn't understand how those numbers got magically transformed into characters, here's a little trick.
This is my most precious and favorite tool when it comes to numeric conversions or hex calculations, because I'm a lazy bitch and fuck you.
Your calculator is probably set to the standard view, so go to the "View" menu, and click "Programmer".
Now set the mode to Hex (it's on Dec by default), input the number 4C, and then set the mode back to Dec.
What do you see ? 76, yup ! That's a converter for you, saves you the trouble of doing it yourself~!
And a last little step, open notepad and enter the Alt + 76 combination (using the numpad), and tadaaa~ an L magically appeared. Marvelous, isn't it ?
So yup, the value 76 (0x4c) represents the character L.
The same applies to every other number in the data block. You can check it out if you want.
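The same conversion can be sketched in C++ without the calculator. The helper names byteToChar and bytesToText are made up for this example, and the sketch assumes an ASCII-compatible character set (which is what you get on Windows for these values).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Interprets a single byte value as a character (assumes ASCII).
char byteToChar(unsigned char value)
{
    return static_cast<char>(value);
}

// Interprets a run of bytes as text, one character per byte.
std::string bytesToText(const unsigned char* bytes, std::size_t count)
{
    assert(bytes != nullptr || count == 0);
    std::string text;
    for (std::size_t i = 0; i < count; ++i) {
        text += byteToChar(bytes[i]);
    }
    return text;
}
```

With this, byteToChar(0x4c) and byteToChar(76) both give 'L', since 0x4c and 76 are the same number written in two bases.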
Here's a full ASCII table for the curious ones :
Data Types
I said there were multiple ways to read and represent memory, didn't I ? Yes, I think I did.
Of course, it would be utterly retarded to only have little bytes to play with, and therefore, there are several data types in ASM and in C/C++ allowing us to read wider chunks of data at once (larger numbers~!)
Let's take a simple example.
If we want to break the 0xFF numeric limit of a single byte, we have to multiply the container's width by 2. (Simply because the very nature of computing is based on multiples of two, binary much, well, it can't be helped)
So instead of taking one byte.. We'll take two of them and put them together, just like that !
So let's see.. at 0x00f23b90, we have 0x24 and 0x3c as first two bytes..
If we mix them together (fuuuuusioooonnnnn~!) we get 0x3c24 ! Easy, isn't it ? (Note the reversed order, it's not a mistake)
And so, by putting two bytes together, we created a word. No, not one of the fancy little thingies you find in a dictionary, but rather a wider data type capable of holding a value up to 0xFFFF (65'535).
If we want to go even further, we have the dword (double word, 4 bytes) which can go up to 0xFFFFFFFF (4'294'967'295).
And even even further, there is the qword* (quad word, 8 bytes) which can hold a value up to go fucking calculate it yourselfasdfkjldfjkadsfjladsfjk. >.>
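If you'd rather let the compiler do the math, here is a small sketch using the fixed-width types from the standard header cstdint. Mapping byte/word/dword/qword onto uint8_t/uint16_t/uint32_t/uint64_t is my own pairing for this example, and maxValueOf is a made-up helper name.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Returns the largest value an unsigned integer type T can hold.
template <typename T>
unsigned long long maxValueOf()
{
    return std::numeric_limits<T>::max();
}

// byte  -> std::uint8_t  (1 byte)
// word  -> std::uint16_t (2 bytes)
// dword -> std::uint32_t (4 bytes)
// qword -> std::uint64_t (8 bytes)
static_assert(sizeof(std::uint16_t) == 2 * sizeof(std::uint8_t), "a word is two bytes");
static_assert(sizeof(std::uint32_t) == 2 * sizeof(std::uint16_t), "a dword is two words");
static_assert(sizeof(std::uint64_t) == 2 * sizeof(std::uint32_t), "a qword is two dwords");
```

For the qword, maxValueOf<std::uint64_t>() returns 0xFFFFFFFFFFFFFFFF, which is 18'446'744'073'709'551'615 in decimal.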
And for the clear understanding of the slower-thinking peepz :
Number read from 0x00f23b90 as byte : 0x24
Number read from 0x00f23b90 as word : 0x3c24
Number read from 0x00f23b90 as dword: 0x00f23c24
Also, if we were to define this variable in C++
int myVar = 0x00f23c24;
In memory, it would look exactly like in the picture up above.
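The three reads above can be sketched in C++ like this. The readAs helper is hypothetical, and the byte array simply reproduces the four bytes 24 3c f2 00 mentioned above. The results assume an x86 machine (or any other CPU storing the lowest byte first).

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reads sizeof(T) bytes starting at `address` and returns them as one value of type T.
// memcpy is used because it is the well-defined way to reinterpret raw bytes in C++.
template <typename T>
T readAs(const unsigned char* address)
{
    assert(address != nullptr);
    T value;
    std::memcpy(&value, address, sizeof(T));
    return value;
}
```

Given const unsigned char mem[] = {0x24, 0x3c, 0xf2, 0x00}, reading mem with readAs<std::uint8_t> gives 0x24, with readAs<std::uint16_t> gives 0x3c24, and with readAs<std::uint32_t> gives 0x00f23c24. Same address every time, only the width changes.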
Now for the reason why the bytes are put in a reversed order when read as a wider data type (this byte order is called little-endian).
The purpose of this design is actually quite logical.
It is made this way, so that one can easily change the access width to the data (located at the same address), without having to worry about either the address, or the number itself.
In this case, we have a dword holding 0x00000001, if we were to read it as a word, it would end up being 0x0001, which is still 1 !
And so, 0x00f23bb0 --> (dword)0x00000001 = (word)0x0001 = (byte)0x01 = 1
Of course, we'd experience a data loss if we shrink, for example, a word that contains a number above 0xff (:
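Here is a tiny sketch of that shrinking in C++. The function names are invented for the example, and 0x1234 is just an arbitrary value above 0xff picked to show the loss.

```cpp
#include <cassert> // used by the checks described in the note below
#include <cstdint>

// Keeps only the lowest 2 bytes of a dword.
std::uint16_t shrinkToWord(std::uint32_t value)
{
    return static_cast<std::uint16_t>(value);
}

// Keeps only the lowest byte of a word; anything above 0xff is cut off.
std::uint8_t shrinkToByte(std::uint16_t value)
{
    return static_cast<std::uint8_t>(value);
}

// Tells whether a word survives being shrunk to a byte without data loss.
bool fitsInByte(std::uint16_t value)
{
    return value <= 0xFF;
}
```

So shrinkToByte(shrinkToWord(0x00000001)) is still 1, whereas shrinkToByte(0x1234) only keeps 0x34 and the 0x12 part is gone.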
Note that I've only mentioned Integer data containers so far, which are the most widely used.
There are also containers for floating-point numbers (14.12258 for example), but those require a more complex processing scheme, which I won't cover in this tutorial.
* On 32bit systems, 64bit (8 bytes) integers can't be computed natively in a single general-purpose register, thus a qword needs to be divided into 2 dwords before being processed (much of a trick), making it the largest integer container available.
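To give a rough idea of that trick, here is a sketch of a qword handled as two dwords, with an addition that carries from the low half into the high half. This only illustrates the idea, it is not what a compiler literally emits, and DwordPair, split, join and add are names I made up.

```cpp
#include <cassert> // used by the checks described in the note below
#include <cstdint>

// A qword seen as two dwords.
struct DwordPair {
    std::uint32_t low;  // lowest 4 bytes
    std::uint32_t high; // highest 4 bytes
};

DwordPair split(std::uint64_t qword)
{
    return { static_cast<std::uint32_t>(qword & 0xFFFFFFFFu),
             static_cast<std::uint32_t>(qword >> 32) };
}

std::uint64_t join(DwordPair pair)
{
    return (static_cast<std::uint64_t>(pair.high) << 32) | pair.low;
}

// Adds two qwords using only dword-sized additions.
DwordPair add(DwordPair a, DwordPair b)
{
    DwordPair result;
    result.low = a.low + b.low;                           // wraps around on overflow
    std::uint32_t carry = (result.low < a.low) ? 1u : 0u; // a wrap means we must carry 1
    result.high = a.high + b.high + carry;
    return result;
}
```

For instance, adding 1 to 0x00000000FFFFFFFF this way overflows the low dword back to 0 and carries into the high dword, giving 0x0000000100000000.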
Well well, pointers pointers.
I've seen many people complain about those, saying that they are the hardest part to understand and master when learning C/C++..
I've had some issues myself, but I believe that's because the tutorials I learnt them from just.. weren't good enough.
Because in fact, pointers happen to be soooo simple, I couldn't believe it myself.
I've said it before, and I'm gonna say it again.
C/C++ can never, EVER be fully understood unless one has the ASM knowledge that goes with it.
This is an absolute rule, and I will nuke anyone who disagrees, do you understand ?
So, what's a pointer ?
A pointer is simply a memory address put into an accessible variable, allowing us to access whatever is at that address.
On 32bit systems, a pointer is always a dword, restricting a program's virtual address range from 0x00000000 to 0xFFFFFFFF, thus a pointer viewed in memory will look like a normal number.
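A quick sketch to see a pointer as the plain number it is. Keep in mind that the 4 byte size only holds when compiling for a 32bit target; a 64bit build will report 8 instead. The names kPointerSize, addressAsNumber and highestAddress are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Size of a pointer in bytes for the current build: 4 on a 32bit target, 8 on a 64bit one.
constexpr std::size_t kPointerSize = sizeof(void*);

// Returns the address stored in a pointer as an ordinary unsigned number.
std::uintptr_t addressAsNumber(const void* pointer)
{
    return reinterpret_cast<std::uintptr_t>(pointer);
}

// Highest address a pointer made of `bytes` bytes can express.
std::uint64_t highestAddress(std::size_t bytes)
{
    assert(bytes >= 1 && bytes <= 8);
    return bytes == 8 ? UINT64_MAX : ((std::uint64_t{1} << (bytes * 8)) - 1);
}
```

highestAddress(4) gives 0xFFFFFFFF, which is the upper bound mentioned above for a dword-sized pointer.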
Wut? Purpose where ?
Let's say we need to access a specific part of memory, a variable or whatever, but that part isn't anywhere in our own code, how do we access it ?
Yup, that's where pointers come in handy.
It would be absolutely impossible for a program to know where its pieces of memory are if there were no pointers to reference them.
This is why pointers are crucial in C/C++ and in ASM.
They allow us to access any data inside the program's virtual memory.
But how does that all work ?
All right. Consider this little bit of code in C++
int myNumber = 57;
int *myPointer = &myNumber;
myNumber contains 57.
myPointer contains the address where myNumber is located in memory.
In this case, the * operator declares myPointer as a pointer.
The fact that the declaration is prefixed with int is simply to tell what kind of data is located at that address, one integer in this case.
The & operator retrieves the address of a variable as a value, which is then put into myPointer.
If we were to do this
int mySameNumber = *myPointer;
The value in mySameNumber would be 57, which is the same as myNumber.
So the * operator has two uses, first to declare pointers, second to access memory through a pointer.
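To see both directions at once, here is a small sketch with two made-up helpers, readThrough and writeThrough. It also shows something not covered above but that follows directly: the * operator can be used to write through a pointer, not only to read.

```cpp
#include <cassert>

// Fetches the int located at the address the pointer contains.
int readThrough(const int* pointer)
{
    assert(pointer != nullptr);
    return *pointer;
}

// Stores `value` at the address the pointer contains.
void writeThrough(int* pointer, int value)
{
    assert(pointer != nullptr);
    *pointer = value;
}
```

With int myNumber = 57; and int *myPointer = &myNumber;, readThrough(myPointer) returns 57. After writeThrough(myPointer, 99), myNumber itself contains 99, because myPointer and &myNumber refer to the very same bytes in memory.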
Both of the operators above are conventionally named after the term "reference".
But in my opinion, it only makes it all overconfusing.
& performs a reference
* performs a dereference
And I'm like
So, let's look at this in ASM.
If we were to read the data contained in myNumber, it would look like this
mov eax, myNumber
That's it. Pretty easy.
On x86, the CPU has 8 general-purpose registers, which you can think of as tiny built-in variables. One of them is called eax.
The MOV instruction simply copies the content of one part of memory or register into another part of memory or register (though never from memory to memory in a single instruction).
In this case, we copy the content of myNumber (57) into eax.
Now if we were to do the same using a pointer, it would look like this
mov eax, myPointer
mov eax, [eax]
The first instruction puts the value of myPointer (which is the address of myNumber) into eax. The second one uses the brackets [ ] to actually fetch the value that's located at the address eax now contains, instead of leaving the address itself in eax.
In both cases, eax will contain 57, which is the value myNumber contains.
Thus, we could say that
The operator *, equivalent to [ ] in ASM, points into and fetches data.
Whilst the operator & retrieves the address of the data.
And.. that's it.
That's all and everything pointers are, not much of a fuss.
In the end, memory is definitely not something to be scared of.
It might seem mysterious and unattainable at first, but it is actually pretty sweet once you understand what it really is and how it works.
Pointers aren't a big deal either, unlike what most people think.
They are directly linked to memory, part of the core processing of a program, which is why they are so important.
Also, C/C++ granting us such a free access to them is one of the reasons C/C++ is preferred when it comes to game hacking (which involves direct memory tampering), unlike other languages that don't let us use them.
And for those who'd want more, you can always try messing with memory yourselves with programs like debuggers (OllyDbg, IDA, ...) or memory viewers/editors (Cheat Engine, HxD, ...)
I hope the tutorial isn't too messed up in its entirety, this is my first and I took many breaks while writing it.. =]