Security: Why hacking used to be easy but isn't anymore

Explaining security vulnerabilities to laymen is hard, but not impossible. Let me try. First of all, let's define what a program is and what a CPU does. A program is a series of instructions. Simple. A CPU executes instructions. Also simple.

So what are security vulnerabilities, then? There are different kinds: information disclosure, for example, or not checking permissions correctly (something that lets you edit someone else's Facebook profile because the website doesn't properly check who is editing it). Here we will focus on code execution vulnerabilities.

First we need to understand the concept of reusable code and functions. A function is a series of instructions in a program that can be used repeatedly from anywhere within the program. This is a very important concept because it allows programs to be smaller and easier to write and maintain. Let's say you have a function that calculates the square root of a number. You don't want to copy it to every place in your program where you need a square root. You just want to be able to use it (programmers call this "calling" or "invoking" a function). In "machine code" this will roughly look like this:


   ; some code here
   call SQUARE_ROOT   ; jump to SQUARE_ROOT and remember where to come back to
   ; continue here once SQUARE_ROOT has returned

SQUARE_ROOT:
   ; do some calculations here
   ret ; ret causes the CPU to return from a function 
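
To make the jumping around concrete, here is a minimal sketch in Python (not real machine code) of a toy CPU running the listing above. The instruction names, the `run` function, and the way the program is written down as a list are all made up for illustration. The only point is that `call` has to remember where to come back to, and `ret` uses what was remembered.

```python
# Toy model of call/ret. The instruction set and run() are hypothetical;
# a real CPU works on bytes in memory, not on Python tuples.

def run(program, labels):
    """Execute (instruction, argument) pairs; return the order in which they ran."""
    saved_locations = []   # where "call" remembers the place to come back to
    trace = []
    pc = 0                 # index of the instruction being executed
    while True:
        op, arg = program[pc]
        trace.append(pc)
        if op == "call":
            saved_locations.append(pc + 1)  # the instruction right after the call
            pc = labels[arg]                # jump to the function
        elif op == "ret":
            pc = saved_locations.pop()      # jump back to the remembered place
        elif op == "halt":
            return trace
        else:                               # "work": stands in for any other instruction
            pc += 1

PROGRAM = [
    ("work", "some code here"),              # 0
    ("call", "SQUARE_ROOT"),                 # 1
    ("work", "continue"),                    # 2
    ("halt", None),                          # 3
    ("work", "do some calculations here"),   # 4  SQUARE_ROOT:
    ("ret", None),                           # 5
]
LABELS = {"SQUARE_ROOT": 4}

print(run(PROGRAM, LABELS))  # prints [0, 1, 4, 5, 2, 3]
```

The trace shows the CPU leaving the main code at instruction 1, running the function (4 and 5), and then picking up again at instruction 2.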

The important thing here is the ret instruction. When you call a function, you want to continue in your code where you left off. This means that when you call a function, the CPU jumps to the function and then jumps back to where you called it. If it didn't jump back, your code could jump to a function but could never continue afterwards. So the CPU needs to jump back. But how does the CPU know where to jump back to? After all, you can call a function from anywhere. The answer is that the call instruction stores the location the CPU has to jump back to (called the return address). Where it is stored depends on the CPU architecture, but for the CPU in your laptop it is stored in memory, where everything else is stored too.

If we can trick the program into overwriting the stored return address, we can make the program jump to wherever we want. But how can we do this? Easy. Let's look at the following memory layout:

+-----------+--------------------+
| DATA [16] | RETURN ADDRESS [4] |
+-----------+--------------------+

Programs have to manage space. In this case, the program has allocated 16 bytes to store some data, and right after this data the CPU has stored the return address (which is 4 bytes long). Now, let's say the program reads a file from disk into DATA but does not check how large the file is. If the file is 16 bytes or smaller, this won't be a problem. But what happens when the file is 20 bytes long? The last 4 bytes of the file overwrite the return address. Thus we can create a file that makes this program jump to wherever we want. If the file is longer than 20 bytes, it just overwrites more memory; bytes 17 to 20 of the file always land on the return address.

Now, since instructions reside in memory as well, we can create a file whose first 16 bytes are valid instructions, which will be loaded into memory where DATA is. Since we know where DATA is, we can set the next 4 bytes to the address of DATA. When the function returns, the CPU jumps to the beginning of DATA and executes the instructions there. This means we can create files that cause the program to execute arbitrary code. It really was this simple, but now there are techniques available to prevent this.
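
The overwrite can be played through safely in a simulation. The following is a rough sketch in Python that models memory as a plain array of bytes with the layout from the diagram. Nothing here touches real memory. The addresses, the `load_file_unchecked` function, the 32-byte size of the toy memory, and the choice to store the address with the lowest byte first (as the CPU in most laptops does) are assumptions made for the example.

```python
import struct

# All values below are hypothetical and only serve the illustration.
DATA_ADDRESS = 0x1000          # pretend DATA starts at this address
DATA_SIZE = 16                 # DATA [16]
MEMORY_SIZE = 32               # DATA, RETURN ADDRESS [4], plus some other memory
ORIGINAL_RETURN = 0x08048000   # pretend this is where the function was called from

def load_file_unchecked(file_bytes):
    """Copy a 'file' into DATA without checking its size.

    Returns the return address as it is stored after the copy.
    """
    memory = bytearray(MEMORY_SIZE)
    # The call instruction stored the return address right after DATA.
    struct.pack_into("<I", memory, DATA_SIZE, ORIGINAL_RETURN)

    # The bug: the copy starts at DATA and keeps going for as long as the
    # file does. (We only stop at the end of the toy memory.)
    n = min(len(file_bytes), MEMORY_SIZE)
    memory[:n] = file_bytes[:n]

    (return_address,) = struct.unpack_from("<I", memory, DATA_SIZE)
    return return_address

harmless = b"hello"
# 16 placeholder bytes standing in for instructions, then the address of DATA.
attack = b"\x90" * DATA_SIZE + struct.pack("<I", DATA_ADDRESS)

print(hex(load_file_unchecked(harmless)))  # prints 0x8048000
print(hex(load_file_unchecked(attack)))    # prints 0x1000
```

With the harmless file the stored return address is untouched. With the crafted 20-byte file it now points at the start of DATA, which is exactly where the file's first 16 bytes were placed.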

But why is it hard now?

Looking at the explanation above, we can see one way to make this harder. We need to know where DATA is. If we don't know the location of DATA, we don't know what to set the last 4 bytes to. And this is what ASLR (Address Space Layout Randomization) does. It's a security measure that randomizes the locations of things in memory.

Another security measure is putting a secret value (unknown to the hacker, and often called a stack canary) between DATA and RETURN ADDRESS. Before returning from the function, this secret value is checked. If it was modified (because it was overwritten), the program is aborted (it crashes, but at least the hacker can't execute code). Since the hacker doesn't know the secret value, they can't craft a file that causes the program to execute their code, but they can still craft files that crash the program.

Another technique is telling the CPU to never execute instructions in certain regions of memory. This requires hardware support from the CPU, but the CPU in your laptop supports it. This way we can tell the CPU that DATA is not supposed to be executed. If the CPU jumps into DATA, it will refuse to execute it and inform the operating system. The operating system will then usually terminate the program. So at worst a hacker can crash the program, but not execute code.
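
The secret-value check is the easiest of the three to play through in a simulation. Below is a hedged sketch in Python, again with memory modeled as a plain array of bytes and nothing touching real memory. The layout (DATA, then a 4-byte secret, then the return address), the `load_file_with_secret` function, the `OverwriteDetected` exception, and the addresses are assumptions made for the example. Real compilers and operating systems differ in the details.

```python
import secrets
import struct

# All values below are hypothetical and only serve the illustration.
DATA_SIZE = 16
SECRET_SIZE = 4
RETURN_OFFSET = DATA_SIZE + SECRET_SIZE
MEMORY_SIZE = RETURN_OFFSET + 4
ORIGINAL_RETURN = 0x08048000   # pretend this is where the function was called from

class OverwriteDetected(Exception):
    """Stands in for the program being aborted."""

def load_file_with_secret(file_bytes):
    """Same unchecked copy into DATA, but with a secret value guarding the return address."""
    secret = secrets.token_bytes(SECRET_SIZE)   # fresh random value, unknown to the hacker
    memory = bytearray(MEMORY_SIZE)
    memory[DATA_SIZE:RETURN_OFFSET] = secret
    struct.pack_into("<I", memory, RETURN_OFFSET, ORIGINAL_RETURN)

    # The bug is still there: the file's size is not checked against DATA_SIZE.
    n = min(len(file_bytes), MEMORY_SIZE)
    memory[:n] = file_bytes[:n]

    # The check that happens before returning from the function.
    if bytes(memory[DATA_SIZE:RETURN_OFFSET]) != secret:
        raise OverwriteDetected("secret value was modified, aborting")

    (return_address,) = struct.unpack_from("<I", memory, RETURN_OFFSET)
    return return_address

print(hex(load_file_with_secret(b"hello")))  # prints 0x8048000

# The hacker has to guess the 4 secret bytes to reach the return address.
attack = b"\x90" * DATA_SIZE + b"BBBB" + struct.pack("<I", 0x1000)
try:
    load_file_with_secret(attack)
except OverwriteDetected as error:
    print("aborted:", error)
```

A file that fits into DATA passes the check. A file long enough to reach the return address has to run over the secret first, and unless the hacker guesses all 4 bytes correctly the check fails and the program stops instead of jumping.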

So we're 100% safe? No: (a) there can be security holes in the implementation of these techniques, (b) there are ways to circumvent them, and (c) not all programs and operating systems use them. But if all three techniques are in place, it's really, really hard to actually execute code.