010100110110000101111001 what?
Posted by: Brent York
on May 12, 2009
I thought I'd take a slight sideways jaunt from the security list that I was going to post and bring up a topic that seems to have several people I know very curious and interested. That topic is reverse engineering. Fair warning: this topic will be covered in four separate blog posts (including this one) at a rate of one per week. After that we'll get back to the previously mentioned topics that I said I would cover.
While the topic is related to security (heavily so) and to programming, we won't be concentrating specifically on security-related examples. That's because, while there's a plethora of security-related examples we could go over, reverse engineering is not just related to security.
Reverse engineering generally involves taking a finished target and taking it apart bit by bit to understand how it works as a whole. In that respect it has similarities to forensics, where you're trying to find clues that give you a bigger picture of exactly what went on. Forensics, however, is another topic.
For software this is generally done with decompilers, disassemblers, debuggers, and patching tools. For hardware this is generally done with high priced equipment such as oscilloscopes, logic analyzers, vector analyzers, and in-circuit emulators. In some cases integrated circuits are even acid-etched to reveal the silicon inside, allowing a reverse engineer to see and document the circuit or circuits therein.
In the case of protocols this is most often done with network sniffers, taps, and network analyzers, and in some cases specialized tooling from the hardware side listed above.
In all cases, you can see that reverse engineering requires a great deal of analysis, and it's not an undertaking for those with little patience :). I am only going to cover software reverse engineering here, but suffice it to say that hardware and protocol reverse engineering are similar in process, if not entirely similar in the actions taken and tools used.
Reverse engineering is done for any number of reasons, including but not limited to the following (not all of them are good!):
- Security research (reverse engineering viruses or trojans for instance)
- Finding bugs in release software (for debugging or for exploitation)
- Understanding protocols
- Figuring out how hardware works when no documentation is available
[Time for one of those anecdote things...]
Not too long ago, I had a remedy ticket assigned to me concerning a piece of software that we run for customers, whose name I will withhold to protect the innocent. This particular piece of software was crashing on start-up, but only at one particular customer site; the rest of them seemed to be starting up fine.
The problem with debugging it was that the software was compiled in release mode. So to make a long story short, I had the ops guys here (Hey guys!) grab me the crash info, and then I went to work with IDA and dumpbin to see what I could find out. After a little digging around and working with the disassembly of the compiled binary, I found out what was causing the crash. I also found out, through contextual cues, where the code causing the crash lived in the original sources (since we have them).
This allowed us to identify the bug, and possibly several other places that the same bug might have lived, and allowed us to repair the software and deploy it to the sites. I'm proud to say that all of them work as intended today.
[End anecdote...]
So, what does this little story tell us? Well, it tells us that reverse engineering can be a very useful tool when the chips are down and you don't have much information to debug with. It also shows us that, given enough time and effort, not only can the basic algorithm be worked out, but one can even get great detail about the design and/or implementation specifics of the application's code.
Ok, so now we know what it is, and why to use it... but where do we start?
As with any good "hack", reconnaissance is everything, and we need to know a few things about the piece of software we intend to reverse. These few things help define what tools we will use to break open the software and take a look around.
Obviously the reconnaissance method depends on the type of binary data you've got. It goes without saying that the methods for getting information, and the types of information required, will be different for different types of binaries. For example, the information we need about a binary for the PIC18F series of micro-controllers won't be the same as what we need for a Windows XP binary, but some general hard and fast rules apply:
- Determine the platform(s) the executable is supposed to run on
- Determine to a reasonable level whether the program is compiled to machine instructions or is intermediate code (aka bytecode or p-code)
- Identify any human readable strings within the binary, store them away for later use.
- In the case of bugs, or abnormal program termination, save a crash dump of the executable, or note down the addresses and values given on the illegal operation dialog. These can be used to track down the exact cause of the crash.
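As a rough illustration of the first two rules, here is a minimal Python sketch that guesses a file's container format from its leading "magic" bytes. The function name `identify_format` is made up for this example, and it only knows three well-known signatures (ELF, PE via the DOS "MZ" stub, and Java class files), so treat it as a starting point rather than a real identification tool.

```python
import struct


def identify_format(data: bytes) -> str:
    """Guess the container format of a binary from its leading bytes."""
    if data[:4] == b"\x7fELF":
        # Byte 4 of the ELF header (EI_CLASS): 1 = 32-bit, 2 = 64-bit.
        ei_class = data[4] if len(data) > 4 else 0
        bits = {1: "32-bit", 2: "64-bit"}.get(ei_class, "unknown class")
        return f"ELF ({bits})"
    if data[:4] == b"\xca\xfe\xba\xbe":
        # Caveat: Mach-O universal binaries share this magic number.
        return "Java class file (bytecode)"
    if data[:2] == b"MZ":
        # The DOS header stores a little-endian 32-bit offset to the
        # PE signature at 0x3C.
        if len(data) >= 0x40:
            (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
            if data[pe_offset:pe_offset + 4] == b"PE\0\0":
                return "Windows PE"
        return "DOS MZ executable"
    return "unknown"
```

Reading the first few hundred bytes of a file, for example with `open(path, "rb").read(512)`, is normally enough for the checks above. The result already separates native machine code (ELF, PE) from intermediate code (a Java class file).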
We can start with this, and generally speaking it's enough to get our foot in the door. The next big challenge is to identify, if at all possible, the language that the source code for the program was written in.
Yes, you read that right... it's possible to do that. It's also possible in many cases to tell what compiler was used, and even what version of the compiler was used :). Neat, huh?
While knowing what compiler was used, and what version of the compiler is in most cases not required, it's a huge leg up when attempting to analyze code compiled by optimizing compilers.
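To give a feel for how compiler identification can work, here is a small, deliberately naive Python sketch. It relies on two artifacts that toolchains commonly leave behind: GCC and Clang usually embed a version string in the binary (unless it has been stripped), and Microsoft's linker usually writes a "Rich" marker near the start of a PE file. The function name `compiler_hints` and the 0x400-byte search window are assumptions of mine, and real identification tools use far richer signatures than this.

```python
from typing import List


def compiler_hints(data: bytes) -> List[str]:
    """Return rough guesses about the toolchain that produced a binary."""
    hints = []
    # GCC typically records its version as "GCC: (...) x.y.z".
    if b"GCC: (" in data:
        hints.append("GCC version string present")
    # Clang typically records "clang version x.y.z".
    if b"clang version" in data:
        hints.append("Clang version string present")
    # Microsoft's linker usually leaves a "Rich" marker between the
    # DOS stub and the PE header (assumed here to fall in the first 0x400 bytes).
    if data[:2] == b"MZ" and b"Rich" in data[:0x400]:
        hints.append("Microsoft toolchain 'Rich' marker present")
    return hints
```

An empty list does not mean much, since these markers can be stripped or forged. A hit is a cheap first clue before moving on to slower analysis.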
So what does this get us? Well, now we know what tools to select. For example, for a compiled Windows binary that I've identified as being written with Visual C++, I might choose IDA, WinDbg and dumpbin. On Linux I'm likely to choose Lida or LDasm, objdump and gdb. For an application written in Java, however, I'm more likely to use a tool such as JD.
Gathering strings can in many cases (due to sloppy security practices!) reveal things such as passwords, or at the very least reveal strings whose addresses allow a good reverse engineer to find the code that shows those strings to the user. From there, the reverse engineer can work backwards up the call stack, which drastically shortens the time needed to find things like password verification routines :).
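String gathering is easy to sketch in a few lines of Python. The snippet below mimics what a `strings`-style tool does for plain ASCII: it scans the raw bytes for runs of printable characters and reports each one with its file offset. The name `extract_strings` and the default minimum length of 4 are my own choices for illustration. Note that it ignores UTF-16 strings, which are common in Windows binaries, and that a file offset still has to be translated to a virtual address using the section headers before you can look for references to it in a disassembly.

```python
import re


def extract_strings(data: bytes, min_len: int = 4):
    """Yield (file_offset, text) for each printable-ASCII run of at least min_len bytes."""
    # 0x20-0x7e covers space through tilde, i.e. printable ASCII.
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    for match in pattern.finditer(data):
        yield match.start(), match.group().decode("ascii")


if __name__ == "__main__":
    # Made-up sample bytes standing in for part of a binary.
    sample = b"\x00\x01Enter password:\x00\xff\x10abc\x00Access denied\x00"
    for offset, text in extract_strings(sample):
        print(f"{offset:#06x}  {text}")
```

On the made-up sample, the short run "abc" is skipped because it is below the minimum length, while the two prompt strings are reported with their offsets.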
In the case of a crash or abnormal program termination, gathering things like the faulting address and register contents gives the reverse engineer several powerful pieces of information:
- Where it crashed
- Why it crashed
- A general idea of what was going on when it crashed
- Possibly how to make it crash "your way", in the case of something like a buffer overflow or format string exploit.
ABEND (abnormal end) information is useful for debugging purposes and possibly for nefarious ones, and it can also aid a reverse engineer in the task of reversing the program.
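To make the "where it crashed" part concrete, here is a small Python sketch of the arithmetic involved. Given the faulting address from a crash dialog, the address the module was loaded at, and a table of function start offsets (the kind of thing a disassembler or a map file can provide), it finds the function containing the crash. Every address and function name below is invented for the example, and `locate_crash` is a hypothetical helper, not part of any tool mentioned here.

```python
import bisect


def locate_crash(fault_addr, load_base, symbols):
    """Map a faulting address to (function_name, offset_into_function).

    symbols is a list of (rva, name) pairs sorted by rva, where rva is the
    function's start address relative to the module's load base.
    Returns None if the address lies before the first known function.
    """
    rva = fault_addr - load_base
    starts = [start for start, _ in symbols]
    # Find the last function whose start is at or before the faulting RVA.
    index = bisect.bisect_right(starts, rva) - 1
    if index < 0:
        return None
    start, name = symbols[index]
    return name, rva - start


if __name__ == "__main__":
    # Hypothetical symbol table and crash address.
    table = [(0x1000, "main"), (0x1200, "parse_config"), (0x1800, "cleanup")]
    print(locate_crash(0x00401234, 0x00400000, table))
```

With these invented values the crash lands 0x34 bytes into `parse_config`, which tells you which part of the disassembly to read first. The sketch assumes the table is complete, so an address past the last function will simply be attributed to that last entry.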
Join me next week, when I actually get into reverse engineering a few small test binaries that I made to show how all this works :). In the meantime, keep your white hats on.




