IDAscope update: Crypto Identification


After being quiet for almost three weeks, today I want to share with you my latest additions to IDAscope.

The focus of this post is a new widget that I call Crypto Identification. Now you may say “oh no, yet another crypto detection tool?” Well, yes, but before you stop reading, let me introduce you to an approach you might find useful.

Heuristics-based crypto detection by code properties

About two years ago, during literature research on network protocol reverse engineering, I came across an interesting paper called “Dispatcher: Enabling Active Botnet Infiltration using Automatic Protocol Reverse-Engineering” by Juan Caballero et al. Besides describing an approach for identifying message buffers and dissecting them into protocol fields, it contains a section on the automated detection of cryptographic routines (“Detecting Encoding Functions”, p. 10).
The main idea is pretty straightforward:

While the approach described in the paper is applied to dynamically obtained instruction traces, there is no reason not to employ it in static code analysis. So my goal for today is to show you how to make “academic things” practically usable. ;)

I use the following set of arithmetic/logic instructions; please tell me if I missed something:

[
    "add", "and", "or", "adc", "sbb",
    "sub", "xor", "inc", "dec", "daa", "aaa",
    "das", "aas", "imul", "aam", "aad", "salc",
    "not", "neg", "test", "sar", "cdq"
]
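To make the heuristic concrete, here is a minimal sketch of how a single basic block could be rated. It assumes that the rating is simply the fraction of a block's instructions whose mnemonic is in the set above; the function name `rate_block` and the hand-written sample block are hypothetical, and inside IDA the mnemonics would be collected through IDAPython instead.

```python
# Mnemonic set from the post, stored as a frozenset for fast lookups.
ARITH_LOGIC = frozenset([
    "add", "and", "or", "adc", "sbb",
    "sub", "xor", "inc", "dec", "daa", "aaa",
    "das", "aas", "imul", "aam", "aad", "salc",
    "not", "neg", "test", "sar", "cdq",
])


def rate_block(mnemonics):
    """Return the fraction of arithmetic/logic instructions in one basic block.

    `mnemonics` is a list of lowercase mnemonic strings, one per instruction.
    The formula is an assumption; the post does not spell it out.
    """
    if not mnemonics:
        return 0.0
    hits = sum(1 for m in mnemonics if m in ARITH_LOGIC)
    return hits / float(len(mnemonics))


# Hypothetical loop body, written out by hand purely for illustration.
block = ["movzx", "add", "and", "mov", "xor", "inc", "cmp", "jb"]
rating = rate_block(block)
print("rating: %.0f%%" % (rating * 100))
```

Blocks dominated by data movement and control flow score low under this measure, while tight mixing loops score high, which is what makes the ratio usable as a filter criterion.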

The following screenshot shows the widget in action:

[Screenshot: crypto identification]

The functionality I just described is located in the upper part of the widget. There are three double-sliders that can be used to adjust the following parameters:

With these filter functions, we can greatly narrow down the number of suspicious basic blocks to those that really contain interesting crypto or compression algorithms. Once the initial scan has been performed (less than one second for a sample with 700 functions), the sliders update the visualization in real time. Qt only chokes when viewing all 9500 basic blocks at once, but that’s not what you want anyway.

The two checkboxes give further ways to refine the search:

Here is a use case for this widget: When I am trying to identify cryptography in malware samples, I often have problems finding compact but frequently used crypto algorithms such as RC4, which usually do not carry constants with them (constants would allow spotting them by simple signature matching). In the above screenshot (from a current Citadel sample with 724 functions) you can see that the candidate blocks have been reduced to 23 out of 9526 basic blocks. The filters are set to show only blocks with a rating above 30%, a size of 10 or more instructions, and at most one call instruction. 23 blocks is a number small enough for me to look through in just a few minutes, so the relevant parts are identified very quickly.
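The filter settings from this use case can be sketched as a simple range check per block. The `BlockStats` record, its field names, and the sample values are hypothetical, and all bounds are treated as inclusive here, which is an assumption about how the double-sliders behave.

```python
from collections import namedtuple

# Hypothetical per-block record; the field names are assumptions.
BlockStats = namedtuple("BlockStats", ["address", "rating", "size", "calls"])


def filter_blocks(blocks, rating=(0.3, 1.0), size=(10, float("inf")), calls=(0, 1)):
    """Keep blocks whose rating, size and call count fall into the given ranges.

    Each range is a (lower, upper) pair, mirroring one double-slider.
    """
    return [
        b for b in blocks
        if rating[0] <= b.rating <= rating[1]
        and size[0] <= b.size <= size[1]
        and calls[0] <= b.calls <= calls[1]
    ]


# Made-up sample data, not taken from the Citadel sample.
blocks = [
    BlockStats(0x401000, 0.45, 24, 0),  # passes all three filters
    BlockStats(0x401080, 0.50, 6, 0),   # too few instructions
    BlockStats(0x4010C0, 0.10, 40, 0),  # rating too low
    BlockStats(0x401200, 0.60, 15, 3),  # too many calls
    BlockStats(0x401300, 0.35, 10, 1),  # passes all three filters
]
candidates = filter_blocks(blocks)
print([hex(b.address) for b in candidates])
```

Because the expensive part (rating every block) happens once, re-running a filter like this on slider changes is cheap, which fits the real-time updates described above.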

Among the 23 blocks is the following one, which contains the modified stream cipher that is used in Citadel:

[Screenshot: citadel rc4]

In addition to the normal XOR/substitutions, Citadel also XORs against the characters of a static hash contained in the binary, which is considered one of the “advancements” over its predecessor Zeus 2. While this may be a weak example because the block is easily identified by searching for exactly this hash, you probably get the idea of how to use the widget. The heuristic also successfully identifies all the other crypto parts in the sample, like the AES and CRC32 algorithms.

If you wonder how to get double-sliders in Qt (they are not a standard widget): The idea and code of this widget, called “BoundsEditor”, is adapted from Enthought’s TraitsUI, which luckily is open-source software. I took the code and reduced it back to a standard Qt widget, which gives me a great and compact control element to adjust my parameters.

Signature-based crypto identification

The second part of the widget does what you might have expected in the first place. It simply uses a set of constants in order to find well-known cryptographic algorithms. It’s basically inspired by tools like the IDA findcrypt plugin or the KANAL plugin for PEiD. It does the same job, except that it is directly coupled to IDA and lets you jump instantly to the code locations referencing the identified constants. The following screenshot (from an old but gold Conficker sample) shows both types of matches:

[Screenshot: conficker constants]

The colors mean:

The currently supported algorithms are (with ingredients from Ilfak Guilfanov’s findcrypt, Felix Gröbert’s kerckhoff’s, a crypto detection implementation by Felix Matenaar from his Bachelor thesis, and some of my own adaptations):

The only thing missing right now is renaming / tagging those functions based on the signatures; maybe I will implement that, too.
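The constant-based search described in this section can be illustrated with a few lines of Python. This is only a sketch under stated assumptions: it scans a raw byte buffer for little-endian 32-bit values, the three signatures are a tiny illustrative subset rather than the widget's actual signature set, and `find_constants` is a hypothetical name.

```python
import struct

# Well-known 32-bit constants; an illustrative subset only.
SIGNATURES = {
    "CRC32 (reflected polynomial)": 0xEDB88320,
    "MD5/SHA-1 (first init value)": 0x67452301,
    "SHA-256 (first init value)": 0x6A09E667,
}


def find_constants(data):
    """Return sorted (offset, name) pairs for every little-endian dword match."""
    matches = []
    for name, value in SIGNATURES.items():
        needle = struct.pack("<I", value)
        pos = data.find(needle)
        while pos != -1:
            matches.append((pos, name))
            pos = data.find(needle, pos + 1)
    return sorted(matches)


# Hypothetical buffer standing in for a binary's data section.
sample = (
    b"\x90" * 4
    + struct.pack("<I", 0xEDB88320)
    + b"\x00" * 8
    + struct.pack("<I", 0x67452301)
)
for offset, name in find_constants(sample):
    print("0x%04x: %s" % (offset, name))
```

Resolving each hit to the code that references it, which is what makes the widget convenient, depends on IDA's cross-reference information and is left out of this sketch.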

Other changes to IDAscope

To conclude this post, I want to briefly discuss some more changes I have made to IDAscope since the last post:

Next to come is the integration of Alex’ latest scripts into widgets.

link to original post on blogspot.