Classification of metamorphic malware using value set analysis
Malicious software, also known as malware, has been a growing problem over the last decade. The development and management of malware has grown into an underground economy with professional business cases. The illegal activities conducted with the help of malware are often linked to criminal organizations. Newer developments also show political and ideological motivations, for example in the context of cyberwar. Some malware is used in advanced persistent threats that target specific organizations. Most malware aims to stay hidden once it is installed on the victim's system.
The defense techniques used by malware are constantly evolving. Early techniques included the use of packers to hide the real malicious code. This developed further from using different packers when spreading (oligomorphism) to self-modifying packers (polymorphism). The most sophisticated form in this context is metamorphism. Metamorphic malware is able to change its whole code in such a way that classical signatures cannot be created for it.
Each day, several thousand new malicious files are found. Good classification is required in order to identify files of special interest within this bulk. The need for classification can be seen in all areas, from malware monitoring, to the identification of samples for ongoing investigations, to the development of countermeasures. The resources for classification, both automatic and manual, are already scarce. This, in combination with the lack of suitable classification techniques, leads to situations in which important new malware families stay hidden for months, as in the case of the infamous Stuxnet.
Even though metamorphic malware changes significantly, all samples from the same family include the same functionality. This thesis is based on the hypothesis that this functionality is reflected by the values that are used during computations and other operations inside an executable.
The experimental evaluation shows that metamorphic malware can be classified using characteristic sets of values. These are extracted by means of Value Set Analysis, which is based on the concepts of static analysis and data flow tracking. The special characteristics of Value Set Analysis require the definition of a suitable matching scheme. Different possibilities are discussed and developed into a similarity score, which is tuned to maximize the classification performance of the presented approach.
The evaluation shows the real-world applicability of this approach. The investigated criteria for real-world usage are good classification performance, generic applicability that requires no or only minimal adaptations for specific families, and run-time efficiency. The classification performance is evaluated against both benign programs and a large set of other malware. Both evaluations result in a perfect separation of family-specific variants from other files. The matching scheme that is derived in this thesis can be used for all of the considered metamorphic families without adaptations. The average run time is low enough to allow analysis in real time.
The presented work shows that information about the data processed by executables can be used to classify executables. This is particularly useful for metamorphic malware.
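The idea of matching a sample against characteristic sets of values can be illustrated with a minimal sketch. It does not reproduce the matching scheme derived in the thesis, which this abstract does not specify. As a stand-in it uses plain Jaccard similarity, and the names `jaccard`, `classify`, the family labels, the example values, and the threshold of 0.5 are all hypothetical. The value sets are assumed to have been extracted beforehand, for example by a Value Set Analysis pass.

```python
from typing import Dict, FrozenSet, Optional, Tuple


def jaccard(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Jaccard similarity |a & b| / |a | b|; defined as 1.0 for two empty sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def classify(
    sample: FrozenSet[int],
    families: Dict[str, FrozenSet[int]],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the thesis
) -> Tuple[Optional[str], float]:
    """Return the best-matching family and its score, or (None, score) below the threshold."""
    best_name: Optional[str] = None
    best_score = 0.0
    for name, reference in families.items():
        score = jaccard(sample, reference)
        if score > best_score:
            best_name, best_score = name, score
    if best_score >= threshold:
        return best_name, best_score
    return None, best_score


# Hypothetical reference value sets for two families (illustrative numbers only).
families = {
    "family_a": frozenset({0x10, 0x20, 0x5A4D, 0xDEAD}),
    "family_b": frozenset({1, 2, 3, 4}),
}

# A sample sharing three of its four values with family_a.
sample = frozenset({0x10, 0x20, 0x5A4D, 0xBEEF})
label, score = classify(sample, families)
print(label, score)
```

In this toy setting the sample shares three of five distinct values with `family_a`, so it is assigned to that family. A real matching scheme would likely need to weight values, since common constants such as 0 or 1 appear in almost every executable and carry little family-specific information.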