Research

Signal and Image Processing methods for Malware Analysis

My PhD research explores techniques from signal and image processing and applies them to malware analysis. A brief overview follows.

SARVAM: Search And RetrieVAl of Malware

In the first part, I consider a malware binary sample as a digital grayscale image:
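As a minimal sketch of this idea, the snippet below lays the bytes of a file out row by row so that each byte (0 to 255) becomes one pixel intensity. The function names, the fixed width of 256 columns, and the zero-padding of the last row are assumptions made for illustration and are not taken from the SARVAM implementation.

```python
from typing import List


def bytes_to_image(data: bytes, width: int = 256) -> List[List[int]]:
    """Arrange raw bytes into rows of `width` pixels (hypothetical helper)."""
    if width <= 0:
        raise ValueError("width must be positive")
    rows = []
    for start in range(0, len(data), width):
        row = list(data[start:start + width])
        row += [0] * (width - len(row))  # zero-pad the final, partial row
        rows.append(row)
    return rows


def to_pgm(image: List[List[int]]) -> bytes:
    """Serialize the pixel rows as a binary PGM (P5) grayscale image."""
    height = len(image)
    width = len(image[0]) if image else 0
    header = f"P5\n{width} {height}\n255\n".encode("ascii")
    return header + bytes(value for row in image for value in row)


if __name__ == "__main__":
    sample = bytes(range(10))  # stand-in for the contents of a binary file
    for row in bytes_to_image(sample, width=4):
        print(row)
```

The width is a free parameter here: the same bytes reshaped with a different width give a different-looking image.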

When visualized as grayscale images, samples belonging to the same malware family often appear very similar in layout and texture, so variants of a family can be recognized visually. Here are some examples:

Variants of Dialplatform family

Variants of Agent.FYI family

To see more images of malware, check out the Malware Image Album and the SPAM project page.

We also ran into some peculiar malware images. The pictures visible inside them are the icons that the binary uses for display.

Later, we created a content-based malware image search and retrieval system named SARVAM: Search And RetrieVAl of Malware. It is publicly accessible, so researchers and security professionals can upload a malware sample as a query and find its best match.
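The query-and-match step can be sketched as a nearest-neighbor lookup over per-sample descriptors. The descriptor below (a normalized 16-bin histogram of byte intensities) and the brute-force Euclidean search are hypothetical stand-ins chosen to keep the example short. They are not the image features or the index that SARVAM uses, and the sample names are made up.

```python
import math
from typing import Dict, List, Tuple


def intensity_histogram(data: bytes, bins: int = 16) -> List[float]:
    """Stand-in descriptor: normalized histogram of pixel (byte) intensities."""
    counts = [0] * bins
    for value in data:
        counts[value * bins // 256] += 1
    total = len(data) or 1  # avoid division by zero for empty input
    return [count / total for count in counts]


def best_match(query: bytes, database: Dict[str, bytes]) -> Tuple[str, float]:
    """Return the name and distance of the database sample closest to the query."""
    if not database:
        raise ValueError("database is empty")
    query_features = intensity_histogram(query)
    best_name, best_distance = "", math.inf
    for name, sample in database.items():
        distance = math.dist(query_features, intensity_histogram(sample))
        if distance < best_distance:
            best_name, best_distance = name, distance
    return best_name, best_distance


if __name__ == "__main__":
    # Toy byte strings standing in for malware samples.
    toy_database = {
        "family_a_v1": bytes([0] * 90 + [255] * 10),
        "family_b_v1": bytes(range(256)),
    }
    toy_query = bytes([0] * 85 + [255] * 15)
    print(best_match(toy_query, toy_database))
```

A brute-force scan like this is only practical for small collections. A database of millions of samples would need an approximate nearest-neighbor index instead.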

Currently, SARVAM has a database of more than 7 million malware samples. We have received more than 250,000 submissions since its launch in 2012.

Here is a list of publications related to this research:

SPAM: Signal Processing to Analyze Malware
L Nataraj, BS Manjunath
IEEE Signal Processing Magazine, Vol. 33 (2), 2016

SARVAM: Search And RetrieVAl of Malware
L Nataraj, D Kirat, BS Manjunath, G Vigna
Annual Computer Security Applications Conference (ACSAC) Workshop on Next Generation Malware Attacks and Defense (NGMAD) 2013

SigMal: A Static Signal Processing Based Malware Triage
D Kirat, L Nataraj, G Vigna, BS Manjunath
Annual Computer Security Applications Conference (ACSAC) 2013

A comparative assessment of malware classification using binary texture analysis and dynamic analysis
L Nataraj, V Yegneswaran, P Porras, J Zhang
Proceedings of the 4th ACM Workshop on Security and Artificial Intelligence (AISec) 2011

Malware images: visualization and automatic classification
L Nataraj, S Karthikeyan, G Jacob, BS Manjunath
Proceedings of the 8th International Symposium on Visualization for Cyber Security (VizSec) 2011

SATTVA: SpArsiTy inspired classificaTion of malware VAriants

In the second part, I consider a malware binary as a one-dimensional digital signal rather than a two-dimensional grayscale image. Although images are easier to visualize and image similarity features have been extensively studied in the literature, the choice of image column width is somewhat arbitrary. Here is the signal representation of a malware binary:

We then model an unknown malware sample as a sparse linear combination of samples from the dataset. Because malware binaries vary in size, their dimensionality can be very high, so we first apply random projections to reduce the dimensionality and then perform sparse modeling:
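The pipeline above can be sketched in a few steps: turn each binary into a fixed-length signal, project it with a random Gaussian matrix, and approximate the projected query as a sparse combination of the projected dataset samples. Several choices below are assumptions for illustration and are not confirmed as the SATTVA method: zero-padding or truncating to a common length, greedy matching pursuit as the sparse solver, and assigning the family that carries the most coefficient weight. Plain Python lists keep the example dependency-free.

```python
import random
from typing import Dict, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def to_signal(data: bytes, length: int) -> Vector:
    """1-D signal scaled to [0, 1], zero-padded or truncated to `length` (assumption)."""
    padded = data[:length].ljust(length, b"\x00")
    return [value / 255.0 for value in padded]


def random_projection_matrix(out_dim: int, in_dim: int, seed: int = 0) -> List[Vector]:
    """Gaussian random matrix scaled by 1/sqrt(out_dim)."""
    rng = random.Random(seed)
    scale = 1.0 / out_dim ** 0.5
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(in_dim)] for _ in range(out_dim)]


def project(matrix: List[Vector], signal: Vector) -> Vector:
    return [dot(row, signal) for row in matrix]


def matching_pursuit(target: Vector, atoms: List[Vector], n_nonzero: int) -> Vector:
    """Greedy sparse approximation: target ~ sum(coeffs[i] * atoms[i])."""
    residual = list(target)
    coeffs = [0.0] * len(atoms)
    norms = [dot(atom, atom) for atom in atoms]
    for _ in range(n_nonzero):
        if dot(residual, residual) < 1e-12:
            break  # nothing left to explain
        # Pick the atom most correlated with the unexplained part of the target.
        best = max(
            range(len(atoms)),
            key=lambda i: abs(dot(residual, atoms[i])) / (norms[i] ** 0.5 or 1.0),
        )
        if norms[best] == 0.0:
            break
        step = dot(residual, atoms[best]) / norms[best]
        coeffs[best] += step
        residual = [r - step * a for r, a in zip(residual, atoms[best])]
    return coeffs


def classify(query: bytes, dataset: Dict[str, List[bytes]],
             length: int = 64, proj_dim: int = 16, n_nonzero: int = 3) -> str:
    """Return the family whose samples carry the most sparse-coefficient weight."""
    matrix = random_projection_matrix(proj_dim, length)
    labels, atoms = [], []
    for family, samples in dataset.items():
        for sample in samples:
            labels.append(family)
            atoms.append(project(matrix, to_signal(sample, length)))
    coeffs = matching_pursuit(project(matrix, to_signal(query, length)), atoms, n_nonzero)
    weight: Dict[str, float] = {}
    for label, coeff in zip(labels, coeffs):
        weight[label] = weight.get(label, 0.0) + abs(coeff)
    return max(weight, key=weight.get)


if __name__ == "__main__":
    toy_dataset = {  # toy byte patterns standing in for malware families
        "a": [bytes(range(64)), bytes(range(1, 65))],
        "b": [bytes([0, 255] * 32), bytes([255, 0] * 32)],
    }
    print(classify(bytes([0, 255] * 32), toy_dataset))
```

The random projection is what makes the sparse step tractable: every sample is compared in `proj_dim` dimensions regardless of the original file size.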

The publication associated with this work can be accessed here:

SATTVA: SpArsiTy inspired classificaTion of malware VAriants
L Nataraj, S Karthikeyan, BS Manjunath
ACM Workshop on Information Hiding and Multimedia Security (IH&MMSec) 2015

More research to come soon. Stay tuned!