How does Threatray code detection work?

Last updated: July 2, 2024

Threatray employs an innovative code detection technology to attribute unknown code to specific malware families. Unlike traditional YARA and antivirus (AV) signatures, our approach offers enhanced resilience against malware variants that can bypass existing technologies. This article explains how our technology works.

How it works

Threatray maintains an extensive malware database containing code extracted from reference malware samples. These reference samples, whose families are known, are broken down into function blocks and stored in the database. A function block is a function as identified by disassemblers such as IDA Pro or Ghidra. The database includes function blocks from thousands of malware families, with some families contributing code from multiple samples. Regular updates keep the database current with the latest malware threats.
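
As a rough illustration of such a database, the sketch below indexes function blocks from reference samples by a fingerprint and records which family and reference sample each block came from. It is not Threatray's actual implementation: the SHA-256 fingerprint is a deliberately simple stand-in for whatever function representation the real engine uses, and the names `Reference`, `fingerprint`, `build_index`, the family names, and the sample identifiers are all hypothetical.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Reference:
    family: str      # known malware family of the reference sample
    sample_id: str   # hypothetical identifier of the reference sample


def fingerprint(function_bytes: bytes) -> str:
    # Assumption: an exact hash stands in for a real function representation.
    return hashlib.sha256(function_bytes).hexdigest()


def build_index(reference_samples):
    """Map each function fingerprint to the reference samples containing it."""
    index = defaultdict(set)
    for family, sample_id, functions in reference_samples:
        for function_bytes in functions:
            index[fingerprint(function_bytes)].add(Reference(family, sample_id))
    return index


# Toy data: two samples of one family plus one sample of another family.
reference_samples = [
    ("FamilyA", "sample-1", [b"\x55\x8b\xec\x90", b"\x31\xc0\xc3"]),
    ("FamilyA", "sample-2", [b"\x55\x8b\xec\x90", b"\x6a\x00\xc3"]),
    ("FamilyB", "sample-3", [b"\x48\x89\xe5\xc3"]),
]
index = build_index(reference_samples)

# A block present in both FamilyA samples resolves to two reference samples.
shared = index[fingerprint(b"\x55\x8b\xec\x90")]
```

Keeping the reference sample alongside the family is what would later allow a per-function detection to point back to the sample it came from.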

When analyzing an unknown sample, it is first decomposed into distinct function blocks. The figure below shows these function blocks represented by various shapes within the unknown sample.

Next, our code search engine matches the function blocks of the unknown sample with those in the malware database. As shown in the figure, the red function blocks in the unknown sample match known function blocks associated with the QakBot malware family. Specifically, 4 out of the 7 function blocks in the unknown sample align with QakBot code, resulting in a 57% match with QakBot. Based on this identified code, our logic concludes that the unknown sample is part of the QakBot malware family.
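
The arithmetic behind the 57% figure can be sketched as follows. This is a minimal illustration under stated assumptions: function blocks are represented by made-up string fingerprints, the database is a plain dictionary, and matching is an exact lookup, whereas the article does not describe how the real code search engine compares functions. The helper `match_ratio` is hypothetical.

```python
def match_ratio(sample_blocks, database):
    """Return per-family (match count, share of all blocks) for one sample."""
    counts = {}
    for block in sample_blocks:
        family = database.get(block)
        if family is not None:
            counts[family] = counts.get(family, 0) + 1
    total = len(sample_blocks)
    return {family: (n, n / total) for family, n in counts.items()}


# Hypothetical fingerprints: "q1".."q4" stand for blocks known from QakBot,
# "u1".."u3" for blocks with no match in the database.
database = {"q1": "QakBot", "q2": "QakBot", "q3": "QakBot", "q4": "QakBot"}
unknown_sample = ["q1", "u1", "q2", "u2", "q3", "q4", "u3"]

result = match_ratio(unknown_sample, database)
count, ratio = result["QakBot"]
print(f"QakBot: {count}/{len(unknown_sample)} blocks = {ratio:.0%}")
# prints "QakBot: 4/7 blocks = 57%"
```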

Code detection in action

In the platform, verdicts contain information about the number of functions matched to a known malware family. In the screenshot below, we see a DLL file attributed to the HermeticWizard family (1) via our code detection technology. The DLL file contains 338 functions (2), of which 181 are benign (3) and 130 belong to the HermeticWizard family (4). Benign functions are not used to attribute code to malware families. Based on these findings, our algorithms determine that the DLL file belongs to the HermeticWizard family.
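
The counts in this example can be reproduced with a small tally that ignores benign functions when choosing a family. The sketch below is an assumption-laden illustration, not the platform's attribution algorithm: the `attribute` helper and its `min_malicious` threshold are hypothetical, and the 27 functions not accounted for by the benign and HermeticWizard counts (338 - 181 - 130) are simply labeled "unattributed" because the article does not break them down.

```python
from collections import Counter

BENIGN = "benign"
UNATTRIBUTED = "unattributed"  # assumption: label for the remaining functions


def attribute(function_labels, min_malicious=1):
    """Pick the family with the most matched functions, ignoring benign code."""
    counts = Counter(function_labels)
    families = {
        label: n for label, n in counts.items()
        if label not in (BENIGN, UNATTRIBUTED)
    }
    if not families:
        return None, counts
    family, matched = max(families.items(), key=lambda item: item[1])
    # min_malicious is a hypothetical threshold, not a documented setting.
    return (family if matched >= min_malicious else None), counts


labels = (
    [BENIGN] * 181
    + ["HermeticWizard"] * 130
    + [UNATTRIBUTED] * 27  # 338 - 181 - 130; not broken down in the article
)
verdict, counts = attribute(labels)
share = counts["HermeticWizard"] / len(labels)
print(verdict, f"{share:.0%}")  # prints "HermeticWizard 38%"
```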

Additionally, faint code detections (5) for other malware families linked to the same actor, APT44/SANDWORM, are visible. These indicate code sharing among the families and provide useful intelligence even when the amount of reused code is too small to trigger a detection on its own.

Code detections can be inspected in detail down to a single function with a click of a button.

The table shows the functions in this DLL file (1) with code detections per function (2). Each function is characterized as benign or malicious (3), with a reference malware sample provided for each malicious function (4). The reference sample is the sample used by Threatray to create the code-based signature leading to these detections. Compared to traditional approaches, this offers transparency and enables users to pivot to the reference malware sample for further inspection if desired.

Why Threatray is resilient to malware variants

Threatray’s ability to analyze and attribute all functions within an unknown piece of code makes it highly resilient to new malware iterations and modifications designed to circumvent conventional tools such as AV and YARA.

One key strength lies in Threatray’s comprehensive analysis. As new malware versions emerge with changed, added, or removed code, some segments of the code persist or remain similar to their predecessors, sometimes for months or years. Threatray identifies these persistent or similar code segments and can therefore detect and attribute new malware versions. In contrast, conventional technologies like AV and YARA rely on a limited set of byte sequences or strings for detection, a more fragile approach that often fails on new malware versions.
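
The contrast can be illustrated with a toy comparison. The sketch below is deliberately simplified and rests on assumptions: "functions" are short made-up byte strings, the signature is a single fixed byte sequence (real YARA rules can combine several strings and conditions), and the overlap is computed by exact set intersection rather than by whatever similarity measure Threatray uses. The helpers `signature_matches` and `function_overlap` are hypothetical.

```python
def signature_matches(sample: bytes, signature: bytes) -> bool:
    # Signature-style check: one fixed byte sequence must appear verbatim.
    return signature in sample


def function_overlap(sample_functions, known_functions) -> float:
    # Function-level check: share of the sample's functions already known.
    sample_set = set(sample_functions)
    if not sample_set:
        return 0.0
    return len(sample_set & set(known_functions)) / len(sample_set)


# Toy "functions" of an old version and of a modified new version.
old_version = [b"decrypt_v1", b"connect_c2", b"persist", b"inject"]
new_version = [b"decrypt_v2", b"connect_c2", b"persist", b"inject", b"new_feature"]

signature = b"decrypt_v1"  # hypothetical signature tied to one code fragment

print(signature_matches(b"".join(old_version), signature))  # True
print(signature_matches(b"".join(new_version), signature))  # False
print(function_overlap(new_version, old_version))           # 0.6
```

In this toy case, changing the one fragment the signature depends on defeats the signature, while three of the five functions in the new version still match known code.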
