Writing Your First Yara Rule: From Sample to Signature

22 August 2023 | justruss.tech

Yara is a pattern-matching tool for malware identification. A Yara rule describes characteristics of a file — byte strings, text strings, PE header values, file size, entropy — and Yara returns a match when a file satisfies the rule conditions.
This is a walkthrough of writing a rule from scratch against a real Qakbot loader sample from the obama200 campaign.

Initial sample analysis

# Basic file info
file Qakbot_loader.exe
# PE32 executable (GUI) Intel 80386, for MS Windows

# File hash
sha256sum Qakbot_loader.exe
# 3a4b5c6d7e8f... (look up the hash on VirusTotal first to check existing coverage)

# Check existing Yara coverage
yara /path/to/malware_ruleset.yar Qakbot_loader.exe
# 0 matches - no existing coverage for this variant

Extracting strings

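# -n 6 keeps short strings such as C2List (6 characters).
# strings only finds ASCII by default, so the second pass (-e l) picks up UTF-16LE strings.
strings -a -n 6 Qakbot_loader.exe | sort -u | head -100
strings -a -n 6 -e l Qakbot_loader.exe | sort -u | head -100

# Interesting output (hand-picked from the results):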
SOFTWARE\Microsoft\Dtcpipe
obama200
PluginStart
PluginStop
PluginCode
C2List
EncryptData
DecryptData

The registry key SOFTWARE\Microsoft\Dtcpipe is a known Qakbot persistence location. The string obama200 is the campaign tag embedded in this specific build. The export function names (PluginStart, PluginStop, PluginCode) are consistent across Qakbot loader variants.
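
If the strings utility is not available, the same extraction can be approximated in a few lines of Python. This is a minimal sketch, not a replacement for the real tool: the function name extract_strings and the synthetic buffer are my own, and the "printable" range is simply ASCII 0x20 to 0x7e. It shows why a string stored as UTF-16LE is invisible to an ASCII-only pass.

```python
import re

def extract_strings(data: bytes, min_len: int = 6):
    """Return (ascii_strings, wide_strings) found in data."""
    # Runs of printable ASCII bytes, at least min_len long
    ascii_re = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    # The same characters as UTF-16LE: each printable byte followed by 0x00
    wide_re = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    ascii_hits = [m.group().decode("ascii") for m in ascii_re.finditer(data)]
    wide_hits = [m.group().decode("utf-16-le") for m in wide_re.finditer(data)]
    return ascii_hits, wide_hits

# Synthetic buffer standing in for a sample (not real Qakbot data)
blob = (
    b"\x00\x01PluginStart\x00\xff"
    + "obama200".encode("utf-16-le")
    + b"\x00\x00\x90\x90"
)

ascii_hits, wide_hits = extract_strings(blob)
print(ascii_hits)  # ['PluginStart']
print(wide_hits)   # ['obama200']
```

This is also the reason the rule later applies both the wide and ascii modifiers to the registry path: the encoding a given build uses is not something to assume.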

Identifying byte patterns with a disassembler

# Use radare2 to look at the decryption routine
r2 -A Qakbot_loader.exe
[0x00401000]> s main
[0x00401234]> pdf
# Look for the XOR loop - common in Qakbot config decryption:
# 0x00401240  8b45f8  mov eax, [var_8h]
# 0x00401243  33c2    xor eax, edx       ; XOR with key
# 0x00401245  8945f8  mov [var_8h], eax

# Search for xor eax, imm32 (opcode 35) followed by the suspected key bytes
[0x00401234]> /x 35deadc0de
# hit at 0x00401890

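The search pattern is the opcode 35 (xor eax, imm32) followed by the four key bytes DE AD C0 DE. Because x86 stores immediates little-endian, those bytes disassemble as xor eax, 0xDEC0ADDE, not 0xDEADC0DE. A Yara hex string matches raw file bytes, so the rule keeps the order seen on disk. The sequence appears as an immediate operand in the decryption routine and is consistent across several samples in the obama200 campaign.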
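
Byte order is easy to get wrong when moving between a disassembly listing and a hex pattern: x86 stores a 32-bit immediate little-endian, so the bytes in the file are the reverse of the value the disassembler prints. Here is a minimal Python sketch for converting in both directions. The helper names are mine and are not part of radare2 or Yara.

```python
import struct

XOR_EAX_IMM32 = 0x35  # one-byte opcode for `xor eax, imm32`

def encode_xor_eax(imm: int) -> bytes:
    """Bytes of `xor eax, imm` as stored in the file (little-endian immediate)."""
    return bytes([XOR_EAX_IMM32]) + struct.pack("<I", imm)

def decode_xor_eax(pattern: bytes) -> int:
    """Recover the 32-bit immediate from a 5-byte `xor eax, imm32` encoding."""
    if len(pattern) != 5 or pattern[0] != XOR_EAX_IMM32:
        raise ValueError("not a 5-byte xor eax, imm32 encoding")
    return struct.unpack("<I", pattern[1:])[0]

# The bytes found by the /x search above
observed = bytes.fromhex("35deadc0de")
print(hex(decode_xor_eax(observed)))    # 0xdec0adde

# What the pattern would be if the operand really were 0xDEADC0DE
print(encode_xor_eax(0xDEADC0DE).hex())  # 35dec0adde
```

The hex string in a rule should always be the on-disk byte order, so it is worth running a conversion like this before copying a value out of a disassembler.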

PE structure analysis

python3 -c "
import pefile
pe = pefile.PE('Qakbot_loader.exe')

print('Exports:')
if hasattr(pe, 'DIRECTORY_ENTRY_EXPORT'):
    for exp in pe.DIRECTORY_ENTRY_EXPORT.symbols:
        print(f'  {exp.name.decode() if exp.name else \"\"}')

print(f'\nTimestamp: {pe.FILE_HEADER.TimeDateStamp:#x}')
print(f'Compile time: {pe.FILE_HEADER.dump_dict()[\"TimeDateStamp\"][\"Value\"]}')

# Check for anomalous section entropy (packed/encrypted sections have high entropy)
for section in pe.sections:
    entropy = section.get_entropy()
    print(f'Section {section.Name.decode().strip(chr(0))}: entropy={entropy:.2f}')
"

# Output:
# Exports:
#   PluginStart
#   PluginStop
#   PluginCode
# Timestamp: 0x64df7ea7
# Compile time: 0x64DF7EA7 [Fri Aug 18 14:22:31 2023 UTC]
# Section .text:   entropy=6.21
# Section .rdata:  entropy=4.87
# Section .data:   entropy=7.94  <-- high entropy, likely packed/encrypted
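
The entropy figure is byte-level Shannon entropy, measured in bits per byte on a scale from 0 to 8. As far as I know this is the same measure pefile reports, but the sketch below is my own implementation and not pefile's code. It makes the "high entropy" threshold less of a magic number: constant data scores 0, and data where every byte value is equally likely scores 8.

```python
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    # Sum p * log2(1/p) over every byte value that occurs
    return sum((c / total) * math.log2(total / c) for c in counts.values())

print(shannon_entropy(b"\x00" * 1024))        # 0.0  (constant padding)
print(shannon_entropy(bytes(range(256)) * 4))  # 8.0  (uniform distribution)
```

A value of 7.94 for .data sits very close to the uniform case, which is what compressed or encrypted content looks like. Plain code and initialised data usually land well below that.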

The rule

rule Qakbot_obama200_loader {
    meta:
        author      = "justruss"
        description = "Qakbot loader - obama200 campaign variant"
        date        = "2023-08-20"
        hash        = "3a4b5c6d7e8f..."
        reference   = "https://justruss.tech"
        tlp         = "WHITE"

    strings:
        // Registry persistence path
        $reg_path   = "SOFTWARE\\Microsoft\\Dtcpipe" wide ascii

        // Campaign tag
        $campaign   = "obama200" nocase

        // XOR decryption key as immediate operand
        $xor_key    = { 35 DE AD C0 DE }

        // Export function names seen across loader variants
        $export_1   = "PluginStart" ascii
        $export_2   = "PluginStop"  ascii
        $export_3   = "PluginCode"  ascii

    condition:
        // Must be a valid PE file
        uint16(0) == 0x5A4D
        and uint32(uint32(0x3C)) == 0x00004550

        // Reasonable size range for this loader family
        and filesize > 50KB
        and filesize < 2MB

        // Must match at least 2 of the 3 export names (some variants missing one)
        and 2 of ($export_*)

        // Plus either the registry path, campaign tag, or XOR key
        and 1 of ($reg_path, $campaign, $xor_key)
}
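
The two header checks in the condition are terse, so here is what they read, written out in Python. This is a sketch that mirrors the logic and is not how Yara implements it. The function name looks_like_pe and the synthetic header are assumptions for illustration.

```python
import struct

def looks_like_pe(data: bytes) -> bool:
    """Mirror of: uint16(0) == 0x5A4D and uint32(uint32(0x3C)) == 0x00004550."""
    if len(data) < 0x40:
        return False
    # uint16(0): the first two bytes "MZ" read as a little-endian word
    if struct.unpack_from("<H", data, 0)[0] != 0x5A4D:
        return False
    # uint32(0x3C): e_lfanew, the file offset of the PE header
    e_lfanew = struct.unpack_from("<I", data, 0x3C)[0]
    if e_lfanew + 4 > len(data):
        return False
    # uint32(e_lfanew): the signature "PE\0\0" read as a little-endian dword
    return struct.unpack_from("<I", data, e_lfanew)[0] == 0x00004550

# Minimal synthetic header: MZ at 0, e_lfanew = 0x80, PE signature at 0x80
stub = bytearray(0x100)
stub[0:2] = b"MZ"
struct.pack_into("<I", stub, 0x3C, 0x80)
stub[0x80:0x84] = b"PE\x00\x00"

print(looks_like_pe(bytes(stub)))              # True
print(looks_like_pe(b"MZ" + b"\x00" * 0x100))  # False: MZ present, no PE signature
```

Checking the PE signature as well as MZ matters because plenty of non-PE files start with the bytes 4D 5A, and the second check discards them cheaply before any string matching is considered.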

Testing the rule

# Test against the original sample
yara -r qakbot_obama200.yar Qakbot_loader.exe
# Qakbot_obama200_loader Qakbot_loader.exe

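# False positive check against clean Windows binaries
# (path assumes a Windows volume mounted under WSL; in cmd.exe use C:\Windows\System32 and 2>nul)
yara -r qakbot_obama200.yar /mnt/c/Windows/System32/ 2>/dev/null
# (no output = no false positives in this set)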

# Retrohunt against sample corpus
yara -r qakbot_obama200.yar /opt/malware_samples/ 2>/dev/null | grep Qakbot
# Lists all matching samples - useful for measuring family coverage
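
To turn the retrohunt output into a coverage number, the default "rule path" output lines can be parsed and compared against a list of samples already known to belong to the family. This is a rough sketch: the file names and the sample output below are invented for illustration, and it assumes yara was run without flags that change the output format.

```python
from typing import Dict, Iterable, Set

def parse_yara_output(text: str) -> Dict[str, Set[str]]:
    """Map rule name -> set of matching paths from default yara output."""
    hits: Dict[str, Set[str]] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Default output is "<rule name> <path>"; split on the first space only
        rule, _, path = line.partition(" ")
        if path:
            hits.setdefault(rule, set()).add(path)
    return hits

def coverage(hits: Dict[str, Set[str]], rule: str, known: Iterable[str]) -> float:
    """Fraction of known family samples matched by the rule."""
    known_set = set(known)
    if not known_set:
        return 0.0
    return len(hits.get(rule, set()) & known_set) / len(known_set)

# Hypothetical yara output and hypothetical list of confirmed Qakbot samples
sample_output = """\
Qakbot_obama200_loader /opt/malware_samples/a.bin
Qakbot_obama200_loader /opt/malware_samples/b.bin
Qakbot_obama200_loader /opt/malware_samples/c.bin
"""
known_qakbot = [f"/opt/malware_samples/{n}.bin" for n in "abcd"]

hits = parse_yara_output(sample_output)
print(f"{coverage(hits, 'Qakbot_obama200_loader', known_qakbot):.0%}")  # 75%
```

The samples that do not match are the interesting ones: they show which of the rule's strings a variant has dropped.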