Yara is a pattern-matching tool for malware identification. You define conditions based on strings, byte sequences, PE header characteristics, entropy values, or file properties, and Yara tells you whether a file matches. Writing rules from scratch against real samples is the most direct way to learn what makes a rule robust rather than fragile. This walkthrough covers the complete process, from raw sample to a production-quality rule.
Initial sample analysis workflow
// Step 1: Basic file identification
file Qakbot_loader.exe
# PE32 executable (GUI) Intel 80386, for MS Windows
// Calculate hashes for record-keeping and VirusTotal lookup
sha256sum Qakbot_loader.exe
md5sum Qakbot_loader.exe
// Step 2: Check existing coverage before writing new rules
// Submit to VirusTotal, check if the family is already detected
// and what names it is being detected under
// Step 3: Extract strings - look for anything distinctive
// (the grep stage filters out common single words)
strings -a -n 8 Qakbot_loader.exe | sort -u | \
grep -v "^[A-Za-z][a-z]*$" | \
head -100
// Notable strings from the Qakbot obama200 sample:
// SOFTWARE\Microsoft\Dtcpipe (known Qakbot persistence key)
// obama200 (campaign tag, hardcoded)
// PluginStart (export function name)
// PluginStop
// PluginCode
// EncryptData
// DecryptData
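If you want the same extraction inside a script, the pipeline above can be approximated in a few lines of Python. This is a minimal sketch, not a replacement for strings: the function names and the sample buffer are made up for illustration, and it also looks for UTF-16LE strings, which matters when a rule later uses the wide modifier.

```python
import re

# Same filter as the grep stage: a single capitalised or lowercase word
COMMON_WORD = re.compile(r"^[A-Za-z][a-z]*$")


def extract_strings(data: bytes, min_len: int = 8) -> set[str]:
    """Return printable ASCII and UTF-16LE strings of at least min_len characters."""
    ascii_re = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    wide_re = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    found = {m.group().decode("ascii") for m in ascii_re.finditer(data)}
    found |= {m.group().decode("utf-16-le") for m in wide_re.finditer(data)}
    return found


def distinctive(strings: set[str]) -> list[str]:
    """Drop strings that look like a single common word and sort the rest."""
    return sorted(s for s in strings if not COMMON_WORD.match(s))


# Hypothetical buffer standing in for the contents of a sample
sample = (
    b"\x00\x01PluginStart\x00\xff"
    + "SOFTWARE\\Microsoft\\Dtcpipe".encode("utf-16-le")
    + b"\x00\x00abc"
)
strings_found = extract_strings(sample)
for s in distinctive(strings_found):
    print(s)
```

For a real sample you would pass the file contents, for example `extract_strings(open("Qakbot_loader.exe", "rb").read())`.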
Finding byte-level patterns with radare2
// Open in radare2 for binary analysis
r2 -A Qakbot_loader.exe
// Search for the XOR key used in config decryption
// Common Qakbot XOR keys appear as immediate values in XOR instructions
[0x00401000]> /x 35deadc0de
# Matches at: 0x00401890
# 0x35 is the opcode for XOR EAX, imm32; the four bytes after it are the
# immediate, stored little-endian, so DE AD C0 DE decodes to the key 0xDEC0ADDE
// View the decryption routine
[0x00401890]> pd 20
// Shows the XOR loop with the key
// Search for string references
[0x00401000]> iz
// Lists all strings in the binary with their addresses
// Look at cross-references to suspicious strings
[0x00401000]> axt @ [address_of_obama200_string]
// Shows where this string is referenced from
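It is worth checking by hand how the five bytes you searched for map onto an instruction, because x86 stores 32-bit immediates in little-endian order. The sketch below decodes the pattern with the standard struct module and shows a plain byte search equivalent to the /x lookup. The helper names are illustrative, and it only understands the single-byte 0x35 (XOR EAX, imm32) encoding.

```python
import struct

# The byte pattern searched for with /x in radare2
PATTERN = bytes.fromhex("35deadc0de")


def decode_xor_eax_imm32(code: bytes) -> int:
    """Return the 32-bit immediate of a 5-byte XOR EAX, imm32 instruction."""
    if len(code) != 5 or code[0] != 0x35:
        raise ValueError("not a 5-byte XOR EAX, imm32 encoding")
    (imm,) = struct.unpack("<I", code[1:])  # "<I" = little-endian uint32
    return imm


def find_all(data: bytes, pattern: bytes) -> list[int]:
    """Return every offset at which pattern occurs in data."""
    offsets = []
    pos = data.find(pattern)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(pattern, pos + 1)
    return offsets


key = decode_xor_eax_imm32(PATTERN)
print(hex(key))  # 0xdec0adde
```

The Yara hex string matches the raw byte order on disk, so the pattern stays 35 DE AD C0 DE regardless of how the key reads as a number.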
PE structure analysis
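python3 << 'EOF'
import pefile  # pip install pefile

pe = pefile.PE("Qakbot_loader.exe")
print("=== Sections ===")
for section in pe.sections:
    name = section.Name.decode(errors="replace").rstrip("\x00")
    entropy = section.get_entropy()
    print(f"  {name:<8} raw_size={section.SizeOfRawData:<8} entropy={entropy:.2f}")
    # Entropy > 7.0 suggests packed or encrypted section
    if entropy > 7.0:
        print("    *** HIGH ENTROPY - likely packed/encrypted ***")
print("\n=== Import Directory ===")
if hasattr(pe, 'DIRECTORY_ENTRY_IMPORT'):
    for entry in pe.DIRECTORY_ENTRY_IMPORT:
        print(f"  {entry.dll.decode()}")
        for imp in entry.imports:
            if imp.name:
                print(f"    {imp.name.decode()}")
EOF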
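The 7.0 threshold makes more sense once you see what is being measured. Below is a minimal sketch of byte-level Shannon entropy, the usual quantity behind this kind of check, using only the standard library. The function name is my own, and the inputs are synthetic stand-ins for section data.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte: 0.0 (constant) up to 8.0 (uniform)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(data).values()
    )


# Synthetic stand-ins for section contents
padding = b"A" * 1024                       # a single repeated byte value
text_like = b"hello world, plain text " * 50
uniform = bytes(range(256)) * 4             # every byte value equally often

for label, blob in [("padding", padding), ("text", text_like), ("uniform", uniform)]:
    print(f"{label:<8} {shannon_entropy(blob):.2f}")
```

Compressed or encrypted data approaches the uniform case, which is why values above 7.0 are treated as a hint that a section is packed.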
Writing the rule: from indicators to condition
// Full production-quality Yara rule for Qakbot obama200 loader
rule Qakbot_obama200_loader {
meta:
author = "justruss"
description = "Qakbot loader binary from the obama200 campaign (2023)"
date = "2023-08-20"
hash_sample = "3a4b5c6d7e8f..."
reference = "https://justruss.tech/post/writing-yara-rules"
tlp = "WHITE"
strings:
// Registry persistence key (this path is unique to Qakbot)
$reg_key = "SOFTWARE\\Microsoft\\Dtcpipe" wide ascii
// Campaign identifier tag hardcoded in the config blob
$campaign = "obama200" nocase
// XOR key as immediate operand in decryption loop
// { 35 DE AD C0 DE } = XOR EAX, 0xDEC0ADDE (the imm32 is stored little-endian)
$xor_key = { 35 DE AD C0 DE }
// Export function names consistent across loader variants
$export_1 = "PluginStart" ascii fullword
$export_2 = "PluginStop" ascii fullword
$export_3 = "PluginCode" ascii fullword
// C2 communication strings
$c2_enc = "EncryptData" ascii
$c2_dec = "DecryptData" ascii
condition:
// Must be a valid PE file (MZ header + PE signature)
uint16(0) == 0x5A4D
and uint32(uint32(0x3C)) == 0x00004550
// Reasonable size range for this loader family (50KB to 2MB)
and filesize > 50KB
and filesize < 2MB
// Must match at least 2 of the 3 export function names
// (provides resilience when one is removed in a variant)
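and 2 of ($export_1, $export_2, $export_3)
// Must match at least 1 of the crypto helper names; $c2_enc and $c2_dec have
// to appear in the condition, because Yara refuses to compile a rule that
// declares a string it never references
and 1 of ($c2_enc, $c2_dec)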
// Must match at least 1 unique family indicator
and 1 of ($reg_key, $campaign, $xor_key)
}
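The two integer reads at the top of the condition are easy to misread, so here is what they check, written out in Python. This is a sketch of the logic only, not of how Yara evaluates it internally; looks_like_pe and build_minimal_header are hypothetical helpers, and the header they build is a synthetic buffer rather than a loadable executable.

```python
import struct


def looks_like_pe(data: bytes) -> bool:
    """Mirror: uint16(0) == 0x5A4D and uint32(uint32(0x3C)) == 0x00004550."""
    if len(data) < 0x40:
        return False
    (mz,) = struct.unpack_from("<H", data, 0)          # "MZ" read as little-endian
    if mz != 0x5A4D:
        return False
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)  # offset of the PE header
    if e_lfanew + 4 > len(data):
        return False
    (signature,) = struct.unpack_from("<I", data, e_lfanew)
    return signature == 0x00004550                      # "PE\0\0"


def build_minimal_header(e_lfanew: int = 0x80) -> bytes:
    """Build a synthetic buffer carrying only the fields the check reads."""
    buf = bytearray(e_lfanew + 4)
    buf[0:2] = b"MZ"
    struct.pack_into("<I", buf, 0x3C, e_lfanew)
    buf[e_lfanew:e_lfanew + 4] = b"PE\x00\x00"
    return bytes(buf)


print(looks_like_pe(build_minimal_header()))        # True
print(looks_like_pe(b"\x7fELF" + b"\x00" * 100))    # False
```

Checking the PE signature through the pointer at 0x3C, instead of stopping at the MZ bytes, is what stops the rule from firing on arbitrary files that merely begin with "MZ".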
Testing and validation
// Install Yara
sudo apt install yara -y
// Test against the original sample (must match)
yara -r qakbot_obama200.yar Qakbot_loader.exe
# Expected: Qakbot_obama200_loader Qakbot_loader.exe
// False positive check against clean Windows binaries
// (run this one from a Windows command prompt)
yara -r qakbot_obama200.yar C:\Windows\System32 2>NUL
# Expected: (no output)
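// False positive check against a broader clean file set
// (yara takes one scan target per invocation, so loop over the directories)
for dir in /usr/bin /usr/lib; do yara -r qakbot_obama200.yar "$dir" 2>/dev/null; done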
# Expected: (no output or very few unexpected matches)
// Test against your malware corpus if you have one
yara -r qakbot_obama200.yar /opt/malware_samples/ 2>/dev/null | grep Qakbot
# Shows all matching samples - useful for measuring family coverage
// Performance test for rules that will be used in high-throughput scanning
time yara qakbot_obama200.yar large_file.bin
# Scan time should stay low even on large files; if it does not, look for
# very short or overly generic strings and loose regular expressions
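To turn the corpus run into a number, you can parse the scanner output and compare it with the files you expected to match. The sketch below assumes the default output format of one "rule_name file_path" pair per line, as in the expected output shown earlier; the paths and the list of expected samples are invented for illustration.

```python
from collections import defaultdict


def matches_by_rule(yara_output: str) -> dict[str, list[str]]:
    """Group matched file paths by rule name from default yara output."""
    hits: dict[str, list[str]] = defaultdict(list)
    for line in yara_output.splitlines():
        line = line.strip()
        if not line:
            continue
        rule, _, path = line.partition(" ")
        hits[rule].append(path)
    return dict(hits)


def coverage(hits: dict[str, list[str]], rule: str, expected: list[str]) -> float:
    """Fraction of the expected samples that the rule matched."""
    if not expected:
        return 0.0
    matched = set(hits.get(rule, [])) & set(expected)
    return len(matched) / len(expected)


# Hypothetical scanner output and a hypothetical list of known family samples
output = (
    "Qakbot_obama200_loader /opt/malware_samples/a.bin\n"
    "Qakbot_obama200_loader /opt/malware_samples/b.bin\n"
)
known_qakbot = [f"/opt/malware_samples/{name}.bin" for name in "abcd"]

hits = matches_by_rule(output)
print(f"coverage: {coverage(hits, 'Qakbot_obama200_loader', known_qakbot):.0%}")
```

Low coverage on a labelled corpus is a signal that the rule is keyed to one build and needs a more general indicator.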
Scanning memory for Yara matches
// Yara can scan live process memory (requires appropriate privileges)
// Scan all running processes for the Qakbot rule
// (yara takes a single PID as the scan target, so loop over them)
for pid in $(ps ax -o pid=); do yara qakbot_obama200.yar "$pid" 2>/dev/null; done
// Or scan a specific process by passing its PID as the target
yara qakbot_obama200.yar 1234 2>/dev/null
// Using Volatility to run Yara against a memory dump
vol -f memory.raw windows.vadyarascan --yara-file qakbot_obama200.yar
// Scans each process VAD region against the rule
// Much faster than scanning raw memory because it respects process boundaries
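Region-aware scanning is easier to picture with a concrete listing. On Linux, the mapped regions of a process are described in /proc/<pid>/maps, one region per line, and a scanner can only read the ranges that are actually mapped and readable. The sketch below parses that format; it is an illustration of the idea, not how Yara or Volatility implement it, and the example listing is made up.

```python
from typing import NamedTuple


class Region(NamedTuple):
    start: int
    end: int
    perms: str
    path: str


def parse_maps(text: str) -> list[Region]:
    """Parse /proc/<pid>/maps lines: address perms offset dev inode [pathname]."""
    regions = []
    for line in text.splitlines():
        fields = line.split(maxsplit=5)
        if len(fields) < 5:
            continue
        start_s, _, end_s = fields[0].partition("-")
        path = fields[5].strip() if len(fields) == 6 else ""
        regions.append(Region(int(start_s, 16), int(end_s, 16), fields[1], path))
    return regions


def readable(regions: list[Region]) -> list[Region]:
    """Keep only regions whose permission string starts with 'r'."""
    return [r for r in regions if r.perms.startswith("r")]


# Made-up listing in the /proc/<pid>/maps format
EXAMPLE_MAPS = """\
00400000-00452000 r-xp 00000000 08:02 173521 /usr/bin/example
00651000-00652000 ---p 00051000 08:02 173521 /usr/bin/example
7f0000000000-7f0000001000 rw-p 00000000 00:00 0
7ffd1c000000-7ffd1c021000 rw-p 00000000 00:00 0 [stack]
"""

regions = parse_maps(EXAMPLE_MAPS)
for r in readable(regions):
    print(f"{r.start:#x}-{r.end:#x} {r.perms} {r.path or '(anonymous)'}")
```

Anonymous readable and writable regions are usually the interesting ones for unpacked payloads, since injected code often has no backing file.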
Building a Yara rule set for your environment
// Organise rules by family and campaign for maintainability
// Directory structure:
// rules/
// qakbot/
// qakbot_obama200_2023.yar
// qakbot_bb_2024.yar
// cobalt_strike/
// cs_default_watermarks.yar
// cs_beacon_config.yar
// generic/
// process_injection_indicators.yar
// credential_dumping_tools.yar
// Master rules file that includes all others
// master.yar:
// include "rules/qakbot/qakbot_obama200_2023.yar"
// include "rules/cobalt_strike/cs_default_watermarks.yar"
// ...
// Run the master ruleset against a scan target
yara -r master.yar --print-tags --print-meta scan_target/
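Maintaining master.yar by hand gets tedious as the rule set grows, so you may prefer to generate it. The sketch below walks a rules/ directory and emits one include line per .yar file, with paths relative to the directory that will hold master.yar. build_master is a hypothetical helper, and the demo runs against a temporary directory with placeholder files.

```python
import tempfile
from pathlib import Path


def build_master(rules_dir: Path) -> str:
    """Return master.yar text with one include line per .yar file under rules_dir."""
    lines = []
    for path in sorted(rules_dir.rglob("*.yar")):
        # Paths are written relative to the parent of rules/, where master.yar lives
        rel = path.relative_to(rules_dir.parent).as_posix()
        lines.append(f'include "{rel}"')
    return "\n".join(lines) + "\n"


# Demo against a throwaway tree with placeholder rule files
with tempfile.TemporaryDirectory() as tmp:
    rules_dir = Path(tmp) / "rules"
    for rel in (
        "qakbot/qakbot_obama200_2023.yar",
        "cobalt_strike/cs_default_watermarks.yar",
    ):
        rule_file = rules_dir / rel
        rule_file.parent.mkdir(parents=True, exist_ok=True)
        rule_file.write_text("// placeholder rule file\n")
    master_text = build_master(rules_dir)
    (rules_dir.parent / "master.yar").write_text(master_text)

print(master_text, end="")
```

Sorting the paths keeps the generated file stable between runs, which makes changes to the rule set easy to review in version control.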