
CoughDrop: Extreme OPSEC Hardening for BOF Loaders

A systematic teardown of 19 OPSEC defects in open-source COFF loaders, and how CoughDrop eliminates every one of them to achieve zero IOCs against Moneta and PE-Sieve.


Introduction

Beacon Object Files (BOFs) have become a staple of modern red team operations. They run inside the C2 agent’s process, avoid creating new processes or loading reflective DLLs, and finish in milliseconds. The tradeoff is that the loader — the code that parses the COFF object, resolves symbols, applies relocations, and calls the entry point — becomes a permanent fixture in the agent’s memory. If the loader leaves forensic traces, every BOF execution becomes a detection opportunity.

TrustedSec’s COFFLoader is the most widely referenced open-source BOF loader. It was designed as a reference implementation — a clean, readable example of how to build an in-memory COFF loader. It achieves that goal well. However, as a reference implementation, it prioritizes clarity over stealth. The result is a loader that modern memory scanners can identify with high confidence.

CoughDrop is a drop-in replacement that systematically eliminates every detectable behavior in the loading pipeline: the same BOFs, the same Beacon API, and the same go(char *args, int alen) entry point, with the detection surface removed. This post walks through the 19 specific OPSEC defects we identified, the technique used to fix each one, and the development process that got us to zero IOCs against both Moneta and PE-Sieve. An IOC (Indicator of Compromise) is any artifact a scanner can use to determine that something suspicious happened. Examples include memory with unusual permissions, modified DLL code, or executable regions that don’t correspond to any file on disk.

Background

What Is a BOF?

For a more detailed walkthrough of BOF internals — including how they interact with a C2 agent’s memory space, how DFR symbols are resolved, and how the NtApi[] indirect syscall table routes calls — see the BOF deep dive in our NOFILTER-NFEXEC post. This section provides a shorter overview.

A Beacon Object File is a compiled C object file (.o). It is not a full executable, and it is not a DLL. (A DLL, or Dynamic Link Library, is a .dll file containing reusable code and data. Windows loads DLLs into a process’s memory when the process starts or when it explicitly requests one. kernel32.dll, ntdll.dll, and advapi32.dll are system DLLs found in virtually every Windows process.) A BOF contains machine code (a .text section), data (.data, .rdata), relocations, and a symbol table that tells the loader which external functions the BOF needs. It exposes a single entry point called go(), which takes a byte buffer of arguments and its length.
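To make the layout concrete, here is a minimal sketch that reads the 20-byte COFF file header at the start of a .o file. The struct and function names are hypothetical. The field offsets follow IMAGE_FILE_HEADER in the PE/COFF specification, and each field is decoded byte by byte as little-endian so the sketch builds on any platform.

```c
#include <stddef.h>
#include <stdint.h>

/* Fields of the 20-byte COFF file header (IMAGE_FILE_HEADER). */
typedef struct {
    uint16_t machine;                 /* 0x8664 for x64 objects */
    uint16_t number_of_sections;
    uint32_t time_date_stamp;
    uint32_t pointer_to_symbol_table; /* file offset of the symbol table */
    uint32_t number_of_symbols;
    uint16_t size_of_optional_header; /* 0 for object files */
    uint16_t characteristics;
} coff_header;

static uint16_t read_le16(const uint8_t *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t read_le32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 0 on success, -1 if the buffer is too small to hold a header. */
int parse_coff_header(const uint8_t *buf, size_t len, coff_header *out) {
    if (buf == NULL || out == NULL || len < 20) return -1;
    out->machine = read_le16(buf + 0);
    out->number_of_sections = read_le16(buf + 2);
    out->time_date_stamp = read_le32(buf + 4);
    out->pointer_to_symbol_table = read_le32(buf + 8);
    out->number_of_symbols = read_le32(buf + 12);
    out->size_of_optional_header = read_le16(buf + 16);
    out->characteristics = read_le16(buf + 18);
    return 0;
}
```

In an object file, SizeOfOptionalHeader is 0, so the section table (40 bytes per entry) follows directly after this header. The string table sits immediately after the symbol table, which uses 18 bytes per symbol record.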

The key advantage of BOFs is that they execute inside the existing agent process. There is no CreateProcess, no CreateRemoteThread, no reflective DLL injection. Reflective DLL injection is a technique where a DLL is loaded entirely from memory — the DLL never touches disk, and the standard Windows loader is bypassed. This avoids file-based detection but leaves detectable memory artifacts, which is exactly the problem BOFs were designed to avoid. Instead, the agent’s COFF loader reads the .o file, allocates memory, patches in the right function addresses, and calls go(). When go() returns, the memory is freed. From the OS perspective, nothing happened except some memory allocations inside an already-running process.

What Is a COFF Loader?

A COFF loader is the component inside a C2 agent (or a standalone test harness) that performs the following steps:

  1. Parse the COFF headers to find sections, symbols, and relocations.
  2. Allocate memory for each section (.text, .data, .rdata, etc.) and a Global Offset Table (GOT) for resolved function pointers.
  3. Copy section data into the allocated memory and apply relocations — the COFF file references functions and data by relative offsets that need to be adjusted to the actual memory addresses.
  4. Resolve Dynamic Function Resolution (DFR) symbols. When a BOF declares KERNEL32$Sleep, the loader finds Sleep inside kernel32.dll and writes its address into the GOT so the BOF’s code can call it.
  5. Execute the BOF by calling go() through a function pointer.
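Step 4 starts from the symbol name. In x64 BOFs, a DFR import typically appears in the symbol table as __imp_MODULE$Function, for example __imp_KERNEL32$Sleep. The sketch below splits such a name into its module and function parts. split_dfr_symbol is a hypothetical helper, and the __imp_ prefix convention is an assumption based on common x64 BOF toolchains.

```c
#include <stddef.h>
#include <string.h>

/* Splits a DFR import symbol such as "__imp_KERNEL32$Sleep" into
 * module ("KERNEL32") and function ("Sleep"). Returns 0 on success,
 * -1 if the name is not a DFR import or an output buffer is too small. */
int split_dfr_symbol(const char *sym, char *module, size_t module_cap,
                     char *func, size_t func_cap) {
    static const char prefix[] = "__imp_";
    const size_t plen = sizeof(prefix) - 1;
    if (strncmp(sym, prefix, plen) != 0) return -1;

    const char *name = sym + plen;
    const char *dollar = strchr(name, '$');
    /* Require a non-empty module part and a non-empty function part. */
    if (dollar == NULL || dollar == name || dollar[1] == '\0') return -1;

    size_t mlen = (size_t)(dollar - name);
    size_t flen = strlen(dollar + 1);
    if (mlen + 1 > module_cap || flen + 1 > func_cap) return -1;

    memcpy(module, name, mlen);
    module[mlen] = '\0';
    memcpy(func, dollar + 1, flen + 1); /* copies the terminator too */
    return 0;
}
```

Names without a $, such as __imp_BeaconPrintf, are Beacon API functions. A loader typically resolves these from its own internal table rather than from a DLL.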

What Do Memory Scanners Look For?

Two tools define the state of the art in usermode memory analysis:

Moneta (by Forrest Orr) scans a live process and flags three kinds of anomaly:

- private or mapped memory with executable permissions that is not backed by a file on disk (“abnormal private executable memory”);
- code within loaded DLLs that differs from the on-disk image (“modified code”);
- threads with start addresses in non-image memory.

“Backed” has a specific meaning here. When Windows loads a DLL through its normal loader, the memory pages are backed by the DLL file: the OS knows which file the bytes came from, and they can be checked against it. Memory allocated with VirtualAlloc has no such backing and is called “private.”

PE-Sieve (by hasherezade) performs a similar analysis but adds shellcode pattern detection, PE implant detection in private memory regions, and inline hook identification in loaded modules.

Both tools compare the in-memory state of a process against what a “clean” process should look like. Every deviation is a potential Indicator of Compromise (IOC). The goal of CoughDrop is to leave zero deviations after a BOF has executed.

The following diagram shows where each component lives in the process’s virtual address space during a BOF execution with CoughDrop:

Process virtual address space during BOF execution showing ntdll, kernel32, clean and shifted cabinet.dll mappings, consolidated block, heap, and stack

The 19 OPSEC Defects

We performed a line-by-line audit of the upstream COFFLoader source. The defects fall into five categories: memory permissions, API resolution, memory hygiene, execution traces, and telemetry. Here is each one, with its root cause and the fix CoughDrop applies.

Category 1: Memory Permissions (Defects 1-3, 8)

Defect 1 (Critical): RWX Section Memory. Every section is allocated with PAGE_EXECUTE_READWRITE. RWX private memory (memory that is simultaneously readable, writable, and executable, and not backed by a file on disk) is the single strongest IOC for injected code. Legitimate applications almost never allocate RWX memory. Moneta flags this as “Abnormal private executable memory” immediately.

CoughDrop fix: All sections are allocated as PAGE_READWRITE during loading. After relocations are complete, each section’s permissions are set from its COFF Characteristics field: .text gets PAGE_EXECUTE_READ, .rdata gets PAGE_READONLY, and .data stays PAGE_READWRITE. The transition is made by calling NtProtectVirtualMemory, the NT-native function underneath VirtualProtect, through an indirect syscall (see Defect 12). As a result, EDR hooks placed on the Win32 VirtualProtect wrapper never see it.

Defect 2 (High): RWX Function Mapping / GOT. The Global Offset Table (a block of memory that stores the resolved addresses of functions the BOF calls) is also allocated with PAGE_EXECUTE_READWRITE. The GOT contains only data — function pointers — and is never executed. RWX is unnecessary and doubles the number of detectable RWX regions.

CoughDrop fix: GOT is allocated PAGE_READWRITE, then optionally flipped to PAGE_READONLY after all symbols are resolved.

Defect 3 (Medium): MEM_TOP_DOWN flag. All allocations use MEM_TOP_DOWN, which forces the OS to allocate at the highest available virtual address. Legitimate applications rarely use this flag, and multiple high-address allocations create a distinctive pattern.

CoughDrop fix: Removed. The OS chooses natural addresses.

Defect 8 (Medium): Section Characteristics ignored. The COFF section header contains flags (IMAGE_SCN_MEM_EXECUTE, IMAGE_SCN_MEM_WRITE, etc.) that specify exactly what permissions each section needs. The upstream loader logs these in debug builds but never uses them.

CoughDrop fix: Characteristics are parsed and used to drive the permission transitions described in Defect 1.
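The permission logic of Defects 1 and 8 reduces to a small mapping from section flags to a page protection. This is a minimal sketch, not CoughDrop’s source. The function name is hypothetical, and the constant values are the standard ones from winnt.h, redefined so the code builds anywhere. Returning PAGE_EXECUTE_READWRITE for a write-plus-execute section is an assumption; a hardened loader may prefer to reject such a section outright.

```c
#include <stdint.h>

/* Values from winnt.h, redefined only if the Windows headers are absent. */
#ifndef IMAGE_SCN_MEM_EXECUTE
#define IMAGE_SCN_MEM_EXECUTE 0x20000000u
#define IMAGE_SCN_MEM_READ    0x40000000u
#define IMAGE_SCN_MEM_WRITE   0x80000000u
#endif
#ifndef PAGE_NOACCESS
#define PAGE_NOACCESS          0x01u
#define PAGE_READONLY          0x02u
#define PAGE_READWRITE         0x04u
#define PAGE_EXECUTE           0x10u
#define PAGE_EXECUTE_READ      0x20u
#define PAGE_EXECUTE_READWRITE 0x40u
#endif

/* Maps a COFF section's Characteristics to its final page protection.
 * Sections are assumed to have been loaded as PAGE_READWRITE first. */
uint32_t section_protection(uint32_t characteristics) {
    int x = (characteristics & IMAGE_SCN_MEM_EXECUTE) != 0;
    int r = (characteristics & IMAGE_SCN_MEM_READ) != 0;
    int w = (characteristics & IMAGE_SCN_MEM_WRITE) != 0;

    if (x && w) return PAGE_EXECUTE_READWRITE; /* W+X: a hardened loader may reject this */
    if (x)      return r ? PAGE_EXECUTE_READ : PAGE_EXECUTE;
    if (w)      return PAGE_READWRITE;         /* writable pages are also readable */
    if (r)      return PAGE_READONLY;
    return PAGE_NOACCESS;
}
```

With example flag values such as 0x60500020 (.text), 0x40500040 (.rdata), and 0xC0500040 (.data), this mapping yields PAGE_EXECUTE_READ, PAGE_READONLY, and PAGE_READWRITE respectively.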

Category 2: API Resolution (Defects 4-5)

Defect 4 (Critical): Direct LoadLibraryA / GetProcAddress. When a BOF declares NTDLL$NtOpenProcess, the loader calls LoadLibraryA("ntdll.dll") and GetProcAddress(handle, "NtOpenProcess") to resolve the address. These two functions are among the most heavily hooked APIs on Windows.

“Hooking” means the EDR has overwritten the first few bytes of the function’s machine code with a JMP instruction. The JMP redirects execution into the EDR’s own inspection routine. That routine logs the arguments (which DLL, which function), decides whether the call looks suspicious, and then either lets the original function proceed or blocks it.

Because LoadLibraryA and GetProcAddress are the standard way for programs to locate API functions at runtime, virtually every EDR hooks them. Every call the loader makes is therefore intercepted, and its DLL and function names are recorded. An EDR that sees LoadLibraryA("ntdll.dll") followed by GetProcAddress("NtOpenProcess"), both issued from unbacked memory, has a direct detection.

CoughDrop fix: PEB-based module enumeration and manual export table parsing, described below.

Step 1 — Find the DLL in memory via the PEB. Every Windows process has a structure called the Process Environment Block (PEB). On 64-bit Windows, a pointer to the PEB is stored at offset 0x60 of the Thread Environment Block (TEB), and the GS segment register always points to the current thread’s TEB. The instruction mov rax, [gs:0x60] therefore reads the PEB address without calling any function. There is no API call and no function pointer, so there is nothing for an EDR to hook.

The PEB’s Ldr field points to the loader’s bookkeeping structure, which holds linked lists of every DLL currently loaded into the process. CoughDrop walks this list and computes an FNV-1a hash (a fast, non-cryptographic hash) of each module name. It compares each hash against precomputed constants embedded in the binary. When a hash matches, the list entry gives CoughDrop the DLL’s base address in memory.
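The hashing half of Step 1 can be sketched portably. This version uses the standard 32-bit FNV-1a parameters (offset basis 0x811C9DC5, prime 0x01000193). Two details are assumptions made so that “KERNEL32.DLL” and “kernel32.dll” match: ASCII case folding, and hashing only the low byte of each UTF-16 code unit. CoughDrop’s exact scheme may differ, and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define FNV1A_OFFSET_BASIS 0x811C9DC5u
#define FNV1A_PRIME        0x01000193u

/* Folds ASCII A-Z to a-z and returns the low byte of the character. */
static uint8_t ascii_lower(uint32_t c) {
    if (c >= 'A' && c <= 'Z') c += 'a' - 'A';
    return (uint8_t)c;
}

/* 32-bit FNV-1a over a case-folded ASCII string (used to precompute constants). */
uint32_t fnv1a_ascii_ci(const char *s) {
    uint32_t h = FNV1A_OFFSET_BASIS;
    for (; *s; ++s) {
        h ^= ascii_lower((unsigned char)*s);
        h *= FNV1A_PRIME;
    }
    return h;
}

/* Same hash over a UTF-16 module name such as a loader entry's BaseDllName.
 * len is in code units (UNICODE_STRING.Length / 2). Only the low byte of
 * each code unit is hashed, which is adequate for ASCII DLL names. */
uint32_t fnv1a_utf16_ci(const uint16_t *s, size_t len) {
    uint32_t h = FNV1A_OFFSET_BASIS;
    for (size_t i = 0; i < len; ++i) {
        h ^= ascii_lower(s[i]);
        h *= FNV1A_PRIME;
    }
    return h;
}
```

The ASCII variant would be run at build time to produce the embedded constants. The UTF-16 variant runs against each module name encountered during the walk.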

Step 2 — Parse the DLL’s export table to find the function. Every DLL is a PE (Portable Executable) file, the binary format Windows uses for .exe and .dll files. A PE file begins with a DOS header, followed by NT headers that describe the file’s layout. The optional header inside the NT headers ends with a table of data directories. Its first entry gives the location of the export directory as an RVA, an offset from the DLL’s base address. The export directory lists every function the DLL makes available to other programs.

The export directory references three arrays:

- AddressOfNames holds the function name strings.
- AddressOfNameOrdinals runs in parallel with it and holds an ordinal index for each name.
- AddressOfFunctions holds the function RVAs and is indexed by ordinal.

To resolve a function, CoughDrop searches AddressOfNames for a hash match and reads the ordinal at the same position in AddressOfNameOrdinals. It uses that ordinal as an index into AddressOfFunctions, then adds the DLL base to the RVA found there to get the final address. No LoadLibraryA or GetProcAddress call is ever made.
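The ordinal indirection in Step 2 is easy to get wrong, so here it is in isolation. This sketch models the export directory with plain C arrays instead of RVAs relative to a module base. It uses unmodified FNV-1a as the name hash. The type and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified export directory. Real PE images store RVAs relative to the
 * module base; plain pointers and values keep the sketch portable. */
typedef struct {
    uint32_t number_of_names;
    const char *const *names;       /* AddressOfNames */
    const uint16_t *name_ordinals;  /* AddressOfNameOrdinals, parallel to names */
    const uintptr_t *functions;     /* AddressOfFunctions, indexed by ordinal */
    uint32_t number_of_functions;
} export_dir;

/* Standard 32-bit FNV-1a, used here as the name hash. */
static uint32_t fnv1a(const char *s) {
    uint32_t h = 0x811C9DC5u;
    for (; *s; ++s) {
        h ^= (unsigned char)*s;
        h *= 0x01000193u;
    }
    return h;
}

/* Returns the address of the export whose name hashes to `hash`, or 0. */
uintptr_t resolve_by_hash(const export_dir *ed, uint32_t hash) {
    for (uint32_t i = 0; i < ed->number_of_names; ++i) {
        if (fnv1a(ed->names[i]) != hash) continue;
        uint16_t ordinal = ed->name_ordinals[i];  /* name position -> ordinal */
        if (ordinal >= ed->number_of_functions) return 0;
        return ed->functions[ordinal];            /* ordinal -> address */
    }
    return 0;
}
```

A real resolver would also convert each RVA by adding the module base. It would also need to detect forwarded exports: their AddressOfFunctions entry points back inside the export directory, at a string of the form MODULE.Function, rather than at code.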

PEB walk pointer chain from GS:0x60 through PEB, Ldr, module list, DOS header, to export directory and final function address

Defect 5 (High): Plain-text symbol strings in memory. DLL and function names are copied into a 1KB stack buffer in plain text during symbol resolution. A memory dump reveals every API the BOF resolved.

CoughDrop fix: After each symbol is resolved, the buffer is zeroed with cd_secure_zero(), a function designed to survive compiler optimization. A plain memset(buffer, 0, size) is not enough. If the buffer is never read after the zeroing, a C compiler may silently delete the memset call as a “dead store” optimization, reasoning that nobody can observe the result.

cd_secure_zero() avoids this in two ways. First, it writes through a volatile pointer, which tells the compiler that every write through that pointer has observable side effects and must be emitted. Second, it is marked __attribute__((noinline)) as an additional safeguard, so the zeroing stays an opaque call that the optimizer cannot fold into the caller.
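Here is a minimal sketch of the same idea, assuming GCC or Clang for the noinline attribute. It is not CoughDrop’s actual source, and secure_zero is a hypothetical name. Where available, platform functions such as SecureZeroMemory on Windows or explicit_bzero on glibc and the BSDs serve the same purpose.

```c
#include <stddef.h>

/* Zeroes len bytes at buf. Each store goes through a volatile pointer, so
 * the compiler must emit it even if buf is never read again, and noinline
 * keeps the call opaque to the caller's optimizer. */
__attribute__((noinline))
void secure_zero(void *buf, size_t len) {
    volatile unsigned char *p = (volatile unsigned char *)buf;
    while (len--) {
        *p++ = 0;
    }
}
```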

Category 3: Memory Hygiene (Defects 6-7)

Defect 6 (High): No memory scrubbing before VirtualFree. When the loader finishes, it calls VirtualFree on each section’s memory without zeroing the contents first. Until those physical pages are overwritten, the BOF’s code and data may be recoverable through forensic analysis or from crash dumps.

CoughDrop fix: cd_secure_zero() is called on every allocation before freeing. This includes all section memory, the GOT, and all temporary buffers. Section sizes are tracked unconditionally in all builds, not only in debug builds.

Defect 7 (Medium): Original COFF data not scrubbed. The full COFF file passed to the loader — including the symbol table, string table, and raw section data — is never zeroed. The caller eventually frees it, but the data persists in heap memory until the pages are reused.

CoughDrop fix: At the end of RunCOFF(), the entire coff_data buffer is zeroed with cd_secure_zero() before the function returns.

Category 4: Execution Traces (Defects 9-11)

Defect 9 (High): Unbacked return address on the stack. The BOF is called directly via a function pointer. When the BOF’s go() function calls any API, the return address on the stack points back into the loader’s VirtualAlloc’d memory — memory that is not backed by any file on disk. An EDR stack walker (a component that examines the chain of return addresses to determine who called what) traversing the call chain sees a return address that belongs to no loaded module. This is the “unbacked caller” IOC.

CoughDrop fix: Return address spoofing. Before calling go(), CoughDrop arranges the stack so that the visible return address points to a ret instruction inside ntdll.dll. On x64, ret is the single-byte instruction 0xC3. When the CPU executes it, it pops the top value off the stack and jumps to that address, returning from the current function to its caller. With the return address pointing at a ret in ntdll, the BOF appears to have been called from ntdll, an entirely normal frame for a stack walker to see.

The gadget is found by scanning ntdll’s .text section for the byte pair 0xC3 0xCC (ret followed by int3). Compilers pad the space between functions with int3 bytes, so this pair very likely marks the real end of a function rather than a stray 0xC3 byte inside a longer instruction.

CoughDrop transfers control to the BOF with jmp instead of call. The distinction matters. call pushes its own return address, so go() would see the loader as its caller instead of the spoofed ntdll address. jmp pushes nothing, so the spoofed address that CoughDrop placed on the stack manually is what the stack walker sees.
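The gadget search is an ordinary byte-pattern scan. In this sketch a caller-supplied buffer stands in for ntdll’s .text section, whose bounds a real loader would read from the section header. find_pattern is a hypothetical helper. The same routine would also locate the 0F 05 C3 syscall; ret sequence described under Defect 12.

```c
#include <stddef.h>
#include <string.h>

/* Returns the offset of the first occurrence of pat in buf, or -1 if absent. */
ptrdiff_t find_pattern(const unsigned char *buf, size_t len,
                       const unsigned char *pat, size_t pat_len) {
    if (pat_len == 0 || pat_len > len) return -1;
    for (size_t i = 0; i + pat_len <= len; ++i) {
        if (memcmp(buf + i, pat, pat_len) == 0) return (ptrdiff_t)i;
    }
    return -1;
}
```

Scanning for the two-byte pair rather than a lone 0xC3 is the design choice that matters here. A single 0xC3 also occurs inside immediates and ModRM bytes, for example in add rsp, 0x28 (48 83 C4 28 C3), where only the final byte is a real ret.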

Defect 10 (Medium): VirtualAlloc burst pattern. The original loader makes N+1 separate VirtualAlloc calls in rapid succession, one per section plus one for the GOT. This burst of allocation events from a single thread is visible in ETW (Event Tracing for Windows) telemetry, which EDR products consume.

CoughDrop fix: All sections and the GOT are consolidated into a single allocation. CoughDrop computes the total size needed, allocates one contiguous block, and partitions it internally, starting each section at a multiple of 4096 bytes. The padding is needed because VirtualProtect and its NT equivalent operate on whole 4 KB pages, so two regions that share a page cannot have different permissions. After loading, only NtProtectVirtualMemory differentiates the sections.
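The layout computation behind the single allocation can be sketched as follows. Two details are assumptions: reserving at least one page for an empty section, and sizing the GOT as one pointer per entry. layout_block is a hypothetical name. The 4096-byte page size is the standard x64 Windows page size.

```c
#include <stddef.h>

#define LAYOUT_PAGE_SIZE ((size_t)4096)

/* Rounds n up to a whole number of pages, reserving at least one page. */
static size_t pages_for(size_t n) {
    if (n == 0) n = 1;
    return (n + LAYOUT_PAGE_SIZE - 1) & ~(LAYOUT_PAGE_SIZE - 1);
}

/* Assigns each section a page-aligned offset inside one block, followed by
 * a GOT of got_entries pointers. Returns the total size to allocate. */
size_t layout_block(const size_t *section_sizes, size_t count, size_t got_entries,
                    size_t *offsets, size_t *got_offset) {
    size_t cursor = 0;
    for (size_t i = 0; i < count; ++i) {
        offsets[i] = cursor;                 /* each section starts on a page boundary */
        cursor += pages_for(section_sizes[i]);
    }
    *got_offset = cursor;                    /* GOT gets its own page(s) so it can become read-only */
    cursor += pages_for(got_entries * sizeof(void *));
    return cursor;
}
```

The returned total is what goes to the single allocation call. Each offset then marks the start of one NtProtectVirtualMemory range.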

Defect 11 (Low): Incomplete .bss handling. The .bss section (which holds uninitialized and zero-initialized static variables) is partially handled: if PointerToRawData is zero, the section is zero-filled. But the loader does not check the IMAGE_SCN_CNT_UNINITIALIZED_DATA flag, and .bss sections with SizeOfRawData == 0 get no memory allocated at all.

CoughDrop fix: When IMAGE_SCN_CNT_UNINITIALIZED_DATA is set, memory is allocated using max(VirtualSize, SizeOfRawData, 4096) — always at least one page, zero-filled.
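The sizing rule above is small enough to state directly in code. This is a minimal sketch with a hypothetical function name. Treating sections without the uninitialized-data flag as SizeOfRawData bytes is an assumption. The zero fill needs no extra work, because freshly committed pages from VirtualAlloc or NtAllocateVirtualMemory are zero-initialized.

```c
#include <stdint.h>

#ifndef IMAGE_SCN_CNT_UNINITIALIZED_DATA
#define IMAGE_SCN_CNT_UNINITIALIZED_DATA 0x00000080u
#endif

/* Bytes to reserve for a section. Uninitialized-data (.bss) sections get
 * max(VirtualSize, SizeOfRawData, 4096); other sections use SizeOfRawData. */
uint32_t section_alloc_size(uint32_t characteristics, uint32_t virtual_size,
                            uint32_t size_of_raw_data) {
    if (!(characteristics & IMAGE_SCN_CNT_UNINITIALIZED_DATA))
        return size_of_raw_data;
    uint32_t n = virtual_size > size_of_raw_data ? virtual_size : size_of_raw_data;
    return n > 4096u ? n : 4096u;
}
```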

Category 5: Telemetry and Advanced Evasion (Defects 12-19)

Defect 12 (Medium): ETW telemetry surface. All memory operations go through Win32 wrappers (VirtualAlloc, VirtualProtect, VirtualFree) that generate ETW events consumed by EDR products.

CoughDrop fix: All memory APIs are replaced with their NT-level equivalents, invoked via indirect syscalls. The System Service Number (SSN) is the numeric ID the kernel uses to select which system call to execute. CoughDrop resolves the SSN for each function at runtime with the Halo’s Gate technique. If the target function’s ntdll stub is hooked (its first bytes have been overwritten by an EDR’s JMP instruction), CoughDrop examines neighboring stubs at 32-byte intervals until it finds an unhooked one. It reads that stub’s SSN and derives the target’s SSN by simple arithmetic. This works because the NT syscall stubs are laid out inside ntdll in SSN order: SSN 0 at the first stub, SSN 1 at the next, and so on. If the unhooked stub immediately before the target has SSN 0x4A, the target must be SSN 0x4A + 1 = 0x4B.

The syscall instruction itself is not executed from CoughDrop’s own code. Instead, CoughDrop jumps to a syscall; ret gadget: the two-instruction sequence syscall followed by ret, encoded as the bytes 0F 05 C3, which already exists inside ntdll’s code section. The term “gadget” comes from Return-Oriented Programming (ROP), where you reuse existing instruction sequences inside trusted modules instead of executing your own code. Because the syscall is issued from ntdll’s own bytes, the instruction pointer at the kernel transition lies inside ntdll. That is the expected location for a system call, rather than private memory that telemetry would flag as anomalous.
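The SSN arithmetic can be shown on a synthetic byte array rather than on ntdll itself. This sketch assumes the 32-byte stub stride described above and the 4C 8B D1 B8 stub prologue. read_ssn and derive_ssn are hypothetical names, and a production implementation would bound the search and validate more of each stub.

```c
#include <stddef.h>
#include <stdint.h>

#define STUB_STRIDE 32  /* assumed distance between adjacent stubs */

/* Reads the SSN from an unhooked stub: 4C 8B D1 (mov r10, rcx) then
 * B8 xx xx 00 00 (mov eax, SSN). Returns -1 if the prologue does not match,
 * for example because it was overwritten with a JMP. */
static int32_t read_ssn(const uint8_t *stub) {
    if (stub[0] == 0x4C && stub[1] == 0x8B && stub[2] == 0xD1 && stub[3] == 0xB8 &&
        stub[6] == 0x00 && stub[7] == 0x00)
        return (int32_t)(stub[4] | (stub[5] << 8));
    return -1;
}

/* Halo's Gate-style derivation over `count` consecutive stubs: if stub
 * `index` is hooked, find the nearest readable neighbour k stubs away and
 * adjust its SSN by k (stubs are assumed to be in ascending SSN order). */
int32_t derive_ssn(const uint8_t *stubs, size_t count, size_t index) {
    if (index >= count) return -1;
    int32_t ssn = read_ssn(stubs + index * STUB_STRIDE);
    if (ssn >= 0) return ssn;
    for (size_t k = 1; k < count; ++k) {
        if (index >= k) {                      /* preceding stub: SSN is k lower */
            int32_t prev = read_ssn(stubs + (index - k) * STUB_STRIDE);
            if (prev >= 0) return prev + (int32_t)k;
        }
        if (index + k < count) {               /* following stub: SSN is k higher */
            int32_t next = read_ssn(stubs + (index + k) * STUB_STRIDE);
            if (next >= 0) return next - (int32_t)k;
        }
    }
    return -1;
}
```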

Defects 13-17: Module Stomping, Smart Target Selection, Module Shifting, Metadata Erasure, ETW Suppression. These are covered in the dedicated sections below, as they represent the most significant engineering challenges in the project.

Defect 18 (Medium): Smart Stomp Target Selection. Rather than hardcoding a single DLL as the stomp target, CoughDrop maintains an internal candidate list and automatically selects the first DLL whose .text section is large enough for the BOF:

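```c
#include <wchar.h>  // wchar_t, NULL

// Tried in order; the first DLL whose .text section can hold the BOF is used.
static const wchar_t * const stomp_candidates[] = {
    L"cabinet.dll",   // rarely monitored
    L"uxtheme.dll",   // rarely monitored
    L"dbghelp.dll",   // common, not security-critical
    L"winhttp.dll",   // larger .text for big BOFs
    NULL              // end-of-list sentinel
};
```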

No operator configuration is needed. The RunCOFF signature is unchanged, so C2 integration requires zero protocol changes.

Defect 19 (Critical): Module Shifting. This is the technique that eliminated the final IOC. Covered in detail in the next section.

Deep Dive: Module Shifting

Before explaining Module Shifting, a brief note on Module Stomping — the technique it replaces. Module Stomping is an injection technique where the attacker loads a legitimate DLL (such as amsi.dll or cabinet.dll) into the process via the normal Windows loader, then overwrites the DLL’s .text section with malicious code. The advantage is that the executable memory now appears to belong to a legitimate, file-backed DLL rather than a suspicious VirtualAlloc allocation. The disadvantage — and the reason CoughDrop moved beyond it — is that Moneta detects the overwritten bytes by comparing the in-memory DLL against its file on disk.

Module Shifting is the technique that took CoughDrop from “mostly clean” to “zero IOCs.” It solves a problem fundamental to any Module Stomping approach. A loaded DLL’s code pages are shared with the image file on disk. The first time a process writes to one of those pages, Windows gives the process its own private copy of the page. This is the Copy-on-Write (COW) mechanism, and the shared original is left untouched.

Writing the original bytes back afterward does not undo this. The page stays privately committed in the process’s working set, Windows’ bookkeeping of which memory pages a process is actively using. Even a byte-identical copy is therefore no longer the shared, file-backed original. Moneta checks exactly this: it queries the working set for pages of an image mapping that have been privately committed when they should still be shared with the on-disk image.

Module Shifting at page level: clean mapping pages remain shared with disk while shifted mapping triggers COW on private copies

The solution is to never write to the loaded DLL at all. Instead, CoughDrop creates a second independent mapping of the same DLL file:

  1. Clean mapping: LdrLoadDll loads the DLL normally. It appears in the PEB module list, and its pages are never modified.
  2. NtCreateFile — Open the DLL’s file on disk (the path is captured from the PEB Ldr entry at load time).
  3. NtCreateSection(SEC_IMAGE) — Create a section object (a kernel-level object that represents a memory-mappable region of a file — the OS creates one internally every time it loads a DLL) over the file handle with the SEC_IMAGE flag, which tells the kernel to treat this as a PE image mapping with per-section permissions, just like a normal DLL load.
  4. NtMapViewOfSection — Map a second view of this section at a kernel-chosen base address (different from the clean mapping).
  5. Write BOF code into the shifted view’s .text section. This triggers COW on the shifted mapping’s pages — the clean mapping’s pages remain shared and untouched.
  6. Execute the BOF from the shifted view.
  7. NtUnmapViewOfSection — After go() returns, unmap the shifted view. All COW pages are released. The clean mapping stays in the PEB, looking exactly as it did before the BOF ran.
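Windows section objects can’t be exercised portably, but the page-level behavior behind these steps can be reproduced with POSIX mmap. The setup is two MAP_PRIVATE views of one file: writing to the second view gives that view private page copies and leaves the first view and the file untouched. This is an analogy for illustration, not CoughDrop’s implementation. NtCreateSection with SEC_IMAGE additionally lays the file out as a PE image with per-section permissions. The function name is hypothetical.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* POSIX analogue of Module Shifting: maps one file twice with MAP_PRIVATE,
 * writes `patch` into the second ("shifted") view, and checks that the first
 * ("clean") view still matches the file on disk. Returns 1 if the clean view
 * is intact and the shifted view holds the patch, 0 if not, -1 on error. */
int shifted_view_demo(const char *path, const char *patch) {
    size_t plen = strlen(patch);
    unsigned char on_disk[256];
    struct stat st;
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;
    if (fstat(fd, &st) != 0 || plen == 0 || plen > sizeof on_disk ||
        st.st_size < (off_t)plen) {
        close(fd);
        return -1;
    }
    size_t size = (size_t)st.st_size;

    /* Two independent views of the same file, like the clean and shifted mappings. */
    unsigned char *clean = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
    unsigned char *shifted = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);

    int result = -1;
    if (clean != MAP_FAILED && shifted != MAP_FAILED) {
        memcpy(shifted, patch, plen);  /* first write: this view gets a private page copy */
        ssize_t got = pread(fd, on_disk, plen, 0);
        result = got == (ssize_t)plen &&
                 memcmp(clean, on_disk, plen) == 0 &&  /* clean view still matches disk */
                 memcmp(shifted, patch, plen) == 0;    /* shifted view holds the patch */
    }
    if (shifted != MAP_FAILED) munmap(shifted, size);  /* unmapping discards the private copy */
    if (clean != MAP_FAILED) munmap(clean, size);
    close(fd);
    return result;
}
```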

The critical distinction: with basic Module Stomping, Moneta compares the loaded DLL’s pages against the file on disk and finds differences. With Module Shifting, the loaded DLL’s pages were never touched — the differences existed only in a separate view that no longer exists.

The REL32 Distance Problem

One challenge with Module Shifting is that the BOF’s .text code runs at the shifted mapping’s address, but the GOT (which holds resolved function pointers) lives in CoughDrop’s consolidated allocation block. On x64, REL32 relocations (relative 32-bit offsets used for function calls and data access) have a maximum reach of approximately 2 GB. If the shifted mapping and the consolidated block are more than 2 GB apart in virtual address space, the 32-bit signed offset cannot represent the distance, the value wraps around, and the patched address points to the wrong location — causing a crash or silent corruption.
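This is the check that must accompany every REL32 patch. The minimal sketch below uses hypothetical names. It follows the PE/COFF definition of IMAGE_REL_AMD64_REL32, which is relative to the byte after the 4-byte field. It also treats the value already stored in the field as an addend, as COFF relocations do.

```c
#include <stdint.h>

/* Computes the value to store for an x64 IMAGE_REL_AMD64_REL32 relocation.
 * The displacement is relative to the first byte after the 4-byte field
 * (site + 4); addend is the value already present in the field.
 * Returns 0 on success, or -1 if the result does not fit in a signed
 * 32-bit integer, which the caller must treat as a hard error. */
int rel32_displacement(uint64_t site, uint64_t target, int32_t addend, int32_t *out) {
    int64_t delta = (int64_t)(target - site - 4) + addend;
    if (delta < INT32_MIN || delta > INT32_MAX) return -1;
    *out = (int32_t)delta;
    return 0;
}
```

Returning an error instead of truncating is what turns an out-of-range relocation into a visible failure rather than a corrupted pointer.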

CoughDrop solves this with cd_valloc_near() — a function that allocates memory near a given hint address by iterating NtAllocateVirtualMemory with base address hints at 64 KiB increments, checking that both endpoints of the allocation fall within 2 GB of the target. If a particular attempt lands outside range, it is immediately freed and the next increment is tried.
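cd_valloc_near() itself calls NtAllocateVirtualMemory, which can’t run portably. This sketch therefore isolates its two pure pieces: the sequence of base-address hints and the both-endpoints range check. The alternating up/down search order and both function names are assumptions; the source only specifies 64 KiB increments.

```c
#include <stdint.h>

#define NEAR_STEP  0x10000ull     /* 64 KiB increments */
#define NEAR_RANGE 0x7FFFFFFFull  /* reach of a signed 32-bit displacement */

static uint64_t distance(uint64_t a, uint64_t b) { return a > b ? a - b : b - a; }

/* i-th candidate base address: origin, origin+64K, origin-64K, origin+128K, ...
 * Returns 0 when a downward step would underflow. */
uint64_t near_hint(uint64_t origin, uint64_t i) {
    uint64_t step = ((i + 1) / 2) * NEAR_STEP;
    if (i % 2 == 1) return origin + step;
    return step > origin ? 0 : origin - step;
}

/* True if both the first and last byte of [base, base + size) are within reach of target. */
int within_rel32_reach(uint64_t base, uint64_t size, uint64_t target) {
    return size > 0 &&
           distance(base, target) <= NEAR_RANGE &&
           distance(base + size - 1, target) <= NEAR_RANGE;
}
```

A caller would walk i = 0, 1, 2, and so on, requesting an allocation at near_hint(target & ~(NEAR_STEP - 1), i) each time. It would keep the allocation if within_rel32_reach() holds, and otherwise free it and move on to the next hint.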

Scan Results

The following screenshots show CoughDrop’s actual scan results on Windows, taken while the loader is paused after BOF execution and cleanup.

BOF execution with PID output and post-cleanup pause:

CoughDrop executing a BOF and pausing for OPSEC scan

Moneta scan — only “Unsigned module” on the exe itself (acceptable):

Moneta scan showing only Unsigned module IOC on coughdrop.exe

PE-Sieve scan — Total suspicious: 0 across all categories:

PE-Sieve scan showing Total suspicious 0

One thing to note: Moneta flags coughdrop.exe itself as an “Unsigned module.” This is expected and not a loader defect. Every .exe and .dll on Windows can optionally carry a digital signature (Authenticode) that proves who compiled it and that the file has not been tampered with. Our development build is not signed, so Moneta correctly reports it as unsigned. In a real engagement, the loader code would be embedded inside a signed agent binary or injected into an already-trusted process, at which point this flag disappears. The important result is everything else: no “Modified code” on any DLL, no “Abnormal private executable memory,” no shellcode implants. PE-Sieve reports zero in every category: Hooked 0, Replaced 0, Implanted 0, IAT Hooks 0.

Development Lessons

The BeaconPrintf Silent Failure

The first functional test of CoughDrop — loading a minimal BOF that calls BeaconPrintf — produced no output. No crash, no error, just silence. The loader reported success, but the BOF’s message never appeared.

The root cause was a compound bug: Module Stomping placed the BOF’s .text into amsi.dll at a high virtual address, while the consolidated block (containing the GOT) landed at a low address chosen by the OS. The distance exceeded 2 GB, causing REL32 relocation overflow. But the overflow code path did not set the error flag — it silently continued. The loader reported success, BeaconGetOutputData returned NULL, and the output was lost.

This taught us two things: first, every error path in the relocation loop must set retcode = 1. Second, testing with Wine is invaluable — it let us iterate the fix cycle without transferring files to a Windows VM for every attempt.

Copy-on-Write and the Last IOC

After implementing Module Stomping with disk-based byte restoration, we expected Moneta to show zero IOCs. Instead, a single page of “Modified code” persisted on amsi.dll. Byte-by-byte comparison confirmed the restored bytes were identical to the original. The VirtualProtect cycle alone (RW then RX, with no actual writes) did not trigger the flag in a control test. The cause was Windows’ working set tracking: once a page has been privately committed via COW, the kernel records this in the working set metadata even if the bytes are later restored. Neither DiscardVirtualMemory nor snapshot restoration clears this flag.

This is what led to the Module Shifting architecture. By never writing to the loaded DLL’s pages, the COW flag is never set in the first place.

Wine as a Development Environment

CoughDrop cross-compiles on Linux (WSL) with MinGW. MinGW (Minimalist GNU for Windows) provides a GCC-based toolchain that produces Windows .exe and .dll files; the cross-compiler binary used here is x86_64-w64-mingw32-gcc. This lets you write and build C code on a Linux machine and get a native Windows x64 PE executable without ever opening Visual Studio. Early development hit a wall: every code change required manually copying the .exe to a Windows VM and running it there. Wine solved this. Running wine ./coughdrop.exe go test/test_bof.x64.o executes the full loader pipeline, including DFR resolution, relocations, and BOF execution.

One caveat: Wine’s ntdll does not use the standard NT stub prologue (4C 8B D1 B8 <SSN>). On Windows, every system call stub in ntdll.dll (NtAllocateVirtualMemory, NtProtectVirtualMemory, and so on) begins with the same byte sequence:

- 4C 8B D1 is mov r10, rcx. It preserves the first argument, because the syscall instruction overwrites rcx.
- B8 xx xx 00 00 is mov eax, <SSN>. It loads the System Service Number.

The SSN is encoded as a 32-bit immediate, though actual values are small (typically under 0x200). The kernel uses it to look up which function to execute when the syscall instruction fires. Hell’s Gate reads these bytes to extract the SSN. If the bytes don’t match the pattern, either an EDR has overwritten them with a JMP (a hook) or, as under Wine, the stubs are simply implemented differently.

CoughDrop handles this with a fallback. If SSN extraction fails at initialization, indirect syscalls are disabled and the loader falls back to direct function pointers resolved via the PEB walk. This keeps Wine usable for development while real Windows runs use indirect syscalls.

Limitations and Future Work

Havoc Integration. CoughDrop currently runs as a standalone executable. Integrating it into Havoc’s Demon agent requires adapting the output pipeline (CoughDrop writes to stdout; Demon sends output via the C2 transport protocol) and reconciling the syscall infrastructure. This is planned as a separate blog post.

Sleep-time Encryption. CoughDrop’s current architecture loads, executes, and cleans up in a single synchronous call. For long-running BOFs or sleep-time obfuscation (encrypting the agent’s code in memory between C2 callbacks), additional work is needed to integrate with techniques like ShellcodeFluctuation.

Code Signing. The “Unsigned module” IOC on coughdrop.exe is inherent to any unsigned binary. In a real engagement, the loader code would be embedded in a signed agent or in a process that is already trusted.

Conclusion

CoughDrop demonstrates that systematic OPSEC hardening of a COFF loader is both achievable and measurable. Starting from the upstream COFFLoader’s 19 identified detection surfaces, we eliminated each one through a combination of permission isolation, PEB-based resolution, indirect syscalls, memory scrubbing, return address spoofing, consolidated allocations, and Module Shifting.

The final result — zero IOCs against both Moneta and PE-Sieve — is verified automatically via a scan loop that builds, executes, and scans after every code change. The loader maintains 100% backward compatibility with existing BOFs: same entry point, same DFR convention, same Beacon API.

CoughDrop is available at github.com/y637F9QQ2x/CoughDrop under the BSD 3-Clause License, with credit to TrustedSec’s COFFLoader as the foundation.

Disclaimer: For authorized security testing and defensive research only.


This post is licensed under CC BY 4.0 by the author.