The challenge with using Cobalt Strike for advanced red team exercises

While next-generation AI and machine-learning components of security solutions continue to enhance behavioral-based detection capabilities, at their core many still rely on signature-based detections. Cobalt Strike, a popular Command and Control (C2) framework used by both threat actors and red teams since its debut, continues to be heavily signatured by security solutions.

To keep Cobalt Strike operationally viable in the past, we on the IBM X-Force Red Adversary Simulation team invested significant research and development effort in customizing Cobalt Strike with internal tooling. Some of our Cobalt Strike-specific internal tools have public versions, such as “InlineExecute-Assembly”, “CredBandit”, and “BokuLoader”. Over the last two years, given how heavily Cobalt Strike is signatured, we have restricted its use to simulating less sophisticated threat actors, and instead leverage other 3rd party and in-house C2 frameworks when performing more advanced red team exercises.

Through research and development efforts, we have found better operational success in advanced red team exercises with:

  • Custom internal tooling.
  • Custom internal loaders.
  • Custom internal C2 framework.
  • Continuing to invest in expanding the capabilities and stealth of alternate 3rd party C2 frameworks.

However, a large number of threat actors still leverage pirated copies of Cobalt Strike, and it remains important to be able to simulate them. Red teams willing to put in the research and development effort may still find operational success with Cobalt Strike while simulating these adversaries. Additionally, Cobalt Strike is a great learning tool that newcomers can use to get hands-on experience with a C2 framework through red team training courses.

As we continue to expand our C2 capabilities, we are sharing some insight into how we have built on the Cobalt Strike framework in the past, specifically by developing custom reflective loaders. This insight is also intended to help defenders understand how Cobalt Strike works so they can create more robust detections.

Building on the framework with reflective loaders

This blog post is the first of a series that serves as a primer, covering the basics of developing a Cobalt Strike reflective loader. As we progress through this series, we will build upon this foundation and reference this post.

By the end of this series, we aim to create a reflective loader that integrates with Cobalt Strike’s existing evasion features and even enhances them with advanced techniques not currently present in the tool. Future posts will delve deeper into the development of specific evasion features and how to implement them into our Cobalt Strike reflective loader.

To kick things off, this post will cover:

  • The issues with loading a C2 implant from disk with the Windows DLL Loader.
  • The concepts and mechanics of Cobalt Strike’s reflective loading process.
  • The design requirements necessary for building an effective reflective loader.
  • The phases involved in the reflective loading process.

As we explore Cobalt Strike’s reflective loading through the lens of an offensive security tool developer, we’ll highlight opportunities for detections and evasions. Some development aspects will be omitted or simplified, and we encourage you to fill in the gaps by debugging existing reflective loader projects, rebuilding them from scratch, or seeking out training.

Loading the beacon DLL

The Cobalt Strike C2 implant, known as Beacon, is a Windows Dynamic-Link Library (DLL), and the modular capability of using our own DLL loader in Cobalt Strike is known as the User-Defined Reflective Loader (UDRL).

The built-in Windows DLL Loader

Typically, the built-in Windows DLL Loader is responsible for loading DLLs into a process’s virtual memory space. The Windows DLL Loader exists primarily within user space, although it does cross over into kernel space when mapping DLLs from disk.

The Windows DLL Loader presents a few drawbacks during adversary simulations:

  • The raw DLL must be present on the file system.
  • The raw DLL must be free from obfuscation.
  • Kernel image load events are triggered by the Windows DLL Loader.

Therefore, using the Windows DLL Loader for loading our beacon DLL is not an ideal solution. To overcome these challenges, we load the beacon DLL from memory with a reflective loader.

Reflective loading avoids three main detection points:

  1. Signatured malware on the file system.
  2. Kernel image load events, which can be monitored by security solutions.
  3. Our C2 implant DLL being listed in the Process Environment Block (PEB).

Reflective Loader vs Windows DLL Loader

Reflective loading can be thought of as simply loading a raw DLL directly from memory, as opposed to loading it from the file system.

Reflective loading and the built-in Windows DLL Loader serve the same purpose: loading a DLL from raw file format into the virtual memory space of a process. However, reflective loading has a key advantage over the Windows DLL Loader: it doesn’t require the DLL file to exist on the file system. Because the loading happens in memory, any number of loading stages can be chained, with the C2 implant DLL hidden beneath layers of encryption and encoding within the memory of the process.

Raw file format vs virtual address format

A key concept to understand when loading a DLL is that the DLL is formatted differently on disk than in memory. The main differences between the DLL in raw file format and in virtual address format are:

Raw File Format:

  • The format for the DLL as it exists on a file system.
  • The sections of the DLL are tightly packed together, aligned only to the FileAlignment value (typically 0x200 bytes).
  • The offsets are based on the start of the raw DLL file as it would exist on disk.
  • This format takes up less memory space.

Virtual Address Format:

  • The format for the DLL as it exists in the virtual memory space of a process.
  • The sections are spaced out, each starting on a SectionAlignment boundary (typically 0x1000 bytes, one page).
  • The offsets are Relative Virtual Addresses (RVA).
  • When running in a process, the DLL and other modules determine locations via RVA.
  • This format takes up more memory space.

Raw beacon vs virtual beacon

By examining an HTTP beacon DLL in the PE-Bear tool by Aleksandra Doniec, we can see the differences between the raw and virtual addresses of each section of the DLL:

Table listing raw and virtual addresses of each section of the beacon DLL.

This HTTP/S beacon DLL is 0x52000 bytes (328KB) in size when loaded into the virtual memory space of a process, compared to 0x44000 bytes (272KB) as it exists on the file system. The size difference is due to the sections being spaced out in virtual address format, as opposed to being tightly packed together in raw file format.

PE-Bear provides a visual representation of our beacon DLL as it exists in raw file format versus virtual address space format:

Visual representation of beacon DLL in raw format (left) versus virtual format (right)

Loading Beacon with the Windows DLL Loader

While not the wisest move to perform during an adversary simulation, dropping a raw beacon DLL with no obfuscation to disk and loading it with the Windows DLL Loader is a great way to demystify both beacon and DLL loading. Essentially, beacon is just a DLL. The Windows DLL Loader and a reflective loader just load a DLL into a process.

To load the beacon DLL with the Windows DLL Loader, we perform the following steps:

  1. Generate a raw beacon DLL with no obfuscation.
  2. Create a program which:
    1. Uses the LoadLibrary API to load our beacon DLL from disk.
    2. Executes our beacon by calling the virtual beacon DLL’s entry point.
  3. Place our executable program and our beacon DLL in the same folder.
  4. Execute our program.

Generating a raw beacon DLL free from obfuscation

First, we disable all of the Malleable PE options which make our beacon DLL unloadable by the Windows DLL Loader. To do this, we modify our Malleable C2 profile and disable Malleable PE evasion options located in the stage block:

Malleable C2 profile stage block modified to disable Cobalt Strike evasion features.

After modifying the profile, we restart the Cobalt Strike Team Server, supplying our no_evasion.profile profile as an argument.

Command line example of starting the Cobalt Strike Team Server.

We connect to the Team Server with the Cobalt Strike client. Then we create a Windows Stageless Payload with the output option set to Raw and listener set to https. We save the payload as beacon.dll.

Screenshot of creating a “raw stageless” beacon DLL from the Cobalt Strike Client

Creating our Beacon DLL Loader program

Using the below code, we create a C program named loadBeaconDLL.c and compile it:

Windows C code to load the beacon DLL from disk using the Windows DLL Loader.

We use the Kernel32.LoadLibraryA API to load our raw beacon DLL from disk. This API will call the built-in Windows DLL Loader which will load our beacon DLL from disk into the virtual memory space of our host process.

As part of the loading process, the Windows DLL Loader will initialize our beacon DLL by calling its entry point with DLL_PROCESS_ATTACH (1) as an argument.

After the Windows DLL Loader has loaded and initialized our beacon DLL in the virtual memory space of our process, we need to call the virtual beacon DLL’s entry point again, this time with the argument 0x4.

Our program must know our virtual beacon DLL’s entry point to execute our virtual beacon DLL. We can find it dynamically within the program by parsing the virtual beacon DLL’s headers for the entry point Relative Virtual Address (RVA), or we can look it up once and hardcode the value.
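For illustration, here is a minimal sketch of the dynamic approach in portable C. It reads the header fields by byte offset, following the PE format layout, instead of using the windows.h structures. The function name entry_point_rva is hypothetical and not taken from any project mentioned here. It also assumes unmodified MZ and PE headers, which should hold for a beacon generated with Malleable PE evasion options disabled.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reads a little-endian 32-bit value at any alignment. */
static uint32_t read_u32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical sketch: returns OptionalHeader.AddressOfEntryPoint (an RVA)
   for the PE image at 'image', or 0 if the headers do not look valid.
   'size' is the number of readable bytes at 'image'. */
uint32_t entry_point_rva(const uint8_t *image, size_t size) {
    if (size < 0x40 || image[0] != 'M' || image[1] != 'Z')
        return 0;
    uint32_t e_lfanew = read_u32(image + 0x3C);      /* IMAGE_DOS_HEADER.e_lfanew */
    /* "PE\0\0" (4 bytes) + IMAGE_FILE_HEADER (20) + AddressOfEntryPoint at +16 */
    if (e_lfanew > size || size - e_lfanew < 44)
        return 0;
    if (memcmp(image + e_lfanew, "PE\0\0", 4) != 0)
        return 0;
    return read_u32(image + e_lfanew + 4 + 20 + 16);
}
```

Adding the returned RVA to the base address returned by LoadLibraryA gives the entry point’s virtual address. A return value of 0 signals headers that do not look like a PE image.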

For our proof-of-concept we will manually discover and hardcode our beacon DLL’s entry point RVA into our program. Using PE-Bear we discover that the RVA to beacon’s entry point is 0x1D840:

Screenshot of finding the beacon DLL entry point RVA using PE-Bear

The LoadLibraryA API returns the base address of our virtual beacon DLL. We add this base address to the entry point RVA to get the entry point’s virtual address.

With our code ready to go, we compile our C program into a Windows executable:

Command used to compile our program.

Positioning our program and Beacon DLL on the file system

By placing our beacon DLL and our executable beacon loader program in the same directory, the Windows DLL Loader will be able to discover our DLL as it performs its loading routine.

We place both beacon.dll and loadBeaconDLL.exe on the file system within the same directory:

Beacon DLL and loader program placed in the same directory.

Executing our program

From our Windows desktop, we double-click our loadBeaconDLL.exe program and establish an active beacon connection to our Team Server.

Successful connection to the C2 Team Server from a beacon DLL loaded using the Windows DLL Loader.

Cobalt Strike reflective loading

Cobalt Strike uses a modified version of the Reflective Loader project by Stephen Fewer. This legendary in-memory DLL loader is over a decade old and has been used in Metasploit and other notable offensive security tools.

UDRL usage considerations

Over the years the Cobalt Strike reflective loader has been enhanced to handle all the Malleable PE evasion features Cobalt Strike has to offer. The major disadvantage to using a custom User-Defined Reflective Loader (UDRL) is that Malleable PE evasion features may or may not be supported out-of-the-box.

Some evasion features remain fully implemented when using a UDRL, because Cobalt Strike’s Malleable PE engine patches them into the beacon DLL when the beacon payload is created. However, some features, such as obfuscate, currently must be handled by the UDRL, while others, such as sleepmask and cleanup, can be handled by beacon with proper UDRL integration.

Reflective loading methods

Original reflective loader method

The original Reflective Loader project requires compiling the ReflectiveLoader into our DLL project and exporting it within our C2 implant DLL.

Then another project is responsible for:

  1. Discovering the virtual address of the ReflectiveLoader export.
  2. Executing the ReflectiveLoader export, which returns the entry point to our loaded DLL.
  3. Calling the reflectively loaded DLL’s entry point.

Diagram of the original reflective loader, loading a DLL to virtual memory.

Prepend reflective loader method

An alternative method is prepending the reflective loader to the DLL. This allows any unmanaged DLL to be loaded without compiling the DLL from source code. It is a robust reflective loading method capable of loading any PE file (EXE or DLL).

Diagram of a reflective loader prepended to a DLL, loading a DLL to virtual memory.

Cobalt Strike’s reflective loader method

Cobalt Strike’s implementation of reflective loading uses a hybrid of the above two methods. This reflective loading method may be familiar to those with knowledge of how Metasploit’s Meterpreter does reflective loading.

Like the original reflective loader method, the ReflectiveLoader function is compiled and exported within the original beacon DLL. When an operator generates a beacon payload from the Cobalt Strike client, Cobalt Strike’s Malleable PE engine patches the raw beacon DLL to inform the reflective loader of the Malleable PE options to use. Beacon’s DOS header is patched to call the ReflectiveLoader export at a hardcoded offset. The initial patched bytes of beacon’s DOS header, which call the ReflectiveLoader export, will be referred to in this blog as the “call reflective loader stub”.

When a UDRL is loaded into Cobalt Strike, and an operator generates a beacon payload from the Cobalt Strike client, Cobalt Strike’s Malleable PE engine patches in the reflective loader shellcode at the raw file offset of the ReflectiveLoader export.

When the Malleable PE engine completes the patching of the raw beacon DLL, the raw beacon DLL is given to the operator in an executable shellcode-like format.

Diagram of the Cobalt Strike reflective loader, loading the beacon DLL to virtual memory.

Beacon’s call reflective loader stub

Looking at the initial bytes in the PE-Bear disassembler we can see that the beacon DLL itself is executable:

The call reflective loader stub shown as executable assembly operation codes.

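The initial bytes MZAR are customizable through the Malleable PE options in Cobalt Strike’s C2 profile. These bytes must be executable and, taken together, act as a no-operation (nop). On x64, for example, 4D 5A decodes as pop r10 and 41 52 as push r10, so the stack is restored after the pair executes.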

After executing optionally prepended nops and magic bytes, the call reflective loader stub:

  • Creates a stack frame.
  • Uses RIP relative addressing to determine the base address of the raw beacon DLL.
  • Calls the ReflectiveLoader export at the known 0x16E3C raw file offset.
  • Calls the entry point of the loaded beacon DLL.

We confirm that the raw file offset for the ReflectiveLoader export is 0x16E3C by looking at the beacon DLL’s export directory:

Screenshot of using PE-Bear to determine the raw file offset of the ReflectiveLoader export.

As it exists within the export directory, the address for the ReflectiveLoader export is in RVA format, referring to the beacon DLL in its virtual state. Since the ReflectiveLoader export is executable, we know that it exists within the .text section of the beacon DLL.

To discover the raw file offset of the ReflectiveLoader export, we first need to know the difference between the .text section’s virtual and raw addresses. With the difference known, we can simply subtract it from the ReflectiveLoader export’s RVA to get the ReflectiveLoader export’s raw file offset.

The virtual and raw addresses for the .text section are listed within the beacon DLL’s section headers:

Raw and virtual addresses of the .text section of the beacon DLL.

The difference between the two is 0xC00 bytes. Subtracting this difference from the ReflectiveLoader export’s RVA of 0x17A3C gives a raw file offset of 0x16E3C.
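This lookup can also be automated. The sketch below walks a section table and converts an RVA to a raw file offset. It relies on the fact that an address sits the same distance from its section’s start in both formats. The names section_t and rva_to_file_offset are hypothetical. The .text values used in the test (VirtualAddress 0x1000, PointerToRawData 0x400, SizeOfRawData 0x30000) are assumptions, chosen only to reproduce the 0xC00 difference described above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type: the IMAGE_SECTION_HEADER fields needed for the conversion. */
typedef struct {
    uint32_t virtual_address;  /* VirtualAddress: RVA where the section starts */
    uint32_t pointer_to_raw;   /* PointerToRawData: file offset where it starts */
    uint32_t size_of_raw;      /* SizeOfRawData: bytes of the section stored on disk */
} section_t;

/* Hypothetical sketch: converts an RVA to a raw file offset. Returns 1 and
   writes *offset on success, or 0 if no section holds file-backed data there. */
int rva_to_file_offset(const section_t *sections, size_t count,
                       uint32_t rva, uint32_t *offset) {
    for (size_t i = 0; i < count; i++) {
        const section_t *s = &sections[i];
        if (rva >= s->virtual_address && rva - s->virtual_address < s->size_of_raw) {
            /* offset = RVA - (VirtualAddress - PointerToRawData) */
            *offset = rva - s->virtual_address + s->pointer_to_raw;
            return 1;
        }
    }
    return 0;
}
```

With those assumed .text values, rva_to_file_offset maps 0x17A3C to 0x16E3C, matching the manual calculation.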

We can confirm this in PE-Bear by right-clicking the ReflectiveLoader export’s Function RVA and then clicking Follow RVA:17A3C. The hex viewer will jump to the ReflectiveLoader export at its raw file offset.

In summary, the Cobalt Strike reflective loading process flow is:

  • A thread executes the raw beacon DLL.
  • The call reflective loader stub calls the ReflectiveLoader export at a known raw file offset.
  • The reflective loader loads the raw beacon DLL to the virtual memory of the host process.
  • After loading, the reflective loader returns the virtual beacon DLL’s entry point to the call reflective loader stub.
  • The call reflective loader stub calls the entry point of the virtual beacon DLL.

Diagram showing the main phases of how Cobalt Strike performs reflective loading of the beacon DLL.

Reflective loader design requirements

Position independent code

Since our reflective loader executes before the beacon DLL is loaded, the reflective loader code needs to be pure, position-independent shellcode. It cannot rely on an import table, relocations, or a fixed load address.

The easiest way to make complex shellcode is to write it in C with no external dependencies and compile the C file to an object file. Everything must be contained in the .text section of the object file. Finally, we extract the .text section to get the reflective loader shellcode.

How Cobalt Strike inserts our UDRL

Cobalt Strike’s Malleable PE engine will handle the work of getting the shellcode from our reflective loader object file and patching it into the raw beacon DLL at the raw file offset of the ReflectiveLoader export. This is done in the UDRL Aggressor script as seen below:

Aggressor script to write reflective loader shellcode into the raw beacon DLL leveraging Cobalt Strike.

Our UDRL Aggressor script has Cobalt Strike write in our reflective loader shellcode by performing these steps:

  1. We open a $handle to our UDRL object file with the openf function.
  2. With the file $handle we read the byte stream and save it into the $data byte array variable.
  3. Then we close the file $handle with the closef function.
  4. The built-in <a href="https://hstechdocs.helpsystems.com/manuals/cobaltstrike/current/userguide/content/topics_aggressor-scripts/as-resources_functions.htm#extract_reflective_loader">extract_reflective_loader</a> Cobalt Strike Aggressor function will parse our UDRL object file from the $data byte array, locate the .text section from our UDRL object file, extract the .text section and save it into the $loader byte array variable.
  5. The built-in <a href="https://hstechdocs.helpsystems.com/manuals/cobaltstrike/current/userguide/content/topics_aggressor-scripts/as-resources_functions.htm#setup_reflective_loader">setup_reflective_loader</a> Cobalt Strike Aggressor function will use the Malleable PE engine to discover the raw file offset of our ReflectiveLoader export, and patch our UDRL shellcode from the $loader byte array variable.
  6. Finally, we return the modified beacon DLL to Cobalt Strike and save our file from the client.

Reflective loading phases

Cobalt Strike has done the work for us regarding extracting the .text section from our reflective loader object file, patching in our reflective loader shellcode, and calling our reflective loader with the call reflective loader stub located in the beacon DLL header.

These are the phases we must develop to reflectively load beacon:

  1. Find Raw Beacon DLL
  2. Parse Beacon DLL Headers
  3. Allocate Memory for Virtual Beacon DLL
  4. Load Sections to Virtual Memory Space
  5. Load DLL Dependencies
  6. Resolve Import Address Table
  7. Resolve Relocations
  8. Execute Beacon

Phase 1: Finding the raw Beacon DLL base address

There are several different methods we can use to discover the address for the raw beacon DLL in memory. Some methods are:

  • Hunt backward for MZ & PE Headers
  • Hunt backward for an Egg
  • Get raw beacon DLL base address from the call reflective loader stub

Finding our position in memory

When using a method that hunts backward, we first need to get the current address of our thread’s Instruction Pointer (RIP). We can use this simple getRip trick:

  1. In our UDRL we create a function called getRip.
  2. We call getRip, which pushes the address following the “call getRip” instruction onto the top of the stack. This is the return address.
  3. Then in our getRip function, we simply copy the caller’s return address from the top of the stack.
  4. Under the x64 Windows calling convention, a function’s return value is passed back to the caller in the RAX register. By moving the caller’s return address into RAX, getRip returns that address to the caller.

Intel x64 assembly code to get the raw beacon DLL base address from the RDI register.

Hunting backward for MZ & PE headers

The original reflective loader project hunts backward for the MZ and PE headers. These headers have become detection points. To overcome this Cobalt Strike added the magic_mz and magic_pe Malleable PE evasion features.

The Cobalt Strike documentation states that the magic_mz option:

  • “Override the first bytes (MZ header included) of Beacon’s Reflective DLL. Valid instructions are required. Follow instructions that change CPU state with instructions that undo the change.”

When configured, the MZ-- bytes at raw file offset 0x00 and the PE00 bytes at raw file offset 0x80 are known to the reflective loader. They are patched into the beacon DLL by the Malleable PE engine.

These bytes must be somewhat unique, or the reflective loader won’t be able to find them. Additionally, the bytes for the MZ header must be no-operation and executable. They cannot be values like 0x00 or beacon may crash. This may be a potential detection point.

Hunting backward for an egg

After discovering this potential detection point, I developed a different but similar method to find the raw beacon DLL’s base address. This method uses an egg hunter that searches backward from RIP for two repeated instances of a unique 64-bit egg at the known beacon.dll+0x50 raw file offset.

The address beacon.dll+0x50 was chosen because this is the location of the “This program cannot be run in DOS mode” banner, which is not required when reflectively loading beacon.

Since we don’t have easy access to the Java Malleable PE engine, the BokuLoader.cna UDRL Aggressor script can be used to write the 0xB0C0ACDC egg into beacon. The below code shows how the raw beacon DLL can be modified to contain the egg:

Aggressor script to write an egg into the raw beacon DLL and display the changes in the Cobalt Strike script console.

The UDRL code must know the egg value written to the raw beacon DLL by the UDRL script. With the egg known, the egg hunter searches backward for two instances of the egg, as seen in the code below:

Intel x64 assembly code for an egg hunter which searches backward for two instances of a 64-bit egg.

  • Both the UDRL Aggressor script and the UDRL C code can be modified to use different eggs.

Now that the MZ and PE headers are no longer used, we can nop them out in the UDRL Aggressor script:

Aggressor script to mask MZ, PE, and unused bytes of the DOS banner located in the raw beacon DLL’s headers.

Getting the raw Beacon DLL base address from Call Reflective Loader stub

There is also another, Cobalt Strike-specific, way to discover the raw beacon DLL’s base address. As we saw above, the initial bytes in the call reflective loader stub store the raw beacon DLL’s base address in the RDI register before calling the reflective loader. Rather than hunting backward from RIP for an egg, we can simply get the value from the RDI register at the start of our reflective loader code.

To examine this further in the debugger, we generate a beacon, prepend a breakpoint (0xCC), and open the beacon in x64dbg. Since the breakpoint is prepended, the base address of the raw beacon is at offset +1 from the start of the allocated memory. As we saw above, the call reflective loader stub uses RIP-relative addressing to get the raw beacon DLL’s base address:

X64dbg screenshot of stepping through call reflective loader stub to see that the raw beacon DLL base address is saved in the RDI register before calling the reflective loader.

Below is a working example of how to get the raw beacon DLL’s base address from the call reflective loader stub:

Inline-assembly C code to get the raw beacon DLL base address from the RDI register.

Phase 2: Parsing the headers of the Beacon DLL

With the base address of the raw beacon DLL, we can now get the values we need to load beacon into the virtual address space of the process.

The below table lists values we need from the raw beacon DLL’s headers, the locations we will find them at, and their types:

Table listing values from the raw beacon DLL header which are useful for loading the beacon DLL.
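As a rough sketch of this phase, the C below pulls several loading-related values out of a raw x64 (PE32+) DLL by byte offset. The pe_info_t struct and parse_pe64 are hypothetical names, and the fields chosen may not match the table above exactly. The sketch assumes unmodified MZ/PE headers, so it would reject a beacon whose PE signature was changed with an option like magic_pe.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Little-endian readers that work at any alignment. */
static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd32(const uint8_t *p) { return rd16(p) | ((uint32_t)rd16(p + 2) << 16); }
static uint64_t rd64(const uint8_t *p) { return rd32(p) | ((uint64_t)rd32(p + 4) << 32); }

/* Hypothetical type: header values a loader needs (not a Windows structure). */
typedef struct {
    uint16_t number_of_sections;      /* FileHeader.NumberOfSections */
    uint32_t section_table;           /* file offset of the first IMAGE_SECTION_HEADER */
    uint64_t image_base;              /* OptionalHeader.ImageBase (preferred base) */
    uint32_t size_of_image;           /* OptionalHeader.SizeOfImage (bytes to allocate) */
    uint32_t size_of_headers;         /* OptionalHeader.SizeOfHeaders */
    uint32_t import_rva, import_size; /* DataDirectory[1]: import directory */
    uint32_t reloc_rva, reloc_size;   /* DataDirectory[5]: base relocation table */
} pe_info_t;

/* Hypothetical sketch: parses a raw PE32+ image. Returns 0 on success, -1 if malformed. */
int parse_pe64(const uint8_t *raw, size_t size, pe_info_t *out) {
    if (size < 0x40)
        return -1;
    uint32_t nt = rd32(raw + 0x3C);                   /* e_lfanew */
    /* Need "PE\0\0" + 20-byte file header + optional header through DataDirectory[5]. */
    if (nt > size || size - nt < 4 + 20 + 160)
        return -1;
    if (memcmp(raw + nt, "PE\0\0", 4) != 0)
        return -1;
    const uint8_t *fh = raw + nt + 4;                 /* IMAGE_FILE_HEADER */
    const uint8_t *oh = fh + 20;                      /* IMAGE_OPTIONAL_HEADER64 */
    if (rd16(oh) != 0x20B || rd32(oh + 108) < 6)      /* PE32+ magic, NumberOfRvaAndSizes */
        return -1;
    out->number_of_sections = rd16(fh + 2);
    out->section_table      = nt + 4 + 20 + rd16(fh + 16);  /* + SizeOfOptionalHeader */
    out->image_base         = rd64(oh + 24);
    out->size_of_image      = rd32(oh + 56);
    out->size_of_headers    = rd32(oh + 60);
    out->import_rva  = rd32(oh + 112 + 1 * 8);
    out->import_size = rd32(oh + 112 + 1 * 8 + 4);
    out->reloc_rva   = rd32(oh + 112 + 5 * 8);
    out->reloc_size  = rd32(oh + 112 + 5 * 8 + 4);
    return 0;
}
```

The section_table offset is where the 40-byte IMAGE_SECTION_HEADER entries begin. size_of_image is the amount of memory the next phase allocates.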

Evasions

Not all contents of the headers are required for loading the beacon DLL. Required values can be repacked or obfuscated. Values not required can be removed or randomized.

Phase 3: Allocating memory for Virtual Beacon

Once we know the SizeOfImage from the raw beacon DLL’s header, we need to allocate memory of this size. This memory space will hold our virtual beacon DLL.

Different methods can be used to allocate memory for the virtual beacon DLL, and each results in a different type of memory. The methods supported by Cobalt Strike’s default reflective loader are:

Table showing Cobalt Strike memory allocation options for the virtual beacon DLL.

Evasions

This can be taken a step further with a UDRL. The NTAPI versions of these functions can be used instead. Going further, the NTAPI functions could be called via direct or indirect system calls, which may or may not help bolster evasion capabilities.

When the allocator method is set to VirtualAlloc in the Cobalt Strike Malleable C2 profile, currently the BokuLoader project will use a direct system call to NtAllocateVirtualMemory to allocate memory for the virtual beacon DLL:

Code sample from BokuLoader project showing a direct system call is used to allocate memory for the virtual beacon DLL.

  • The system call number is discovered using the HellsGate method.
  • If a userland hook exists at the system call stub, the HalosGate method is used.

The below image shows a code example of using the HellsGate and HalosGate methods to determine the system call numbers:

Code sample from BokuLoader project showing how system calls are discovered from the process.

Phase 4: Loading sections to virtual memory space

Now that we have allocated memory for our virtual beacon DLL, we need to copy beacon’s sections from their raw file offsets, as they exist in the raw beacon DLL, to the allocated memory at their relative virtual offsets.

If we allocated our memory as READWRITE (PAGE_READWRITE), we will need to track the address and size of the .text section. Before calling the entry point of the virtual beacon DLL, we will need to change the memory protections of the .text section to executable.

Allocating our memory as READWRITE_EXECUTE (PAGE_EXECUTE_READWRITE) makes the reflective loading process easier but increases the chance of detection by security solutions.

Below is a simplified code example, from the BokuLoader project, which demonstrates this:

Code sample from BokuLoader project showing sections copied from the raw beacon DLL to the virtual beacon DLL.
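Independent of BokuLoader’s code, the copy itself can be sketched in portable C. map_sections is a hypothetical function. It copies into an ordinary zero-filled buffer of SizeOfImage bytes rather than into memory obtained from an allocation API. It copies only the sections and leaves the header region zeroed.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical type: the IMAGE_SECTION_HEADER fields used for mapping. */
typedef struct {
    uint32_t virtual_address;  /* VirtualAddress: destination RVA */
    uint32_t size_of_raw;      /* SizeOfRawData: bytes to copy */
    uint32_t pointer_to_raw;   /* PointerToRawData: source file offset */
} section_t;

/* Hypothetical sketch: copies each section from its raw file offset to its RVA.
   'image' must be zero-filled and size_of_image bytes long, so virtual space
   with no file data behind it (and the header area) stays zeroed.
   Returns 0 on success, -1 if a section would overrun either buffer. */
int map_sections(uint8_t *image, size_t size_of_image,
                 const uint8_t *raw, size_t raw_size,
                 const section_t *sections, size_t count) {
    for (size_t i = 0; i < count; i++) {
        const section_t *s = &sections[i];
        if (s->size_of_raw == 0)
            continue;                       /* e.g. purely uninitialized data */
        if ((size_t)s->pointer_to_raw + s->size_of_raw > raw_size ||
            (size_t)s->virtual_address + s->size_of_raw > size_of_image)
            return -1;
        memcpy(image + s->virtual_address, raw + s->pointer_to_raw, s->size_of_raw);
    }
    return 0;
}
```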

Evasions

Some evasion features regarding loading sections are:

  • Not copying the beacon headers to the virtual beacon DLL.
  • Deallocating the memory space in the virtual beacon DLL where the headers would exist.

In the public BokuLoader project, the headers for the beacon DLL are not copied from the raw beacon DLL to the virtual beacon DLL. Currently the first 0x1000 bytes of the virtual beacon DLL are null bytes (0x00). From my testing, beacon does not depend on its headers after beacon has been properly loaded into virtual memory. Avoiding copying the headers may assist in evading in-memory scanners, but these null bytes could also be a potential detection point.

Another possible evasion opportunity is having the UDRL Aggressor script encrypt the sections. The sections could be decrypted in memory by the UDRL, using a key shared between the UDRL and the UDRL Aggressor script.

Phase 5: Loading DLL dependencies

The x64 HTTP/S beacon relies on four DLLs to function properly. If these DLLs are not currently loaded into the process, our reflective loader will need to load them.

The four DLLs are listed in the HTTP/S beacon DLL’s import directory:

Screenshot from PE-Bear listing DLLs from the beacon DLL’s import directory.

The built-in Cobalt Strike reflective loader uses the kernel32.LoadLibraryA API for DLL loading.

Evasions

DLL loading can be achieved in a variety of different ways, with different operational security considerations. Some methods are:

If the DLL already exists in the process, then the above Windows APIs can still be used to get the DLL base addresses, although this may trigger unwanted detection alerts.

Alternatively, the PEB holds a pointer to the <a title="https://learn.microsoft.com/en-us/windows/win32/api/winternl/ns-winternl-peb_ldr_data" href="https://learn.microsoft.com/en-us/windows/win32/api/winternl/ns-winternl-peb_ldr_data">_PEB_LDR_DATA</a> struct, which contains a linked list (InMemoryOrderModuleList) of all the DLLs loaded in the process and their related information. BokuLoader leverages this list to discover DLL information, avoiding unnecessary API calls.

If the DLL does not exist in the InMemoryOrderModuleList, BokuLoader currently uses the NTDLL.LdrLoadDll API to load the DLL dependency into memory, leveraging the built-in Windows DLL Loader.

Nested reflective loading cannot easily be used to load DLL dependencies, because reflective loaders generally do not register the loaded DLL with the process. As a result, code external to the DLL cannot properly locate or use it. The DarkLoadLibrary project may be capable of properly loading a DLL into memory without triggering a kernel image load event.

Code sample from BokuLoader project showing how loaded DLL’s base addresses can be resolved by walking the InMemoryOrderModuleList.

Phase 6: Resolving the import address table

With the required DLLs loaded into the process, the APIs listed in the import directory must be resolved. The API addresses will then need to be written to the virtual beacon DLL’s Import Address Table (IAT). This way beacon knows what address to jump to when it needs to call APIs such as WININET.HttpSendRequest.

Each import entry must be resolved either by its ordinal or by its name string.

In the image below, we see that the Cobalt Strike beacon DLL uses a combination of ordinals and name strings for import entries:

Screenshot from PE-Bear showing some import entries for beacon DLL must be resolved by ordinal.
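In a PE32+ image, each 64-bit Import Lookup Table entry (IMAGE_THUNK_DATA64) encodes which case applies. If the high bit (IMAGE_ORDINAL_FLAG64) is set, the low 16 bits hold the ordinal. Otherwise, the low 31 bits are the RVA of a hint/name entry: a 2-byte hint followed by the name string. The decoder below is a hypothetical sketch of that rule, not BokuLoader’s implementation. It assumes the sections have already been mapped, so RVAs index the image directly.

```c
#include <stddef.h>
#include <stdint.h>

#define ORDINAL_FLAG64 0x8000000000000000ULL   /* same value as IMAGE_ORDINAL_FLAG64 */

/* Hypothetical type: how one import entry must be resolved. */
typedef struct {
    int by_ordinal;      /* 1: resolve by ordinal, 0: resolve by name string */
    uint16_t ordinal;    /* valid when by_ordinal == 1 */
    const char *name;    /* valid when by_ordinal == 0 */
} import_ref_t;

/* Hypothetical sketch: decodes one IMAGE_THUNK_DATA64 value.
   'image' is the section-mapped (virtual) DLL. */
import_ref_t decode_thunk(const uint8_t *image, uint64_t thunk) {
    import_ref_t ref = { 0, 0, NULL };
    if (thunk & ORDINAL_FLAG64) {
        ref.by_ordinal = 1;
        ref.ordinal = (uint16_t)(thunk & 0xFFFF);        /* bits 15-0: ordinal */
    } else {
        uint32_t rva = (uint32_t)(thunk & 0x7FFFFFFF);   /* bits 30-0: hint/name RVA */
        ref.name = (const char *)(image + rva + 2);      /* skip the 2-byte Hint */
    }
    return ref;
}
```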

The built-in Cobalt Strike reflective loader uses the Kernel32.GetProcAddress API to resolve virtual addresses for import entries.

Evasions

Some evasion methods to resolve API addresses are:

  • Custom code implementations of GetProcAddress
  • NTDLL.LdrGetProcedureAddress

BokuLoader uses a custom code implementation of GetProcAddress to resolve the address for the import entry, handling both name strings and ordinals.

NTDLL.LdrGetProcedureAddress can also handle both name strings and ordinals. If the address returned for an import entry is a forwarder to another DLL, BokuLoader falls back to NTDLL.LdrGetProcedureAddress to resolve the forwarder.

While writing the IAT, hooking can be implemented by writing the virtual addresses of hook functions we have implemented rather than the intended API’s virtual address. As long as the expected output is returned to beacon when the address in the IAT is called, we can execute additional code before returning to beacon. Future posts and public BokuLoader releases will demonstrate how we can leverage IAT hooking for advanced evasion features.

As of a recent release, the public BokuLoader project supports the Malleable PE obfuscate option from the Cobalt Strike C2 profile through a custom implementation. By modifying the masking key in the BokuLoader.cna UDRL Aggressor script, you can choose your own single-byte XOR key to improve obfuscation.

Regarding operational security, it is important to know that pattern-matching engines are capable of brute-forcing single-byte XOR masks. Future posts will demonstrate how we can create our own Malleable PE engine using Cobalt Strike's Aggressor scripting functionality to obfuscate beacon and overcome pattern matching.
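To see why a single-byte XOR mask offers little protection against pattern matching, the following Python sketch (function names are illustrative) masks a buffer and then recovers the key by trying all 256 candidates against a known pattern. The DOS stub message is used here only as an example of a byte pattern a scanner might search for.

```python
from typing import List


def xor_mask(data: bytes, key: int) -> bytes:
    """Apply a single-byte XOR mask; applying it twice restores the input."""
    return bytes(b ^ key for b in data)


def candidate_keys(blob: bytes, pattern: bytes) -> List[int]:
    """Return every single-byte key under which `pattern` appears in `blob`.

    With only 256 possible keys, a scanner can mask a known pattern with each
    key and search for the result, so a single-byte mask hides very little.
    """
    return [key for key in range(256) if xor_mask(pattern, key) in blob]
```

Because the key space is so small, this exhaustive search costs a scanner only 256 substring searches per pattern.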

Phase 7: Resolving relocations

The beacon DLL contains many hardcoded absolute addresses, listed in its Base Relocation Table, which must be adjusted ("relocated") in the virtual beacon DLL before it is executed.

In PE-Bear we can see that the beacon DLL by default has the image base address of 0x180000000:

Screenshot from PE-Bear showing image base address of the beacon DLL.
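The 0x180000000 value PE-Bear displays is the ImageBase field of the PE32+ optional header. As an illustrative sketch (the function name parse_pe64 and the synthetic header used to exercise it are assumptions, not BokuLoader code), the Python below reads ImageBase, AddressOfEntryPoint, and SizeOfImage from a raw DLL buffer using the offsets defined in the PE format specification.

```python
import struct
from typing import NamedTuple


class Pe64Info(NamedTuple):
    image_base: int       # preferred (hardcoded) base address
    entry_point_rva: int  # AddressOfEntryPoint
    size_of_image: int    # bytes to reserve for the virtual image


def parse_pe64(raw: bytes) -> Pe64Info:
    """Read ImageBase, AddressOfEntryPoint and SizeOfImage from a raw PE32+ file."""
    if raw[:2] != b"MZ":
        raise ValueError("missing DOS header")
    (e_lfanew,) = struct.unpack_from("<I", raw, 0x3C)  # offset of "PE\0\0"
    if raw[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    opt = e_lfanew + 4 + 20  # skip the signature and the 20-byte COFF header
    (magic,) = struct.unpack_from("<H", raw, opt)
    if magic != 0x20B:
        raise ValueError("not a PE32+ (64-bit) image")
    (entry_rva,) = struct.unpack_from("<I", raw, opt + 16)
    (image_base,) = struct.unpack_from("<Q", raw, opt + 24)
    (size_of_image,) = struct.unpack_from("<I", raw, opt + 56)
    return Pe64Info(image_base, entry_rva, size_of_image)
```

With the example virtual base of 0x7FFC44FE0000 used below, the delta is 0x7FFC44FE0000 - info.image_base, which is 0x7FFAC4FE0000, and the entry point the loader later calls sits at the virtual base plus info.entry_point_rva.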

Before we start writing relocations, we need to calculate the delta between the base address of our virtual beacon DLL and the hardcoded base address.

For example, suppose the base address of our virtual beacon DLL is 0x7FFC44FE0000. Subtracting the hardcoded base address from our virtual beacon DLL's base address gives the base address delta: 0x7FFC44FE0000 - 0x180000000 = 0x7FFAC4FE0000.

Next, for each entry in the Base Relocation Table, we read the hardcoded address stored at the location the entry points to, and add the base address delta to it to get the matching address within our virtual beacon DLL.

In the image below, we can see that the hardcoded addresses targeted by beacon's relocation entries are stored in little-endian format (least significant byte first):

Screenshot from PE-Bear showing some relocation entries exist in little-endian format.

The hardcoded address for this relocation entry is 0x1800341C8.

We add the base address delta to this address to get the corresponding address as it exists in the virtual beacon DLL: 0x1800341C8 + 0x7FFAC4FE0000 = 0x7FFC450141C8.

For each relocation entry, we need to check that its type is <a title="https://learn.microsoft.com/en-us/windows/win32/debug/pe-format" href="https://learn.microsoft.com/en-us/windows/win32/debug/pe-format">IMAGE_REL_BASED_DIR64 (0xA)</a>. The type is stored in the upper 4 bits of each 16-bit entry, and the lower 12 bits hold the offset within the 4 KB page covered by the relocation block. If the type is anything else, we skip the entry.

Once we have determined the address as it exists within the virtual beacon DLL, we write it back over the hardcoded address at that location.
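Putting the relocation steps together, here is a minimal Python sketch that applies IMAGE_REL_BASED_DIR64 fixups to a mapped PE32+ image held in a bytearray. It is not the doRelocations function from BokuLoader; the function name and the assumption that RVAs map directly to buffer offsets are illustrative. It walks each IMAGE_BASE_RELOCATION block, splits every 16-bit entry into a 4-bit type and a 12-bit page offset, skips non-DIR64 entries, and adds the base address delta to the little-endian 64-bit value at each target.

```python
import struct

IMAGE_REL_BASED_ABSOLUTE = 0x0  # padding entry, nothing to patch
IMAGE_REL_BASED_DIR64 = 0xA     # 64-bit absolute address
MASK64 = 0xFFFFFFFFFFFFFFFF


def apply_dir64_relocations(image: bytearray, reloc_rva: int, reloc_size: int,
                            new_base: int, preferred_base: int) -> int:
    """Patch every DIR64 fixup in a mapped PE32+ image.

    Returns the number of 64-bit addresses patched.
    """
    delta = (new_base - preferred_base) & MASK64
    patched = 0
    offset = reloc_rva
    end = reloc_rva + reloc_size
    while offset < end:
        # IMAGE_BASE_RELOCATION header: page RVA and total block size in bytes.
        page_rva, block_size = struct.unpack_from("<II", image, offset)
        if block_size < 8:
            break  # malformed or empty block
        for i in range((block_size - 8) // 2):
            (entry,) = struct.unpack_from("<H", image, offset + 8 + 2 * i)
            rel_type, page_offset = entry >> 12, entry & 0xFFF
            if rel_type != IMAGE_REL_BASED_DIR64:
                continue  # e.g. IMAGE_REL_BASED_ABSOLUTE padding
            target = page_rva + page_offset
            (value,) = struct.unpack_from("<Q", image, target)  # little-endian read
            struct.pack_into("<Q", image, target, (value + delta) & MASK64)
            patched += 1
        offset += block_size
    return patched
```

Using this section's numbers, a value of 0x1800341C8 at a DIR64 target becomes 0x7FFC450141C8 when the image is loaded at 0x7FFC44FE0000. Masking with MASK64 mirrors 64-bit wraparound, so an image loaded below its preferred base (a negative delta) is handled as well.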

If you are interested in learning more about how PE relocations are processed, check out the doRelocations function in the public BokuLoader project. Before releasing this blog post, I rewrote the relocations code from assembly into (hopefully) human-readable C, to help others who want to understand the technical details.

Phase 8: Executing Beacon

Executing beacon can be broken down into three steps:

  • Ensuring the virtual beacon DLL sections have the correct memory permissions.
  • Initializing the virtual beacon DLL.
  • Calling the virtual beacon DLL’s entry point.

Making Virtual Beacon executable

If the memory we allocated for our virtual beacon DLL is PAGE_EXECUTE_READWRITE, we do not need to change any memory protections for beacon to run without crashing.

If we allocated our virtual beacon memory as non-executable (PAGE_READWRITE), we will need to make the .text section of our virtual beacon DLL executable. The location and virtual size of the .text section should have been saved earlier as variables within our UDRL main function.

In the public BokuLoader project, memory protection changes are performed through direct system calls to NtProtectVirtualMemory, as seen in the code example below:

Code sample from BokuLoader project demonstrating changing the .text section of the virtual beacon DLL to executable.

The .data section of our virtual beacon DLL should have PAGE_READWRITE permissions. If the section is not writable, beacon may crash while executing.
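Rather than hardcoding which sections get which protection, a loader can derive the final protection for each section from its characteristics flags. The Python sketch below (the function name is hypothetical; the flag and PAGE_* values are the standard winnt.h constants) shows that mapping. It also gives defenders a baseline: a typical .text section maps to PAGE_EXECUTE_READ and .data to PAGE_READWRITE, so image memory marked PAGE_EXECUTE_READWRITE can stand out.

```python
# Section characteristic flags and page-protection constants (winnt.h values).
IMAGE_SCN_MEM_EXECUTE = 0x20000000
IMAGE_SCN_MEM_READ = 0x40000000
IMAGE_SCN_MEM_WRITE = 0x80000000

PAGE_NOACCESS = 0x01
PAGE_READONLY = 0x02
PAGE_READWRITE = 0x04
PAGE_EXECUTE = 0x10
PAGE_EXECUTE_READ = 0x20
PAGE_EXECUTE_READWRITE = 0x40


def section_protection(characteristics: int) -> int:
    """Map a section's memory flags to the page protection it should end up with."""
    x = bool(characteristics & IMAGE_SCN_MEM_EXECUTE)
    r = bool(characteristics & IMAGE_SCN_MEM_READ)
    w = bool(characteristics & IMAGE_SCN_MEM_WRITE)
    if x:
        if w:
            return PAGE_EXECUTE_READWRITE
        return PAGE_EXECUTE_READ if r else PAGE_EXECUTE
    if w:
        return PAGE_READWRITE  # Windows has no write-only protection
    return PAGE_READONLY if r else PAGE_NOACCESS
```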

Initializing the Virtual Beacon DLL

For the virtual beacon DLL to run properly, it must first be initialized by calling its entry point. The first argument is the base address of the virtual beacon DLL. The second argument is fdwReason, which should be set to DLL_PROCESS_ATTACH (1).

Code sample from BokuLoader project initializing the virtual beacon DLL.

Executing our Virtual Beacon DLL

After initializing the virtual beacon DLL, we can either return the virtual beacon DLL's entry point to the stub that called the reflective loader, or call the virtual beacon DLL's entry point from our UDRL with fdwReason set to 0x4.

Unlike a typical DLL, where the first argument hinstDLL to <a href="https://learn.microsoft.com/en-us/windows/win32/dlls/dllmain">DllMain</a> would be the base address of the loaded (virtual) DLL, beacon expects the base address of the raw beacon DLL for this call. If this is not supplied, some Malleable PE evasion features may fail.

Code sample from BokuLoader project showing two different ways to execute the virtual beacon DLL.

Closing thoughts

Hopefully this blog post helps both red teams and blue teams better understand Cobalt Strike and the reflective loading process. There are still many evasion opportunities that can be implemented through reflective loading. With a deeper understanding of these concepts, organizations can better prepare themselves to defend against cyber threats.

Future posts in this series will focus on integrating UDRL with current Cobalt Strike evasion features, dive into undocumented evasion features already present in the public BokuLoader, as well as advanced features that have not yet been released to the public. Stay tuned for more in-depth information and techniques to learn how to take your Cobalt Strike game to the next level with UDRL development!
