The little bug that could: CVE-2024-30089 is a subtle kernel vulnerability that I used to exploit a fully updated Windows 11 machine (with all Virtualization Based Security and hardware security mitigations enabled) and score my first win at Pwn2Own this year.
In this article, I outline my straightforward approach to bug hunting: picking a starting point and intuitively following a path until something catches my attention. This bug is interesting because it can be reliably triggered due to a logic error. The error occurs in a specific state within an inter-process communication system, which then causes a use-after-free. Finding the bug required comparing the program’s code paths across its various possible states, a process I describe in detail. Equally intriguing is the bug’s origin and Microsoft’s approach to patching it. These topics are also covered in this post.
A common question I receive about vulnerability research is how to get started. In fact, picking a target and sticking to it might be one of the most difficult steps of the research process. The vulnerability discussed here is in the Microsoft Kernel Streaming Service (mskssrv.sys). Check out this blog post to get a general overview of the subsystem. In that post, I pointed out some characteristics of the MSKSSRV subsystem that might make it a good attack surface, specifically its inter-process communications (IPC) mechanism.
The code base of MSKSSRV is pretty small, and the last vulnerability in this subsystem I discovered was also independently exploited in the wild as a 0-day. I also heard about additional efforts from other researchers and companies to audit this driver. Because of this, I initially fell into the common trap of assuming there are no more bugs left to find in this attack surface. But, because I had suggested it in my previous blog post, I chose to trust my instincts and continue looking.
A great way to get new research ideas is by staying informed on current research. I read an excellent blog post by k0shl that sparked the inspiration to hunt for a particular type of bug. In the vulnerability found by k0shl, an object’s reference count is initialized and incremented without proper locking, creating a use after free window. Despite k0shl’s bug being a userland bug and not in the kernel, the coding style of the vulnerable library reminded me of when I previously audited the MSKSSRV driver.
The MS KS Server (MSKSSRV) interacts with a userland process via FSStreamReg and FSContextReg objects. Both are derived from the base FSRegObject. FSContextReg does not implement a locking vtable function, and the base class (FSRegObject) implementation is simply a nop instruction. This means no locking mechanism is actually implemented for FSContextReg objects. Conversely, FSStreamReg:
Code Block 1: FSRendezvousServer::Close, locking and unlocking path for FSRegObjects
There are two objects derived from the
As I mentioned in my last blog post, the inter-process object sharing aspect of MSKSSRV is an interesting avenue for vulnerabilities, so I decided to focus on it further. The IPC mechanism of the subsystem is illustrated in the following diagram:
Diagram 1: Inter-process Communication in MS KS Server
Opening a file handle to the MSKSSRV device, via CreateFile, creates a FILE_OBJECT that corresponds to that handle. Using that handle, a process can initialize a new stream or a context object by sending the device an IOCTL using the DeviceIoControl function. The initializing process designates which remote process can register the object by specifying the process ID via the lpInBuffer parameter. A pointer to the FSRegObject is stored in Irp->CurrentStackLocation->FileObject->FsContext2.
The same pointer to the FSRegObject object is stored twice in FsContext2, once in the FILE_OBJECT used by the initializing process and once in the FILE_OBJECT used by the registering process. In this way, references to the object can be shared across processes. For example, using an FSStreamReg object, multiple processes can have access to the stream frame buffer, as shown in Diagram 1. An object’s reference count is initialized to 1 and then incremented again after initialization is complete. Registering the object increases its reference count again, for a total of three references per object.
Diagram 2: Initializing and Registering FSContextReg Objects
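To make the pointer sharing and the reference count concrete, here is a minimal Python sketch of the lifecycle described above. It is a toy model, not the driver's code: the class names `RegObject` and `FileObject` and the helper functions `initialize` and `register` are hypothetical stand-ins for the FSRegObject, the FILE_OBJECT, and the two IOCTL paths.

```python
class RegObject:
    """Toy stand-in for an FSRegObject; it only tracks a reference count."""

    def __init__(self):
        self.refcount = 1  # the count starts at 1 when the object is created

    def add_ref(self):
        self.refcount += 1


class FileObject:
    """Toy stand-in for a FILE_OBJECT; fs_context2 models the FsContext2 field."""

    def __init__(self):
        self.fs_context2 = None


def initialize(file_object):
    """Hypothetical model of the initializing IOCTL path."""
    obj = RegObject()
    obj.add_ref()  # incremented again once initialization is complete
    file_object.fs_context2 = obj
    return obj


def register(file_object, obj):
    """Hypothetical model of the registering IOCTL path."""
    obj.add_ref()  # registration takes one more reference
    file_object.fs_context2 = obj


init_file = FileObject()  # FILE_OBJECT behind the initializing handle
reg_file = FileObject()   # FILE_OBJECT behind the registering handle

shared = initialize(init_file)
register(reg_file, shared)
```

After both steps, the two file objects hold the same pointer and the object carries three references, which is the baseline the rest of the analysis counts against.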
Code Block 2: DispatchCleanup and DispatchClose routines for MS KS Server Driver
A dispatch routine handles one or more types of IRPs, which are packaged I/O requests. In Windows, when all handle references to a file have been closed, the corresponding file system driver for the file receives an
In MSKSSRV,
Code Block 3: FSRendezvousServer::Close, Process checks on FSRegObjects
Process-specific information is stored in Irp->CurrentStackLocation->FileObject->FsContext2 at the time of initialization or registration. The driver determines which process-specific resources (objects, event objects, and other resources) it needs to release by checking the caller’s process ID. If the process is the initializing or registering process, some additional cleanup is done for those process-specific resources.
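The process check can be pictured with a small Python sketch. This is only an illustration under assumptions: the `RegObjectInfo` record, its `init_pid` and `reg_pid` fields, and the `caller_roles` helper are hypothetical names, not structures taken from the driver.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RegObjectInfo:
    """Hypothetical record of the process IDs saved at initialization and registration."""
    init_pid: Optional[int] = None
    reg_pid: Optional[int] = None


def caller_roles(info, caller_pid):
    """Return the roles the calling process has for this object.

    A process that neither initialized nor registered the object is 'foreign'.
    """
    roles = set()
    if caller_pid == info.init_pid:
        roles.add("initializing")
    if caller_pid == info.reg_pid:
        roles.add("registering")
    return roles or {"foreign"}


two_processes = RegObjectInfo(init_pid=100, reg_pid=200)
one_process = RegObjectInfo(init_pid=100, reg_pid=100)
```

Note that a single process can hold both roles at once, and that any other process holding a handle falls into the foreign case, where none of the process-specific cleanup applies.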
This stood out to me because generally, all Dispatch routines execute in an arbitrary process context, with some exceptions. In other words, the system picks a thread to do the Dispatch work; what thread it picks is arbitrary. I discovered that
Additionally, as a feature of the Windows OS, handles can be shared with other processes (by child process inheritance or using the DuplicateHandle API function). Via the shared file handle, the other process can also interact with the same
Diagram 3: Sharing MS KS Server Device Handle
Due to this, it is also possible that the process context during
Diagram 4: Foreign Process Closing Last Handle to a FILE_OBJECT
I noticed that the function
Code Block 4: FSRegObject::Release can be called twice from FSRendezvousServer::Close
In general, performing more dereferences than there are references on an object is not the only way a use-after-free can occur. However, in this case, we can be sure that if an object has been freed, its reference count has dropped to zero. The last time a valid object is accessed during the IRP requests is in a call to FSRegObject::Release. So, if a use-after-free is possible, a call to FSRegObject::Release will always occur after the object has already been freed. During the call, the object will be once again dereferenced. For that reason, counting the number of dereferences is a good heuristic to find use-after-frees for this particular case.
The only thing left to do was to trace out the possible states of the program, taking note of when the object is freed and accessed. I did this by mentally emulating the program logic during the IRP requests, each beginning with the corresponding Dispatch functions (Code Block 2), for each of the possible cases.
Shown below are the states based on which process closes the final reference to a handle. Each entry represents the number of dereferences of the FSContextReg object that occur if the corresponding process closes the final handle. Note: there is no functional difference between HANDLE #1 (initializing handle) and HANDLE #2 (registering handle), as the FsContext2 field points to the same memory in both FILE_OBJECTs represented by the corresponding handles.
FSContextReg dereferences for each of the possible MSKSSRV IPC states
Success! The last state results in four dereferences: two by the foreign process and two by the initializing (and also registering) process, while only having three references initially, meaning a use after free is possible! I also repeated the same exercise with
While doing the virtual machine brain exercise outlined above, I found the problem. If the process is the initializing process or registering process, the appropriate cleanup happens, and the pointer stored in
Code Block 5: FSRegObject::CloseInitProcess sets FSContext pointer to NULL
This means
However, if the calling process is a foreign process, no cleanup occurs and
Now, if the second handle is closed by a process that both initialized and registered the FSContextReg object, the object will clean up all its stored process resources, making it empty. This causes
This makes for a total of four dereferences on a single
Diagram 5: CVE-2024-30089 depicted
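The dereference arithmetic described above can be replayed with a short Python sketch. It is a toy model under assumptions: the `RefCounted` class and the `run` helper are hypothetical, and the model only counts releases against the three initial references. It does not reproduce the driver's actual control flow.

```python
class RefCounted:
    """Toy reference-counted object that records releases made after it was freed."""

    def __init__(self, references):
        self.refcount = references
        self.freed = False
        self.releases_after_free = 0

    def release(self):
        if self.freed:
            # In the real driver this would be a use-after-free;
            # here it is only counted.
            self.releases_after_free += 1
            return
        self.refcount -= 1
        if self.refcount == 0:
            self.freed = True


def run(release_plan, references=3):
    """Apply a list of (who, number_of_releases) steps to a fresh object."""
    obj = RefCounted(references)
    for _who, count in release_plan:
        for _ in range(count):
            obj.release()
    return obj


# The state described in the text: two dereferences by the foreign process,
# then two by the process that both initialized and registered the object.
vulnerable = run([
    ("foreign process", 2),
    ("initializing and registering process", 2),
])

# A balanced sequence for comparison: exactly three dereferences in total.
balanced = run([("any process", 3)])
```

In the vulnerable sequence the third release frees the object and the fourth lands on freed memory. The balanced sequence frees the object on the last release and never touches it again.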
The reader following along might wonder why then a foreign process can’t be the last one to close both handles, since this would seemingly also lead to four dereferences. Before the object is destructed and freed, it is unlinked from a list stored in the global FSRendezvousServer
, the pointer in
is checked to be a valid member of the list. In this case, the object is freed in the second call to
at the end of the function. During the fourth call to
, the object has already been unlinked from the list, making it an invalid object, so it cannot be used. In order to trigger a use-after-free, it must occur after the object has been retrieved and validated in
. The code snippet below shows the use-after-free primitive that can be obtained by the vulnerability:
Code Block 6: UAF primitive path
In the security update guide for this vulnerability, the exploitability assessment is “Exploitation More Likely” and the CVSS attack complexity is “Low”. While Microsoft does not provide detailed explanations for their scoring, I have noted some patterns while patch diffing other vulnerabilities. The vulnerability likely received these ratings because it stems from a logic error, making it reliably triggerable. By following the steps outlined in Diagram 5, an attacker can consistently trigger the use-after-free scenario depicted in Code Block 6. However, this doesn’t mean that exploiting it in practice is straightforward. A detailed walkthrough of the exploitation steps will be covered in the next part of this series.
Understanding how a bug occurred is important for cultivating a proactive approach to secure development practices. To pinpoint how the vulnerability was introduced, I analyzed previous versions of the driver obtained from Winbindex and looked for any differences in logic in the
In the vulnerability section, I mentioned that the ultimate cause of this bug was not setting Irp->CurrentStackLocation->FileObject->FsContext2 to NULL if the calling process is a foreign process. To my surprise, I saw this exact line of code in an early version of mskssrv.sys:
Code Block 7: Early version of FSRendezvousServer::Close, FsContext2 is set to NULL
In the code block shown above,
Code Block 8: FsContext set to NULL within a feature flag check
Shown in the code block above is a check for the feature flag Feature_Servicing_TeamsUsingMediaFoundationCrashes.
Feature flags are a component of Windows that toggle various functionality and experiments, though there is not much public information about them. In this previous blog post, we discuss how feature flags have been used for vulnerability patches. Feature flags are sometimes used to test out a functionality before it is officially adopted. In this case, if the Feature_Servicing_TeamsUsingMediaFoundationCrashes feature is enabled, FileObject->FsContext2 is not set to NULL, introducing the vulnerability. This feature was observed to be enabled by default on Windows 10 installations. In Windows 11, as shown in Code Block 1, this feature flag conditional is not present and the pointer is not set to NULL, making it vulnerable as well.
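The effect of the feature flag can be sketched in a few lines of Python. This is a simplified model under assumptions: the `FileObject` class and the `close_from_foreign_process` function are hypothetical, and the reading that the pointer is left in place when the flag is enabled is my interpretation of the behavior described above, not decompiled code.

```python
class FileObject:
    """Toy stand-in for a FILE_OBJECT; fs_context2 models the FsContext2 field."""

    def __init__(self, reg_object):
        self.fs_context2 = reg_object


def close_from_foreign_process(file_object, teams_flag_enabled):
    """Model the close path taken when a foreign process closes the handle.

    Returns True if a stale pointer is left behind in fs_context2.
    """
    if not teams_flag_enabled:
        # Early driver behavior: the pointer is cleared on close.
        file_object.fs_context2 = None
    # With the flag enabled the assignment is skipped, so the pointer stays.
    return file_object.fs_context2 is not None


flag_on = close_from_foreign_process(FileObject(object()), teams_flag_enabled=True)
flag_off = close_from_foreign_process(FileObject(object()), teams_flag_enabled=False)
```

A single skipped assignment is enough to leave a pointer that a later close can reach, which is why the flag alone is sufficient to reintroduce the bug.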
Due to the name of the feature, I looked into the functionality of Microsoft Teams, the video conferencing software. I confirmed that the application can use MSKSSRV functionality to share media streams across processes. It is possible that stream handle sharing was causing Teams to crash. An interesting topic for further research would be to examine how Teams shares MSKSSRV device handles across processes, and why performing proper pointer cleanup could cause the application to crash.
This part of the series is focused on the vulnerability itself, which includes its patch. I was particularly interested in examining the patch for this bug, since my proposed fix seemed to trigger crashes in Microsoft Teams. It’s important to mention that at this point I have yet to examine how a real application uses the MSKSSRV driver in practice. Not having this context introduces blind spots into the understanding of why a system is designed the way it is. A complete patch for this bug would require some base code restructuring and could reveal more details about how the IPC system is intended to function. I was also hoping to glean some insight into secure coding practices from Microsoft developers.
To my disappointment, the logic error that caused the vulnerability, which was patched in the June 2024 Security updates, was not addressed directly. Instead, an access token check was added before the vulnerable code paths. See the code below for the initialize context IOCTL, handled by the function
Code Block 9: IOCTL function begins with a feature flag check and checks if calling process is a frame server
The function above begins by checking if a feature is enabled. This is likely the feature flag corresponding to the patch. If the feature is enabled,
Let’s take a look at
Code Block 10: KsIsCurrentProcessFrameServer performing a SID check on the calling thread’s access token
This function checks the calling thread’s token against two specific security identifiers (SIDs). The SIDs correspond to a token in group
. If either of the SIDs are enabled in the calling thread’s access token, then the vulnerable function code can execute.
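As a rough illustration of this style of check, the following Python sketch gates a code path on whether either of two SIDs is enabled in the caller's token. Everything here is an assumption for illustration: the SID strings are placeholders rather than the real values, and `is_frame_server`, `handle_ioctl`, and the returned strings are hypothetical names, not the driver's API.

```python
# Placeholder SID strings; the real values checked by the driver are not shown here.
ALLOWED_SIDS = frozenset({"S-1-5-HYPOTHETICAL-A", "S-1-5-HYPOTHETICAL-B"})


def is_frame_server(token_sids):
    """token_sids maps a SID string to whether it is enabled in the token."""
    return any(token_sids.get(sid, False) for sid in ALLOWED_SIDS)


def handle_ioctl(token_sids, patch_enabled=True):
    """Model of an IOCTL handler that runs the token check only when the patch flag is on."""
    if patch_enabled and not is_frame_server(token_sids):
        return "access denied"
    return "vulnerable path reachable"


ordinary_user = {"S-1-5-SOME-OTHER-SID": True}
frame_server = {"S-1-5-HYPOTHETICAL-A": True}
disabled_sid = {"S-1-5-HYPOTHETICAL-B": False}
```

The check narrows who can reach the code, but any caller whose token passes it still runs the same unchanged logic.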
After seeing this, I suspected there likely was an Administrator to Kernel bug still present. Ultimately, the memory corruption problems were not addressed at all. I confirmed this by making a slight modification to my original exploit: An administrator user can start the
service, open a handle to the service and create the exploit process using the handle. I was able to obtain a full kernel R/W primitive on a fully patched system.
While Microsoft does not consider Administrator to Kernel to be a security boundary, similar bugs have been used by threat actors to gain a kernel R/W primitive and use it for EDR blinding and rootkit operations. If you’re interested in what kind of things can be done with this primitive, check out my BlackHat talk alongside FuzzySec.
This post focused on the vulnerability research part of my Pwn2Own endeavor, which consisted of finding a 0-day kernel vulnerability that can be exploited for privilege escalation. It outlines the journey: getting inspired by other research, failing to find a bug, picking a new angle, finding something suspicious, and then finally pinpointing where the vulnerability lives. Now that a bug has been identified and there’s a use-after-free primitive, the rest should be straightforward, right? Microsoft seems to think so: they rated this bug “Exploitation More Likely” with attack complexity “Low”. Are they right? I’ll cover that, the exploitation strategy, and unveil the meaning of the series title, in the next part!
Andréa Piazza, for the amazing diagrams
Emma Kirkpatrick, for patiently explaining the Windows security model to me