Saturday, July 5, 2014

0x0000119 Debugging - Invalid Fence IDs

Now that my extremely exciting week has come to an end and I have a moment to sit and relax, I figured what better way to do that than to write a blog post! In this post we'll be discussing the 0x0000119 bug check, otherwise known as VIDEO_SCHEDULER_INTERNAL_ERROR. I worked on one not too long ago, found the thread while I was cleaning out some reference bookmarks, and figured I'd do a write-up!

---------------------------

As usual, let's have a look at the basic description of the bug check:

VIDEO_SCHEDULER_INTERNAL_ERROR (119)

This indicates that the video scheduler has detected a fatal violation.

See this MSDN article for more information on Windows video scheduling, memory management, etc.

With this out of the way, let's jump right in and have some fun!

Using the basic !analyze -v:
-- By the way, ! is known as bang. Interesting tidbit of the day : )

VIDEO_SCHEDULER_INTERNAL_ERROR (119)
The video scheduler has detected that fatal violation has occurred. This resulted
in a condition that video scheduler can no longer progress. Any other values after
parameter 1 must be individually examined according to the subtype.
Arguments:
Arg1: 0000000000000001, The driver has reported an invalid fence ID.
Arg2: 0000000000000c00
Arg3: 0000000000000c01
Arg4: 0000000000000c01
Great, so right away we have some pretty helpful information: the 1st argument tells us that 'The driver has reported an invalid fence ID'. Now that we know this is the reason behind the bug check occurring on the system, we need to understand which driver reported an invalid fence ID, and what a fence ID even is.

Regarding arguments 2, 3, and 4, I believe 2 is the invalid fence ID we're dealing with, and 3 & 4 are what the expected fence ID was.
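
If you want to pull those arguments out of a saved !analyze -v log, here is a minimal Python sketch. The helper names (parse_args, label_args) are made up for this post, and the labels on arguments 2-4 only encode my guess above, not anything documented.

```python
import re

# The argument block copied from the !analyze -v output in this post.
ANALYZE_OUTPUT = """\
Arg1: 0000000000000001, The driver has reported an invalid fence ID.
Arg2: 0000000000000c00
Arg3: 0000000000000c01
Arg4: 0000000000000c01
"""


def parse_args(text):
    """Map argument number -> integer value for every 'ArgN: <hex>' line."""
    args = {}
    for match in re.finditer(r"^Arg(\d): ([0-9a-fA-F]+)", text, re.MULTILINE):
        args[int(match.group(1))] = int(match.group(2), 16)
    return args


def label_args(args):
    """Attach labels; the fence ID labels are a guess, not documented fact."""
    return {
        "subtype": hex(args[1]),
        "reported fence ID (guess)": hex(args[2]),
        "expected fence ID (guess)": hex(args[3]),
        "expected fence ID again (guess)": hex(args[4]),
    }


parsed = parse_args(ANALYZE_OUTPUT)
for label, value in label_args(parsed).items():
    print(f"{label}: {value}")
```

Seen this way, the reported value (0xc00) sits one below the value I believe was expected (0xc01).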

First off, we need to understand the Windows Display Driver Model (WDDM) - Article here. After reading this, we can understand that a fence ID is essentially a glorified ticket number: each Direct Memory Access (DMA) buffer submitted to the GPU is tagged with one, and the driver reports that ID back once the GPU has finished processing the buffer. This is done so the GPU can work through its queue without the CPU or OS having to check on it constantly, and its life is a lot easier.
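
To make the ticket idea concrete, here is a toy Python model. It is purely hypothetical: ToyScheduler and its two counters are invented for illustration and are not how dxgmms1 is actually implemented. The assumption is simply that fence IDs only ever go up, so a reported ID must be newer than the last completed one and no newer than the last submitted one.

```python
class ToyScheduler:
    """Hypothetical model of fence ID bookkeeping (not the real dxgmms1 logic)."""

    def __init__(self, start=0):
        self.last_submitted = start  # highest fence ID handed out so far
        self.last_completed = start  # highest fence ID reported as finished

    def submit_dma_buffer(self):
        """Tag a new DMA buffer with the next fence ID and return it."""
        self.last_submitted += 1
        return self.last_submitted

    def notify_completed(self, reported_fence_id):
        """Accept a completion report only if the fence ID is plausible."""
        valid = self.last_completed < reported_fence_id <= self.last_submitted
        if not valid:
            raise RuntimeError(
                f"invalid fence ID {reported_fence_id:#x} "
                f"(last completed {self.last_completed:#x}, "
                f"last submitted {self.last_submitted:#x})"
            )
        self.last_completed = reported_fence_id


sched = ToyScheduler(start=0xC00)
fence = sched.submit_dma_buffer()  # hands out 0xc01
sched.notify_completed(fence)      # accepted: matches what was submitted

error = None
try:
    sched.notify_completed(0xC00)  # stale ID: already behind last_completed
except RuntimeError as exc:
    error = str(exc)
    print(error)
```

The values 0xc00 and 0xc01 are borrowed from the dump only to make the example feel familiar; whether the real crash followed this exact sequence is not something the dump tells us.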

---------------------------

Now that we know the above, let's take a look at the call stack:

0: kd> k
Child-SP          RetAddr           Call Site
fffff800`04438528 fffff880`015ed22f nt!KeBugCheckEx
fffff800`04438530 fffff880`07807ec5 watchdog!WdLogEvent5+0x11b
fffff800`04438580 fffff880`07808131 dxgmms1!VidSchiVerifyDriverReportedFenceId+0xad
fffff800`044385b0 fffff880`07807f82 dxgmms1!VidSchDdiNotifyInterruptWorker+0x19d
fffff800`04438600 fffff880`078f513f dxgmms1!VidSchDdiNotifyInterrupt+0x9e
fffff800`04438630 fffff880`073d64d8 dxgkrnl!DxgNotifyInterruptCB+0x83 <--- DMA buffer completed.
fffff800`04438660 fffffa80`08d938e8 igdkmd64+0x1744d8
fffff800`04438668 fffff800`031f4e80 0xfffffa80`08d938e8
fffff800`04438670 fffff800`04438840 nt!KiInitialPCR+0x180
fffff800`04438678 fffffa80`0923d7a8 0xfffff800`04438840
fffff800`04438680 fffffa80`0925b000 0xfffffa80`0923d7a8
fffff800`04438688 fffffa80`00000c00 0xfffffa80`0925b000
fffff800`04438690 fffff880`0c52b000 0xfffffa80`00000c00
fffff800`04438698 00000000`00000000 0xfffff880`0c52b000

Essentially what happened here was that after the GPU finished processing the DMA buffer, the Intel Graphics driver (igdkmd64.sys) was notified that the GPU had finished what it was doing, and it reported the ID # of the completed DMA buffer (known as a fence ID). In our case, this was an invalid fence ID, therefore the DirectX video scheduler (dxgmms1.sys) said 'woah, this isn't right' and called the bug check to stop the GPU from continuing with illegally accessed memory.
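
If you have a lot of dumps to go through, spotting the third-party driver in a stack like this one can be scripted. Below is a rough Python sketch. The MICROSOFT_MODULES set is my own short list covering only the modules seen in this stack, and first_third_party_module is a made-up helper name.

```python
import re

# The first frames of the call stack from this post (output of the k command).
STACK = """\
fffff800`04438528 fffff880`015ed22f nt!KeBugCheckEx
fffff800`04438530 fffff880`07807ec5 watchdog!WdLogEvent5+0x11b
fffff800`04438580 fffff880`07808131 dxgmms1!VidSchiVerifyDriverReportedFenceId+0xad
fffff800`044385b0 fffff880`07807f82 dxgmms1!VidSchDdiNotifyInterruptWorker+0x19d
fffff800`04438600 fffff880`078f513f dxgmms1!VidSchDdiNotifyInterrupt+0x9e
fffff800`04438630 fffff880`073d64d8 dxgkrnl!DxgNotifyInterruptCB+0x83
fffff800`04438660 fffffa80`08d938e8 igdkmd64+0x1744d8
fffff800`04438668 fffff800`031f4e80 0xfffffa80`08d938e8
"""

# Assumption: a hand-written list, limited to the Microsoft modules in this stack.
MICROSOFT_MODULES = {"nt", "watchdog", "dxgmms1", "dxgkrnl"}


def module_of(call_site):
    """Return the module name of a call site, or None for a raw address."""
    if call_site.startswith("0x"):
        return None
    # 'module!Function+0x..' and 'module+0x..' both start with the module name.
    return re.split(r"[!+]", call_site, maxsplit=1)[0]


def first_third_party_module(stack_text):
    """Walk the frames top-down and return the first non-Microsoft module."""
    for line in stack_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # not a stack frame line
        module = module_of(fields[2])
        if module and module not in MICROSOFT_MODULES:
            return module
    return None


culprit = first_third_party_module(STACK)
print(culprit)
```

This is only a first-pass filter. The module it finds is a suspect, not a verdict, so it still needs the kind of reading we did above.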

---------------------------

With such an issue you may think that it's always a bad GPU. However, in this specific case it was simply a video driver issue that was solved with an update. Update those video drivers!

Hope you enjoyed reading!
