Razer (not Microsoft) Borked My New $5,000 PC

Telesphoreo

TL;DR: My new Alienware Area-51 AAT2265 was hard freezing with Gen5 NVMe drives. No BSOD, just a complete lockup. I spent my entire weekend debugging this. It's not the drive, it's not the motherboard. Microsoft's stornvme.sys driver is broken with PCIe Gen5 NVMe drives. Switching to the native nvmedisk.sys driver fixed it. Three registry keys. That's literally it. I want my weekend back.



So I waited two weeks for my Area-51 AAT2265 to arrive. It has a Ryzen 9 9950X3D, RTX 5090, SK Hynix 2TB Gen5 NVMe, and 64GB of RAM. It showed up, I unboxed it, and the very first thing I did was run UUP dump to build a bootable Windows 11 Enterprise ISO. Keep in mind, I did this before installing a single program. Aaaaaand the system deadlocked. The mouse didn't move, Ctrl+Alt+Del did nothing, and there was no blue screen and no error message. It was just frozen. I had to hold the power button to force a reset. And there's one important thing to mention here: the Windows 11 Home install this thing shipped with was actually clean. No McAfee, no Norton, no bloatware to blame. Dell didn't screw this up on the software side for once, and it still froze on the very first thing I tried to do. I ran UUP dump a second time and it froze again, so I gave up and ran UUP dump on my old computer instead, got my Enterprise ISO that way, and installed 25H2 on the new machine. I set everything up thinking it was just a fluke. It was very much not a fluke. My Enterprise install froze twice more: once while I was downloading all my JetBrains IDEs, and once while just browsing the web in Brave. I ran UUP dump again on 25H2 just to confirm, and it froze again. At this point UUP dump had become my reliable way to trigger a deadlock.

Going down the hardware rabbit hole


I checked Event Viewer and found WHEA-Logger Event ID 17 errors, which are corrected hardware errors on a PCI Express Root Port. I traced the root port and it mapped directly to the M.2 slot my SK Hynix drive was in. The Kernel-Power Event ID 41 entries showed BugcheckCode 0, meaning Windows never even logged a crash. The system was just locking up below the OS. So naturally I assumed this was a hardware problem. I started running everything I could think of. MemTest86 with multiple passes: clean. Prime95 and FurMark for extended periods: no crash, temps fine. CrystalDiskInfo said 100% health with zero media errors. CrystalDiskMark ran 9 passes at 64GB without a freeze. Everything looked perfectly healthy except for the part where my computer kept freezing. I reseated the drive twice. The RAM had been loose from shipping, so I figured maybe the NVMe was too. No dice. The second time I pulled the NVMe out, I ran my electric dust blower over the M.2 slot, hoping there was just some tiny piece of factory debris causing a bad connection. Put it back in, ran UUP dump again, and it still froze.
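
For anyone repeating this kind of log triage, here is a minimal Python sketch of the tally described above. It assumes the events have already been exported from Event Viewer and loaded into a list of dicts; the field names (`source`, `event_id`, `bugcheck_code`) are hypothetical and not part of any Windows API. It only counts the two patterns mentioned: WHEA-Logger ID 17, and Kernel-Power ID 41 with or without a bugcheck code.

```python
from collections import Counter


def summarize_events(events):
    """Tally WHEA-Logger ID 17 and Kernel-Power ID 41 events.

    `events` is a list of dicts with hypothetical keys:
    "source", "event_id", and optionally "bugcheck_code".
    """
    counts = Counter()
    for event in events:
        source = event["source"]
        event_id = event["event_id"]
        if source.endswith("WHEA-Logger") and event_id == 17:
            # Corrected hardware error reported by the platform.
            counts["whea_corrected"] += 1
        elif source.endswith("Kernel-Power") and event_id == 41:
            # BugcheckCode 0 means no bugcheck was recorded before the reset.
            if event.get("bugcheck_code", 0) == 0:
                counts["reset_without_bugcheck"] += 1
            else:
                counts["reset_after_bugcheck"] += 1
    return dict(counts)


if __name__ == "__main__":
    sample = [
        {"source": "Microsoft-Windows-WHEA-Logger", "event_id": 17},
        {"source": "Microsoft-Windows-Kernel-Power", "event_id": 41, "bugcheck_code": 0},
    ]
    print(summarize_events(sample))
```

Lots of corrected errors plus resets without a bugcheck is the pattern I was seeing. It tells you where to look, not what the cause is.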

Is it the drive or the slot?


I installed Windows 11 on my Samsung 980, a Gen3 drive I was using for games. I figured I'd just take the L and reinstall my games later. It's annoying but worth it for troubleshooting. Either way, I put it in the Gen5 slot and ran UUP dump, and it actually worked. No deadlocks. This was promising because the Gen3 drive worked in the same Gen5 slot the SK Hynix was freezing in, so the slot itself wasn't dead. I was now pretty convinced the SK Hynix drive was just defective out of the box. So I bought a Samsung 9100 Pro 2TB Gen5 to test with. Put it in, installed 25H2, ran UUP dump, and guess fucking what? It froze again. Two different Gen5 drives from two different manufacturers, both freezing. My Gen3 drive in the same slot worked fine. At this point I emailed Dell and demanded a motherboard replacement because I figured the Gen5 PCIe implementation on my board was just broken.

Wait, is it actually the OS?


After sending out the email, I decided to do even more troubleshooting. I realized I hadn't fully ruled out the OS. I tried Windows 11 LTSC, which is based on 24H2. This time I actually got a blue screen instead of a hard lockup, which was weirdly exciting because I could finally look at something. BlueScreenView showed a SYSTEM_SERVICE_EXCEPTION in ntoskrnl.exe. Nothing particularly useful, just the Windows kernel itself, but the fact that it crashed differently on a different build was interesting. Then I tried Windows 10 Enterprise LTSC 2021. I forgot how much less bloated 10 is: it only used 4GB of RAM at idle instead of 6GB. Interestingly though, the NVMe performance was worse on Windows 10. UUP dump ran once on Win10 LTSC. I had to leave for a few hours, came back, and it had completed. But I hadn't watched the whole thing, so I ran it again, and this time it froze. So Windows 10 was better but not totally immune either.

Finally figured it out


I did a ton of reading online and went back to Windows 11 25H2, this time actually understanding the NVMe driver stack. The default driver on both Windows 10 and 11 is stornvme.sys. I tried Samsung Magician hoping it would install a custom NVMe driver, but it didn't. I ended up on the stock stornvme.sys driver with or without Samsung Magician. Then I found out about the native NVMe driver, nvmedisk.sys. It's technically meant for Windows Server 2025, but you can enable it on Windows 11 25H2 through some registry keys. Instead of routing NVMe commands through an old SCSI abstraction layer like stornvme.sys does, it talks to the drive natively. I enabled it and my first thought was "holy crap." The difference was immediately obvious. UUP dump ran way faster than it ever had, and I'd watched it run like 10 times by this point, so I knew exactly how long it should take. The big thing I noticed was that the drive wasn't constantly dropping to 0KB/s during writes anymore. On stornvme.sys it was like a rollercoaster, writing and then dropping to idle over and over and over. On nvmedisk.sys the writes actually stayed sustained and the throughput graph looked completely different. More importantly though, it didn't freeze. UUP dump completed on the Samsung 9100 Pro with the new driver without any issues. Then I put my original SK Hynix drive back in with my original Windows 11 25H2 install, which I hadn't touched in days since I'd been testing with the Samsung drives. I enabled the new driver, ran UUP dump, and it just... worked. There were no crashes or lockups, and it completed successfully (like it should have on day one).

Why the WHEA errors sent me on a wild goose chase


The corrected PCIe errors in Event Viewer were real, but they weren't caused by bad hardware. I guess it turns out a buggy driver can cause the PCIe controller to report errors. If stornvme.sys was sending bad commands or mismanaging power states on the link, the hardware would log corrected errors even though the actual problem was software. The WHEA errors pointed at the right root port, sure, but the fault was in the software driving it, not the hardware itself. That's also why I never got a BSOD on my main install. It wasn't a hardware failure that Windows could catch and report. The driver was just getting stuck and the whole system froze waiting on I/O that would never complete.

The fix


Enable the native NVMe driver on Windows 11 25H2:
Code:
reg add HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Policies\Microsoft\FeatureManagement\Overrides /v 735209102 /t REG_DWORD /d 1 /f
reg add HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Policies\Microsoft\FeatureManagement\Overrides /v 1853569164 /t REG_DWORD /d 1 /f
reg add HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Policies\Microsoft\FeatureManagement\Overrides /v 156965516 /t REG_DWORD /d 1 /f

Then you'll need to reboot. After this, your NVMe drive should move to "Storage disks" in Device Manager and the driver should show nvmedisk.sys instead of stornvme.sys. This isn't officially supported on Windows 11 client yet, so use it at your own risk. I haven't done long-term testing yet, but UUP dump was triggering deadlocks without fail before and that's not happening anymore. I'll keep testing the long-term stability of the driver. Since this install is basically brand new, I imagine there should be fewer edge cases.
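
Since this is an at-your-own-risk tweak, it helps to have the undo commands ready before you reboot. Below is a small Python sketch that just builds command strings: the same three `reg add` lines from above, plus matching `reg delete` lines. The key path and value names are copied from the commands in this post. That deleting the values (and rebooting) puts you back on stornvme.sys is my assumption, not something I have tested.

```python
# Registry key and feature IDs copied from the commands in the post above.
KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Policies\Microsoft\FeatureManagement\Overrides"
FEATURE_IDS = ["735209102", "1853569164", "156965516"]


def enable_commands():
    """Build the same `reg add` lines shown in the post."""
    return [f"reg add {KEY} /v {fid} /t REG_DWORD /d 1 /f" for fid in FEATURE_IDS]


def rollback_commands():
    """Build matching `reg delete` lines (assumed, untested way to undo the override)."""
    return [f"reg delete {KEY} /v {fid} /f" for fid in FEATURE_IDS]


if __name__ == "__main__":
    # Print the rollback lines so they can be saved somewhere before rebooting.
    for line in rollback_commands():
        print(line)
```

The script never touches the registry itself. You would paste the printed lines into an elevated command prompt, same as the original commands.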

Thanks a lot Microsoft for wasting my entire weekend and not letting me enjoy my new $5000 PC.
 
So a quick update. I was actually still getting freezes even after switching to the new NVMe driver. The new driver is great and does give NVMe drives a significant performance boost, but it definitely wasn't the root cause. I (accidentally) figured it out though. It was my Razer mouse. More specifically, the HyperSpeed dongle. The computer only comes with one USB 3.0 port, so I was plugging my Razer mouse into a USB 2.0 port because I figured, what mouse needs a USB 3.0 port... Well, it turns out the Razer does. The more interesting thing is that I also have a Dell monitor with a USB hub on it. I ran the PC for a week with the hub disconnected and the Razer dongle connected. What I noticed was that it would stutter and freeze, but recover. When I plug the Razer mouse into the monitor's USB ports (and obviously hook up the USB cable from my monitor to the desktop) there are no freezes at all. It's been a good while now with absolutely no freezes or stuttering. It seems like the PCIe support with this motherboard is a bit flaky or something. I imagine Alienware could fix this with a future BIOS update. But for now, I have found a solution that is working. I have not switched back to the default NVMe driver because I feel no need to. I tried many other things with Modern Standby and a bunch of other registry tweaks, but none of them really made a difference. It was a goddamn peripheral. I imagine the mouse worked really well on my old prebuilt because that had a really expensive high-end motherboard. This Alienware technically has an X870E chipset, but I imagine its firmware is not as high end.