More on NX Insanity

This article was supposed to be published about two years ago but got forgotten and ignored until now. It’s not the only such article. Perhaps it will start a new “better published late than never” series.

After looking more closely into the nonsense surrounding the implementation and usage of the NX feature, I ended up with more questions than answers. And it’s definitely not the fault of AMD, the people who first defined and implemented NX.

A big contributor is Microsoft, too. The code to detect and enable NX in Windows 10 (analyzing the original 10240 build here) is, to put it mildly, weird. That is very much at odds with Server 2003 (WRK) which has completely straightforward code to detect and, if present and requested, enable NX.

First let’s consider 64-bit Windows 10 10240 because it’s simpler. Given that NX was introduced in and defined as a non-optional part of the AMD64 architecture, a 64-bit OS should be able to query bit 20 (NX) in CPUID leaf 80000001h, register EDX. But that’s not what Windows 10 does.

That is to say, Windows 10 does check said CPUID bit, but does not trust the result. The pseudocode looks roughly like this:

bool KiIsNXSupported(void)
{
  bool NxPresent;
  // Check NX bit in CPUID
  if (CPUID(0x80000001).EDX & BIT(20))
    NxPresent = true;
  else
    if (KiGetCpuVendor() == 1)
      NxPresent = true;
    else
      NxPresent = false;
  return NxPresent;
}

In other words, if the CPU vendor is 1, aka Intel, NX is considered to be present regardless of what CPUID says. How does that make sense? I don’t know. Perhaps the (incorrect) assumption is that any 64-bit capable Intel CPU can enable NX support if it’s currently disabled.

But it’s causing trouble because if NX cannot be enabled on an Intel CPU for whatever reason (such as running in a VM on a host with disabled NX, thanks Intel!), Windows 10 doesn’t believe the CPUID information and tries to enable NX anyway, which blows up.
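To make the failure case concrete, here is a minimal C sketch that contrasts two policies: trusting CPUID outright, and the Windows 10 10240 logic from the pseudocode above. The vendor code 1 for Intel comes from that pseudocode. Everything else is an illustrative assumption and not something found in Windows: `VENDOR_OTHER`, the function names, and the `read_ext_edx` helper, which assumes GCC or Clang with `<cpuid.h>` on an x86 target.

```c
#include <stdbool.h>
#include <stdint.h>

#define EDX_NX_BIT    (1u << 20)  /* NX flag: CPUID leaf 80000001h, EDX bit 20 */
#define VENDOR_INTEL  1           /* vendor code as seen in the pseudocode */
#define VENDOR_OTHER  0           /* hypothetical code for any non-Intel vendor */

#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
/* Read EDX of the extended feature leaf; returns 0 if the leaf is missing. */
uint32_t read_ext_edx(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(0x80000001u, &eax, &ebx, &ecx, &edx))
        return 0;
    return edx;
}
#endif

/* Straightforward policy: believe what CPUID reports. */
bool nx_supported_plain(uint32_t ext_edx)
{
    return (ext_edx & EDX_NX_BIT) != 0;
}

/* Policy described in the article: an Intel vendor overrides CPUID. */
bool nx_supported_win10(uint32_t ext_edx, int vendor)
{
    return nx_supported_plain(ext_edx) || vendor == VENDOR_INTEL;
}

/* True when the OS would try to enable NX although CPUID does not report it. */
bool nx_enable_would_fault(uint32_t ext_edx, int vendor)
{
    return nx_supported_win10(ext_edx, vendor) && !nx_supported_plain(ext_edx);
}
```

With bit 20 clear and an Intel vendor code, `nx_enable_would_fault` returns true, which corresponds to the VM scenario described above. For any other vendor, or whenever CPUID reports NX, the two policies agree.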

The 32-bit variant of Windows 10 has a more difficult job. It might encounter a CPU with NX enabled, a CPU where NX is disabled but can be enabled, or a CPU entirely without NX. Note that both Intel and AMD made 32-bit CPUs with NX (some Intel P4 and Pentium M models, some 32-bit AMD Semprons). Here’s what Windows 10 does:

void KiTryForceEnableNx(void)
{
  // Check SSE2 bit in CPUID
  if ((CPUID(1).EDX & BIT(26)) == 0) {
    if (KiGetCpuVendor() == 1)
      KiNxForceEnable = true;
  }
}

That’s right. If SSE2 is reported, KiTryForceEnableNx does nothing. How does that make sense? I don’t know. It is true that all AMD processors which have SSE2 also have NX (and it can’t be disabled). But all Intel Pentium 4 CPUs made before late 2004 (and some made after) have SSE2 yet no NX. It’s not at all obvious what problem Microsoft was trying to solve.

Much like it’s not obvious what problem Intel was trying to solve when it made NX optional (in the sense that it could be made to “vanish”, as opposed to leaving the feature present but simply not enabled in EFER). It is plausible that Intel was solving an actual problem, but it’s also plausible that Intel did that out of an abundance of caution; both have been known to happen.

Even if it was done to solve some obscure problem, the original issue is long gone but the replacement problem—software failing because NX is disabled—remains even now. Not a new problem in the IT industry.
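The distinction between NX that has “vanished”, NX that is present but not enabled, and NX that is enabled can be sketched as a small C model. The bit positions are architectural: CPUID leaf 80000001h EDX bit 20 reports NX, and bit 11 (NXE) of the EFER MSR at C0000080h turns it on. The type and function names are made up for illustration. The model works on plain integer values, because real code would have to run in ring 0 and use RDMSR/WRMSR.

```c
#include <stdbool.h>
#include <stdint.h>

#define EDX_NX_BIT  (1u << 20)    /* CPUID leaf 80000001h, EDX bit 20 */
#define EFER_NXE    (1ull << 11)  /* NXE bit in the EFER MSR (C0000080h) */

/* Hypothetical names for the three situations an OS can encounter. */
typedef enum {
    NX_ABSENT,     /* CPUID does not report NX (never there, or hidden) */
    NX_AVAILABLE,  /* reported by CPUID, but EFER.NXE is clear */
    NX_ENABLED     /* reported by CPUID and EFER.NXE is set */
} nx_state;

nx_state classify_nx(uint32_t ext_edx, uint64_t efer)
{
    if ((ext_edx & EDX_NX_BIT) == 0)
        return NX_ABSENT;
    return (efer & EFER_NXE) ? NX_ENABLED : NX_AVAILABLE;
}

/*
 * Compute the EFER value an OS would write to enable NX.
 * Returns false and leaves *out untouched when NX is absent: in that
 * case NXE is a reserved bit and a real WRMSR setting it would fault.
 */
bool efer_with_nx(uint32_t ext_edx, uint64_t efer, uint64_t *out)
{
    if (classify_nx(ext_edx, efer) == NX_ABSENT)
        return false;
    *out = efer | EFER_NXE;
    return true;
}
```

Only the first state is a problem: an OS that skips the `NX_ABSENT` check and sets NXE anyway takes a fault, whereas the `NX_AVAILABLE` state behaves exactly like a CPU without NX until the OS chooses to enable it.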


13 Responses to More on NX Insanity

  1. Yuhong Bao says:

    To be honest, the Intel CPUs that support 64-bit but don’t support NX are uncommon.

  2. rasz_pl says:

    >not obvious what problem Intel was trying to solve

    Historically it was almost always the problem of extracting maximum $ out of clients by forced market segmentation (ECC, AVX, VT-x, VT-d, AES, HT, etc).
    Even the brand new i9-10980XE, capable of handling 256GB RAM, doesn’t support ECC ;-o

  3. Richard Wells says:

    MS producing code that is incorrect but works anyway on supported hardware seems to be a common affliction of late.

    Why turn off NX bit? HP back in 2005 prepared a document on XD/NX bit. It lists 3 pages of applications that fail if the NX bit was enabled. HP turned the bit off with the consumer i915 systems while leaving it on with the i945 systems though HP let users switch the value. Other vendors made sure it could never be active. Several of the applications that failed with the NX bit were DVD players. The support costs of explaining why movies can’t be watched after the user turns on security exceeds the value of being able to install a new OS 10 years after the warranty runs out.

    Software failing because the NX bit is disabled is on the software. Virtual machine software that can run on non-NX systems and is intended to run Windows 10 needs to emulate NX, even at the great cost to performance that would entail.

  4. Michal Necasek says:

    Actually, what are those? Some old Noconas?

  5. Michal Necasek says:

    Yes, Intel is the master of segmenting the market. AMD keeps things simple.

  6. Michal Necasek says:

    No. There are two things, NX present and not enabled and NX completely disabled (gone). The latter is the problem.

    Yes, there were absolutely numerous applications that would not run with NX enabled, and for the most part the fixing was trivial once operating systems provided the APIs needed to mark memory as executable. But it’s not like all those incompatible apps immediately failed on Opterons. Those CPUs had NX and if the OS didn’t enable it, NX wasn’t active and paging worked exactly like on CPUs without NX.

    The problem is with systems that have NX disabled in the BIOS, which means that when an OS comes along, NX is not there. I don’t know what problem that was supposed to solve.

  7. Richard Wells says:

    Disabling NX in the BIOS and not permitting it to be enabled solved the problem of the careless user reading about NX in the trades, enabling it, and then having software not work. Preventing a $100 support call on a computer which only generated $50 in profit makes sense to me.

    There was a second set of NX bit elimination in the BIOS. IIRC, some overclock friendly motherboards disabled NX bit permanently since in some cases, a seemingly stable overclock generated errors when NX was enabled. That was likely a memory issue caused by the overclock and it would have been better to reduce the clock and get stability. But higher clock speed on the cheap always sold better than stodgy reliability.

  8. Yuhong Bao says:

    The IBM Socket 478 64-bit Pentium 4 CPUs also lack NX.

  9. Michal Necasek says:

    Right, but how many people have those? 🙂

  10. Michal Necasek says:

    Sorry, no. If a user enables NX in the BIOS, it still does nothing until the OS turns it on. NX is enabled by a control bit in the EFER MSR and that is fully under OS control.

    To reiterate: The NX control in the BIOS does not enable NX, but it can prevent NX from being enabled. It’s just like the VT-x and other BIOS controls in that regard.

  11. vbdasc says:

    @Michal Necasek

    “Yes, Intel is the master of segmenting the market. AMD keeps things simple.”

    I’m not a fan of Intel, but… remember the socket 754/939 Semprons which had AMD64 and/or SSE3 disabled?

  12. Yuhong Bao says:

    I don’t believe that AMD has ever disabled SSE3, and only early Semprons before mid-2005 had AMD64 disabled.

  13. vbdasc says:

    @Yuhong Bao

    You’re right about AMD and SSE3. My mistake. But it’s true that AMD disabled AMD64 in some Semprons to create some market segmentation.
