Failing to Fail
The other day I was going over various versions of the venerable DOS/16M DOS extender from Rational Systems (later Tenberry Software). The DOS/16M development kit comes with a utility called PMINFO.EXE which is meant to give the user some idea about the performance of a system running in protected mode.
I know that the utility has trouble on faster CPUs and I expected it to fail about like this:
But running the utility on an older laptop with an Intel Haswell processor, I instead got this:
Rather than cleanly exiting after catching a floating-point division by zero, the program crashed with a general protection fault. That looks like a bug, but why would it be happening? And where is the bug?
To get a better sense of the problem, I used the Instant-D debugger shipped with DOS/16M. There I could see the faulting code:
It didn’t take me too long to determine that the code is part of the floating-point exception handling logic, and that it comes from the EMOEM.OBJ file shipped with DOS/16M.
Now, the EMOEM module is provided in source form with many Microsoft compilers, including Microsoft C 5.1 and 6.0 (one of those was likely used to build PMINFO.EXE). But the crashing code fragment is not in the code provided by Microsoft. So why is it there and what is it supposed to do?
It took me a little while to understand what the code is doing, but once I did, it was obvious why it’s there. The problem the code is trying to solve is caused by the fact that the x87 environment differs between real and protected mode. The original real-mode only format used by the 8087 stores a linear 20-bit address of the floating-point instruction (because the 8087 does not know what the original segmented 16:16 address was!) plus 11 bits of the FPU instruction opcode (five bits are always the ESC opcode). The 20-bit linear address and the 11-bit opcode are stored in two consecutive 16-bit words, with one bit left unused.
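As a minimal sketch of the layout just described, the following Python snippet unpacks the two instruction-pointer words of a real-mode environment image. The function name and the sample values are invented for illustration. The bit positions assumed here (low 16 address bits in the first word, address bits 19..16 in the top four bits of the second word, the opcode in its low eleven bits) follow Intel's documented real-mode environment format.

```python
def decode_real_mode_pointer(ip_low_word, ip_high_word):
    """Split the two 16-bit words into (linear address, opcode bytes)."""
    # Bits 15..12 of the second word hold address bits 19..16.
    linear = ((ip_high_word >> 12) & 0xF) << 16 | (ip_low_word & 0xFFFF)
    # Bits 10..0 hold the opcode; bit 11 is the one unused bit.
    opcode11 = ip_high_word & 0x07FF
    # The top five bits of the first opcode byte are always 11011 (ESC),
    # so they can be reconstructed rather than stored.
    first_byte = 0xD8 | (opcode11 >> 8)
    second_byte = opcode11 & 0xFF
    return linear, bytes([first_byte, second_byte])


# Hypothetical example: FDIV ST,ST(1) (bytes D8 F1) at linear address 12345h.
linear, opcode = decode_real_mode_pointer(0x2345, 0x10F1)
print(hex(linear), opcode.hex())  # 0x12345 d8f1
```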
In protected mode on the 80287, a 20-bit linear address isn’t enough. Intel changed the x87 environment format to store the full 16:16 segmented address, and the FPU opcode is no longer stored.
DOS/16M was designed to work with compilers producing real-mode DOS code. Hence libraries shipped with those compilers expect the original 8087 environment format when handling floating-point exceptions. But because DOS/16M applications in fact run in protected mode, the FPU will be storing the x87 environment in the newer, protected-mode format.
The extra code in the DOS/16M EMOEM.OBJ is clearly meant to read the opcode from the stored CS:IP address, possibly skip one byte of a prefix, and then modify the stored environment, writing the 11 opcode bits right where real-mode exception handling code expects to find them. (Note that the code makes no attempt to produce a 20-bit linear address, since that wouldn’t work anyway.)
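The fix-up can be modeled roughly as follows. This is not the actual EMOEM code, only a sketch of the logic as described above: `code` stands in for the bytes that the handler would read through the stored CS:IP, and the function name is hypothetical. On an affected CPU the stored CS would be zero, so the real handler would presumably fault on the memory read that this model takes for granted.

```python
def opcode_bits_from_code(code):
    """Return the 11 opcode bits of the FPU instruction at the start of code."""
    pos = 0
    # An ESC opcode byte has the bit pattern 11011xxx (D8h..DFh).
    if (code[pos] & 0xF8) != 0xD8:
        pos += 1  # skip a single prefix byte, e.g. a segment override
    if (code[pos] & 0xF8) != 0xD8:
        raise ValueError("no ESC opcode found at the stored address")
    # Low three bits of the ESC byte plus the whole following byte.
    return ((code[pos] & 0x07) << 8) | code[pos + 1]


# ES: FDIV qword ptr [1000h] encodes as 26 DC 36 00 10.
print(hex(opcode_bits_from_code(bytes([0x26, 0xDC, 0x36, 0x00, 0x10]))))  # 0x436
```

The returned value is what would be merged into the low eleven bits of the environment word that a real-mode handler reads the opcode from.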
So why does this code not work on my Haswell laptop? Because the CPU is not quite backwards compatible.
99% Backward Compatibility
The original 8087 always kept the FPU environment up to date, including the FPU opcode as well as instruction and data addresses. That reflected the internal working of the 8087.
The 287 already changed things from a software perspective, a result of the different interface between the CPU and FPU. On the 8087, the stored instruction address points to the ESC opcode. On the 287 and later, it points to any prefixes that might precede the ESC opcode. This change was clearly an improvement, and although it had the potential to upset existing floating-point exception handlers, in practice it probably didn’t, because most FPU instructions that are likely to fault (division, multiplication, transcendental instructions) aren’t used with prefixes anyway.
The FXSAVE instruction added in the later Pentium II models subtly changed how the processor saves the last FP instruction opcode and code/data addresses. Rather than saving these data items every time, they’re only saved when there is a pending floating-point exception. This reflects the actual usage, since only FP exception handlers are likely to need this information.
In the P4 microarchitecture, Intel added a (presumably) performance optimization called “fopcode compatibility mode”. Bit 2 in the multi-purpose IA32_MISC_ENABLE MSR determines whether the CPU tracks the FP opcode (aka fopcode) for every instruction as before, or whether it’s updated only upon encountering an exception. Newer Intel CPUs no longer support constant updating of the FP opcode at all and only update it when exceptions occur.
None of that is a problem for PMINFO.EXE. But the next step that Intel took to reduce x87 backward compatibility actually is.
In the Haswell and later CPUs, Intel introduced a new CPUID bit. When (in Intel parlance) CPUID.(EAX=07H,ECX=0H):EBX[bit 13] is set, the processor still tracks the last FP instruction code and data addresses, but no longer saves segment register values; that is, the code and data segment values are always stored as zeroes.
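Testing for the bit is a one-liner once the CPUID result is available. Python cannot execute CPUID itself, so the sketch below assumes the caller has already obtained the EBX value of leaf 7, subleaf 0 by some other means; the function and constant names are made up for this example.

```python
ZERO_FCS_FDS_BIT = 13  # CPUID.(EAX=07H,ECX=0H):EBX[bit 13]


def deprecates_fpu_cs_ds(cpuid7_ebx):
    """True if the CPU stores FPU CS and FPU DS as zero."""
    return bool((cpuid7_ebx >> ZERO_FCS_FDS_BIT) & 1)


# Made-up EBX values, one with bit 13 set and one without.
print(deprecates_fpu_cs_ds(0x00002000))  # True
print(deprecates_fpu_cs_ds(0x00000000))  # False
```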
This problem most visibly impacts segmented protected-mode exception handlers, such as the one in DOS/16M.
While earlier changes, such as not always tracking the last FP opcode, are easily visible by software, they do not cause trouble in practice. But not saving the segment registers does in fact upset legacy off-the-shelf software. Not often, but it does. PMINFO.EXE is one of the victims, but far from the only one.
Possible Workarounds?
Working around the deficient CPUs is quite difficult. A naive approach would be to intercept the #MF (math fault) exception and record the current CS and DS, but that would be only sometimes correct.
The reason why the FPU separately tracks the instruction and data pointers is that, historically, the FPU was a completely separate chip running in parallel with the CPU. Math exceptions were reported asynchronously through the interrupt controller. The CPU could be doing more or less anything when the math interrupt arrived; the FPU itself had to provide the instruction pointer so that the math error handler could find out what actually faulted.
Even on modern CPUs where everything is one piece of silicon and floating-point errors are reported via #MF exceptions, the problem remains. The #MF exception is reported at some point after the instruction which caused it, namely on the next floating-point instruction or a WAIT instruction. But such an instruction could be executed in a different segment, or in a multi-tasking OS, in a different task.
That is in fact the case with the DOS/16M PMINFO.EXE. The #MF exception is triggered on a WAIT instruction in a floating-point emulator segment, which is different from the segment where the instruction causing the FP exception is.
The upshot is that by the time the #MF happens, it is too late to record the code and data segment values. The only possibility might be to force math instruction emulation with the CR0.EM bit, and track the current code and data pointers, but that would be quite intrusive and slow. At that point it may be simpler to just run the legacy code through software emulation.
Fortunately the impact of this problem is fairly limited. It is rare for software to handle math exceptions during normal operation; more often than not, math exceptions cause a fatal error, and in such cases the practical difference between terminating a program due to a math fault versus a general protection fault isn’t significant. While failing to fail properly is annoying, the program still fails either way.
There is a possible workaround that users may apply in some cases. Once upon a time, Microsoft provided a package called WINFLOAT.EXE described in KB article Q97265. Said package includes a utility called HIDE87.COM which hides a math co-processor from Windows 3.x applications, and possibly from some DOS applications. This forces software emulation built into Windows to be used, avoiding the deficiency of newer Intel CPUs.
Note that the WINFLOAT package can be used to get some sense of whether math exception handling works at all in a given setup. Here it is not working (as expected) on a Haswell CPU:
For comparison, here it is running on a non-crippled CPU:
To date, AMD processors provide better backward compatibility and do not suffer from this particular problem.
Addendum: Same Symptom, Different Cause
Around 2013, users of several virtualization products (VMware, VirtualBox, KVM, XP mode in Windows 7) complained of crashes in WIN87EM.DLL and similar. The symptom was identical, a math fault handler crashing because the code segment of a faulting FPU instruction was zero. Such reports can be found here, here, or here.
But the cause was quite different. It specifically affected 64-bit hypervisors running 32-bit or 16-bit guest software. In the course of normal operation, a hypervisor often needs to save and restore the FPU state, using FXSAVE/FXRSTOR or similar instructions.
These instructions can save the FPU state in more than one format. The two relevant formats are the 64-bit one, with 64-bit offsets and no segments, and the 32-bit one, with a 16-bit segment and a 32-bit offset.
A hypervisor can save the state twice, once in 32-bit and once in 64-bit format. That way it is possible to recover both the segments and 64-bit offsets. But when restoring state, the hypervisor is faced with a binary choice: Either restore the 64-bit format, zeroing the segment registers, or restore the 32-bit format, keeping the segment values but zeroing any high bits of 64-bit offsets.
It should now be apparent that if a 64-bit hypervisor only uses the 64-bit form of FPU save/restore instructions, the segment register contents stored in the FPU state will be lost after saving and restoring the FPU state. Depending on the hypervisor and guest combination, this loss can be rare and unpredictable, or it can happen with 100% reproducibility.
Hypervisors were fixed to selectively save and restore either 32-bit or 64-bit state. One possible approach is as follows: Save the 64-bit FPU state. If the high DWORD of either the code or data pointer is non-zero, keep this state and restore 64-bit state again. Otherwise save the FPU state again in 32-bit format, and restore it as 32-bit. This approach works well in practice and adapts to the software running in the guest.
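The decision step of that approach might look like the sketch below. It operates on a byte image of the 64-bit FXSAVE area and assumes the documented field positions (instruction pointer at byte offset 8, data pointer at byte offset 16, both 64 bits wide). The function name and return values are invented for illustration, and a real hypervisor would of course do this in native code around the actual save and restore instructions.

```python
import struct

FIP_OFFSET = 8    # 64-bit instruction pointer in the 64-bit FXSAVE image
FDP_OFFSET = 16   # 64-bit data pointer in the 64-bit FXSAVE image


def pick_restore_format(fxsave64_image):
    """Decide which save/restore format preserves the guest's FPU pointers."""
    fip, = struct.unpack_from("<Q", fxsave64_image, FIP_OFFSET)
    fdp, = struct.unpack_from("<Q", fxsave64_image, FDP_OFFSET)
    if (fip >> 32) or (fdp >> 32):
        # A pointer needs more than 32 bits: keep the 64-bit state.
        return "64-bit"
    # Both pointers fit in 32 bits: re-save and restore in the 32-bit
    # format so that the segment values survive.
    return "32-bit"


image = bytearray(512)
struct.pack_into("<Q", image, FIP_OFFSET, 0x1234)
print(pick_restore_format(image))  # 32-bit
```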
As usual, the devil is in the details.
Update: Real Mode Is Broken Too
Readers pointed out that in real mode, recent Intel CPUs also save the state incorrectly, and do not save the full 20-bit (or 32-bit) linear address. This fact is not clearly documented by Intel, but the behavior has been confirmed on at least Haswell and Skylake CPUs.
Experimentation shows that the behavior in real mode is somewhat logical. The processor simply does not keep track of the segment register, ever. When in real mode, FSAVE simply saves the 16-bit IP value as the code pointer. Note that this is usually not the same value as the low 16 bits of the linear address would be.
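A small worked example makes the difference concrete. The segment and offset values are arbitrary, and the two function names are hypothetical; the second function models only the observation above that the 16-bit IP is stored as the code pointer.

```python
def classic_code_pointer(cs, ip):
    """20-bit linear address, as stored by the 8087 real-mode format."""
    return ((cs << 4) + ip) & 0xFFFFF


def segmentless_code_pointer(cs, ip):
    """What a CPU that never tracks the segment ends up storing."""
    return ip & 0xFFFF


cs, ip = 0x1234, 0x5678
linear = classic_code_pointer(cs, ip)
print(hex(linear))                            # 0x179b8
print(hex(linear & 0xFFFF))                   # 0x79b8, low 16 bits of the linear address
print(hex(segmentless_code_pointer(cs, ip)))  # 0x5678, just the IP
```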
In real mode, the consequences of not properly storing the FP code and data pointers aren’t as obvious. An exception handler will end up reading some more or less random memory location; it won’t crash, but it may not handle the exception correctly. This failure mode is, in a way, even worse, because it isn’t apparent that things are failing.
13 Responses to Failing to Fail
There are likely many x86 FPU related devils. I suspect that the optimization to throw away segment selectors in the x87 state is done only for FXSAVE. Can you try with fstenv?
Thanks,
Ruik
If that were the case, old software (which has no idea about FXSAVE) would have no trouble. But it does, because the CPU internally does not track the segments. Which is also what Intel’s documentation claims.
> This does not affect real-mode code (since only linear addresses are stored)
I just tried this on a skylake cpu in virtualized real mode and it didn’t store the full linear address just the ip and data offset. Maybe it’s different in real real mode?
It is. There are separate 16-bit and 32-bit layouts of the FPU state for real and protected mode.
Again, the original 8087 format (16-bit real mode) stored the linear address because that was the only information the FPU had.
Sure, but I meant it’s running in vmx mode with the virtual machine in real mode so the fsave fpu state would be the 16bit real mode one.
OK, then yes. And you’re presumably really talking about FSAVE, not FXSAVE. The Intel documentation does not indicate that the FPU instruction/data offsets would be chopped to 16 bits in real mode on Haswell and later, but maybe they are?
OK, looking at what my Haswell CPU stores, yes, in real mode it’s messed up too. Only the address corresponding to the 16-bit offset is stored, and the segment is lost. This is not clearly documented.
In real mode it just doesn’t cause crashes. It almost certainly does cause subtle failures.
Hmmm … from the people that brought you the Pentium FDIV bug …
To be clear, the FDIV bug was not exactly well documented :) This behavior, although it does break backward compatibility, was well documented by Intel. Which, in a way, makes it worse because they’re definitely not going to fix it!
It’s kind of a preview of x86-s when they intend to take a sledgehammer to backward compatibility.
x86-s may be the response to problems like this. There may be obscure pieces of backward compatibility that haven’t been relevant for the last 20 years and therefore missed in testing. If one doesn’t have to test for something because it doesn’t exist, one can’t fail the test.
The FDIV and the Sandybridge SATA bug were both caused by similar late in development changes that had no benefit to being rushed through. Yes, the next revision of the Pentium could have been smaller and new motherboards could have been built with fewer layers but taking advantage of the changes wasn’t going to happen for another year. I hope the Haswell bug was caused by not realizing the intricacies of x87 instead of altering at the last minute a design that should have been locked down.
I suspect they could have fixed it in microcode had they really wanted to. You’re right that it probably would be a large reduction in qa testing and it might permit them to toss a bunch of microcode in critical places where bases and limits have to be checked for every memory access when not in long mode.
Would like to play with DOS/16M development kit, where can I get it?
Hello:
Where can I download DOS/16M development kit? I need it to test a vintage application.
thanks