Learn Something Old Every Day, Part VI: Backward Buffer Overwrite
A few days ago I spent far too much time debugging a largish piece of 16-bit Windows code written in assembler. I found a scenario where (fortunately fairly reproducibly) Windows crashed because the internal state of a DLL got corrupted.
I could see that the state made no sense and a supposedly small rectangle was suddenly tens of thousands of pixels large, causing segment overruns when it was copied. The internal state of the DLL was corrupted, and it very much looked like a buffer overflow.
I added size checks to make sure nearby buffers weren’t being overwritten, but the checks never fired. Or rather they only fired when the state was already corrupted.
Then I tried reshuffling the data so that the buffer which I suspected of overflowing was at the very end of the data segment, hoping that it would cause a protection fault when the buffer overrun happened. But the fault never happened, and the state was still getting corrupted.
And then it finally hit me. The DLL’s internal state was getting corrupted not because a data copy ran past the end of a buffer, but because it was copying data in the wrong direction. In some situations, a REP MOVSB instruction could be executed with the correct source and destination address and the correct count, but with the CPU direction flag set (rather than clear), causing the copy to decrement addresses instead of incrementing.
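To make the failure mode concrete, here is a minimal sketch in C that models REP MOVSB inside a single 64 KB segment. It is a simplification, not the code from the DLL: the function name `rep_movsb`, the `SEGMENT_SIZE` constant and the assumption that source and destination live in the same segment are all made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Simulated 64 KB data segment; offsets wrap at 16 bits like SI/DI do. */
enum { SEGMENT_SIZE = 0x10000 };

/*
 * Hypothetical model of REP MOVSB within one segment (DS == ES assumed).
 * direction_flag == 0: SI and DI increment after each byte (the usual case).
 * direction_flag != 0: SI and DI decrement, so every byte after the first
 * lands *below* the starting destination offset.
 */
static void rep_movsb(unsigned char *segment, unsigned si, unsigned di,
                      unsigned cx, int direction_flag)
{
    assert(segment != NULL);
    while (cx != 0) {
        segment[di & 0xFFFF] = segment[si & 0xFFFF];
        if (direction_flag) {
            si--;
            di--;
        } else {
            si++;
            di++;
        }
        cx--;
    }
}
```

With the flag clear, an 8-byte copy to offset 0x104 touches 0x104 through 0x10B. With the flag set, the very same source, destination and count touch 0x104 down to 0xFD, so anything stored just before the buffer is overwritten even though the count never exceeds the buffer size.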
This is a situation programmers nowadays just don’t think of. Backward copying is generally avoided because CPUs are not good at it and it’s therefore slow. In a language like C, buffer overflows can happen, but only running past the end of a buffer, not overwriting data located in memory before the start of a buffer (it’s possible to decrement an array index so that it points below the first element, but that is rare). The cause of an overflow is invariably a size problem, with the amount of data copied being larger than the size of the destination buffer. That’s what I was looking for, but the bug was something different.
ABIs generally assume that the CPU direction flag must be clear, and if some code sets it, then it must clear the flag again before returning to the caller or before calling other functions. String move instructions thus always work in the “up” direction unless someone explicitly sets the direction flag (which implies assembler code), and such buffer “underflows” normally can’t happen. But the CPU is still capable of executing them and with old and sketchy code, it is something to be aware of.
24 Responses to Learn Something Old Every Day, Part VI: Backward Buffer Overwrite
Nasty.
One could argue, though, that backwards operations are as natural as
forwards ones. It’s just that wetware doesn’t generally think that way
(and then some wetware smells “optimization!” and makes backwards
operations needlessly more expensive…).
In 6502 assembly, it’s extremely common to run loops “backwards”, since it’s more efficient. For example, a simple (only supporting copies of up to a page), forward memcpy() would look like this in 6502 (apologies if this is wrong, my 6502 is rusty) :
LDX #$00
:Loop
CPX Length
BEQ Exit
LDA Source,X
STA Dest,X
INX
JMP Loop
:Exit
But by running it backwards (I’m almost certain this is wrong in some way, don’t come crying to me if your C64 locks up running this :D) :
LDX Length
:Loop
BEQ Exit
LDA Source,X
STA Dest,X
DEX
JMP Loop
:Exit
You can save a few bytes and a few processor cycles because the branch instructions run off the zero flag, which is updated by a lot of instructions, so it behaves like there’s an implicit CPX #$00 before the BEQ. I’ve done similar things to this occasionally on x86, in extremely constrained environments where every byte counts (e.g. boot sectors, binary patches, etc).
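The same trick can be sketched in C. This is only an illustration with made-up function names (`copy_up`, `copy_down`), and whether a compiler actually emits a cheaper loop for the second form depends on the target. Note that the downward version has to index from length - 1 down to 0; the backward 6502 listing as posted would touch indices Length down to 1, which may be the mistake its author suspected.

```c
#include <assert.h>
#include <stddef.h>

/* Forward copy: every iteration compares the index against the length. */
static void copy_up(unsigned char *dest, const unsigned char *source,
                    size_t length)
{
    size_t i;
    assert(dest != NULL && source != NULL);
    for (i = 0; i != length; i++)
        dest[i] = source[i];
}

/*
 * Backward copy: the loop ends when the counter reaches zero, so the
 * decrement itself provides the termination test. The counter is
 * decremented before use, which makes the indices length - 1 .. 0.
 */
static void copy_down(unsigned char *dest, const unsigned char *source,
                      size_t length)
{
    assert(dest != NULL && source != NULL);
    while (length != 0) {
        length--;
        dest[length] = source[length];
    }
}
```

For buffers that do not overlap, both functions leave the destination in the same state; only the order in which bytes are touched differs.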
Prefetching, read-ahead, etc. have to “guess” the direction the code is going to use. You can either work with that guess, or against it. One of those will be a lot faster. That does not make the “wrong direction” needlessly slower, it just makes the “right direction” a lot faster.
That makes sense. I guess on x86 where CX is used as a separate counter there’s no inherent advantage in going in one direction or another.
Having the direction of copying controlled by a separate flag (rather than making the direction part of the opcode as, for example, the Z80 does) feels like forcing the programmer / compiler to use global state to solve a local problem.
Yes, in retrospect, having separate instructions or prefixes for backward/forward direction would have been soooo much cleaner. The direction flag is pretty much an accident waiting to happen.
There are a lot of instructions that use the direction flag. Adding a second version for wrong way usage would consume a lot of opcodes. Having two otherwise identical long routines for copying or comparison except every instruction indicates direction would make code that much more complex.
This all would seem to be a downside of automagic indexing.
Even so, me’d argue that wasting a bit of opcode space for directional
generality is not that great an issue these days. (Unless the design is
wasteful in other ways, of course.)
This should make it obvious that there *are* limits to the scalability
of an architecture: cramp the opcode space to sacrifice orthogonality
(and thereby shove complexity towards the software end of things), or
go for a clean, orthogonal (one might even say “agnostic”) instruction
set that takes up, say, an “extra” nibble per instruction?
Me can’t think of a universal answer. The specific answer is that in the
past, we didn’t have quite as much of a choice in the matter, as we
simply couldn’t make dense enough circuits yet.
In that light, having an external direction flag could indeed very well
make sense, but only for limited (to our current standards) designs.
Either way, essentially fixing the direction in hardware (“go forwards,
or suffer”) is not necessarily justified.
A similar problem existed in Windows 95 and NT 4.0 when PeekMessage32() was invoked with the direction flag set, causing General Protection Faults. This was documented in a short article in the German c’t magazine, issue 9/1997, page 272. Unfortunately, this article is paywalled (https://www.heise.de/select/ct/archiv/1997/9/seite-272).
Interesting. Strictly speaking, I don’t think that is a bug in Windows. It is a bug in an application that violates Windows calling conventions.
A violation of Windows calling conventions?
This makes me wonder about differences between Windows versions, as Windows 3.x was co-operative multi-tasking, while NT/9x are pre-emptive multi-tasking.
When running on a co-operative multi-tasking OS, the application gives control back to the OS, and has to make sure the flag is set to the correct direction prior to doing so.
What happens when such an application runs on Windows 9x or NT? I know NT runs 16-bit applications in a virtual machine (NTVDM), but I am not sure what happens on 9x.
It’s not that simple. Even in Windows 3.1, where applications are not really separated from each other in a meaningful way, they don’t just pass control back and forth. Windows still handles the task switching, and it is entirely plausible that some central piece of code always clears the direction flag before handing control over to the next task. The problem I debugged was related to hardware interrupts, which is a somewhat different case. Though I did not expect that the direction flag would just be left as it happened to be at the moment the interrupt occurred.
On NT, processes are fully separated and the direction flag is strictly local to a process. Win9x I’m not entirely sure about, but I suspect 32-bit and 16-bit applications might behave differently. Win95 is in some ways extremely complex because it’s a weird mish-mash of everything.
Note that it’s not actually true that copying in forward directions is more efficient and thus backward copying is not used.
There was huge uproar when memcpy in glibc on Linux started copying bytes going from the end to the beginning: https://lwn.net/Articles/416821/
This broke Flash, but, well... it’s wrong to say that "backward copying is generally avoided because CPUs are not good at it and it’s therefore slow" when in reality it’s employed to make things faster.
There are certainly cases where forward copying is noticeably faster: https://stackoverflow.com/questions/57137074/movsd-performance-depends-on-arguments
If you end up copying entire cache lines then yes, it really shouldn’t matter which way the copying goes.
Me supposes that a generic copy routine would detect the correct
direction to go in.
But yes, breaking the specification of an established routine such as
memcpy(3) just leads to a world of trouble.
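A rough sketch of what “detecting the correct direction” could look like in C follows. It is not taken from any particular libc; the name `copy_overlap_safe` is invented, and comparing the pointers through `uintptr_t` is an assumption that holds on common flat-address platforms rather than something the C standard promises for unrelated objects.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical overlap-safe byte copy. If the destination starts at or
 * below the source, copying upward never overwrites unread source bytes.
 * If the destination starts above the source, copying downward is the
 * non-destructive direction.
 */
static void *copy_overlap_safe(void *dest, const void *source, size_t count)
{
    unsigned char *d = dest;
    const unsigned char *s = source;
    size_t i;

    assert(count == 0 || (dest != NULL && source != NULL));

    if ((uintptr_t)d <= (uintptr_t)s) {
        for (i = 0; i < count; i++)      /* upward */
            d[i] = s[i];
    } else {
        for (i = count; i > 0; i--)      /* downward */
            d[i - 1] = s[i - 1];
    }
    return dest;
}
```

Shifting the first six bytes of "abcdefgh" up by two positions with this routine yields "ababcdef", and shifting the last six bytes down by two yields "cdefghgh"; a copy that always ran upward would get the first case wrong.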
That glibc thing is an interesting philosophical discussion. The fundamentalists will always say that software not written according to the spec deserves to be broken, no matter how much trouble it might cause. More practically oriented people will argue that breaking things that used to work is not progress and should be avoided.
This fundamentalist believes that when an interface specification has
turned out to be sufficiently unclear, that interface should be
deprecated (not necessarily removed!) in favour of a (likely new)
unambiguous interface.
But that’s me.
You are missing one important fact: not only was memcpy never supposed to be used with objects that overlap, C also included memmove, its evil cousin, which is supposed to be used when they do, in fact, overlap! And it has been there since the very beginning!
Thus it’s not a problem of a "sufficiently unclear" specification, but rather an attempt to use memcpy in a way which is very clearly unsupported!
That being said, the final solution was kinda sane: since GLibC has supported versioning since version 2.0, they simply made memcpy a synonym for memmove when used with old programs (but kept the new implementation for new, recompiled programs).
That way old Flash binaries worked again, while new programs have to be written sensibly to work.
C doesn’t include any routine. You must be confusing it with the
standard library (which, despite often being called ‘libc’, is not
C-specific).
“Unsupported”. Me’ll offer another one: “must be used instead”. Nice ‘n
vague, don’t you think? Manual pages that hint at “wrong” use, but don’t
just specify what the damn routine actually does (especially in case of
simple operations), should be rewritten. In the same vein, me’d very
much argue that memmove(3) itself has an ambiguous and somewhat
unexpected name: one might reasonably expect data moves to be
necessarily destructive to the original (while w/ memmove(3), that is
only the case for the overlapping area, if any).
(Tune in another time for me latest developing pet peeve, “transfer” for
copy.)
Versioned symbols embedded in versioned monolithic libraries (as opposed
to, say, a ‘file per object’ scheme) are a special hell that the
inventor should forever be locked up in.
BTW, the OpenBSD memmove(3) manual page says that it appeared in
4.3BSD-Reno. Do you have contrary information…? (Or just a liberal
idea of “the very beginning” :?)
From the C89 standard: “The memcpy function copies n characters from the object pointed to by s2 into the object pointed to by s1. If copying takes place between objects that overlap, the behavior is undefined.”
Also from the C89 standard: “The memmove function copies n characters from the object pointed to by s2 into the object pointed to by s1. Copying takes place as if the n characters from the object pointed to by s2 are first copied into a temporary array of n characters that does not overlap the objects pointed to by s1 and s2, and then the n characters from the temporary array are copied into the object pointed to by s1.”
Once again, some people understand undefined behavior as “must be broken at any cost” and others as “if it works that way, leave it be”. As I said, it is a philosophical question.
Indeed, the C… `standards’ specify many things not really part of C
(and often at odds with the very premise of C; this started at least as
early as [0]).
But your point is well-taken 🙂
Leaves me to note that even defined behaviour sometimes changes,
especially when there’s no observable change to the caller. OpenBSD
memmove(3), while in itself somewhat convoluted, certainly does *not*
use an intermediate array; instead, it follows the approach of
determining the non-destructive direction to copy in.
[0] https://www.lysator.liu.se/c/dmr-on-noalias.html
The “as if” language is not meant to suggest how things should be implemented, it’s just a shorthand for defining the expected results.
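A small C sketch may help show what the “as if” wording buys. The function `memmove_via_temp` below follows the quoted C89 text literally, going through a temporary array, and the helper `same_result_as_memmove` checks that the result matches whatever the library’s own memmove does. Both names are invented for this example, and the 16-byte test buffer is an arbitrary choice.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Literal reading of the C89 description: copy to a temporary array
 * first, then to the destination. Returns NULL if the temporary array
 * cannot be allocated (a failure mode the real memmove does not have).
 */
static void *memmove_via_temp(void *s1, const void *s2, size_t n)
{
    unsigned char *temp;

    if (n == 0)
        return s1;
    temp = malloc(n);
    if (temp == NULL)
        return NULL;
    memcpy(temp, s2, n);   /* temp never overlaps s2 */
    memcpy(s1, temp, n);   /* temp never overlaps s1 */
    free(temp);
    return s1;
}

/* Returns 1 if the literal version and the library memmove agree. */
static int same_result_as_memmove(size_t dest_off, size_t src_off, size_t n)
{
    unsigned char a[16], b[16];
    size_t i;

    assert(dest_off + n <= sizeof a && src_off + n <= sizeof a);
    for (i = 0; i < sizeof a; i++)
        a[i] = b[i] = (unsigned char)i;

    if (memmove_via_temp(a + dest_off, a + src_off, n) == NULL)
        return 0;
    memmove(b + dest_off, b + src_off, n);
    return memcmp(a, b, sizeof a) == 0;
}
```

The caller cannot tell the two apart by looking at the destination, which is all the standard requires; an implementation is free to pick a copy direction instead of allocating anything.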
You’re right — me plainly read over that.