6505 – Wrong code for expression involving 8 floats, only with -O

D issues are now tracked on GitHub. This Bugzilla instance remains as a read-only archive.

Issue 6505 - Wrong code for expression involving 8 floats, only with -O

Summary: Wrong code for expression involving 8 floats, only with -O

Keywords:
Status:	RESOLVED FIXED
Alias:	None
Product:	D
Classification:	Unclassified
Component:	dmd (show other issues)
Version:	D2
Hardware:	Other Windows
Importance :	P2 blocker
Assignee:	No Owner
URL:
Depends on:
Blocks:

See Also:
Reported:	2011年08月16日 04:33 UTC by Don
Modified:	2015年06月09日 05:11 UTC (History)
CC List:	2 users (show)

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this issue.

Description Don 2011年08月16日 04:33:20 UTC

Found this when I removed a reference to std.random from the DMD test suite. This bug is in test22.d. It was originally described in:
http://www.digitalmars.com/d/archives/digitalmars/D/bugs/4766.html
but wasn't reduced correctly -- it has never been fixed.
It only applies when compiling with -O: the result is -nan.
----
real randx()
{
 return 1.2;
}
void test1()
{
 float x10=randx();
 float x11=randx();
 float x20=randx();
 float x21=randx();
 float y10=randx();
 float y11=randx();
 float y20=randx();
 float y21=randx();
 float tmp=(
 x20*x21 + y10*y10 + y10*y11 + y11*y11 + 
 y11*y20 + y20*y20 + y10*y21 + y11*y21 +
 y21*y21);
 assert(tmp > 0);
}

Comment 1 bearophile_hugs 2011年08月16日 06:37:50 UTC

Simplified a little:
double foo() {
 return 1.0;
}
void main() {
 double a = foo();
 double b = foo();
 double x = a*a + a*a + a*a + a*a + a*a + a*a + a*a +
 a*b + a*b;
 assert(x > 0);
}
---------------------------
Asm normal compilation:
_D4test3fooFZd	comdat
		fld1
		ret
__Dmain	comdat
L0:		enter	024h,0
		call	near ptr _D4test3fooFZd
		fstp	qword ptr -018h[EBP]
		call	near ptr _D4test3fooFZd
		fstp	qword ptr -010h[EBP]
		fld	qword ptr -018h[EBP]
		fmul	qword ptr -010h[EBP]
		fld	qword ptr -018h[EBP]
		fmul	qword ptr -010h[EBP]
		fld	qword ptr -018h[EBP]
		fmul	qword ptr -018h[EBP]
		fld	qword ptr -018h[EBP]
		fmul	qword ptr -018h[EBP]
		fld	qword ptr -018h[EBP]
		fmul	qword ptr -018h[EBP]
		fld	qword ptr -018h[EBP]
		fmul	qword ptr -018h[EBP]
		fld	qword ptr -018h[EBP]
		fmul	qword ptr -018h[EBP]
		fld	qword ptr -018h[EBP]
		fmul	qword ptr -018h[EBP]
		fdecstp
		fstp	qword ptr -024h[EBP]
		fld	qword ptr -018h[EBP]
		fmul	qword ptr -018h[EBP]
		faddp	ST(1),ST
		faddp	ST(1),ST
		faddp	ST(1),ST
		faddp	ST(1),ST
		faddp	ST(1),ST
		faddp	ST(1),ST
		faddp	ST(1),ST
		fld	qword ptr -024h[EBP]
		faddp	ST(1),ST
		fstp	qword ptr -8[EBP]
		fld	qword ptr -8[EBP]
		ftst
		fstsw	AX
		sahf
		fstp	ST
		ja	L7B
		mov	EAX,9
		call	near ptr _D4test8__assertFiZv
L7B:	xor	EAX,EAX
		leave
		ret
----------------------------
Asm compilation with -O:
_D4test3fooFZd	comdat
		fld	qword ptr FLAT:_DATA[00h]
		ret
__Dmain	comdat
L0:		sub	ESP,034h
		call	near ptr _D4test3fooFZd
		fstp	qword ptr 0Ch[ESP]
		call	near ptr _D4test3fooFZd
		fld	qword ptr 0Ch[ESP]
		fld	qword ptr 0Ch[ESP]
		fxch	ST2
		fstp	qword ptr 014h[ESP]
		fmul	qword ptr 014h[ESP]
		fxch	ST1
		fld	qword ptr 0Ch[ESP]
		fxch	ST1
		fmul	qword ptr 014h[ESP]
		fxch	ST1
		fmul	ST,ST(0)
		fld	qword ptr 0Ch[ESP]
		fmul	ST,ST(0)
		fld	qword ptr 0Ch[ESP]
		fmul	ST,ST(0)
		fld	qword ptr 0Ch[ESP]
		fmul	ST,ST(0)
		fld	qword ptr 0Ch[ESP]
		fmul	ST,ST(0)
		fld	qword ptr 0Ch[ESP]
		fmul	ST,ST(0)
		fdecstp
		fld	qword ptr 0Ch[ESP]
		fxch	ST1
		fstp	qword ptr [ESP]
		fmul	ST,ST(0)
		faddp	ST(1),ST
		faddp	ST(1),ST
		faddp	ST(1),ST
		faddp	ST(1),ST
		faddp	ST(1),ST
		faddp	ST(1),ST
		faddp	ST(1),ST
		fld	qword ptr [ESP]
		faddp	ST(1),ST
		ftst
		fstsw	AX
		fstp	ST
		sahf
		ja	L84
		mov	EAX,9
		call	near ptr _D4test8__assertFiZv
L84		add	ESP,034h
		xor	EAX,EAX
		ret
------------------------

Comment 2 Don 2011年08月23日 01:52:36 UTC

Thanks bearophile, that reduction is very helpful.
The code loads 8 values on the x87 stack. The eighth one would cause an overflow, so in cg87.c, push87(), it inserts:
fdecstp;
fstp [EBP - 24];
The next line is:
fld [EBP - 18];
The code passes to the scheduler, where there's a bug: it converts the code to:
fdecstp;
fld [EBP - 18];
fxch ST(1);
fstp [EBP - 24];
This is an x87 stack overflow, since the load happens before space has been made in the save. The x87 returns a NaN.

Comment 3 Don 2011年08月23日 12:19:44 UTC

Here's the problem. The scheduler keeps track of the number of
used x87 registers via the "fpustackused" variable.
Each instruction has a "fpuadjust" value which says how many x87
registers it uses.
The problem is CALL instructions. They have fpuadjust = 0.
But, a function returning float returns it in ST(0) --- so fpuadjust
should be +1.
And if it's a complex result, fpuadjust should be +2.
I think it should be possible to put an assert in cgsched.c, line 2491:
 if (tbl[j])
 {
 fpu += tbl[j]->fpuadjust;
+ assert(fpu >= 0);
 if (fpu >= 8) // if FPU stack overflow
but this would fail at the moment, because the total goes negative
after a call followed by an FSTP.
So it needs to distinguish between each type of call, depending on how
it affects the FPU stack.
I think the bug lies in funccall() in cod1.c; it should be setting
something in 'code' to record how many x87 registers are returned.
Then, there'll be another minor problem: the number of FPU values CAN
reach 8. This is because of the code in push87() in cg87.c, which does
fdecstp; fstp; when the stack is full. So the condition above will
need to change to:
 if (fpu > 8) // if FPU stack overflow
This is too invasive a change for me to make and test.

Comment 4 Walter Bright 2011年08月23日 15:23:43 UTC

https://github.com/D-Programming-Language/dmd/commit/628640f45bbc2bed3e84e835e19cf03ee2c4eafe
https://github.com/D-Programming-Language/dmd/commit/a92f6bf42e547c6978e4e317b4feb2c6610cf518