6505 – Wrong code for expression involving 8 floats, only with -O

D issues are now tracked on GitHub. This Bugzilla instance remains as a read-only archive.
Issue 6505 - Wrong code for expression involving 8 floats, only with -O
Summary: Wrong code for expression involving 8 floats, only with -O
Status: RESOLVED FIXED
Alias: None
Product: D
Classification: Unclassified
Component: dmd (show other issues)
Version: D2
Hardware: Other Windows
: P2 blocker
Assignee: No Owner
URL:
Keywords:
Depends on:
Blocks:
Reported: 2011年08月16日 04:33 UTC by Don
Modified: 2015年06月09日 05:11 UTC (History)
2 users (show)

See Also:


Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this issue.
Description Don 2011年08月16日 04:33:20 UTC
Found this when I removed a reference to std.random from the DMD test suite. This bug is in test22.d. It was originally described in:
http://www.digitalmars.com/d/archives/digitalmars/D/bugs/4766.html
but wasn't reduced correctly -- it has never been fixed.
It only applies when compiling with -O: the result is -nan.
----
real randx()
{
 return 1.2;
}
void test1()
{
 float x10=randx();
 float x11=randx();
 float x20=randx();
 float x21=randx();
 float y10=randx();
 float y11=randx();
 float y20=randx();
 float y21=randx();
 float tmp=(
 x20*x21 + y10*y10 + y10*y11 + y11*y11 + 
 y11*y20 + y20*y20 + y10*y21 + y11*y21 +
 y21*y21);
 assert(tmp > 0);
}
Comment 1 bearophile_hugs 2011年08月16日 06:37:50 UTC
Simplified a little:
double foo() {
 return 1.0;
}
void main() {
 double a = foo();
 double b = foo();
 double x = a*a + a*a + a*a + a*a + a*a + a*a + a*a +
 a*b + a*b;
 assert(x > 0);
}
---------------------------
Asm normal compilation:
_D4test3fooFZd	comdat
		fld1
		ret
__Dmain	comdat
L0:		enter	024h,0
		call	near ptr _D4test3fooFZd
		fstp	qword ptr -018h[EBP]
		call	near ptr _D4test3fooFZd
		fstp	qword ptr -010h[EBP]
		fld	qword ptr -018h[EBP]
		fmul	qword ptr -010h[EBP]
		fld	qword ptr -018h[EBP]
		fmul	qword ptr -010h[EBP]
		fld	qword ptr -018h[EBP]
		fmul	qword ptr -018h[EBP]
		fld	qword ptr -018h[EBP]
		fmul	qword ptr -018h[EBP]
		fld	qword ptr -018h[EBP]
		fmul	qword ptr -018h[EBP]
		fld	qword ptr -018h[EBP]
		fmul	qword ptr -018h[EBP]
		fld	qword ptr -018h[EBP]
		fmul	qword ptr -018h[EBP]
		fld	qword ptr -018h[EBP]
		fmul	qword ptr -018h[EBP]
		fdecstp
		fstp	qword ptr -024h[EBP]
		fld	qword ptr -018h[EBP]
		fmul	qword ptr -018h[EBP]
		faddp	ST(1),ST
		faddp	ST(1),ST
		faddp	ST(1),ST
		faddp	ST(1),ST
		faddp	ST(1),ST
		faddp	ST(1),ST
		faddp	ST(1),ST
		fld	qword ptr -024h[EBP]
		faddp	ST(1),ST
		fstp	qword ptr -8[EBP]
		fld	qword ptr -8[EBP]
		ftst
		fstsw	AX
		sahf
		fstp	ST
		ja	L7B
		mov	EAX,9
		call	near ptr _D4test8__assertFiZv
L7B:	xor	EAX,EAX
		leave
		ret
----------------------------
Asm compilation with -O:
_D4test3fooFZd	comdat
		fld	qword ptr FLAT:_DATA[00h]
		ret
__Dmain	comdat
L0:		sub	ESP,034h
		call	near ptr _D4test3fooFZd
		fstp	qword ptr 0Ch[ESP]
		call	near ptr _D4test3fooFZd
		fld	qword ptr 0Ch[ESP]
		fld	qword ptr 0Ch[ESP]
		fxch	ST2
		fstp	qword ptr 014h[ESP]
		fmul	qword ptr 014h[ESP]
		fxch	ST1
		fld	qword ptr 0Ch[ESP]
		fxch	ST1
		fmul	qword ptr 014h[ESP]
		fxch	ST1
		fmul	ST,ST(0)
		fld	qword ptr 0Ch[ESP]
		fmul	ST,ST(0)
		fld	qword ptr 0Ch[ESP]
		fmul	ST,ST(0)
		fld	qword ptr 0Ch[ESP]
		fmul	ST,ST(0)
		fld	qword ptr 0Ch[ESP]
		fmul	ST,ST(0)
		fld	qword ptr 0Ch[ESP]
		fmul	ST,ST(0)
		fdecstp
		fld	qword ptr 0Ch[ESP]
		fxch	ST1
		fstp	qword ptr [ESP]
		fmul	ST,ST(0)
		faddp	ST(1),ST
		faddp	ST(1),ST
		faddp	ST(1),ST
		faddp	ST(1),ST
		faddp	ST(1),ST
		faddp	ST(1),ST
		faddp	ST(1),ST
		fld	qword ptr [ESP]
		faddp	ST(1),ST
		ftst
		fstsw	AX
		fstp	ST
		sahf
		ja	L84
		mov	EAX,9
		call	near ptr _D4test8__assertFiZv
L84		add	ESP,034h
		xor	EAX,EAX
		ret
------------------------
Comment 2 Don 2011年08月23日 01:52:36 UTC
Thanks bearophile, that reduction is very helpful.
The code loads 8 values on the x87 stack. The eighth one would cause an overflow, so in cg87.c, push87(), it inserts:
fdecstp;
fstp [EBP - 24];
The next line is:
fld [EBP - 18];
The code passes to the scheduler, where there's a bug: it converts the code to:
fdecstp;
fld [EBP - 18];
fxch ST(1);
fstp [EBP - 24];
This is an x87 stack overflow, since the load happens before space has been made in the save. The x87 returns a NaN.
Comment 3 Don 2011年08月23日 12:19:44 UTC
Here's the problem. The scheduler keeps track of the number of
used x87 registers via the "fpustackused" variable.
Each instruction has a "fpuadjust" value which says how many x87
registers it uses.
The problem is CALL instructions. They have fpuadjust = 0.
But, a function returning float returns it in ST(0) --- so fpuadjust
should be +1.
And if it's a complex result, fpuadjust should be +2.
I think it should be possible to put an assert in cgsched.c, line 2491:
 if (tbl[j])
 {
 fpu += tbl[j]->fpuadjust;
+ assert(fpu >= 0);
 if (fpu >= 8) // if FPU stack overflow
but this would fail at the moment, because the total goes negative
after a call followed by an FSTP.
So it needs to distinguish between each type of call, depending on how
it affects the FPU stack.
I think the bug lies in funccall() in cod1.c; it should be setting
something in 'code' to record how many x87 registers are returned.
Then, there'll be another minor problem: the number of FPU values CAN
reach 8. This is because of the code in push87() in cg87.c, which does
fdecstp; fstp; when the stack is full. So the condition above will
need to change to:
 if (fpu > 8) // if FPU stack overflow
This is too invasive a change for me to make and test.


AltStyle によって変換されたページ (->オリジナル) /