5717 – 1.067 regression: appending Unicode char to string broken

D issues are now tracked on GitHub. This Bugzilla instance remains as a read-only archive.
Summary: 1.067 regression: appending Unicode char to string broken
Status: RESOLVED FIXED
Alias: None
Product: D
Classification: Unclassified
Component: dmd
Version: D1 (retired)
Hardware: x86 Windows
Importance: P2 regression
Assignee: No Owner
URL:
Keywords: patch, wrong-code
Depends on:
Blocks:
Reported: 2011-03-07 17:18 UTC by Vladimir Panteleev
Modified: 2011-03-11 08:35 UTC

Description Vladimir Panteleev 2011-03-07 17:18:53 UTC
void main()
{
	string s, s2; 
	s = "Привет";
	foreach(c; s)
		s2 ~= c;
	assert(s == s2);
}
DMD now seems to consider each individual char a whole code point (as if it was automatically promoted to dchar).
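To make the symptom concrete, here is a minimal C++ sketch (not D, and not code from dmd or druntime) of what happens when each UTF-8 code unit is treated as a whole code point and re-encoded on append. The helper names `appendCodePoint`, `appendAsCodeUnits`, and `appendAsCodePoints` are hypothetical and only model the behaviour described above.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: encode one code point as UTF-8 and append it.
// This models what an "append dchar to char[]" operation has to do.
// It does not validate surrogates or values above 0x10FFFF.
void appendCodePoint(std::string& out, std::uint32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Expected behaviour: each code unit is copied unchanged.
std::string appendAsCodeUnits(const std::string& s) {
    std::string out;
    for (char c : s) {
        out += c;
    }
    return out;
}

// Behaviour described in the report (assumed model): each code unit is
// promoted to a code point and then re-encoded, so every byte >= 0x80
// turns into two bytes.
std::string appendAsCodePoints(const std::string& s) {
    std::string out;
    for (unsigned char c : s) {
        appendCodePoint(out, c);
    }
    return out;
}
```

For the two-byte input `"\xD0\x9F"` (the UTF-8 encoding of П), `appendAsCodeUnits` returns the same two bytes, while `appendAsCodePoints` returns the four bytes `C3 90 C2 9F`, which is why a comparison like `assert(s == s2)` fails. Pure ASCII input is unaffected in this model.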
Comment 1 Sohgo Takeuchi 2011-03-09 04:35:00 UTC
The same problem occurs on FreeBSD 8.2 with DMD 1.067.
It does not occur with DMD 1.066.
Comment 2 Don 2011-03-10 01:14:55 UTC
I think this is a foreach problem.
Probably triggered by the fix to bug 4389.
Comment 3 Vladimir Panteleev 2011-03-10 01:17:37 UTC
It doesn't look like a foreach problem. This fails too:
void main()
{
	string s, s2;
	s = "Привет";
	for (int i=0; i<s.length; i++)
		s2 ~= s[i];
	assert(s == s2);
}
Comment 4 Don 2011-03-10 04:26:27 UTC
(In reply to comment #3)
> It doesn't look like a foreach problem. This fails too:
Hmm. You're right. And yet it works fine on D2. 
It's inserting a call to _d_arrayappendcd, which means the append has been changed into char[] ~ dchar.
Comment 5 Don 2011-03-10 07:13:37 UTC
It was indeed caused by the fix to bug 4389, which wasn't tight enough.
s ~= c shouldn't turn c into a dchar if c has the same type as the elements of s (i.e., char[] ~= char should go through unaltered). That leaves wchar[] ~= char, which I think is inevitably a mess if c is outside the ASCII range.
expression.c, line 8593. CatAssignExp::semantic()
     {   // Append array
         e2 = e2->castTo(sc, e1->type);
         type = e1->type;
         e = this;
     }
     else if (tb1->ty == Tarray &&
         (tb1next->ty == Tchar || tb1next->ty == Twchar) &&
+        e2->type->ty != tb1next->ty &&
         e2->implicitConvTo(Type::tdchar)
        )
     {   // Append dchar to char[] or wchar[]
         e2 = e2->castTo(sc, Type::tdchar);
         type = e1->type;
         e = this;
         /* Do not allow appending wchar to char[] because if wchar happens
          * to be a surrogate pair, nothing good can result.
          */
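The effect of the added condition can be sketched as a small stand-alone C++ model. This is a simplification under stated assumptions, not the dmd source: `CharTy` and `takesDcharAppendBranch` are hypothetical names, the right-hand side is assumed to be a character type (so the `implicitConvTo(Type::tdchar)` check is taken to succeed), and the other branches of `CatAssignExp::semantic()` are not modelled.

```cpp
// Hypothetical stand-in for the character type tags (Tchar, Twchar, Tdchar).
enum class CharTy { Char, WChar, DChar };

// Returns true when the "Append dchar to char[] or wchar[]" branch would be
// taken for `elem[] ~= rhs`.
// Assumption: rhs is a character type, so it implicitly converts to dchar.
// `patched` selects between the condition before and after the patch.
bool takesDcharAppendBranch(CharTy elem, CharTy rhs, bool patched) {
    // The branch only concerns char[] and wchar[] on the left-hand side.
    bool narrowArray = (elem == CharTy::Char || elem == CharTy::WChar);
    if (!narrowArray) {
        return false;
    }
    // Added line in the patch: skip the branch when the appended value
    // already has the array's element type (e.g. char[] ~= char).
    if (patched && rhs == elem) {
        return false;
    }
    return true;
}
```

In this model, `char[] ~= char` enters the dchar branch before the patch and skips it afterwards, while `char[] ~= dchar` and `wchar[] ~= char` still enter it, matching the remaining case Don describes as messy.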
Comment 6 Sohgo Takeuchi 2011-03-10 19:27:38 UTC
(In reply to comment #5)
I've tried Don's patch, and it works well in my environment.
That's great.
Thank you.
Comment 8 Vladimir Panteleev 2011-03-11 08:35:49 UTC
Thanks - not sure what the second commit has to do with it, though.

