1865 – Escape sequences are flawed.

Summary: Escape sequences are flawed.

Keywords:
Status:	RESOLVED FIXED
Alias:	None
Product:	D
Classification:	Unclassified
Component:	dmd
Version:	D1 (retired)
Hardware:	x86 Linux
Importance:	P1 critical
Assignee:	Walter Bright
URL:
Depends on:
Blocks:

See Also:
Reported:	2008-02-24 15:32 UTC by Aziz Köksal
Modified:	2014-02-24 15:33 UTC
CC List:	0 users

Description Aziz Köksal 2008-02-24 15:32:06 UTC

The specs state (http://www.digitalmars.com/d/1.0/lex.html):
"Although string literals are defined to be composed of UTF characters, the octal and hex escape sequences allow the insertion of arbitrary binary data."
This holds true for normal string literals (e.g. "abc") but not for escape string literals. For instance:
auto str = \xDB;
pragma(msg, typeof(str).stringof); // Should be char[1u] but prints: char[2u]
auto str2 = "\xDB";
pragma(msg, typeof(str2).stringof); // Prints: char[1u]
static assert(\xDB == "\xDB"); // Should be equal, but aren't.
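One plausible reading of the size mismatch above, offered as an assumption rather than something this report confirms: the escape string \xDB is being treated as code point U+00DB and encoded as UTF-8 (two bytes), while "\xDB" keeps the raw byte. A minimal Python sketch of the two interpretations (Python is used only for illustration; the variable names are hypothetical):

```python
# Two ways a lexer could interpret the escape \xDB.
# Assumption: the char[2u] result comes from UTF-8 encoding U+00DB.
raw_byte = bytes([0xDB])                    # the literal byte 0xDB, as the spec describes
as_code_point = chr(0xDB).encode("utf-8")   # U+00DB encoded as UTF-8

print(len(raw_byte))                # 1
print(len(as_code_point))           # 2
print(raw_byte == as_code_point)    # False
```

If this reading is right, the two literals differ because only one of them bypasses UTF-8 encoding.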
I also found out that octal escape sequences are fundamentally flawed.
The highest possible octal value is 0777 which equals 0x1FF in hex. It seems like dmd doesn't know this.
pragma(msg, '\777'.stringof); // Prints: '\xff'
static assert('\777' == 0x1FF); // Shouldn't fail.
static assert('\777' == 0xFF); // Shouldn't pass.
static assert('\377' == 0xFF); // Passes as they are really equal.
As we can see values from 0400 to 0777 need two bytes to be represented correctly. Therefore, when the lexer encounters string literals like \400 to \777 or "\400" to "\777" then it must use two bytes to encode it into the string value. Example:
char[2] str = \777;
static assert(str[0] == 1 && str[1] == 0xFF);
I think it's appropriate to mark this bug report as critical.
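The arithmetic behind the octal argument can be checked with a short Python sketch. The helper name split_big_endian is hypothetical, and the high-byte-first order is an assumption taken from the str[0] == 1 && str[1] == 0xFF example. The report does not say whether dmd truncates or clamps the value; both give 0xFF for \777, and only truncation is shown here.

```python
# Octal escape values compared with their hex equivalents.
assert 0o777 == 0x1FF == 511   # largest three-digit octal value, needs 9 bits
assert 0o377 == 0xFF == 255    # largest value that fits in one byte

def split_big_endian(value: int) -> bytes:
    """Encode an octal escape value (0..0o777) as two bytes, high byte first."""
    return value.to_bytes(2, "big")

two_bytes = split_big_endian(0o777)   # bytes 0x01 and 0xFF, as in the proposal
truncated = 0o777 & 0xFF              # keeping only the low byte gives 0xFF

print(list(two_bytes))   # [1, 255]
print(hex(truncated))    # 0xff
```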

Comment 1 Aziz Köksal 2008-02-24 16:43:29 UTC

I changed my mind regarding the octal escape sequences. I looked at how Python deals with it and also asked in the #python channel. In Python "\777" also results in "\xFF". I was told that 0ooo and \ooo are two different kinds of things, the first being an integer and the second a character. So please disregard the second part of my original posting.

Comment 2 Janice Caron 2008-02-24 16:48:26 UTC

On 24/02/2008, d-bugmail@puremagic.com <d-bugmail@puremagic.com> wrote:
> The highest possible octal value is 0777 which equals 0x1FF in hex. It seems
> like dmd doesn't know this.
Wait, wait, wait. Shouldn't the highest possible octal value be 0377?
That is, shouldn't we just /disallow/ 0400 to 0777 inclusive?
The whole point is to define a BYTE, after all.

Comment 3 Aziz Köksal 2008-02-25 11:21:35 UTC

(In reply to comment #2)
> On 24/02/2008, d-bugmail@puremagic.com <d-bugmail@puremagic.com> wrote:
> The whole point is to define a BYTE, after all.
Good objection. I think we could compare this to Unicode escape sequences. The compiler complains when you specify values higher than \U0010FFFF (the highest code point). Likewise, the compiler should probably give an error for octal escape sequences higher than \377.
At the moment, it doesn't feel quite right that anything higher than \377 is silently treated as 0xFF. Other languages apparently don't report an error or throw an exception, but I vote that a D compiler should report one.
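The range check proposed here could look roughly like the following Python sketch. The function parse_octal_escape and its error messages are hypothetical, written only to illustrate the idea; this is not dmd's lexer code.

```python
def parse_octal_escape(digits: str) -> int:
    """Return the byte value for an octal escape body such as '377'.

    Hypothetical helper: rejects values above 0o377 instead of
    silently reducing them to 0xFF.
    """
    if not (1 <= len(digits) <= 3) or any(d not in "01234567" for d in digits):
        raise ValueError(f"malformed octal escape: \\{digits}")
    value = int(digits, 8)
    if value > 0o377:
        raise ValueError(f"octal escape \\{digits} is larger than \\377")
    return value

print(parse_octal_escape("377"))   # 255
try:
    parse_octal_escape("400")
except ValueError as err:
    print(err)                     # octal escape \400 is larger than \377
```

Rejecting the value outright mirrors how out-of-range Unicode escapes are handled, at the cost of breaking any source that relied on the silent behaviour.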

Comment 4 Walter Bright 2008-03-07 00:34:36 UTC

Fixed in dmd 1.028 and 2.012.