Compilers deal with source code as strings. In C++, for example, when a compiler encounters a statement like unsigned char x = 150;, it knows from the type's limits that an unsigned char must be in the range 0 to 255.
My question is: while the number 150 is still a string, what algorithm does the compiler use to compare the digit sequence (150 in this case) against the type's limits?
I made a simple algorithm that does this for type int, for decimal, hexadecimal, octal, and binary, but I don't think compilers do anything like it to detect overflow in numbers.
The algorithm I made is coded in C++:
#include <cstring> // for strlen

typedef signed char int8;
typedef signed int int32;

#define DEC 0
#define HEX 1
#define OCT 2
#define BIN 3

bool isOverflow(const char* value, int32 base)
{
    // digit strings of the maximum and minimum int in each base
    static const char* max_numbers[4][2] =
    {
        // INT_MAX          magnitude of INT_MIN
        { "2147483647", "2147483648" },     // decimal
        { "7fffffff", "80000000" },         // hexadecimal
        { "17777777777", "20000000000" },   // octal
        { "01111111111111111111111111111111", "10000000000000000000000000000000" } // binary
    };
    // lengths of the strings in the max_numbers array
    static const int32 number_sizes[] = { 10, 8, 11, 32 };
    // input string length
    int32 str_len = strlen(value);
    // does a sign mark exist in the input string?
    int32 signExist = ((base == DEC || base == OCT) && *value == '-');
    // index of the first non-zero digit in the input number
    int32 non_zero_index = signExist;
    // locate the first non-zero digit (compare against the character '0', not the value 0)
    while (non_zero_index < str_len && value[non_zero_index] == '0') non_zero_index++;
    // if non_zero_index equals the length, all digits are zero
    if (non_zero_index == str_len) return false;
    // number of digits that actually represent the number
    int32 diff = str_len - non_zero_index;
    // fewer digits than the limit string: overflow cannot happen
    if (diff < number_sizes[base]) return false;
    // more digits than the limit string: overflow must happen
    if (diff > number_sizes[base]) return true;
    // current digit in the input and limit strings
    int8 left1 = 0, left2 = 0;
    // same number of digits: compare digit by digit from left to right
    for (int32 i = 0; non_zero_index < str_len; non_zero_index++, i++)
    {
        // get the input digit
        left1 = value[non_zero_index];
        // get the matching limit digit (index by base first, then by sign)
        left2 = max_numbers[base][signExist][i];
        // at the first mismatch, overflow occurs iff the input digit is greater
        if (left1 != left2) return left1 > left2;
    }
    // every digit matched the limit exactly: no overflow
    return false;
}
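For reference, a small driver showing how the routine behaves on a few boundary cases:

#include <cstdio>

int main()
{
    std::printf("%d\n", isOverflow("2147483647", DEC));  // 0: exactly INT_MAX
    std::printf("%d\n", isOverflow("2147483648", DEC));  // 1: one past INT_MAX
    std::printf("%d\n", isOverflow("-2147483648", DEC)); // 0: exactly INT_MIN
    std::printf("%d\n", isOverflow("-2147483649", DEC)); // 1: one below INT_MIN
    std::printf("%d\n", isOverflow("0000150", DEC));     // 0: leading zeros are skipped
    return 0;
}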
This algorithm can be generalized to work with all integer types, but for floating point I would have to make a new one that works with the IEEE floating-point representation.
I think compilers use a more efficient algorithm to detect overflow than mine, don't you?
-
Comparing numbers in string form is not an efficient method for most computers; they prefer their numbers not in text form. In general, most applications convert numerical text into internal numbers, then process the internal numbers. Processors like numbers in internal format and are especially good at processing them in this fashion. – Thomas Matthews, Jun 1, 2011
-
The lexer has detected a number, so it knows its type from its suffix; now it stores the literal form and converts it to its numeric form. My question is: what type of storage will it save the number in, and how does it detect whether the converted number matches the one in literal form? – Muhammad, Jun 1, 2011
5 Answers
Compilers handle it pretty much the easiest possible way: they convert the number to an integer or float as appropriate. There's no law that says the compiler can't convert from strings to some other representation as appropriate.
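For instance, a conversion routine can detect overflow while it accumulates the value, with no string comparison at all. A minimal sketch for the magnitude of a decimal int literal (not claimed to be any particular compiler's code):

#include <cctype>
#include <climits>
#include <stdexcept>
#include <string>

// Convert a decimal digit string to int, throwing if the value
// would exceed INT_MAX. Sign and base handling are left out; real
// compilers fold those into their literal-parsing code.
int parseDecimal(const std::string& s)
{
    int result = 0;
    for (char c : s)
    {
        if (!std::isdigit((unsigned char)c))
            throw std::invalid_argument("not a digit");
        int digit = c - '0';
        // Check before the multiply/add:
        // result * 10 + digit > INT_MAX  <=>  result > (INT_MAX - digit) / 10
        if (result > (INT_MAX - digit) / 10)
            throw std::out_of_range("integer literal too large");
        result = result * 10 + digit;
    }
    return result;
}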
But now, consider your original problem: what if you took the digits and just built routines to treat them as numbers? Say, for example, an algorithm that could take
6+5
and compute the sum as a two-digit string 11? Extend that to other operations and you could compute whether 32769 is greater than 32768 directly.
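A toy version of that idea for non-negative decimal strings (a hypothetical helper, just to make the suggestion concrete):

#include <algorithm>
#include <string>

// Add two non-negative decimal digit strings, e.g. "6" + "5" -> "11".
std::string addDigits(const std::string& a, const std::string& b)
{
    std::string sum;
    int carry = 0;
    int i = (int)a.size() - 1, j = (int)b.size() - 1;
    // walk both strings from the least significant digit
    while (i >= 0 || j >= 0 || carry)
    {
        int d = carry;
        if (i >= 0) d += a[i--] - '0';
        if (j >= 0) d += b[j--] - '0';
        sum.push_back((char)('0' + d % 10));
        carry = d / 10;
    }
    std::reverse(sum.begin(), sum.end());
    return sum;
}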
2 Comments
-
int is assumed as long as there is no suffix, so if I perform INT_MAX + INT_MAX, what storage will the compiler use to hold the result before it truncates it to the destination type's limits? INT_MAX + INT_MAX > INT_MAX.
-
There are lots of options, and some of the decisions may depend on the underlying hardware; for example, is there a way to detect overflow? You could, if you insist, use a BigNum implementation of some sort, trading space and performance for a guarantee of no realistic chance of overflow. Also, in C++ you aren't guaranteed the compiler will even detect overflow -- one way the compiler can handle it is by leaving the responsibility to you.

It seems simplest for the compiler to convert the string representation into an integer in one step, and then compare against the upper and lower bounds of the type in a secondary step.
I can't imagine why it would be better to compare strings.
For floats, the problem is harder due to precision and rounding.
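For illustration, a sketch of that two-step idea, parsing into long long as the wider working type (this assumes long long is wider than int, which holds on common platforms; digit validation is omitted):

#include <climits>
#include <string>

// Parse a decimal digit string into a wider type, then range-check against int.
// 'negative' says whether a leading minus sign was seen.
bool fitsInInt(const std::string& digits, bool negative)
{
    long long v = 0;
    for (char c : digits)
    {
        v = v * 10 + (c - '0');
        if (v > (long long)INT_MAX + 1) break; // already beyond any int
    }
    return negative ? (-v >= INT_MIN) : (v <= INT_MAX);
}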
I'm not sure what particular algorithms most compilers employ to do this, but here are a few options that could work:
- The compiler could try using an existing library (for example, in C++, a stringstream) to convert the string into a number of the appropriate type. This could then be used to check for errors (a small sketch of this option follows the list).
- The compiler could convert the string into a very high-precision number format (for example, a 128-bit integer) and then check, whenever an assignment is made from a numeric literal to a primitive type, whether the value could fit in that range without a cast.
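A minimal sketch of the first option, using only standard-library machinery (this illustrates the mechanism the answer mentions, not any compiler's internals):

#include <sstream>
#include <string>

// Returns true if the whole string converts cleanly into an int.
// On out-of-range input, operator>> sets failbit (since C++11).
bool convertsToInt(const std::string& text, int& out)
{
    std::istringstream in(text);
    in >> out;
    // conversion succeeded and the entire string was consumed
    return !in.fail() && in.eof();
}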
Seeing that compilers will have to convert to the integral/numeric type anyway, they can just as well let their atoi, atol, atof functions raise an error when the destination capacity gets exceeded.
There is no need to operate on strings beforehand, and convert in a separate step.
Most likely, I'd think, compilers will convert to integral types directly in their (highly optimized) parser's semantic actions.
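For what it's worth, atoi itself gives no error indication, but its standard relative strtol does report out-of-range input through errno, which is roughly the mechanism this answer describes. A minimal sketch (the output depends on whether long is 32 or 64 bits on the platform):

#include <cerrno>
#include <climits>
#include <cstdio>
#include <cstdlib>

int main()
{
    errno = 0;
    char* end = nullptr;
    long v = std::strtol("2147483648", &end, 10); // one past INT_MAX
    if (errno == ERANGE)
        std::printf("out of range for long\n");       // where long is 32-bit
    else if (v > INT_MAX || v < INT_MIN)
        std::printf("fits in long, overflows int\n"); // where long is 64-bit
    else
        std::printf("value: %ld\n", v);
    return 0;
}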
Comments
In most compiler theory, the text of a program (translation unit) is converted into tokens and values. For example, the text "150" would be converted into a token of constant integer with a value of 150. This is, of course, after the preprocessor has run.
The compiler then begins the process of syntax and semantic checking. So an assignment statement is evaluated for syntax (correct spelling and format), then checked for semantics.
The compiler can either complain about a value that is out of range (such as -150 for unsigned char) or apply some transformation. In the case of -150, the value would be transformed into an 8-bit value (the bit pattern is reinterpreted: the most significant bit, which indicated negativity, now contributes the value 128). I am not a language lawyer, so I don't know exactly how much freedom the compiler has in this respect, nor whether a warning is required.
In summary, the compiler has some freedom when evaluating statements and checking their semantics. All text is converted into an internal representation of tokens and values (a more compact data structure). Checking whether a constant integer literal is within range for an assignment takes place during the semantic stage of compilation. The semantics are decided by the language standard or by company policy; some are turned into compiler options and left to the programmer.
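As a concrete illustration of such a transformation: in C++, conversion to an unsigned type is defined to wrap modulo 2^N, so an out-of-range initializer becomes a well-defined value (most compilers warn about it):

#include <cstdio>

int main()
{
    unsigned char x = -150; // wraps modulo 256: -150 + 256 = 106
    std::printf("%d\n", x); // prints 106
    return 0;
}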