Converting a string to an Integer in C

Question 1

I'm learning C and one of the questions I've been asked is converting a string to an integer. The code I've written supports converting from a string in any base up to 16/hex to an integer. There is no over/underflow checking and it does not support lowercase hex.

int ASCIIToInteger(char *x, int base)
{
 //Each element is the ASCII for it's index. 
 int ASCIICodes[] = { '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'A', 'B', 'C', 'D', 'E', 'F' }; _Bool negative = 0;
 int count = 0, output = 0; 
 if (x[0] == '-')
 {
 negative = 1;
 count = 1;
 }
 else if (x[0] == '+')
 {
 count = 1;
 }
 do
 {
 for (int i = 0; i < base; i++)
 { 
 if (ASCIICodes[i] == (int)x[count])
 {
 output = output * base + i;
 break;
 }
 }
 count++;
 } while (x[count] != 0);
 if (negative == 1)
 {
 return ~output + 1;
 }
 return output;
}

Can I please get some advice regarding any possible issues or "rookie mistakes" I may have made?

Question 2

Is this intended to be reinventing the strtol (2) function?

Question 3

@Mast I wasn't aware of the function at the time but looking at the description I guess it is.

Question 4

There is also atoi (string to integer) and atof (string to double) for plain old string-to-numeric conversion.

Question 5

for readability and understandability by us humans, please follow the axiom: only one statement per line and (at most) one variable declaration per statement

Question 6

Consider writing unit tests for a function like this. There are a lot of special cases (no digits, too many digits, non-digits, leading zeroes, etc.) that aren't handled.

Question 7

Avoid mysterious bit-shifting

return ~output + 1;

Is the same as:

return - (output + 1) + 1;

And as:

return - output;

But the last one is way more obvious than the first one.

Do not hide variable declarations

int ASCIICodes[] = { '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'A', 'B', 'C', 'D', 'E', 'F' }; _Bool negative = 0;

If I look at the above line, the least thing in my mind is that a variable is defined after that list.

If a user has a small screen or a big font size he will be very puzzled when he will see you use negative without seeing its definition.

Just use a newline, they are free:

int ASCIICodes[] = { '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'A', 'B', 'C', 'D', 'E', 'F' };
_Bool negative = 0;

Make variables as local as possible

I used x++ to express that I will skip the first character:

if (x[0] == '-')
{
 negative = true;
 x++;
}
else if (x[0] == '+')
{
 x++;
}

This allows me to make count local.

Using for loops when practical

The outer of your loops is a do while, but looking around I found all three components of a for loop, just shuffled into the code.

A for loop will give it more organization:

for (int count = 0; x[count] != 0; count++)
{
 for (int i = 0; i < base; i++)
 { 
 if (ASCIICodes[i] == x[count])
 {
 output = output * base + i;
 break;
 }
 }
}

Simplify the initial conditional

Assigning a boolean to negative directly and using a || conditional looks shorter and simpler:

We go from:

_Bool negative = 0;
int count = 0, output = 0; 
if (x[0] == '-')
{
 negative = 1;
 count = 1;
}
else if (x[0] == '+')
{
 count = 1;
}

to:

bool negative = x[0] == '-';
if (x[0] == '-' || x[0] == '+')
{
 x++;
}

While preserving identical functionality.

`it's` vs `its`

... How to choose between it's and its?

This is actually really easy, do you mean "it is" or not?

Use it's when you mean it is.

// Each element is the ASCII for it's index.

Here you should write its because you mean possession, intended as a property of the element.

Saying that:

// Each element is the ASCII for it is index.

Makes no sense.

So the correct comment will be:

// Each element is the ASCII for its index.

Question 8

I love the grammar police in this answer. I get sick of reading that mistake over and over in forums, texts, IMs, etc. Now I know it's in one less program, thanks to you!

Question 9

the other reason to avoid "mysterious bit shifting" is that (AFAICR) the use of two's complement representation of negative numbers is not actually mandated by the standard.

Question 10

@Alnitak kind of... non-two-complement machines are pretty arcane -> stackoverflow.com/questions/12276957/…

Question 11

Arcane, but legal ;) In any event it's better just to let the compile do the right thing and just express negation using the operator intended for that purpose.

Question 12

"it's" is also a contraction of "it has" -- it's been fun

Question 13

Write one statement per line

Avoid multiple statements on the same line, for example instead of:

int ASCIICodes[] = { '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'A', 'B', 'C', 'D', 'E', 'F' }; _Bool negative = 0;

Break that up to multiple lines:

int ASCIICodes[] = { '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'A', 'B', 'C', 'D', 'E', 'F' };
_Bool negative = 0;

This is because it's easier to read from top to bottom. If there are statements at the end of lines, that can be very distracting.

Eliminate pointless cast

No need to cast to int here:

if (ASCIICodes[i] == (int)x[count])

Simplify `ASCIICodes`

A simpler way to define ASCIICodes:

char ASCIICodes[] = "0123456789ABCDEF";

Simplify using a ternary operator

Instead of this:

if (negative)
{
 return ~output + 1;
}
return output;

You could simplify using the ternary operator:

return negative ? -output : output;

Or perhaps instead of a bool negative, you could use an int sign:

int sign = 1;
int count = 0, output = 0; 
if (x[0] == '-')
{
 sign = -1;
 count = 1;
}
// ....
return sign * output;

Use `bool` from `stdbool`

Although using _Bool is ok, it would be a good idea to #include <stdbool.h>, which defines bool (expands to _Bool) and true and false, for example:

#include <stdbool.h>
int ASCIIToInteger(char *x, int base)
{
 // ....
 bool negative = false;
 // ...
 if (x[0] == '-')
 {
 negative = true;
 count = 1;
 }
 // ...
 if (negative)
 {
 return ~output + 1;
 }

Question 14

You don't check for non-number characters. The sign is in the while-loop silently ignored, as any other symbol. What is the number of "HELLO" in hex? 14.

Question 15

I'm surprised that no-one has pointed out the O(m * base) complexity caused by the inner loop that performs a linear scan over the (unnecessary) ASCIICodes array.

This version of the outer loop removes the inner loop, supports lower cased characters, and also breaks the loop as soon as any illegal character is seen.

while ((n = x[count++])) {
 int ch = toupper((int)n);
 if ((ch >= '0' && ch <= '9') || (ch >= 'A' && ch <= 'F')) {
 int digit = ch - '0';
 if (digit > 9) {
 digit = 10 + ch - 'A'; // convert 'A' to 10, etc
 }
 if (digit < base) {
 output = output * base + digit;
 } else {
 break;
 }
 } else {
 break;
 }
}

Question 16

Accept both cases by folding input through toupper().

#include <ctype.h>
//...
 if (ASCIICodes[i] == toupper(x[count]))

But you can also define the set as a string and use strchr to do the searching.

Question 17

Note: (int) in toupper((int)x[count]) serves no purpose.

Question 18

True, but if toupper is a macro, this will suppress the occasional warning about indexing with a char type.

Question 19

The warning about "indexing with a char type" should not be suppressed by casting to int when using toupper(). That leads to UB when char is signed. Better to use toupper((unsigned char)x[count]). C only defines toupper() for values in the unsigned char range and EOF.

Question 20

Good point. Thanks. I have removed the cast from my answer, and will be more cautious about this in future.

Question 21

My version:

#include <ctype.h>
int atoib(const char *x, int base)
{
 unsigned s = 0, v = 0;
 unsigned char c = *x++;
 if (c == '-')
 s++, c = *x++;
 else if (c == '+')
 c = *x++;
 if (base <= 10) {
 if (base >= 2) {
 unsigned M = '0' + base - 1;
 while (c >= '0' && c <= M) {
 v = v * base + c - '0';
 c = *x++;
 }
 }
 } else {
 if (base <= ('Z' - 'A' + 11)) {
 unsigned M = 'A' + base - 11;
 for (;;) {
 if (c >= '0' && c <= '9') {
 v = v * base + c - '0';
 } else {
 c = toupper(c);
 if (c >= 'A' && c <= M)
 v = v * base + c - 'A' + 10;
 else
 break;
 }
 c = *x++;
 }
 }
 }
 return (v ^ -s) + s;
}

Key points:

allow base values greater than 16;
check for valid base values (from 2 up to 36);
allow lowercase and uppercase digits (from a|A up to z|Z);
bail out on digits invalid for the given base and any non-digit;
avoid terminating null checks, the digit validation does it for free;
use arithmetic instead of lookup table to find the digit value (by the way, if you want to use a lookup table, then define it as static, otherwise it will be created on the stack and initialised on every invocation of your function);
the funny return expression saves one branch for two's-complement integers, if you are afraid of ever running on a non-two's-complement platform or just strive for clarity, then do return s ? -v : v; instead;

The proper handling of cases 2 and 4 still need some improvement, perhaps with errno assignment or by changing the interface to something akin to strtol or, if C++ is allowed, by throwing an exception. Also it might make sense to handle integer overflow errors.

The test code:

#include <stdio.h>
int
main()
{
 printf("%d %d %d %d %d %d %d %d\n",
 atoib("10", 0), atoib("10", 1), atoib("10", 2), atoib("10", 9),
 atoib("10", 10), atoib("10", 11), atoib("10", 36), atoib("10", 37));
 printf("%d %d %d %d %d %d %d %d\n",
 atoib("8", 9), atoib("9", 9), atoib("9", 10), atoib("A", 10),
 atoib("A", 11), atoib("B", 11), atoib("Z", 36), atoib("[", 36));
 printf("%d %d\n", atoib("-2", 10), atoib("+2", 10));
 return 0;
}

Caridorc Caridorc 28.1k7 gold badges54 silver badges137 bronze badges · Accepted Answer · 2016-01-03 21:20:57Z

Avoid mysterious bit-shifting

return ~output + 1;

Is the same as:

return - (output + 1) + 1;

And as:

return - output;

But the last one is way more obvious than the first one.

Do not hide variable declarations

int ASCIICodes[] = { '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'A', 'B', 'C', 'D', 'E', 'F' }; _Bool negative = 0;

If I look at the above line, the least thing in my mind is that a variable is defined after that list.

If a user has a small screen or a big font size he will be very puzzled when he will see you use negative without seeing its definition.

Just use a newline, they are free:

int ASCIICodes[] = { '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'A', 'B', 'C', 'D', 'E', 'F' };
_Bool negative = 0;

Make variables as local as possible

I used x++ to express that I will skip the first character:

if (x[0] == '-')
{
 negative = true;
 x++;
}
else if (x[0] == '+')
{
 x++;
}

This allows me to make count local.

Using for loops when practical

The outer of your loops is a do while, but looking around I found all three components of a for loop, just shuffled into the code.

A for loop will give it more organization:

for (int count = 0; x[count] != 0; count++)
{
 for (int i = 0; i < base; i++)
 { 
 if (ASCIICodes[i] == x[count])
 {
 output = output * base + i;
 break;
 }
 }
}

Simplify the initial conditional

Assigning a boolean to negative directly and using a || conditional looks shorter and simpler:

We go from:

_Bool negative = 0;
int count = 0, output = 0; 
if (x[0] == '-')
{
 negative = 1;
 count = 1;
}
else if (x[0] == '+')
{
 count = 1;
}

to:

bool negative = x[0] == '-';
if (x[0] == '-' || x[0] == '+')
{
 x++;
}

While preserving identical functionality.

`it's` vs `its`

... How to choose between it's and its?

This is actually really easy, do you mean "it is" or not?

Use it's when you mean it is.

// Each element is the ASCII for it's index.

Here you should write its because you mean possession, intended as a property of the element.

Saying that:

// Each element is the ASCII for it is index.

Makes no sense.

So the correct comment will be:

// Each element is the ASCII for its index.

I love the grammar police in this answer. I get sick of reading that mistake over and over in forums, texts, IMs, etc. Now I know it's in one less program, thanks to you!
the other reason to avoid "mysterious bit shifting" is that (AFAICR) the use of two's complement representation of negative numbers is not actually mandated by the standard.
@Alnitak kind of... non-two-complement machines are pretty arcane -> stackoverflow.com/questions/12276957/…
Arcane, but legal ;) In any event it's better just to let the compile do the right thing and just express negation using the operator intended for that purpose.

Stack Exchange Network

Converting a string to an Integer in C

6 Answers 6

`it's` vs `its`

Write one statement per line

Eliminate pointless cast

Simplify `ASCIICodes`

Simplify using a ternary operator

Use `bool` from `stdbool`

Hot Network Questions

6 Answers 6

Write one statement per line

Eliminate pointless cast

Simplify ASCIICodes

Simplify using a ternary operator

Use bool from stdbool

Related

Simplify `ASCIICodes`

Use `bool` from `stdbool`