I was given a school assignment to implement atoi()
. Please review my code.
int my_atoi(char *ato)
{
int res;
int sign;
res = 0;
sign = 1;
while (*ato == ' ' || *ato == '\t' || *ato == '\n' || *ato == '\f' || *ato == '\r')
ato++;
if (*ato == '-')
sign = -1;
if (*ato == '-' || *ato == '+')
ato++;
while (*ato >= '0' && *ato <= '9')
{
res = res * 10 + *ato - '0';
ato++;
}
return (sign * res);
}
-
\$\begingroup\$ You should probably show us your test suite too - the overflow bugs in this code suggest that you are missing the tests for very large (positive or negative) values. \$\endgroup\$Toby Speight– Toby Speight2016年06月13日 16:48:59 +00:00Commented Jun 13, 2016 at 16:48
3 Answers 3
Be consistent with stdlib.h
declaration of atoi()
The standard declaration is int atoi(const char *str);
. Your declaration should match (i.e., take a const char *
instead of char *
).
What happens if a NULL pointer is passed?
As written, you dereference ato
without checking if it is NULL (0). (削除) That is a big no-no in C. At the very top of the function, you should check if it's 0 and return 0 if it is: (削除ここまで)
if (!ato) { return 0; }
Edit: Martin R points out that silently returning 0 when a NULL pointer is passed to your function, as I have suggested, is bad programming practice, because it hides the error. If there's a possibility of calling myatoi()
with a NULL pointer, there's no way for the function to notify the caller of the error. Therefore, it's incumbent upon the caller to check before calling.
No need for separate variable definition and initialization
Do:
int res = 0;
int sign = 0;
Don't:
int res;
int sign;
res = 0;
sign = 1;
Don't reinvent the wheel
Your first loop that skips over whitespace could use the standard library function isspace()
:
Edit: spot the bug in my original code... :/
Buggy (oops): while (isspace(*ato++));
Correct: while (isspace(*ato)) ato++;
The condition check in the second loop can use the stdlib function isdigit()
:
while (isdigit(*ato)) {
...
ato++;
}
If your instructor or assignment instructions prevent you from using isspace()
and isdigit()
, why not go ahead and define them yourself? You already have the logic!*
* Except you have a minor bug in your check for space characters. You also need to check for vertical tab (\v
).
Edit: Chux points out that calling isspace(*ato)
and isdigit(*ato)
when *ato < 0
invokes undefined behavior (UB). Specifically, the C11 standard, section 7.4, Character Handling, states,
The header
<ctype.h>
declares several functions useful for classifying and mapping characters. In all cases the argument is an int, the value of which shall be representable as anunsigned char
or shall equal the value of the macroEOF
. If the argument has any other value, the behavior is undefined.
The correct usage, then, of the isspace()
and isdigit()
functions in this code would be:
while (isspace(*(unsigned char *)ato)) { atoi++; }
//...
while (isdigit(*(unsigned char *)ato)) {
//...
ato++;
}
Chux also pointed out that your original code as written did not have this problem; you weren't invoking undefined behavior. Moral of the story: be careful of the advice you receive from strangers on the Internet...? =)
Other than the issues mentioned, you have a decent implementation!
-
1\$\begingroup\$ It is convenient to be able to set a flag to compile a debug implementation where you do check for null pointer and emit a diagnostic (and abort) \$\endgroup\$user14393– user143932016年06月13日 10:07:06 +00:00Commented Jun 13, 2016 at 10:07
-
1\$\begingroup\$ @Hurkyl of course, what you're referring to is the use of
assert()
macro. When you no longer want the heavy debugging instrumentation thatassert()
typically carries along with it, you defineNDEBUG
, and theassert()
typically gets defined out with something like,#define assert(ignore)((void) 0)
. Note thatNDEBUG
is not specified by C or C++ standards (although most compilers provide for it in their standard library). \$\endgroup\$scottbb– scottbb2016年06月13日 12:14:23 +00:00Commented Jun 13, 2016 at 12:14 -
1\$\begingroup\$ Actually, the C standard does specify that
NDEBUG
is used in that fashion (e.g. in a c11 draft, the section titled7.2 Diagnostics <assert.h>
) \$\endgroup\$user14393– user143932016年06月13日 12:21:06 +00:00Commented Jun 13, 2016 at 12:21 -
1\$\begingroup\$
isspace(*ato)
is UB when*ato < 0
. Op's white-space (though incomplete) is not UB. Same forisdigit()
. \$\endgroup\$chux– chux2016年06月13日 14:17:34 +00:00Commented Jun 13, 2016 at 14:17 -
1\$\begingroup\$ @scottbb Such abstraction is very useful, especially when we need portability. There is no problem with digits, as far as I know, recall, however, there are encodings (e.g. EBCDIC) where alphabet does not make a continuous block. In such encodings
'a' <= c && c <= 'z'
is not necessarily equivalent toislower(c)
... \$\endgroup\$CiaPan– CiaPan2016年06月16日 16:54:50 +00:00Commented Jun 16, 2016 at 16:54
Overall, I think this is a really good straightforward implementation, which means it's easy to read and understand. Nice work! I only have a few minor things I'd suggest:
Don't Reinvent the Wheel (Even When Reinventing the Wheel!)
There's a function to tell if something is whitespace. It's called isspace()
. Your first loop could be rewritten as:
while (isspace(*ato))
ato++;
Use const
where appropriate
The argument to your function shouldn't be changed (at least from the perspective of someone calling it). As such, it should be marked as const
:
int my_atoi(const char* ato)
(削除) Note that this will mean you'll have to copy it and have an extra local variable. But the trade-off in readability and in the ability to reason about the function is worth it. (削除ここまで) Now when developers who are using your function see the prototype, they'll immediately know that the value is not changed by the function.
EDIT: As usual, I forgot my const
rules. const type*
means that the thing pointed to by the pointer is constant, not that the pointer itself is constant. So forget what I said above. You can just use const char*
and not copy the pointer.
Naming
I would also clarify the names of variable. They're not terrible, but they could be better. Rather than res
, why not at least result
? Or better yet, something like numeric_value
or something more descriptive? Likewise ato
does not really tell me what it is. Maybe something like string_value
or strVal
at least. (In my opinion, the name of the library function is terrible, too. I didn't know it existed for the longest time because it's so poorly named.)
Other than that, this is pretty decent, in my opinion.
-
\$\begingroup\$ Crap, didn't notice you beat me to the punch! =) \$\endgroup\$scottbb– scottbb2016年06月13日 02:50:18 +00:00Commented Jun 13, 2016 at 2:50
-
\$\begingroup\$ No worries! You definitely caught some things I didn't. :-) \$\endgroup\$user1118321– user11183212016年06月13日 02:53:10 +00:00Commented Jun 13, 2016 at 2:53
-
1\$\begingroup\$ I don't think that making the argument
const
would require copying it to a local variable. You can still move the pointer. You just can't write to it. \$\endgroup\$JS1– JS12016年06月13日 04:34:33 +00:00Commented Jun 13, 2016 at 4:34 -
\$\begingroup\$ Perhaps you're right. I always forget that
const
when used with pointers as above means the data pointed to is constant, not that the pointer is. I'll amend my answer. \$\endgroup\$user1118321– user11183212016年06月13日 04:56:04 +00:00Commented Jun 13, 2016 at 4:56 -
\$\begingroup\$ If you wanted to make
ato
itself const, it'd beconst char* const
instead — or evenchar* const
if you wanted to change the pointee but not the pointer. \$\endgroup\$Ruslan– Ruslan2016年06月13日 11:30:39 +00:00Commented Jun 13, 2016 at 11:30
Avoidable integer overflow.
When the text version forINT_MIN
is passed, what happens? Bothres * 10 + *ato - '0'
andsign * res
are experiencingint
overflow, which is undefined behavior. This means, even if you get the hoped for result today, you may not get it tomorrow or on another complier/platform. Best to avoid. One method is to accumulateres
amongst the negative side as there is the same or usually more negative numbers than positive one. (think +2147483647 vs. -2147483648).sign = *ato; // simple save candidate sign character if (*ato == '-' || *ato == '+') { ato++; } while (isdigit(*ato)) { res = res * 10 - (*ato - '0'); // - ato++; } if (sign != '-') } res = -res; } return res;
Unavoidable out of range.
atoi()
has "If the value of the result cannot be represented, the behavior is undefined." Yourmy_atoi()
could do better by preventing overflow. A simple method is to use a wider integer type. Of course that does not help if there is no wider type. If interested in code that prevents that without a wider type, let me know, as it is not technically needed, your function could have UB too if it is likeatoi()
.Proper use of
is...()
Recommend use ofisspace()
. The list of charcters used in OP's code lacks the standard'\v'
and other spaces characters that are locale dependable,is...()
functions are curious in that they are defined forunsigned char
values andEOF
. Avoidchar ch; is....(ch);
ifch
may have a negative value. Similar idea forisdigit()
.// while (*ato == ' ' || *ato == '\t' || *ato == '\n' || *ato == '\f' || *ato == '\r') // while (isspace(*ato)) // UB the *ato < 0 while (isspace((unsigned char) *ato))
Minor
No need to declare until the variable is needed.
// int res; // res = 0; int res = 0;
Note that this
my_atoi()
, likeatoi()
, does not define the result ofmy_atoi("")
,my_atoi("xyz")
,my_atoi("123xyz")
.Note that handling
my_atoi(NULL)
is not needed, if likeatoi()
, asNULL
is not a valid argument. It is popular to check for this though.
-
\$\begingroup\$ Should be
res * -10
? To make the entire value negative \$\endgroup\$smac89– smac892016年06月13日 15:00:50 +00:00Commented Jun 13, 2016 at 15:00 -
1\$\begingroup\$ @Smac89 No - still should be
res * 10 - (*ato - '0')
. It is still base 10, not base -10.res
at that point is negative (or zero). \$\endgroup\$chux– chux2016年06月13日 15:03:49 +00:00Commented Jun 13, 2016 at 15:03 -
1\$\begingroup\$ If one could use
isdigit
from a standard library, why not use the standardatoi
and doint my_atoi(const char *s) { return atoi(s); }
...? \$\endgroup\$CiaPan– CiaPan2016年06月13日 15:06:08 +00:00Commented Jun 13, 2016 at 15:06 -
\$\begingroup\$ @CiaPan Interesting point, or use
(int)strtol(nptr, (char **)NULL, 10)
? IAC, it is an exercise and the typical implications are that such trivial substitutions are not allowed. But to your point, code should then call/implementmy_isspace()
. \$\endgroup\$chux– chux2016年06月13日 15:08:48 +00:00Commented Jun 13, 2016 at 15:08 -
\$\begingroup\$ @chux Yes – and that's what OP, julekgwa, tried to do, performing explicit comparision to a space, tab, newline.. characters. \$\endgroup\$CiaPan– CiaPan2016年06月16日 16:40:19 +00:00Commented Jun 16, 2016 at 16:40