This is my implementation of itoa()
(Integer to Alpha), which converts an integer to a string. Memory management and optimization is important. The caller is not responsible for allocation of the string.
The caller is responsible for freeing the string, as my implementation does not allow to call free()
. The function handles everything internally, except freeing. The function allocates memory twice, for the string to be returned and the int array where the digits are stored.
I am concerned about memory leaks with the allocation of the int array which stores the digits. Is there any way to optimize / secure the memory allocation so I do not get leaks?
The code:
#include <stdlib.h>
static int digitcount(int n)
{
int i = 0;
if (n == 0)
return (1);
while (n != 0)
{
n /= 10;
i++;
}
return (i);
}
static int *revarray(int *a, int size)
{
int tmp = 0;
int i = 0;
while (i < size / 2)
{
tmp = a[i];
a[i] = a[size - i - 1];
a[size - i - 1] = tmp;
i++;
}
return (a);
}
static int *getdigits(int n, int size)
{
int *a;
int r = 0;
int i = 0;
a = malloc(size * sizeof(int));
if (a == NULL)
return (NULL);
if (n == 0)
a[i] = 0;
while (n != 0)
{
r = n % 10;
if (r < 0)
r *= -1;
a[i] = r;
i++;
n /= 10;
}
a = revarray(a, size);
return (a);
}
char *itoa(int n)
{
char *nbr;
int *digits;
int digits_size;
int i;
int j;
digits_size = digitcount(n);
digits = getdigits(n, digits_size);
if (n < 0)
digits_size++;
nbr = malloc(digits_size + 1 * sizeof(char));
if (nbr == NULL || digits == NULL)
return (NULL);
i = 0;
j = 0;
if (n < 0)
nbr[i++] = '-';
while (i < digits_size)
{
nbr[i] = digits[j] + '0';
i++;
j++;
}
nbr[i] = 0;
return (nbr);
}
EDIT:
For practice and understanding I wanted to implement my own itoa()
instead of just calling sprintf()
or using external libraries.
6 Answers 6
Memory management and optimization is important.
I will avoid any stylistic remark and focus on this particular aspect of your query.
Memory
First of all, the first rule of optimization is to avoid allocating memory. In general, memory allocation is slow-ish, and memory de-allocation even slower. In particular, most malloc implementation will perform "consolidation" work during the call to free
which can take a few milliseconds. (The average case isn't that slow, but the worst case can be bad).
As a result, not allocating is generally much faster than allocating. And if you do insist on allocating, at the very least you should only allocate once for the final result, and not allocate for any intermediate value.
Hence the ideal API:
char* my_itoa(char* buffer, int i);
Number of digits
Your method is naive, and therefore slow.
In general, division and modulo are the slowest arithmetic operations on a CPU. For x64, a 64-bits division is 30 to 90 cycles when an addition is 1 cycle and a multiplication 3 cycles. 32-bits divisions fare a bit better, but not by much.
Of course, the compiler will perform strength reduction, and replace the division/modulo by a constant with a series of additions/multiplications/shifts... but even then it'll still be a slow-ish operation. (Especially for signed int
)
Luckily for you, there was a recent spree on Internet on the fastest way to compute the number of digits.
I have myself benchmarked this method, and its performance is truly good. It takes 3 CPU cycles (just like a single multiplication) regardless of the value of the integer. I would encourage simply using it:
/* Note: GCC builtin, other compilers should have an equivalent method
to count the number of leading zeroes */
int int_log2(uint32_t x) { return 31 - __builtin_clz(x|1); }
int numberDigits(uint32_t x)
{
static uint64_t const TABLE[] = {
4294967296, 8589934582, 8589934582, 8589934582,
12884901788, 12884901788, 12884901788, 17179868184,
17179868184, 17179868184, 21474826480, 21474826480,
21474826480, 21474826480, 25769703776, 25769703776,
25769703776, 30063771072, 30063771072, 30063771072,
34349738368, 34349738368, 34349738368, 34349738368,
38554705664, 38554705664, 38554705664, 41949672960,
41949672960, 41949672960, 42949672960, 42949672960
};
return (int)((x + TABLE[int_log2(x)]) >> 32);
}
Reversal of Array
This is mutually exclusive with computing the number of digits.
In order to obtain the digits, you have to use % 10
(or % 100
to get them 2 by 2), and this means getting the lowest digits first.
There are 3 strategies to deal with this:
- Write the digits in reverse order in a temporary array:
- Either reverse them in place and bulk-copy them to the destination.
- Or copy them one at a time to the destination, reversing as you go.
- Write the digits (as ASCII
char
, notint
) in order in a large enough array, then bulk-copy them to the destination. - Compute the number of digits, then write the digits in order in the destination, starting from the accurate final place.
In general, the reversal strategy is the worse performance wise. Element-by-element manipulation loses out quickly as the number of elements go up.
The difference between the last two hinges on the performance of bulk-copying vs computing the number of digits, and the latter should win if well-implemented.
At the moment, you are using the worst possible combination: computing number of digits (slowly), discarding the result, and using the reversal strategy.
That's a lot of unnecessary work; performance suffers accordingly.
Performance Goal
I was, in fact, just this week benchmarking our number-formatting routines, after implementing a variant of the faster "number of digits" computation.
On my computer (4.6 GHz), formatting a 64 bits unsigned integer with the algorithm I use takes:
- 6 ns for
1
. - 14 ns for
UINT64_MAX
Note: for reference sprintf
takes 74 ns to do the same on my computer.
There's nothing magic there:
- Compute number of digits: 1 ns/1.2 ns.
- Loop until 0,
% 100
at a time, writing in-place in the destination buffer. - Handle last digit, if any.
The destination buffer is user-provided, of course, to avoid any memory allocation.
I would argue this is the order of magnitude you should aim for, if performance matters to you.
-
1\$\begingroup\$ How do I print an integer in Assembly Level Programming without printf from the c library? has some links at the bottom to blogs about fast itoa. (e.g. doing
x /= 100
steps and splitting that up, to gain some instruction-level parallelism.) pvk.ca/Blog/2017/12/22/… / tia.mat.br/posts/2014/06/23/integer_to_string_conversion.html \$\endgroup\$Peter Cordes– Peter Cordes2021年07月15日 17:08:55 +00:00Commented Jul 15, 2021 at 17:08 -
\$\begingroup\$ Storing directly into the destination does nicely avoid store-forwarding stalls which costs latency. Note that out-of-order exec will overlap the num_digits latency with the multiply latency.
bsr
-> load ->add
->shr
is at least 10 cycle latency on an Intel CPU (3 + 5 + 1 + 1), assuming L1d hit for the table. So not 1 ns, but performance of tiny sequences of instructions must be considered as latency and front-end / back-end throughput costs separately, not one-dimensional ns or cycles to add up. (I'd believe that it costs ~1 ns tput vs. a func that formats from the end of a buffer.) \$\endgroup\$Peter Cordes– Peter Cordes2021年07月15日 17:24:10 +00:00Commented Jul 15, 2021 at 17:24 -
\$\begingroup\$ You might actually use an API like
char *atoi_end(char *buff_end, unsigned x);
to avoid needing to count at all (like in the asm Q&A I linked). glibc actually does that internally. But that's probably only useful if you're going to feed it to a syscall (like in the linked asm question) or a function like puts. If you were going to copy without truncating to a fixed length, you'd want to just do a normal atoi to write the digits to their final location. \$\endgroup\$Peter Cordes– Peter Cordes2021年07月15日 17:26:23 +00:00Commented Jul 15, 2021 at 17:26 -
\$\begingroup\$ I was trying to figure out if there was some use for a rougher inexact count, but if you keep the mallocing API it probably only makes sense if the string does start at the front of the buffer. So you need an exact count, either via a tmp buffer (length for free) and memcpy or ahead of allocation. Perhaps getting some multiply latency into the pipeline before the call to malloc could help some with latency. Using the calculated exact length as a loop termination condition could allow the loop-exit branch prediction to be checked earlier than the
x /= 10
dep chain.. \$\endgroup\$Peter Cordes– Peter Cordes2021年07月15日 17:31:03 +00:00Commented Jul 15, 2021 at 17:31 -
\$\begingroup\$ "fair a bit better" -> "fare a bit better" :) \$\endgroup\$Dev– Dev2021年07月15日 18:08:33 +00:00Commented Jul 15, 2021 at 18:08
Counting the digits doubles the amount of work, and the division/modulo is indeed very hard work for the CPU.
You know the maximum size of the resulting string, because it would be the largest or most negative integer you can be passed. So use a local array as a buffer, and then copy the result to allocated memory.
You have a separate function to reverse the string. Instead, fill your buffer from the right, decrementing the pointer as you write each digit.
Instead of checking for a negative remainder each time through the loop, check for negative input once at the beginning. If it's negative, remember that and change to positive. Then when done, finish off with a -
character if your is-negative flag is set.
-
\$\begingroup\$ "If it's negative, remember that and change to positive." is a problem when
n == INT_MIN
, as-INT_MIN
is undefined behavior with 2's compliment. \$\endgroup\$chux– chux2021年07月14日 07:09:46 +00:00Commented Jul 14, 2021 at 7:09 -
\$\begingroup\$ @chux-ReinstateMonica I didn't want to digress with details. Either notice that as a special singular case, or have the base function use unsigned or wider types (I've seen C standard library in MSVC where the only implementation was the longest unsigned type and other types were thin wrappers that called it. That's slower than a modern template version, BTW) \$\endgroup\$JDługosz– JDługosz2021年07月14日 14:03:42 +00:00Commented Jul 14, 2021 at 14:03
-
\$\begingroup\$
INT_MIN
does not need to be a special case or resort to other types or suffer performance. Simply fold positive values to negative ones and use then%10
result of [-9...0] to form the digit character. \$\endgroup\$chux– chux2021年07月14日 14:37:14 +00:00Commented Jul 14, 2021 at 14:37 -
\$\begingroup\$ Until very recent standards of C++, the behavior of
%
for negative numbers was implementation-defined. I don't know if that's still true in C compilers. Historically, we avoided the negative value when implementingitoa
etc. In any case, my comment was on the conditional branch needed to form the absolute value being in the middle of the loop. \$\endgroup\$JDługosz– JDługosz2021年07月14日 15:39:37 +00:00Commented Jul 14, 2021 at 15:39 -
1\$\begingroup\$ @PeterCordes You may enjoy this collection of such asserts: Detecting unicorn and dinosaur compilers - only cost 0ドル.05. \$\endgroup\$chux– chux2021年07月15日 18:27:39 +00:00Commented Jul 15, 2021 at 18:27
Besides lots of optimizations other people have suggested, you have two memory leaks.
The first is that, just before the final return in itoa(), you need to write free(digits);
The second is in the error handling. (A common place for errors BTW.) In particular, if getdigits
returns successfully, but the malloc
fails, you leak digits
. The fix: Test digits
before the malloc
and just return. Test nbr
after the malloc
, and free(digits);
and return if it failed.
-
1\$\begingroup\$ I think this answer addresses the question better than the others, which are more focused on the implementation rather than the memory leaks the poster is concerned about. \$\endgroup\$Andrakis– Andrakis2021年07月14日 03:03:54 +00:00Commented Jul 14, 2021 at 3:03
-
\$\begingroup\$ You don't need an array of
int
digits in the first place, though, and if you do, it can be a local in automatic storage because the max digit-length of anint
is small. Automatic storage is fast and immune to leaks. A localchar buf[sizeof(int)*CHAR_BIT / 3]
or something would make even more sense (the number of bits in an int / 3 is a loose bound on the max length). (edit: chux's answer already suggests that.) \$\endgroup\$Peter Cordes– Peter Cordes2021年07月15日 17:42:45 +00:00Commented Jul 15, 2021 at 17:42 -
1\$\begingroup\$ @PeterCordes Well, I did say "lots of optimizations". I wrote my answer just to spell out the leaks in the current code. Personally, for this functionality, I would make a local buffer as a
char
array, flll it from the right, and returnstrdup()
of my final pointer. (I also wouldn't write this prototype&semantics.) As for the array size, I recommend adding 2, one for the trailing nul, and one for the leading minus sign. Especially since on a 16 bit int platform, your array would be 5 characters and there could be 5 digits, and similarly for a 32 bit int platform, 10 and 10. \$\endgroup\$David G.– David G.2021年07月15日 22:53:00 +00:00Commented Jul 15, 2021 at 22:53
You have used static
for the helper functions, to give them internal linkage. That's good.
return (1);
The parentheses don't add any value here. Just write return 1;
like everyone else.
We can avoid the need to special-case n == 0
by always performing at least one division; use do
/while
in place of while
:
int i = 0;
do {
n /= 10;
++i;
} while (n);
The similar test in getdigits()
can be eliminated in the same way.
int i = 0; while (i < size / 2) { ... i++; }
We have a construct that expresses that better:
for (int i = 0; i < size / 2; ++i) {
...
}
The variable tmp
doesn't need to be at function scope; it can be declared within the loop:
static int *revarray(int *a, int size)
{
for (int i = 0; i < size / 2; ++i) {
int tmp = a[i];
a[i] = a[size - i - 1];
a[size - i - 1] = tmp;
}
return a;
}
a = malloc(size * sizeof(int));
A convention that's often helpful is to write the size in terms of the pointer being assigned to. That helps readers immediately see that it's consistent, without having to look for the declaration of a
:
a = malloc(sizeof *a * size);
I've written the sizeof
expression first - it makes no difference here, but can avert overflow when there's a multiplication for the number of elements: size_t * int * int
is safer than int * int * size_t
.
a = revarray(a, size);
That's a pointless assignment, since revarray()
returns the unchanged value of a
. We should probably make revarray()
return void
.
digits = getdigits(n, digits_size); nbr = malloc(digits_size + 1 * sizeof(char)); if (nbr == NULL || digits == NULL) return (NULL);
Oops - what if digits
is a valid pointer, but nbr
is null? Then we leak digits
. We need a call to free()
in there (remember that free(NULL)
is defined, so it can be an unconditional free(digits)
).
Multiplying by sizeof (char)
is a no-op: since sizeof
yields values in units of char
, it follows that sizeof (char)
must be 1.
while (i < digits_size)
Again, this should be a for
loop:
for (int i = 0, j = 0; i < digits_size; ++i, ++j)
We forgot to free(digits)
before we return.
As other reviewers have pointed out, the algorithm is inefficient, with the extra measuring and copying, and the reversal of digit order. I'll not repeat those observations here.
Leak
digits
is never free'd.
If either one of digits
or nbr
return NULL
, the other allocation is lost.
digits = getdigits(n, digits_size);
...
nbr = malloc(digits_size + 1 * sizeof(char));
if (nbr == NULL || digits == NULL)
return (NULL);
Concept error
sizeof(char)
is 1 so not much of a issue, yet sizeof(char)
should multiply by the sum.
// malloc(digits_size + 1 * sizeof(char));
malloc((digits_size + 1) * sizeof(char));
Simplification
Use a do { } while();
//if (n == 0) return (1);
//while (n != 0) {
// n /= 10;
// i++;
//}
do {
n /= 10;
i++;
} while (n != 0);
Allocate to the refenced object
Rather than code the type, use the referenced object. Easy to code right, review and maintain.
// a = malloc(size * sizeof(int));
a = malloc(size * sizeof *a);
Good use of static
functions
Use a local buffer for the reverse
No need to free it then either. INT_STRING_SIZE
details
#define INT_STRING_SIZE ((sizeof(int)*CHAR_BIT - 1)*28/93 + 3)
// int *digits;
int digits[INT_STRING_SIZE];
digits
is now local storage and will be reclaimed on function exit. It's OK if it is more than needed for a given n
, just as long as it can handle the worst case n == INT_MIN
.
Simplified approach
Is there any way to optimize / secure the memory allocation so I do not get leaks?
Yes. Form the string in a local buffer of worst case size (not an allocated one). Fill from right to left so no need for a later reverse. Now armed with the known size, allocate and strcpy()
.
// Pseudo code
char *itoa_alloc(int n) {
int n_origin = n;
char buf[INT_STRING_SIZE];
char *p = Address of end of buf
*p = null character;
do
p--;
*p = |n%10| + '0'; // Various ways to avoid repeated sign test exist
n /= 10;
} while (n != 0);
if (n_origin < 0)
*(--p) = '-';
char *s = malloc(end_of buf - p + 1);
if (s == NULL) return NULL // Handle_error
strcpy(s, p);
// **OR**
memcpy(s, p, end_of buf - p + 1);
return s;
}
With C2x, looks like strdup()
may arrive and so could replace
char *s = malloc(end_of buf - p + 1);
if (s == NULL) return NULL // Handle_error
strcpy(s, p);
return s;
// **OR** C2x or select implementations today
return strdup(p);
-
\$\begingroup\$ You have the length so you can use
memcpy
instead ofstrcpy
. If you're going to str functions to favour simplicity over speed, you could just usestrdup
(although that POSIX, not ISO C). \$\endgroup\$Peter Cordes– Peter Cordes2021年07月15日 17:48:06 +00:00Commented Jul 15, 2021 at 17:48 -
\$\begingroup\$ You can do unsigned absolute value safely in C with
unsigned n = n<0 ? 0U - n : n;
, allowing you to hoist that work out of the loop and use more-efficient unsigned division. The implicit(unsigned)n
conversion is guaranteed to use modulo-reduction to the range ofunsigned
without signed-overflow, correctly handlingINT_MIN
even on 2's complement systems. \$\endgroup\$Peter Cordes– Peter Cordes2021年07月15日 17:51:00 +00:00Commented Jul 15, 2021 at 17:51 -
\$\begingroup\$ @PeterCordes Note posted code, although C-like, is
// Pseudo code
. I did not want to give OP a cut/paste solution here, but a guide, to allow OP the specific coding experience. Agree about bring the test out of the loop andmemcpy()
and that is what is done in linked sample withreturn memcpy(dest, p, size_used);
\$\endgroup\$chux– chux2021年07月15日 18:04:46 +00:00Commented Jul 15, 2021 at 18:04
The standard functions in string.h
are not using malloc
and free
almost all. Therefore, the interface of itoa
might be changed like this:
/* NG - because itoa returns a new memory */
char *itoa(int n) { ... }
/* OK - because itoa uses an existing memory */
void itoa(char *s, int n) { ... }
-
5\$\begingroup\$ Completely rewriting OP's code does not constitute a code review. \$\endgroup\$Woodford– Woodford2021年07月13日 22:14:44 +00:00Commented Jul 13, 2021 at 22:14
-
1\$\begingroup\$ The suggested two parameter form is bad. What if somebody writes:
char s[2]; itoa(s,1000);
? You need to either provide a length parameter or enforce a length, perhaps with a structure, like:struct itoa_param { char buf[14]; }; extern void itoa(struct itoa_param *s, int n);
It also might be good to return the start of the string. This can simplify things be making the returned string not required to be the start of the buffer. \$\endgroup\$David G.– David G.2021年07月14日 15:02:52 +00:00Commented Jul 14, 2021 at 15:02 -
\$\begingroup\$ Not so bad, I think. It's the same as
char s[2]; sprintf(s, "%d", n);
. Almost C standard functions are unsafe. For real safe programming, we know about the potential unsafeness of C language. In this case, we talk about memory leaks. Array boundary problem is also important but not now's topic. \$\endgroup\$Haruo Wakakusa– Haruo Wakakusa2021年07月15日 11:10:30 +00:00Commented Jul 15, 2021 at 11:10 -
\$\begingroup\$ @HaruoWakakusa: That's why there is
snprintf
. \$\endgroup\$Ben Voigt– Ben Voigt2021年07月15日 15:09:28 +00:00Commented Jul 15, 2021 at 15:09
Explore related questions
See similar questions with these tags.