I wanted to write a function that would take a long long int
argument as well as a base and it would convert that number into an equivalent number in a different base, and return the result as a string (er, char
array). For example, the call,
convertBase(9, 5)
should return "140円"
. For negative values, the returned string should simply have a negative sign, '-'
in front.
The code is as follows:
#include <stdio.h>
#include <stdlib.h>
#include "convert.h"
#define MIN_BASE 2
#define MAX_BASE 36
#define MAX_LLI_REP 63
char toCharacter(int v)
{
v = abs(v);
return (v < 10)
? ('0' + v)
: ('a' + v - 10);
}
char * convertBase(long long int value, int base)
{
if (base < MIN_BASE || base > MAX_BASE) {
fprintf(stderr, "The base must be within [%d, %d].\n", MIN_BASE, MAX_BASE);
return NULL;
}
char c[MAX_LLI_REP];
int i, j = 0;
long long int quotient;
for (i = 0, quotient = value; quotient != 0; i++, quotient /= base)
{
c[i] = toCharacter(quotient % base);
}
int negative = value < 0;
char * result = (char *) malloc(sizeof(char) * (i + 1 + negative));
if (negative) {
result[0] = '-';
j = 1;
}
while (i) {
result[j++] = c[(i--) - 1];
}
result[j] = '0円';
return result;
}
I've ran a variety of test cases, such as both LLONG_MIN
and LLONG_MAX
from limits.h
, in various bases, as well as trying to convert a decimal number into a decimal number (expecting the same answer). It passed in all of these cases. However, I was wondering if additional sets of eyes could spot potential errors. Also, any stylistic or performance improvement suggestions would be appreciated.
3 Answers 3
Off by one
Your MAX_LLI_REP
is 63, but since you are accepting long long
inputs, you could have a 64 character representation if you pass in LONG_LONG_MIN
which is 0x8000000000000000
and in binary would be 10000...000
which is one 1
and 63 0
's.
When I ran your program passing in LONG_LONG_MIN
(or -9223372036854775808
), the first character in the output which should be a 1
was a random character instead, because it was the character that fell out of bounds and got overwritten by something else. For example, I got:
-h000000000000000000000000000000000000000000000000000000000000000
When I set MAX_LLI_REP
to 64, the problem went away.
Simplified expression
This expression:
result[j++] = c[(i--) - 1];
Could be simplified to:
result[j++] = c[--i];
Modified interface
Currently, you allocate a string and return it from your function. This creates some potential awkwardness because someone has to free that string later on.
It might be nicer to pass in a buffer to your function and have the function fill it in instead. The buffer size can be documented to be a minimum of 66 bytes, or you could pass in a buffer length argument as well and have the function fill up to the buffer length.
Alternatively, you could create a static buffer in your function and return a pointer to it. This has the drawback that you must use the return value before you call the function again. Some C library functions such as ctime()
do this, so it isn't unprecedented.
-
\$\begingroup\$ The output that you posted is interesting because when I ran it (I just pasted the number), I got:
The number -9223372036854775808 in base 2 is: -1000000000000000000000000000000000000000000000000000000000000000.
I wonder what's responsible for this discrepancy. \$\endgroup\$Martin Tuskevicius– Martin Tuskevicius2016年01月13日 18:44:08 +00:00Commented Jan 13, 2016 at 18:44 -
1\$\begingroup\$ It really depends on what variable your compiler placed after the last character of the buffer. I once got
-8000...
and once got-h000...
, depending on what debug printfs I added to the code. If the variable after the buffer isn't written to after your loop, then the1
character will remain there. You can verify that the buffer is being overrun by checking the value ofi
, though. \$\endgroup\$JS1– JS12016年01月13日 18:46:21 +00:00Commented Jan 13, 2016 at 18:46 -
\$\begingroup\$ I checked the value of
i
which was 64, so the last iteration would have accessedc[63]
which would have gone out of bounds, so your recommendation was spot on! \$\endgroup\$Martin Tuskevicius– Martin Tuskevicius2016年01月14日 01:31:50 +00:00Commented Jan 14, 2016 at 1:31 -
\$\begingroup\$ @MartinTuskevicius Actually, now that I think about it, any negative number with a base of 2 should trigger the same problem. For example,
-1
would exhibit the same behavior. \$\endgroup\$JS1– JS12016年01月14日 04:07:19 +00:00Commented Jan 14, 2016 at 4:07
Remove unnecessary code. Neither the cast nor the multiplication by
sizeof(char) *
needed (it is always 1). It you want to show the scaling by the type usesizeof *result
instead.// char * result = (char *) malloc(sizeof(char) * (i + 1 + negative)); char * result = malloc(i + 1 + negative); // or char * result = malloc(sizeof *reuslt * (i + 1 + negative));
Little helper functions like
char toCharacter(int v)
should bestatic
as they are not meant to be called by outside functions.Agree with @JS1 that code should pass in the buffer. Also recommend passing in the buffer size. Return
NULL
when size is insufficient.char *convertBase(char *dest, size_t size, long long int value, int base);
MAX_LLI_REP 63
assumeslong long
is about 64 bits. C only requireslong long
to be at least 64 bits. Aside from the values being wrong for a 64-bitlong long
as pointed out by @JS1, the value should be based on the size oflong long
and not the assumption it is 64-bits.// #define MAX_LLI_REP 63 #define MAX_LLI_REP (sizeof(long long)*CHAR_BIT + 2)
Should you care about pre-C99 compatibility,
some_negative_int/base
andsome_negative_int%base
have implementation defined results that breaks this code. Converting all to positive values, except whenvalue
is 2's complementLLONG_MIN
, solves this.LLONG_MIN
needs special code - it just depends on what degree of portability you want.if (value < -LLONG_MAX) { // The details of this get into just how portable code needs to be. Special_TBD_Code(); } else { quotient = value >= 0 ? value: -value; ... }
A now rare concern is
-0
, possible whenlong long
is signed magnitude or 1's complement. I do not see a portable solution. The following offers some portability. See https://stackoverflow.com/q/19869976/2410359#if LLONG_MIN == -LLONG_MAX if (value == 0) { long long pz = 0; if (memcmp(&value, &pz, sizeof value) == 0) strcpy(c,"0"); else strcpy(c,"-0"); } #endif
One additional change I'd make would be to store quotient
as a positive number. This requires changing the type to unsigned long long int
. It avoids problems on systems where -1 / 2 == -1
, and would allow you to remove the v = abs(v)
line in toCharacter
.
To handle a negative value, before you start conversion check the sign. If it is negative, note (to use later for the negative sign) and store the negated value in (an unsigned long long) quotient.
-
\$\begingroup\$ Huh, I never knew about this. Does storing a negative value in an
unsigned
type automatically negate it and make it positive? \$\endgroup\$Martin Tuskevicius– Martin Tuskevicius2016年01月14日 20:30:19 +00:00Commented Jan 14, 2016 at 20:30 -
\$\begingroup\$ @MartinTuskevicius No, (in a two's complement representation) the bits are copied unchanged, giving a value that is the original value plus pow(2,N) where N is the number of bits in the type. \$\endgroup\$1201ProgramAlarm– 1201ProgramAlarm2016年01月15日 03:10:30 +00:00Commented Jan 15, 2016 at 3:10
-
\$\begingroup\$ OP's method is sound. "be to store quotient as a positive number" is vague as
value
may be negative and this answer has not detailed how to handle a negativevalue
. Note: In C,ULLONG_MAX
is not specified to be greater thanLLONG_MAX
so-LLONG_MIN
may be greater thanULLONG_MAX
. \$\endgroup\$chux– chux2016年01月16日 20:30:41 +00:00Commented Jan 16, 2016 at 20:30 -
\$\begingroup\$ When converting from signed to unsigned: "(in a two's complement representation) the bits are copied unchanged" is not specified by the standard. Is is very common practice though. What is specified is "... because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting type." The problem occurs with obscure platforms that uses 1 bit for the sign position in 2's complement for
long long
simple always clear that bit withunsigned long long
. \$\endgroup\$chux– chux2016年01月16日 20:30:49 +00:00Commented Jan 16, 2016 at 20:30 -
\$\begingroup\$ @chux I was looking at the C++ spec which says, "In a two’s complement representation, this conversion is conceptual and there is no change in the bit pattern (if there is no truncation)." (section 4.7, Integral Conversions, paragraph 2, note). I added some additional detail on handling a negative value. \$\endgroup\$1201ProgramAlarm– 1201ProgramAlarm2016年01月17日 01:10:08 +00:00Commented Jan 17, 2016 at 1:10