Convert a number into a different base and return as string

Question 1

I wanted to write a function that would take a long long int argument as well as a base and it would convert that number into an equivalent number in a different base, and return the result as a string (er, char array). For example, the call,

convertBase(9, 5)

should return "14" (since 9 = 1*5 + 4). For negative values, the returned string should simply have a negative sign, '-', in front.

The code is as follows:

```
#include <stdio.h>
#include <stdlib.h>
#include "convert.h"
#define MIN_BASE 2
#define MAX_BASE 36
#define MAX_LLI_REP 63
char toCharacter(int v)
{
    v = abs(v);
    return (v < 10)
        ? ('0' + v)
        : ('a' + v - 10);
}
char * convertBase(long long int value, int base)
{
    if (base < MIN_BASE || base > MAX_BASE) {
        fprintf(stderr, "The base must be within [%d, %d].\n", MIN_BASE, MAX_BASE);
        return NULL;
    }
    char c[MAX_LLI_REP];
    int i, j = 0;
    long long int quotient;
    for (i = 0, quotient = value; quotient != 0; i++, quotient /= base)
    {
        c[i] = toCharacter(quotient % base);
    }
    int negative = value < 0;
    char * result = (char *) malloc(sizeof(char) * (i + 1 + negative));
    if (negative) {
        result[0] = '-';
        j = 1;
    }
    while (i) {
        result[j++] = c[(i--) - 1];
    }
    result[j] = '\0';
    return result;
}
```

I've run a variety of test cases, including LLONG_MIN and LLONG_MAX from limits.h in various bases, as well as converting a decimal number to base 10 (expecting the same answer). It passed all of them. However, I was wondering if additional sets of eyes could spot potential errors. Any stylistic or performance suggestions would also be appreciated.
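One way to get more coverage than hand-picked cases is a round-trip check: convert a value, parse the string back with the standard strtoll (which accepts bases 2 to 36, with letters for digits above 9), and compare. The sketch below is hypothetical: convert_base is a stand-in that uses the question's algorithm, but with a scratch buffer of one digit per bit of long long and a do/while loop so that 0 yields "0". It assumes C99 truncating division.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* One digit per bit (base 2 worst case), plus sign and terminator. */
#define BUF_LEN (sizeof(long long) * CHAR_BIT + 2)

/* Hypothetical stand-in for convertBase: the question's negative-quotient
 * loop, writing into out (at least BUF_LEN bytes). */
static char *convert_base(char *out, long long value, int base)
{
    char digits[sizeof(long long) * CHAR_BIT];
    int n = 0, j = 0;
    long long q = value;

    assert(base >= 2 && base <= 36);
    do {                                  /* least significant digit first */
        int r = (int)(q % base);          /* in (-base, base) under C99 */
        if (r < 0)
            r = -r;
        digits[n++] = (char)(r < 10 ? '0' + r : 'a' + r - 10);
        q /= base;
    } while (q != 0);
    if (value < 0)
        out[j++] = '-';
    while (n)
        out[j++] = digits[--n];
    out[j] = '\0';
    return out;
}

/* Returns 1 if strtoll reads the converted string back as exactly value,
 * consuming at least one character and the whole string. */
static int round_trips(long long value, int base)
{
    char buf[BUF_LEN];
    char *end;
    long long back;

    convert_base(buf, value, base);
    errno = 0;
    back = strtoll(buf, &end, base);
    return errno == 0 && end != buf && *end == '\0' && back == value;
}
```

Checking end as well as the value matters: the question's for loop never runs when value is 0, so it returns an empty string, which strtoll also turns into 0 while consuming nothing. The end != buf test flags that case.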

Question 2

Off by one

Your MAX_LLI_REP is 63, but since you are accepting long long inputs, you can need a 64-character representation: if you pass in LLONG_MIN, whose magnitude is 0x8000000000000000 for a 64-bit long long, the binary form is 10000...000, which is one 1 followed by 63 0s.

When I ran your program passing in LLONG_MIN (-9223372036854775808), the first character of the output, which should be a 1, was a random character instead: that digit was written out of bounds and then overwritten by something else. For example, I got:

-h000000000000000000000000000000000000000000000000000000000000000

When I set MAX_LLI_REP to 64, the problem went away.
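To see why 63 is one short, a small helper can count the iterations of the question's digit loop, which is exactly how many characters it writes into c[]. This is a hypothetical sketch, assuming a 64-bit long long and C99 truncating division:

```c
#include <assert.h>
#include <limits.h>

/* Counts the iterations of
 *     for (quotient = value; quotient != 0; quotient /= base)
 * i.e. the number of digits the question's code stores in c[].
 * Division truncates toward zero, as C99 requires. */
static int digits_needed(long long value, int base)
{
    int count = 0;

    assert(base >= 2 && base <= 36);
    for (long long q = value; q != 0; q /= base)
        count++;
    return count;
}
```

With a 64-bit long long, digits_needed(LLONG_MIN, 2) is 64, matching the value of i reported later in this thread, while LLONG_MAX needs only 63 digits. A 64-element c[] is therefore the smallest size that works for every input.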

Simplified expression

This expression:

result[j++] = c[(i--) - 1];

Could be simplified to:

result[j++] = c[--i];
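Both expressions read the same element: (i--) - 1 takes the old i minus one and leaves i decremented, while --i decrements first and then uses the new value. A small check with two hypothetical helpers, reversing the digits the question stores least significant first:

```c
#include <assert.h>

/* Copies the first n characters of src into dst in reverse order using
 * the question's index expression. Returns the number copied. */
static int reverse_original(char *dst, const char *src, int n)
{
    int i = n, j = 0;

    assert(n >= 0);
    while (i) {
        dst[j++] = src[(i--) - 1];
    }
    return j;
}

/* Same copy with the simplified expression; visits the same indices
 * src[n-1], src[n-2], ..., src[0]. */
static int reverse_simplified(char *dst, const char *src, int n)
{
    int i = n, j = 0;

    assert(n >= 0);
    while (i) {
        dst[j++] = src[--i];
    }
    return j;
}
```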

Modified interface

Currently, you allocate a string and return it from your function. This creates some potential awkwardness because someone has to free that string later on.

It might be nicer to pass in a buffer to your function and have the function fill it in instead. The buffer size can be documented to be a minimum of 66 bytes, or you could pass in a buffer length argument as well and have the function fill up to the buffer length.

Alternatively, you could create a static buffer in your function and return a pointer to it. This has the drawback that you must use the return value before you call the function again. Some C library functions such as ctime() do this, so it isn't unprecedented.
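A hypothetical sketch of the static-buffer variant, assuming C99 truncating division. Filling the buffer from its end removes the separate reversal step, and the returned pointer just points at the first character written. Like ctime(), it is not thread-safe:

```c
#include <assert.h>
#include <limits.h>

/* Returns a pointer into a static buffer that every call overwrites, so
 * callers must use or copy the result before calling again. */
static const char *convertBaseStatic(long long value, int base)
{
    static char buf[sizeof(long long) * CHAR_BIT + 2]; /* '-', digits, '\0' */
    char *p = buf + sizeof buf - 1;
    long long q = value;

    assert(base >= 2 && base <= 36);
    *p = '\0';
    do {                            /* fill from the end, last digit first */
        int r = (int)(q % base);
        if (r < 0)
            r = -r;
        *--p = (char)(r < 10 ? '0' + r : 'a' + r - 10);
        q /= base;
    } while (q != 0);
    if (value < 0)
        *--p = '-';
    return p;
}
```

The drawback is easy to trigger: after a = convertBaseStatic(9, 5) and then convertBaseStatic(-255, 16), the string at a no longer reads "14" but "ff", because both results share the tail of the same buffer.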

Question 3

The output that you posted is interesting because when I ran it (I just pasted the number), I got: The number -9223372036854775808 in base 2 is: -1000000000000000000000000000000000000000000000000000000000000000. I wonder what's responsible for this discrepancy.

Question 4

It really depends on what variable your compiler placed after the last character of the buffer. I once got -8000... and once got -h000..., depending on what debug printfs I added to the code. If the variable after the buffer isn't written to after your loop, then the 1 character will remain there. You can verify that the buffer is being overrun by checking the value of i, though.

Question 5

I checked the value of i, which was 64, so the last iteration wrote to c[63], which is out of bounds for a 63-element array. Your recommendation was spot on!

Question 6

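@MartinTuskevicius Note that in base 2 only LLONG_MIN triggers this overrun: with C99's truncating division, it is the only long long value whose magnitude (2^63) needs 64 binary digits. Other negative numbers are fine because the sign is handled separately; -1, for example, produces "-1".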

Question 7

Remove unnecessary code. Neither the cast nor the multiplication by sizeof(char) is needed (sizeof(char) is always 1). If you want to show the scaling by the type, use sizeof *result instead.

// char * result = (char *) malloc(sizeof(char) * (i + 1 + negative));
char * result = malloc(i + 1 + negative);
// or 
char * result = malloc(sizeof *result * (i + 1 + negative));

Little helper functions like char toCharacter(int v) should be static, as they are not meant to be called from outside this file.
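A hypothetical version of the helper with internal linkage is sketched here. Besides static, it swaps the question's arithmetic for a lookup string. That swap is a separate, optional technique: its only advantage is that it does not assume the letters 'a' to 'z' are contiguous, which C guarantees for '0' to '9' but not for letters.

```c
#include <assert.h>
#include <stdlib.h>

/* static gives the helper internal linkage: other translation units
 * cannot call it, and it cannot clash with another toCharacter. */
static char toCharacter(int v)
{
    static const char digits[] = "0123456789abcdefghijklmnopqrstuvwxyz";

    v = abs(v);                 /* remainders may be negative, as before */
    assert(v < 36);
    return digits[v];
}
```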
Agree with @JS1 that code should pass in the buffer. Also recommend passing in the buffer size. Return NULL when size is insufficient.
```
char *convertBase(char *dest, size_t size, long long int value, int base);
```
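A sketch of that interface, assuming C99 truncating division and reusing the question's negative-quotient loop. Returning NULL when size is insufficient follows the suggestion above; returning NULL for an out-of-range base (instead of printing to stderr) is an assumption here.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Writes value in the given base into dest (size bytes, including the
 * terminating '\0') and returns dest. Returns NULL if base is outside
 * [2, 36] or dest is too small; dest is then left untouched. */
char *convertBase(char *dest, size_t size, long long int value, int base)
{
    char digits[sizeof(long long) * CHAR_BIT];  /* worst case: base 2 */
    size_t n = 0;
    long long int quotient = value;

    assert(dest != NULL);                       /* caller supplies storage */
    if (base < 2 || base > 36)
        return NULL;
    do {                                        /* least significant first */
        int r = (int)(quotient % base);
        if (r < 0)
            r = -r;
        digits[n++] = (char)(r < 10 ? '0' + r : 'a' + r - 10);
        quotient /= base;
    } while (quotient != 0);

    if (size < n + (value < 0) + 1)             /* digits, '-', '\0' */
        return NULL;

    char *p = dest;
    if (value < 0)
        *p++ = '-';
    while (n)
        *p++ = digits[--n];
    *p = '\0';
    return dest;
}
```

Because the digits go into a local scratch array first, nothing is written to dest unless the whole result fits.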
MAX_LLI_REP 63 assumes long long is 64 bits, but C only requires long long to be at least 64 bits. Aside from the value being wrong for a 64-bit long long, as pointed out by @JS1, it should be derived from the size of long long rather than from the assumption that it is 64 bits.
```
#include <limits.h>  /* CHAR_BIT */
// #define MAX_LLI_REP 63
#define MAX_LLI_REP (sizeof(long long)*CHAR_BIT + 2)  /* one digit per bit, plus sign and '\0' */
```
Should you care about pre-C99 compatibility, some_negative_int/base and some_negative_int%base have implementation-defined results there, which breaks this code. Converting everything to positive values solves this, except when value is LLONG_MIN on a 2's complement machine, which has no positive counterpart in long long. LLONG_MIN needs special code; how much depends on the degree of portability you want.
```
if (value < -LLONG_MAX) {
 // The details of this get into just how portable code needs to be.
 Special_TBD_Code();
}
else {
 quotient = value >= 0 ? value: -value;
 ...
}
```
A now-rare concern is -0, possible when long long uses sign-magnitude or 1's complement. I do not see a portable solution, but the following offers some portability. See https://stackoverflow.com/q/19869976/2410359
```
#if LLONG_MIN == -LLONG_MAX
if (value == 0) {
 long long pz = 0; 
 if (memcmp(&value, &pz, sizeof value) == 0) strcpy(c,"0");
 else strcpy(c,"-0");
}
#endif 
```

Question 8

One additional change I'd make would be to store quotient as a positive number. This requires changing the type to unsigned long long int. It avoids problems on systems where -1 / 2 == -1, and would allow you to remove the v = abs(v) line in toCharacter.

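To handle a negative value, check the sign before you start the conversion. If it is negative, note that (to add the negative sign later) and store the negated value in the unsigned long long quotient. Negate after converting to unsigned, as in quotient = -(unsigned long long)value, so that LLONG_MIN does not overflow.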
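A sketch of that approach, assuming ULLONG_MAX > LLONG_MAX (true on mainstream platforms but not required by C, as noted further down in this thread). The key detail is converting to unsigned long long first and negating afterwards: unsigned arithmetic wraps modulo ULLONG_MAX + 1, so 0 - (unsigned long long)value is the magnitude even for LLONG_MIN, whereas -value on the signed type would overflow. The function name convert_unsigned is hypothetical.

```c
#include <assert.h>
#include <limits.h>

/* Writes value in the given base into out, which must hold at least
 * sizeof(long long) * CHAR_BIT + 2 bytes. The quotient is kept as an
 * unsigned magnitude, so remainders are never negative and no abs()
 * is needed. */
static char *convert_unsigned(char *out, long long value, int base)
{
    char digits[sizeof(unsigned long long) * CHAR_BIT];
    int negative = value < 0;
    unsigned long long quotient = (unsigned long long)value;
    int n = 0, j = 0;

    assert(base >= 2 && base <= 36);
    if (negative)
        quotient = 0 - quotient;        /* |value|, no signed overflow */
    do {
        unsigned r = (unsigned)(quotient % (unsigned)base);
        digits[n++] = (char)(r < 10 ? '0' + r : 'a' + r - 10);
        quotient /= (unsigned)base;
    } while (quotient != 0);
    if (negative)
        out[j++] = '-';
    while (n)
        out[j++] = digits[--n];
    out[j] = '\0';
    return out;
}
```

Since unsigned division is the same on every conforming compiler, this also sidesteps the pre-C99 rounding concern for negative operands.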

Question 9

Huh, I never knew about this. Does storing a negative value in an unsigned type automatically negate it and make it positive?

Question 10

@MartinTuskevicius No, (in a two's complement representation) the bits are copied unchanged, giving a value that is the original value plus pow(2,N) where N is the number of bits in the type.
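A small illustration of that conversion, assuming two's complement with no padding bits. The helper names are hypothetical:

```c
#include <assert.h>
#include <limits.h>

/* Converting a negative long long to unsigned long long does not negate
 * it: C defines the result as the value reduced modulo ULLONG_MAX + 1,
 * so -1 becomes ULLONG_MAX, -2 becomes ULLONG_MAX - 1, and so on. */
static unsigned long long to_unsigned(long long value)
{
    return (unsigned long long)value;
}

/* To get a positive magnitude, negate after converting; unsigned
 * negation is reduced modulo ULLONG_MAX + 1 as well. */
static unsigned long long unsigned_magnitude(long long value)
{
    unsigned long long u = to_unsigned(value);

    /* |LLONG_MIN| only fits if ULLONG_MAX > LLONG_MAX. */
    assert(value >= 0 || ULLONG_MAX > LLONG_MAX);
    return value < 0 ? 0 - u : u;
}
```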

Question 11

OP's method is sound. "be to store quotient as a positive number" is vague as value may be negative and this answer has not detailed how to handle a negative value. Note: In C, ULLONG_MAX is not specified to be greater than LLONG_MAX so -LLONG_MIN may be greater than ULLONG_MAX.

Question 12

When converting from signed to unsigned, "(in a two's complement representation) the bits are copied unchanged" is not specified by the C standard, though it is very common practice. What is specified is "... because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting type." The problem occurs on obscure platforms where long long is 2's complement but unsigned long long treats the sign-bit position as a bit that is always clear, so ULLONG_MAX equals LLONG_MAX.

Question 13

@chux I was looking at the C++ spec which says, "In a two’s complement representation, this conversion is conceptual and there is no change in the bit pattern (if there is no truncation)." (section 4.7, Integral Conversions, paragraph 2, note). I added some additional detail on handling a negative value.

JS1 · Accepted Answer · 2016-01-13 18:31:53Z