I'm finishing the book C Primer Plus by Stephen Prata, and as part of the homework I wrote a program that converts a binary string to a decimal number:
#include <stdio.h>
#include <string.h>

typedef unsigned char byte;
typedef unsigned int uint;

int strbin_to_dec(const char *);

int main(void) {
    char * wbin = "01001001";

    printf("%s is %d in decimal.\n", wbin, strbin_to_dec(wbin));
    return 0;
}

int strbin_to_dec(const char * str) {
    uint result = 0;

    for (int i = strlen(str) - 1, j = 0; i >= 0; i--, j++) {
        byte k = str[i] - '0'; // we assume ASCII coding
        k <<= j;
        result += k;
    }
    return result;
}
Questions:
- How efficient is this program?
- Are byte operations faster than normal (word-sized) operations?
3 Answers
Well, first of all, the description is wrong: strbin_to_dec() should be strbin_to_i(), because it parses a base-2 number instead of converting it to base 10.
Next, refrain from introducing your own typedefs as much as possible if an appropriate type already exists.
Third, assuming correct input is generally naive; assuming corrupted or malicious input is far wiser. Your code breaks if any character is neither 0 nor 1, or if the number is too big for an unsigned.
Finally, it's more straightforward to parse from beginning to end: it needs only a single pass, and for both reasons it is probably slightly more efficient, if you really care. Still, the input you actually get may matter for that judgement.
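To make that concrete, here is one possible sketch of such a function. The name strbin_to_i follows the suggestion above; returning -1 for empty, invalid, or over-long input is just one convention picked for illustration (errno, a status out-parameter, or a separate length check would work as well):

#include <limits.h>
#include <stddef.h>

/* Parse a base-2 number in a single left-to-right pass.
   Returns -1 (one possible convention) on empty input, on a character
   that is not '0' or '1', or when the value would not fit in an int. */
int strbin_to_i(const char * str) {
    unsigned int result = 0;

    if (str == NULL || *str == '\0')
        return -1;                        /* nothing to parse */

    for (; *str != '\0'; str++) {
        if (*str != '0' && *str != '1')
            return -1;                    /* reject non-binary characters */
        if (result > (UINT_MAX >> 1))
            return -1;                    /* the next shift would lose the top bit */
        result = (result << 1) | (unsigned int) (*str - '0');
    }
    if (result > (unsigned int) INT_MAX)
        return -1;                        /* does not fit in the int return type */
    return (int) result;
}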
Regarding your question whether byte operations are faster than wider operations: probably only on 8-bit microcontrollers or the like.
Still, micro-optimisations are hard to do correctly and rarely have a significant effect, especially if you don't know what you are optimising for: source clarity (which should be the default), source size, compiled code size, code throughput, memory bandwidth, ...
Let's hope you chose the right optimisation options, or the point is moot anyway.
Answering only your specific question here: "Efficiency" is really the wrong thing to be worrying about. For example, look at what your code generates when given to clang 5.0.0:
The output looks like a bunch of gobbledygook, but what it means is that clang vectorizes the code very heavily. This makes it extremely fast for long strings, but not as fast as it could be when handling 1 or 2 digits.
Performance is ALWAYS contextual and relative, and unless you know what context you are operating under, you can't really know whether it's efficient or not.
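For instance, if your context is "parse short strings like the one in the question, many times", one way to get a number for your own machine is a crude timing harness. This is only a sketch: the iteration count is arbitrary, the original function is repeated (with the typedefs expanded) so the file compiles on its own, and the result depends heavily on the compiler flags you choose.

#include <stdio.h>
#include <string.h>
#include <time.h>

/* The original function, repeated here so this sketch compiles on its own. */
static int strbin_to_dec(const char * str) {
    unsigned int result = 0;
    for (int i = (int) strlen(str) - 1, j = 0; i >= 0; i--, j++) {
        unsigned char k = (unsigned char) (str[i] - '0');
        k <<= j;
        result += k;
    }
    return (int) result;
}

int main(void) {
    const char * wbin = "01001001";
    volatile int sink = 0;              /* stops the optimizer from discarding the calls */
    const long iterations = 10000000L;  /* arbitrary repetition count */

    clock_t start = clock();
    for (long n = 0; n < iterations; n++)
        sink = strbin_to_dec(wbin);
    clock_t stop = clock();

    printf("last result: %d\n", sink);
    printf("%ld calls took %.3f s\n", iterations,
           (double) (stop - start) / CLOCKS_PER_SEC);
    return 0;
}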
Why are you tracking the position in j and scaling k to suit? All you have to do is iterate left to right, looking at each character. On each loop iteration, just shift result 1 bit left and add the value of the current bit.
If you can assume that you'll always have valid input (a string of ASCII 0 and 1 characters), I'd write it something like this:
int strbin2i(const char* s) {
    register const unsigned char *p = (const unsigned char *) s;
    register unsigned int r = 0;
    while (p && *p) {
        r <<= 1;
        r += (unsigned int)((*p++) & 0x01);
    }
    return (int)r;
}
If you want to be a good citizen and validate input, you might do something like this:
int strbin2i(const char* s) {
    const unsigned char *p = (const unsigned char *) s ;
    unsigned int         r = 0 ;
    unsigned char        c ;
    while (p && *p) {
        c = *p++ ;
        if      ( c == '0' ) { r = (r<<1)     ; } // shift 1 bit left and add 0
        else if ( c == '1' ) { r = (r<<1) + 1 ; } // shift 1 bit left and add 1
        else                 { break          ; } // bail on invalid character
    }
    return (int)r ;
}
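For example, a quick hypothetical test driver (it assumes one of the strbin2i definitions above is in the same file; the input strings are arbitrary):

#include <stdio.h>

int strbin2i(const char * s);   /* one of the definitions above */

int main(void) {
    printf("%d\n", strbin2i("01001001"));  /* prints 73 */
    printf("%d\n", strbin2i("1111"));      /* prints 15 */
    printf("%d\n", strbin2i("10x1"));      /* the validating version stops at 'x' and prints 2 */
    return 0;
}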
- Few things bother me about this code: 1) repeatedly checking for p in the while condition when it can be done once at the very beginning. The compiler should be able to optimise it, but then... see later. 2) Use of register nowadays is probably completely useless (unless you use it to stop yourself from writing &p). Any modern compiler is perfectly able to determine there isn't aliasing here (unless you're compiling with -O0, but then why bother?). 3) '0' is 91, then *p & 0x01 returns '0' and 0 for '1'. – Adriano Repetti, Apr 19, 2018 at 8:17
- 4) In C, bitwise & is perfectly defined for signed integers (at least for two's complement). You MIGHT be writing code where signed integers have another representation (one's complement, for example) where a negative 0 is a trap, but then... you're failing when casting to int to return a value. Either don't bother or do it well. << is UB for negative operands. 5) This is just my very personal opinion, but I can't stand code formatted to look like columns like your 2nd example... it's a pain to write, read and maintain (with a very opinionated visual appearance...) – Adriano Repetti, Apr 19, 2018 at 8:20
- 6) When being a good citizen you may check for the number of bits in your input. – Adriano Repetti, Apr 19, 2018 at 8:31
- Um... no. Don't know what encoding you're talking about, but in Unicode and ASCII both, the code point 91 dec / 0x5b hex is [, left square bracket, while the code point 0x91 hex / 145 dec is Unicode's "PRIVATE USE 1" control character. The code point for Unicode/ASCII 0 is 0x30 hex / 48 dec, while 1 is at 0x31, decimal 49. If you know your string contains only ASCII 0 or 1, all you care about is the low-order bit, which masking the char off via c & 0x01 gives you. – Nicholas Carey, Apr 20, 2018 at 0:14
- You're right, I wrongly used 91h as decimal. Please ignore point 3! – Adriano Repetti, Apr 20, 2018 at 6:36
- gcc app.c --ansi --pedantic -Wall
- str[i] - '0'; does not need to "assume ASCII coding". That is valid for all encodings.