I'm finishing the book C Primer Plus by Stephen Prata, and as part of the homework I wrote a program that converts a binary string to a decimal number:
#include <stdio.h>
#include <string.h>

typedef unsigned char byte;
typedef unsigned int uint;

int strbin_to_dec(const char *);

int main(void) {
    char * wbin = "01001001";

    printf("%s is %d in decimal.\n", wbin, strbin_to_dec(wbin));
    return 0;
}

int strbin_to_dec(const char * str) {
    uint result = 0;

    for (int i = strlen(str) - 1, j = 0; i >= 0; i--, j++) {
        byte k = str[i] - '0'; // we assume ASCII coding
        k <<= j;
        result += k;
    }
    return result;
}
Questions:
- How efficient is this program?
- Are byte operations faster than normal (word-sized) operations?
3 Answers
Well, first of all, the description is wrong: strbin_to_dec() should be strbin_to_i(), because it parses a base-2 number instead of converting it to base 10.
Next, refrain from introducing your own typedefs as much as possible if an appropriate type already exists.
Third, assuming correct input is generally naive; assuming corrupted or malicious input is far wiser. Your code breaks if any character is neither 0 nor 1, or if the number is too big for an unsigned.
Finally, it's more straightforward to parse from beginning to end: it needs only a single pass, and for both reasons it is probably slightly more efficient, if you really care. Still, the input you actually get may matter for that judgement.
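To make that concrete, here is one possible sketch of such a function. The name strbin_to_i follows the suggestion above; returning -1 for empty, invalid, or over-long input is just one convention picked for illustration (errno, a status out-parameter, or a separate length check would work as well):

#include <limits.h>
#include <stddef.h>

/* Parse a base-2 number in a single left-to-right pass.
   Returns -1 (one possible convention) on empty input, on a character
   that is not '0' or '1', or when the value would not fit in an int. */
int strbin_to_i(const char * str) {
    unsigned int result = 0;

    if (str == NULL || *str == '\0')
        return -1;                        /* nothing to parse */

    for (; *str != '\0'; str++) {
        if (*str != '0' && *str != '1')
            return -1;                    /* reject non-binary characters */
        if (result > (UINT_MAX >> 1))
            return -1;                    /* the next shift would lose the top bit */
        result = (result << 1) | (unsigned int) (*str - '0');
    }
    if (result > (unsigned int) INT_MAX)
        return -1;                        /* does not fit in the int return type */
    return (int) result;
}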
Regarding your question whether byte operations are faster than wider operations: probably only on 8-bit microcontrollers or the like.
Still, micro-optimisations are hard to do correctly and rarely have a significant effect, especially if you don't know what you are optimising for: source clarity (which should be the default), source size, compiled code size, code throughput, memory bandwidth, ...
Let's hope you chose the right optimisation options, or the point is moot anyway.
Answering only your specific question here: "Efficiency" is really the wrong thing to be worrying about. For example, look at what your code generates when given to clang 5.0.0:
The output looks like a bunch of gobbledygook, but what it means is that clang vectorizes the code very heavily. This makes it extremely fast for long strings, but not as fast as it could be when handling 1 or 2 digits.
Performance is ALWAYS contextual and relative, and unless you know what context you are operating under, you can't really know whether it's efficient or not.
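For instance, if your context is "parse short strings like the one in the question, many times", one way to get a number for your own machine is a crude timing harness. This is only a sketch: the iteration count is arbitrary, the original function is repeated (with the typedefs expanded) so the file compiles on its own, and the result depends heavily on the compiler flags you choose.

#include <stdio.h>
#include <string.h>
#include <time.h>

/* The original function, repeated here so this sketch compiles on its own. */
static int strbin_to_dec(const char * str) {
    unsigned int result = 0;
    for (int i = (int) strlen(str) - 1, j = 0; i >= 0; i--, j++) {
        unsigned char k = (unsigned char) (str[i] - '0');
        k <<= j;
        result += k;
    }
    return (int) result;
}

int main(void) {
    const char * wbin = "01001001";
    volatile int sink = 0;              /* stops the optimizer from discarding the calls */
    const long iterations = 10000000L;  /* arbitrary repetition count */

    clock_t start = clock();
    for (long n = 0; n < iterations; n++)
        sink = strbin_to_dec(wbin);
    clock_t stop = clock();

    printf("last result: %d\n", sink);
    printf("%ld calls took %.3f s\n", iterations,
           (double) (stop - start) / CLOCKS_PER_SEC);
    return 0;
}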
Why are you tracking the position in j and scaling k to suit? All you have to do is iterate left to right, looking at each character. On each loop iteration, just shift result 1 bit left and add the value of the current bit.
If you can assume that you'll always have valid input (a string of ASCII 0 and 1 characters), I'd write it something like this:
int strbin2i(const char* s) {
    register const unsigned char *p = (const unsigned char *) s;
    register unsigned int r = 0;
    while (p && *p) {
        r <<= 1;
        r += (unsigned int)((*p++) & 0x01);
    }
    return (int)r;
}
If you want to be a good citizen and validate input, you might do something like this:
int strbin2i(const char* s) {
    const unsigned char *p = (const unsigned char *) s ;
    unsigned int         r = 0 ;
    unsigned char        c ;
    while (p && *p) {
        c = *p++ ;
        if      ( c == '0' ) { r = (r<<1)     ; } // shift 1 bit left and add 0
        else if ( c == '1' ) { r = (r<<1) + 1 ; } // shift 1 bit left and add 1
        else                 { break          ; } // bail on invalid character
    }
    return (int)r ;
}
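For example, a quick hypothetical test driver (it assumes one of the strbin2i definitions above is in the same file; the input strings are arbitrary):

#include <stdio.h>

int strbin2i(const char * s);   /* one of the definitions above */

int main(void) {
    printf("%d\n", strbin2i("01001001"));  /* prints 73 */
    printf("%d\n", strbin2i("1111"));      /* prints 15 */
    printf("%d\n", strbin2i("10x1"));      /* the validating version stops at 'x' and prints 2 */
    return 0;
}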
- Few things bother me about this code: 1) repeatedly checking for p in the while condition when it can be done once at the very beginning. The compiler should be able to optimise it, but then... see later. 2) Use of register nowadays is probably completely useless (unless you use it to stop yourself from writing &p). Any modern compiler is perfectly able to determine there isn't aliasing here (unless you're compiling with -O0, but then why bother?). 3) '0' is 91, then *p & 0x01 returns '0' and 0 for '1'. – Adriano Repetti, Apr 19, 2018 at 8:17
- 4) In C, bitwise & is perfectly defined for signed integers (at least for two's complement). You MIGHT be writing code where signed integers have another representation (one's complement, for example) where a negative 0 is a trap, but then... you're failing when casting to int to return a value. Either don't bother or do it well. << is UB for negative operands. 5) This is just my very personal opinion, but I can't stand code formatted to look like columns like your 2nd example... it's a pain to write, read and maintain (with a very opinionated visual appearance...) – Adriano Repetti, Apr 19, 2018 at 8:20
- 6) When being a good citizen you may check for the number of bits in your input. – Adriano Repetti, Apr 19, 2018 at 8:31
- Um... no. Don't know what encoding you're talking about, but in Unicode and ASCII both, the code point 91 dec / 0x5b hex is [, left square bracket, while the code point 0x91 hex / 145 dec is Unicode's "PRIVATE USE 1" control character. The code point for Unicode/ASCII 0 is 0x30 hex / 48 dec, while 1 is at 0x31, decimal 49. If you know your string contains only ASCII 0 or 1, all you care about is the low-order bit, which masking the char off via c & 0x01 gives you. – Nicholas Carey, Apr 20, 2018 at 0:14
- You're right, I wrongly used 91h as decimal. Please ignore point 3! – Adriano Repetti, Apr 20, 2018 at 6:36
- gcc app.c --ansi --pedantic -Wall
- str[i] - '0'; does not need to "assume ASCII coding". That is valid for all encodings.