I have written a C function to parse an IP address that should match the regular expression ^\d+.\d+.\d+.\d+
and nothing else.
Additionally all bytes are limited to 0-255.
I tried to catch all invalid inputs, but I am unsure if there are any edge cases I missed or if it could be simplified.
The function is intended to be used in an embedded system. Therefore I don't want to use external libraries (or anything in stdio.h
) and code size is most important.
I am not really looking for stylistic advice, but rather a functional review.
// Returns the number of characters parsed or 0 if not successful
unsigned int StringToIp(uint8_t addr[4], const char* ipSrc, const size_t ipSrcLen) {
char* ipPtr = ipSrc;
int16_t current = -1;
uint8_t addrIdx = 0;
for (int i = 0; i < ipSrcLen, addrIdx < 4; i++) {
char c = *ipPtr++;
if (c == '.') {
if (current == -1) return 0; // 2 consecutive dots or dot at beginning
addr[addrIdx++] = current;
current = -1;
} else {
uint8_t digit = c - '0';
if (digit >= 10) {
// Invalid character
if (addrIdx == 3 && current >= 0) {
// Invalid character at the end is treated as the end of the address
addr[addrIdx++] = current;
ipPtr--;
break;
}
}
if (current == -1) {
// first digit of current byte
current = digit;
} else {
// further digits
current = current * 10 + digit;
if (current > 255) return 0; // current byte greater than 255 => invalid
}
}
}
if (addrIdx == 3 && current >= 0) {
// write last byte (necessary if at end if string)
addr[addrIdx++] = current;
}
if (addrIdx == 4) {
// valid IP address
return ipPtr - ipSrc;
} else {
// invalid IP address
return 0;
}
}
I also wrote the inverse function. Here I am quite sure I got all cases, but the divisions are suboptimal, because the microcontroller has no dedicated divider. Which means extra code for the divisions has to be generated and this bloats the code size and reduces the speed (which is somewhat less important to me).
Is there any why to avoid the divisions?
unsigned int IpToString(char ipDst[4*3+3+1], const uint8_t addr[4]) {
char* ipPtr = ipDst;
for (int i = 0; i < 4; i++) {
uint8_t current = addr[i];
uint8_t h = current / 100;
current -= h * 100;
uint8_t d = current / 10;
current -= d * 10;
uint8_t o = current;
if (h > 0) *ipPtr++ = '0' + h;
if (h > 0 || d > 0) *ipPtr++ = '0' + d;
*ipPtr++ = '0' + o;
if (i < 3) *ipPtr++ = '.';
}
*ipPtr = '0円';
return ipPtr - ipBuffer;
}
2 Answers 2
Potential bugs
I just spent a few minutes reviewing this, so I'm not sure these are bugs, but you should have a look at them:
The comma operator is not the same as
&&
. You have this code:for (int i = 0; i < ipSrcLen, addrIdx < 4; i++) {
but I think you meant to use
&&
instead of,
. I'm not sure it matters because I'm not sure what the chances are that you will get a mismatch between the string contents and the length, but it's a possible source of error.Incomplete error checking. In the "invalid digit" section of your loop, you have this check:
if (addrIdx == 3 && current >= 0) {
but you don't do anything if that condition is not true. If
addrIdx
is 0, 1, or 2 you don't handle the bad digit, but instead fall through. I think you need to catch those cases and fail gracefully.
IpToString
I mentioned this in the comments, but you know the range of values is small. So there's no reason not to replace your divisions with either a series of if
statements or a lookup table.
Unless you're writing a router or something, I don't expect the lookup table would pay for itself, so the if
statements seem to be the way to go. Something like this:
need_tens_0 = FALSE
// hundreds digit
if number >= 100:
need_tens_0 = TRUE
if number >= 200:
*ptr++ = '2'
number -= 200
else:
*ptr++ = '1'
number -= 100
// tens digit (binary search)
if number >= 50:
if number >= 80:
if number >= 90:
*ptr++ = '9'
number -= 90
else:
*ptr++ = '8'
number -= 80
// ... buncha cases omitted ...
else if number >= 10:
else:
// "6" could be 56 or 106 or 216. Check if we need to insert a padding 0
if need_tens_0:
*ptr++ = '0'
// Ones digit
*ptr++ = '0' + number
-
\$\begingroup\$ I think you are right about the comma operator. The compiler output is quite different if I use
&&
instead and does not change if I removei < ipSrcLen,
. So it seemsi < ipSrcLen
was being ignored completely. Thank you very much. \$\endgroup\$LittleEwok– LittleEwok2019年11月08日 16:18:59 +00:00Commented Nov 8, 2019 at 16:18 -
\$\begingroup\$ The second point is not true though, because anything else than
addrIdx >= 3
should be treated as an error and return 0, which it does. I ended up checking the hundreds with if and the tens with division, because that yielded the smallest code size. On a different architecture the binary search may be more efficient but not on mine. \$\endgroup\$LittleEwok– LittleEwok2019年11月08日 16:23:13 +00:00Commented Nov 8, 2019 at 16:23 -
\$\begingroup\$ If you hadn't answered this question, it might have been closed due to Lack of Concrete Context. \$\endgroup\$2019年11月08日 16:55:27 +00:00Commented Nov 8, 2019 at 16:55
-
\$\begingroup\$ There is no need to replace a division by an integer constant with "more efficient" code, as the compiler will turn the division into a multiplication anyway, at all reasonable optimization levels. \$\endgroup\$Roland Illig– Roland Illig2024年02月17日 08:37:09 +00:00Commented Feb 17, 2024 at 8:37
Your compiler should be smart enough to replace the division by multiplication. Here's an experiment I did with CLang on x86_64:
#include <inttypes.h>
uint8_t div8_100(uint8_t a) { return a / 100; }
uint8_t div8_10(uint8_t a) { return a / 10; }
clang -O3 -Wall -Weverything -S div8.c
div8_100:
movzbl %cl, %eax
leal (%rax,%rax,4), %ecx
leal (%rax,%rcx,8), %eax
shrl 12,ドル %eax
retq
div8_10:
movzbl %cl, %eax
imull 205,ドル %eax, %eax
shrl 11,ドル %eax
retq
This is fairly efficient code, and it is not using any exotic opcodes, therefore I expect it to be available on every architecture.
In summary, as long as you divide by integer literals, there is no reason for the compiler to call any external division function.
-
\$\begingroup\$ I did not know that. Very good to know. That may be the reason why I did not get any benefit from replacing
/ 10
by @Austin Hastings proposed `if`` statements. \$\endgroup\$LittleEwok– LittleEwok2019年11月15日 08:03:44 +00:00Commented Nov 15, 2019 at 8:03
Explore related questions
See similar questions with these tags.
if
statements? Compare with 200, 100, and then a binary search of 10's digits. \$\endgroup\$inet_*
fromarpa/inet.h
, for example in muslinet_aton.c
andinet_ntoa.c
. (Keep in mind to respect musl's license.) \$\endgroup\$StringToIp(..., "2.0004年6月8日", ...)
function? (4 digits in a byte) \$\endgroup\$