I have a buffer containing an HTTP header, which includes the Content-Length
field indicating the size of the file to be downloaded.
I want to write a generic function that extracts a substring and, eventually, a value that is converted from a string
to an int.
The following approach seems to work, but it feels like brute force. Any better ideas?
void parse(char *src, char *dst, const char *firstKey, const char *secKey)
{
char *start = strstr(src, firstKey) + 2;
char *end = strstr(start, secKey);
size_t bytes = end - start;
memcpy(dst, start, bytes);
}
int main()
{
char httpHeader[] = {72, 84, 84, 80, 47, 49, 46, 49, 32, 50, 48, 49, 32, 67, 114, 101, 97, 116, 101, 100, 13, 10, 68, 97, 116, 101, 58, 32, 77, 111, 110, 44, 32, 49, 55, 32, 74, 97, 110, 32, 50, 48, 50, 50, 32, 50, 49, 58, 53, 56, 58, 52, 51, 32, 71, 77, 84, 13, 10, 83, 101, 114, 118, 101, 114, 58, 32, 65, 112, 97, 99, 104, 101, 47, 50, 46, 52, 46, 50, 57, 32, 40, 85, 98, 117, 110, 116, 117, 41, 13, 10, 67, 111, 110, 116, 101, 110, 116, 45, 76, 101, 110, 103, 116, 104, 58, 32, 48, 13, 10, 67, 111, 110, 116, 101, 110, 116, 45, 84, 121, 112, 101, 58, 32, 116, 101, 120, 116, 47, 104, 116, 109, 108, 13, 10, 13, 10};
/*
what httpHeader above stores:
HTTP/1.1 201 Created
Date: Mon, 17 Jan 2022 21:58:43 GMT
Server: Apache/2.4.29 (Ubuntu)
Content-Length: 0
Content-Type: text/html
*/
char dst[100] = {0};
parse(httpHeader, dst, "Content-Length", "\n"); // dst => Content-Length: 0
char dst1[100] = {0};
parse(dst, dst1, ":", "\r");
// get the int value
int contentLenVal = atoi(dst1);
printf ("dst: %s\ndst1: %s\n\nContent value: %d\n", dst, dst1, contentLenVal);
}
1 Answer
Any better ideas?
Consume the buffer
Rather than leave the buffer "as-is" while parsing, adjust its start and end while parsing.
char *s = httpHeader;
s = parse(s, "Content-Length", "\n");
if (s == NULL) Handle_Error();
s = parse(s, ":", "\r");
if (s == NULL) Handle_Error();
long contentLenVal = strtol(s, ...); // See below for details.
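The answer does not show the body of this pointer-returning parse(). The following is a minimal sketch of one way it could look, assuming the buffer is a modifiable, null-terminated string and that overwriting the delimiter in place is acceptable. The name parse_inplace is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the answer): find `key` in `s`, skip past it,
 * cut the text at the next `delim` by overwriting its first byte with '\0',
 * and return a pointer to the text in between.
 * Returns NULL if `s` is NULL or if either `key` or `delim` is not found.
 * Assumes `s` is modifiable and null-terminated. */
char *parse_inplace(char *s, const char *key, const char *delim)
{
    if (s == NULL) {
        return NULL;
    }
    char *start = strstr(s, key);
    if (start == NULL) {
        return NULL;
    }
    start += strlen(key);            /* step over the whole key */
    char *end = strstr(start, delim);
    if (end == NULL) {
        return NULL;
    }
    *end = '\0';                     /* destructive: shortens the buffer */
    return start;
}
```

Because the delimiter is overwritten, the rest of the header is no longer reachable through the returned pointer. Work on a copy if the original buffer is still needed.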
Avoid bugs.
Avoid going off the end
The line below may advance 2 past the end of the string, or add 2 to a NULL returned by strstr(). Better to walk carefully.
// char *start = strstr(src, firstKey) + 2;
char *start = strstr(src, firstKey);
if (start == NULL || start[0] == '\0' || start[1] == '\0') {
Handle_Error();
}
start += 2;
....
if (end == NULL) {
Handle_Error();
}
In any case, the + 2 is strange, since it skips two characters rather than the whole key. I'd expect:
char *start = strstr(src, firstKey);
if (start == NULL) {
Handle_Error();
}
start += strlen(firstKey);
dst lacks a certain null character
Presently the code relies on the caller to pre-fill dst with '\0'. Better to pass in the buffer size and append a '\0' explicitly in the function.
// void parse(char *src, char *dst, ...
void parse(const char *src, size_t sz, char *dst, ...
...
ptrdiff_t diff = end - start;
if (diff < 0 || (size_t) diff >= sz) {
// No room in dst (sz bytes) for the text plus its null character.
Handle_Error();
} else {
size_t bytes = (size_t) diff;
memcpy(dst, start, bytes);
dst[bytes] = '\0';
}
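Putting the points above together, a complete version might look like the sketch below. The name parse_bounded, the bool return value, and the parameter order are assumptions made for illustration, not part of the original code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Sketch (hypothetical name and signature): copy the text found between
 * `firstKey` and `secKey` in `src` into `dst`, which holds `sz` bytes.
 * On success `dst` is always null-terminated and true is returned.
 * Returns false if a key is missing or the text plus '\0' does not fit. */
bool parse_bounded(const char *src, char *dst, size_t sz,
                   const char *firstKey, const char *secKey)
{
    if (sz == 0) {
        return false;                      /* no room even for '\0' */
    }
    const char *start = strstr(src, firstKey);
    if (start == NULL) {
        return false;
    }
    start += strlen(firstKey);             /* skip the whole key, not "+ 2" */
    const char *end = strstr(start, secKey);
    if (end == NULL) {
        return false;
    }
    size_t bytes = (size_t)(end - start);  /* end >= start here */
    if (bytes >= sz) {
        return false;                      /* need one byte for '\0' */
    }
    memcpy(dst, start, bytes);
    dst[bytes] = '\0';
    return true;
}
```

The caller no longer needs to zero dst beforehand, and a failed call leaves dst untouched.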
Use strtol()
Robust code looks for errors:
// int contentLenVal = atoi(dst1);
char *endptr;
errno = 0;
long contentLenVal = strtol(dst1, &endptr, 10);
if (dst1 == endptr || errno || contentLenVal < LEN_MIN || contentLenVal > LEN_MAX) {
Handle_Error();
}
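The snippet above leaves LEN_MIN and LEN_MAX undefined. The sketch below wraps the same checks in a function, with limit values chosen here purely for illustration. The name parse_length is hypothetical.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Assumed limits for illustration only; pick values that suit the application. */
#define LEN_MIN 0L
#define LEN_MAX 1000000000L

/* Sketch (hypothetical helper): convert `text` to a long in [LEN_MIN, LEN_MAX].
 * Leading white-space is skipped by strtol(). Trailing characters such as
 * "\r" are tolerated and not examined.
 * Returns true and stores the value in *out on success. */
bool parse_length(const char *text, long *out)
{
    char *endptr;
    errno = 0;
    long val = strtol(text, &endptr, 10);
    if (text == endptr) {
        return false;                  /* no digits were converted */
    }
    if (errno == ERANGE) {
        return false;                  /* value does not fit in a long */
    }
    if (val < LEN_MIN || val > LEN_MAX) {
        return false;                  /* outside the accepted range */
    }
    *out = val;
    return true;
}
```

Unlike atoi(), this distinguishes a genuine 0 from text that contains no number at all.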
Minor: Use const
// void parse(char *src, char *dst, ...
void parse(const char *src, char *dst, ...
If the code is to consume (modify) src, leave it as is.
Minor: Wrap
// char httpHeader[] = {72, 84, 84, 80, 47, 49, 46, 49, 32, 50, 48, 49, 32, 67, 114, 101, 97, 116, 101, 100, 13, 10, 68, 97, 116, 101, 58, 32, 77, 111, 110, 44, 32, 49, 55, 32, 74, 97, 110, 32, 50, 48, 50, 50, 32, 50, 49, 58, 53, 56, 58, 52, 51, 32, 71, 77, 84, 13, 10, 83, 101, 114, 118, 101, 114, 58, 32, 65, 112, 97, 99, 104, 101, 47, 50, 46, 52, 46, 50, 57, 32, 40, 85, 98, 117, 110, 116, 117, 41, 13, 10, 67, 111, 110, 116, 101, 110, 116, 45, 76, 101, 110, 103, 116, 104, 58, 32, 48, 13, 10, 67, 111, 110, 116, 101, 110, 116, 45, 84, 121, 112, 101, 58, 32, 116, 101, 120, 116, 47, 104, 116, 109, 108, 13, 10, 13, 10};
Versus
char httpHeader[] = {72, 84, 84, 80, 47, 49, 46, 49, 32, 50, 48, 49, 32, 67,
114, 101, 97, 116, 101, 100, 13, 10, 68, 97, 116, 101, 58, 32, 77, 111, 110,
44, 32, 49, 55, 32, 74, 97, 110, 32, 50, 48, 50, 50, 32, 50, 49, 58, 53, 56,
58, 52, 51, 32, 71, 77, 84, 13, 10, 83, 101, 114, 118, 101, 114, 58, 32, 65,
112, 97, 99, 104, 101, 47, 50, 46, 52, 46, 50, 57, 32, 40, 85, 98, 117, 110,
116, 117, 41, 13, 10, 67, 111, 110, 116, 101, 110, 116, 45, 76, 101, 110,
103, 116, 104, 58, 32, 48, 13, 10, 67, 111, 110, 116, 101, 110, 116, 45, 84,
121, 112, 101, 58, 32, 116, 101, 120, 116, 47, 104, 116, 109, 108, 13, 10,
13, 10};
Tip: Use an auto-formatter.
- Thanks! Why is it better for the callee to take care of the null terminator, as opposed to the caller making sure the buffer is null-terminated? (xyf, Jan 21, 2022 at 1:51)
- @xyf The callee knows where to write it efficiently, and there is less repeated code. Consider a buffer of thousands of bytes. Should the caller be obliged to pre-zero thousands of bytes, or let the callee set the one? (chux, Jan 21, 2022 at 2:41)
- So perhaps just memset the entire buffer to 0 inside the callee? For which the caller would need to pass the size of the buffer too. (xyf, Jan 22, 2022 at 5:22)
- @xyf The caller should pass the size anyway. Further, say the user passed a pointer to a 100,000-byte buffer, yet the function only needs to set a few bytes and then a null character. There is no advantage, only weaknesses, to memset-ing an arbitrarily sized buffer in advance, whether by the caller or the callee. (chux, Jan 22, 2022 at 5:36)
- Yeah. I reckon if the entire buffer isn't guaranteed to be filled with legitimate values, and the buffer is quite large, it's a safer bet to not memset and instead append '\0' at the end after populating. (xyf, Jan 22, 2022 at 5:38)