I want to extract some values in i-commas(") from lines like this:
<P k="9,0,1" vt="191" v="100.99936" z="" />
Example:
getCharVal ( cp, char "k=\"", 9)
where cp is a pointer the line above should return "9,0,1"
The function:
#define ENDTAG "/>"
#define ICOMMAS '"'
char * getCharVal ( char *cp, char *att, size_t s)
{
char * val;
char * endTagP;
if (cp == NULL)return NULL;
cp = strstr(cp, att)+strlen(att);
if (cp == NULL) return NULL;
char * endP = strchr(cp, ICOMMAS);
if (endP == NULL) return NULL;
endTagP = strstr(cp, ENDTAG);
if (endTagP == NULL) return NULL;
if (endP > endTagP) return NULL;
size_t valsize = endP - cp ;
if (valsize > s) return NULL;
val = malloc(valsize + 1);
memcpy (val, cp, valsize);
val[valsize]='0円';
cp = endTagP;
return val;
}
This code is ugly. Could anyone give me hints on how to write it better?
-
\$\begingroup\$ Use a regular expression! (seriously, don't!) \$\endgroup\$Mathieu Guindon– Mathieu Guindon2013年12月19日 03:04:17 +00:00Commented Dec 19, 2013 at 3:04
3 Answers 3
I've modified your code to make it more normal C99, including adding const
to parameters that your function does not change, improving (to my taste) the
variable names, moving variable definition to the point of their first use and
using strndup
to duplicate the tag (not universally available but easily
written).
char *getCharVal(const char *ch, const char *att, size_t size)
{
if (!ch) {
return NULL;
}
ch = strstr(ch, att);
if (!ch) {
return NULL;
}
ch += strlen(att);
char *end = strchr(ch, '"');
if (!end) {
return NULL;
}
char *endTag = strstr(ch, ENDTAG);
if (!endTag) {
return NULL;
}
if (end > endTag) {
return NULL;
}
size_t valSize = end - ch;
if (valSize > size) {
return NULL;
}
return strndup(ch, valSize);
}
Note also that your
cp = strstr(cp, att)+strlen(att);
if (cp == NULL) return NULL;
will never fail because even if att
is not found you have added its length
to the NULL pointer that strstr
would return.
A beneficial change would be to omit the check for size (although perhaps your application is such that this check is more necessary than it appears).
Finally, I think your interface is ugly. It would be much cleaner to pass the the pure attribute "k". However, I can see that passing "k=\"" is an optimisation that makes the function much simpler to implement.
-
\$\begingroup\$ Didn't notice:
cp = strstr(cp, att)+strlen(att);
. Goo eye. \$\endgroup\$Fiddling Bits– Fiddling Bits2013年12月19日 04:07:05 +00:00Commented Dec 19, 2013 at 4:07 -
\$\begingroup\$ Do you also recommend the change it and use standard xml libraries? \$\endgroup\$MaMu– MaMu2013年12月19日 09:16:31 +00:00Commented Dec 19, 2013 at 9:16
-
\$\begingroup\$ Size is checked because it is stream and sometimes comes something wrong. \$\endgroup\$MaMu– MaMu2013年12月19日 09:25:44 +00:00Commented Dec 19, 2013 at 9:25
-
1\$\begingroup\$ Depends. If if you only ever want to extract "k" from your stream then using your own function might be sensible. But if you need to extract more and/or your XML can be badly formed a library is likely to make the job much easier. \$\endgroup\$William Morris– William Morris2013年12月19日 11:52:06 +00:00Commented Dec 19, 2013 at 11:52
The easiest code to maintain is code that doesn't exist. For parsing XML, use an XML-parsing library, preferably with XPath support. Sometimes it might be justified to whip up your own code to extract values from XML, but it clearly does not make sense in this case. Not only is the code ugly by your own admission, low-level string- and memory-manipulation also distracts you from thinking about the big picture. Furthermore, your code will be less robust than a proper XML parser, and it will fail if the text is encoded in a semantically equivalent variant (for example, if a value is escaped).
-
\$\begingroup\$ Definitely use a tried and true library, but as a exercise or for fun, try to write these yourself. You'll become a better programmer for the effort. \$\endgroup\$Fiddling Bits– Fiddling Bits2013年12月19日 01:11:03 +00:00Commented Dec 19, 2013 at 1:11
Work on the algorithm first before you think about error checking. You know you'll input information correctly. The basic algorithm follows:
char *getCharVal(char *cp, char *att, size_t s)
{
char *val, *pChStart, *pChEnd;
size_t valLen;
pChStart = strstr(cp, att); // Find value substring in source string
pChStart += strlen(att); // Go to beginning of value substring
pChEnd = strstr(pChStart, "\""); // Find end of value substring
valLen = pChEnd - pChStart; // Calculate length of value substring
val = malloc(valLen * sizeof(char) + 1); // Allocate memory for value substring
strncpy(val, pChStart, valLen); // Copy value in source string to value substring copy
val[valLen] = '0円'; // NULL terminate value substring copy
return val;
}
Once you get the algorithm down, you can add error checking. You know you'll be checking for NULL
a lot, so instead of using a lot of if
statements in the source code, which adds clutter and makes things less readable, add a macro to "hide" it:
#define CHECK_NULL(string) (if((string) == NULL) return NULL)
With this, you can check for NULL
s in a less obtrusive way:
CHECK_NULL(pChStart == strstr(cp, att));
Your code with the CHECK_NULL
macro is much more readable:
char * getCharVal ( char *cp, char *att, size_t s)
{
char * val;
char * endTagP;
char * endP;
CHECK_NULL(cp);
CHECK_NULL(strstr(cp, att)+strlen(att));
CHECK_NULL(endP = strchr(cp, ICOMMAS));
CHECK_NULL(strstr(cp, ENDTAG));
size_t valsize = endP - cp ;
if (valsize > s) return NULL;
CHECK_NULL(val = malloc(valsize + 1));
memcpy (val, cp, valsize);
val[valsize]='0円';
cp = endTagP;
return val;
}
-
1\$\begingroup\$ Note that
sizeof(char)
is 1 by definition,malloc
should not be cast, the OP is correct in usingmemcpy
instead of yourstrncpy
(he knows the length) and that multiple vars are usually best not declared on the same line but are often better defined at the point of first use. Encouraging the use of macros is also not something I would do and encouraging their use to hide error handling is arguably wrong. \$\endgroup\$William Morris– William Morris2013年12月19日 01:34:04 +00:00Commented Dec 19, 2013 at 1:34 -
1\$\begingroup\$ @WilliamMorris OP tagged this C, so variables at top is convention. Using old compiler... I forgot to take the cast that it requires out. Chars are not always one byte. The macro is simple and it's obvious what it does. \$\endgroup\$Fiddling Bits– Fiddling Bits2013年12月19日 01:41:41 +00:00Commented Dec 19, 2013 at 1:41
-
1\$\begingroup\$ sizeof(char) is 1. By definition. The macro call is only "obvious" here because the definition is right above it. In normal use it will not be, and so without looking up the definition there is no way to know what happens if its "check" turns out false. And what happens when you have another use case where the return needs to be -1 instead of NULL? Another macro with a similar name? Use of macros is discouraged for good reasons. \$\endgroup\$William Morris– William Morris2013年12月19日 02:21:46 +00:00Commented Dec 19, 2013 at 2:21
-
1\$\begingroup\$ @WilliamMorris stackoverflow.com/questions/20684056/… You are correct sir. My apologies. \$\endgroup\$Fiddling Bits– Fiddling Bits2013年12月19日 14:54:35 +00:00Commented Dec 19, 2013 at 14:54
-
1\$\begingroup\$ @Fiddling Bits i really like your solution. I really didn't know to whom I should give "accepted". I decided for Morris because of his further explanation. Anyhow big thanks and I'll use some of you ideas too. \$\endgroup\$MaMu– MaMu2013年12月20日 10:14:43 +00:00Commented Dec 20, 2013 at 10:14