3
\$\begingroup\$

I want to extract some values in i-commas(") from lines like this:

<P k="9,0,1" vt="191" v="100.99936" z="" />

Example:

getCharVal ( cp, char "k=\"", 9)

where cp is a pointer the line above should return "9,0,1"

The function:

#define ENDTAG "/>"
#define ICOMMAS '"'
char * getCharVal ( char *cp, char *att, size_t s)
{
 char * val;
 char * endTagP;
 if (cp == NULL)return NULL;
 cp = strstr(cp, att)+strlen(att);
 if (cp == NULL) return NULL;
 char * endP = strchr(cp, ICOMMAS);
 if (endP == NULL) return NULL;
 endTagP = strstr(cp, ENDTAG);
 if (endTagP == NULL) return NULL;
 if (endP > endTagP) return NULL;
 size_t valsize = endP - cp ;
 if (valsize > s) return NULL;
 val = malloc(valsize + 1);
 memcpy (val, cp, valsize);
 val[valsize]='0円';
 cp = endTagP;
 return val;
}

This code is ugly. Could anyone give me hints on how to write it better?

200_success
145k22 gold badges190 silver badges478 bronze badges
asked Dec 18, 2013 at 17:02
\$\endgroup\$
1

3 Answers 3

4
\$\begingroup\$

I've modified your code to make it more normal C99, including adding const to parameters that your function does not change, improving (to my taste) the variable names, moving variable definition to the point of their first use and using strndup to duplicate the tag (not universally available but easily written).

char *getCharVal(const char *ch, const char *att, size_t size)
{
 if (!ch) {
 return NULL;
 }
 ch = strstr(ch, att);
 if (!ch) {
 return NULL;
 }
 ch += strlen(att);
 char *end = strchr(ch, '"');
 if (!end) {
 return NULL;
 }
 char *endTag = strstr(ch, ENDTAG);
 if (!endTag) {
 return NULL;
 }
 if (end > endTag) {
 return NULL;
 }
 size_t valSize = end - ch;
 if (valSize > size) {
 return NULL;
 }
 return strndup(ch, valSize);
}

Note also that your

cp = strstr(cp, att)+strlen(att);
if (cp == NULL) return NULL;

will never fail because even if att is not found you have added its length to the NULL pointer that strstr would return.

A beneficial change would be to omit the check for size (although perhaps your application is such that this check is more necessary than it appears).

Finally, I think your interface is ugly. It would be much cleaner to pass the the pure attribute "k". However, I can see that passing "k=\"" is an optimisation that makes the function much simpler to implement.

answered Dec 19, 2013 at 2:07
\$\endgroup\$
4
  • \$\begingroup\$ Didn't notice: cp = strstr(cp, att)+strlen(att);. Goo eye. \$\endgroup\$ Commented Dec 19, 2013 at 4:07
  • \$\begingroup\$ Do you also recommend the change it and use standard xml libraries? \$\endgroup\$ Commented Dec 19, 2013 at 9:16
  • \$\begingroup\$ Size is checked because it is stream and sometimes comes something wrong. \$\endgroup\$ Commented Dec 19, 2013 at 9:25
  • 1
    \$\begingroup\$ Depends. If if you only ever want to extract "k" from your stream then using your own function might be sensible. But if you need to extract more and/or your XML can be badly formed a library is likely to make the job much easier. \$\endgroup\$ Commented Dec 19, 2013 at 11:52
5
\$\begingroup\$

The easiest code to maintain is code that doesn't exist. For parsing XML, use an XML-parsing library, preferably with XPath support. Sometimes it might be justified to whip up your own code to extract values from XML, but it clearly does not make sense in this case. Not only is the code ugly by your own admission, low-level string- and memory-manipulation also distracts you from thinking about the big picture. Furthermore, your code will be less robust than a proper XML parser, and it will fail if the text is encoded in a semantically equivalent variant (for example, if a value is escaped).

answered Dec 18, 2013 at 17:36
\$\endgroup\$
1
  • \$\begingroup\$ Definitely use a tried and true library, but as a exercise or for fun, try to write these yourself. You'll become a better programmer for the effort. \$\endgroup\$ Commented Dec 19, 2013 at 1:11
3
\$\begingroup\$

Work on the algorithm first before you think about error checking. You know you'll input information correctly. The basic algorithm follows:

char *getCharVal(char *cp, char *att, size_t s)
{
 char *val, *pChStart, *pChEnd;
 size_t valLen;
 pChStart = strstr(cp, att); // Find value substring in source string
 pChStart += strlen(att); // Go to beginning of value substring
 pChEnd = strstr(pChStart, "\""); // Find end of value substring
 valLen = pChEnd - pChStart; // Calculate length of value substring
 val = malloc(valLen * sizeof(char) + 1); // Allocate memory for value substring
 strncpy(val, pChStart, valLen); // Copy value in source string to value substring copy
 val[valLen] = '0円'; // NULL terminate value substring copy
 return val;
}

Once you get the algorithm down, you can add error checking. You know you'll be checking for NULL a lot, so instead of using a lot of if statements in the source code, which adds clutter and makes things less readable, add a macro to "hide" it:

#define CHECK_NULL(string) (if((string) == NULL) return NULL)

With this, you can check for NULLs in a less obtrusive way:

 CHECK_NULL(pChStart == strstr(cp, att));

Your code with the CHECK_NULL macro is much more readable:

char * getCharVal ( char *cp, char *att, size_t s)
{
 char * val;
 char * endTagP;
 char * endP;
 CHECK_NULL(cp);
 CHECK_NULL(strstr(cp, att)+strlen(att));
 CHECK_NULL(endP = strchr(cp, ICOMMAS));
 CHECK_NULL(strstr(cp, ENDTAG));
 size_t valsize = endP - cp ;
 if (valsize > s) return NULL;
 CHECK_NULL(val = malloc(valsize + 1));
 memcpy (val, cp, valsize);
 val[valsize]='0円';
 cp = endTagP;
 return val;
}
answered Dec 18, 2013 at 22:18
\$\endgroup\$
7
  • 1
    \$\begingroup\$ Note that sizeof(char) is 1 by definition, malloc should not be cast, the OP is correct in using memcpy instead of your strncpy (he knows the length) and that multiple vars are usually best not declared on the same line but are often better defined at the point of first use. Encouraging the use of macros is also not something I would do and encouraging their use to hide error handling is arguably wrong. \$\endgroup\$ Commented Dec 19, 2013 at 1:34
  • 1
    \$\begingroup\$ @WilliamMorris OP tagged this C, so variables at top is convention. Using old compiler... I forgot to take the cast that it requires out. Chars are not always one byte. The macro is simple and it's obvious what it does. \$\endgroup\$ Commented Dec 19, 2013 at 1:41
  • 1
    \$\begingroup\$ sizeof(char) is 1. By definition. The macro call is only "obvious" here because the definition is right above it. In normal use it will not be, and so without looking up the definition there is no way to know what happens if its "check" turns out false. And what happens when you have another use case where the return needs to be -1 instead of NULL? Another macro with a similar name? Use of macros is discouraged for good reasons. \$\endgroup\$ Commented Dec 19, 2013 at 2:21
  • 1
    \$\begingroup\$ @WilliamMorris stackoverflow.com/questions/20684056/… You are correct sir. My apologies. \$\endgroup\$ Commented Dec 19, 2013 at 14:54
  • 1
    \$\begingroup\$ @Fiddling Bits i really like your solution. I really didn't know to whom I should give "accepted". I decided for Morris because of his further explanation. Anyhow big thanks and I'll use some of you ideas too. \$\endgroup\$ Commented Dec 20, 2013 at 10:14

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.