Small function for getting character value

Question 1

I want to extract some values in i-commas(") from lines like this:

<P k="9,0,1" vt="191" v="100.99936" z="" />

Example:

getCharVal ( cp, char "k=\"", 9)

where cp is a pointer the line above should return "9,0,1"

The function:

#define ENDTAG "/>"
#define ICOMMAS '"'
char * getCharVal ( char *cp, char *att, size_t s)
{
 char * val;
 char * endTagP;
 if (cp == NULL)return NULL;
 cp = strstr(cp, att)+strlen(att);
 if (cp == NULL) return NULL;
 char * endP = strchr(cp, ICOMMAS);
 if (endP == NULL) return NULL;
 endTagP = strstr(cp, ENDTAG);
 if (endTagP == NULL) return NULL;
 if (endP > endTagP) return NULL;
 size_t valsize = endP - cp ;
 if (valsize > s) return NULL;
 val = malloc(valsize + 1);
 memcpy (val, cp, valsize);
 val[valsize]='0円';
 cp = endTagP;
 return val;
}

This code is ugly. Could anyone give me hints on how to write it better?

Question 2

Use a regular expression! (seriously, don't!)

Question 3

I've modified your code to make it more normal C99, including adding const to parameters that your function does not change, improving (to my taste) the variable names, moving variable definition to the point of their first use and using strndup to duplicate the tag (not universally available but easily written).

char *getCharVal(const char *ch, const char *att, size_t size)
{
 if (!ch) {
 return NULL;
 }
 ch = strstr(ch, att);
 if (!ch) {
 return NULL;
 }
 ch += strlen(att);
 char *end = strchr(ch, '"');
 if (!end) {
 return NULL;
 }
 char *endTag = strstr(ch, ENDTAG);
 if (!endTag) {
 return NULL;
 }
 if (end > endTag) {
 return NULL;
 }
 size_t valSize = end - ch;
 if (valSize > size) {
 return NULL;
 }
 return strndup(ch, valSize);
}

Note also that your

cp = strstr(cp, att)+strlen(att);
if (cp == NULL) return NULL;

will never fail because even if att is not found you have added its length to the NULL pointer that strstr would return.

A beneficial change would be to omit the check for size (although perhaps your application is such that this check is more necessary than it appears).

Finally, I think your interface is ugly. It would be much cleaner to pass the the pure attribute "k". However, I can see that passing "k=\"" is an optimisation that makes the function much simpler to implement.

Question 4

Didn't notice: cp = strstr(cp, att)+strlen(att);. Goo eye.

Question 5

Do you also recommend the change it and use standard xml libraries?

Question 6

Size is checked because it is stream and sometimes comes something wrong.

Question 7

Depends. If if you only ever want to extract "k" from your stream then using your own function might be sensible. But if you need to extract more and/or your XML can be badly formed a library is likely to make the job much easier.

Question 8

The easiest code to maintain is code that doesn't exist. For parsing XML, use an XML-parsing library, preferably with XPath support. Sometimes it might be justified to whip up your own code to extract values from XML, but it clearly does not make sense in this case. Not only is the code ugly by your own admission, low-level string- and memory-manipulation also distracts you from thinking about the big picture. Furthermore, your code will be less robust than a proper XML parser, and it will fail if the text is encoded in a semantically equivalent variant (for example, if a value is escaped).

Question 9

Definitely use a tried and true library, but as a exercise or for fun, try to write these yourself. You'll become a better programmer for the effort.

Question 10

Work on the algorithm first before you think about error checking. You know you'll input information correctly. The basic algorithm follows:

char *getCharVal(char *cp, char *att, size_t s)
{
 char *val, *pChStart, *pChEnd;
 size_t valLen;
 pChStart = strstr(cp, att); // Find value substring in source string
 pChStart += strlen(att); // Go to beginning of value substring
 pChEnd = strstr(pChStart, "\""); // Find end of value substring
 valLen = pChEnd - pChStart; // Calculate length of value substring
 val = malloc(valLen * sizeof(char) + 1); // Allocate memory for value substring
 strncpy(val, pChStart, valLen); // Copy value in source string to value substring copy
 val[valLen] = '0円'; // NULL terminate value substring copy
 return val;
}

Once you get the algorithm down, you can add error checking. You know you'll be checking for NULL a lot, so instead of using a lot of if statements in the source code, which adds clutter and makes things less readable, add a macro to "hide" it:

#define CHECK_NULL(string) (if((string) == NULL) return NULL)

With this, you can check for NULLs in a less obtrusive way:

 CHECK_NULL(pChStart == strstr(cp, att));

Your code with the CHECK_NULL macro is much more readable:

char * getCharVal ( char *cp, char *att, size_t s)
{
 char * val;
 char * endTagP;
 char * endP;
 CHECK_NULL(cp);
 CHECK_NULL(strstr(cp, att)+strlen(att));
 CHECK_NULL(endP = strchr(cp, ICOMMAS));
 CHECK_NULL(strstr(cp, ENDTAG));
 size_t valsize = endP - cp ;
 if (valsize > s) return NULL;
 CHECK_NULL(val = malloc(valsize + 1));
 memcpy (val, cp, valsize);
 val[valsize]='0円';
 cp = endTagP;
 return val;
}

Question 11

Note that sizeof(char) is 1 by definition, malloc should not be cast, the OP is correct in using memcpy instead of your strncpy (he knows the length) and that multiple vars are usually best not declared on the same line but are often better defined at the point of first use. Encouraging the use of macros is also not something I would do and encouraging their use to hide error handling is arguably wrong.

Question 12

@WilliamMorris OP tagged this C, so variables at top is convention. Using old compiler... I forgot to take the cast that it requires out. Chars are not always one byte. The macro is simple and it's obvious what it does.

Question 13

sizeof(char) is 1. By definition. The macro call is only "obvious" here because the definition is right above it. In normal use it will not be, and so without looking up the definition there is no way to know what happens if its "check" turns out false. And what happens when you have another use case where the return needs to be -1 instead of NULL? Another macro with a similar name? Use of macros is discouraged for good reasons.

Question 14

@WilliamMorris stackoverflow.com/questions/20684056/… You are correct sir. My apologies.

Question 15

@Fiddling Bits i really like your solution. I really didn't know to whom I should give "accepted". I decided for Morris because of his further explanation. Anyhow big thanks and I'll use some of you ideas too.

William Morris William Morris 9,38419 silver badges42 bronze badges · Accepted Answer · 2013-12-19 02:07:04Z

I've modified your code to make it more normal C99, including adding const to parameters that your function does not change, improving (to my taste) the variable names, moving variable definition to the point of their first use and using strndup to duplicate the tag (not universally available but easily written).

char *getCharVal(const char *ch, const char *att, size_t size)
{
 if (!ch) {
 return NULL;
 }
 ch = strstr(ch, att);
 if (!ch) {
 return NULL;
 }
 ch += strlen(att);
 char *end = strchr(ch, '"');
 if (!end) {
 return NULL;
 }
 char *endTag = strstr(ch, ENDTAG);
 if (!endTag) {
 return NULL;
 }
 if (end > endTag) {
 return NULL;
 }
 size_t valSize = end - ch;
 if (valSize > size) {
 return NULL;
 }
 return strndup(ch, valSize);
}

Note also that your

cp = strstr(cp, att)+strlen(att);
if (cp == NULL) return NULL;

will never fail because even if att is not found you have added its length to the NULL pointer that strstr would return.

A beneficial change would be to omit the check for size (although perhaps your application is such that this check is more necessary than it appears).

Finally, I think your interface is ugly. It would be much cleaner to pass the the pure attribute "k". However, I can see that passing "k=\"" is an optimisation that makes the function much simpler to implement.

Didn't notice: cp = strstr(cp, att)+strlen(att);. Goo eye.
Do you also recommend the change it and use standard xml libraries?
Size is checked because it is stream and sometimes comes something wrong.
Depends. If if you only ever want to extract "k" from your stream then using your own function might be sensible. But if you need to extract more and/or your XML can be badly formed a library is likely to make the job much easier.

Stack Exchange Network

Small function for getting character value

3 Answers 3

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Small function for getting character value

3 Answers 3

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions