Parsing ARP cache in C

Question 1

How can I make this better?

#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <string.h>
#include <stdlib.h>
#define ARP_CACHE "/proc/net/arp"
#define ARP_BUFFER_LEN 1024
#define ARP_DELIM " "
int readLine(int Fd, char *Buffer)
{
 if (Fd < 0)
 {
 return -1;
 }
 char ch;
 size_t Read = 0;
 while (read(Fd, (Buffer + Read), 1))
 {
 ch = Buffer[Read];
 if (ch == '\n')
 {
 break;
 }
 Read++;
 }
 if (Read)
 {
 Buffer[Read] = 0;
 return 0;
 }
 return -1;
}
char * getField(char * Line_Arg, int Field)
{
 char * Line = malloc(strlen(Line_Arg)), *ptr;
 memcpy(Line, Line_Arg, strlen(Line_Arg));
 ptr = Line;
 char * s; 
 s = strtok(Line, ARP_DELIM);
 while (Field && s)
 {
 s = strtok(NULL, ARP_DELIM);
 Field--;
 };
 char * ret;
 if (s)
 {
 ret = (char*)malloc(strlen(s) + 1);
 memset(ret, 0, strlen(s) + 1);
 memcpy(ret, s, strlen(s));
 }
 free(ptr);
 return s ? ret : NULL;
}
int main()
{
 int Fd = open(ARP_CACHE, O_RDONLY);
 if (Fd < 0)
 {
 fprintf(stdout, "Arp Cache: Failed to open file \"%s\"\n", ARP_CACHE);
 return 1;
 }
 char Buffer[ARP_BUFFER_LEN];
 /* Ignore first line */
 int Ret = readLine(Fd, &Buffer[0]);
 Ret = readLine(Fd, &Buffer[0]);
 int count = 0;
 while (Ret == 0)
 {
 char * Line;
 Line = &Buffer[0];
 /* Get Ip, Mac, Interface */ 
 char * Ip = getField(Line, 0);
 char * Mac = getField(Line, 3);
 char * IfaceStr = getField(Line, 5);
 fprintf(stdout, "%03d: Mac Address of [%s] on [%s] is \"%s\"\n",
 ++count, Ip, IfaceStr, Mac);
 free(Ip);
 free(Mac);
 free(IfaceStr);
 Ret = readLine(Fd, &Buffer[0]);
 }
 close(Fd);
 return Ret;
}

Question 2

Out of curiosity, where did you learn to program in C, and is it your first language?

Question 3

Leaned it in college.Haven't used it in about 10 years.Started coding a while back again.Why do you ask?

Question 4

Just plain ol' curiosity ;)

Question 5

Not at all! It makes you a perfect candidate for a Code Review!

Question 6

read can return more than two different values. It returns -1 on an error, 0 on EOF, and any value in (0,count] on success. Since your count is just 1, it means possible return values are -1, 0, and 1. Your code is treating a return value of -1 incorrectly.

Question 7

Concept

I don't believe that there is a cross-platform way to read the ARP cache. From the use of /proc filesystem, I deduce that you are targeting Linux. In Linux, you could read /proc/net/arp as you have done, or run the command ip neigh, which does something similar to your program. (On OS X, you could run arp -a -n instead.)

A comment worth a thousand words

You should include a comment like this in your program:

/**
 * /proc/net/arp looks like this:
 *
 * IP address HW type Flags HW address Mask Device
 * 192.168.12.31 0x1 0x2 00:09:6b:00:02:03 * eth0
 * 192.168.12.70 0x1 0x2 00:01:02:38:4c:85 * eth0
 */

That picture tells me everything I need to know about what you are trying to accomplish — you need hardly explain further. Conversely, not having that picture makes me work a lot harder to reverse-engineer your code.

Minutiae

&Buffer[0] is usually written as Buffer.
You repeat the readLine() call — once before entering the loop, once at the end of the loop. This is a good opportunity to make use of C's support for side-effects.
```
while (0 == (Ret = readLine(Fd, Buffer))) {
 ...
}
```
Since ARP_CACHE is a compile-time constant string, you can just write
```
fprintf(stdout, "Arp Cache: Failed to open file \"" ARP_CACHE "\"\n");
```
instead of doing a %s substitution at runtime.
Reading one byte at a time is wasteful.
Print errors to standard error; don't contaminate standard output.
I suggest naming your variables to be consistent with the /proc/net/arp header fields. For example, device instead of IfaceStr. Also, be consistent with capitalization: count starts with lowercase, which is more common.

Big-picture issues

The function int readLine(int Fd, char *Buffer) is kind of susceptible to buffer overflow. I can deduce that fact from the function signature: you pass a pointer to a buffer without also passing the size, so it seems unlikely that readLine would know how to stop when the buffer fills up. You could hard-code it to use ARP_BUFFER_LEN as a limit, but it would be unfortunate to cripple an otherwise generic function like that. It would be better to pass in the buffer size explicitly. That's a general pattern you'll see in C APIs: a pointer to a buffer is frequently accompanied by the buffer's size.

In practice, /proc/net/arp should never contain a line long enough to overflow a 1024-byte buffer, so you're safe. Still, you should follow idiomatic C conventions.
Once you modify readLine() to take a size parameter, you'll find that you've just reinvented fgets().
In C, passing strings from a function to its caller is usually avoided, since it involves malloc(), which introduces the potential for memory leaks. Rather, have the caller pass a buffer and size, like how fgets() and scanf() work.

The only good reason to return a string that was allocated using malloc() would be to support arbitrary-length results. At first glance, your getField() accomplishes that, as it allocates a buffer as large as strlen(Line_Arg). (You forgot to add a byte to accommodate the terminating NUL character, by the way.) But, that turns out not to be the case, since Line_Arg itself is not a string of arbitrary length. It's either less than ARP_BUFFER_LEN bytes long (if you got "lucky") or a buffer overflow (as discussed in (1) above).
You seem to be working very hard at I/O and string processing. Why not just use fscanf()?

The whole program, then, can be simplified to

#include <stdio.h>
/**
 * Macros to turn a numeric macro into a string literal. See
 * https://gcc.gnu.org/onlinedocs/cpp/Stringification.html
 */
#define xstr(s) str(s)
#define str(s) #s
#define ARP_CACHE "/proc/net/arp"
#define ARP_STRING_LEN 1023
#define ARP_BUFFER_LEN (ARP_STRING_LEN + 1)
/* Format for fscanf() to read the 1st, 4th, and 6th space-delimited fields */
#define ARP_LINE_FORMAT "%" xstr(ARP_STRING_LEN) "s %*s %*s " \
 "%" xstr(ARP_STRING_LEN) "s %*s " \
 "%" xstr(ARP_STRING_LEN) "s"
int main()
{
 FILE *arpCache = fopen(ARP_CACHE, "r");
 if (!arpCache)
 {
 perror("Arp Cache: Failed to open file \"" ARP_CACHE "\"");
 return 1;
 }
 /* Ignore the first line, which contains the header */
 char header[ARP_BUFFER_LEN];
 if (!fgets(header, sizeof(header), arpCache))
 {
 return 1;
 }
 char ipAddr[ARP_BUFFER_LEN], hwAddr[ARP_BUFFER_LEN], device[ARP_BUFFER_LEN];
 int count = 0;
 while (3 == fscanf(arpCache, ARP_LINE_FORMAT, ipAddr, hwAddr, device))
 {
 printf("%03d: Mac Address of [%s] on [%s] is \"%s\"\n",
 ++count, ipAddr, device, hwAddr);
 }
 fclose(arpCache);
 return 0;
}

Question 8

+1 for the excellent advice on always documenting the expected input format in front of any parse code.

Question 9

A few notes:

Overall, you are doing a lot of unnecessary work. You could just use getifaddrs() to get the IP address, MAC address, and interface name. There are plenty of code examples online on how you can use it to obtain all of that information, including an example in the function documentation itself.

You could also read directly from the file /sys/class/net/eth0/address for the MAC address (but that isn't very portable).
strtok() is limited to tokenizing only one string (with one set of delimiters) at a time, and it can't be used while threading. Therefore, strtok() is considered deprecated.

Instead, use strtok_r() or strtok_s() which are threading-friendly versions of strtok(). The POSIX standard provided strtok_r(), and the C11 standard provides strtok_s(). The use of either is a little awkward, because the first call is different from the subsequent calls.
1. The first time you call the function, send in the string to be parsed as the first argument.
2. On subsequent calls, send in NULL as the first argument.
3. The last argument is the scratch string. You don't have to initialize it on first use; on subsequent uses it will hold the string as it is parsed so far.
To demonstrate its use, I've written a simple line counter (of only non-blank lines) using the POSIX standard one. I'll leave the choice of what version to use and implementation into your program up to you.
```
#include <string.h> // strtok_r
int countLines(char* instring)
{
 size_t counter = 0;
 char *scratch, *txt;
 char *delimiter = "\n";
 for (; (txt = strtok_r((!counter ? instring : NULL), delimiter, &scratch)); counter++);
 return counter;
}
```
fopen(), a widely-used file I/O functions that you are using, got a facelift in C11. It now supports a new exclusive create-and-open mode ("...x"). The new mode behaves like O_CREAT|O_EXCL in POSIX and is commonly used for lock files. The "...x" family of modes includes the following options:
- wx create text file for writing with exclusive access.
- wbx create binary file for writing with exclusive access.
- w+x create text file for update with exclusive access.
- w+bx or wb+x create binary file for update with exclusive access.
Opening a file with any of the exclusive modes above fails if the file already exists or cannot be created. Otherwise, the file is created with exclusive (non-shared) access. Additionally, a safer version of fopen() called fopen_s() is also available. That is what I would use in your code if I were you, but I'll leave that up for you to decide and change.
You aren't abiding by common variable naming practices for C by using PascalCase. Most people either write in camelCase or in snake_case.
You probably shouldn't use the read() function anymore; instead prefer fread().

C99 & C11 §7.19

Many implementations of the C runtime environment, most notably the UNIX operating system, provide, aside from the standard I/O library’s fopen(), fclose(), fread(), fwrite(), and fseek(), a set of unbuffered I/O services, open(), close(), read(), write(), and lseek(). The C89 Committee decided not to standardize the latter set of functions.

In addition, buffered I/O is always faster than unbuffered. There are cases where you might want to use unbuffered, such as whenever you want to ensure that the output has been written before continuing. But in this case you will want to use fread().
Declare your function parameters as void if you don't take in any arguments.
```
int main(void)
```
Initialize variables upon declaration when you can.

Question 10

@testcoder If you like them, please consider upvoting and accepting the answer! :)

Question 11

In case of suggestions for getifaddrs, it can be only used for local interfaces, while "/proc/net/arp" has a cache of remote network addresses.The other way will be to send out an ARP request. +1 for strtok and read suggestions

Question 12

@testcoder Understood. I only looked at what the program did and optimized it's functionality. Hopefully I'll be reviewing more of your code in the future.

Question 13

You seem to be implying that using fread is generally better than using read. I strongly disagree with that. They are different. One need to know the difference in order to know which one is most suitable for each situation. On Linux and other POSIX systems, fread is usually implemented as a simple buffering layer on top of read. On such systems the choice of which to use depends on whether you need the buffering layer. If buffering helps, you use fread, if buffering would cause problems, you use read.

Question 14

@kasperd I am not implying, I am stating it as a fact based on tests I've run. In this certain case, it would be better to use fread() instead of read().

Question 15

In addition to @syb0rg's answer:

You do this to copy a string:
```
ret = (char*)malloc(strlen(s) + 1);
memset(ret, 0, strlen(s) + 1);
memcpy(ret, s, strlen(s));
```
Issues:
- This code calls strlen three times while it should really only call it once. The code is not performance critical but you still repeat yourself.
- The idiomatic way to copy a string into another buffer is to use strcpy or strncpy. strncpy is frowned upon because it has inconsistent behaviour regarding handling the terminating 0円.
- The easiest way to clone a string is to use strdup in which case your three lines can be shortened to one:
```
ret = strdup(s);
```
Same holds for cloning the Line_Arg parameter.
Your implementation of readLine is re-inventing the wheel. See getline.
Your parse the line over and over again to extract a specific filed from it. Instead you should break the line up into the individual fields once and return the array of fields. You can then read the field you want from the array.

Question 16

You can use strndup() to create a copy of the string more safely, however the function not standardized. +1 for catching what I missed.

Question 17

@syb0rg: hm, I fail to see why strndup would be any safer in this context. Care to give an example?

Question 18

In this context, perhaps not; I just like forming good habits. I would probably have done ret = strndup(s, strlen(s));, and then the 0円 is added on by the function.

Question 19

I'm not a fan of using a specific function just for the sake of using it. It doesn't make the code better or safer. It's more important to understand the standard library functions - what they do and how they work and what their limitations are - and use the appropriate one in the right place. strdup works just fine in this case and does the right thing. If you don't trust the string (i.e. hacking attempt) then you have already lost when calling strlen.

200_success 200_success 145k22 gold badges190 silver badges478 bronze badges · Accepted Answer · 2014-07-26 08:39:29Z

Concept

I don't believe that there is a cross-platform way to read the ARP cache. From the use of /proc filesystem, I deduce that you are targeting Linux. In Linux, you could read /proc/net/arp as you have done, or run the command ip neigh, which does something similar to your program. (On OS X, you could run arp -a -n instead.)

A comment worth a thousand words

You should include a comment like this in your program:

/**
 * /proc/net/arp looks like this:
 *
 * IP address HW type Flags HW address Mask Device
 * 192.168.12.31 0x1 0x2 00:09:6b:00:02:03 * eth0
 * 192.168.12.70 0x1 0x2 00:01:02:38:4c:85 * eth0
 */

That picture tells me everything I need to know about what you are trying to accomplish — you need hardly explain further. Conversely, not having that picture makes me work a lot harder to reverse-engineer your code.

Minutiae

&Buffer[0] is usually written as Buffer.
You repeat the readLine() call — once before entering the loop, once at the end of the loop. This is a good opportunity to make use of C's support for side-effects.
```
while (0 == (Ret = readLine(Fd, Buffer))) {
 ...
}
```
Since ARP_CACHE is a compile-time constant string, you can just write
```
fprintf(stdout, "Arp Cache: Failed to open file \"" ARP_CACHE "\"\n");
```
instead of doing a %s substitution at runtime.
Reading one byte at a time is wasteful.
Print errors to standard error; don't contaminate standard output.
I suggest naming your variables to be consistent with the /proc/net/arp header fields. For example, device instead of IfaceStr. Also, be consistent with capitalization: count starts with lowercase, which is more common.

Big-picture issues

The function int readLine(int Fd, char *Buffer) is kind of susceptible to buffer overflow. I can deduce that fact from the function signature: you pass a pointer to a buffer without also passing the size, so it seems unlikely that readLine would know how to stop when the buffer fills up. You could hard-code it to use ARP_BUFFER_LEN as a limit, but it would be unfortunate to cripple an otherwise generic function like that. It would be better to pass in the buffer size explicitly. That's a general pattern you'll see in C APIs: a pointer to a buffer is frequently accompanied by the buffer's size.

In practice, /proc/net/arp should never contain a line long enough to overflow a 1024-byte buffer, so you're safe. Still, you should follow idiomatic C conventions.
Once you modify readLine() to take a size parameter, you'll find that you've just reinvented fgets().
In C, passing strings from a function to its caller is usually avoided, since it involves malloc(), which introduces the potential for memory leaks. Rather, have the caller pass a buffer and size, like how fgets() and scanf() work.

The only good reason to return a string that was allocated using malloc() would be to support arbitrary-length results. At first glance, your getField() accomplishes that, as it allocates a buffer as large as strlen(Line_Arg). (You forgot to add a byte to accommodate the terminating NUL character, by the way.) But, that turns out not to be the case, since Line_Arg itself is not a string of arbitrary length. It's either less than ARP_BUFFER_LEN bytes long (if you got "lucky") or a buffer overflow (as discussed in (1) above).
You seem to be working very hard at I/O and string processing. Why not just use fscanf()?

The whole program, then, can be simplified to

#include <stdio.h>
/**
 * Macros to turn a numeric macro into a string literal. See
 * https://gcc.gnu.org/onlinedocs/cpp/Stringification.html
 */
#define xstr(s) str(s)
#define str(s) #s
#define ARP_CACHE "/proc/net/arp"
#define ARP_STRING_LEN 1023
#define ARP_BUFFER_LEN (ARP_STRING_LEN + 1)
/* Format for fscanf() to read the 1st, 4th, and 6th space-delimited fields */
#define ARP_LINE_FORMAT "%" xstr(ARP_STRING_LEN) "s %*s %*s " \
 "%" xstr(ARP_STRING_LEN) "s %*s " \
 "%" xstr(ARP_STRING_LEN) "s"
int main()
{
 FILE *arpCache = fopen(ARP_CACHE, "r");
 if (!arpCache)
 {
 perror("Arp Cache: Failed to open file \"" ARP_CACHE "\"");
 return 1;
 }
 /* Ignore the first line, which contains the header */
 char header[ARP_BUFFER_LEN];
 if (!fgets(header, sizeof(header), arpCache))
 {
 return 1;
 }
 char ipAddr[ARP_BUFFER_LEN], hwAddr[ARP_BUFFER_LEN], device[ARP_BUFFER_LEN];
 int count = 0;
 while (3 == fscanf(arpCache, ARP_LINE_FORMAT, ipAddr, hwAddr, device))
 {
 printf("%03d: Mac Address of [%s] on [%s] is \"%s\"\n",
 ++count, ipAddr, device, hwAddr);
 }
 fclose(arpCache);
 return 0;
}

+1 for the excellent advice on always documenting the expected input format in front of any parse code.

Stack Exchange Network

Parsing ARP cache in C

3 Answers 3

Concept

A comment worth a thousand words

Minutiae

Big-picture issues

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Parsing ARP cache in C

3 Answers 3

Concept

A comment worth a thousand words

Minutiae

Big-picture issues

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions