I'm trying to become more familiar with C (been working through the MIT Practical Programming in C course and K&R) so decided to write a simple command-line linear interpolator. You can feed it a vector (where missing values are represented by 0n) and it should print an interpolated vector. Any comments on style etc are much appreciated.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define NULLA "0n"
#define NULLF -1000
#define MAXLEN 1000
int main(int argc, char * argv[])
{
if(argc < 2)
{
printf("missing vector for interpolation\n");
return 1;
}
if(argc - 1 > MAXLEN)
{
printf("vector is too long\n");
return 2;
}
float start, end, delta; //bound values and change for interpolation
int num_nulls = 0; //number of nulls in a given interpolation range
float filled[MAXLEN]; //vector where we will put our values
int i, j; //indices, for argv[] and filled[] vectors, correspondingly
//we want all values in filled to be initalized to zero, to avoid junk vals
for(j = 0; j < MAXLEN; j++)
filled[j] = 0.;
//let's go through any initial nulls, which just get put back in
//to the vector, since we can't interpolate (no starting/ending values yet)
for(i = 1, j = 0; i < argc && strcmp(argv[i], NULLA) == 0; i++)
filled[j++] = NULLF;
//the meat of the interpolations take places in the remaing parts
//of the vector
for( ; i < argc; i++)
{
if(strcmp(argv[i], NULLA) != 0 && num_nulls == 0)
{
start = atof(argv[i]);
filled[j++] = start;
}
else if(strcmp(argv[i], NULLA) != 0 && num_nulls > 0)
{
end = atof(argv[i]);
delta = (end - start) / (1 + num_nulls);
while(num_nulls-- > 0)
{
start+=delta;
filled[j++] = start;
}
filled[j++] = end;
start = end; //we should be ready for another interpolation
num_nulls = 0;
}
else
num_nulls++;
}
//add in any trailing nulls, since interpolation can't be performed
while(num_nulls-- > 0)
filled[j++] = NULLF;
for(i = 0; i < j; i++)
{
if(filled[i] != NULLF)
printf("%.2f ", filled[i]);
else
printf("%s ", NULLA);
}
putchar('\n');
return 0;
}
4 Answers 4
Program Usage:
Your usage string is a little...aenimic. I didn't read the initial
part of your post about needing to use 0n as a "missing" value, and
had to squint a little bit before I figured out what is was doing.
This sort of information should ideally be printed out (either when
the user doesn't input enough arguments, or if a flag is passed;
generally this would be -h
or --help
):
void print_usage()
{
printf("Usage: lin_interp <list of floating point values>\n");
printf("lin_interp linearly interpolates pairwise. To input "
"values to be interpolated, use 0n. For example:\n"
"lin_interp 100 0n 200\n"
"will produce 100.0 150.0 200.0, where 0n has been "
"replaced by the interpolated value.");
exit(1);
}
Initialization:
There's a little trick for array initialization in C that comes in handy. Instead of having to loop over an array, initializing it with the same value:
for(j = 0; j < MAXLEN; j++)
filled[j] = 0.;
You can simply write:
float filled[MAXLEN] = {0.0f};
This will initialize each value of your array to 0.0.
Functions:
When you start writing programs, they're generally quite small, and
a lot of the time everything fits neatly into main
. However, when
you start writing slightly larger programs, this starts to become
inconvenient. It makes your program harder to follow, lets variables
live for longer than they should, and means parts of a program cannot
easily be reused (and further along in your programming career, it
makes things harder to test, too). A large part of programming
is breaking things down into small pieces, writing those small
pieces separately, and composing them together into a solution.
Here, there are a few things that should exist as functions. The
first one we've already shown; print_usage()
. The code that
performs the interpolation should also be inside its own function:
size_t interpolate(float* filled, size_t len)
{
// Interpolation Code
}
Braces:
It's tempting to omit braces when you're only using a single-line command:
for(i = 1, j = 0; i < argc && strcmp(argv[i], NULLA) == 0; i++)
filled[j++] = NULLF;
Although this is a point of some contention, it's (generally) not seen as very good practice, especially in a language like C. It is all too easy for you (or someone else) to come back to the code to make a minor modification, and forget that there are no braces:
int starting_nulls = 0;
for(i = 1, j = 0; i < argc && strcmp(argv[i], NULLA) == 0; i++)
filled[j++] = NULLF;
++starting_nulls; //Uh-oh!
Prefer to wrap even single line statements in { }. It is almost zero effort, and will stop these kinds of mistakes.
const vs #define:
Prefer to use const
instead of #define
whenever you can. Because
of the quirks of C, this is not always possible:
// This has to be a #define to be used in an array length
#define MAXLEN 1000
// These can both be const
const char NULLA[] = "0n"
const float NULLF = -1000.0f;
#define
is a very crude mechanism that can cause a lot of headaches.
A lot of older C code uses it (because there was no other option),
but it is still a good idea to keep its use to a minimum.
-
\$\begingroup\$ +1 good suggestions - I also didn't know about pretty multi-line printing in
printf
in C :) I do have a question though - coming from here, it doesn't seem likereturn 2
actually has any meaning... I know about0
and1
,EXIT_SUCCESS
andEXIT_FAILURE
(fromstdlib.h
), butreturn 2
seems odd. I know this wasn't in your code, but rather the OP's - care to remark on its usage? \$\endgroup\$Chris Cirefice– Chris Cirefice2014年08月18日 14:35:07 +00:00Commented Aug 18, 2014 at 14:35 -
\$\begingroup\$ @ChrisCirefice Any non-zero exit code is taken to mean that the process terminated "abnormally", in general. Technically, to be completely portable, you should use
EXIT_SUCESS
(which is guaranteed to be0
), orEXIT_FAILURE
(which is not guaranteed to be1
- it is an implementation-defined value). The reason other values may be used is to perhaps indicate why the program exited, although this will be completely program specific. \$\endgroup\$Yuushi– Yuushi2014年08月18日 14:50:52 +00:00Commented Aug 18, 2014 at 14:50 -
\$\begingroup\$ @Yuushi Ah, I see. I haven't done too much coding in C/C++, so I've always just used
EXIT_FAILURE
in the event of an abnormal termination. I'll have to look more into those macros, thanks :) \$\endgroup\$Chris Cirefice– Chris Cirefice2014年08月18日 14:54:28 +00:00Commented Aug 18, 2014 at 14:54 -
1\$\begingroup\$ @Chris Cirefice, my goal with other non-zero return values was exactly as Yuushi has mentioned, to provide a value associated with why the program exited. If this were more than a practice program, I'd have included a header file outlining each value and the associated interpretation. Yuushi would this be standard practice? \$\endgroup\$JPC– JPC2014年08月18日 15:52:00 +00:00Commented Aug 18, 2014 at 15:52
The primary shortcoming that occurs to me is in the basic design of the code, particularly the use of a fixed-size array.
I'd use either a VLA (variable length array):
int main(int argc, char **argv) {
float filled[argc] = {0};
...or else (usually preferable) a design that doesn't store all the data at all. When doing your interpolation, you only really need the two values you're interpolating. You can simply walk through the values in argv
, and produce output from them relatively directly, with no need to store all the inputs and results before printing them out.
In particular, when you encounter a "real" number in argv
, you can save it and print out. When you encounter a string of 0n
inputs, you count them until you get to another "real" number. You can then compute all the linearly interpolated values based on the previously saved value and the count of intermediate values you need to produce. Write all them out, and then repeat the cycle by saving and printing out the second real input you received.
This reduces storage from O(N) to O(1) with a small constant factor (you basically only store the start value, end value, and count of intermediate values). This is likely to be negligible if the number of values you process is small, but could obviously be much more significant if the number of values being processed is larger.
You are using the float value -1000.0 as a sentinel to represent null values. In a real project this is a very dangerous design choice! Your program will give unexpected result if -1000.0 is a value coming from the interpolation or from user input.
To overcome this issue you could use two varibles first_valid
and last valid
to mark the interval of non null values.
Otherwise use a FLT_MAX instead of -1000.0 as a sentinel. This is very unlikely to cause problems.
-
1\$\begingroup\$ Sometimes NaN is a good sentinel value. \$\endgroup\$Vi.– Vi.2014年08月18日 13:40:15 +00:00Commented Aug 18, 2014 at 13:40
-
\$\begingroup\$ @Vi I initially wanted to suggest NaN too, however it seems that not every platform supports it. \$\endgroup\$Emanuele Paolini– Emanuele Paolini2014年08月18日 15:15:58 +00:00Commented Aug 18, 2014 at 15:15
-
\$\begingroup\$ Thank you for the suggestion. -1000 was just a toy value, since this code isn't going anywhere and was just meant for practice, but a very valid note nonetheless. Given that NaN is not supported everywhere, is FLT_MAX a standard sentinel? \$\endgroup\$JPC– JPC2014年08月18日 15:47:38 +00:00Commented Aug 18, 2014 at 15:47
Unnecessary return
statements
Return statements with return 1
, return 2
not necessary when you are executing your job inside the main program only.
Use exit(EXIT_SUCCESS)
on success or exit(EXIT_FAILURE)
on failure, defined in <stdlib.h>
.
Variable Initialization
To initialize the filled array, you can use
memset(filled, 0, sizeof(filled));
instead of
for(j = 0; j < MAXLEN; j++)
filled[j] = 0.;
Prefer to use opening and closing braces for even single line if
/else
and for
/while
statements.