I have been implementing the merge-sort
algorithm for sorting the data for a long time, each time I use the following source code for implementation.
By secure, I mean, taking care of memory management, loopholes etc.
#include<stdio.h>
#include<stdlib.h>
#include<assert.h>
void mergesort(int*,int,int);
void mergeSequence(int*,int,int,int); //Merge the two arrays
void displaySequence(int*);
int n;
int main()
{
int t,test;
int*num=NULL;
int i;
printf("Enter the number of test cases\n", test);
scanf("%d", &test);
assert(test>0);
for(t=0;t<test;t++)
{
printf("Enter the size of the sequence\n", n);
scanf("%d", &n);
assert(n>0);
num=calloc(sizeof(int),n);
assert(num);
printf("Enter the sequence\n");
for(i=0;i<n;i++)
scanf("%d", (num+i));
printf("Sequence before sorting\n");
displaySequence(num);
mergeSort(num,0,(n-1));
printf("Sequence after sorting\n");
display(num);
free(num);
}
return 0;
}
void displaySequence(int*num)
{
int i;
for(i=0;i<n;i++)
printf("%d ", num[i]);
printf("\n");
}
void mergeSort(int*num,int start,int end)
{
int mid;
mid=start+((end-start)/2);
printf("Mid->%d\n", mid);
if(start<end)
{
mergeSort(num,start,mid);
mergeSort(num,(mid+1),end);
mergeSequence(num,start,mid,end);
}
}
void mergeSequence(int*num,int start,int mid,int end)
{
int n1,n2;
int i,j,k;
n1=(mid-start)+1;
n2=(end-mid);
printf("N1->%d\nN2->%d\n", n1,n2);
int left[n1];
int right[n2];
for(i=0;i<n1;i++)
left[i]=num[start+i];
for(i=0;i<n1;i++)
printf("%d ", left[i]);
printf("\n");
for(j=0;j<n2;j++)
right[j]=num[(mid+1)+j];
for(j=0;j<n2;j++)
printf("%d ", right[j]);
printf("\n");
//Now I have two arrays, and I can now merge them
k=start;
i=0;
j=0;
while((k<=end))
{
if(i==n1)
{
num[k]=right[j];
j++;
}
else if(j==n2)
{
num[k]=left[i];
i++;
}
else if(left[i]<=right[j])
{
num[k]=right[j];
j=j+1;
}
else
{
num[k]=left[i];
i=i+1;
}
k++;
}
displaySequence(num);
}
-
\$\begingroup\$ Please do not update the code in your question to incorporate feedback from answers, doing so goes against the Question + Answer style of Code Review. This is not a forum where you should keep the most updated version in your question. Please see what you may and may not do after receiving answers . Consider posting a follow-up question linking back to this one instead. Thanks! \$\endgroup\$Mast– Mast ♦2019年12月18日 22:27:35 +00:00Commented Dec 18, 2019 at 22:27
4 Answers 4
make the code more compact and secure
Separate sort code from test code
To well address security issues, OP needs to present a clear distinction between what is the sort code and what is not. Put all sort code in a separate .c file.
int vs. size_t
C employs size_t
as the type that is neither to narrow nor too wide for array indexing and sizing.
Although int
is commonly sufficient, to make more secure and work with large data, use size_t
.
Aside from changing int
--> size_t
and "%d"
--> "%zu"
, a singular challenge is to remember size_t
is some unsigned type and code needs to consider that. (size_t)x < 0
is never true.
Beware large VLAs
Security risk: int left[n1];
is subject to stack overflow. Consider *alloc()
instead. Insufficient memory can be detected that way.
Avoid exposing helper functions
mergeSequence()
is not meant to be used outside of mergesort()
. To prevent other code from using this functions, make it static
. Further, put it and mergesort()
in their own .c file.
This increases security as mergeSequence()
then not need to handle all sorts of arguments, just the ones mergesort()
generates.
All warnings enabled?
Save time, enable all warnings.
I'd expect mergeSort(num,0,(n-1));
(upper case S) to warn about function usage before declaration.
void mergesort(int*,int,int);
(lower case S) is declared but not used.
Avoid generic names when a specific type is required
mergeSort()
sorts int
. With a name like mergeSort
, I expected something like qsort()
with its size-of-element parameter and compare function.
Of re-architect your merge sort code to take a generic object type.
void SS_mergeSort(void *base, size_t nmemb, size_t size,
int (*compar)(const void *, const void *));`
Compact by eliminating print out
It is unexpected that a sort routine would do any printing. Remove printf()
and displaySequence()
from merge...()
functions.
Compact left,right code
int left[n1]; int right[n2];
can be int left_right[n1+n2];
and then the copying facilitated with the fast lib function memcpy()
.
memcpy(left_right, num, sizeof left_right);
int *left = left_right;
int *right = left_right + n1;
...
-
\$\begingroup\$ Are there platforms where overflowing program stack is a security bug? On most platforms, it's trapped and merely results in program termination. \$\endgroup\$Toby Speight– Toby Speight2018年10月30日 08:36:42 +00:00Commented Oct 30, 2018 at 8:36
-
1\$\begingroup\$ @TobySpeight Any form of exception is a crack in security. Most platforms in 2018 are embedded processors (billions per year) and which may terminate the program - only to restart it. \$\endgroup\$chux– chux2018年10月31日 13:09:32 +00:00Commented Oct 31, 2018 at 13:09
-
\$\begingroup\$ Stack overflow usually results in a signal, rather than an exception, but that aside, if terminating the program is a security risk, then that speaks more to whatever's running it that the program itself. I still agree that it's something worth avoiding, of course, but disagree on the motivation. \$\endgroup\$Toby Speight– Toby Speight2018年10月31日 13:15:25 +00:00Commented Oct 31, 2018 at 13:15
-
2\$\begingroup\$ Actually, it seems that the question uses the term "security" to refer to what I'd call robustness - i.e. functional correctness in the face of difficult (or even hostile) input, rather than what I'd call security - resistant to attempts to perform unauthorised actions. \$\endgroup\$Toby Speight– Toby Speight2018年10月31日 13:18:01 +00:00Commented Oct 31, 2018 at 13:18
-
1\$\begingroup\$ @TobySpeight A crack in "functional correctness in the face of difficult (or even hostile) input" is often a first step to defeating "resistant to attempts to perform unauthorised actions". \$\endgroup\$chux– chux2018年10月31日 13:21:06 +00:00Commented Oct 31, 2018 at 13:21
Errors
- You use
printf()
to print the number of testcases etc. and give it more arguments than specified in the format string (clang:-Wformat-extra-args
). - You have conflicting types for
mergeSort()
as you declaremergesort()
(lower-cases
!) at the top and the use ofmergeSort()
inmain()
implicitly declaresint mergeSort()
which differs fromvoid mergeSort(int *, int, int)
. - The function
display()
does not exist; it should be calleddisplaySequence()
. - You use assertions to check user input, that's wrong. Assertions will be removed from code when
NDEBUG
is declared and are only used to state and verify assumptions that you already have in the code. Correct use of the assert would be:
Assertions are used to safeguard against eg. future changes where you might forgot that the following code assumesscanf("%d", &test); if (test <= 0) { return 1; } /* We can now assume that test > 0 */ assert (test > 0);
test > 0
.
Good practice
- Declare your module-local functions and variables
static
. - Declare functions taking no arguments as
f(void);
. E.g.f();
specifies that this function "has no parameters", where "parameter" is a restriction on what kind of arguments this function can take. "No parameters" means "no restriction", that is the following is legal as long as the implementation off()
actually uses the argument:
If you instead writef(); g() { f(42); // legal }
f(void);
you provide a function prototype with the parameter "no arguments", ie. this is illegal:f(void); g() { f(42); // illegal }
- Also the C99 feature of mixing declarations and code is useful as it compacts the code more and doesn't make you think "what was that variable used for again?":
int main(void) { /* ... */ for (/* declaration -> */ int t = 0; t < test; t++) { /* ... */ } }
- Don't initialize variables when they don't have a reasonable value yet, eg.
num
. This only stops the compiler from warning you about "use of uninitialized value" but doesn't help against unwanted behavior. But if you use the previous point, you can even initialize the variable directly on declaration - Check the return value of library functions such as
scanf()
:if (scanf("%d", &test) != 1) { return 1; }
- Don't use global variables such as
n
. - Always pass the length of the buffer when passing it around, ie.:
void displaySequence(size_t len, int *num); /* void displaySequence(size_t len, int num[len]); <- Alternative syntax */
- Use
size_t
for amounts of memory etc. - You can concatenate C-string-literals if you just write them after another, making this more readable:
printf("N1->%zu\n" "N2->%zu\n", n1, n2);
- Use
const
where possible. - When allocating, use this:
instead of:int *const num = calloc(sizeof (*num), n);
As it allows you to change the type and the code will not (possibly silently) fail.int *const num = calloc(sizeof (int), n);
- Use
perror
to get localized and standardized error messages (calloc()
settingerrno
is however a POSIX C extension and might not be available on some platforms, cf. comments):int *const num = calloc(sizeof (*num), n); if (!num) { perror("calloc"); return 1; }
- Don't use VLAs; they are dangerous (and IIRC even deprecated). They basically allow an attacker to overflow your stack here, with pretty much no safeguards possible for you.
- Don't mix pointer-arithmetic and array-notation:
scanf("%d", &num[i]); /* was: scanf("%d", num+i); */
Style
Use empty lines to structure your code. Eg. I have:
printf("Enter the sequence\n"); for (size_t i = 0; i < n; i++) { if (scanf("%d", num+i) != 1) { fprintf(stderr, "Invalid input\n"); return 1; } } printf("Sequence before sorting\n"); displaySequence(num, n); mergeSort(num, 0, n-1); printf("Sequence after sorting\n"); displaySequence(num, n); free(num);
- In a similar vein, use horizontal space more. I'm more or less following the Kernel Normal Form (KNF), with spaces after commas, around operators, after keywords, before the pointer indicator, etc. If you do that, you don't need weird things parenthesis
mergeSort(num,0,(n+1))
anymore as visual aid. It's far easier on the eye. - Use braces for
if-else
: it's such a common bug to forget them when expanding the if-statement. Or at least write them in one line otherwise.
Security
This is tricky. In general: Don't trust user input, at all. Everything passed into scanf()
is user-controlled. Currently the user can overflow your stack, try to allocate as much memory as possible (and since you don't check for the success of malloc()
, possibly write memory). As the user also controls the actual sequence, in theory they could even try to write arbitrary stuff into memory.
-
1\$\begingroup\$ I recommend the use of
perror()
, but your use here isn't correct:calloc()
is not specified to seterrno
when it fails. You need to either seterrno
toENOMEM
before callingperror()
, outputstrerror(ENOMEM)
, or print a fixed message.ENOMEM
is specified by POSIX, but isn't part of standard C, so the last option is the portable choice (though the POSIX-specified options may be internationalised, so could be a better option when available). \$\endgroup\$Toby Speight– Toby Speight2019年12月18日 09:24:47 +00:00Commented Dec 18, 2019 at 9:24 -
3\$\begingroup\$ Great answer. Minor quibble: I wouldn’t put parentheses around the argument to
sizeof
unless it’s required (complex expression, or type name). \$\endgroup\$Konrad Rudolph– Konrad Rudolph2019年12月18日 11:34:41 +00:00Commented Dec 18, 2019 at 11:34 -
\$\begingroup\$ @TobySpeight Ah, yeah, I always check against POSIX, since it specifies
calloc
shall setENOMEM
on fail (pubs.opengroup.org/onlinepubs/9699919799), but yes, that is a CX (I'll amend that). \$\endgroup\$ElleJay– ElleJay2019年12月18日 15:07:27 +00:00Commented Dec 18, 2019 at 15:07 -
\$\begingroup\$ @KonradRudolph Thanks! I'm more or less following the KNF there which also specifies
return (1)
(which I deviate from), or more specifically: Parenthesis around all keyword arguments. \$\endgroup\$ElleJay– ElleJay2019年12月18日 15:10:00 +00:00Commented Dec 18, 2019 at 15:10
Always check the result of scanf()
This code optimistically assumes that numeric inputs are successfully parsed. We need to check that the result of scanf()
(the number of conversions successfully parsed and assigned) matches our expectations before using any of the values.
Don't use assert()
for testing input validity
When reading user input, if we wish to check that we have more than 0 tests and more than 0 entries in each test, then assert()
is the wrong tool. Remember that it can be disabled by defining NDEBUG
, and it commonly is, for optimized builds.
Consider also using unsigned types for values that can't be negative.
Always check the result of memory allocation
Allocation functions such as calloc()
can always fail. Again, assert()
is not a robust means to check these.
Unexpected output in the bagging area
It looks like these statements were accidentally left in after a debugging session, and should be removed:
printf("Mid->%d\n", mid);
printf("N1->%d\nN2->%d\n", n1,n2);
No need to compute mid-point at end of recursion
Here, we only need mid
if start<end
:
void mergeSort(int*num,int start,int end) { int mid; mid=start+((end-start)/2); if(start<end) { mergeSort(num,start,mid); mergeSort(num,(mid+1),end); mergeSequence(num,start,mid,end); } }
We can re-write it as
void mergeSort(int *num, int start, int end)
{
if (start < end) {
const int mid = start + (end-start)/2;
mergeSort(num, start, mid);
mergeSort(num, mid+1, end);
mergeSequence(num, start, mid, end);
}
/* else, start==end, and no work to do */
}
-
\$\begingroup\$ Concerning "Don't use assert() for testing input validity". How do you use
assert()
? \$\endgroup\$chux– chux2018年10月29日 22:23:00 +00:00Commented Oct 29, 2018 at 22:23 -
1\$\begingroup\$ @chux Assertions are for enforcing facts that must be true according to the program logic. They are not meant for validating constraints on user input that you hope to be true. \$\endgroup\$200_success– 200_success2018年10月30日 03:17:38 +00:00Commented Oct 30, 2018 at 3:17
-
\$\begingroup\$ Still wondering of an example of how you use
assert()
. \$\endgroup\$chux– chux2018年10月30日 03:27:30 +00:00Commented Oct 30, 2018 at 3:27 -
\$\begingroup\$ You don't use
assert()
at all for tests on user input or other external conditions; you use a definite test that won't be compiled out. Useassert()
only to document things that cannot be false unless there's a program bug. \$\endgroup\$Toby Speight– Toby Speight2018年10月30日 08:30:52 +00:00Commented Oct 30, 2018 at 8:30 -
1\$\begingroup\$ Trivial example:
if (a < 100) return; double b = sqrt(a); assert (b >= 10);
Obviously, it takes more complex code for the assertion to be useful. \$\endgroup\$Toby Speight– Toby Speight2018年10月30日 08:32:32 +00:00Commented Oct 30, 2018 at 8:32
Properly document your code:
Is void mergeSequence(int *values, int start, int mid, int end)
/void mergeSort(int *values, int start, int end)
to merge/sort in ascending or descending order?
Is int end
inclusive or exclusive?
(Considering some mid
will be used to specify the end of some left as well as the start of some right, I think it much simpler to use the [inclusive, exclusive) convention.)
(One thing about declaring variables in the initialisation part of a for loop that larkey did not mention is that the scope of each such variable is restricted to that statement.)
Readable, short source code is easier to grasp.
Procedural abstraction can reduce the amount of code.
You have void displaySequence(int const *num)
, didn't use it in void mergeSequence(int*, int,int, int)
. That may be because you chose to use a global variable n
to communicate the amount of numbers to display - a static one would hardly have been better.
Note how in main()
you print different labels to know what the value-sequences are about: that should probably be a parameter to displaySequence()
as well as the number of values to print.
The while
-loop is funny for more than one thing, starting with not being a for
-loop:
You switch from j++;
to j=j+1;
(same for i
).
Even if boiled down to
for (int k=start, i=0, j=0; k<=end ; k++)
if (i==n1) {
num[k]=right[j++];
} else if(j==n2) {
num[k]=left[i++];
} else if(left[i]<=right[j]) {
num[k]=right[j++];
} else {
num[k]=left[i++];
}
, I can't make up my mind whether I'd rather
- relegate only one part left from this loop (→ bulk copy)
- use a conditional expression so as not to repeat num[k]