Merge function in C

Question 1

The task was as follows:

Write a function that will merge the contents of two sorted (ascending order) arrays of type double values, storing the result in an array output parameter (still in ascending order). The function should not assume that both its input parameter arrays are the same length but can assume that one array does not contain two copies of the same value. The result array should also contain no duplicate values.

Hint: When one of the input arrays has been exhausted, do not forget to copy the remaining data in the other array into the result array. Test your function with cases in which (1) the first array is exhausted first, (2) the second array is exhausted first, and (3) the two arrays are exhausted at the same time (i.e., they end with the same value). Remember that the arrays input to this function must already be sorted.

Please rate my solution.

I would have worked with "sizeof", but this concept has not been introduced in the book from where this task is taken.

#include <stdio.h>
#define LENGTH1 3
#define LENGTH2 4
int merge(int *output_array, int* array1, int* array2, int length1, int length2); //return counter
int main()
{
 int array1[LENGTH1] = {-2, -1, 0},
 array2[LENGTH2] = {0, 3, 4, 6},
 array3[LENGTH1+LENGTH2] = {0};
 int counter = merge(array3, array1, array2, LENGTH1, LENGTH2);
 for(int i = 0; i < counter; i++)
 printf("[%d]%d\n", i, array3[i]);
 return 0;
}
int merge(int *output_array, int *array1, int *array2, int length1, int length2)
{
 int i = 0, //array1 counter
 j = 0, //array2 counter
 k = 0, //output counter
 counter = 0;
 while(i < length1 && j < length2) //break when either array is fully checked
 {
 if(array1[i] < array2[j])
 {
 output_array[k] = array1[i];
 i++;
 k++;
 counter++;
 }
 else if(array1[i] == array2[j]) //if element is same move to the next element in both arrays
 {
 output_array[k] = array1[i];
 i++;
 j++;
 k++;
 counter++;
 }
 else
 {
 output_array[k] = array2[j];
 j++;
 k++;
 counter++;
 }
 }
 while(i >= length1 && j < length2) //enter if second array is not fully checked
 {
 output_array[k] = array2[j];
 j++;
 k++;
 counter++;
 }
 while(j >= length2 && i < length1) //enter if first array is not fully checked
 {
 output_array[k] = array1[i];
 i++;
 k++;
 counter++;
 }
 return counter;
}

Question 2

Re: "I would have worked with "sizeof", but this concept has not been introduced in the book from where this task is taken." ==> I failed to understand this. How do you find sizeof useful here? To eliminate the LENGTH macros? Or do you mean something else?

Question 3

If these are truly arrays of double and not int, consider the pitfalls of assuming equality between them. For example: floating-point-gui.de/errors/comparison It may be useful for your test cases to include floating points rather than integers.

Question 4

The implementation does not conform to the specification:

The specification explicitly mentioned "arrays of type double values", yet you have used int everywhere.

Extract the common code out of the branches:

In merge(), these statements:

k++;
counter++;

run in every branch. Consider taking them out and executing them unconditionally.

Place `main()` at the end of the source file:

This has the benefit that a separate prototype need not be provided for all the functions before main().

Array declarations:

You need not specify the size when an initializer-list is present. Let the compiler determine the number of elements.
Whilst it is always a good practice to initialize your variables, zeroing an array, especially a large one, is of no benefit when the next thing the program does it overwrite it.

I'd also call it size, and not LENGTH. LENGTH is usually associated with strings, not arrays (Okay, John, strings are char arrays, but you get the point). As an example, C++'s std::array have a size member function, but no length member function.

Choose more descriptive identifiers:

int counter = merge(array3, array1, array2, LENGTH1, LENGTH2);

What is counter? What is a counter of? As merge returns the size of the final sorted array, why not name it to final_size, total_size, total_elems, merged_count, or something similar? As is, counter is too vague.

In merge(), you have k and counter, which both hold the same value. Eliminate one of them, and name it to one of the afore-mentioned identifiers.

Use standard exit status codes:

Prefer using EXIT_SUCCESS and EXIT_FAILURE from <stdlib.h> instead of magic numbers like 0, 1, -1, et cetera.

Use `const` for arguments that are not modified:

int merge(int *output_array, int* array1, int* array2, int length1, int length2); //return counter

is written better as:

int merge(int *output_array, const int* array1, const int* array2, int length1, int length2); //return counter

This signals that array1 and array2 are not modified by this function. Adding the const qualifier to length1 and length2 is not useful (some people prefer to albeit, I am not looking for a debate) as these parameter are copies, and any modification made to them will not be reflected in the calling code. In other words, const is pointless when the argument is passed by value since you will not be modifying the caller's object.

But now the line has become too long and the comment (which is not really useful) is not visible.

We could rewrite it as:

/* Returns the size of the final array. */
int merge(int *output_array, 
 const int* array1, 
 const int* array2, 
 int length1, 
 int length2);

But why does this return an int? The specification does not mention anything about a return value.

Use `static` keyword for array parameters:

In C99 and above, you can do array[static size] in function parameters to specify that array must have at least size items.

A compiler has then the right to assume that argument is not nullptr and therefore it could perform some optimizations.

For this program:

int main(void)
{
 merge(10, nullptr, 20, nullptr, nullptr);
 int array1[5];
 int array2[10];
 int array3[12];
 merge(10, array1, 15, array2, array3);
 int *p = nullptr;
 merge(5, array1, 10, array2, p);
}

GCC 14.1 outputs:

<source>: In function 'main':
<source>:60:5: warning: argument 2 null where non-null expected [-Wnonnull]
 60 | merge(10, nullptr, 20, nullptr, nullptr);
 | ^~~~~
<source>:4:8: note: in a call to function 'merge' declared 'nonnull'
 4 | size_t merge(size_t size1,
 | ^~~~~
<source>:60:5: warning: argument 4 null where non-null expected [-Wnonnull]
 60 | merge(10, nullptr, 20, nullptr, nullptr);
 | ^~~~~
<source>:4:8: note: in a call to function 'merge' declared 'nonnull'
 4 | size_t merge(size_t size1,
 | ^~~~~
<source>:60:5: warning: argument 5 null where non-null expected [-Wnonnull]
 60 | merge(10, nullptr, 20, nullptr, nullptr);
 | ^~~~~
<source>:4:8: note: in a call to function 'merge' declared 'nonnull'
 4 | size_t merge(size_t size1,
 | ^~~~~
<source>:66:5: warning: 'merge' reading 40 bytes from a region of size 20 [-Wstringop-overread]
 66 | merge(10, array1, 15, array2, array3);
 | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
<source>:66:5: note: referencing argument 2 of type 'const int[]'
<source>:66:5: warning: 'merge' reading 60 bytes from a region of size 40 [-Wstringop-overread]
<source>:66:5: note: referencing argument 4 of type 'const int[]'
<source>:66:5: note: referencing argument 5 of type 'int[]'
<source>:4:8: note: in a call to function 'merge'
 4 | size_t merge(size_t size1,
 | ^~~~~
<source>:69:5: warning: argument 5 null where non-null expected [-Wnonnull]
 69 | merge(5, array1, 10, array2, p);
 | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
<source>:4:8: note: in a call to function 'merge' declared 'nonnull'
 4 | size_t merge(size_t size1,
 | ^~~~~
<source>:66:5: warning: 'array1' may be used uninitialized [-Wmaybe-uninitialized]
 66 | merge(10, array1, 15, array2, array3);
 | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
<source>:4:8: note: by argument 2 of type 'const int[static]' to 'merge' declared here
 4 | size_t merge(size_t size1,
 | ^~~~~
<source>:62:9: note: 'array1' declared here
 62 | int array1[5];
 | ^~~~~~
<source>:66:5: warning: 'array2' may be used uninitialized [-Wmaybe-uninitialized]
 66 | merge(10, array1, 15, array2, array3);
 | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
<source>:4:8: note: by argument 4 of type 'const int[static]' to 'merge' declared here
 4 | size_t merge(size_t size1,
 | ^~~~~
<source>:63:9: note: 'array2' declared here
 63 | int array2[10];
 | ^~~~~~
<source>:69:5: warning: 'array1' may be used uninitialized [-Wmaybe-uninitialized]
 69 | merge(5, array1, 10, array2, p);
 | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
<source>:4:8: note: by argument 2 of type 'const int[static]' to 'merge' declared here
 4 | size_t merge(size_t size1,
 | ^~~~~
<source>:62:9: note: 'array1' declared here
 62 | int array1[5];
 | ^~~~~~
<source>:69:5: warning: 'array2' may be used uninitialized [-Wmaybe-uninitialized]
 69 | merge(5, array1, 10, array2, p);
 | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
<source>:4:8: note: by argument 4 of type 'const int[static]' to 'merge' declared here
 4 | size_t merge(size_t size1,
 | ^~~~~
<source>:63:9: note: 'array2' declared here
 63 | int array2[10];
 | ^~~~~~
Compiler returned: 0

whilst Clang 18.1.0 outputs:

<source>:60:5: warning: null passed to a callee that requires a non-null argument [-Wnonnull]
 60 | merge(10, nullptr, 20, nullptr, nullptr);
 | ^ ~~~~~~~
<source>:5:24: note: callee declares array parameter as static here
 5 | const int array1[static size1],
 | ^ ~~~~~~~~~~~~~~
<source>:60:5: warning: null passed to a callee that requires a non-null argument [-Wnonnull]
 60 | merge(10, nullptr, 20, nullptr, nullptr);
 | ^ ~~~~~~~
<source>:7:24: note: callee declares array parameter as static here
 7 | const int array2[static size2],
 | ^ ~~~~~~~~~~~~~~
<source>:60:5: warning: null passed to a callee that requires a non-null argument [-Wnonnull]
 60 | merge(10, nullptr, 20, nullptr, nullptr);
 | ^ ~~~~~~~
<source>:8:18: note: callee declares array parameter as static here
 8 | int output_array[static size1 + size2]) 
 | ^ ~~~~~~~~~~~~~~~~~~~~~~
3 warnings generated.
Compiler returned: 0

and x64-64 icx 202400 outputs:

<source>:60:5: warning: null passed to a callee that requires a non-null argument [-Wnonnull]
 60 | merge(10, nullptr, 20, nullptr, nullptr);
 | ^ ~~~~~~~
<source>:5:24: note: callee declares array parameter as static here
 5 | const int array1[static size1],
 | ^ ~~~~~~~~~~~~~~
<source>:60:5: warning: null passed to a callee that requires a non-null argument [-Wnonnull]
 60 | merge(10, nullptr, 20, nullptr, nullptr);
 | ^ ~~~~~~~
<source>:7:24: note: callee declares array parameter as static here
 7 | const int array2[static size2],
 | ^ ~~~~~~~~~~~~~~
<source>:60:5: warning: null passed to a callee that requires a non-null argument [-Wnonnull]
 60 | merge(10, nullptr, 20, nullptr, nullptr);
 | ^ ~~~~~~~
<source>:8:18: note: callee declares array parameter as static here
 8 | int output_array[static size1 + size2]) 
 | ^ ~~~~~~~~~~~~~~~~~~~~~~
3 warnings generated.
Compiler returned: 0

GCC detects all 3 wrong use cases, whilst Clang and ICX only detect the first one. A win for GCC.

MSVC does not support this C99 feature (who is suprised?), but MinGW gcc 13.1.0 and MinGW clang 16.0.2 have the same output as GCC and Clang as shown above.

Moreover, this also documents the function better. As the array must have at least size items, it also means that a nullptr is not a valid value for this argument.

It also eliminates the need of having to document that the function would invoke undefined behavior if nullptrs are passed in.

Note that this feature is not usable for functions like memcpy because arrays of void are illegal, or for return values. A GCC extension, __attribute__((nonnull)) allows parameters to be marked as non-null. This extension is also supported by Intel's compiler and Clang. A Clang extension allows _Nullable, _Nonnull and _Null_unspecified. This extension is not supported by GCC. A GCC extension, __attribute__((returns_nonnull)) can be used to specify that a function returns a nonnull value. This extension is also supported by Clang and Intel's compiler. I await an _Optional type in the next revision of the C Standard.

After adding static, our final prototype becomes:

/* Returns the size of the final merged array. */
size_t merge(size_t size1,
 const int array1[static size1],
 size_t size2,
 const int array2[static size2],
 int output_array[static size1 + size2]);

Note that I have renamed length1 and length2 to size1 and size2, and have also changed their type to size_t, because array sizes are not signed. I have also changed the return type to size_t to be consistent.

output_array has space for at least size1 + size2 for the case that there are no duplicates in the sorted arrays.

I'd also replace the while loops:

while(i >= length1 && j < length2) //enter if second array is not fully checked
{
 output_array[k] = array2[j];
 j++;
 k++;
 counter++;
}
while(j >= length2 && i < length1) //enter if first array is not fully checked
{
 output_array[k] = array1[i];
 i++;
 k++;
 counter++;
}

with for loops:

for (; j < length2; ++j, ++k, ++counter) //enter if second array is not fully checked
{
 output_array[k] = array2[j];
}
for (; i < length1; ++i, ++k, ++counter) //enter if first array is not fully checked
{
 output_array[k] = array1[i];
}

I find that this expresses the intent more clearly now, and the code is also less verbose.

Though, i, j, and k do not hold any meaning to me. These too should be replaced with descriptive identifiers.

Question 5

merge should return a size_t too, for consistency.

Question 6

@MatthieuM. Code updated.

Question 7

I would move the length/size parameters to be just after their corresponding arrays instead of at the end for easier reading of the code. I might also change the returned size to be an out parameter using pointer indirection to make it more uniform.

Question 8

Apparently, I mentally checked out when I got to the static section as I've not seen that C99 language feature before. No, your last function prototype looks looks good. I was just recommending that they be bundled together unlike the prototype offered just before that section. You've already applied my comment as I intended if not wrote. :-)

Question 9

Note that MSVC does not support [static N] array parameters, and GCC, Clang and ICX all interpret [N] array parameters as meaning N or more, so making that change would only make the code less portable. You might define AT_LEAST to expand to static, conditionally.

Question 10

General Comments

It would be helpful if you mentioned what book you are using, and which version of C you are targeting. It might also be helpful if you mentioned whether you were programming on Linux or Windows.

If the program were taking user input, the input arrays should be sorted before they are merged. You would want a function for array sorting.

The program might be easier to read, write and maintain if the merge and the sorting were separate. The merge could be performed first and then the sort could be performed after, these would be two separate functions. In this instance the performance might be better as well.

Variable Declarations

A recommended programming practice to declare the variable as needed. In C the language doesn't provide a default initialization of the variable so variables should be initialized as part of the declaration. For readability and maintainability each variable should be declared and initialized on its own line. The method of variable declarations that you are currently using where the line ends in a comma rather than a semicolon will make the code more difficult to maintain.

C Array Declarations

There is no need to use a size in C array variable declarations if the array is being assigned values in the declaration. In this particular instance there is no need to assign all zeros to the output array since all values will be over written.

Instead of:

 int array1[LENGTH1] = {-2, -1, 0},
 array2[LENGTH2] = {0, 3, 4, 6},
 array3[LENGTH1+LENGTH2] = {0};

This might be better:

 double input_1[] = {-2, -1, 0};
 double input_2[] = {0, 3, 4, 6};
 double output[LENGTH1+LENGTH2];

Code Organization

Function prototypes are very useful in large programs that contain multiple source files, and that in case they will be in header files. In a single file program like this it is better to put the main() function at the bottom of the file and all the functions that get used in the proper order above main(). Keep in mind that every line of code written is another line of code where a bug can crawl into the code.

Question 11

Re: "merge, then sort"... Will then have to scan & remove possible duplicates from combined array. OP's problem statement suggests both the "pre-merge" arrays contain sorted and unique values.

Question 12

Re: "array declarations". OP's version ensures all 3 arrays are compatible. Suggestion to "let compiler count elements" (using arr[ ] = ..., then hardwired tokens LENGTHx risks a mismatch that could lead to buffer overrun and the lawless wilds known as UB. Also, the cost of initialising the destination array is nothing when compared to the cost of NOT practising the habit of initialising variables when they're declared. Bad recommendation, imo...

Question 13

@Fe2O3 This problem would be more real world if sizeof() was used to determine the length of each input array and then calloc() sufficient space. A better test of the code would be if the input array's were unsorted. The comma separated declarations are a bug waiting to happen. The code is already not using at least one location in the output array due. I agree that there is the possibility of overflow in my suggestion, but not as the code was written.

Question 14

In fact, it's all a bit of a minefield. OP's version (arr[ LEN ] = ...) risks careless definition of too few initialisers in one/both arrays. Result: multiple padding zeros, breaking the rule that there are no duplicate values in source arrays AND all array values are in ascending order... Part of the lot of a beginner is to have to "step on the Lego pieces scattered on the floor in the dark" just like all who've gone before... Cheers! :-)

Question 15

The task was to merge the contents of two sorted arrays of type double. In your solution you use int type.

You also use a variable counter, but in your code k and counter always have the same value. So counter is not useful, just return k.

i, j & k is for sure short name, but is kinda hard to follow. You also know length is always >= 0, so you can use unsigned int or size_t for type. As say by pacmaninbw, is recommended to use size_t because he will have the proper word size for the system and that will improve performance.

In

while(i >= length1 && j < length2)

and

while(j >= length2 && i < length1)

You don't need to check if i >= length1 or j >= length2 cause that always the case if the other's condition (j < length2 or i < length1) is true.

Question 16

The code is clear overall. The comments are quite well used, which is good. Yet, a few things can be improved.

First, if your manual has already covered const, you can use that on most of variables.
The macros are not well suited for this use. Don't get me wrong, they work fine here. A array1_length int variable makes more sense, since you pass the lengths to your merge function. Also the LENGTH macros refer to the size of object within the code which can lead to more errors when you modify it. Macros are better suited for values that are quite independent (error codes for example).
Are you already familiar with continue concept? If so you should them in your while loop, it simplifies a little the code because you avoid the else. It is also clearer for the reader that there is nothing happening after the if block is completed.

Question 17

~~(削除) Advice I - a correctness bug (削除ここまで)~~

If your merge works on source array with duplicate array components, it will omit (some?) duplicates. (See Advice III.)

Advice II - idiomatic C

I suggest you define all the array indices as of type size_t. Also, you could switch the "lonely" tail while loops to memcpy. Finally, the array lengths may be passed via size_t.

~~(削除) Advice III - alternative implementation (削除ここまで)~~


#include <stdio.h>
#include <string.h>
#define LENGTH1 3
#define LENGTH2 4
void merge_v2(int* target_array, int* source_array_1, int* source_array_2, size_t len1, size_t len2);
int main()
{
 int array1[LENGTH1] = { 0, 2, 4 },
 array2[LENGTH2] = { 1, 3, 4, 5 },
 array3[LENGTH1 + LENGTH2]; // No need to initialize to zero all the array components: it will be overwritten by the merve_v2.
 merge_v2(array3, array1, array2, LENGTH1, LENGTH2);
 puts("merge_v2 result:");
 for (int i = 0; i < LENGTH1 + LENGTH2; i++)
 printf("[%d]%d\n", i, array3[i]);
 puts("merge result:");
 int array4[LENGTH1] = { 0, 2, 4 },
 array5[LENGTH2] = { 1, 3, 4, 5 },
 array6[LENGTH1 + LENGTH2];
 
 int cnt = merge(array6, array4, array5, LENGTH1, LENGTH2);
 for (int i = 0; i < cnt; i++)
 printf("[%d]%d\n", i, array6[i]);
 return 0;
}
void merge_v2(int* target_array, int* source_array_1, int* source_array_2, size_t len1, size_t len2)
{
 size_t target_index = 0;
 size_t source_index_1 = 0;
 size_t source_index_2 = 0;
 while (source_index_1 != len1 && source_index_2 != len2)
 {
 // Stable sort, but does not make difference for ints:
 target_array[target_index++] = 
 source_array_2[source_index_2] < source_array_1[source_index_1] ?
 source_array_2[source_index_2++] :
 source_array_1[source_index_1++];
 }
 memcpy(&target_array[target_index], &source_array_1[source_index_1], sizeof(int) * (len1 - source_index_1));
 memcpy(&target_array[target_index], &source_array_2[source_index_2], sizeof(int) * (len2 - source_index_2));
}
int merge(int* output_array, int* array1, int* array2, int length1, int length2)
{
 int i = 0, //array1 counter
 j = 0, //array2 counter
 k = 0, //output counter
 counter = 0;
 while (i < length1 && j < length2) //break when either array is fully checked
 {
 if (array1[i] < array2[j])
 {
 output_array[k] = array1[i];
 i++;
 k++;
 counter++;
 }
 else if (array1[i] == array2[j]) //if element is same move to the next element in both arrays
 {
 output_array[k] = array1[i];
 i++;
 j++;
 k++;
 counter++;
 }
 else
 {
 output_array[k] = array2[j];
 j++;
 k++;
 counter++;
 }
 }
 while (i >= length1 && j < length2) //enter if second array is not fully checked
 {
 output_array[k] = array2[j];
 j++;
 k++;
 counter++;
 }
 while (j >= length2 && i < length1) //enter if first array is not fully checked
 {
 output_array[k] = array1[i];
 i++;
 k++;
 counter++;
 }
 return counter;
}

Which prints:

merge_v2 result:
[0]0
[1]1
[2]2
[3]3
[4]4
[5]4
[6]5
merge result:
[0]0
[1]1
[2]2
[3]3
[4]4
[5]5

Edit

After being notified about the specs of this post, I had to rework my answer.

Advice IV - better naming

Instead of i, j and k, I would deploy target_index, source_index_1, source_index_2.

Advice V - size_t

I would declare the return type of merge as size_t. Also, I would define the arguments int length1 and int length2 as size_t len1 and size_t len2.

Advice VI - alternative implementation: take two

#include <stdio.h>
#include <string.h>
#define LENGTH1 3
#define LENGTH2 4
size_t merge_v2(int* target_array, int* source_array_1, int* source_array_2, size_t len1, size_t len2);
int main()
{
 int array1[LENGTH1] = { 0, 2, 4 },
 array2[LENGTH2] = { 1, 3, 4, 5 },
 array3[LENGTH1 + LENGTH2]; // No need to initialize to zero all the array components: it will be overwritten by the merve_v2.
 size_t counter_v2 = merge_v2(array3, array1, array2, LENGTH1, LENGTH2);
 puts("merge_v2 result:");
 for (int i = 0; i < counter_v2; i++)
 printf("[%d]%d\n", i, array3[i]);
 puts("merge result:");
 int array4[LENGTH1] = { 0, 2, 4 },
 array5[LENGTH2] = { 1, 3, 4, 5 },
 array6[LENGTH1 + LENGTH2] = { 0 };
 
 int cnt = merge(array6, array4, array5, LENGTH1, LENGTH2);
 for (int i = 0; i < cnt; i++)
 printf("[%d]%d\n", i, array6[i]);
 return 0;
}
size_t merge_v2(int* target_array, int* source_array_1, int* source_array_2, size_t len1, size_t len2)
{
 size_t target_index = 0;
 size_t source_index_1 = 0;
 size_t source_index_2 = 0;
 size_t counter = 0;
 while (source_index_1 != len1 && source_index_2 != len2)
 {
 // Stable sort, but does not make difference for ints:
 if (source_array_1[source_index_1] < source_array_2[source_index_2])
 {
 target_array[target_index++] = source_array_1[source_index_1++];
 }
 else if (source_array_1[source_index_1] > source_array_2[source_index_2])
 {
 target_array[target_index++] = source_array_2[source_index_2++];
 }
 else
 {
 target_array[target_index++] = source_array_1[source_index_1++];
 source_index_2++;
 }
 counter++;
 }
 counter += len1 - source_index_1;
 counter += len2 - source_index_2;
 memcpy(&target_array[target_index], &source_array_1[source_index_1], sizeof(int) * (len1 - source_index_1));
 memcpy(&target_array[target_index], &source_array_2[source_index_2], sizeof(int) * (len2 - source_index_2));
 return counter;
}
int merge(int* output_array, int* array1, int* array2, int length1, int length2)
{
 int i = 0, //array1 counter
 j = 0, //array2 counter
 k = 0, //output counter
 counter = 0;
 while (i < length1 && j < length2) //break when either array is fully checked
 {
 if (array1[i] < array2[j])
 {
 output_array[k] = array1[i];
 i++;
 k++;
 counter++;
 }
 else if (array1[i] == array2[j]) //if element is same move to the next element in both arrays
 {
 output_array[k] = array1[i];
 i++;
 j++;
 k++;
 counter++;
 }
 else
 {
 output_array[k] = array2[j];
 j++;
 k++;
 counter++;
 }
 }
 while (i >= length1 && j < length2) //enter if second array is not fully checked
 {
 output_array[k] = array2[j];
 j++;
 k++;
 counter++;
 }
 while (j >= length2 && i < length1) //enter if first array is not fully checked
 {
 output_array[k] = array1[i];
 i++;
 k++;
 counter++;
 }
 return counter;
}

... which outputs:

merge_v2 result:
[0]0
[1]1
[2]2
[3]3
[4]4
[5]5
merge result:
[0]0
[1]1
[2]2
[3]3
[4]4
[5]5

Question 18

You've got a lot of separate branches at separate levels of indentation. Reversing the approach to "fill an array from two sorted arrays" makes this code a lot more readable, in my opinion.

#include <stdio.h>
#include <stdbool.h>
#define LENGTH1 3
#define LENGTH2 4
int merge(int *it_out, const int* it1, const int* it2, const int length1, const int length2); //return counter
int main()
{
 const int array1[LENGTH1] = {-2, -1, 0};
 const int array2[LENGTH2] = {0, 3, 4, 6};
 int array3[LENGTH1+LENGTH2] = {0};
 int counter = merge(array3, array1, array2, LENGTH1, LENGTH2);
 for(int i = 0; i < counter; i++)
 printf("[%d]%d\n", i, array3[i]);
 return 0;
}
int merge(int *it_out, const int* it1, const int* it2, const int length1, const int length2)
{
 const int* const it1_end = it1+length1;
 const int* const it2_end = it2+length2;
 
 int* const it_out_end = it_out + length1 + length2;
 
 while (true) {
 const bool it1_has_value = it1 != it1_end;
 const bool it2_has_value = it2 != it2_end;
 if (!it1_has_value && !it2_has_value) {
 break;
 }
 if (!it1_has_value) {
 *it_out = *it2;
 it2++;
 }
 else if (!it2_has_value) {
 *it_out = *it1;
 it1++;
 }
 else if (*it1 == *it2) {
 *it_out = *it1;
 it1++;
 it2++;
 }
 else if (*it1 < *it2) {
 *it_out = *it1;
 it1++;
 }
 else {
 *it_out = *it2;
 it2++;
 }
 it_out++;
 }
 
 return it_out_end - it_out;
}

The other comments of actually using double and using size_t still stand.

Also, addding some consts, especially in your function signature, can help catching errors later. It will help you make sure you don't accidentally alter input arrays.

Question 19

This solution doesn't eliminate duplicates. I think it's also quite efficient in the case that one array is consumed long before the other array. It does a lot of work that is repetitive once one array is consumed.

Question 20

@Sinc thanks, I misread the no duplicates. Fixed that. And yes, it does a lot of extra work if the array lengths are heavily skewed. But often that matters way less than the readability, especially when people start out in this field. Prevent premature optimization.

Question 21

Y'know Daniel... Two things... 1) Not many readers actually copy/paste/compile the code posted as answers. Sometimes I do. This "fixed" code produces a very unremarkable result... You may wish to try it yourself. 2) This is "Code Review", not "Stack Overflow". The focus is to review the OP's code, providing guidance; not a forum to post your own version of a solution to the OP's problem statement. Sorry about my DV, but, y'know...

Question 22

Overall, I would rate your solution as adequate - and certainly good for someone who has not yet reached sizeof. This is a question I use in job interviews and in a short, stressful period, something of about your level is the most that can be expected even from fairly experienced programmers.

Harith has already made many good comments, though I disagree about using static as that makes the code no longer valid as C++. I find that it is common to use both languages and therefore handy if your simple utilities are compatible with both. But I would put the array and its size adjacent in the parameter list as they go together. And do not replace "length" with "size". The number of items is a length, the storage space they occupy (usually in bytes) is a size. Many bugs have been caused by confusing these two. It is perfectly fine and idiomatic to return 0 from main().

Naming is important. Each name should be a good mnemonic (in contrast to being an attempt to avoid having to look at the declaration or documentation). In this code array1 can be considerably improved. It tells us that it is an array, which is not useful as when we see in each use (e.g. array1[i]) that it is obviously an array. What is much more useful is a reminder of its purpose, so the names of the arrays in merge() could be in1, in2, and out.

It's fine to use a very short name like i in a very short area of code (such as you have main()), but in merge(), when we see it being used it does not really remind us of its meaning. I would suggest pos1, pos2, and outpos instead of i, j, and k. The name counter is too generic. What is it counting? It would be better named something like numout or even nout as using 'n' like that is a common idiom. Of course, in this case (as another answer has mentioned) it is redundant as it always hold the same value as k.

A valuable principle is Don't Repeat Yourself (DRY). In the first while loop, the code tests whether both arrays have elements, and when that test fails, these tests get repeated afterwards. It would probably be better to write the loop as a forever (for (;;)) loop with clauses to cover the possibilities. This gives five cases in the loop instead of three.

Moving on to more advanced matters, since this is written in C, it can use the language more idiomatically to make the code shorter and clearer. Instead of

 out[outpos] = in1[in1pos];
 in1pos++;
 outpos++;

we can write:

 out[outpos++] = in1[in1pos++];

This has the added advantage in this function that the case of consuming from both arrays stands out more as the other cases do not have isolated increments.

And for even more advanced use of C, remember that, while the data is stored in an array, the merge() function gets pointers which are variables and can be advanced as elements are consumed. A first attempt at this would replace:

 out[outpos++] = in1[in1pos++];

with

 *out++ = *in1++;
 --length1;
 ++outlen;

This removes the need for separate variable to hold the current position in each array, but leaves pairs of variables (e.g. out and outlen) which must always be modified together and it is easy to forget one when doing the other. This can be avoided by using only pointers. At the start of the function declare:

 int *in1end = in1 + length1;

and replace the test for exhaustion:

 if (in1pos < length1) {

with

 if (in1 < in1end) {

In this case, if you want to return the length of the result (which you do, despite it not being in the requirements, since it's a quantity that cannot be otherwise known to the caller), you need to keep a record of the initial value of out. Maybe make the parameter be called output and declare:

 int *out = output; // current position in the output

returning:

 return out - output;

Putting everything together, gives the following:

#include <stddef.h>
#include <stdio.h>
#define LENGTH1 3
#define LENGTH2 4
size_t merge(int *out, const int *in1, size_t len1, const int *in2, size_t len2);
int main()
{
 int test_in1[LENGTH1] = {-2, -1, 0},
 test_in2[LENGTH2] = {0, 3, 4, 6},
 test_out[LENGTH1+LENGTH2] = {0};
 int outlen = merge(test_out, test_in1, LENGTH1, test_in2, LENGTH2);
 for(int i = 0; i < outlen; i++) {
 printf("[%d]%d\n", i, test_out[i]);
 }
 return 0;
}
size_t merge(int *output, const int *in1, size_t len1, const int *in2, size_t len2)
{
 int *out = output; // current position in output
 const int *const in1end = in1 + len1;
 const int *const in2end = in2 + len2;
 for (;;) {
 if (in1 == in1end) {
 while (in2 < in2end) *out++ = *in2++;
 return out - output;
 } else if (in2 == in2end) {
 while (in1 < in1end) *out++ = *in1++;
 return out - output;
 } else if (*in1 < *in2) {
 *out++ = *in1++;
 } else if (*in1 > *in2) {
 *out++ = *in2++;
 } else { // identical elements
 *out++ = *in1++;
 in2++;
 }
 }
}

Question 23

Curious why you included <stddef.h>. C99 (N1256), C11 (N1570) and C17 (N2176) section 7.21.1 all confirm that stdio.h declares size_t.

Question 24

For C++ and MSVC, one can define AT_LEAST to expand to static, conditionally, as suggested by @Davislor under my answer.

Question 25

stddef,h was originally instinct, but on reflection, if you put the merge function first in the file (with stddef included) it is self-contained. Then you can append #ifdef TEST, the stdio include, and main() with the test cases, giving you a minimal utility when built normally, but a complete self testing unit when built with -DTEST

Madagascar Madagascar 10.1k1 gold badge16 silver badges51 bronze badges · Answer 1 · 2024-05-27 16:45:09Z

The implementation does not conform to the specification:

The specification explicitly mentioned "arrays of type double values", yet you have used int everywhere.

Extract the common code out of the branches:

In merge(), these statements:

k++;
counter++;

run in every branch. Consider taking them out and executing them unconditionally.

Place `main()` at the end of the source file:

This has the benefit that a separate prototype need not be provided for all the functions before main().

Array declarations:

You need not specify the size when an initializer-list is present. Let the compiler determine the number of elements.
Whilst it is always a good practice to initialize your variables, zeroing an array, especially a large one, is of no benefit when the next thing the program does it overwrite it.

I'd also call it size, and not LENGTH. LENGTH is usually associated with strings, not arrays (Okay, John, strings are char arrays, but you get the point). As an example, C++'s std::array have a size member function, but no length member function.

Choose more descriptive identifiers:

int counter = merge(array3, array1, array2, LENGTH1, LENGTH2);

What is counter? What is a counter of? As merge returns the size of the final sorted array, why not name it to final_size, total_size, total_elems, merged_count, or something similar? As is, counter is too vague.

In merge(), you have k and counter, which both hold the same value. Eliminate one of them, and name it to one of the afore-mentioned identifiers.

Use standard exit status codes:

Prefer using EXIT_SUCCESS and EXIT_FAILURE from <stdlib.h> instead of magic numbers like 0, 1, -1, et cetera.

Use `const` for arguments that are not modified:

int merge(int *output_array, int* array1, int* array2, int length1, int length2); //return counter

is written better as:

int merge(int *output_array, const int* array1, const int* array2, int length1, int length2); //return counter

This signals that array1 and array2 are not modified by this function. Adding the const qualifier to length1 and length2 is not useful (some people prefer to albeit, I am not looking for a debate) as these parameter are copies, and any modification made to them will not be reflected in the calling code. In other words, const is pointless when the argument is passed by value since you will not be modifying the caller's object.

But now the line has become too long and the comment (which is not really useful) is not visible.

We could rewrite it as:

/* Returns the size of the final array. */
int merge(int *output_array, 
 const int* array1, 
 const int* array2, 
 int length1, 
 int length2);

But why does this return an int? The specification does not mention anything about a return value.

Use `static` keyword for array parameters:

In C99 and above, you can do array[static size] in function parameters to specify that array must have at least size items.

A compiler has then the right to assume that argument is not nullptr and therefore it could perform some optimizations.

For this program:

int main(void)
{
 merge(10, nullptr, 20, nullptr, nullptr);
 int array1[5];
 int array2[10];
 int array3[12];
 merge(10, array1, 15, array2, array3);
 int *p = nullptr;
 merge(5, array1, 10, array2, p);
}

GCC 14.1 outputs:

<source>: In function 'main':
<source>:60:5: warning: argument 2 null where non-null expected [-Wnonnull]
 60 | merge(10, nullptr, 20, nullptr, nullptr);
 | ^~~~~
<source>:4:8: note: in a call to function 'merge' declared 'nonnull'
 4 | size_t merge(size_t size1,
 | ^~~~~
<source>:60:5: warning: argument 4 null where non-null expected [-Wnonnull]
 60 | merge(10, nullptr, 20, nullptr, nullptr);
 | ^~~~~
<source>:4:8: note: in a call to function 'merge' declared 'nonnull'
 4 | size_t merge(size_t size1,
 | ^~~~~
<source>:60:5: warning: argument 5 null where non-null expected [-Wnonnull]
 60 | merge(10, nullptr, 20, nullptr, nullptr);
 | ^~~~~
<source>:4:8: note: in a call to function 'merge' declared 'nonnull'
 4 | size_t merge(size_t size1,
 | ^~~~~
<source>:66:5: warning: 'merge' reading 40 bytes from a region of size 20 [-Wstringop-overread]
 66 | merge(10, array1, 15, array2, array3);
 | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
<source>:66:5: note: referencing argument 2 of type 'const int[]'
<source>:66:5: warning: 'merge' reading 60 bytes from a region of size 40 [-Wstringop-overread]
<source>:66:5: note: referencing argument 4 of type 'const int[]'
<source>:66:5: note: referencing argument 5 of type 'int[]'
<source>:4:8: note: in a call to function 'merge'
 4 | size_t merge(size_t size1,
 | ^~~~~
<source>:69:5: warning: argument 5 null where non-null expected [-Wnonnull]
 69 | merge(5, array1, 10, array2, p);
 | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
<source>:4:8: note: in a call to function 'merge' declared 'nonnull'
 4 | size_t merge(size_t size1,
 | ^~~~~
<source>:66:5: warning: 'array1' may be used uninitialized [-Wmaybe-uninitialized]
 66 | merge(10, array1, 15, array2, array3);
 | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
<source>:4:8: note: by argument 2 of type 'const int[static]' to 'merge' declared here
 4 | size_t merge(size_t size1,
 | ^~~~~
<source>:62:9: note: 'array1' declared here
 62 | int array1[5];
 | ^~~~~~
<source>:66:5: warning: 'array2' may be used uninitialized [-Wmaybe-uninitialized]
 66 | merge(10, array1, 15, array2, array3);
 | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
<source>:4:8: note: by argument 4 of type 'const int[static]' to 'merge' declared here
 4 | size_t merge(size_t size1,
 | ^~~~~
<source>:63:9: note: 'array2' declared here
 63 | int array2[10];
 | ^~~~~~
<source>:69:5: warning: 'array1' may be used uninitialized [-Wmaybe-uninitialized]
 69 | merge(5, array1, 10, array2, p);
 | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
<source>:4:8: note: by argument 2 of type 'const int[static]' to 'merge' declared here
 4 | size_t merge(size_t size1,
 | ^~~~~
<source>:62:9: note: 'array1' declared here
 62 | int array1[5];
 | ^~~~~~
<source>:69:5: warning: 'array2' may be used uninitialized [-Wmaybe-uninitialized]
 69 | merge(5, array1, 10, array2, p);
 | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
<source>:4:8: note: by argument 4 of type 'const int[static]' to 'merge' declared here
 4 | size_t merge(size_t size1,
 | ^~~~~
<source>:63:9: note: 'array2' declared here
 63 | int array2[10];
 | ^~~~~~
Compiler returned: 0

whilst Clang 18.1.0 outputs:

<source>:60:5: warning: null passed to a callee that requires a non-null argument [-Wnonnull]
 60 | merge(10, nullptr, 20, nullptr, nullptr);
 | ^ ~~~~~~~
<source>:5:24: note: callee declares array parameter as static here
 5 | const int array1[static size1],
 | ^ ~~~~~~~~~~~~~~
<source>:60:5: warning: null passed to a callee that requires a non-null argument [-Wnonnull]
 60 | merge(10, nullptr, 20, nullptr, nullptr);
 | ^ ~~~~~~~
<source>:7:24: note: callee declares array parameter as static here
 7 | const int array2[static size2],
 | ^ ~~~~~~~~~~~~~~
<source>:60:5: warning: null passed to a callee that requires a non-null argument [-Wnonnull]
 60 | merge(10, nullptr, 20, nullptr, nullptr);
 | ^ ~~~~~~~
<source>:8:18: note: callee declares array parameter as static here
 8 | int output_array[static size1 + size2]) 
 | ^ ~~~~~~~~~~~~~~~~~~~~~~
3 warnings generated.
Compiler returned: 0

and x64-64 icx 202400 outputs:

<source>:60:5: warning: null passed to a callee that requires a non-null argument [-Wnonnull]
 60 | merge(10, nullptr, 20, nullptr, nullptr);
 | ^ ~~~~~~~
<source>:5:24: note: callee declares array parameter as static here
 5 | const int array1[static size1],
 | ^ ~~~~~~~~~~~~~~
<source>:60:5: warning: null passed to a callee that requires a non-null argument [-Wnonnull]
 60 | merge(10, nullptr, 20, nullptr, nullptr);
 | ^ ~~~~~~~
<source>:7:24: note: callee declares array parameter as static here
 7 | const int array2[static size2],
 | ^ ~~~~~~~~~~~~~~
<source>:60:5: warning: null passed to a callee that requires a non-null argument [-Wnonnull]
 60 | merge(10, nullptr, 20, nullptr, nullptr);
 | ^ ~~~~~~~
<source>:8:18: note: callee declares array parameter as static here
 8 | int output_array[static size1 + size2]) 
 | ^ ~~~~~~~~~~~~~~~~~~~~~~
3 warnings generated.
Compiler returned: 0

GCC detects all 3 wrong use cases, whilst Clang and ICX only detect the first one. A win for GCC.

MSVC does not support this C99 feature (who is suprised?), but MinGW gcc 13.1.0 and MinGW clang 16.0.2 have the same output as GCC and Clang as shown above.

Moreover, this also documents the function better. As the array must have at least size items, it also means that a nullptr is not a valid value for this argument.

It also eliminates the need of having to document that the function would invoke undefined behavior if nullptrs are passed in.

Note that this feature is not usable for functions like memcpy because arrays of void are illegal, or for return values. A GCC extension, __attribute__((nonnull)) allows parameters to be marked as non-null. This extension is also supported by Intel's compiler and Clang. A Clang extension allows _Nullable, _Nonnull and _Null_unspecified. This extension is not supported by GCC. A GCC extension, __attribute__((returns_nonnull)) can be used to specify that a function returns a nonnull value. This extension is also supported by Clang and Intel's compiler. I await an _Optional type in the next revision of the C Standard.

After adding static, our final prototype becomes:

/* Returns the size of the final merged array. */
size_t merge(size_t size1,
 const int array1[static size1],
 size_t size2,
 const int array2[static size2],
 int output_array[static size1 + size2]);

Note that I have renamed length1 and length2 to size1 and size2, and have also changed their type to size_t, because array sizes are not signed. I have also changed the return type to size_t to be consistent.

output_array has space for at least size1 + size2 for the case that there are no duplicates in the sorted arrays.

I'd also replace the while loops:

while(i >= length1 && j < length2) //enter if second array is not fully checked
{
 output_array[k] = array2[j];
 j++;
 k++;
 counter++;
}
while(j >= length2 && i < length1) //enter if first array is not fully checked
{
 output_array[k] = array1[i];
 i++;
 k++;
 counter++;
}

with for loops:

for (; j < length2; ++j, ++k, ++counter) //enter if second array is not fully checked
{
 output_array[k] = array2[j];
}
for (; i < length1; ++i, ++k, ++counter) //enter if first array is not fully checked
{
 output_array[k] = array1[i];
}

I find that this expresses the intent more clearly now, and the code is also less verbose.

Though, i, j, and k do not hold any meaning to me. These too should be replaced with descriptive identifiers.

I would move the length/size parameters to be just after their corresponding arrays instead of at the end for easier reading of the code. I might also change the returned size to be an out parameter using pointer indirection to make it more uniform.
Apparently, I mentally checked out when I got to the static section as I've not seen that C99 language feature before. No, your last function prototype looks looks good. I was just recommending that they be bundled together unlike the prototype offered just before that section. You've already applied my comment as I intended if not wrote. :-)
Note that MSVC does not support [static N] array parameters, and GCC, Clang and ICX all interpret [N] array parameters as meaning N or more, so making that change would only make the code less portable. You might define AT_LEAST to expand to static, conditionally.

pacmaninbw ♦pacmaninbw 26.1k13 gold badges47 silver badges114 bronze badges · Answer 2 · 2024-05-27 15:34:41Z

General Comments

It would be helpful if you mentioned what book you are using, and which version of C you are targeting. It might also be helpful if you mentioned whether you were programming on Linux or Windows.

If the program were taking user input, the input arrays should be sorted before they are merged. You would want a function for array sorting.

The program might be easier to read, write and maintain if the merge and the sorting were separate. The merge could be performed first and then the sort could be performed after, these would be two separate functions. In this instance the performance might be better as well.

Variable Declarations

A recommended programming practice to declare the variable as needed. In C the language doesn't provide a default initialization of the variable so variables should be initialized as part of the declaration. For readability and maintainability each variable should be declared and initialized on its own line. The method of variable declarations that you are currently using where the line ends in a comma rather than a semicolon will make the code more difficult to maintain.

C Array Declarations

There is no need to use a size in C array variable declarations if the array is being assigned values in the declaration. In this particular instance there is no need to assign all zeros to the output array since all values will be over written.

Instead of:

 int array1[LENGTH1] = {-2, -1, 0},
 array2[LENGTH2] = {0, 3, 4, 6},
 array3[LENGTH1+LENGTH2] = {0};

This might be better:

 double input_1[] = {-2, -1, 0};
 double input_2[] = {0, 3, 4, 6};
 double output[LENGTH1+LENGTH2];

Code Organization

Function prototypes are very useful in large programs that contain multiple source files, and that in case they will be in header files. In a single file program like this it is better to put the main() function at the bottom of the file and all the functions that get used in the proper order above main(). Keep in mind that every line of code written is another line of code where a bug can crawl into the code.

Re: "merge, then sort"... Will then have to scan & remove possible duplicates from combined array. OP's problem statement suggests both the "pre-merge" arrays contain sorted and unique values.
Re: "array declarations". OP's version ensures all 3 arrays are compatible. Suggestion to "let compiler count elements" (using arr[ ] = ..., then hardwired tokens LENGTHx risks a mismatch that could lead to buffer overrun and the lawless wilds known as UB. Also, the cost of initialising the destination array is nothing when compared to the cost of NOT practising the habit of initialising variables when they're declared. Bad recommendation, imo...
@Fe2O3 This problem would be more real world if sizeof() was used to determine the length of each input array and then calloc() sufficient space. A better test of the code would be if the input array's were unsorted. The comma separated declarations are a bug waiting to happen. The code is already not using at least one location in the output array due. I agree that there is the possibility of overflow in my suggestion, but not as the code was written.
In fact, it's all a bit of a minefield. OP's version (arr[ LEN ] = ...) risks careless definition of too few initialisers in one/both arrays. Result: multiple padding zeros, breaking the rule that there are no duplicate values in source arrays AND all array values are in ascending order... Part of the lot of a beginner is to have to "step on the Lego pieces scattered on the floor in the dark" just like all who've gone before... Cheers! :-)

rate the solution rate the solution 612 bronze badges · Answer 3 · 2024-05-27 12:18:18Z

The task was to merge the contents of two sorted arrays of type double. In your solution you use int type.

You also use a variable counter, but in your code k and counter always have the same value. So counter is not useful, just return k.

i, j & k is for sure short name, but is kinda hard to follow. You also know length is always >= 0, so you can use unsigned int or size_t for type. As say by pacmaninbw, is recommended to use size_t because he will have the proper word size for the system and that will improve performance.

In

while(i >= length1 && j < length2)

and

while(j >= length2 && i < length1)

You don't need to check if i >= length1 or j >= length2 cause that always the case if the other's condition (j < length2 or i < length1) is true.

Winux Winux 411 bronze badge · Answer 4 · 2024-05-27 12:23:55Z

The code is clear overall. The comments are quite well used, which is good. Yet, a few things can be improved.

First, if your manual has already covered const, you can use that on most of variables.
The macros are not well suited for this use. Don't get me wrong, they work fine here. A array1_length int variable makes more sense, since you pass the lengths to your merge function. Also the LENGTH macros refer to the size of object within the code which can lead to more errors when you modify it. Macros are better suited for values that are quite independent (error codes for example).
Are you already familiar with continue concept? If so you should them in your while loop, it simplifies a little the code because you avoid the else. It is also clearer for the reader that there is nothing happening after the if block is completed.

coderodde coderodde 31.8k15 gold badges77 silver badges202 bronze badges · Answer 5 · 2024-05-28 10:33:26Z

~~(削除) Advice I - a correctness bug (削除ここまで)~~

If your merge works on source array with duplicate array components, it will omit (some?) duplicates. (See Advice III.)

Advice II - idiomatic C

I suggest you define all the array indices as of type size_t. Also, you could switch the "lonely" tail while loops to memcpy. Finally, the array lengths may be passed via size_t.

~~(削除) Advice III - alternative implementation (削除ここまで)~~


#include <stdio.h>
#include <string.h>
#define LENGTH1 3
#define LENGTH2 4
void merge_v2(int* target_array, int* source_array_1, int* source_array_2, size_t len1, size_t len2);
int main()
{
 int array1[LENGTH1] = { 0, 2, 4 },
 array2[LENGTH2] = { 1, 3, 4, 5 },
 array3[LENGTH1 + LENGTH2]; // No need to initialize to zero all the array components: it will be overwritten by the merve_v2.
 merge_v2(array3, array1, array2, LENGTH1, LENGTH2);
 puts("merge_v2 result:");
 for (int i = 0; i < LENGTH1 + LENGTH2; i++)
 printf("[%d]%d\n", i, array3[i]);
 puts("merge result:");
 int array4[LENGTH1] = { 0, 2, 4 },
 array5[LENGTH2] = { 1, 3, 4, 5 },
 array6[LENGTH1 + LENGTH2];
 
 int cnt = merge(array6, array4, array5, LENGTH1, LENGTH2);
 for (int i = 0; i < cnt; i++)
 printf("[%d]%d\n", i, array6[i]);
 return 0;
}
void merge_v2(int* target_array, int* source_array_1, int* source_array_2, size_t len1, size_t len2)
{
 size_t target_index = 0;
 size_t source_index_1 = 0;
 size_t source_index_2 = 0;
 while (source_index_1 != len1 && source_index_2 != len2)
 {
 // Stable sort, but does not make difference for ints:
 target_array[target_index++] = 
 source_array_2[source_index_2] < source_array_1[source_index_1] ?
 source_array_2[source_index_2++] :
 source_array_1[source_index_1++];
 }
 memcpy(&target_array[target_index], &source_array_1[source_index_1], sizeof(int) * (len1 - source_index_1));
 memcpy(&target_array[target_index], &source_array_2[source_index_2], sizeof(int) * (len2 - source_index_2));
}
int merge(int* output_array, int* array1, int* array2, int length1, int length2)
{
 int i = 0, //array1 counter
 j = 0, //array2 counter
 k = 0, //output counter
 counter = 0;
 while (i < length1 && j < length2) //break when either array is fully checked
 {
 if (array1[i] < array2[j])
 {
 output_array[k] = array1[i];
 i++;
 k++;
 counter++;
 }
 else if (array1[i] == array2[j]) //if element is same move to the next element in both arrays
 {
 output_array[k] = array1[i];
 i++;
 j++;
 k++;
 counter++;
 }
 else
 {
 output_array[k] = array2[j];
 j++;
 k++;
 counter++;
 }
 }
 while (i >= length1 && j < length2) //enter if second array is not fully checked
 {
 output_array[k] = array2[j];
 j++;
 k++;
 counter++;
 }
 while (j >= length2 && i < length1) //enter if first array is not fully checked
 {
 output_array[k] = array1[i];
 i++;
 k++;
 counter++;
 }
 return counter;
}

Which prints:

merge_v2 result:
[0]0
[1]1
[2]2
[3]3
[4]4
[5]4
[6]5
merge result:
[0]0
[1]1
[2]2
[3]3
[4]4
[5]5

Edit

After being notified about the specs of this post, I had to rework my answer.

Advice IV - better naming

Instead of i, j and k, I would deploy target_index, source_index_1, source_index_2.

Advice V - size_t

I would declare the return type of merge as size_t. Also, I would define the arguments int length1 and int length2 as size_t len1 and size_t len2.

Advice VI - alternative implementation: take two

#include <stdio.h>
#include <string.h>
#define LENGTH1 3
#define LENGTH2 4
size_t merge_v2(int* target_array, int* source_array_1, int* source_array_2, size_t len1, size_t len2);
int main()
{
 int array1[LENGTH1] = { 0, 2, 4 },
 array2[LENGTH2] = { 1, 3, 4, 5 },
 array3[LENGTH1 + LENGTH2]; // No need to initialize to zero all the array components: it will be overwritten by the merve_v2.
 size_t counter_v2 = merge_v2(array3, array1, array2, LENGTH1, LENGTH2);
 puts("merge_v2 result:");
 for (int i = 0; i < counter_v2; i++)
 printf("[%d]%d\n", i, array3[i]);
 puts("merge result:");
 int array4[LENGTH1] = { 0, 2, 4 },
 array5[LENGTH2] = { 1, 3, 4, 5 },
 array6[LENGTH1 + LENGTH2] = { 0 };
 
 int cnt = merge(array6, array4, array5, LENGTH1, LENGTH2);
 for (int i = 0; i < cnt; i++)
 printf("[%d]%d\n", i, array6[i]);
 return 0;
}
size_t merge_v2(int* target_array, int* source_array_1, int* source_array_2, size_t len1, size_t len2)
{
 size_t target_index = 0;
 size_t source_index_1 = 0;
 size_t source_index_2 = 0;
 size_t counter = 0;
 while (source_index_1 != len1 && source_index_2 != len2)
 {
 // Stable sort, but does not make difference for ints:
 if (source_array_1[source_index_1] < source_array_2[source_index_2])
 {
 target_array[target_index++] = source_array_1[source_index_1++];
 }
 else if (source_array_1[source_index_1] > source_array_2[source_index_2])
 {
 target_array[target_index++] = source_array_2[source_index_2++];
 }
 else
 {
 target_array[target_index++] = source_array_1[source_index_1++];
 source_index_2++;
 }
 counter++;
 }
 counter += len1 - source_index_1;
 counter += len2 - source_index_2;
 memcpy(&target_array[target_index], &source_array_1[source_index_1], sizeof(int) * (len1 - source_index_1));
 memcpy(&target_array[target_index], &source_array_2[source_index_2], sizeof(int) * (len2 - source_index_2));
 return counter;
}
int merge(int* output_array, int* array1, int* array2, int length1, int length2)
{
 int i = 0, //array1 counter
 j = 0, //array2 counter
 k = 0, //output counter
 counter = 0;
 while (i < length1 && j < length2) //break when either array is fully checked
 {
 if (array1[i] < array2[j])
 {
 output_array[k] = array1[i];
 i++;
 k++;
 counter++;
 }
 else if (array1[i] == array2[j]) //if element is same move to the next element in both arrays
 {
 output_array[k] = array1[i];
 i++;
 j++;
 k++;
 counter++;
 }
 else
 {
 output_array[k] = array2[j];
 j++;
 k++;
 counter++;
 }
 }
 while (i >= length1 && j < length2) //enter if second array is not fully checked
 {
 output_array[k] = array2[j];
 j++;
 k++;
 counter++;
 }
 while (j >= length2 && i < length1) //enter if first array is not fully checked
 {
 output_array[k] = array1[i];
 i++;
 k++;
 counter++;
 }
 return counter;
}

... which outputs:

merge_v2 result:
[0]0
[1]1
[2]2
[3]3
[4]4
[5]5
merge result:
[0]0
[1]1
[2]2
[3]3
[4]4
[5]5

Daniël van den Berg Daniël van den Berg 3131 silver badge9 bronze badges · Answer 6 · 2024-05-28 06:22:12Z

You've got a lot of separate branches at separate levels of indentation. Reversing the approach to "fill an array from two sorted arrays" makes this code a lot more readable, in my opinion.

#include <stdio.h>
#include <stdbool.h>
#define LENGTH1 3
#define LENGTH2 4
int merge(int *it_out, const int* it1, const int* it2, const int length1, const int length2); //return counter
int main()
{
 const int array1[LENGTH1] = {-2, -1, 0};
 const int array2[LENGTH2] = {0, 3, 4, 6};
 int array3[LENGTH1+LENGTH2] = {0};
 int counter = merge(array3, array1, array2, LENGTH1, LENGTH2);
 for(int i = 0; i < counter; i++)
 printf("[%d]%d\n", i, array3[i]);
 return 0;
}
int merge(int *it_out, const int* it1, const int* it2, const int length1, const int length2)
{
 const int* const it1_end = it1+length1;
 const int* const it2_end = it2+length2;
 
 int* const it_out_end = it_out + length1 + length2;
 
 while (true) {
 const bool it1_has_value = it1 != it1_end;
 const bool it2_has_value = it2 != it2_end;
 if (!it1_has_value && !it2_has_value) {
 break;
 }
 if (!it1_has_value) {
 *it_out = *it2;
 it2++;
 }
 else if (!it2_has_value) {
 *it_out = *it1;
 it1++;
 }
 else if (*it1 == *it2) {
 *it_out = *it1;
 it1++;
 it2++;
 }
 else if (*it1 < *it2) {
 *it_out = *it1;
 it1++;
 }
 else {
 *it_out = *it2;
 it2++;
 }
 it_out++;
 }
 
 return it_out_end - it_out;
}

The other comments of actually using double and using size_t still stand.

Also, addding some consts, especially in your function signature, can help catching errors later. It will help you make sure you don't accidentally alter input arrays.

This solution doesn't eliminate duplicates. I think it's also quite efficient in the case that one array is consumed long before the other array. It does a lot of work that is repetitive once one array is consumed.
@Sinc thanks, I misread the no duplicates. Fixed that. And yes, it does a lot of extra work if the array lengths are heavily skewed. But often that matters way less than the readability, especially when people start out in this field. Prevent premature optimization.
Y'know Daniel... Two things... 1) Not many readers actually copy/paste/compile the code posted as answers. Sometimes I do. This "fixed" code produces a very unremarkable result... You may wish to try it yourself. 2) This is "Code Review", not "Stack Overflow". The focus is to review the OP's code, providing guidance; not a forum to post your own version of a solution to the OP's problem statement. Sorry about my DV, but, y'know...

c19 c19 1143 bronze badges · Answer 7 · 2024-06-02 20:48:01Z

Overall, I would rate your solution as adequate - and certainly good for someone who has not yet reached sizeof. This is a question I use in job interviews and in a short, stressful period, something of about your level is the most that can be expected even from fairly experienced programmers.

Harith has already made many good comments, though I disagree about using static as that makes the code no longer valid as C++. I find that it is common to use both languages and therefore handy if your simple utilities are compatible with both. But I would put the array and its size adjacent in the parameter list as they go together. And do not replace "length" with "size". The number of items is a length, the storage space they occupy (usually in bytes) is a size. Many bugs have been caused by confusing these two. It is perfectly fine and idiomatic to return 0 from main().

Naming is important. Each name should be a good mnemonic (in contrast to being an attempt to avoid having to look at the declaration or documentation). In this code array1 can be considerably improved. It tells us that it is an array, which is not useful as when we see in each use (e.g. array1[i]) that it is obviously an array. What is much more useful is a reminder of its purpose, so the names of the arrays in merge() could be in1, in2, and out.

It's fine to use a very short name like i in a very short area of code (such as you have main()), but in merge(), when we see it being used it does not really remind us of its meaning. I would suggest pos1, pos2, and outpos instead of i, j, and k. The name counter is too generic. What is it counting? It would be better named something like numout or even nout as using 'n' like that is a common idiom. Of course, in this case (as another answer has mentioned) it is redundant as it always hold the same value as k.

A valuable principle is Don't Repeat Yourself (DRY). In the first while loop, the code tests whether both arrays have elements, and when that test fails, these tests get repeated afterwards. It would probably be better to write the loop as a forever (for (;;)) loop with clauses to cover the possibilities. This gives five cases in the loop instead of three.

Moving on to more advanced matters, since this is written in C, it can use the language more idiomatically to make the code shorter and clearer. Instead of

 out[outpos] = in1[in1pos];
 in1pos++;
 outpos++;

we can write:

 out[outpos++] = in1[in1pos++];

This has the added advantage in this function that the case of consuming from both arrays stands out more as the other cases do not have isolated increments.

And for even more advanced use of C, remember that, while the data is stored in an array, the merge() function gets pointers which are variables and can be advanced as elements are consumed. A first attempt at this would replace:

 out[outpos++] = in1[in1pos++];

with

 *out++ = *in1++;
 --length1;
 ++outlen;

This removes the need for separate variable to hold the current position in each array, but leaves pairs of variables (e.g. out and outlen) which must always be modified together and it is easy to forget one when doing the other. This can be avoided by using only pointers. At the start of the function declare:

 int *in1end = in1 + length1;

and replace the test for exhaustion:

 if (in1pos < length1) {

with

 if (in1 < in1end) {

In this case, if you want to return the length of the result (which you do, despite it not being in the requirements, since it's a quantity that cannot be otherwise known to the caller), you need to keep a record of the initial value of out. Maybe make the parameter be called output and declare:

 int *out = output; // current position in the output

returning:

 return out - output;

Putting everything together, gives the following:

#include <stddef.h>
#include <stdio.h>
#define LENGTH1 3
#define LENGTH2 4
size_t merge(int *out, const int *in1, size_t len1, const int *in2, size_t len2);
int main()
{
 int test_in1[LENGTH1] = {-2, -1, 0},
 test_in2[LENGTH2] = {0, 3, 4, 6},
 test_out[LENGTH1+LENGTH2] = {0};
 int outlen = merge(test_out, test_in1, LENGTH1, test_in2, LENGTH2);
 for(int i = 0; i < outlen; i++) {
 printf("[%d]%d\n", i, test_out[i]);
 }
 return 0;
}
size_t merge(int *output, const int *in1, size_t len1, const int *in2, size_t len2)
{
 int *out = output; // current position in output
 const int *const in1end = in1 + len1;
 const int *const in2end = in2 + len2;
 for (;;) {
 if (in1 == in1end) {
 while (in2 < in2end) *out++ = *in2++;
 return out - output;
 } else if (in2 == in2end) {
 while (in1 < in1end) *out++ = *in1++;
 return out - output;
 } else if (*in1 < *in2) {
 *out++ = *in1++;
 } else if (*in1 > *in2) {
 *out++ = *in2++;
 } else { // identical elements
 *out++ = *in1++;
 in2++;
 }
 }
}

Curious why you included <stddef.h>. C99 (N1256), C11 (N1570) and C17 (N2176) section 7.21.1 all confirm that stdio.h declares size_t.
For C++ and MSVC, one can define AT_LEAST to expand to static, conditionally, as suggested by @Davislor under my answer.
stddef,h was originally instinct, but on reflection, if you put the merge function first in the file (with stddef included) it is self-contained. Then you can append #ifdef TEST, the stdio include, and main() with the test cases, giving you a minimal utility when built normally, but a complete self testing unit when built with -DTEST

Stack Exchange Network

Merge function in C

7 Answers 7

The implementation does not conform to the specification:

Extract the common code out of the branches:

Place `main()` at the end of the source file:

Array declarations:

Choose more descriptive identifiers:

Use standard exit status codes:

Use `const` for arguments that are not modified:

Use `static` keyword for array parameters:

General Comments

Variable Declarations

C Array Declarations

Code Organization

Edit

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Merge function in C

7 Answers 7

The implementation does not conform to the specification:

Extract the common code out of the branches:

Place main() at the end of the source file:

Array declarations:

Choose more descriptive identifiers:

Use standard exit status codes:

Use const for arguments that are not modified:

Use static keyword for array parameters:

General Comments

Variable Declarations

C Array Declarations

Code Organization

Edit

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions

Place `main()` at the end of the source file:

Use `const` for arguments that are not modified:

Use `static` keyword for array parameters: