1
\$\begingroup\$

I was trying to sort a massive file of successive chars in C. I did some research and found a few file sorting algorithms that look the same. Their main idea is to read an amount of data to memory, sort them using one of classic sort algorithms, write them to a new file, then repeat the process and merge the two files and so on. You can find more here

I tried to make a new algorithm that does not require a lot of memory. I ended up with this code that actually works and is inspired from Bubble sort algorithm:

#include<stdio.h>
int main()
{
char a,b;
FILE *f,*aux;
int sorted;//BOOLEAN
do
{ 
 f = fopen("ltr.txt","r"); //Assuming that the file exists
 aux = fopen("aux.txt","w+");
 a = getc(f);
 sorted = 1;
 while ( (b = getc(f)) != EOF )
 {
 if (b < a) 
 {
 fputc(b, aux);
 sorted = 0;
 }
 else
 {
 fputc(a, aux);
 a = b;
 }
 }
 fputc(a, aux);
 fclose(f);
 fclose(aux);
 remove("ltr.txt");
 rename("aux.txt","ltr.txt");
}while(!sorted);
return 0; //EXIT_SUCCESS
}

The algorithm works but is improvable and can be optimized, however I'm asking for help by reviewing complexity, performance, read/write to disk, disk management, memory management and comparison to other sorting algorithms.

I can list some disadvantages:

  • requires disk space of file_size*2 (can be improved by deleting original char each time we write it to aux.txt)
  • file will be written to disk several times and the original will be deleted
  • execution time looks to be too long (yet I didn't measure it)
user1118321
11.9k1 gold badge20 silver badges46 bronze badges
asked Dec 7, 2016 at 9:07
\$\endgroup\$
1
  • \$\begingroup\$ Welcome to CodeReview! Hope you get some good answers. \$\endgroup\$ Commented Dec 7, 2016 at 10:38

2 Answers 2

1
\$\begingroup\$

If you are certain that you only have standard ASCII characters in the file, it's more efficient to just map the whole space and print it out with a counting sort.

The idea is that you use the character itself (a number between 0 and 255) as the index of an array, count how many of those are in the file, and then write the array starting from the beginning.

This is some sample code. Please note that I cannot test it right now, but it should give you a basic idea on how to do it.

#include <stdio.h>
int main()
{
 //You may want to consider a 'long long' type
 long char_count[256];
 FILE *input_file, *output_file;
 input_file = fopen("ltr.txt","r"); //Assuming that the file exists
 output_file = fopen("aux.txt","w+");
 memset(char_count, -1, 256);
 char input_char;
 while (input_char = getc(input_file)) {
 char_count[input_char] = char_count[input_char] > 0 ? (char_count[input_char] + 1) : 1;
 }
 int index;
 for (index = 0; index < 255; index++) {
 if (char_count[input_char] > 0) {
 int char_index;
 //Can be optimized by building a format string and using fprintf
 for (char_index = 0; char_index < char_count[input_char]; char_index++) {
 fputc(char_count[input_char], output_file);
 }
 }
 }
 fclose(output_file);
 fclose(input_file);
 return 0;
}
answered Dec 7, 2016 at 9:50
\$\endgroup\$
0
\$\begingroup\$

You can do the merge sort without using too much memory by reducing the initial block sizes.

However with 64 bit you can just mmap the entire file into memory and sort it just like an array and let the OS deal with pulling in and flushing out blocks as you access them.

If you do wish to continue with the bubble like sort then I would suggest keeping more values in memory to reduce the number of passes you need to do. You can do this by keeping a min heap of a max size:

while ( (b = getc(f)) != EOF )
{
 if(heapSize(heap) < MAX_HEAP_SIZE){
 addToHeap(heap, b);
 } else if(peekMin(heap) > b){
 fputc(b, aux);
 } else {
 fputc(peekMin(heap), aux);
 replaceMin(heap, b);
 }
}
while(!heapEmpty(heap)){
 fputc(popMin(heap), aux);
}

For a stopping condition you can use a simple counter because you know that you only need fileSize/MAX_HEAP_SIZE +1 passes over the file to sort it.

answered Dec 7, 2016 at 11:01
\$\endgroup\$

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.