The task is to merge 2 text files containing sorted integers into one file. Each number in both files is stored in new line(there are no empty lines,and only numbers are stored). Requirements are that if file1
has n numbers and file2
has m numbers, running time must be O(n+m) and memory consumption O(1).
My current solution is based on this algorithm:
while (not end of List A and not end of List B) if (List A current item <= List B current item) output List A current item advance List A index else output List B current item advance List B index // At this point, one of the lists is empty. // Output remaining items from the other while (not end of List A) output List A current item advance List A index while (not end of List B) output List B current item advance List B index
The solution is working fine, I've tested it for different cases.But, I'm looking for you opinion and suggestions, how can I improve the code, since it is quite huge.
Also, can I avoid assigning value to item 1
and item 2
variables at beginning? If I leave them unassigned, even if I pass them value
visual studio is returning error saying that I'm using unassigned variables in while loop
.
public static void Merge(string pathInputFile1, string pathInputFile2, string pathOutputFile)
{
int item1 = 0;
int item2 = 0;
bool endOfFile1 = false;
bool endOfFile2 = false;
string temp;
var inputFile1 = File.OpenText(pathInputFile1);
var inputFile2 = File.OpenText(pathInputFile2);
temp = inputFile1.ReadLine();
if (temp == null)
{
endOfFile1 = true;
}
else
{
item1 = Int32.Parse(temp);
}
temp = inputFile2.ReadLine();
if (temp == null)
{
endOfFile2 = true;
}
else
{
item2 = Int32.Parse(temp);
}
while (!endOfFile1 && !endOfFile2)
{
if (item1 < item2)
{
using (StreamWriter sw = File.AppendText(pathOutputFile))
{
sw.WriteLine(item1);
}
temp = inputFile1.ReadLine();
if (temp == null)
{
endOfFile1 = true;
}
else
{
item1 = Int32.Parse(temp);
}
}
else if (item1 == item2)
{
using (StreamWriter sw = File.AppendText(pathOutputFile))
{
sw.WriteLine(item1);
}
temp = inputFile1.ReadLine();
if (temp == null)
{
endOfFile1 = true;
}
else
{
item1 = Int32.Parse(temp);
}
temp = inputFile2.ReadLine();
if (temp == null)
{
endOfFile2 = true;
}
else
{
item2 = Int32.Parse(temp);
}
}
else
{
using (StreamWriter sw = File.AppendText(pathOutputFile))
{
sw.WriteLine(item2);
}
temp = inputFile2.ReadLine();
if (temp == null)
{
endOfFile2 = true;
}
else
{
item2 = Int32.Parse(temp);
}
}
}
while (!endOfFile1)
{
using (StreamWriter sw = File.AppendText(pathOutputFile))
{
sw.WriteLine(item1);
}
temp = inputFile1.ReadLine();
if (temp == null)
{
endOfFile1 = true;
}
else
{
item1 = Int32.Parse(temp);
}
}
while (!endOfFile2)
{
using (StreamWriter sw = File.AppendText(pathOutputFile))
{
sw.WriteLine(item2);
}
temp = inputFile2.ReadLine();
if (temp == null)
{
endOfFile2 = true;
}
else
{
item2 = Int32.Parse(temp);
}
}
}
}
4 Answers 4
As the others have pointed out, there is a lot of repetition in your code. This is however a great opportunity to threat the files as simple iterators from which you can extract integers. This is done using a simple C# generator.
IEnumerator<int> ToIterator(StreamReader reader)
{
string line;
while ((line = reader.ReadLine()) != null)
{
yield return Convert.ToInt32(line);
}
}
Afterwards, simply follow the algorithm you initially described, but substitute List
with an iterator (IEnumerator
in .NET terms):
void MergeIntegersFiles(string source1, string source2, string destination)
{
using (var reader1 = new StreamReader(source1))
{
using (var reader2 = new StreamReader(source2))
{
using (var writer = new StreamWriter(destination))
{
var iterator1 = ToIterator(reader1);
var iterator2 = ToIterator(reader2);
var iterator1StillAvailable = iterator1.MoveNext();
var iterator2StillAvailable = iterator2.MoveNext();
while (iterator1StillAvailable && iterator2StillAvailable)
{
if (iterator1.Current <= iterator2.Current)
{
writer.WriteLine(iterator1.Current);
iterator1StillAvailable = iterator1.MoveNext();
}
else
{
writer.WriteLine(iterator2.Current);
iterator2StillAvailable = iterator2.MoveNext();
}
}
//check which iterator can still provide values
var iteratorRemaining = iterator1StillAvailable
? iterator1
: iterator2StillAvailable ? iterator2 : null;
if (null != iteratorRemaining)
{
do
{
writer.WriteLine(iteratorRemaining.Current);
} while (iteratorRemaining.MoveNext());
}
}
}
}
}
Abstracting further, you create a method which combines arbitrary iterators of sorted values:
IEnumerable<T> GetSortedValues<T>(IEnumerator<T> iterator1, IEnumerator<T> iterator2) where T : IComparable<T>
{
var iterator1StillAvailable = iterator1.MoveNext();
var iterator2StillAvailable = iterator2.MoveNext();
while (iterator1StillAvailable && iterator2StillAvailable)
{
if (iterator1.Current.CompareTo(iterator2.Current) < 1)
{
yield return iterator1.Current;
iterator1StillAvailable = iterator1.MoveNext();
}
else
{
yield return iterator2.Current;
iterator2StillAvailable = iterator2.MoveNext();
}
}
//check which iterator can still provide values
var iteratorRemaining = iterator1StillAvailable
? iterator1
: iterator2StillAvailable ? iterator2 : null;
if (null != iteratorRemaining)
{
do
{
yield return iteratorRemaining.Current;
} while (iteratorRemaining.MoveNext());
}
}
Which leads you to a simple function that just iterates through something created from two stream readers and writer the result back:
void MergeIntegersFiles(string source1, string source2, string destination)
{
using (var reader1 = new StreamReader(source1))
{
using (var reader2 = new StreamReader(source2))
{
using (var writer = new StreamWriter(destination))
{
foreach (var value in GetSortedValues(ToIterator(reader1), ToIterator(reader2)))
{
writer.WriteLine(value);
}
}
}
}
}
-
\$\begingroup\$ I guess I was a notch late but still your approach is not quite the same as mine. Also you meant
IEnumerable
and notIEnumerator
\$\endgroup\$Bruno Costa– Bruno Costa2016年06月10日 16:14:34 +00:00Commented Jun 10, 2016 at 16:14 -
2\$\begingroup\$
yield return
works with bothIEnumerable
andIEnumerator
. If I had returned anIEnumerable
I would have requested it'sIEnumerator
anyway, so I decided to simply return the enumerator. \$\endgroup\$D. Jurcau– D. Jurcau2016年06月10日 16:16:10 +00:00Commented Jun 10, 2016 at 16:16 -
\$\begingroup\$ Nice, A thing that seems for some reason I wasn't aware of. \$\endgroup\$Bruno Costa– Bruno Costa2016年06月10日 16:18:34 +00:00Commented Jun 10, 2016 at 16:18
-
1\$\begingroup\$ I like the iterator approach primarily because it is unit testable; you can pass in any two ordered lists. The only change I would make would be pass the string (filename) into the ToIterator and open the StreamReader inside the ToIterator. That way the caller is not responsible for opening and closing the stream. \$\endgroup\$Gene S– Gene S2016年06月10日 16:48:42 +00:00Commented Jun 10, 2016 at 16:48
-
\$\begingroup\$ You can make the
using
s even prettier by removing the{}
between them. They are not necessary.using
s can be written directly one below the other. \$\endgroup\$t3chb0t– t3chb0t2016年06月14日 12:00:02 +00:00Commented Jun 14, 2016 at 12:00
First some more general remarks on your code not yet identified:
temp = inputFile1.ReadLine(); if (temp == null) { endOfFile1 = true; }
The most reasonable way to tell if there is a new line or not usually comes into this code:
string line;
while ((line = stream.ReadLine()) != null)
Saves space, is readable, is concise. When I first started programming I always had those issues when reading line from files, from the moment I saw that I don't want another thing.
pathInputFile1
We normally suggest to use descriptive names but I admit that file1
is plenty for me.
You are having some trouble because you are doing three different tasks on your algorithm. I can identify at least three steps to achieve a solution for your problem:
- Read the integers from the files
- Merge the integers
- Write the merged integers to a new file
The major problem here is that we may not want to use additional memory, because of that we need to implement all algorithms on a lazily fashion.
public static IEnumerable<int> ReadIntegers(string file)
{
using (var stream = File.OpenText(file))
{
string line;
while ((line = stream.ReadLine()) != null)
{
yield return int.Parse(line);
}
}
}
public static IEnumerable<int> UnionSorted(IEnumerable<int> first, IEnumerable<int> second)
{
var it1 = first.GetEnumerator();
var it2 = second.GetEnumerator();
bool it1HasNext = it1.MoveNext();
bool it2HasNext = it2.MoveNext();
while (it1HasNext && it2HasNext)
{
if (it1.Current > it2.Current)
{
yield return it2.Current;
it2HasNext = it2.MoveNext();
}
else
{
yield return it1.Current;
it1HasNext = it1.MoveNext();
}
}
while (it1.MoveNext())
{
yield return it1.Current;
}
while (it2.MoveNext())
{
yield return it2.Current;
}
}
public static void MergeFiles(string file1, string file2, string ouputFile)
{
var merged = UnionSorted(ReadIntegers(file1), ReadIntegers(file2));
using (var writer = File.AppendText(ouputFile))
{
foreach (var number in merged)
{
writer.WriteLine(number);
}
}
}
Do not open and close the file stream of the output file each time you write one line. Instead open it once at the beginning and close it at the end (open/close a file stream is an expensive operation).
Put all the code in a try finally block. Close all file streams in the finally block to ensure that the streams will be closed if an exception occured.
Use TryParse
instad of Parse
if the format may be invalid.
If the output file already exists, the merged content will be appended (not sure if that is desired)
There are a lot of code fragements that seems to be duplicates. e.g.:
if (temp == null)
{
endOfFile1 = true;
}
else
{
item1 = Int32.Parse(temp);
}
You could extract that in a separate method
bool TryGetNextNumber(FileStream strem, out number)
{
number = -1;
var line = inputFile1.ReadLine();
if (line == null) return false;
return int.TryParse(line, out number);
}
Maybe there is also a more elegent solution than the 3 while loops... I have to think about it...
-
3\$\begingroup\$ the two StreamReaders should be included in the single using block, rather than try / finally'ing them \$\endgroup\$Caleth– Caleth2016年06月10日 15:41:43 +00:00Commented Jun 10, 2016 at 15:41
This is used multiple times, so it should be moved to a method:
using (StreamWriter sw = File.AppendText(pathOutputFile))
{
sw.WriteLine(item1);
}
Same for this:
temp = inputFile1.ReadLine();
if (temp == null)
{
endOfFile1 = true;
}
else
{
item1 = Int32.Parse(temp);
}
You should avoid copy-pasting: if you're doing that, chances are you're doing something wrong.
Care about names: temp
is a bad variable name.