I've implemented an external mergesort to sort a file consisting of Java int primitives; however, it is horribly slow (fortunately, it does at least work).
Very little happens in the sort method: it just recursively calls merge, with blockSize doubling on each call and the input and output files swapping each time.
How could I be losing so much time here?
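Roughly, the driver looks like this (a simplified sketch, not my exact code; the starting block size of one int is an assumption):

public static void sort(String fileA, String fileB, long fileSize) throws IOException
{
    // Repeatedly merge, doubling blockSize and swapping the roles
    // of the two files each pass (block sizes measured in bytes).
    for (long blockSize = 4; blockSize < fileSize; blockSize *= 2)
    {
        merge(fileA, fileB, blockSize);
        String tmp = fileA; fileA = fileB; fileB = tmp;
    }
}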
//Merge stage of external mergesort
//Read from input file, already sorted into blocks of size blockSize
//Write to output file, sorted into blocks of 2*blockSize
public static void merge(String inputFile, String outputFile, long blockSize)
        throws IOException
{
    //readers for block1/2
    FileInputStream fis1 = new FileInputStream(inputFile);
    DataInputStream dis1 = new DataInputStream(fis1);
    FileInputStream fis2 = new FileInputStream(inputFile);
    DataInputStream dis2 = new DataInputStream(fis2);
    //writer to output file
    FileOutputStream fos = new FileOutputStream(outputFile);
    DataOutputStream dos = new DataOutputStream(fos);

    // merging 2 sub lists:
    // go along pairs of blocks in inputFile,
    // continue until end of input

    //initialise block2 at the right position
    dis2.skipBytes((int) blockSize);
    //while we haven't reached the end of the file
    while (dis1.available() > 0)
    {
        // if block1 is the last block, copy block1 to output
        if (dis2.available() <= 0)
        {
            while (dis1.available() > 0)
                dos.writeInt(dis1.readInt());
            break;
        }
        // if block1 is not the last block, merge block1 and block2
        else
        {
            long block1Pos = 0;
            long block2Pos = 0;
            boolean block1Over = false;
            boolean block2Over = false;
            //data read from each block
            int e1 = dis1.readInt();
            int e2 = dis2.readInt();
            //keep going until both blocks are fully examined
            while (!block1Over | !block2Over)
            {
                //copy from block1 while:
                //  block1 hasn't been fully examined AND
                //  (block1's element is <= block2's OR block2 has been fully examined)
                while ( !block1Over & ((e1 <= e2) | block2Over) )
                {
                    dos.writeInt(e1); block1Pos += 4;
                    if (block1Pos < blockSize & dis1.available() > 0)
                        e1 = dis1.readInt();
                    else
                        block1Over = true;
                }
                //same for block2
                while ( !block2Over & ((e2 < e1) | block1Over) )
                {
                    dos.writeInt(e2); block2Pos += 4;
                    if (block2Pos < blockSize & dis2.available() > 0)
                        e2 = dis2.readInt();
                    else
                        block2Over = true;
                }
            }
        }
        // skip to the next pair of blocks
        dis1.skipBytes((int) blockSize);
        dis2.skipBytes((int) blockSize);
    }
    dis1.close();
    dis2.close();
    dos.close();
    fos.close();
}
2 Answers
Steve is absolutely right that adding a buffered layer between the Data input/output streams and the File input/output streams will make things work a whole lot better. I would also suggest switching to a try-with-resources statement, which will close and otherwise manage the files more reliably:
try (DataInputStream dis1 = new DataInputStream(new BufferedInputStream(new FileInputStream(inputFile)));
     DataInputStream dis2 = new DataInputStream(new BufferedInputStream(new FileInputStream(inputFile)));
     DataOutputStream dos = new DataOutputStream(new BufferedOutputStream(new FileOutputStream(outputFile)))) {

    //initialise block2 at right position
    dis2.skipBytes((int) blockSize);

    ... do other work ...
}
Now your input and output are buffered, leading to far fewer IO operations and more efficient processing, and the streams are closed cleanly with no leaks.
This will likely make a huge difference in performance, but I suspect that using NIO (ByteBuffer) operations, especially with memory-mapped IO, will be faster still. Consider using FileChannel operations that reduce the number of copies made of the file's data.
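As a rough taste of the approach, here is a minimal read-only mapping sketch (java.nio classes; it assumes the whole file fits in one mapping, i.e. under 2GB):

try (FileChannel ch = FileChannel.open(Paths.get(inputFile), StandardOpenOption.READ)) {
    MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
    while (map.remaining() >= 4) {
        int value = map.getInt(); // reads come straight from the page cache
        // ... use value ...
    }
}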
UPDATE:
I have taken some time to run some tests, applying strategies I am familiar with from high-performance systems. As I suspected, a FileChannel with memory-mapped IO is far faster.
On my computer, a file with 400,000 int values takes about 15 seconds to sort using your system. When I used buffered IO, it took 1.5 seconds (10 times faster).
I then rewrote the system using a few tricks:
- sort small blocks of 32 integers in memory;
- then merge larger and larger blocks;
- use memory-mapped IO for all file access.
The result was a sort in 0.096 seconds, or 150 times faster than your code.
Now, this code is not exactly simple, so be warned that it is a little obscure.
The first thing I did, though, was create a class to abstract away the low-level IO:
private static final class FastFile implements AutoCloseable {

    // Note: these mapping constants were omitted from the original post;
    // the values here (a 256MB mapping window) are an assumed, workable choice.
    private static final int MAPPEDSHIFT = 28;
    private static final long MAPPEDSIZE = 1L << MAPPEDSHIFT;
    private static final long MAPPEDMASK = MAPPEDSIZE - 1;

    private final Path path;
    private final FileChannel channel;
    private final long size;
    private MappedByteBuffer buffer = null;
    private long mapPosition = -1;

    public FastFile(final Path folder, final long size) throws IOException {
        this.size = size;
        this.path = Files.createTempFile(folder, "tmpdata", ".dat");
        this.channel = FileChannel.open(path, StandardOpenOption.TRUNCATE_EXISTING,
                StandardOpenOption.WRITE, StandardOpenOption.READ);
        // Create the file with the right size.
        channel.write(ByteBuffer.allocate(1), size - 1);
        resetFile();
    }

    @Override
    public void close() throws Exception {
        channel.force(true);
        channel.close();
        buffer = null;
    }

    private final void relocate(final long filepos) throws IOException {
        if (filepos >= size) {
            throw new IllegalArgumentException("Illegal file position " + filepos
                    + " in file of size " + size);
        }
        final long mappos = filepos >>> MAPPEDSHIFT;
        if (mappos != mapPosition) {
            final long pos = mappos << MAPPEDSHIFT;
            final long len = Math.min(size - pos, MAPPEDSIZE);
            buffer = channel.map(MapMode.READ_WRITE, pos, len);
            buffer.load();
            System.out.println("Move to position " + pos + " and length " + len);
            mapPosition = mappos;
        }
    }

    public void resetFile() throws IOException {
        relocate(0);
    }

    public int getInt(final long intPosition) throws IOException {
        final long filepos = intPosition << 2;
        final int offset = (int) (filepos & MAPPEDMASK);
        relocate(filepos);
        return buffer.getInt(offset);
    }

    public void putInt(final long intPosition, final int intValue) throws IOException {
        final long filepos = intPosition << 2;
        final int offset = (int) (filepos & MAPPEDMASK);
        relocate(filepos);
        buffer.putInt(offset, intValue);
    }

    public void rename(String targetName) throws IOException {
        Files.move(path, Paths.get(targetName));
    }

    public void delete() throws IOException {
        Files.delete(path);
    }
}
The above class takes a file, creates it at the right size, and can then read or write an int at any position, so it is random-access for both reads and writes. The expensive part of the operation is relocating the mapped buffer, but that happens seldom.
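For instance, a hypothetical caller (count and valueAt are placeholders, not part of the class) could fill a file with ints and read one back:

final long count = 1_000_000; // example size
try (FastFile ff = new FastFile(Paths.get("/tmp"), 4L * count)) {
    for (long i = 0; i < count; i++) {
        ff.putInt(i, valueAt(i)); // valueAt is a placeholder data source
    }
    int first = ff.getInt(0);
} catch (Exception e) {
    // FastFile.close() is declared to throw Exception
    throw new IOException(e);
}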
Using that file class, I have the following sort code:
private static void mergeSortSmart(String sourceName, String targetName) throws IOException {
    long start = System.currentTimeMillis();
    Path source = Paths.get(sourceName);
    final long size = Files.size(source);
    final long intvals = size >>> 2; // number of actual integer values (4 bytes per int).
    Path target = Paths.get(targetName).toAbsolutePath();
    if (Files.exists(target)) {
        Files.delete(target);
    }
    Path tdir = target.getParent();
    FastFile filea = new FastFile(tdir, size);
    FastFile fileb = new FastFile(tdir, size);
    int blockSize = 32;
    // copy the source data to a fast file, but do 32-size block int sorts
    // before merge-sorting.
    copyAndMicroSort(source, size, filea, blockSize);
    // then do iterative merge sorts.
    // Note: bs counts ints, not bytes, so loop until it covers intvals
    // (looping to size would just do extra full-file copy passes).
    for (long bs = blockSize; bs < intvals; bs *= 2) {
        mergeFast(filea, fileb, intvals, bs);
        FastFile tmp = filea;
        filea = fileb;
        fileb = tmp;
    }
    // rename the sorted file.
    filea.rename(targetName);
    // delete the temp file.
    fileb.delete();
    System.out.printf("Sorted in %.3fs%n", (System.currentTimeMillis() - start) / 1000.0);
}
The copyAndMicroSort is simple:
private static void copyAndMicroSort(Path source, long size, FastFile filea, final int batchSize) throws IOException {
    try (DataInputStream di = new DataInputStream(new BufferedInputStream(new FileInputStream(source.toFile())))) {
        long pos = 0;
        int cnt = 0;
        long vcount = 0;
        int[] data = new int[batchSize];
        while (pos < size) {
            if (cnt == data.length) {
                appendSortedInts(data, cnt, filea, vcount);
                vcount += cnt;
                cnt = 0;
            }
            data[cnt++] = di.readInt();
            pos += 4; // size of int;
        }
        appendSortedInts(data, cnt, filea, vcount);
    }
}

private static void appendSortedInts(final int[] data, final int cnt, final FastFile filea,
        final long vcount) throws IOException {
    Arrays.sort(data, 0, cnt);
    for (int i = 0; i < cnt; i++) {
        filea.putInt(vcount + i, data[i]);
    }
}
And the individual merge sorts are:
private static void mergeFast(final FastFile infile, final FastFile outfile, final long intCount, final long bs) throws IOException {
    long apos = 0;
    long bpos = bs;
    long outpos = 0;
    while (apos < intCount) {
        long alimit = Math.min(bpos, intCount);
        long blimit = Math.min(alimit + bs, intCount);
        while (apos < alimit && bpos < blimit) {
            int aval = infile.getInt(apos);
            int bval = infile.getInt(bpos);
            if (aval <= bval) {
                outfile.putInt(outpos++, aval);
                apos++;
            } else {
                outfile.putInt(outpos++, bval);
                bpos++;
            }
        }
        while (apos < alimit) {
            outfile.putInt(outpos++, infile.getInt(apos++));
        }
        while (bpos < blimit) {
            outfile.putInt(outpos++, infile.getInt(bpos++));
        }
        apos += bs;
        bpos += bs;
    }
}
- "By making the above change, I have changed a 15-second sort on 400KB of data to instead take 2.2 seconds. I expect I can bring it back to less than 0.5 seconds in a bit." – rolfl, Aug 15, 2014 at 21:03
- "Thanks for this, nice to see it's such an easy change to make. One problem I've noticed is if I restrict the buffer size to a small amount, when blockSize > 2*bufferSize, dis.available() returns 7FFFFFFF after the stream runs out, as opposed to 0 or a negative number. What's the reasoning for this?" – Alex, Aug 15, 2014 at 21:42
- "@Alex - good question, and I am not sure." – rolfl, Aug 15, 2014 at 21:44
- "@Alex - I posted a revised version of my results, sorting very fast indeed now." – rolfl, Aug 15, 2014 at 23:32
- "Thanks for this code, lots to learn from it. One problem I'm having is MAPPEDMASK, MAPPEDSIZE etc. give compile errors; what should I import to have access to them (or what are their values)?" – Alex, Aug 16, 2014 at 16:53
You'll be able to make a lot of headway on performance simply by adding BufferedInputStream and BufferedOutputStream to your stream chains.
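For example, each stream chain gains a buffering layer in the middle:

DataInputStream dis1 = new DataInputStream(
        new BufferedInputStream(new FileInputStream(inputFile)));
DataOutputStream dos = new DataOutputStream(
        new BufferedOutputStream(new FileOutputStream(outputFile)));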
You say that you recursively call merge - but I don't see the recursion. Are you just referring to the looping?
- "Sorry, misuse of recursively; I meant repeatedly." – Alex, Aug 16, 2014 at 13:43