I've just finished writing a new compressor called FBC. It uses BWT (from DivSufSort) and a simple arithmetic coder. The arithmetic coder has a fast adaptation rate (1/16th of the error) and uses 14 bits of context, of which 11 bits are history and 3 bits are the position within the current byte. In addition, a transform is applied before the BWT stage to improve the performance of the BWT. The current version is written in VB.net, but I plan on rewriting it in C when I have the time. When I finish the C version, I will also probably release the source code. The 32 bit compile is attached.
Also, just a few notes. I originally wrote my own BWT implementation using both a bucket sort and quick sort, but it was too slow for a public compressor. I also tested using transforms such as SR or MTF between the BWT and arithmetic stages, but they seemed to hurt compression in most cases.
Results. http://mattmahoney.net/dc/text.html#1889
I used a block size of 250 MB. I tried 333333334 and it compressed enwik9 3 MB smaller (185,975,54, but gave an out of memory error half way through decompression, even after a reboot. Test was in 32 bit windows with 3 GB memory.
Thanks for testing it!
On my computer, I couldn't compress with a blocksize of 333333334 even with 4gb of memory. In all likelyhood, it's a problem with DivSufSort's memory allocation. When I write the C version, I'll look into this error in more detail.
I've just finished version 1.1. It fixes the memory errors during decompression and now loads either the 32 bit or 64 bit version of DivSufSort based on the operating system. The algorithm remains unchanged.
I should also note that the transform that is used before the BWT stage is Eugene Shelwein's BWT_reorder_v2. I apply the transform before the BWT, then use the inverse directly afterwards.
The results of the new version on my 64-bit Windows 7 laptop (4gb RAM, 2.33ghz with 4 cores):
Code:FILE BLOCK SIZE COMPRESSED SIZE COMPRESS TIME DECOMPRESS TIME MEMORY enwik8 100000000 22,554,133 41.50 39.74 507843kb enwik9 333333334 185,975,548 451.17 415.08 1647166kb
Thanks. I posted your results. http://mattmahoney.net/dc/text.html#1859
Thanks once again!
I've finished with version 1.2 of FBC. It now has an option to enable the MTF stage and also fixes an error with decompression.
I've just finished coding version 1.3. Version 1.3 now has delta and EXE filters, which can be switched on via compression flags or the auto option. The auto option ('a' flag) attempts to detect the input file type and automatically sets the compression flags for better compression. The auto feature currently supports WAV, BMP and EXE detection. As a note, version 1.3 is not compatible with the other versions, however it should be compatible with all of the versions that follow.
As for the C version, I will begin work on it after more experimentation and when I have the time. I'm still not content with FBC's current performance.
Version 1.3 had an error with the delta filter in rare cases. Please use version 1.3a instead.