  1. 20th July 2016, 12:50 #781
    Programmer Bulat Ziganshin
    Join Date
    Mar 2007
    Location
    Uzbekistan
    Posts
    4,739
    Thanks
    866
    Thanked 789 Times in 424 Posts
    As I already said, caches (as the name suggests) are transparent to assembler (and therefore any higher-level language) programs - they just transparently manage some part of the memory contents. The "volatile" specifier in C/C++ just tells the compiler never to "cache" the variable's data in registers - every read or write should go straight to memory.

    For example, the sequence "X=I; X++" for an ordinary variable may generate the asm commands

    load I to register
    increment register by 1
    store register to X

    while for a volatile X, each read/write of X must be executed explicitly, so the code should be

    load I to register
    store register to X
    load X to register
    increment register by 1
    store register to X
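
    A minimal C sketch of the same sequence, for illustration only. The names plain_x, volatile_x and both function names are made up here, and what a compiler actually emits depends on the compiler and optimization level; the comments only describe what it is allowed to do.

```c
int plain_x;             /* ordinary variable: value may live in a register */
volatile int volatile_x; /* every access must be a real memory access */

/* "X = I; X++" on an ordinary variable.
   The compiler is free to fold this into one store of i + 1. */
int set_and_bump_plain(int i) {
    plain_x = i;
    plain_x++;
    return plain_x;
}

/* The same sequence on a volatile variable.
   The store, the load + store of the increment, and the final load
   for the return value must each be performed as written. */
int set_and_bump_volatile(int i) {
    volatile_x = i;
    volatile_x++;
    return volatile_x;
}
```

    Both functions return the same value; the difference is only visible in the generated code (for example with gcc -O2 -S).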

    Also, writes to volatile vars aren't reordered by the compiler relative to each other, which guarantees that in "queue[128] = c; ptr = 128" (with both queue and ptr volatile), ptr will not be updated before c is saved.

    Of course, this was useless for single-threaded programs, except for device I/O. Later, caches were added, but they are transparent to single-threaded programs except for improving performance. The CPU has privileged management commands that allow the OS to disable caching for parts of the memory space, which is used for device I/O. Later, SSE added non-caching store commands that may be used instead.


    The next part of the story is multi-threading. The simplest approach is to reuse volatile vars, which was made to work on x86 by implementing "full cache coherency", i.e. ensuring that all cores see memory writes in exactly the same order. This requires deep cooperation between the caches (individual to each core) and is therefore quite expensive. The alternative approach, implemented in some non-x86 systems, is to let the programmer explicitly state when the cache state needs to be synchronized between cores - memory fences.

    So, for portability of your program to non-x86 you may use volatile vars for queue+ptr, and issue memory fences in between:
    writer: queue[128] = c; WRITE_FENCE(); ptr = 128
    reader: if (ptr == 128 ) {READ_FENCE(); c = queue[128]...}

    or you can use atomic ptr reads/writes, which include the required memory fences. I'm not sure about the queue array, though. Either the compiler should be smart enough to flush any vars out of registers prior to atomic_write, or you still need "volatile". It's better to ask nemequ, who is definitely an expert in that area.
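
    One possible way to write the writer/reader pair above in C11 is sketched below. It assumes that WRITE_FENCE and READ_FENCE map to atomic_thread_fence with release and acquire ordering; the function names, the queue size, and the use of 0 as the "nothing published" value are made up for the example. Under the C11 memory model the queue itself can then be a plain, non-volatile array.

```c
#include <stdatomic.h>
#include <stdbool.h>

static unsigned char queue[256]; /* plain data; size is hypothetical */
static atomic_int ptr;           /* 0 means nothing has been published */

/* writer: queue[128] = c; WRITE_FENCE(); ptr = 128 */
void writer_publish(unsigned char c) {
    queue[128] = c;
    /* Release fence: the queue write above cannot be reordered
       past the store to ptr below. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&ptr, 128, memory_order_relaxed);
}

/* reader: if (ptr == 128) { READ_FENCE(); c = queue[128]; } */
bool reader_try_get(unsigned char *out) {
    if (atomic_load_explicit(&ptr, memory_order_relaxed) != 128)
        return false;
    /* Acquire fence: the queue read below cannot be reordered
       before the load of ptr above. */
    atomic_thread_fence(memory_order_acquire);
    *out = queue[128];
    return true;
}
```

    In a real program writer_publish and reader_try_get would run in different threads, with the reader polling until it returns true.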

  2. 21st July 2016, 08:13 #782
    Member
    Join Date
    Jan 2014
    Location
    Bothell, Washington, USA
    Posts
    704
    Thanks
    156
    Thanked 186 Times in 109 Posts
    If I understand things correctly volatile is not necessary and an atomic store with memory order release is sufficient to ensure all writes in the current thread are visible to a thread that acquires the same atomic.

    I had forgotten about a board I designed about 10 years ago that used a ColdFire processor where the software engineers couldn't get the peripherals working correctly. When investigating the problem we eventually figured out that we had to explicitly disable the cache at the peripheral addresses to get the code to work right.

    I now have code working with acceptable performance - it's even a little faster than the released code. What I didn't realize when I first tried atomic was that just changing a variable to atomic causes the gcc compiler to use atomic loads and stores with sequentially consistent ordering even when the atomic variables were accessed with regular operations. This is why my initial attempts created executables that were a lot slower. After reading nemequ's posts many times and reading about atomics and memory order, it slowly started making sense. Now my development version of GLZAdecode does not use volatile, instead using three atomic uchars (flags) with release/acquire ordering. Hopefully nemequ will take a look when I release the code and let me know if that part looks okay.
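
    The difference between a plain access to an atomic variable and an explicit release/acquire access can be sketched as follows. This is not the GLZAdecode code; buffer, ready_flag and the function names are hypothetical, and the example uses a single flag rather than three.

```c
#include <stdatomic.h>
#include <stdbool.h>

static unsigned char buffer[4];  /* plain, non-volatile payload */
static atomic_uchar ready_flag;  /* 0 = empty, 1 = data available */

/* A plain assignment to an atomic object is a sequentially
   consistent store, the strongest and usually the slowest ordering. */
void publish_seq_cst(unsigned char v) {
    buffer[0] = v;
    ready_flag = 1; /* same as atomic_store(&ready_flag, 1) */
}

/* An explicit release store only guarantees that the writes before it
   are visible to a thread that acquires the same flag. */
void publish_release(unsigned char v) {
    buffer[0] = v;
    atomic_store_explicit(&ready_flag, 1, memory_order_release);
}

/* The acquire load pairs with either store above. */
bool try_consume(unsigned char *out) {
    if (atomic_load_explicit(&ready_flag, memory_order_acquire) == 0)
        return false;
    *out = buffer[0];
    /* Hand the slot back to the producer. */
    atomic_store_explicit(&ready_flag, 0, memory_order_release);
    return true;
}
```

    On x86 a sequentially consistent store typically needs a locked instruction or a fence while a release store is an ordinary move, which would be consistent with the slowdown described above.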

  3. 23rd July 2016, 03:24 #783
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    1,496
    Thanks
    219
    Thanked 770 Times in 513 Posts
    Quote Originally Posted by Kennon Conrad View Post
    Sportman, this should fix the problem.
    Input:
    .glzf file

    Output:
    71,219,529 bytes

    GLZAdecode crashes with 247,135,749 bytes written to disk so far; last console output:

    ...
    Read 47405254 of 47405254 symbols, start 0.0000
    Common prefix scan 0 - 4892ff, score[0 - 144] = 0.00034 - 0.00000
    1375: 47401692 syms, dict. size 4756088, 16.9815 bits/sym, o0e 100618862 bytes
    Read 47401692 of 47401692 symbols, start 0.0000
    Common prefix scan 0 - 489377, score[0 - 118] = 0.00041 - 0.00000
    1376: 47398508 syms, dict. size 4756188, 16.9827 bits/sym, o0e 100619128 bytes
    Read 47398508 of 47398508 symbols, start 0.0000
    Common prefix scan 0 - 4893db, score[0 - 113] = 0.00019 - 0.00000
    1377: 47394331 syms, dict. size 4756294, 16.9842 bits/sym, o0e 100619413 bytes
    Read 47394331 of 47394331 symbols, start 0.0000
    Common prefix scan 0 - 489445, score[0 - 158] = 0.00024 - 0.00000
    1378: 47391131 syms, dict. size 4756442, 16.9854 bits/sym, o0e 100619821 bytes
    Read 47391131 of 47391131 symbols, start 0.0000
    Common prefix scan 0 - 4894d9, score[0 - 148] = 0.00018 - 0.00000
    1379: 47389018 syms, dict. size 4756585, 16.9863 bits/sym, o0e 100620219 bytes
    Read 47389018 of 47389018 symbols, start 0.0000
    Common prefix scan 0 - 489568, score[0 - 63] = 0.00010 - 0.00000
    1380: 47387516 syms, dict. size 4756648, 16.9868 bits/sym, o0e 100620394 bytes
    Read 47387516 of 47387516 symbols, start 0.0000
    Common prefix scan 0 - 4895a7, score[0 - 119] = 0.00016 - 0.00000
    1381: 47386843 syms, dict. size 4756765, 16.9871 bits/sym, o0e 100620727 bytes
    Read 47386843 of 47386843 symbols, start 0.0000
    Common prefix scan 0 - 48961c, score[0 - 24] = 0.00199 - 0.00000
    1382: 47385806 syms, dict. size 4756789, 16.9875 bits/sym, o0e 100620795 bytes
    Read 47385806 of 47385806 symbols, start 0.0000
    Common prefix scan 0 - 489634, score[0 - 39] = 0.00017 - 0.00000
    1383: 47385590 syms, dict. size 4756828, 16.9876 bits/sym, o0e 100620902 bytes
    Read 47385590 of 47385590 symbols, start 0.0000
    Common prefix scan 0 - 48965b, score[0 - 3] = 0.00002 - 0.00000
    1384: 47385576 syms, dict. size 4756831, 16.9876 bits/sym, o0e 100620910 bytes
    Read 47385576 of 47385576 symbols, start 0.0000
    4756831 grammar productions created in 417590.688 seconds.

    GLZAencode html.txt.glzc html.txt.glze
    cap encoded 1, UTF8 compliant 0
    Read 47385576 symbols including 4756831 definition symbols
    Parsed 23155084 level 0 symbols
    use_mtf 1, mcl 23 mrcl 21
    Encoded 23155084 level 1 symbols
    Reduced 28589 grammar rules
    Compressed file size: 71219529 bytes. 4728243 grammar rules. Grammar size: 47328398 symbols
    Grammar encoding time = 17.609 seconds.

    GLZAdecode html.txt.glze html.txt.glzd
    234358102 (then crash)

  4. 23rd July 2016, 05:45 #784
    Member
    Join Date
    Jan 2014
    Location
    Bothell, Washington, USA
    Posts
    704
    Thanks
    156
    Thanked 186 Times in 109 Posts
    Quote Originally Posted by Sportman View Post
    GLZAdecode html.txt.glze html.txt.glzd
    234358102 (then crash)
    I think the file is probably still causing an overflow in the dictionary. This is completely solved in my working version. Can you try this version of GLZAdecode to confirm that is the problem?

  5. 23rd July 2016, 11:00 #785
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    1,496
    Thanks
    219
    Thanked 770 Times in 513 Posts
    Quote Originally Posted by Kennon Conrad View Post
    I think the file is probably still causing an overflow in the dictionary. This is completely solved in my working version. Can you try this version of GLZAdecode to confirm that is the problem?
    Congratulations you solved it! compare ok:

    GLZAdecode html.txt.glze html.txt.glzd
    Decompressed 1431632189 bytes in 10.937 seconds

    Format timing:

    GLZAformat html.txt html.txt.glzf
    Reading 1431632189 byte file
    Converting textual data
    Wrote 1 byte header and 1488089163 data bytes in 7.781 seconds.

  7. 23rd July 2016, 22:47 #786
    Member
    Join Date
    Jan 2014
    Location
    Bothell, Washington, USA
    Posts
    704
    Thanks
    156
    Thanked 186 Times in 109 Posts
    Quote Originally Posted by Sportman View Post
    Congratulations you solved it! compare ok:

    GLZAdecode html.txt.glze html.txt.glzd
    Decompressed 1431632189 bytes in 10.937 seconds

    Format timing:

    GLZAformat html.txt html.txt.glzf
    Reading 1431632189 byte file
    Converting textual data
    Wrote 1 byte header and 1488089163 data bytes in 7.781 seconds.
    Thank you for the help, Sportman! It's not so easy with files that can't be shared. Here are the results with corrected GLZA:

    Input:
    1,431,632,189 bytes - HTML text

    Output:
    177,899,364 bytes - zstd 22
    172,474,317 bytes - bro 11
    134,856,213 bytes - freearc ultra
    121,396,303 bytes - lzturbo 32
    100,389,150 bytes - bsc 2
    89,429,996 bytes - rar max
    86,881,088 bytes - bce
    79,840,886 bytes - 7z ultra
    71,219,529 bytes - glza

    So compression ratio and decompression time are good but the compression time of almost 5 days is pretty ridiculous. In the long run this can be dramatically improved but will take time to do it properly. For now I think the important part is that using grammar rules instead of byte offsets (LZ77) provides a noticeably better compression ratio on the test file.

  8. 2nd August 2016, 11:15 #787
    Member
    Join Date
    Jan 2014
    Location
    Bothell, Washington, USA
    Posts
    704
    Thanks
    156
    Thanked 186 Times in 109 Posts

    GLZA v0.6

    GLZA v0.6 has the following changes compared to v0.5:

    Bug fixes for >1GB files and large dictionaries (Sportman) and GLZAencode crash (rare)
    Fixed capital lock and mtf code errors that caused a slight penalty in compression ratio.
    Extended delta filter to check for strides of up to 100.
    Tweaked a model.

    From nemequ's list:
    Uses atomic variables instead of volatile variables.
    More functions, fewer macros
    Uses C99 types
    Programs print a (small) help screen when no command line arguments are used
    Fixed exit return values

    GLZAcompress is a little slower. GLZAdecode is a little faster. The change is less than 3% in either case.

    Compression ratios are generally similar, on average slightly better for typical benchmark files. The most dramatic change in my test set is with kennedy.xls from Canterbury Corpus:

    on AMD A8-5500 3.2 GHz:
    v0.5: 1,029,744 -> 146,270 bytes in 4.3 seconds, 0.046 seconds to decode.
    v0.6: 1,029,744 -> 23,099 bytes in 7.5 seconds, 0.015 seconds to decode.


    I assume kennedy.xls is the same file that Charles Bloom calls lzt25. This is what he shows on cb's rants (http://cbloom.com/rants.html):

    Found another weird one where RAR filters do magic; lzt25 is super-structured 13-byte structs :
    lzt25.rar,40024 // <- WOW RAR filters!
    lzt25.nz,45397
    lzt25.7z,51942
    lzt25_lp2.7z,52579
    lzt25.LZNA,58903
    lzt25.zl8.LZNA,61582 // <- zl8 LZNA worse than zl6 - weird file
    lzt25.lzx21,63198
    lzt25.zstd060,64550 // <- ZStd does surprisingly well here, I thought you needed more reps on this file
    lzt25.brotli9,67856
    lzt25.Kraken,67986
    lzt25.brotli10,68472 // <- brotli10 worse than brotli9 !
    lzt25.BitKnit,92940 // <- BitKnit oddly struggling
    lzt25.mc-.rar,106423 // <- unfiltered RAR is the worst of the LZ's
    lzt25.z9.zip,209811
    lzt25.lz4xc4,324125
    lzt25.raw,1029744
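
    For readers unfamiliar with stride-based delta filtering (the change list above mentions checking strides of up to 100), here is a generic sketch. It is not GLZA's implementation; the function names are made up, and how a stride is selected is not shown. Each byte is replaced by its difference from the byte one stride earlier, so data made of fixed-size records with slowly changing fields turns into many small, repetitive values.

```c
#include <stddef.h>
#include <stdint.h>

/* Forward filter: out[i] = in[i] - in[i - stride] (mod 256).
   The first `stride` bytes are copied unchanged.
   stride must be at least 1; in and out must not overlap. */
void delta_encode(const uint8_t *in, uint8_t *out, size_t n, size_t stride) {
    for (size_t i = 0; i < n; i++)
        out[i] = (i < stride) ? in[i] : (uint8_t)(in[i] - in[i - stride]);
}

/* Inverse filter: rebuilds each byte from the already decoded byte
   one stride earlier. */
void delta_decode(const uint8_t *in, uint8_t *out, size_t n, size_t stride) {
    for (size_t i = 0; i < n; i++)
        out[i] = (i < stride) ? in[i] : (uint8_t)(in[i] + out[i - stride]);
}
```

    With a stride of 2, the input 10, 200, 11, 201, 12, 202 encodes to 10, 200, 1, 1, 1, 1.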

  10. 2nd August 2016, 21:29 #788
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    1,496
    Thanks
    219
    Thanked 770 Times in 513 Posts
    Quote Originally Posted by Kennon Conrad View Post
    GLZA v0.6
    Input:
    67,111,000 bytes - chrome_child.dll

    Output:
    25,942,102 bytes - bce
    23,401,099 bytes - zstd 22
    23,396,089 bytes - glza
    22,306,031 bytes - bro 11
    22,124,059 bytes - rar max
    19,641,152 bytes - freearc ultra
    18,716,820 bytes - 7z ultra


    Input:
    52,638,664 bytes - xul.dll

    Output:
    20,519,632 bytes - bce
    19,371,390 bytes - zstd 22
    19,007,845 bytes - glza
    18,418,048 bytes - bro 11
    18,311,943 bytes - rar max
    16,294,045 bytes - freearc ultra
    15,455,141 bytes - 7z ultra


    Input:
    50,708,664 bytes - libcef.dll

    Output:
    19,479,330 bytes - bce
    18,675,207 bytes - zstd 22
    18,198,201 bytes - glza
    17,865,765 bytes - bro 11
    17,212,605 bytes - rar max
    15,413,439 bytes - freearc ultra
    14,534,231 bytes - 7z ultra


    Input:
    25,338,368 bytes - icudt54.dll

    Output:
    9,443,665 bytes - glza
    8,226,926 bytes - bce
    7,362,616 bytes - rar max
    7,329,790 bytes - zstd 22
    6,522,117 bytes - bro 11
    6,108,062 bytes - 7z ultra
    5,962,531 bytes - freearc ultra


    Input:
    17,595,072 bytes - pepflashplayer32_22_0_0_192.dll

    Output:
    8,922,602 bytes - bce
    8,079,585 bytes - glza
    7,990,377 bytes - zstd 22
    7,632,298 bytes - bro 11
    7,532,777 bytes - rar max
    6,939,240 bytes - freearc ultra
    6,898,954 bytes - 7z ultra


    Input:
    14,984,992 bytes - QtWebKit4.dll

    Output:
    4,475,340 bytes - bce
    4,085,710 bytes - glza
    4,044,446 bytes - zstd 22
    3,861,756 bytes - bro 11
    3,555,295 bytes - rar max
    3,195,500 bytes - 7z ultra
    3,233,466 bytes - freearc ultra

  12. 3rd August 2016, 08:51 #789
    Member
    Join Date
    Jan 2014
    Location
    Bothell, Washington, USA
    Posts
    704
    Thanks
    156
    Thanked 186 Times in 109 Posts
    As your results show, GLZA is not very effective on most .dll files compared to LZ77-based algorithms. I think it could do considerably better if it had support for production rules that can be used to implement local transformations and support for BCJ filters. These are not things I consider to be high priority, but they would be nice additions in the long run.

  13. 3rd August 2016, 10:16 #790
    Member
    Join Date
    Nov 2015
    Location
    ?l?nsk, PL
    Posts
    81
    Thanks
    9
    Thanked 13 Times in 11 Posts
    Quote Originally Posted by Kennon Conrad View Post
    As your results show, GLZA is not very effective on most .dll files compared to LZ77-based algorithms. I think it could do considerably better if it had support for production rules that can be used to implement local transformations and support for BCJ filters. These are not things I consider to be high priority, but they would be nice additions in the long run.
    Zstd doesn't have an exe filter, yet it looks very close to GLZA.

    Would adding filters inside the algorithm be significantly better than pre-processing with them? I ask because it doesn't seem right to increase the complexity of the algorithm to handle such special cases.

  14. 3rd August 2016, 20:19 #791
    Member
    Join Date
    Jan 2014
    Location
    Bothell, Washington, USA
    Posts
    704
    Thanks
    156
    Thanked 186 Times in 109 Posts
    Quote Originally Posted by m^3 View Post
    Zstd doesn't have an exe filter, yet it looks very close to GLZA.

    Would adding filters inside the algorithm be significantly better than pre-processing with them? I ask because it doesn't seem right to increase the complexity of the algorithm to handle such special cases.
    It seems that pre-processing would be better and much easier to implement. I could see where some "skip" rules might be better as part of the main algorithm but for things like BCJ and local delta filters they would be best implemented up front.

  15. 4th August 2016, 15:21 #792
    Member
    Join Date
    Dec 2012
    Location
    japan
    Posts
    216
    Thanks
    35
    Thanked 89 Times in 57 Posts
    Can you write it in another language? I am hoping for PHP, JavaScript, or Python.

  16. 5th August 2016, 07:08 #793
    Member
    Join Date
    Jan 2014
    Location
    Bothell, Washington, USA
    Posts
    704
    Thanks
    156
    Thanked 186 Times in 109 Posts
    Quote Originally Posted by xezz View Post
    Can you write it in another language? I am hoping for PHP, JavaScript, or Python.
    Sorry, I don't know any of those programming languages. If anyone else is interested, I'd certainly be willing to answer questions about the C code.

  17. 10th August 2016, 10:24 #794
    Member
    Join Date
    Jan 2014
    Location
    Bothell, Washington, USA
    Posts
    704
    Thanks
    156
    Thanked 186 Times in 109 Posts

    GLZA v0.7

    GLZA v0.7 performance is similar to GLZA v0.6, except that I fixed a problem that caused some symbols that appear 16 times in the production rules not to be considered for MTF encoding. So for files that use MTF, compressed file sizes are now anywhere from 0.1% better to slightly worse.

    The main change is that GLZA is now one program, GLZA. To compress a file, use GLZA c <infile> <outfile>. To decompress a file, use GLZA d <infile> <outfile>. No user options are currently supported. I want to figure out the .dll thing before I add option passing to the subfunctions.

  19. 12th August 2016, 00:57 #795
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    1,496
    Thanks
    219
    Thanked 770 Times in 513 Posts
    Input:
    75,481,052 bytes, text file with 4,343,633 unique domains from Wikipedia.

    Output:
    28,409,263 bytes - bro 11 (0.416 sec. decomp)
    28,387,045 bytes - rar max (0.526 sec. decomp)
    27,367,324 bytes - zstd 22 (0.327 sec. decomp)
    26,266,954 bytes - 7z ultra (1.213 sec. decomp)
    25,856,145 bytes - lzturbo 49 (1.715 sec. decomp)
    22,624,060 bytes - freearc ultra
    22,092,657 bytes - glza (1.688 sec. decomp)
    21,872,106 bytes - bce
    21,282,302 bytes - emma text max all dicts
    20,809,789 bytes - paq8pxd18 15
    20,486,693 bytes - cmix dict

  21. 12th August 2016, 15:37 #796
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    1,496
    Thanks
    219
    Thanked 770 Times in 513 Posts
    Same input file with lzbench:

    Code:
    Compressor name Compress. Decompress. Compr. size Ratio
    xz 5.2.2 -9 1.07 MB/s 63 MB/s 26264780 34.80
    zstd 0.7.1 -22 1.12 MB/s 256 MB/s 26382257 34.95
    csc 3.3 -5 2.21 MB/s 48 MB/s 26382631 34.95
    lzlib 1.7 -9 1.08 MB/s 47 MB/s 26438902 35.03
    lzma 9.38 -5 1.24 MB/s 70 MB/s 26829334 35.54
    tornado 0.6a -16 1.46 MB/s 149 MB/s 26847834 35.57
    csc 3.3 -3 4.79 MB/s 43 MB/s 27138268 35.95
    lzlib 1.7 -6 1.44 MB/s 47 MB/s 27272926 36.13
    xz 5.2.2 -6 1.50 MB/s 63 MB/s 27277528 36.14
    brotli 0.4.0 -11 0.57 MB/s 280 MB/s 27574511 36.53
    zstd 0.7.1 -18 2.27 MB/s 320 MB/s 27661719 36.65
    tornado 0.6a -13 5.12 MB/s 139 MB/s 28645105 37.95
    csc 3.3 -1 10 MB/s 44 MB/s 29091723 38.54
    tornado 0.6a -10 3.55 MB/s 131 MB/s 29239265 38.74
    zstd 0.7.1 -15 6.53 MB/s 304 MB/s 29816308 39.50
    tornado 0.6a -7 11 MB/s 137 MB/s 30014469 39.76
    lzham 1.0 -d26 -1 2.03 MB/s 145 MB/s 30225678 40.04
    lzlib 1.7 -3 4.02 MB/s 42 MB/s 30380336 40.25
    brotli 0.4.0 -8 5.63 MB/s 338 MB/s 30524458 40.44
    zling 2016-01-10 -4 25 MB/s 118 MB/s 30530116 40.45
    zstd 0.7.1 -11 14 MB/s 306 MB/s 30615643 40.56
    zling 2016-01-10 -3 28 MB/s 118 MB/s 30798048 40.80
    zling 2016-01-10 -2 32 MB/s 116 MB/s 30965319 41.02
    zling 2016-01-10 -1 36 MB/s 111 MB/s 31182632 41.31
    xz 5.2.2 -3 3.42 MB/s 54 MB/s 31294612 41.46
    zstd 0.7.1 -8 29 MB/s 315 MB/s 31359720 41.55
    zling 2016-01-10 -0 42 MB/s 114 MB/s 31528082 41.77
    lzma 9.38 -4 7.32 MB/s 56 MB/s 32153474 42.60
    tornado 0.6a -6 32 MB/s 128 MB/s 32372581 42.89
    brotli 0.4.0 -5 17 MB/s 310 MB/s 32698630 43.32
    crush 1.0 -2 0.54 MB/s 207 MB/s 32730360 43.36
    xpack 2016-06-02 -9 7.98 MB/s 573 MB/s 32947155 43.65
    lzma 9.38 -2 11 MB/s 51 MB/s 33081532 43.83
    tornado 0.6a -5 42 MB/s 126 MB/s 33161180 43.93
    xpack 2016-06-02 -6 19 MB/s 569 MB/s 33355010 44.19
    zstd 0.7.1 -5 63 MB/s 307 MB/s 33442944 44.31
    crush 1.0 -1 3.38 MB/s 205 MB/s 33667337 44.60
    zlib 1.2.8 -9 8.68 MB/s 238 MB/s 34020428 45.07
    zstd 0.7.1 -2 134 MB/s 521 MB/s 34224538 45.34
    zlib 1.2.8 -6 19 MB/s 237 MB/s 34236668 45.36
    brotli 0.4.0 -2 74 MB/s 286 MB/s 34323549 45.47
    lzfse 2016-06-19 42 MB/s 478 MB/s 34442323 45.63
    lzlib 1.7 -0 21 MB/s 34 MB/s 34802127 46.11
    lzham 1.0 -d26 -0 7.08 MB/s 129 MB/s 35157859 46.58
    lzma 9.38 -0 16 MB/s 42 MB/s 35252635 46.70
    lz5hc 1.4.1 -15 2.13 MB/s 291 MB/s 35269096 46.73
    tornado 0.6a -4 84 MB/s 171 MB/s 35299464 46.77
    zstd 0.7.1 -1 197 MB/s 668 MB/s 35856840 47.50
    xz 5.2.2 -0 13 MB/s 39 MB/s 35998760 47.69
    tornado 0.6a -3 101 MB/s 162 MB/s 36441868 48.28
    lzsse2 2016-05-14 -17 7.22 MB/s 2387 MB/s 36669486 48.58
    lzsse2 2016-05-14 -12 7.16 MB/s 2386 MB/s 36669486 48.58
    lzsse2 2016-05-14 -6 7.21 MB/s 2388 MB/s 36669486 48.58
    xpack 2016-06-02 -1 92 MB/s 504 MB/s 37202360 49.29
    crush 1.0 -0 36 MB/s 194 MB/s 37228896 49.32
    lzsse4 2016-05-14 -6 9.46 MB/s 2530 MB/s 37511708 49.70
    lzsse4 2016-05-14 -12 9.40 MB/s 2530 MB/s 37511708 49.70
    lzsse4 2016-05-14 -17 9.38 MB/s 2528 MB/s 37511708 49.70
    lzsse8 2016-05-14 -6 8.86 MB/s 2571 MB/s 37513193 49.70
    lzsse8 2016-05-14 -12 8.96 MB/s 2572 MB/s 37513193 49.70
    lzsse8 2016-05-14 -17 8.88 MB/s 2571 MB/s 37513193 49.70
    lz5hc 1.4.1 -12 6.03 MB/s 360 MB/s 37802087 50.08
    brotli 0.4.0 -0 174 MB/s 245 MB/s 38005179 50.35
    zlib 1.2.8 -1 68 MB/s 260 MB/s 38263933 50.69
    ucl_nrv2e 1.03 -9 0.73 MB/s 225 MB/s 38564531 51.09
    ucl_nrv2d 1.03 -9 0.73 MB/s 225 MB/s 38779760 51.38
    ucl_nrv2e 1.03 -6 9.76 MB/s 208 MB/s 39575337 52.43
    ucl_nrv2b 1.03 -9 0.71 MB/s 219 MB/s 39628722 52.50
    ucl_nrv2b 1.03 -6 9.54 MB/s 210 MB/s 39785420 52.71
    ucl_nrv2d 1.03 -6 9.78 MB/s 209 MB/s 39813335 52.75
    density 0.12.5 beta -3 245 MB/s 215 MB/s 39942582 52.92
    lzo1x 2.09 -999 4.68 MB/s 349 MB/s 40012645 53.01
    lz5hc 1.4.1 -9 19 MB/s 422 MB/s 40200534 53.26
    lzo1z 2.09 -999 4.62 MB/s 337 MB/s 40241379 53.31
    lzo1b 2.09 -999 5.44 MB/s 454 MB/s 40242414 53.31
    lzsse8 2016-05-14 -1 13 MB/s 2426 MB/s 40469085 53.61
    lzsse4 2016-05-14 -1 14 MB/s 2371 MB/s 40548546 53.72
    lzo2a 2.09 -999 21 MB/s 322 MB/s 40860926 54.13
    lzo1y 2.09 -999 4.57 MB/s 353 MB/s 40884666 54.17
    lz4hc r131 -12 19 MB/s 2414 MB/s 41061711 54.40
    lz4hc r131 -16 19 MB/s 2329 MB/s 41061711 54.40
    lz4hc r131 -9 24 MB/s 2413 MB/s 41100051 54.45
    lzo1c 2.09 -999 17 MB/s 449 MB/s 41641030 55.17
    lzmat 1.01 15 MB/s 241 MB/s 41894883 55.50
    lz4hc r131 -4 46 MB/s 2264 MB/s 42271114 56.00
    ucl_nrv2e 1.03 -1 31 MB/s 203 MB/s 42383482 56.15
    ucl_nrv2b 1.03 -1 31 MB/s 205 MB/s 42417041 56.20
    ucl_nrv2d 1.03 -1 32 MB/s 202 MB/s 42575130 56.41
    quicklz 1.5.0 -2 140 MB/s 395 MB/s 42798467 56.70
    lzo1f 2.09 -999 14 MB/s 347 MB/s 42880732 56.81
    brieflz 1.1.0 75 MB/s 131 MB/s 43301822 57.37
    density 0.12.5 beta -2 390 MB/s 377 MB/s 43685370 57.88
    gipfeli 2015-11-30 168 MB/s 289 MB/s 43756909 57.97
    yalz77 2015-09-19 -12 23 MB/s 232 MB/s 43978841 58.26
    quicklz 1.5.0 -3 36 MB/s 634 MB/s 44055449 58.37
    lzsse2 2016-05-14 -1 16 MB/s 2018 MB/s 44453573 58.89
    lzrw 15-Jul-1991 -5 99 MB/s 356 MB/s 44681359 59.20
    lzvn 2016-06-19 42 MB/s 676 MB/s 44963128 59.57
    yalz77 2015-09-19 -8 30 MB/s 230 MB/s 45167741 59.84
    lzo1b 2.09 -99 67 MB/s 425 MB/s 45241364 59.94
    lzo1c 2.09 -99 68 MB/s 455 MB/s 45441579 60.20
    lzo1a 2.09 -99 80 MB/s 439 MB/s 45732697 60.59
    lzg 1.0.8 -8 3.65 MB/s 369 MB/s 46047120 61.00
    lzo1 2.09 -99 80 MB/s 384 MB/s 46205348 61.21
    lzo1c 2.09 -9 99 MB/s 460 MB/s 46423044 61.50
    lzo1b 2.09 -9 98 MB/s 441 MB/s 46566869 61.69
    lz5 1.4.1 126 MB/s 287 MB/s 46630732 61.78
    lzo1b 2.09 -6 136 MB/s 467 MB/s 46751918 61.94
    lzo1c 2.09 -6 125 MB/s 476 MB/s 46923883 62.17
    yalz77 2015-09-19 -4 41 MB/s 225 MB/s 47219256 62.56
    tornado 0.6a -2 143 MB/s 236 MB/s 47234262 62.58
    lz5hc 1.4.1 -4 117 MB/s 766 MB/s 47278114 62.64
    lz4hc r131 -1 87 MB/s 2284 MB/s 47322165 62.69
    lzrw 15-Jul-1991 -4 202 MB/s 361 MB/s 48003209 63.60
    density 0.12.5 beta -1 624 MB/s 789 MB/s 48244212 63.92
    quicklz 1.5.0 -1 283 MB/s 399 MB/s 48265909 63.94
    shrinker 0.1 234 MB/s 809 MB/s 48392802 64.11
    lzf 3.6 -1 220 MB/s 484 MB/s 48794557 64.64
    pithy 2011-12-24 -9 146 MB/s 800 MB/s 49682147 65.82
    lzo1b 2.09 -3 129 MB/s 417 MB/s 49704607 65.85
    lzg 1.0.8 -6 11 MB/s 362 MB/s 49954909 66.18
    lzrw 15-Jul-1991 -3 183 MB/s 386 MB/s 50240067 66.56
    lzo1c 2.09 -3 129 MB/s 447 MB/s 50360806 66.72
    fastlz 0.1 -2 213 MB/s 375 MB/s 50455768 66.85
    fastlz 0.1 -1 221 MB/s 379 MB/s 50524699 66.94
    blosclz 2015-11-10 -9 176 MB/s 562 MB/s 50953927 67.51
    pithy 2011-12-24 -6 177 MB/s 834 MB/s 51036884 67.62
    lzo1a 2.09 -1 182 MB/s 419 MB/s 51042307 67.62
    yalz77 2015-09-19 -1 60 MB/s 229 MB/s 51386478 68.08
    lzo1b 2.09 -1 132 MB/s 406 MB/s 51581314 68.34
    yappy 2014-03-22 -100 63 MB/s 2076 MB/s 51800889 68.63
    lzo1 2.09 -1 190 MB/s 362 MB/s 51865841 68.71
    lzo1c 2.09 -1 134 MB/s 452 MB/s 51954061 68.83
    yappy 2014-03-22 -10 78 MB/s 2059 MB/s 52313910 69.31
    lzo1x 2.09 -1 314 MB/s 420 MB/s 52537407 69.60
    lzo1f 2.09 -1 126 MB/s 393 MB/s 52735666 69.87
    lzrw 15-Jul-1991 -2 179 MB/s 384 MB/s 52777872 69.92
    lzrw 15-Jul-1991 -1 174 MB/s 380 MB/s 52779016 69.92
    yappy 2014-03-22 -1 92 MB/s 1959 MB/s 52980876 70.19
    lzg 1.0.8 -4 31 MB/s 381 MB/s 52999546 70.22
    lzo1y 2.09 -1 315 MB/s 430 MB/s 53441792 70.80
    lzo1x 2.09 -15 342 MB/s 429 MB/s 53545578 70.94
    lzf 3.6 -0 196 MB/s 445 MB/s 53607090 71.02
    pithy 2011-12-24 -3 277 MB/s 1036 MB/s 54288680 71.92
    snappy 1.1.3 261 MB/s 936 MB/s 54538734 72.25
    lzo1x 2.09 -12 371 MB/s 453 MB/s 55077565 72.97
    lzg 1.0.8 -1 59 MB/s 426 MB/s 55593761 73.65
    lz4 r131 442 MB/s 2284 MB/s 56143775 74.38
    pithy 2011-12-24 -0 336 MB/s 1208 MB/s 56786078 75.23
    lzo1x 2.09 -11 403 MB/s 511 MB/s 57152425 75.72
    wflz 2015-09-16 123 MB/s 702 MB/s 58464473 77.46
    tornado 0.6a -1 201 MB/s 308 MB/s 58877103 78.00
    lzjb 2010 206 MB/s 413 MB/s 59912482 79.37
    lz4fast r131 -3 574 MB/s 2211 MB/s 61066462 80.90
    lz5hc 1.4.1 -1 449 MB/s 1597 MB/s 63586753 84.24
    lz4fast r131 -17 1506 MB/s 4960 MB/s 73101711 96.85
    blosclz 2015-11-10 -6 196 MB/s 6493 MB/s 75481052 100.00
    blosclz 2015-11-10 -3 527 MB/s 6482 MB/s 75481052 100.00
    blosclz 2015-11-10 -1 985 MB/s 6494 MB/s 75481052 100.00

  23. 12th August 2016, 21:17 #797
    Member
    Join Date
    Jan 2014
    Location
    Bothell, Washington, USA
    Posts
    704
    Thanks
    156
    Thanked 186 Times in 109 Posts
    Do you know why the brotli -11 result is different?

    I suppose glza obtains better compression ratios than other LZ compressors on this file because it is better at handling frequently recurring strings with unstable match distances.

  24. 13th August 2016, 06:44 #798
    Member SolidComp
    Join Date
    Jun 2015
    Location
    USA
    Posts
    485
    Thanks
    152
    Thanked 70 Times in 50 Posts
    Quote Originally Posted by Sportman View Post
    Same input file with lzbench:

    Code:
    Compressor name Compress. Decompress. Compr. size Ratio
    xz 5.2.2 -9 1.07 MB/s 63 MB/s 26264780 34.80
    zstd 0.7.1 -22 1.12 MB/s 256 MB/s 26382257 34.95
    csc 3.3 -5 2.21 MB/s 48 MB/s 26382631 34.95
    lzlib 1.7 -9 1.08 MB/s 47 MB/s 26438902 35.03
    lzma 9.38 -5 1.24 MB/s 70 MB/s 26829334 35.54
    tornado 0.6a -16 1.46 MB/s 149 MB/s 26847834 35.57
    csc 3.3 -3 4.79 MB/s 43 MB/s 27138268 35.95
    lzlib 1.7 -6 1.44 MB/s 47 MB/s 27272926 36.13
    xz 5.2.2 -6 1.50 MB/s 63 MB/s 27277528 36.14
    brotli 0.4.0 -11 0.57 MB/s 280 MB/s 27574511 36.53
    zstd 0.7.1 -18 2.27 MB/s 320 MB/s 27661719 36.65
    tornado 0.6a -13 5.12 MB/s 139 MB/s 28645105 37.95
    csc 3.3 -1 10 MB/s 44 MB/s 29091723 38.54
    tornado 0.6a -10 3.55 MB/s 131 MB/s 29239265 38.74
    zstd 0.7.1 -15 6.53 MB/s 304 MB/s 29816308 39.50
    tornado 0.6a -7 11 MB/s 137 MB/s 30014469 39.76
    lzham 1.0 -d26 -1 2.03 MB/s 145 MB/s 30225678 40.04
    lzlib 1.7 -3 4.02 MB/s 42 MB/s 30380336 40.25
    brotli 0.4.0 -8 5.63 MB/s 338 MB/s 30524458 40.44
    zling 2016-01-10 -4 25 MB/s 118 MB/s 30530116 40.45
    zstd 0.7.1 -11 14 MB/s 306 MB/s 30615643 40.56
    zling 2016-01-10 -3 28 MB/s 118 MB/s 30798048 40.80
    zling 2016-01-10 -2 32 MB/s 116 MB/s 30965319 41.02
    zling 2016-01-10 -1 36 MB/s 111 MB/s 31182632 41.31
    xz 5.2.2 -3 3.42 MB/s 54 MB/s 31294612 41.46
    zstd 0.7.1 -8 29 MB/s 315 MB/s 31359720 41.55
    zling 2016-01-10 -0 42 MB/s 114 MB/s 31528082 41.77
    lzma 9.38 -4 7.32 MB/s 56 MB/s 32153474 42.60
    tornado 0.6a -6 32 MB/s 128 MB/s 32372581 42.89
    brotli 0.4.0 -5 17 MB/s 310 MB/s 32698630 43.32
    crush 1.0 -2 0.54 MB/s 207 MB/s 32730360 43.36
    xpack 2016-06-02 -9 7.98 MB/s 573 MB/s 32947155 43.65
    lzma 9.38 -2 11 MB/s 51 MB/s 33081532 43.83
    tornado 0.6a -5 42 MB/s 126 MB/s 33161180 43.93
    xpack 2016-06-02 -6 19 MB/s 569 MB/s 33355010 44.19
    zstd 0.7.1 -5 63 MB/s 307 MB/s 33442944 44.31
    crush 1.0 -1 3.38 MB/s 205 MB/s 33667337 44.60
    zlib 1.2.8 -9 8.68 MB/s 238 MB/s 34020428 45.07
    zstd 0.7.1 -2 134 MB/s 521 MB/s 34224538 45.34
    zlib 1.2.8 -6 19 MB/s 237 MB/s 34236668 45.36
    brotli 0.4.0 -2 74 MB/s 286 MB/s 34323549 45.47
    lzfse 2016-06-19 42 MB/s 478 MB/s 34442323 45.63
    lzlib 1.7 -0 21 MB/s 34 MB/s 34802127 46.11
    lzham 1.0 -d26 -0 7.08 MB/s 129 MB/s 35157859 46.58
    lzma 9.38 -0 16 MB/s 42 MB/s 35252635 46.70
    lz5hc 1.4.1 -15 2.13 MB/s 291 MB/s 35269096 46.73
    tornado 0.6a -4 84 MB/s 171 MB/s 35299464 46.77
    zstd 0.7.1 -1 197 MB/s 668 MB/s 35856840 47.50
    xz 5.2.2 -0 13 MB/s 39 MB/s 35998760 47.69
    tornado 0.6a -3 101 MB/s 162 MB/s 36441868 48.28
    lzsse2 2016-05-14 -17 7.22 MB/s 2387 MB/s 36669486 48.58
    lzsse2 2016-05-14 -12 7.16 MB/s 2386 MB/s 36669486 48.58
    lzsse2 2016-05-14 -6 7.21 MB/s 2388 MB/s 36669486 48.58
    xpack 2016-06-02 -1 92 MB/s 504 MB/s 37202360 49.29
    crush 1.0 -0 36 MB/s 194 MB/s 37228896 49.32
    lzsse4 2016-05-14 -6 9.46 MB/s 2530 MB/s 37511708 49.70
    lzsse4 2016-05-14 -12 9.40 MB/s 2530 MB/s 37511708 49.70
    lzsse4 2016-05-14 -17 9.38 MB/s 2528 MB/s 37511708 49.70
    lzsse8 2016-05-14 -6 8.86 MB/s 2571 MB/s 37513193 49.70
    lzsse8 2016-05-14 -12 8.96 MB/s 2572 MB/s 37513193 49.70
    lzsse8 2016-05-14 -17 8.88 MB/s 2571 MB/s 37513193 49.70
    lz5hc 1.4.1 -12 6.03 MB/s 360 MB/s 37802087 50.08
    brotli 0.4.0 -0 174 MB/s 245 MB/s 38005179 50.35
    zlib 1.2.8 -1 68 MB/s 260 MB/s 38263933 50.69
    ucl_nrv2e 1.03 -9 0.73 MB/s 225 MB/s 38564531 51.09
    ucl_nrv2d 1.03 -9 0.73 MB/s 225 MB/s 38779760 51.38
    ucl_nrv2e 1.03 -6 9.76 MB/s 208 MB/s 39575337 52.43
    ucl_nrv2b 1.03 -9 0.71 MB/s 219 MB/s 39628722 52.50
    ucl_nrv2b 1.03 -6 9.54 MB/s 210 MB/s 39785420 52.71
    ucl_nrv2d 1.03 -6 9.78 MB/s 209 MB/s 39813335 52.75
    density 0.12.5 beta -3 245 MB/s 215 MB/s 39942582 52.92
    lzo1x 2.09 -999 4.68 MB/s 349 MB/s 40012645 53.01
    lz5hc 1.4.1 -9 19 MB/s 422 MB/s 40200534 53.26
    lzo1z 2.09 -999 4.62 MB/s 337 MB/s 40241379 53.31
    lzo1b 2.09 -999 5.44 MB/s 454 MB/s 40242414 53.31
    lzsse8 2016-05-14 -1 13 MB/s 2426 MB/s 40469085 53.61
    lzsse4 2016-05-14 -1 14 MB/s 2371 MB/s 40548546 53.72
    lzo2a 2.09 -999 21 MB/s 322 MB/s 40860926 54.13
    lzo1y 2.09 -999 4.57 MB/s 353 MB/s 40884666 54.17
    lz4hc r131 -12 19 MB/s 2414 MB/s 41061711 54.40
    lz4hc r131 -16 19 MB/s 2329 MB/s 41061711 54.40
    lz4hc r131 -9 24 MB/s 2413 MB/s 41100051 54.45
    lzo1c 2.09 -999 17 MB/s 449 MB/s 41641030 55.17
    lzmat 1.01 15 MB/s 241 MB/s 41894883 55.50
    lz4hc r131 -4 46 MB/s 2264 MB/s 42271114 56.00
    ucl_nrv2e 1.03 -1 31 MB/s 203 MB/s 42383482 56.15
    ucl_nrv2b 1.03 -1 31 MB/s 205 MB/s 42417041 56.20
    ucl_nrv2d 1.03 -1 32 MB/s 202 MB/s 42575130 56.41
    quicklz 1.5.0 -2 140 MB/s 395 MB/s 42798467 56.70
    lzo1f 2.09 -999 14 MB/s 347 MB/s 42880732 56.81
    brieflz 1.1.0 75 MB/s 131 MB/s 43301822 57.37
    density 0.12.5 beta -2 390 MB/s 377 MB/s 43685370 57.88
    gipfeli 2015-11-30 168 MB/s 289 MB/s 43756909 57.97
    yalz77 2015-09-19 -12 23 MB/s 232 MB/s 43978841 58.26
    quicklz 1.5.0 -3 36 MB/s 634 MB/s 44055449 58.37
    lzsse2 2016-05-14 -1 16 MB/s 2018 MB/s 44453573 58.89
    lzrw 15-Jul-1991 -5 99 MB/s 356 MB/s 44681359 59.20
    lzvn 2016-06-19 42 MB/s 676 MB/s 44963128 59.57
    yalz77 2015-09-19 -8 30 MB/s 230 MB/s 45167741 59.84
    lzo1b 2.09 -99 67 MB/s 425 MB/s 45241364 59.94
    lzo1c 2.09 -99 68 MB/s 455 MB/s 45441579 60.20
    lzo1a 2.09 -99 80 MB/s 439 MB/s 45732697 60.59
    lzg 1.0.8 -8 3.65 MB/s 369 MB/s 46047120 61.00
    lzo1 2.09 -99 80 MB/s 384 MB/s 46205348 61.21
    lzo1c 2.09 -9 99 MB/s 460 MB/s 46423044 61.50
    lzo1b 2.09 -9 98 MB/s 441 MB/s 46566869 61.69
    lz5 1.4.1 126 MB/s 287 MB/s 46630732 61.78
    lzo1b 2.09 -6 136 MB/s 467 MB/s 46751918 61.94
    lzo1c 2.09 -6 125 MB/s 476 MB/s 46923883 62.17
    yalz77 2015-09-19 -4 41 MB/s 225 MB/s 47219256 62.56
    tornado 0.6a -2 143 MB/s 236 MB/s 47234262 62.58
    lz5hc 1.4.1 -4 117 MB/s 766 MB/s 47278114 62.64
    lz4hc r131 -1 87 MB/s 2284 MB/s 47322165 62.69
    lzrw 15-Jul-1991 -4 202 MB/s 361 MB/s 48003209 63.60
    density 0.12.5 beta -1 624 MB/s 789 MB/s 48244212 63.92
    quicklz 1.5.0 -1 283 MB/s 399 MB/s 48265909 63.94
    shrinker 0.1 234 MB/s 809 MB/s 48392802 64.11
    lzf 3.6 -1 220 MB/s 484 MB/s 48794557 64.64
    pithy 2011-12-24 -9 146 MB/s 800 MB/s 49682147 65.82
    lzo1b 2.09 -3 129 MB/s 417 MB/s 49704607 65.85
    lzg 1.0.8 -6 11 MB/s 362 MB/s 49954909 66.18
    lzrw 15-Jul-1991 -3 183 MB/s 386 MB/s 50240067 66.56
    lzo1c 2.09 -3 129 MB/s 447 MB/s 50360806 66.72
    fastlz 0.1 -2 213 MB/s 375 MB/s 50455768 66.85
    fastlz 0.1 -1 221 MB/s 379 MB/s 50524699 66.94
    blosclz 2015-11-10 -9 176 MB/s 562 MB/s 50953927 67.51
    pithy 2011-12-24 -6 177 MB/s 834 MB/s 51036884 67.62
    lzo1a 2.09 -1 182 MB/s 419 MB/s 51042307 67.62
    yalz77 2015-09-19 -1 60 MB/s 229 MB/s 51386478 68.08
    lzo1b 2.09 -1 132 MB/s 406 MB/s 51581314 68.34
    yappy 2014-03-22 -100 63 MB/s 2076 MB/s 51800889 68.63
    lzo1 2.09 -1 190 MB/s 362 MB/s 51865841 68.71
    lzo1c 2.09 -1 134 MB/s 452 MB/s 51954061 68.83
    yappy 2014-03-22 -10 78 MB/s 2059 MB/s 52313910 69.31
    lzo1x 2.09 -1 314 MB/s 420 MB/s 52537407 69.60
    lzo1f 2.09 -1 126 MB/s 393 MB/s 52735666 69.87
    lzrw 15-Jul-1991 -2 179 MB/s 384 MB/s 52777872 69.92
    lzrw 15-Jul-1991 -1 174 MB/s 380 MB/s 52779016 69.92
    yappy 2014-03-22 -1 92 MB/s 1959 MB/s 52980876 70.19
    lzg 1.0.8 -4 31 MB/s 381 MB/s 52999546 70.22
    lzo1y 2.09 -1 315 MB/s 430 MB/s 53441792 70.80
    lzo1x 2.09 -15 342 MB/s 429 MB/s 53545578 70.94
    lzf 3.6 -0 196 MB/s 445 MB/s 53607090 71.02
    pithy 2011-12-24 -3 277 MB/s 1036 MB/s 54288680 71.92
    snappy 1.1.3 261 MB/s 936 MB/s 54538734 72.25
    lzo1x 2.09 -12 371 MB/s 453 MB/s 55077565 72.97
    lzg 1.0.8 -1 59 MB/s 426 MB/s 55593761 73.65
    lz4 r131 442 MB/s 2284 MB/s 56143775 74.38
    pithy 2011-12-24 -0 336 MB/s 1208 MB/s 56786078 75.23
    lzo1x 2.09 -11 403 MB/s 511 MB/s 57152425 75.72
    wflz 2015-09-16 123 MB/s 702 MB/s 58464473 77.46
    tornado 0.6a -1 201 MB/s 308 MB/s 58877103 78.00
    lzjb 2010 206 MB/s 413 MB/s 59912482 79.37
    lz4fast r131 -3 574 MB/s 2211 MB/s 61066462 80.90
    lz5hc 1.4.1 -1 449 MB/s 1597 MB/s 63586753 84.24
    lz4fast r131 -17 1506 MB/s 4960 MB/s 73101711 96.85
    blosclz 2015-11-10 -6 196 MB/s 6493 MB/s 75481052 100.00
    blosclz 2015-11-10 -3 527 MB/s 6482 MB/s 75481052 100.00
    blosclz 2015-11-10 -1 985 MB/s 6494 MB/s 75481052 100.00
    Was this supposed to include glza? I don't see it.

    Also, what are your system specs? What CPU instructions are you enabling in the compilation?

  25. 13th August 2016, 13:44 #799
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    1,496
    Thanks
    219
    Thanked 770 Times in 513 Posts
    Quote Originally Posted by Kennon Conrad View Post
    Do you know why the brotli -11 result is different?
    Yes, I used older Brotli and Zstd versions; the latest Brotli and Zstd releases are missing Windows builds.

  26. Thanks:

    Kennon Conrad (13th August 2016)

  27. 13th August 2016, 13:52 #800
    Member
    Join Date
    Aug 2008
    Location
    Planet Earth
    Posts
    1,496
    Thanks
    219
    Thanked 770 Times in 513 Posts
    Quote Originally Posted by SolidComp View Post
    Was this supposed to include glza? I don't see it.
    Also, what are your system specs? What CPU instructions are you enabling in the compilation?
    No, it was for a size comparison against my earlier, manually run tests.

    System specs are listed as No. 79 here: http://www.mattmahoney.net/dc/text.html

    I used the lzbench v1.2 (Jun 23) build from here: https://github.com/inikep/lzbench/releases

  28. 13th August 2016, 14:52 #801
    Member
    Join Date
    Jun 2015
    Location
    Switzerland
    Posts
    1,059
    Thanks
    279
    Thanked 394 Times in 244 Posts
    Try brotli with the 16 MB window if the others are not limited to 4 MB either.

  29. 14th August 2016, 23:44 #802
    Member
    Join Date
    Jan 2014
    Location
    Bothell, Washington, USA
    Posts
    704
    Thanks
    156
    Thanked 186 Times in 109 Posts

    GLZA v0.7.1

    Compression ratios are unchanged from GLZA v0.7 and speed should be about the same. The code is structured differently, so that it is easy to integrate into other programs like lzbench.

    Test results (top 25) for my prototype version of lzbench on the Canterbury Corpus and the Large Canterbury Corpus:

    Code:
    Compressor name Compress. Decompress. Compr. size Ratio Filename
    glza 0.7.1 0.19 MB/s 40 MB/s 40750 26.79 alice29.txt
    brotli 0.4.0 -11 0.84 MB/s 369 MB/s 46521 30.59 alice29.txt
    lzlib 1.7 -6 4.40 MB/s 58 MB/s 48333 31.78 alice29.txt
    xz 5.2.2 -9 4.04 MB/s 69 MB/s 48448 31.86 alice29.txt
    xz 5.2.2 -6 4.37 MB/s 71 MB/s 48448 31.86 alice29.txt
    lzlib 1.7 -9 4.12 MB/s 59 MB/s 48451 31.86 alice29.txt
    lzma 9.38 -5 3.63 MB/s 81 MB/s 48458 31.86 alice29.txt
    csc 3.3 -5 7.61 MB/s 59 MB/s 49019 32.23 alice29.txt
    zstd 0.8.0 -15 7.50 MB/s 700 MB/s 49701 32.68 alice29.txt
    csc 3.3 -3 8.89 MB/s 59 MB/s 49782 32.73 alice29.txt
    zstd 0.8.0 -22 6.41 MB/s 678 MB/s 49846 32.77 alice29.txt
    zstd 0.8.0 -18 6.44 MB/s 678 MB/s 49846 32.77 alice29.txt
    brotli 0.4.0 -8 15 MB/s 459 MB/s 51190 33.66 alice29.txt
    lzham 1.0 -d26 -1 3.79 MB/s 183 MB/s 51593 33.92 alice29.txt
    lzlib 1.7 -3 7.98 MB/s 55 MB/s 51698 33.99 alice29.txt
    zling 2016-01-10 -4 25 MB/s 79 MB/s 51699 33.99 alice29.txt
    zstd 0.8.0 -11 17 MB/s 707 MB/s 51869 34.10 alice29.txt
    zling 2016-01-10 -3 26 MB/s 79 MB/s 51916 34.14 alice29.txt
    tornado 0.6a -16 6.69 MB/s 188 MB/s 52009 34.20 alice29.txt
    xz 5.2.2 -3 11 MB/s 68 MB/s 52085 34.25 alice29.txt
    xpack 2016-06-02 -9 18 MB/s 1020 MB/s 52221 34.34 alice29.txt
    zling 2016-01-10 -2 28 MB/s 79 MB/s 52362 34.43 alice29.txt
    csc 3.3 -1 22 MB/s 55 MB/s 52492 34.51 alice29.txt
    zstd 0.8.0 -8 33 MB/s 691 MB/s 52593 34.58 alice29.txt
    brotli 0.4.0 -5 33 MB/s 444 MB/s 52807 34.72 alice29.txt
    Code:
    Compressor name Compress. Decompress. Compr. size Ratio Filename
    glza 0.7.1 0.18 MB/s 42 MB/s 38265 30.57 asyoulik.txt
    brotli 0.4.0 -11 0.89 MB/s 318 MB/s 42749 34.15 asyoulik.txt
    lzma 9.38 -5 3.66 MB/s 71 MB/s 44485 35.54 asyoulik.txt
    lzlib 1.7 -9 4.40 MB/s 52 MB/s 44489 35.54 asyoulik.txt
    xz 5.2.2 -6 4.49 MB/s 63 MB/s 44489 35.54 asyoulik.txt
    xz 5.2.2 -9 4.10 MB/s 61 MB/s 44489 35.54 asyoulik.txt
    lzlib 1.7 -6 4.70 MB/s 52 MB/s 44519 35.56 asyoulik.txt
    csc 3.3 -5 8.23 MB/s 51 MB/s 45769 36.56 asyoulik.txt
    zstd 0.8.0 -22 7.02 MB/s 629 MB/s 45816 36.60 asyoulik.txt
    zstd 0.8.0 -18 7.03 MB/s 629 MB/s 45816 36.60 asyoulik.txt
    zstd 0.8.0 -15 7.55 MB/s 625 MB/s 45975 36.73 asyoulik.txt
    csc 3.3 -3 9.15 MB/s 51 MB/s 46093 36.82 asyoulik.txt
    lzlib 1.7 -3 7.16 MB/s 50 MB/s 46288 36.98 asyoulik.txt
    zling 2016-01-10 -4 22 MB/s 68 MB/s 46515 37.16 asyoulik.txt
    zling 2016-01-10 -3 23 MB/s 68 MB/s 46534 37.17 asyoulik.txt
    brotli 0.4.0 -8 15 MB/s 406 MB/s 46723 37.32 asyoulik.txt
    zling 2016-01-10 -1 25 MB/s 68 MB/s 47031 37.57 asyoulik.txt
    zstd 0.8.0 -11 15 MB/s 651 MB/s 47238 37.74 asyoulik.txt
    zling 2016-01-10 -2 24 MB/s 68 MB/s 47261 37.75 asyoulik.txt
    lzham 1.0 -d26 -1 4.00 MB/s 162 MB/s 47326 37.81 asyoulik.txt
    zling 2016-01-10 -0 25 MB/s 67 MB/s 47351 37.83 asyoulik.txt
    xz 5.2.2 -3 10 MB/s 60 MB/s 47437 37.90 asyoulik.txt
    brotli 0.4.0 -5 31 MB/s 396 MB/s 47585 38.01 asyoulik.txt
    zstd 0.8.0 -8 31 MB/s 632 MB/s 47634 38.05 asyoulik.txt
    xpack 2016-06-02 -9 22 MB/s 913 MB/s 47801 38.19 asyoulik.txt
    csc 3.3 -1 20 MB/s 49 MB/s 47996 38.34 asyoulik.txt
    Code:
    Compressor name Compress. Decompress. Compr. size Ratio Filename
    brotli 0.4.0 -11 1.03 MB/s 315 MB/s 6893 28.02 cp.html
    glza 0.7.1 0.79 MB/s 23 MB/s 6966 28.31 cp.html
    xz 5.2.2 -6 3.89 MB/s 61 MB/s 7600 30.89 cp.html
    xz 5.2.2 -9 3.06 MB/s 53 MB/s 7600 30.89 cp.html
    lzlib 1.7 -9 4.64 MB/s 53 MB/s 7610 30.93 cp.html
    lzma 9.38 -5 2.01 MB/s 72 MB/s 7624 30.99 cp.html
    lzlib 1.7 -6 6.03 MB/s 53 MB/s 7649 31.09 cp.html
    brotli 0.4.0 -8 8.37 MB/s 431 MB/s 7675 31.20 cp.html
    brotli 0.4.0 -5 35 MB/s 431 MB/s 7759 31.54 cp.html
    zstd 0.8.0 -22 7.53 MB/s 723 MB/s 7761 31.54 cp.html
    zstd 0.8.0 -18 7.53 MB/s 723 MB/s 7761 31.54 cp.html
    lzlib 1.7 -0 8.12 MB/s 52 MB/s 7789 31.66 cp.html
    zstd 0.8.0 -15 12 MB/s 723 MB/s 7837 31.85 cp.html
    zstd 0.8.0 -11 54 MB/s 793 MB/s 7899 32.11 cp.html
    xz 5.2.2 -3 9.22 MB/s 61 MB/s 7924 32.21 cp.html
    zlib 1.2.8 -9 46 MB/s 384 MB/s 7940 32.27 cp.html
    xpack 2016-06-02 -9 68 MB/s 946 MB/s 7954 32.33 cp.html
    zlib 1.2.8 -6 53 MB/s 384 MB/s 7961 32.36 cp.html
    zstd 0.8.0 -8 79 MB/s 768 MB/s 7965 32.37 cp.html
    lzma 9.38 -4 8.45 MB/s 70 MB/s 7990 32.48 cp.html
    lzma 9.38 -2 22 MB/s 70 MB/s 7991 32.48 cp.html
    xpack 2016-06-02 -6 88 MB/s 946 MB/s 7992 32.48 cp.html
    lzlib 1.7 -3 11 MB/s 51 MB/s 8017 32.59 cp.html
    zstd 0.8.0 -5 132 MB/s 745 MB/s 8085 32.86 cp.html
    lzma 9.38 -0 26 MB/s 68 MB/s 8102 32.93 cp.html
    Code:
    Compressor name Compress. Decompress. Compr. size Ratio Filename
    brotli 0.4.0 -11 0.95 MB/s 484 MB/s 2717 24.37 fields.c
    glza 0.7.1 0.03 MB/s 17 MB/s 2812 25.22 fields.c
    lzma 9.38 -5 1.09 MB/s 87 MB/s 2967 26.61 fields.c
    brotli 0.4.0 -8 7.15 MB/s 586 MB/s 2968 26.62 fields.c
    xz 5.2.2 -6 3.72 MB/s 70 MB/s 2981 26.74 fields.c
    xz 5.2.2 -9 2.82 MB/s 51 MB/s 2981 26.74 fields.c
    lzlib 1.7 -9 4.68 MB/s 63 MB/s 2988 26.80 fields.c
    lzlib 1.7 -6 5.80 MB/s 63 MB/s 3000 26.91 fields.c
    brotli 0.4.0 -5 27 MB/s 586 MB/s 3012 27.01 fields.c
    zstd 0.8.0 -18 7.82 MB/s 743 MB/s 3026 27.14 fields.c
    zstd 0.8.0 -15 8.83 MB/s 743 MB/s 3026 27.14 fields.c
    zstd 0.8.0 -22 7.82 MB/s 743 MB/s 3026 27.14 fields.c
    lzlib 1.7 -0 7.66 MB/s 62 MB/s 3033 27.20 fields.c
    zstd 0.8.0 -11 16 MB/s 743 MB/s 3087 27.69 fields.c
    xz 5.2.2 -3 7.75 MB/s 70 MB/s 3111 27.90 fields.c
    zlib 1.2.8 -9 47 MB/s 484 MB/s 3115 27.94 fields.c
    zstd 0.8.0 -8 62 MB/s 796 MB/s 3122 28.00 fields.c
    zlib 1.2.8 -6 56 MB/s 484 MB/s 3122 28.00 fields.c
    lzma 9.38 -4 4.79 MB/s 83 MB/s 3126 28.04 fields.c
    lzma 9.38 -0 27 MB/s 83 MB/s 3126 28.04 fields.c
    lzma 9.38 -2 19 MB/s 83 MB/s 3126 28.04 fields.c
    xpack 2016-06-02 -9 74 MB/s 929 MB/s 3126 28.04 fields.c
    xpack 2016-06-02 -6 90 MB/s 929 MB/s 3149 28.24 fields.c
    lzlib 1.7 -3 11 MB/s 60 MB/s 3162 28.36 fields.c
    zstd 0.8.0 -5 107 MB/s 743 MB/s 3163 28.37 fields.c
    Code:
    Compressor name Compress. Decompress. Compr. size Ratio Filename
    brotli 0.4.0 -11 0.85 MB/s 372 MB/s 1126 30.26 grammar.lsp
    glza 0.7.1 0.01 MB/s 9.05 MB/s 1175 31.58 grammar.lsp
    brotli 0.4.0 -8 4.50 MB/s 477 MB/s 1185 31.85 grammar.lsp
    brotli 0.4.0 -5 14 MB/s 472 MB/s 1189 31.95 grammar.lsp
    zstd 0.8.0 -15 10 MB/s 542 MB/s 1221 32.81 grammar.lsp
    zstd 0.8.0 -22 9.47 MB/s 542 MB/s 1222 32.84 grammar.lsp
    zstd 0.8.0 -18 9.47 MB/s 542 MB/s 1222 32.84 grammar.lsp
    zlib 1.2.8 -9 66 MB/s 471 MB/s 1222 32.84 grammar.lsp
    zlib 1.2.8 -6 70 MB/s 474 MB/s 1222 32.84 grammar.lsp
    lzma 9.38 -5 0.41 MB/s 95 MB/s 1234 33.16 grammar.lsp
    zstd 0.8.0 -11 18 MB/s 536 MB/s 1239 33.30 grammar.lsp
    zstd 0.8.0 -8 86 MB/s 579 MB/s 1243 33.40 grammar.lsp
    zstd 0.8.0 -5 143 MB/s 568 MB/s 1246 33.49 grammar.lsp
    xz 5.2.2 -6 2.84 MB/s 67 MB/s 1247 33.51 grammar.lsp
    xz 5.2.2 -9 1.90 MB/s 32 MB/s 1247 33.51 grammar.lsp
    lzlib 1.7 -9 4.86 MB/s 64 MB/s 1259 33.83 grammar.lsp
    lzlib 1.7 -6 5.85 MB/s 64 MB/s 1260 33.86 grammar.lsp
    xpack 2016-06-02 -9 84 MB/s 766 MB/s 1263 33.94 grammar.lsp
    xpack 2016-06-02 -6 90 MB/s 767 MB/s 1263 33.94 grammar.lsp
    lzlib 1.7 -0 6.97 MB/s 63 MB/s 1265 34.00 grammar.lsp
    lzma 9.38 -4 1.80 MB/s 90 MB/s 1271 34.16 grammar.lsp
    lzma 9.38 -2 11 MB/s 90 MB/s 1271 34.16 grammar.lsp
    lzma 9.38 -0 21 MB/s 90 MB/s 1271 34.16 grammar.lsp
    xz 5.2.2 -3 4.68 MB/s 70 MB/s 1284 34.51 grammar.lsp
    lzlib 1.7 -3 9.59 MB/s 63 MB/s 1289 34.64 grammar.lsp
    Code:
    Compressor name Compress. Decompress. Compr. size Ratio Filename
    glza 0.7.1 0.47 MB/s 348 MB/s 23099 2.24 kennedy.xls
    lzlib 1.7 -3 23 MB/s 170 MB/s 42537 4.13 kennedy.xls
    xz 5.2.2 -6 3.25 MB/s 195 MB/s 49071 4.77 kennedy.xls
    xz 5.2.2 -9 3.27 MB/s 192 MB/s 49071 4.77 kennedy.xls
    lzlib 1.7 -6 2.94 MB/s 144 MB/s 51229 4.97 kennedy.xls
    lzma 9.38 -5 3.24 MB/s 222 MB/s 51388 4.99 kennedy.xls
    lzlib 1.7 -9 1.73 MB/s 143 MB/s 51576 5.01 kennedy.xls
    lzlib 1.7 -0 71 MB/s 163 MB/s 56620 5.50 kennedy.xls
    csc 3.3 -3 8.55 MB/s 221 MB/s 57135 5.55 kennedy.xls
    csc 3.3 -5 6.64 MB/s 219 MB/s 60279 5.85 kennedy.xls
    tornado 0.6a -16 5.57 MB/s 257 MB/s 62140 6.03 kennedy.xls
    brotli 0.4.0 -11 0.46 MB/s 618 MB/s 62192 6.04 kennedy.xls
    lzham 1.0 -d26 -1 5.66 MB/s 336 MB/s 62201 6.04 kennedy.xls
    tornado 0.6a -13 9.28 MB/s 256 MB/s 62775 6.10 kennedy.xls
    lzma 9.38 -4 32 MB/s 260 MB/s 66394 6.45 kennedy.xls
    lzma 9.38 -2 39 MB/s 260 MB/s 66432 6.45 kennedy.xls
    xz 5.2.2 -3 16 MB/s 229 MB/s 67536 6.56 kennedy.xls
    brotli 0.4.0 -8 21 MB/s 736 MB/s 68136 6.62 kennedy.xls
    xz 5.2.2 -0 66 MB/s 245 MB/s 68663 6.67 kennedy.xls
    lzma 9.38 -0 59 MB/s 255 MB/s 69143 6.71 kennedy.xls
    zstd 0.8.0 -22 1.86 MB/s 545 MB/s 70226 6.82 kennedy.xls
    brotli 0.4.0 -5 61 MB/s 713 MB/s 71510 6.94 kennedy.xls
    csc 3.3 -1 42 MB/s 192 MB/s 72935 7.08 kennedy.xls
    xpack 2016-06-02 -9 17 MB/s 1180 MB/s 77766 7.55 kennedy.xls
    xpack 2016-06-02 -6 42 MB/s 1162 MB/s 78243 7.60 kennedy.xls
    Code:
    Compressor name Compress. Decompress. Compr. size Ratio Filename
    glza 0.7.1 0.11 MB/s 49 MB/s 97262 22.79 lcet10.txt
    brotli 0.4.0 -11 0.77 MB/s 430 MB/s 113475 26.59 lcet10.txt
    csc 3.3 -5 7.43 MB/s 70 MB/s 118720 27.82 lcet10.txt
    lzlib 1.7 -6 4.13 MB/s 67 MB/s 119134 27.92 lcet10.txt
    lzlib 1.7 -9 3.76 MB/s 68 MB/s 119267 27.95 lcet10.txt
    xz 5.2.2 -6 4.29 MB/s 84 MB/s 119446 27.99 lcet10.txt
    xz 5.2.2 -9 4.09 MB/s 83 MB/s 119446 27.99 lcet10.txt
    lzma 9.38 -5 3.95 MB/s 94 MB/s 119519 28.01 lcet10.txt
    csc 3.3 -3 9.17 MB/s 67 MB/s 120628 28.27 lcet10.txt
    zstd 0.8.0 -22 5.37 MB/s 777 MB/s 121995 28.59 lcet10.txt
    zstd 0.8.0 -18 7.49 MB/s 811 MB/s 122632 28.74 lcet10.txt
    tornado 0.6a -16 5.55 MB/s 218 MB/s 124713 29.22 lcet10.txt
    zling 2016-01-10 -4 40 MB/s 137 MB/s 126706 29.69 lcet10.txt
    zstd 0.8.0 -15 13 MB/s 827 MB/s 127506 29.88 lcet10.txt
    zling 2016-01-10 -3 43 MB/s 137 MB/s 127565 29.89 lcet10.txt
    lzham 1.0 -d26 -1 3.62 MB/s 228 MB/s 127859 29.96 lcet10.txt
    brotli 0.4.0 -8 17 MB/s 536 MB/s 128084 30.01 lcet10.txt
    tornado 0.6a -13 7.56 MB/s 212 MB/s 128203 30.04 lcet10.txt
    csc 3.3 -1 25 MB/s 64 MB/s 128890 30.20 lcet10.txt
    zstd 0.8.0 -11 33 MB/s 808 MB/s 129899 30.44 lcet10.txt
    zling 2016-01-10 -2 48 MB/s 135 MB/s 130036 30.47 lcet10.txt
    lzlib 1.7 -3 8.56 MB/s 62 MB/s 130566 30.60 lcet10.txt
    xz 5.2.2 -3 10 MB/s 79 MB/s 130823 30.66 lcet10.txt
    xpack 2016-06-02 -9 11 MB/s 1131 MB/s 131026 30.70 lcet10.txt
    zstd 0.8.0 -8 47 MB/s 790 MB/s 131246 30.75 lcet10.txt
    Code:
    Compressor name Compress. Decompress. Compr. size Ratio Filename
    glza 0.7.1 0.17 MB/s 51 MB/s 136857 28.40 plrabn12.txt
    brotli 0.4.0 -11 0.81 MB/s 375 MB/s 163282 33.89 plrabn12.txt
    lzlib 1.7 -9 3.76 MB/s 59 MB/s 165247 34.29 plrabn12.txt
    lzma 9.38 -5 3.72 MB/s 82 MB/s 165311 34.31 plrabn12.txt
    xz 5.2.2 -6 4.01 MB/s 72 MB/s 165346 34.31 plrabn12.txt
    xz 5.2.2 -9 3.88 MB/s 72 MB/s 165346 34.31 plrabn12.txt
    lzlib 1.7 -6 3.87 MB/s 58 MB/s 165400 34.33 plrabn12.txt
    csc 3.3 -5 7.68 MB/s 55 MB/s 165487 34.34 plrabn12.txt
    csc 3.3 -3 8.48 MB/s 55 MB/s 166767 34.61 plrabn12.txt
    zstd 0.8.0 -18 7.10 MB/s 685 MB/s 168550 34.98 plrabn12.txt
    zstd 0.8.0 -22 5.42 MB/s 672 MB/s 168627 34.99 plrabn12.txt
    tornado 0.6a -16 5.68 MB/s 188 MB/s 171140 35.52 plrabn12.txt
    zling 2016-01-10 -4 34 MB/s 126 MB/s 173903 36.09 plrabn12.txt
    zling 2016-01-10 -3 38 MB/s 125 MB/s 174762 36.27 plrabn12.txt
    tornado 0.6a -13 7.92 MB/s 181 MB/s 175832 36.49 plrabn12.txt
    lzlib 1.7 -3 6.59 MB/s 54 MB/s 176016 36.53 plrabn12.txt
    zstd 0.8.0 -15 12 MB/s 705 MB/s 176419 36.61 plrabn12.txt
    zling 2016-01-10 -2 44 MB/s 125 MB/s 176467 36.62 plrabn12.txt
    lzham 1.0 -d26 -1 3.42 MB/s 206 MB/s 176829 36.70 plrabn12.txt
    zling 2016-01-10 -1 47 MB/s 124 MB/s 177283 36.79 plrabn12.txt
    csc 3.3 -1 21 MB/s 52 MB/s 178138 36.97 plrabn12.txt
    brotli 0.4.0 -8 14 MB/s 453 MB/s 178173 36.98 plrabn12.txt
    zstd 0.8.0 -11 26 MB/s 686 MB/s 178997 37.15 plrabn12.txt
    zling 2016-01-10 -0 49 MB/s 122 MB/s 179345 37.22 plrabn12.txt
    Code:
    Compressor name Compress. Decompress. Compr. size Ratio Filename
    lzlib 1.7 -9 1.70 MB/s 154 MB/s 39587 7.71 ptt5
    brotli 0.4.0 -11 0.55 MB/s 747 MB/s 40987 7.99 ptt5
    xz 5.2.2 -6 5.94 MB/s 209 MB/s 41945 8.17 ptt5
    xz 5.2.2 -9 5.64 MB/s 204 MB/s 41945 8.17 ptt5
    lzlib 1.7 -6 8.60 MB/s 147 MB/s 43341 8.44 ptt5
    lzma 9.38 -5 8.65 MB/s 242 MB/s 43495 8.47 ptt5
    zstd 0.8.0 -22 2.76 MB/s 1763 MB/s 43800 8.53 ptt5
    csc 3.3 -5 9.56 MB/s 201 MB/s 44506 8.67 ptt5
    lzham 1.0 -d26 -1 6.79 MB/s 444 MB/s 46023 8.97 ptt5
    csc 3.3 -3 33 MB/s 197 MB/s 46486 9.06 ptt5
    lzham 1.0 -d26 -0 15 MB/s 429 MB/s 47189 9.19 ptt5
    lzlib 1.7 -3 20 MB/s 140 MB/s 47221 9.20 ptt5
    glza 0.7.1 0.22 MB/s 103 MB/s 47337 9.22 ptt5
    csc 3.3 -1 57 MB/s 195 MB/s 47514 9.26 ptt5
    lzma 9.38 -0 70 MB/s 223 MB/s 47570 9.27 ptt5
    lzlib 1.7 -0 74 MB/s 138 MB/s 47693 9.29 ptt5
    xz 5.2.2 -0 68 MB/s 199 MB/s 48249 9.40 ptt5
    xz 5.2.2 -3 37 MB/s 197 MB/s 48356 9.42 ptt5
    brotli 0.4.0 -8 41 MB/s 833 MB/s 48415 9.43 ptt5
    brotli 0.4.0 -5 91 MB/s 823 MB/s 48417 9.43 ptt5
    tornado 0.6a -16 3.49 MB/s 445 MB/s 48650 9.48 ptt5
    zstd 0.8.0 -15 32 MB/s 1973 MB/s 49530 9.65 ptt5
    zstd 0.8.0 -18 24 MB/s 1819 MB/s 49734 9.69 ptt5
    zstd 0.8.0 -8 162 MB/s 1846 MB/s 49863 9.72 ptt5
    zstd 0.8.0 -11 130 MB/s 1859 MB/s 49924 9.73 ptt5
    Code:
    Compressor name Compress. Decompress. Compr. size Ratio Filename
    xz 5.2.2 -9 3.45 MB/s 62 MB/s 9406 24.60 sum
    xz 5.2.2 -6 4.35 MB/s 68 MB/s 9406 24.60 sum
    lzlib 1.7 -9 4.24 MB/s 61 MB/s 9414 24.62 sum
    lzma 9.38 -5 2.56 MB/s 81 MB/s 9422 24.64 sum
    lzlib 1.7 -6 5.66 MB/s 60 MB/s 9447 24.70 sum
    lzlib 1.7 -0 6.21 MB/s 60 MB/s 9463 24.75 sum
    lzlib 1.7 -3 9.36 MB/s 59 MB/s 10068 26.33 sum
    xz 5.2.2 -0 26 MB/s 67 MB/s 10159 26.57 sum
    brotli 0.4.0 -11 0.86 MB/s 283 MB/s 10198 26.67 sum
    lzma 9.38 -2 24 MB/s 79 MB/s 10229 26.75 sum
    lzma 9.38 -4 11 MB/s 79 MB/s 10229 26.75 sum
    xz 5.2.2 -3 11 MB/s 66 MB/s 10237 26.77 sum
    lzma 9.38 -0 27 MB/s 78 MB/s 10282 26.89 sum
    csc 3.3 -5 7.95 MB/s 69 MB/s 10459 27.35 sum
    csc 3.3 -3 10 MB/s 69 MB/s 10553 27.60 sum
    lzham 1.0 -d26 -1 4.99 MB/s 113 MB/s 10948 28.63 sum
    csc 3.3 -1 25 MB/s 67 MB/s 11144 29.14 sum
    zstd 0.8.0 -22 7.52 MB/s 616 MB/s 11170 29.21 sum
    zstd 0.8.0 -18 8.45 MB/s 616 MB/s 11172 29.22 sum
    zstd 0.8.0 -15 10 MB/s 616 MB/s 11193 29.27 sum
    brotli 0.4.0 -8 10 MB/s 354 MB/s 11535 30.16 sum
    brotli 0.4.0 -5 36 MB/s 354 MB/s 11556 30.22 sum
    lzham 1.0 -d26 -0 7.27 MB/s 108 MB/s 11638 30.43 sum
    xpack 2016-06-02 -9 70 MB/s 780 MB/s 11670 30.52 sum
    xpack 2016-06-02 -6 89 MB/s 796 MB/s 11691 30.57 sum
    (#26) glza 0.7.1 0.16 MB/s 24 MB/s 11896 31.11 sum
    Code:
    Compressor name Compress. Decompress. Compr. size Ratio Filename
    brotli 0.4.0 -11 0.89 MB/s 325 MB/s 1463 34.61 xargs.1
    glza 0.7.1 0.01 MB/s 9.69 MB/s 1592 37.66 xargs.1
    brotli 0.4.0 -8 3.82 MB/s 422 MB/s 1649 39.01 xargs.1
    brotli 0.4.0 -5 14 MB/s 422 MB/s 1655 39.15 xargs.1
    zlib 1.2.8 -6 64 MB/s 431 MB/s 1736 41.07 xargs.1
    zlib 1.2.8 -9 63 MB/s 428 MB/s 1736 41.07 xargs.1
    zstd 0.8.0 -22 11 MB/s 494 MB/s 1739 41.14 xargs.1
    zstd 0.8.0 -18 11 MB/s 494 MB/s 1739 41.14 xargs.1
    zstd 0.8.0 -15 11 MB/s 494 MB/s 1739 41.14 xargs.1
    zstd 0.8.0 -11 17 MB/s 499 MB/s 1745 41.28 xargs.1
    zstd 0.8.0 -8 91 MB/s 523 MB/s 1746 41.31 xargs.1
    zstd 0.8.0 -5 117 MB/s 516 MB/s 1749 41.38 xargs.1
    lzma 9.38 -5 0.47 MB/s 65 MB/s 1752 41.45 xargs.1
    xpack 2016-06-02 -9 82 MB/s 676 MB/s 1765 41.76 xargs.1
    xz 5.2.2 -6 2.58 MB/s 50 MB/s 1766 41.78 xargs.1
    xz 5.2.2 -9 1.81 MB/s 28 MB/s 1766 41.78 xargs.1
    xpack 2016-06-02 -6 82 MB/s 676 MB/s 1767 41.80 xargs.1
    lzlib 1.7 -9 5.14 MB/s 45 MB/s 1779 42.09 xargs.1
    lzlib 1.7 -6 5.74 MB/s 45 MB/s 1782 42.16 xargs.1
    lzlib 1.7 -0 6.53 MB/s 45 MB/s 1789 42.32 xargs.1
    lzma 9.38 -4 1.98 MB/s 64 MB/s 1799 42.56 xargs.1
    lzma 9.38 -2 11 MB/s 64 MB/s 1799 42.56 xargs.1
    lzma 9.38 -0 18 MB/s 64 MB/s 1799 42.56 xargs.1
    xz 5.2.2 -3 4.04 MB/s 51 MB/s 1811 42.84 xargs.1
    lzlib 1.7 -3 8.40 MB/s 45 MB/s 1811 42.84 xargs.1
    Code:
    Compressor name Compress. Decompress. Compr. size Ratio Filename
    glza 0.7.1 0.33 MB/s 72 MB/s 716442 17.70 bible.txt
    csc 3.3 -5 5.07 MB/s 87 MB/s 851882 21.05 bible.txt
    lzlib 1.7 -9 2.36 MB/s 87 MB/s 884235 21.85 bible.txt
    xz 5.2.2 -6 2.88 MB/s 115 MB/s 885002 21.87 bible.txt
    xz 5.2.2 -9 2.86 MB/s 115 MB/s 885002 21.87 bible.txt
    lzlib 1.7 -6 2.65 MB/s 87 MB/s 886785 21.91 bible.txt
    lzma 9.38 -5 2.79 MB/s 127 MB/s 888300 21.95 bible.txt
    brotli 0.4.0 -11 0.67 MB/s 585 MB/s 890810 22.01 bible.txt
    zstd 0.8.0 -22 3.25 MB/s 933 MB/s 894717 22.11 bible.txt
    tornado 0.6a -16 3.15 MB/s 294 MB/s 898872 22.21 bible.txt
    csc 3.3 -3 8.31 MB/s 80 MB/s 901173 22.27 bible.txt
    zstd 0.8.0 -18 4.08 MB/s 927 MB/s 905974 22.38 bible.txt
    tornado 0.6a -13 6.22 MB/s 266 MB/s 966485 23.88 bible.txt
    zstd 0.8.0 -15 7.19 MB/s 947 MB/s 972617 24.03 bible.txt
    lzham 1.0 -d26 -1 3.20 MB/s 312 MB/s 984755 24.33 bible.txt
    csc 3.3 -1 28 MB/s 78 MB/s 989332 24.44 bible.txt
    tornado 0.6a -10 9.90 MB/s 283 MB/s 989367 24.44 bible.txt
    zling 2016-01-10 -4 50 MB/s 226 MB/s 1000857 24.73 bible.txt
    zling 2016-01-10 -3 62 MB/s 224 MB/s 1015300 25.09 bible.txt
    brotli 0.4.0 -8 12 MB/s 608 MB/s 1017025 25.13 bible.txt
    zstd 0.8.0 -11 24 MB/s 879 MB/s 1031151 25.48 bible.txt
    tornado 0.6a -7 27 MB/s 267 MB/s 1031847 25.49 bible.txt
    zling 2016-01-10 -2 78 MB/s 224 MB/s 1040712 25.71 bible.txt
    crush 1.0 -2 0.45 MB/s 454 MB/s 1045206 25.82 bible.txt
    xz 5.2.2 -3 8.21 MB/s 94 MB/s 1056901 26.11 bible.txt
    Code:
    Compressor name Compress. Decompress. Compr. size Ratio Filename
    glza 0.7.1 0.12 MB/s 24 MB/s 1137825 24.53 E.coli
    brotli 0.4.0 -11 0.44 MB/s 384 MB/s 1138119 24.54 E.coli
    tornado 0.6a -16 2.12 MB/s 241 MB/s 1181243 25.47 E.coli
    lzma 9.38 -5 1.71 MB/s 129 MB/s 1185115 25.55 E.coli
    xz 5.2.2 -9 1.80 MB/s 120 MB/s 1186258 25.57 E.coli
    xz 5.2.2 -6 1.80 MB/s 121 MB/s 1186258 25.57 E.coli
    lzlib 1.7 -6 1.64 MB/s 82 MB/s 1186445 25.58 E.coli
    lzlib 1.7 -9 1.62 MB/s 82 MB/s 1187207 25.59 E.coli
    zstd 0.8.0 -22 2.08 MB/s 790 MB/s 1198623 25.84 E.coli
    zstd 0.8.0 -18 2.20 MB/s 774 MB/s 1198895 25.85 E.coli
    csc 3.3 -5 1.90 MB/s 118 MB/s 1208731 26.06 E.coli
    tornado 0.6a -13 7.96 MB/s 203 MB/s 1224769 26.40 E.coli
    lzham 1.0 -d26 -1 3.48 MB/s 241 MB/s 1243583 26.81 E.coli
    zstd 0.8.0 -15 3.95 MB/s 827 MB/s 1269985 27.38 E.coli
    csc 3.3 -3 5.54 MB/s 87 MB/s 1273792 27.46 E.coli
    tornado 0.6a -10 9.06 MB/s 254 MB/s 1277805 27.55 E.coli
    lzlib 1.7 -3 7.26 MB/s 83 MB/s 1283007 27.66 E.coli
    zlib 1.2.8 -9 1.34 MB/s 479 MB/s 1299717 28.02 E.coli
    brotli 0.4.0 -8 5.38 MB/s 503 MB/s 1317226 28.40 E.coli
    xpack 2016-06-02 -9 4.84 MB/s 1183 MB/s 1325115 28.57 E.coli
    tornado 0.6a -7 29 MB/s 207 MB/s 1325226 28.57 E.coli
    zstd 0.8.0 -11 23 MB/s 645 MB/s 1333833 28.75 E.coli
    xz 5.2.2 -3 9.05 MB/s 76 MB/s 1335356 28.79 E.coli
    zlib 1.2.8 -6 6.68 MB/s 449 MB/s 1341990 28.93 E.coli
    zstd 0.8.0 -8 39 MB/s 604 MB/s 1349606 29.09 E.coli
    Code:
    Compressor name Compress. Decompress. Compr. size Ratio Filename
    glza 0.7.1 0.12 MB/s 66 MB/s 414039 16.74 world192.txt
    brotli 0.4.0 -11 0.69 MB/s 520 MB/s 475200 19.21 world192.txt
    csc 3.3 -5 5.18 MB/s 89 MB/s 482532 19.51 world192.txt
    lzlib 1.7 -9 2.82 MB/s 87 MB/s 483798 19.56 world192.txt
    xz 5.2.2 -9 3.69 MB/s 113 MB/s 487374 19.70 world192.txt
    xz 5.2.2 -6 3.83 MB/s 113 MB/s 487374 19.70 world192.txt
    lzlib 1.7 -6 3.92 MB/s 86 MB/s 496279 20.06 world192.txt
    lzma 9.38 -5 4.24 MB/s 125 MB/s 499271 20.19 world192.txt
    zstd 0.8.0 -22 3.55 MB/s 1028 MB/s 506441 20.48 world192.txt
    tornado 0.6a -16 3.48 MB/s 302 MB/s 507530 20.52 world192.txt
    zstd 0.8.0 -18 6.00 MB/s 1053 MB/s 522618 21.13 world192.txt
    zstd 0.8.0 -15 8.63 MB/s 1069 MB/s 532074 21.51 world192.txt
    tornado 0.6a -13 6.78 MB/s 285 MB/s 534502 21.61 world192.txt
    csc 3.3 -3 11 MB/s 82 MB/s 535400 21.65 world192.txt
    zling 2016-01-10 -4 64 MB/s 236 MB/s 538842 21.79 world192.txt
    tornado 0.6a -10 10 MB/s 296 MB/s 541285 21.88 world192.txt
    brotli 0.4.0 -8 18 MB/s 623 MB/s 541411 21.89 world192.txt
    lzham 1.0 -d26 -1 3.48 MB/s 320 MB/s 545944 22.07 world192.txt
    zling 2016-01-10 -3 71 MB/s 234 MB/s 546018 22.08 world192.txt
    xz 5.2.2 -3 12 MB/s 106 MB/s 550228 22.25 world192.txt
    zling 2016-01-10 -2 82 MB/s 231 MB/s 556779 22.51 world192.txt
    zstd 0.8.0 -11 31 MB/s 1009 MB/s 558678 22.59 world192.txt
    tornado 0.6a -7 27 MB/s 281 MB/s 561201 22.69 world192.txt
    zling 2016-01-10 -1 89 MB/s 227 MB/s 569705 23.03 world192.txt
    crush 1.0 -2 1.81 MB/s 490 MB/s 572170 23.13 world192.txt
    GLZA has the best compression ratio on 8 of the 14 files but is the slowest in almost all cases. Brotli has the best compression ratio on 4 of the 14 files and is fast, which is really pretty impressive. It would be nice if Kraken were available to see how it compares.
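
    For anyone who wants to check the tally, here is a minimal Python sketch. It only looks at the first two rows of each table above (the best and second-best compressed sizes), copied by hand; the function name `count_wins` is just an illustrative choice, not part of lzbench.

```python
from collections import Counter

# file -> [(compressor and level, compressed size in bytes)], top two rows per table
top_rows = {
    "alice29.txt":  [("glza 0.7.1", 40750), ("brotli 0.4.0 -11", 46521)],
    "asyoulik.txt": [("glza 0.7.1", 38265), ("brotli 0.4.0 -11", 42749)],
    "cp.html":      [("brotli 0.4.0 -11", 6893), ("glza 0.7.1", 6966)],
    "fields.c":     [("brotli 0.4.0 -11", 2717), ("glza 0.7.1", 2812)],
    "grammar.lsp":  [("brotli 0.4.0 -11", 1126), ("glza 0.7.1", 1175)],
    "kennedy.xls":  [("glza 0.7.1", 23099), ("lzlib 1.7 -3", 42537)],
    "lcet10.txt":   [("glza 0.7.1", 97262), ("brotli 0.4.0 -11", 113475)],
    "plrabn12.txt": [("glza 0.7.1", 136857), ("brotli 0.4.0 -11", 163282)],
    "ptt5":         [("lzlib 1.7 -9", 39587), ("brotli 0.4.0 -11", 40987)],
    "sum":          [("xz 5.2.2 -9", 9406), ("xz 5.2.2 -6", 9406)],
    "xargs.1":      [("brotli 0.4.0 -11", 1463), ("glza 0.7.1", 1592)],
    "bible.txt":    [("glza 0.7.1", 716442), ("csc 3.3 -5", 851882)],
    "E.coli":       [("glza 0.7.1", 1137825), ("brotli 0.4.0 -11", 1138119)],
    "world192.txt": [("glza 0.7.1", 414039), ("brotli 0.4.0 -11", 475200)],
}

def count_wins(rows):
    """Count, per compressor family, how many files it compresses smallest."""
    wins = Counter()
    for entries in rows.values():
        # Smallest compressed size wins; on a tie the first listed row is kept.
        name, _size = min(entries, key=lambda entry: entry[1])
        wins[name.split()[0]] += 1  # "brotli 0.4.0 -11" -> "brotli"
    return wins

wins = count_wins(top_rows)
print(wins.most_common())
```

    This gives 8 wins for glza and 4 for brotli, with lzlib (ptt5) and xz (sum) taking the other two files.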
    Last edited by Kennon Conrad; 15th August 2016 at 09:33.

  30. Thanks (5):

    Bulat Ziganshin (15th August 2016),Jyrki Alakuijala (16th August 2016),Mike (15th August 2016),pixar (20th August 2016),surfersat (28th September 2016)

  31. 16th August 2016, 04:41 #803
    Member SolidComp's Avatar
    Join Date
    Jun 2015
    Location
    USA
    Posts
    485
    Thanks
    152
    Thanked 70 Times in 50 Posts
    Kennon, have you tried Zstd with a dictionary? You can train a dictionary once and then use it forever. It should be trained on relevant content; even training it on the target file itself should work. Brotli has a built-in text/web dictionary, so for that kind of content, the best comparison would be Zstd with a similar dictionary.
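
    To illustrate the general idea of a dictionary shared by compressor and decompressor, here is a rough Python sketch. Note the assumptions: it uses the preset-dictionary (`zdict`) support in Python's standard `zlib` module as a stand-in, not Zstd's trained dictionaries, and the dictionary bytes and sample page are made up for the example.

```python
import zlib

# Hypothetical "web" dictionary: strings we expect to see in the input.
WEB_DICT = b"<html><head><title></title></head><body><p></p></body></html>"

def compress_with_dict(data: bytes, dictionary: bytes) -> bytes:
    """Deflate `data` with `dictionary` preloaded into the match history."""
    comp = zlib.compressobj(level=9, zdict=dictionary)
    return comp.compress(data) + comp.flush()

def decompress_with_dict(blob: bytes, dictionary: bytes) -> bytes:
    """Inflate `blob`; the decompressor must be given the same dictionary."""
    decomp = zlib.decompressobj(zdict=dictionary)
    return decomp.decompress(blob) + decomp.flush()

page = (b"<html><head><title>Test page</title></head>"
        b"<body><p>Hello</p></body></html>")

plain = zlib.compress(page, 9)
with_dict = compress_with_dict(page, WEB_DICT)

print(len(page), len(plain), len(with_dict))
assert decompress_with_dict(with_dict, WEB_DICT) == page
```

    On an input this small, most of the markup can be sent as matches into the dictionary, which is why a built-in dictionary matters most on short files.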

  32. 16th August 2016, 09:42 #804
    Member
    Join Date
    Jan 2014
    Location
    Bothell, Washington, USA
    Posts
    704
    Thanks
    156
    Thanked 186 Times in 109 Posts
    Quote Originally Posted by SolidComp View Post
    Kennon, have you tried Zstd with a dictionary? You can train a dictionary once and then use it forever. It should be trained on relevant content; even training it on the target file itself should work. Brotli has a built-in text/web dictionary, so for that kind of content, the best comparison would be Zstd with a similar dictionary.
    No, I have never tried Zstd at all outside of my recent work on lzbench integration. I have noticed its impressive performance and I follow the thread, but that's all. You are right, it's not really fair to compare compressors with options set differently, but Brotli gets the advantage because it uses its dictionary by default. Since most compressors don't have a built-in dictionary, it seems like it might be best to turn Brotli's off. I have tried to turn off the dictionary in Brotli when testing separately but never figured out how to do it. It might be good if lzbench turned off the dictionary too, if possible.

    It seems like a dictionary could help GLZA too, maybe more than for Brotli since it's more dictionary based. There is usually a lot of commonality between the dictionaries produced by GLZA for different files. There's probably no reason GLZA couldn't start with a basic dictionary of the (really) common words/structures/phrases and go from there. Also, a dictionary could be used to train the o(1) trailing/leading character model, which would help a bit, most noticeably on small files.

  33. 26th August 2016, 06:30 #805
    Member SolidComp's Avatar
    Join Date
    Jun 2015
    Location
    USA
    Posts
    485
    Thanks
    152
    Thanked 70 Times in 50 Posts
    Quote Originally Posted by Kennon Conrad View Post
    It seems like a dictionary could help GLZA too, maybe more than for Brotli since it's more dictionary based. There is usually a lot of commonality between the dictionaries produced by GLZA for different files. There's probably no reason GLZA couldn't start with a basic dictionary of the (really) common words/structures/phrases and go from there. Also, a dictionary could be used to train the o(1) trailing/leading character model, which would help a bit, most noticeably on small files.
    For web compression (HTML, CSS, JS), we really ought to have a simple dictionary for the tags, keywords, and URL strings – I'm still surprised that the web, as important as it is, doesn't have any sort of tailored compression. I'd be very curious to see how GLZA does with dictionaries, since it's already so good. I also think some sort of interpolated string dictionary would be interesting. We tend to focus on continuous strings, but multi-part punctuated or interpolated strings have a lot of potential. In the simplest case, we have start tags and end tags in XML/HTML that we ought to encode as a unit, and we have lots of URLs that start with the same https:// sequence and end with the same .com or .jpg strings, differing only in the middle section.
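
    To make the dictionary idea concrete, here is a minimal sketch using the preset-dictionary support that deflate already has (the `zdict` argument in Python's standard `zlib` module). The dictionary contents and the sample page are made up for illustration; a real web dictionary would be built from a large corpus.

```python
import zlib

# Hypothetical preset dictionary of common web strings (an assumption for
# illustration only). zlib prefers matches near the end of the dictionary,
# so the most frequent strings should be placed last.
WEB_DICT = b'.jpg.png.com</div><div class="<meta property="https://www.'


def compress_with_dict(data: bytes, zdict: bytes) -> bytes:
    """Deflate `data` with the history window primed by `zdict`."""
    compressor = zlib.compressobj(level=9, zdict=zdict)
    return compressor.compress(data) + compressor.flush()


def decompress_with_dict(blob: bytes, zdict: bytes) -> bytes:
    """Inflate `blob`; the decoder must be given the same dictionary."""
    decompressor = zlib.decompressobj(zdict=zdict)
    return decompressor.decompress(blob) + decompressor.flush()


# Made-up sample input.
page = b'<meta property="og:image" content="https://www.example.com/a.jpg">'

plain = zlib.compress(page, 9)
primed = compress_with_dict(page, WEB_DICT)
restored = decompress_with_dict(primed, WEB_DICT)

print("original:", len(page), "deflate:", len(plain), "deflate+dict:", len(primed))
```

    The catch is the one raised above: both ends have to ship the same dictionary, which is why browser support matters.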

    By the way, do you have updated RAM memory usage figures for GLZA compression and decompression?

  34. 28th August 2016, 00:10 #806
    Member
    Join Date
    Jan 2014
    Location
    Bothell, Washington, USA
    Posts
    704
    Thanks
    156
    Thanked 186 Times in 109 Posts
    I am sorry for the slow reply, I wasn't feeling well the last few days.

    Quote Originally Posted by SolidComp View Post
    For web compression (HTML, CSS, JS), we really ought to have a simple dictionary for the tags, keywords, and URL strings – I'm still surprised that the web, as important as it is, doesn't have any sort of tailored compression.
    Have you considered XWRT? I have never been able to get better compression ratios using it as a preprocessor for GLZA on web pages, but it does seem to help several other compressors and I think it is at least somewhat tailored to the web, at least for tags.

    Quote Originally Posted by SolidComp View Post
    I'd be very curious to see how GLZA does with dictionaries, since it's already so good.
    Me too! Unfortunately, I think it's a little more complicated than with other LZ style programs such as Brotli. It seems that with LZ77 you could just put a dictionary in the history at the start and use that to send matches right from the start. You could do something similar with GLZ but it's a little different because it references grammar rules instead of offsets. Normally if a string like "<page>" was used at the start of a file, it would not be likely to be sent as just one rule S1 -> <page>, but instead it might be sent S1 -> <S2>, S2 -> pS3, S3 -> age, so that "age" and "page" also go into the dictionary. So I need to decide how to approach this deviation from the normal case. If I pull <page> out of a prebuilt history, I need to decide if GLZA would create the other dictionary entries at that time by adding additional codes to indicate which substrings should go into the dictionary, or whether putting substrings into the dictionary would be deferred until later when (and if) those substrings appear for the first time and are not coming from the dictionary. Honestly, I'm not sure which is the "right" way to go.
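
    As an illustration of the rule structure described above, here is a minimal Python sketch that expands the three example rules (S1 -> <S2>, S2 -> pS3, S3 -> age). The data layout is an assumption chosen for readability, not GLZA's internal representation.

```python
# Each rule maps a name to a sequence of parts; a part is either a rule name
# or a literal string. (Assumption: no literal happens to equal a rule name.)
RULES = {
    "S1": ["<", "S2", ">"],
    "S2": ["p", "S3"],
    "S3": ["age"],
}


def expand(symbol: str, rules: dict) -> str:
    """Recursively expand a rule name into the string it stands for."""
    if symbol not in rules:
        return symbol  # literal text
    return "".join(expand(part, rules) for part in rules[symbol])


for name in RULES:
    print(name, "->", expand(name, RULES))
# S1 -> <page>
# S2 -> page
# S3 -> age
```

    Pulling "<page>" out of a prebuilt history as a single entry would give the decoder S1 but not S2 or S3, which is exactly the deviation discussed above.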

    Quote Originally Posted by SolidComp View Post
    I also think some sort of interpolated string dictionary would be interesting. We tend to focus on continuous strings, but multi-part punctuated or interpolated strings have a lot of potential. In the simplest case, we have start tags and end tags in XML/HTML that we ought to encode as a unit, and we have lots of URLs that start with the same https:// sequence and end with the same .com or .jpg strings, differing only in the middle section.
    Yes, I completely agree. You are getting into some areas that Paul and I have discussed a bit. For tags, it seems like production rules such as Stag -> <tag>S</tag> could be very useful if supported and used properly. I think it makes the grammar into one that is not straight-line, but I don't see why that would be a problem. Similarly, production rules like Shttps-com -> https://S.com could be used for some URLs. Since GLZ uses production rules, this seems like a natural fit. I have wanted to get to a good baseline before getting serious about adding support for this (it's not trivial, at least for me!) but it seems totally reasonable to add, as long as it is applied properly (probably not a good idea to use Spage -> <page>S</page> on enwik's because it's probably more effective to just deduplicate "</page><page>"). This brings me back to the dictionary idea and the thought that maybe it's not just a preloaded "dictionary" that's best for GLZA, but also preloaded "skip" production rules and a mechanism to create them on the fly from just the "tag".
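
    One way to picture such "skip" rules is as templates with a slot that is filled in at each use. The sketch below is only an interpretation of the Stag and Shttps-com examples above; the `SLOT` marker and `apply_rule` helper are hypothetical names, and nothing here reflects how GLZA would actually encode it.

```python
# SLOT marks the position of the "S" in rules such as Stag -> <tag>S</tag>.
SLOT = object()

SKIP_RULES = {
    "Stag": ["<tag>", SLOT, "</tag>"],
    "Shttps-com": ["https://", SLOT, ".com"],
}


def apply_rule(name: str, argument: str, rules: dict = SKIP_RULES) -> str:
    """Expand a skip rule, substituting `argument` at the slot."""
    return "".join(argument if part is SLOT else part for part in rules[name])


print(apply_rule("Stag", "hello"))            # <tag>hello</tag>
print(apply_rule("Shttps-com", "example"))    # https://example.com
# Rules can nest: the argument of one rule may itself come from another rule.
print(apply_rule("Stag", apply_rule("Shttps-com", "example")))
```

    The encoder then only has to send the rule name plus the middle section, instead of two separate matches on either side of it.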

    Quote Originally Posted by SolidComp View Post
    By the way, do you have updated RAM memory usage figures for GLZA compression and decompression?
    For enwik9 or which files? I don't have exact numbers but can get them. In general, both compression and decompression memory usage is as high as ever for large files. For compression, that's because I bumped up memory use (can impact window size) and recently made the memory use option unavailable (I will put it back in fairly soon if people care). For decompression it is because lzbench (etc.) expect a buffer with the entire file so I took out some of the buffer management code that allowed the buffer to be much smaller. I will put this back in fairly soon for standalone mode because if the data is being written to disk it doesn't need the history.

    I am curious, which is important to you?

  35. 30th August 2016, 08:09 #807
    Member SolidComp's Avatar
    Quote Originally Posted by Kennon Conrad View Post
    I am sorry for the slow reply, I wasn't feeling well the last few days.



    Have you considered XWRT? I have never been able to get better compression ratios using it as a preprocessor for GLZA on web pages, but it does seem to help several other compressors and I think it is at least somewhat tailored to the web, at least for tags.
    Yes, I like XWRT a lot. However it's not supported by browsers, so I can't use it. I was referring to the fact that there are no web-specific compression solutions that are supported by browsers. It's really strange that we only have generic deflate.


    Yes, I completely agree. You are getting into some areas that Paul and I have discussed a bit. For tags, it seems like production rules such as Stag -> <tag>S</tag> could be very useful if supported and used properly. I think it makes the grammar into one that is not straight-line, but I don't see why that would be a problem. Similarly, production rules like Shttps-com -> https://S.com could be used for some URLs. Since GLZ uses production rules, this seems like a natural fit. I have wanted to get to a good baseline before getting serious about adding support for this (it's not trivial, at least for me!) but it seems totally reasonable to add, as long as it is applied properly (probably not a good idea to use Spage -> <page>S</page> on enwik's because it's probably more effective to just deduplicate "</page><page>"). This brings me back to the dictionary idea and the thought that maybe it's not just a preloaded "dictionary" that's best for GLZA, but also preloaded "skip" production rules and a mechanism to create them on the fly from just the "tag".
    Yes, that's exactly the approach I'm thinking of. It might help to combine it with normalization of HTML, CSS, and JS files. They're such a mess, and the specs are terrible, with bizarre statements allowing zero or more spaces in arbitrary places for no reason. That's not how specs are normally written. But one could normalize and minify these files in ways that make compression easier, faster, better, etc. And we could be smarter about string matching and knowing when to not bother, like when we scan the enormous URLs created by some servers on the fly (like Google's PageSpeed module). If we scan a URL like this one:

    (You might have to hover over it – the site is truncating it in the code box). We should know that there's not going to be a significant string match beyond ten or so characters into the path (unless we configure our apps to generate nearly identical URLs when they do this, which I don't think anyone does yet).

    For enwik9 or which files? I don't have exact numbers but can get them. In general, both compression and decompression memory usage is as high as ever for large files. For compression, that's because I bumped up memory use (can impact window size) and recently made the memory use option unavailable (I will put it back in fairly soon if people care). For decompression it is because lzbench (etc.) expect a buffer with the entire file so I took out some of the buffer management code that allowed the buffer to be much smaller. I will put this back in fairly soon for standalone mode because if the data is being written to disk it doesn't need the history.

    I am curious, which is important to you?
    Decompression memory and CPU use are most important to me. I asked because I had seen some discussion of these topics from a few months ago, I think, and it sounded like there were going to be code changes aimed at improving memory use, but I might be remembering it wrong. In general, I'm becoming fairly strident in my position that I can't do anything with compression benchmarks that don't report memory and CPU use for both compression and decompression. I think we get too distracted by the glitter of compression ratios and speed, but a lot of these "winning" codecs use enormous memory and CPU. If something uses 1 GiB of RAM and 100% of a flagship Android phone's CPU to decompress then it's only usable on powerful desktops. I'm most interested in codecs that can replace gzip on the web, in mobile devices and such, so the target content would be much smaller than enwik9 and diverse – HTML, CSS, and JS files, maybe JSON and CSV data (there's an upcoming CSV in HTML standard).

  36. 31st August 2016, 12:14 #808
    Member
    Quote Originally Posted by SolidComp View Post
    Yes, I like XWRT a lot. However it's not supported by browsers, so I can't use it. I was referring to the fact that there are no web-specific compression solutions that are supported by browsers. It's really strange that we only have generic deflate.
    Thanks for the clarification. This does seem strange. I wonder if it's related to deflate's low memory usage.

    Quote Originally Posted by SolidComp View Post
    Yes, that's exactly the approach I'm thinking of. It might help to combine it with normalization of HTML, CSS, and JS files. They're such a mess, and the specs are terrible, with bizarre statements allowing zero or more spaces in arbitrary places for no reason. That's not how specs are normally written. But one could normalize and minify these files in ways that make compression easier, faster, better, etc. And we could be smarter about string matching and knowing when to not bother, like when we scan the enormous URLs created by some servers on the fly (like Google's PageSpeed module).
    So they have specs? I have been looking and not finding anything useful but maybe need to look harder. I don't know much about parsers but from a little reading it sounds like something like an LL parser should be used.

    You may already know this but GLZA's string matching is a lot different from LZ77 or deflate. Instead of searching for the most efficient local transmission of matches/literals the algorithm recursively searches for the best global matches and creates rules for those. So instead of knowing not to bother with looking for matches in a section of data, for GLZA you would want to not include that section of data in the suffix tree that is built for global match finding. I think this wouldn't be much time saving if the data doesn't have long matches because the code doesn't have to traverse very far in the tree before it reaches a leaf. I think what's really important is being able to efficiently find the strings that provide the most immediate compression while minimizing the loss in "future" compressibility of the file. So smart string finding via parsing and HTML token recognition is of interest to me because it may improve speed (significantly?) and compression ratio (slightly?), plus it seems like there may be some synergies with tag finding, so I'd like to find some decent specs on the structures to get a better idea of what smarter string matching can mean.
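
    To show what "global" match finding means as opposed to LZ77's local matching, here is a deliberately naive Python sketch that scans the whole input and picks the single repeated substring with the best savings. Both the brute-force search and the scoring formula are assumptions made for illustration; GLZA uses a suffix tree and its own cost model, neither of which is reproduced here.

```python
def best_repeat(text: str, min_len: int = 2, max_len: int = 12):
    """Return (substring, count, score) for the best repeated substring.

    Toy score (an assumption, not GLZA's): replacing each of `count`
    occurrences with one symbol saves count * (length - 1) characters,
    and storing the rule body once costs `length` characters.
    """
    best = None
    for n in range(min_len, max_len + 1):
        candidates = {text[i:i + n] for i in range(len(text) - n + 1)}
        for sub in candidates:
            count = text.count(sub)  # non-overlapping occurrences
            if count < 2:
                continue
            score = count * (n - 1) - n
            key = (score, n, sub)  # deterministic tie-breaking
            if best is None or key > best[0]:
                best = (key, count)
    if best is None:
        return None
    (score, _, sub), count = best
    return sub, count, score


print(best_repeat("abcabcabc xyz"))  # ('abc', 3, 3)
```

    A grammar compressor would turn the winner into a rule, substitute it everywhere, and repeat on the result, which is where the recursion comes from.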

    Quote Originally Posted by SolidComp View Post
    Decompression memory and CPU use are most important to me. I asked because I had seen some discussion of these topics from a few months ago, I think, and it sounded like there were going to be code changes aimed at improving memory use, but I might be remembering it wrong.
    There were code changes a while back that decreased memory usage for decoding. v0.5 - v0.7 are best: decoding memory is generally on the Pareto frontier for large texty files and a few MB for small files, which can probably be improved. Starting with v0.7.1, I changed the decoder to support a buffer with all the decoded data to be compatible with lzbench, so memory usage is not as good and virtual memory use went way up; that will stay the case until I find time to add some code. I need to decide whether to put the standalone version back to the way it was or not. When I started I only thought about decompression to disk, but I was a novice at compression. Now I think decompression to RAM may be more important (or at least a better fit for what GLZA does well) and am thinking it might be better to live with having decompression require at least as much memory as the decompressed data takes and have the dictionary pointers point to history rather than a separate dictionary.

    Quote Originally Posted by SolidComp View Post
    I think we get too distracted by the glitter of compression ratios and speed, but a lot of these "winning" codecs use enormous memory and CPU. If something uses 1 GiB of RAM and 100% of a flagship Android phone's CPU to decompress then it's only usable on powerful desktops. I'm most interested in codecs that can replace gzip on the web, in mobile devices and such, so the target content would be much smaller than enwik9 and diverse – HTML, CSS, and JS files, maybe JSON and CSV data (there's an upcoming CSV in HTML standard).
    Just to be clear, GLZA does use lots of memory and CPU for compression compared to most or all LZxx compressors. Decompression characteristics are much better; if there's a problem on an Android, then it's something I did wrong in the coding. In the medium to long run, I could see where GLZ could be very useful for decompressing web data. In the short term, it's probably a little immature. Compression ratios are likely to improve at least a little over time, speed can be improved, code can be cleaner, etc.

  37. 31st August 2016, 13:22 #809
    Member SolidComp's Avatar
    Quote Originally Posted by Kennon Conrad View Post
    Thanks for the clarification. This does seem strange. I wonder if it's related to deflate's low memory usage.

    So they have specs? I have been looking and not finding anything useful but maybe need to look harder. I don't know much about parsers but from a little reading it sounds like something like an LL parser should be used.
    Yeah, there are specs for HTML, CSS, and JS, though for HTML there are two different specs from different organizations. Thankfully, the two specs are similar enough, but they're extremely poorly written and constructed. I guess the science and art of spec writing is somewhat immature, and we don't have a reliable pipeline of people trained in it. They should also expand their medium to include authoritative graphical representations of some of their concepts, and use machine-readable formats as well. They rely too much on English, but they don't know how to write good specs in English. Here's one of the HTML5 specs: https://www.w3.org/TR/html5/

    You may already know this but GLZA's string matching is a lot different from LZ77 or deflate. Instead of searching for the most efficient local transmission of matches/literals the algorithm recursively searches for the best global matches and creates rules for those. So instead of knowing not to bother with looking for matches in a section of data, for GLZA you would want to not include that section of data in the suffix tree that is built for global match finding. I think this wouldn't be much time saving if the data doesn't have long matches because the code doesn't have to traverse very far in the tree before it reaches a leaf. I think what's really important is being able to efficiently find the strings that provide the most immediate compression while minimizing the loss in "future" compressibility of the file. So smart string finding via parsing and HTML token recognition is of interest to me because it may improve speed (significantly?) and compression ratio (slightly?), plus it seems like there may be some synergies with tag finding, so I'd like to find some decent specs on the structures to get a better idea of what smarter string matching can mean.
    I'm embarrassed to admit that I'm not familiar with GLZA's string matching algorithm(s), just with GLZA's remarkable performance. Your approach to smarter string matching sounds promising. gzip isn't aware that it is compressing HTML, CSS, or JS. It has no idea that body> won't be repeated until near the end of the document, doesn't have any concept of tags or attributes or any idea what to expect. There are interesting opportunities for smarter string matching. Take the following snippet from Apple's iPhone page:

    Code:
    <meta property="analytics-s-channel" content="iphone.tab+other" />
    <meta property="analytics-s-bucket-0" content="appleglobal,apple{COUNTRY_CODE}iphonetab" />
    <meta property="analytics-s-bucket-1" content="apple{COUNTRY_CODE}global,apple{COUNTRY_CODE}iphonetab" />
    <meta property="analytics-s-bucket-2" content="apple{COUNTRY_CODE}global" />
    <meta property="analytics-s-bucket-store" content="applestoreww,applestoreamr,applestoreus" />
    Gzip/deflate will lose the match after bucket-, and will need to code the enumerations as literals 0, 1, 2, etc., then start a new match (with new symbols) with " content="apple, or something along those lines. This isn't the best example of broken-up matches, but it illustrates some of the common phenomena. There's a lot of opportunity for smarter interpolated or permuted matches, and much of it could be informed by the nature of the section or tag. (meta tags are good places to look for such strings.) Brotli has an interesting permutable dictionary that I'd like to dig into some more. ("permutable dictionary" is what I call it – I don't know if there's a more official label for that kind of dictionary)
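
    The broken-up match can be made visible with a small Python sketch. It uses `difflib.SequenceMatcher` from the standard library as a stand-in for a match finder (it is not deflate's actual algorithm) and compares the first part of two of the meta lines from the snippet above.

```python
from difflib import SequenceMatcher

# Leading portions of the bucket-1 and bucket-2 lines, truncated for brevity.
a = '<meta property="analytics-s-bucket-1" content="apple'
b = '<meta property="analytics-s-bucket-2" content="apple'

matcher = SequenceMatcher(None, a, b, autojunk=False)
# Drop the zero-length sentinel block that get_matching_blocks() appends.
blocks = [(m.a, m.b, m.size) for m in matcher.get_matching_blocks() if m.size]

for start_a, start_b, size in blocks:
    print(size, repr(a[start_a:start_a + size]))
# 35 '<meta property="analytics-s-bucket-'
# 16 '" content="apple'
```

    A single differing digit splits what is logically one 52-character template into two matches plus a literal. An interpolated-string scheme would instead code the template once and send only the digit.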

    There's also an opportunity for manipulating/changing the data, which we normally assume we can't do when compressing. In the case of HTML, CSS, and JS, simple minification and normalization transformations are possible without changing the meaning of the code. For example, a lot of times the above meta elements will vary in how the tag is closed. Here we see the XML-style />, but in HTML5 these elements are supposed to be closed without a slash, just with >. You'll often see some with the slash, and some without, in the same HTML file, and with or without a space before, which busts the deflate string match into at least two permutations (assuming there's a match at the end of the element, or combining the tag closure with the start of the next element, like /><meta). A compressor that knew that it could normalize all those as no-space and no-slash would help in some cases. (and stripping all the CRs from CRLF combos, normalizing the ordering of attributes, and lots of other things.)
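
    A rough Python sketch of two of those normalizations (tag closings and CRLF) follows. It is regex-based and deliberately naive: it would also rewrite "/>" inside scripts, attribute values, or embedded SVG, so a real tool would need an HTML parser. The function name is hypothetical.

```python
import re


def normalize_closings(html: str) -> str:
    """Strip CRs from CRLF pairs and rewrite ' />' or '/>' as '>'.

    Naive sketch: assumes '/>' only ever appears as a tag closing.
    """
    html = html.replace("\r\n", "\n")
    return re.sub(r"\s*/>", ">", html)


messy = '<meta a="b" />\r\n<meta c="d"/>\r\n<meta e="f">'
print(normalize_closings(messy))
# <meta a="b">
# <meta c="d">
# <meta e="f">
```

    After normalization all three elements end in the same `">` plus newline sequence, so a match finder sees one closing pattern instead of three.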

    Just to be clear, GLZA does use lots of memory and CPU for compression compared to most or all LZxx compressors. Decompression characteristics are much better; if there's a problem on an Android, then it's something I did wrong in the coding. In the medium to long run, I could see where GLZ could be very useful for decompressing web data. In the short term, it's probably a little immature. Compression ratios are likely to improve at least a little over time, speed can be improved, code can be cleaner, etc.
    Oh I didn't mean to suggest that there was anything wrong with GLZA, on Android or any other platform. My point was a general one. I have no idea how GLZA performs in terms of decomp resources, and I wholly agree that it's too early to hold GLZA to high standards for efficiency as a web compression format. I think it has great potential. One other consideration on smart string matching is that with the web, where you have both HTML and CSS content referring to the same elements or objects in a tree like the DOM, there's an opportunity to bundle the attributes and style directives together on the encoding of the relevant element. A compressor that did this would eliminate a lot of bloat and redundancy in how a CSS file repeats strings from an HTML file.

    FYI, Mahoney's LTCB page says that GLZA is not a general-purpose compressor and is only designed to compress enwik9. I take it things have changed and this is no longer the case? You might want to have him update that passage.

  39. 27th September 2016, 10:48 #810
    Member

    GLZA v0.8

    GLZA v0.8 includes the following changes compared to v0.7.1:

    1. Bug fix for files that start with three or more capital letters and "bug" fix for incorrect delta filter decision math that made lzt24 compression worse than it should be.
    2. There are 17 dictionaries and models for extended UTF-8 symbols instead of 1 so that Greek, Latin, Cyrillic, Hebrew, etc. can have unique trailing/leading symbol models. This should improve compression of multi-lingual files and it helps on the wiki's, but my test set is limited.
    3. Lots of little changes that sometimes give faster compression, typically just a few percent, but up to about 500% faster on some of my test files.
    4. Added -c#, -p#, and -r# command line options back in: c is the production cost in bits, p is a factor used to favor longer strings over the most compressive ones, and r sets the compression memory use in MB.

    Results for enwik8:
    GLZA c enwik8 enwik8.glza: 20,472,828 bytes in 431 sec., 3,321 MB; decompress 1.8 sec., 47 MB.
    GLZA c -p3 enwik8 enwik8.glza: 20,442,490 bytes in 493 sec., 3,257 MB; decompress 1.8 sec., 48 MB.

    Results for enwik9:
    GLZA c enwik9 enwik9.glza: 164,943,294 bytes in 9,328 sec., 12,673 MB; decompress 15.8 sec., 363 MB.
    GLZA c -p3 enwik9 enwik9.glza: 164,634,038 bytes in 10,106 sec., 12,369 MB; decompress 15.8 sec., 364 MB.

    The source code in a .zip file is 64,327 bytes so enwik9 (-p3) + code = 164,698,365 bytes. Matt, if you read this, could you also update the description as SolidComp mentions to indicate GLZA is general purpose (but most effective on text)?
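
    For anyone who wants to check the arithmetic, here is a short Python sketch using the figures above. The only number not taken from this post is the enwik9 size (10^9 bytes); bits per character is added as a convenience and was not reported in the post.

```python
ENWIK9_BYTES = 1_000_000_000   # enwik9 is 10^9 bytes
compressed = 164_634_038       # GLZA c -p3 result from this post
source_zip = 64_327            # zipped source code size from this post

total = compressed + source_zip
bits_per_char = compressed * 8 / ENWIK9_BYTES
percent = 100 * compressed / ENWIK9_BYTES

print(total)                      # 164698365
print(round(bits_per_char, 3))    # 1.317
print(round(percent, 2))          # 16.46
```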

    Top twenty results by compression ratio on the ten wiki's from the xml test in Stephan Busch's Squeezechart with my custom lzbench:

    Code:
    Compressor name Compress. Decompress. Compr. size Ratio Filename
    glza 0.8 0.17 MB/s 77 MB/s 15111331 15.11 arwiki-20090209-pages-articles.xml
    xz 5.2.2 -9 1.88 MB/s 127 MB/s 17801068 17.80 arwiki-20090209-pages-articles.xml
    lzlib 1.7 -9 1.61 MB/s 92 MB/s 17898415 17.90 arwiki-20090209-pages-articles.xml
    csc 3.3 -5 2.17 MB/s 118 MB/s 18468350 18.47 arwiki-20090209-pages-articles.xml
    zstd 0.8.0 -22 1.87 MB/s 595 MB/s 18666760 18.67 arwiki-20090209-pages-articles.xml
    lzma 9.38 -5 2.27 MB/s 133 MB/s 18892558 18.89 arwiki-20090209-pages-articles.xml
    tornado 0.6a -16 1.84 MB/s 277 MB/s 18927009 18.93 arwiki-20090209-pages-articles.xml
    xz 5.2.2 -6 2.42 MB/s 123 MB/s 18943046 18.94 arwiki-20090209-pages-articles.xml
    lzlib 1.7 -6 2.28 MB/s 90 MB/s 19157260 19.16 arwiki-20090209-pages-articles.xml
    brotli 0.4.0 -11 0.56 MB/s 599 MB/s 19455349 19.46 arwiki-20090209-pages-articles.xml
    zstd 0.8.0 -18 3.66 MB/s 796 MB/s 20786291 20.79 arwiki-20090209-pages-articles.xml
    tornado 0.6a -13 6.21 MB/s 255 MB/s 21622792 21.62 arwiki-20090209-pages-articles.xml
    tornado 0.6a -10 7.39 MB/s 251 MB/s 22082206 22.08 arwiki-20090209-pages-articles.xml
    csc 3.3 -3 7.34 MB/s 81 MB/s 22085718 22.09 arwiki-20090209-pages-articles.xml
    zling 2016-01-10 -4 48 MB/s 245 MB/s 22407488 22.41 arwiki-20090209-pages-articles.xml
    lzham 1.0 -d26 -1 3.20 MB/s 305 MB/s 22416942 22.42 arwiki-20090209-pages-articles.xml
    zling 2016-01-10 -3 61 MB/s 244 MB/s 22758777 22.76 arwiki-20090209-pages-articles.xml
    zstd 0.8.0 -15 7.55 MB/s 838 MB/s 23045717 23.05 arwiki-20090209-pages-articles.xml
    brotli 0.4.0 -8 11 MB/s 577 MB/s 23319342 23.32 arwiki-20090209-pages-articles.xml
    Code:
    Compressor name Compress. Decompress. Compr. size Ratio Filename
    glza 0.8 0.19 MB/s 58 MB/s 18432446 18.43 dewiki-20090311-pages-articles.xml
    xz 5.2.2 -9 1.76 MB/s 108 MB/s 22337284 22.34 dewiki-20090311-pages-articles.xml
    lzlib 1.7 -9 1.59 MB/s 77 MB/s 22559912 22.56 dewiki-20090311-pages-articles.xml
    zstd 0.8.0 -22 1.81 MB/s 505 MB/s 23012645 23.01 dewiki-20090311-pages-articles.xml
    tornado 0.6a -16 1.83 MB/s 241 MB/s 23367196 23.37 dewiki-20090311-pages-articles.xml
    lzma 9.38 -5 2.11 MB/s 113 MB/s 23383008 23.38 dewiki-20090311-pages-articles.xml
    csc 3.3 -5 2.66 MB/s 89 MB/s 23499178 23.50 dewiki-20090311-pages-articles.xml
    xz 5.2.2 -6 2.31 MB/s 103 MB/s 23927507 23.93 dewiki-20090311-pages-articles.xml
    lzlib 1.7 -6 2.17 MB/s 77 MB/s 24066730 24.07 dewiki-20090311-pages-articles.xml
    brotli 0.4.0 -11 0.63 MB/s 504 MB/s 24843180 24.84 dewiki-20090311-pages-articles.xml
    zstd 0.8.0 -18 3.54 MB/s 684 MB/s 25519360 25.52 dewiki-20090311-pages-articles.xml
    tornado 0.6a -13 5.73 MB/s 236 MB/s 25539161 25.54 dewiki-20090311-pages-articles.xml
    tornado 0.6a -10 6.90 MB/s 217 MB/s 26153155 26.15 dewiki-20090311-pages-articles.xml
    csc 3.3 -3 6.79 MB/s 78 MB/s 26344037 26.34 dewiki-20090311-pages-articles.xml
    lzham 1.0 -d26 -1 2.82 MB/s 248 MB/s 27006473 27.01 dewiki-20090311-pages-articles.xml
    tornado 0.6a -7 20 MB/s 225 MB/s 27527486 27.53 dewiki-20090311-pages-articles.xml
    zling 2016-01-10 -4 43 MB/s 196 MB/s 27540390 27.54 dewiki-20090311-pages-articles.xml
    zling 2016-01-10 -3 52 MB/s 205 MB/s 27835794 27.84 dewiki-20090311-pages-articles.xml
    zstd 0.8.0 -15 7.76 MB/s 713 MB/s 28192801 28.19 dewiki-20090311-pages-articles.xml
    zling 2016-01-10 -2 62 MB/s 201 MB/s 28309613 28.31 dewiki-20090311-pages-articles.xml
    Code:
    Compressor name Compress. Decompress. Compr. size Ratio Filename
    glza 0.8 0.15 MB/s 54 MB/s 20663809 20.66 enwiki-20090306-pages-articles.xml
    csc 3.3 -5 3.20 MB/s 76 MB/s 24730845 24.73 enwiki-20090306-pages-articles.xml
    xz 5.2.2 -9 1.65 MB/s 94 MB/s 24897946 24.90 enwiki-20090306-pages-articles.xml
    lzlib 1.7 -9 1.56 MB/s 71 MB/s 25193126 25.19 enwiki-20090306-pages-articles.xml
    zstd 0.8.0 -22 1.72 MB/s 451 MB/s 25471920 25.47 enwiki-20090306-pages-articles.xml
    tornado 0.6a -16 1.75 MB/s 202 MB/s 25869767 25.87 enwiki-20090306-pages-articles.xml
    lzma 9.38 -5 1.96 MB/s 102 MB/s 25919628 25.92 enwiki-20090306-pages-articles.xml
    xz 5.2.2 -6 2.23 MB/s 92 MB/s 26385363 26.39 enwiki-20090306-pages-articles.xml
    lzlib 1.7 -6 2.06 MB/s 70 MB/s 26467752 26.47 enwiki-20090306-pages-articles.xml
    csc 3.3 -3 6.81 MB/s 71 MB/s 26623711 26.62 enwiki-20090306-pages-articles.xml
    brotli 0.4.0 -11 0.61 MB/s 460 MB/s 27035931 27.04 enwiki-20090306-pages-articles.xml
    zstd 0.8.0 -18 3.39 MB/s 645 MB/s 27729827 27.73 enwiki-20090306-pages-articles.xml
    tornado 0.6a -13 5.69 MB/s 218 MB/s 27948707 27.95 enwiki-20090306-pages-articles.xml
    csc 3.3 -1 20 MB/s 69 MB/s 28747189 28.75 enwiki-20090306-pages-articles.xml
    tornado 0.6a -10 6.23 MB/s 194 MB/s 28997414 29.00 enwiki-20090306-pages-articles.xml
    lzham 1.0 -d26 -1 2.74 MB/s 250 MB/s 29325456 29.33 enwiki-20090306-pages-articles.xml
    zling 2016-01-10 -4 42 MB/s 194 MB/s 29540358 29.54 enwiki-20090306-pages-articles.xml
    zling 2016-01-10 -3 49 MB/s 193 MB/s 29804829 29.80 enwiki-20090306-pages-articles.xml
    tornado 0.6a -7 19 MB/s 208 MB/s 30004796 30.00 enwiki-20090306-pages-articles.xml
    zstd 0.8.0 -15 7.81 MB/s 673 MB/s 30244817 30.24 enwiki-20090306-pages-articles.xml
    Code:
    Compressor name Compress. Decompress. Compr. size Ratio Filename
    glza 0.8 0.14 MB/s 57 MB/s 19487875 19.49 eswiki-20090124-pages-articles.xml
    xz 5.2.2 -9 1.67 MB/s 104 MB/s 22973553 22.97 eswiki-20090124-pages-articles.xml
    lzlib 1.7 -9 1.55 MB/s 76 MB/s 23257551 23.26 eswiki-20090124-pages-articles.xml
    zstd 0.8.0 -22 1.72 MB/s 479 MB/s 23659939 23.66 eswiki-20090124-pages-articles.xml
    tornado 0.6a -16 1.76 MB/s 233 MB/s 24081543 24.08 eswiki-20090124-pages-articles.xml
    lzma 9.38 -5 1.98 MB/s 109 MB/s 24101658 24.10 eswiki-20090124-pages-articles.xml
    csc 3.3 -5 2.22 MB/s 93 MB/s 24132085 24.13 eswiki-20090124-pages-articles.xml
    xz 5.2.2 -6 2.25 MB/s 99 MB/s 24604100 24.60 eswiki-20090124-pages-articles.xml
    lzlib 1.7 -6 2.07 MB/s 74 MB/s 24706819 24.71 eswiki-20090124-pages-articles.xml
    brotli 0.4.0 -11 0.60 MB/s 479 MB/s 25323130 25.32 eswiki-20090124-pages-articles.xml
    zstd 0.8.0 -18 3.40 MB/s 670 MB/s 26161278 26.16 eswiki-20090124-pages-articles.xml
    tornado 0.6a -13 5.75 MB/s 228 MB/s 26379024 26.38 eswiki-20090124-pages-articles.xml
    tornado 0.6a -10 6.55 MB/s 207 MB/s 27067931 27.07 eswiki-20090124-pages-articles.xml
    csc 3.3 -3 6.23 MB/s 79 MB/s 27453650 27.45 eswiki-20090124-pages-articles.xml
    lzham 1.0 -d26 -1 2.86 MB/s 262 MB/s 27968904 27.97 eswiki-20090124-pages-articles.xml
    tornado 0.6a -7 19 MB/s 220 MB/s 28386862 28.39 eswiki-20090124-pages-articles.xml
    zling 2016-01-10 -4 43 MB/s 200 MB/s 28438083 28.44 eswiki-20090124-pages-articles.xml
    zstd 0.8.0 -15 7.74 MB/s 705 MB/s 28707523 28.71 eswiki-20090124-pages-articles.xml
    zling 2016-01-10 -3 51 MB/s 200 MB/s 28724958 28.72 eswiki-20090124-pages-articles.xml
    brotli 0.4.0 -8 9.98 MB/s 541 MB/s 29074937 29.07 eswiki-20090124-pages-articles.xml
    Code:
    Compressor name Compress. Decompress. Compr. size Ratio Filename
    glza 0.8 0.18 MB/s 59 MB/s 18718785 18.72 frwiki-20090224-pages-articles.xml
    xz 5.2.2 -9 1.77 MB/s 108 MB/s 21992043 21.99 frwiki-20090224-pages-articles.xml
    lzlib 1.7 -9 1.60 MB/s 79 MB/s 22236448 22.24 frwiki-20090224-pages-articles.xml
    csc 3.3 -5 2.05 MB/s 105 MB/s 22430756 22.43 frwiki-20090224-pages-articles.xml
    zstd 0.8.0 -22 1.79 MB/s 506 MB/s 22741082 22.74 frwiki-20090224-pages-articles.xml
    lzma 9.38 -5 2.10 MB/s 114 MB/s 23059972 23.06 frwiki-20090224-pages-articles.xml
    tornado 0.6a -16 1.82 MB/s 242 MB/s 23137970 23.14 frwiki-20090224-pages-articles.xml
    xz 5.2.2 -6 2.34 MB/s 101 MB/s 23412050 23.41 frwiki-20090224-pages-articles.xml
    lzlib 1.7 -6 2.17 MB/s 77 MB/s 23546325 23.55 frwiki-20090224-pages-articles.xml
    brotli 0.4.0 -11 0.62 MB/s 501 MB/s 24094280 24.09 frwiki-20090224-pages-articles.xml
    zstd 0.8.0 -18 3.53 MB/s 698 MB/s 25031960 25.03 frwiki-20090224-pages-articles.xml
    tornado 0.6a -13 5.81 MB/s 236 MB/s 25363574 25.36 frwiki-20090224-pages-articles.xml
    tornado 0.6a -10 6.79 MB/s 217 MB/s 25972505 25.97 frwiki-20090224-pages-articles.xml
    csc 3.3 -3 6.41 MB/s 86 MB/s 26377779 26.38 frwiki-20090224-pages-articles.xml
    lzham 1.0 -d26 -1 2.87 MB/s 277 MB/s 26777656 26.78 frwiki-20090224-pages-articles.xml
    zling 2016-01-10 -4 44 MB/s 207 MB/s 27150257 27.15 frwiki-20090224-pages-articles.xml
    tornado 0.6a -7 20 MB/s 227 MB/s 27305793 27.31 frwiki-20090224-pages-articles.xml
    zling 2016-01-10 -3 52 MB/s 205 MB/s 27435686 27.44 frwiki-20090224-pages-articles.xml
    zstd 0.8.0 -15 7.93 MB/s 725 MB/s 27481008 27.48 frwiki-20090224-pages-articles.xml
    brotli 0.4.0 -8 10 MB/s 550 MB/s 27778927 27.78 frwiki-20090224-pages-articles.xml
    Code:
    Compressor name Compress. Decompress. Compr. size Ratio Filename
    glza 0.8 0.14 MB/s 100 MB/s 9960057 9.96 hiwiki-20090201-pages-articles.xml
    lzlib 1.7 -9 1.70 MB/s 125 MB/s 11841707 11.84 hiwiki-20090201-pages-articles.xml
    xz 5.2.2 -9 2.64 MB/s 180 MB/s 11987284 11.99 hiwiki-20090201-pages-articles.xml
    zstd 0.8.0 -22 2.07 MB/s 930 MB/s 12584945 12.58 hiwiki-20090201-pages-articles.xml
    xz 5.2.2 -6 3.16 MB/s 174 MB/s 12707044 12.71 hiwiki-20090201-pages-articles.xml
    tornado 0.6a -16 2.19 MB/s 394 MB/s 12786422 12.79 hiwiki-20090201-pages-articles.xml
    csc 3.3 -5 3.47 MB/s 163 MB/s 12794301 12.79 hiwiki-20090201-pages-articles.xml
    brotli 0.4.0 -11 0.62 MB/s 796 MB/s 12939409 12.94 hiwiki-20090201-pages-articles.xml
    lzlib 1.7 -6 3.16 MB/s 119 MB/s 13096161 13.10 hiwiki-20090201-pages-articles.xml
    lzma 9.38 -5 3.49 MB/s 185 MB/s 13111327 13.11 hiwiki-20090201-pages-articles.xml
    zstd 0.8.0 -18 5.04 MB/s 1101 MB/s 14466596 14.47 hiwiki-20090201-pages-articles.xml
    tornado 0.6a -10 10 MB/s 367 MB/s 14756446 14.76 hiwiki-20090201-pages-articles.xml
    tornado 0.6a -13 7.80 MB/s 350 MB/s 14903736 14.90 hiwiki-20090201-pages-articles.xml
    zstd 0.8.0 -15 8.63 MB/s 1139 MB/s 15476283 15.48 hiwiki-20090201-pages-articles.xml
    brotli 0.4.0 -8 19 MB/s 786 MB/s 15597658 15.60 hiwiki-20090201-pages-articles.xml
    zling 2016-01-10 -4 75 MB/s 343 MB/s 15662654 15.66 hiwiki-20090201-pages-articles.xml
    lzham 1.0 -d26 -1 3.72 MB/s 421 MB/s 15666744 15.67 hiwiki-20090201-pages-articles.xml
    csc 3.3 -3 11 MB/s 106 MB/s 15750979 15.75 hiwiki-20090201-pages-articles.xml
    xz 5.2.2 -3 14 MB/s 141 MB/s 15880497 15.88 hiwiki-20090201-pages-articles.xml
    zling 2016-01-10 -3 94 MB/s 341 MB/s 15933596 15.93 hiwiki-20090201-pages-articles.xml
    Code:
    Compressor name Compress. Decompress. Compr. size Ratio Filename
    glza 0.8 0.13 MB/s 58 MB/s 17633722 17.63 ptwiki-20090128-pages-articles.xml
    xz 5.2.2 -9 1.89 MB/s 112 MB/s 20962447 20.96 ptwiki-20090128-pages-articles.xml
    lzlib 1.7 -9 1.63 MB/s 82 MB/s 21157830 21.16 ptwiki-20090128-pages-articles.xml
    csc 3.3 -5 2.22 MB/s 110 MB/s 21387628 21.39 ptwiki-20090128-pages-articles.xml
    zstd 0.8.0 -22 1.79 MB/s 532 MB/s 21611704 21.61 ptwiki-20090128-pages-articles.xml
    tornado 0.6a -16 1.85 MB/s 254 MB/s 21990066 21.99 ptwiki-20090128-pages-articles.xml
    lzma 9.38 -5 2.27 MB/s 118 MB/s 22140923 22.14 ptwiki-20090128-pages-articles.xml
    xz 5.2.2 -6 2.50 MB/s 106 MB/s 22401452 22.40 ptwiki-20090128-pages-articles.xml
    lzlib 1.7 -6 2.34 MB/s 80 MB/s 22598101 22.60 ptwiki-20090128-pages-articles.xml
    brotli 0.4.0 -11 0.61 MB/s 508 MB/s 22928058 22.93 ptwiki-20090128-pages-articles.xml
    tornado 0.6a -13 6.16 MB/s 247 MB/s 23956744 23.96 ptwiki-20090128-pages-articles.xml
    zstd 0.8.0 -18 3.79 MB/s 733 MB/s 24044270 24.04 ptwiki-20090128-pages-articles.xml
    tornado 0.6a -10 7.17 MB/s 227 MB/s 24693567 24.69 ptwiki-20090128-pages-articles.xml
    csc 3.3 -3 6.91 MB/s 91 MB/s 24879954 24.88 ptwiki-20090128-pages-articles.xml
    lzham 1.0 -d26 -1 2.98 MB/s 285 MB/s 25432622 25.43 ptwiki-20090128-pages-articles.xml
    zling 2016-01-10 -4 48 MB/s 219 MB/s 25704231 25.70 ptwiki-20090128-pages-articles.xml
    tornado 0.6a -7 21 MB/s 239 MB/s 25768559 25.77 ptwiki-20090128-pages-articles.xml
    zling 2016-01-10 -3 56 MB/s 218 MB/s 25959462 25.96 ptwiki-20090128-pages-articles.xml
    zstd 0.8.0 -15 8.32 MB/s 762 MB/s 26164752 26.16 ptwiki-20090128-pages-articles.xml
    brotli 0.4.0 -8 11 MB/s 545 MB/s 26172167 26.17 ptwiki-20090128-pages-articles.xml
    Code:
    Compressor name Compress. Decompress. Compr. size Ratio Filename
    glza 0.8 0.26 MB/s 82 MB/s 13723940 13.72 ruwiki-20081228-pages-articles.xml
    xz 5.2.2 -9 1.88 MB/s 139 MB/s 16379184 16.38 ruwiki-20081228-pages-articles.xml
    lzlib 1.7 -9 1.58 MB/s 94 MB/s 16559388 16.56 ruwiki-20081228-pages-articles.xml
    csc 3.3 -5 2.16 MB/s 130 MB/s 17094657 17.09 ruwiki-20081228-pages-articles.xml
    zstd 0.8.0 -22 1.83 MB/s 629 MB/s 17154468 17.15 ruwiki-20081228-pages-articles.xml
    tornado 0.6a -16 1.86 MB/s 301 MB/s 17444960 17.44 ruwiki-20081228-pages-articles.xml
    lzma 9.38 -5 2.29 MB/s 142 MB/s 17576244 17.58 ruwiki-20081228-pages-articles.xml
    xz 5.2.2 -6 2.36 MB/s 132 MB/s 17653495 17.65 ruwiki-20081228-pages-articles.xml
    lzlib 1.7 -6 2.25 MB/s 95 MB/s 17883170 17.88 ruwiki-20081228-pages-articles.xml
    brotli 0.4.0 -11 0.60 MB/s 643 MB/s 18283528 18.28 ruwiki-20081228-pages-articles.xml
    zstd 0.8.0 -18 3.65 MB/s 831 MB/s 19555354 19.56 ruwiki-20081228-pages-articles.xml
    tornado 0.6a -10 7.61 MB/s 276 MB/s 20069790 20.07 ruwiki-20081228-pages-articles.xml
    tornado 0.6a -13 5.99 MB/s 277 MB/s 20596619 20.60 ruwiki-20081228-pages-articles.xml
    zstd 0.8.0 -15 7.20 MB/s 866 MB/s 21593370 21.59 ruwiki-20081228-pages-articles.xml
    zling 2016-01-10 -4 47 MB/s 249 MB/s 21834612 21.83 ruwiki-20081228-pages-articles.xml
    lzham 1.0 -d26 -1 3.26 MB/s 315 MB/s 21886926 21.89 ruwiki-20081228-pages-articles.xml
    csc 3.3 -3 8.06 MB/s 85 MB/s 22092178 22.09 ruwiki-20081228-pages-articles.xml
    brotli 0.4.0 -8 12 MB/s 617 MB/s 22258010 22.26 ruwiki-20081228-pages-articles.xml
    zling 2016-01-10 -3 60 MB/s 246 MB/s 22388231 22.39 ruwiki-20081228-pages-articles.xml
    tornado 0.6a -7 25 MB/s 270 MB/s 22484142 22.48 ruwiki-20081228-pages-articles.xml
    Code:
    Compressor name Compress. Decompress. Compr. size Ratio Filename
    glza 0.8 0.16 MB/s 57 MB/s 17364165 17.36 trwiki-20090207-pages-articles.xml
    xz 5.2.2 -9 2.01 MB/s 113 MB/s 20105194 20.11 trwiki-20090207-pages-articles.xml
    lzlib 1.7 -9 1.65 MB/s 83 MB/s 20196223 20.20 trwiki-20090207-pages-articles.xml
    csc 3.3 -5 2.43 MB/s 110 MB/s 20598614 20.60 trwiki-20090207-pages-articles.xml
    zstd 0.8.0 -22 1.84 MB/s 546 MB/s 20884896 20.88 trwiki-20090207-pages-articles.xml
    lzma 9.38 -5 2.46 MB/s 119 MB/s 21260452 21.26 trwiki-20090207-pages-articles.xml
    tornado 0.6a -16 1.94 MB/s 252 MB/s 21307495 21.31 trwiki-20090207-pages-articles.xml
    xz 5.2.2 -6 2.64 MB/s 108 MB/s 21373835 21.37 trwiki-20090207-pages-articles.xml
    lzlib 1.7 -6 2.52 MB/s 81 MB/s 21627986 21.63 trwiki-20090207-pages-articles.xml
    brotli 0.4.0 -11 0.60 MB/s 515 MB/s 21815872 21.82 trwiki-20090207-pages-articles.xml
    tornado 0.6a -13 6.19 MB/s 248 MB/s 22985956 22.99 trwiki-20090207-pages-articles.xml
    zstd 0.8.0 -18 4.04 MB/s 725 MB/s 23301538 23.30 trwiki-20090207-pages-articles.xml
    tornado 0.6a -10 7.31 MB/s 230 MB/s 23804940 23.80 trwiki-20090207-pages-articles.xml
    csc 3.3 -3 7.39 MB/s 93 MB/s 23848749 23.85 trwiki-20090207-pages-articles.xml
    lzham 1.0 -d26 -1 2.99 MB/s 288 MB/s 24197793 24.20 trwiki-20090207-pages-articles.xml
    zling 2016-01-10 -4 49 MB/s 226 MB/s 24598592 24.60 trwiki-20090207-pages-articles.xml
    zling 2016-01-10 -3 57 MB/s 224 MB/s 24813797 24.81 trwiki-20090207-pages-articles.xml
    brotli 0.4.0 -8 11 MB/s 543 MB/s 24855628 24.86 trwiki-20090207-pages-articles.xml
    tornado 0.6a -7 21 MB/s 238 MB/s 24907785 24.91 trwiki-20090207-pages-articles.xml
    xz 5.2.2 -3 6.51 MB/s 96 MB/s 25078677 25.08 trwiki-20090207-pages-articles.xml
    Code:
    Compressor name Compress. Decompress. Compr. size Ratio Filename
    glza 0.8 0.22 MB/s 55 MB/s 21376704 21.38 zhwiki-20090116-pages-articles.xml
    xz 5.2.2 -9 2.07 MB/s 90 MB/s 25256625 25.26 zhwiki-20090116-pages-articles.xml
    lzlib 1.7 -9 1.77 MB/s 69 MB/s 25515468 25.52 zhwiki-20090116-pages-articles.xml
    csc 3.3 -5 2.63 MB/s 88 MB/s 25762986 25.76 zhwiki-20090116-pages-articles.xml
    zstd 0.8.0 -22 1.93 MB/s 438 MB/s 26285914 26.29 zhwiki-20090116-pages-articles.xml
    lzma 9.38 -5 2.53 MB/s 97 MB/s 26461362 26.46 zhwiki-20090116-pages-articles.xml
    tornado 0.6a -16 2.14 MB/s 207 MB/s 26754406 26.75 zhwiki-20090116-pages-articles.xml
    xz 5.2.2 -6 2.77 MB/s 87 MB/s 26876747 26.88 zhwiki-20090116-pages-articles.xml
    lzlib 1.7 -6 2.58 MB/s 67 MB/s 27044349 27.04 zhwiki-20090116-pages-articles.xml
    brotli 0.4.0 -11 0.59 MB/s 425 MB/s 27726251 27.73 zhwiki-20090116-pages-articles.xml
    tornado 0.6a -13 6.20 MB/s 203 MB/s 28199772 28.20 zhwiki-20090116-pages-articles.xml
    csc 3.3 -3 6.25 MB/s 77 MB/s 28433217 28.43 zhwiki-20090116-pages-articles.xml
    zstd 0.8.0 -18 4.48 MB/s 600 MB/s 29416075 29.42 zhwiki-20090116-pages-articles.xml
    lzham 1.0 -d26 -1 2.66 MB/s 247 MB/s 29506389 29.51 zhwiki-20090116-pages-articles.xml
    tornado 0.6a -10 6.42 MB/s 185 MB/s 30456354 30.46 zhwiki-20090116-pages-articles.xml
    zling 2016-01-10 -4 37 MB/s 177 MB/s 30735528 30.74 zhwiki-20090116-pages-articles.xml
    zling 2016-01-10 -3 42 MB/s 175 MB/s 30907553 30.91 zhwiki-20090116-pages-articles.xml
    csc 3.3 -1 20 MB/s 77 MB/s 31113233 31.11 zhwiki-20090116-pages-articles.xml
    brotli 0.4.0 -8 8.86 MB/s 426 MB/s 31175292 31.18 zhwiki-20090116-pages-articles.xml
    zling 2016-01-10 -2 47 MB/s 172 MB/s 31182794 31.18 zhwiki-20090116-pages-articles.xml
    GLZA's compressed files are 13-18% smaller than the #2 entry in each of the above cases. That is more consistent than I expected, considering all the different languages. There seems to be plenty of margin for a much faster, slightly less effective production rule generator and/or encoder/decoder.
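For anyone who wants to check the margin themselves, here is a minimal sketch of the arithmetic, using only the six tables visible on this page (frwiki through zhwiki). The sizes are copied from the "Compr. size" column; the dictionary layout and the helper name `percent_smaller` are just illustrative choices, not part of any benchmark tool.

```python
# Compressed sizes in bytes, copied from the "Compr. size" column above.
# Each entry is (GLZA 0.8 size, size of the #2 entry in that table).
sizes = {
    "frwiki": (18718785, 21992043),  # #2: xz 5.2.2 -9
    "hiwiki": (9960057, 11841707),   # #2: lzlib 1.7 -9
    "ptwiki": (17633722, 20962447),  # #2: xz 5.2.2 -9
    "ruwiki": (13723940, 16379184),  # #2: xz 5.2.2 -9
    "trwiki": (17364165, 20105194),  # #2: xz 5.2.2 -9
    "zhwiki": (21376704, 25256625),  # #2: xz 5.2.2 -9
}


def percent_smaller(best, runner_up):
    """How much smaller `best` is than `runner_up`, as a percentage of `runner_up`."""
    return 100.0 * (runner_up - best) / runner_up


savings = {name: percent_smaller(*pair) for name, pair in sizes.items()}

for name, pct in savings.items():
    print(f"{name}: GLZA output is {pct:.1f}% smaller than the #2 entry")
```

The percentage is taken relative to the runner-up's size, which is the reading of "smaller than the #2 entry" assumed here.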

