The binaries in buildtmp are v3.0, not v3.1. I wondered why there wasn't any improvement.
Fixed yet? Many (most?) people here don't want to (or can't!) mess with compiling Linux sources, so it would be good to keep the binaries/version numbers updated after every source code update. In the continued absence of the Win version, that's the best way to get feedback on pcompress.
Also, re the AVX2 Haswell issue: can't it be compiled with the fastest option, and then if AVX2 is not available the binary uses alternate code?
J
Fixed the download; it was my oversight. The binaries are now built on Ubuntu with AVX disabled, so they should run on non-Haswell processors. Also, the binaries are now version 3.1.
I have written explicit code in Pcompress where multiple hand-optimized (ASM or Intrinsics) variants exist for different processor capabilities and it detects the processor at runtime. This is called CPU dispatching. See for example the AES stuff which checks for AES-NI or Vector (VPAES) instruction capability:
https://github.com/moinakg/pcompress...ypto_aes.c#L81
This is the code which probes processor capabilities using the cpuid instruction at runtime:
https://github.com/moinakg/pcompress/blob/master/utils/cpuid.c
However, the GCC compiler can auto-vectorize normal C loops and arithmetic computations. In that case it just uses the maximum processor capability that has been specified (via optimization flags) and generates machine code for that. I think there is a GCC feature under development (or in bleeding-edge versions) that will generate multiple CPU-type variants plus the extra checks to do automatic CPU dispatching. The stable version in Ubuntu does not appear to have that capability.
I still get "illegal instruction" with the 3.1 Ubuntu binary. I'm testing on a Core i7 M620 (has sse4.2).
Strange. Let me check.
I rebuilt everything with "-march=core2" flag and uploaded new binaries. It should work now.
This one works. Nice improvement on the 10GB benchmark (system 4). http://mattmahoney.net/dc/10gb.html
I noticed some minor problems. Last-modified times were not restored on empty directories. Also, the download pc/pcompress script can't find "/home/moinakg" although I figured out how to install it anyway.
I added it to LTCB. http://mattmahoney.net/dc/text.html#1647
I am using OpenSUSE 13.1 x64 and copied the binaries to the same place from where I used to start the Pcompress 3.0 tests.
But if I type "./pcompress -a -l14 TEST_APP app.pz" in the terminal, pcompress runs with just 300k of memory, one CPU is busy permanently, but no file is created. What am I doing wrong?
Binaries built for Ubuntu may not work on OpenSUSE. It is probably going into some infinite loop early on. I will build binaries for OpenSUSE and provide a download in a couple of days.
OpenSUSE binary is here:
http://sourceforge.net/projects/bele...d?source=files
Thank you for providing the executable.
But I am afraid the same happens on my OpenSUSE 13.1 (x64): just a loop, and no archive is created.
The command line used is: time ./pcompress -a -l14 TEST_APP app.pz
I edited the wrapper script (the Testsets folder is where the testsets and the compiled files are located):
#!/bin/sh
PC_PATH="/home/stephan/Testsets"
LD_LIBRARY_PATH="${PC_PATH}"
export LD_LIBRARY_PATH
exec ${PC_PATH}/pcompress "$@"
I have updated the tarball in the same location. It was my mistake: I forgot to include the pcompress binary inside the "buildtmp" dir. If you delete that "buildtmp" dir, the script goes into an infinite loop of calling "exec" on itself.
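The exec loop described above can be guarded against in the wrapper itself. This is a hypothetical sketch adapting the script quoted earlier in the thread: the missing-binary check and the `run_pcompress` function name are my additions, and the `PC_PATH` default is the thread's example value:

```shell
#!/bin/sh
# Safer wrapper sketch. PC_PATH can be overridden from the environment;
# the default is the path used in this thread.
PC_PATH="${PC_PATH:-/home/stephan/Testsets}"

run_pcompress() {
    # If the real binary is missing (e.g. the buildtmp dir was deleted),
    # fail loudly instead of letting exec resolve back to this wrapper
    # and loop forever.
    if [ ! -x "${PC_PATH}/pcompress" ]; then
        echo "pcompress binary not found in ${PC_PATH}" >&2
        return 1
    fi
    LD_LIBRARY_PATH="${PC_PATH}" exec "${PC_PATH}/pcompress" "$@"
}
```

With the guard in place, a deleted or misplaced binary produces a clear error message and a nonzero exit status instead of a silent 100%-CPU loop.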
After extracting this tarball you only have to edit the PC_PATH variable.
This build finally works.
I have put the results online. We have mixed results compared to the previous version:
we lose much compression on the Camera, PGM/PPM and XML testsets.
No audio compression / delta compression seems to have been applied.
Re the latest Squeeze Chart results, said to be 'mixed' compared with the previous version: any idea why? J
Some results are better than 3.0 and some are worse.
Have to check. Some of the newer transforms may not be playing well in all cases or some thresholds need to be tweaked.
I would like to test pcompress directly (WCC), but I use Windows as my main OS. Is there any software that allows me to emulate a Linux executable on Windows?
Nothing like Wine in reverse, but if you have a reasonably modern PC you can use VirtualBox: free, works well on Win 64-bit with a few cores and a few GB of memory, fair performance. I use it all the time, e.g. for running Ubuntu on Win 7. It'll run other Linuxes too.
Of course we'd really like the fabled and delayed Win port. Anyone know why, with moinakg being a Linux/C++ (and other computer stuff) guru (IMO), it isn't really easy for him to port to Windows? An afternoon's work to make a beta, surely? There is only memory access and file I/O, plus maybe some OS specifics like finding the number of cores; the rest is C++? The only thing I can think of is that pcompress has a rather large number of separate files, and maybe uses Linux libraries such as WavPack and many others; these could be omitted from a beta. j
Having developed zpaq for Windows and Linux, I can tell you that supporting both OS's is not something you can fix in a day. For a single file compressor and single threading, yes. For an archiver, no.
Matt, yes, maybe I should have said an alpha version to get started, in an afternoon... but I yield to your superior knowledge. I was of course trying to be a bit controversial. Anyway, it shouldn't take more than a few days! J
ok, just find these few days yourself
Don't be silly: not a few days for a non-compression/C++/Linux/Windows guru, but a few hundred, I suspect! j
ok, you have one more week to become a guru. i believe it should be easy for smart guy like you! :)
Stop misrepresenting what I say - as we all know to be a guru takes a lifetime. For the author, a few days I still think - I accept that free time is difficult to find for all of us. Still can't understand why it should take so long.
BTW, while on this related subject, is there a 64-bit port of FreeArc? Surely that'd take less than a few days, for the guru...
Just saying pcompress source code is about 500 files. Doing anything at all to it won't be simple.
In zpaq to handle Windows and Linux I need separate code for multi-threading, directory traversal and creation, file deletion, reading and setting file dates and attributes and permissions, handling Unicode filenames, detecting number of processors, memory, and system time, error reporting, reading from the terminal, getting file sizes and seeking over 2 GB, random number generation, and interfacing to assembler. The archive format has to represent file names, dates, attributes, and permissions in an OS neutral format and know how to handle Windows alternate streams, junctions, links, and ACLs, as well as Linux devices, pipes, sockets, symbolic links, and hard links and what to do if you compress in one OS and extract in the other. And that's just 10K lines in 3 source files.
The compression and encryption code is about the only thing that is not OS dependent. In pcompress, most of the compression comes from other libraries (zlib, lzma, bsc, bzip2, lz4, lzfx) and likewise for encryption. So basically, a Windows port would be like writing the whole thing all over again. I think it is safe to say that it won't happen.
This seems like it should be a solved problem. For unix-alike systems, the tar format is pretty much standard and supports all kinds of filesystem metadata. It is incorporated wholesale into common Unix compressed formats, like .tar.gz, .tar.bz2, etc. Incorporating tar into pcompress or zpaq archives might not be feasible for some reason, but it would be a good idea to emulate tar's behavior wherever possible. I'm not sure if there's anything like a tar equivalent for Windows.
To the extent that this problem remains unsolved, you have to wonder if perhaps it's not a good idea.
Yes, there is GNU tar for Windows, which solves many of these problems.
And BTW, why not use an FB, except for its early alpha status? I think it has the same main feature as pcompress: joining files together and running dedup + compression... although OTOH zpaq is even better.
Thanks for clarifying the details. The system-specific stuff is the bulk of the porting work that needs to be done. However, in Pcompress I am using a modified copy of libarchive for archiving in the pax-extended format (which supersedes the tar format), and it already handles all the archiving issues in a platform-neutral way. Libarchive builds and works on Windows, so I do not have much trouble on that front.
The trouble I can see is the assembler code in the bundled crypto routines (AES, Salsa20, BLAKE2, SHA2). Yasm works on Windows, but I will have to build using my patched version of Yasm. All the crypto routines and most of the compression code are in the source tree. The only external dependencies involve WavPack, Zlib and Bzip2. So yes, there is a bit of work involved, but not quite the massive amount that you mentioned.