The binaries in buildtmp are v3.0, not v3.1. I wondered why there wasn't any improvement.
Fixed yet? Many (most?) people here don't want to (or can't!) mess with compiling Linux sources, so it would be good to keep the binaries/version numbers updated after every source code update. In the continued absence of the Win version, that's the best way to get feedback on pcompress.
Also, re the AVX2 Haswell issue: can't it be compiled with the fastest option, and then if AVX2 is not available the binary uses alternate code?
J
Fixed the download; it was my oversight. The binaries are now built on Ubuntu with AVX disabled, so they should run on non-Haswell processors. Also, the binaries are now version 3.1.
I have written explicit code in Pcompress where multiple hand-optimized (ASM or Intrinsics) variants exist for different processor capabilities and it detects the processor at runtime. This is called CPU dispatching. See for example the AES stuff which checks for AES-NI or Vector (VPAES) instruction capability:
https://github.com/moinakg/pcompress...ypto_aes.c#L81
This is the code which probes processor capabilities using the cpuid instruction at runtime:
https://github.com/moinakg/pcompress/blob/master/utils/cpuid.c
However, the GCC compiler can auto-vectorize normal C loops and arithmetic computations. In that case it just uses the maximum processor capability that has been specified (via optimization flags) and generates machine code for that. I think there is a GCC feature under development (or in bleeding-edge versions) that will generate multiple CPU-type variants plus the extra checks to do automatic CPU dispatching. The stable version in Ubuntu does not appear to have that capability.
I still get "illegal instruction" with the 3.1 Ubuntu binary. I'm testing on a Core i7 M620 (has sse4.2).
Strange. Let me check.
I rebuilt everything with "-march=core2" flag and uploaded new binaries. It should work now.
This one works. Nice improvement on the 10GB benchmark (system 4). http://mattmahoney.net/dc/10gb.html
I noticed some minor problems. Last-modified times were not restored on empty directories. Also, the download pc/pcompress script can't find "/home/moinakg" although I figured out how to install it anyway.
I added it to LTCB. http://mattmahoney.net/dc/text.html#1647
I am using OpenSUSE 13.1 x64 and copied the binaries to the same place from where I used to start the Pcompress 3.0 tests.
But if I type "./pcompress -a -l14 TEST_APP app.pz" in the terminal, pcompress runs with just 300k of memory, one CPU is busy permanently, but no file is created. What am I doing wrong?
Binaries built for Ubuntu may not work on OpenSUSE. It is probably going into some infinite loop early on. I will build binaries for OpenSUSE and provide a download in a couple of days.
OpenSUSE binary is here:
http://sourceforge.net/projects/bele...d?source=files
Thank you for providing the executable.
But I am afraid the same happens on my OpenSUSE 13.1 (x64): just a loop, and no archive is created.
The command line used is: time ./pcompress -a -l14 TEST_APP app.pz
I edited the wrapper script (the Testsets folder is where the testsets and the compiled files are located):
#!/bin/sh
PC_PATH="/home/stephan/Testsets"
LD_LIBRARY_PATH="${PC_PATH}"
export LD_LIBRARY_PATH
exec ${PC_PATH}/pcompress "$@"
I have updated the tarball in the same location. It was my mistake: I forgot to include the pcompress binary inside the "buildtmp" dir. If you delete that "buildtmp" dir, the script goes into an infinite loop of calling "exec" on itself.
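The exec loop described above can be guarded against in the wrapper itself. This is a hypothetical sketch adapting the script quoted earlier in the thread: the missing-binary check and the `run_pcompress` function name are my additions, and the `PC_PATH` default is the thread's example value:

```shell
#!/bin/sh
# Safer wrapper sketch. PC_PATH can be overridden from the environment;
# the default is the path used in this thread.
PC_PATH="${PC_PATH:-/home/stephan/Testsets}"

run_pcompress() {
    # If the real binary is missing (e.g. the buildtmp dir was deleted),
    # fail loudly instead of letting exec resolve back to this wrapper
    # and loop forever.
    if [ ! -x "${PC_PATH}/pcompress" ]; then
        echo "pcompress binary not found in ${PC_PATH}" >&2
        return 1
    fi
    LD_LIBRARY_PATH="${PC_PATH}" exec "${PC_PATH}/pcompress" "$@"
}
```

With the guard in place, a deleted or misplaced binary produces a clear error message and a nonzero exit status instead of a silent 100%-CPU loop.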
After extracting this tarball you only have to edit the PC_PATH variable.
This build finally works.
I have put the results online. We have mixed results compared to the previous version:
we lose much compression on the Camera, PGM/PPM and XML testsets.
No audio compression / delta compression seems to have been applied.
Re the latest Squeeze Chart results, said to be 'mixed' compared with the previous version: any idea why? J
Some results are better than 3.0 and some are worse.
Have to check. Some of the newer transforms may not be playing well in all cases or some thresholds need to be tweaked.
I would like to test pcompress directly (WCC), but I use Windows as my main OS. Is there any software that allows me to emulate a Linux executable on Windows?
Nothing like Wine in reverse, but if you have a reasonably modern PC you can use VirtualBox: free, works well on Win 64-bit with a few cores and a few GB of memory, fair performance. I use it all the time, e.g. for running Ubuntu on Win 7. It'll run other Linuxes too.
Of course we'd really like the fabled and delayed Win port. Anyone know why, with moinakg being a Linux/C++ (and other computer stuff) guru (IMO), it isn't really easy for him to port to Windows? An afternoon's work to make a beta, surely? There is only memory access and file I/O, plus maybe some OS specifics like finding the number of cores; the rest is C++? The only thing I can think of is that pcompress has a rather large number of separate files, and maybe uses Linux libraries such as WavPack and many others; these could be omitted from a beta. j
Having developed zpaq for Windows and Linux, I can tell you that supporting both OS's is not something you can fix in a day. For a single file compressor and single threading, yes. For an archiver, no.
Matt, yes, maybe I should have said an alpha version to get started, in an afternoon... but I yield to your superior knowledge. I was of course trying to be a bit controversial. Anyway, it shouldn't take more than a few days! J
ok, just find these few days yourself
Don't be silly: not a few days for a non-compression/C++/Linux/Windows guru, but a few hundred, I suspect! j
ok, you have one more week to become a guru. i believe it should be easy for smart guy like you! :)
Stop misrepresenting what I say - as we all know to be a guru takes a lifetime. For the author, a few days I still think - I accept that free time is difficult to find for all of us. Still can't understand why it should take so long.
BTW, while on this related subject, is there a 64-bit port of FreeArc? Surely that'd take less than a few days, for the guru...
Just saying pcompress source code is about 500 files. Doing anything at all to it won't be simple.
In zpaq to handle Windows and Linux I need separate code for multi-threading, directory traversal and creation, file deletion, reading and setting file dates and attributes and permissions, handling Unicode filenames, detecting number of processors, memory, and system time, error reporting, reading from the terminal, getting file sizes and seeking over 2 GB, random number generation, and interfacing to assembler. The archive format has to represent file names, dates, attributes, and permissions in an OS neutral format and know how to handle Windows alternate streams, junctions, links, and ACLs, as well as Linux devices, pipes, sockets, symbolic links, and hard links and what to do if you compress in one OS and extract in the other. And that's just 10K lines in 3 source files.
The compression and encryption code is about the only thing that is not OS dependent. In pcompress, most of the compression comes from other libraries (zlib, lzma, bsc, bzip2, lz4, lzfx) and likewise for encryption. So basically, a Windows port would be like writing the whole thing all over again. I think it is safe to say that it won't happen.
This seems like it should be a solved problem. For unix-alike systems, the tar format is pretty much standard and supports all kinds of filesystem metadata. It is incorporated wholesale into common Unix compressed formats, like .tar.gz, .tar.bz2, etc. Incorporating tar into pcompress or zpaq archives might not be feasible for some reason, but it would be a good idea to emulate tar's behavior wherever possible. I'm not sure if there's anything like a tar equivalent for Windows.
To the extent that this problem remains unsolved, you have to wonder if perhaps it's not a good idea.
Yes, there is GNU tar for Windows, which solves many of these problems.
And BTW, why not use an FB, except for its early alpha status? I think it has the same main feature as pcompress: joining files together and running dedup + compression... although OTOH zpaq is even better.
Thanks for clarifying the details. The system-specific stuff is the bulk of the porting work that needs to be done. However, in Pcompress I am using a modified copy of libarchive for archiving in the pax-extended format (which supersedes the tar format), and it already handles all the archiving issues in a platform-neutral way. Libarchive builds and works on Windows, so I do not have much trouble on that front.
The trouble I can see is the assembler code in the bundled crypto routines (AES, Salsa20, BLAKE2, SHA2). Yasm works on Windows, but I will have to build using my patched version of Yasm. All the crypto routines and most of the compression code are in the source tree. The only external dependencies involve WavPack, Zlib and Bzip2. So yes, there is a bit of work involved, but not quite the massive amount that you mentioned.