Quote Originally Posted by Kennon Conrad View Post:
Memory mapping seems like a good idea, but from what I am seeing it is not available with standard MinGW 32 bit gcc. But then again, the time to move up from 32 bit gcc under Windows may be approaching.

I'm not very familiar with MinGW, but memory-mapping should be part of the Windows API, so if you can call Windows API functions, it should just work. The compiler doesn't need to support it. I've built and run code that uses mmap in Cygwin and it worked fine.
I don't know a whole lot about how the Windows OS handles I/O. I do know that writing one giant buffer doesn't work so well. Tree did that originally and had a pretty long stall at the end to flush the buffer, much like I seem to be seeing with LzTurbo.
I/O isn't supposed to be difficult. All that's really important is that buffering exists at various levels and reads and writes go to and from the device in blocks. If you call system calls directly, you should read and write in chunks (ideally multiples of the system's block size, which is always a power of 2); the reason is that system calls are expensive and too many will cause a performance hit. If you use stdio, you shouldn't have to think about this. Buffering a huge amount and writing it all at once uses lots of memory and it cuts off the opportunity to pipeline (it takes time to physically write data to the device and this can happen in parallel with computation if you start writing while the computation is still underway).
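A minimal sketch of the chunked pattern just described, assuming a 64 KB chunk size (any generous power-of-two multiple of the device block size works):
Code:
#include <stdio.h>
#include <stdlib.h>

#define CHUNK (1 << 16) /* 64 KB: a power-of-two multiple of typical block sizes */

/* Copy a stream in fixed-size chunks: few system calls, bounded memory,
 and the OS can overlap physical writes with the next read. */
int copy_chunked(FILE *in, FILE *out)
{
 char *buf = malloc(CHUNK);
 size_t n;
 if (buf == NULL)
 return -1;
 while ((n = fread(buf, 1, CHUNK, in)) > 0) {
 if (fwrite(buf, 1, n, out) != n) {
 free(buf);
 return -1;
 }
 }
 free(buf);
 return ferror(in) ? -1 : 0;
}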
Quote Originally Posted by nburns View Post:
I'm not very familiar with MinGW, but memory-mapping should be part of the Windows API, so if you can call Windows API functions, it should just work. The compiler doesn't need to support it. I've built and run code that uses mmap in Cygwin and it worked fine.

I saw this on MinGW's site: http://www.mingw.org/node/21.
Quote Originally Posted by nburns View Post:
I/O isn't supposed to be difficult. All that's really important is that buffering exists at various levels and reads and writes go to and from the device in blocks. ...

Yes, I/O shouldn't be difficult. That's why the 7 second stall when decompressing with LzTurbo surprises me.
I was testing tree v0.12 in 64 bit Linux. On a small text file (enwik6, 1 MB), TreeCompress.exe under Wine 1.6 gives an out of memory error. Next I compiled with gcc 4.8.2 -O3 and I get a segmentation fault. Then I tried options -O3 -m32 and finally it runs, but enwik8 gives fast but poor compression and enwik9 gives an error (symbol limit exceeded). Here is the output:
Code:
matt@matt-Latitude-E6510:~$ testtree enwik8
TreePreEncode enwik8 enwik8.tpre
Pre and capital encoding 100000000 bytes
Wrote 1 byte header and 103809101 encoded bytes in 450.853 seconds.
real 0m0.454s
user 0m0.333s
sys 0m0.120s
TreeCompress enwik8.tpre enwik8.tcom
Allocated 1778384896 bytes for data processing
Read 103809102 byte input file
cap encoded: 1, UTF8 compliant 1
Found 103430933 symbols, 0 defines, maximum unicode value 0x28b46
1: 103430933 syms, dict. size 0, 4.9009 bits/sym, o0e 63362903 bytes
Common prefix scan 34/34 79.0 - 2ffff.2ffff
Run time 1034 seconds.
real 0m1.038s
user 0m0.888s
sys 0m0.148s
TreeBitEncode enwik8.tcom enwik8.tree
cap encoded: 1, UTF8_compliant 1
Read 103430933 symbols including 0 definition symbols
Encoded 103430933 level 1 symbols
Compressed file size: 78293312 bytes, dictionary size: 6039 symbols
elapsed time = 2181.430000 seconds.
real 0m2.376s
user 0m1.989s
sys 0m0.196s
TreeDecode enwik8.tree enwik8.out
Decompressed 100000000 bytes in 11178482 msec
real 0m5.594s
user 0m11.080s
sys 0m0.100s
 97660 -rw-rw-r-- 1 matt matt 100000000 Sep 30 15:58 enwik8.out
101380 -rw-rw-r-- 1 matt matt 103809102 Sep 30 15:57 enwik8.tcom
101384 -rw-rw-r-- 1 matt matt 103809102 Sep 30 15:57 enwik8.tpre
 76460 -rw-rw-r-- 1 matt matt 78293312 Sep 30 15:58 enwik8.tree
a1fa5ffddb56f4953e226637dabbb36a enwik8
a1fa5ffddb56f4953e226637dabbb36a enwik8.out
matt@matt-Latitude-E6510:~$ testtree enwik9
TreePreEncode enwik9 enwik9.tpre
Pre and capital encoding 1000000000 bytes
Wrote 1 byte header and 1041507612 encoded bytes in 6221.491 seconds.
real 0m24.686s
user 0m4.428s
sys 0m1.795s
TreeCompress enwik9.tpre enwik9.tcom
Allocated 1778384896 bytes for data processing
Read 1041507613 byte input file
cap encoded: 1, UTF8 compliant 1
Found 1039028503 symbols, 0 defines, maximum unicode value 0x28b4e
1: 1039028503 syms, dict. size 0, 4.9718 bits/sym, o0e 645730267 bytes
Common prefix scan 634/634 30fb.0 - 2ffff.2ffff
Run time 6834 seconds.
real 0m20.720s
user 0m5.751s
sys 0m1.085s
TreeBitEncode enwik9.tcom enwik9.tree
cap encoded: 1, UTF8_compliant 1
ERROR - symbol limit of 250000000 symbols exceeded
real 0m4.531s
user 0m2.404s
sys 0m0.605s
TreeDecode enwik9.tree enwik9.out
fopen error file 'enwik9.tree' not found
real 0m1.422s
user 0m0.006s
sys 0m0.000s
 0 -rw-rw-r-- 1 matt matt 0 Sep 30 15:59 enwik9.out
1017104 -rw-rw-r-- 1 matt matt 1041507613 Sep 30 15:59 enwik9.tcom
1017104 -rw-rw-r-- 1 matt matt 1041507613 Sep 30 15:59 enwik9.tpre
e206c3450ac99950df65bf70ef61a12d enwik9
d41d8cd98f00b204e9800998ecf8427e enwik9.out
rm: cannot remove ‘enwik9.tree’: No such file or directory
Here is the test script.
Code:
echo TreePreEncode 1ドル 1ドル.tpre
time TreePreEncode 1ドル 1ドル.tpre
echo TreeCompress 1ドル.tpre 1ドル.tcom
time TreeCompress 1ドル.tpre 1ドル.tcom
echo TreeBitEncode 1ドル.tcom 1ドル.tree
time TreeBitEncode 1ドル.tcom 1ドル.tree
echo TreeDecode 1ドル.tree 1ドル.out
time TreeDecode 1ドル.tree 1ドル.out
ls -las 1ドル.*
md5sum 1ドル 1ドル.out
rm 1ドル.tpre 1ドル.tcom 1ドル.tree 1ドル.out
Also, times are not calculated correctly by the programs. When I compile I get warnings about printf format errors (long passed as "%u"), which matters because in 64 bit gcc, long is 64 bits.
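For reference, a minimal example of the fix (the variable names are illustrative, not Tree's actual ones): with 64 bit gcc on Linux, long is 64 bits, so printing it with %u is undefined behavior; the matching length modifier or the <inttypes.h> macros avoid it.
Code:
#include <stdio.h>
#include <inttypes.h>

int main(void)
{
 long run_time = 1034; /* 64 bits under the LP64 model used by 64-bit gcc */
 uint64_t encoded_bytes = 103809101;
 printf("Run time %ld seconds.\n", run_time); /* %ld, not %u, for long */
 printf("Wrote %" PRIu64 " encoded bytes.\n", encoded_bytes); /* portable 64-bit */
 return 0;
}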
Let me get back to you. I just installed minGW64 last night and had to change some variables to 64 bits to get the code running.
Quote Originally Posted by Kennon Conrad View Post:
I saw this on MinGW's site: http://www.mingw.org/node/21.

I've seen that, too. With Cygwin you can use the POSIX APIs, so you can call mmap like in Linux. With MinGW, you'll have to find the native Win32 version and call that instead.
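For reference, the native Win32 counterpart to mmap is the CreateFileMapping/MapViewOfFile pair. A minimal read-only sketch (error handling mostly trimmed; the file name is just an example):
Code:
#include <windows.h>
#include <stdio.h>

int main(void)
{
 /* Open the file, create a read-only mapping object, map the whole file. */
 HANDLE file = CreateFileA("enwik8", GENERIC_READ, FILE_SHARE_READ, NULL,
 OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
 if (file == INVALID_HANDLE_VALUE)
 return 1;
 HANDLE mapping = CreateFileMappingA(file, NULL, PAGE_READONLY, 0, 0, NULL);
 const unsigned char *data =
 (const unsigned char *)MapViewOfFile(mapping, FILE_MAP_READ, 0, 0, 0);
 if (data != NULL) {
 printf("first byte: %02x\n", data[0]); /* read the file through the mapping */
 UnmapViewOfFile((void *)data);
 }
 CloseHandle(mapping);
 CloseHandle(file);
 return 0;
}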
Quote Originally Posted by nburns View Post:
I've seen that, too. With Cygwin you can use the POSIX APIs, so you can call mmap like in Linux. With MinGW, you'll have to find the native Win32 version and call that instead.

Now that I have the 64-bit compiler, I think I may have more options. But I'm not sure it would have much impact on TreeDecode since it already has a separate thread that reads the decoded symbols from a circular buffer, looks up the dictionary string and sends the data to output via ~1 MB ping-pong buffers.
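For readers unfamiliar with the pattern: a ping-pong scheme alternates between two buffers so one can be flushed to disk while the other is being filled. A minimal pthread sketch of the idea (this is not Tree's actual code; the names and the 1 MB size are illustrative):
Code:
#include <pthread.h>
#include <stdio.h>

#define BUF_SIZE (1 << 20) /* ~1 MB per buffer */

static unsigned char bufs[2][BUF_SIZE];
static size_t lens[2];
static int ready[2]; /* 1 = filled and waiting to be written */
static int done;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;

/* Writer thread: flushes whichever buffer is marked ready, alternating. */
static void *writer_thread(void *arg)
{
 FILE *out = (FILE *)arg;
 int i = 0;
 for (;;) {
 pthread_mutex_lock(&lock);
 while (!ready[i] && !done)
 pthread_cond_wait(&cond, &lock);
 if (!ready[i]) { /* done and nothing left to flush */
 pthread_mutex_unlock(&lock);
 return NULL;
 }
 pthread_mutex_unlock(&lock);
 fwrite(bufs[i], 1, lens[i], out); /* overlaps with filling the other buffer */
 pthread_mutex_lock(&lock);
 ready[i] = 0;
 pthread_cond_broadcast(&cond);
 pthread_mutex_unlock(&lock);
 i ^= 1;
 }
}

/* Producer side: mark buffer i full and move on to the other one. */
static void submit(int i, size_t len)
{
 pthread_mutex_lock(&lock);
 lens[i] = len;
 ready[i] = 1;
 pthread_cond_broadcast(&cond);
 pthread_mutex_unlock(&lock);
}

/* Producer must wait until the target buffer has been drained before refilling it. */
static void wait_free(int i)
{
 pthread_mutex_lock(&lock);
 while (ready[i])
 pthread_cond_wait(&cond, &lock);
 pthread_mutex_unlock(&lock);
}

static void finish(void)
{
 pthread_mutex_lock(&lock);
 done = 1;
 pthread_cond_broadcast(&cond);
 pthread_mutex_unlock(&lock);
}
The producer fills bufs[0] and calls submit(0, n), then fills bufs[1] while the writer flushes bufs[0], calling wait_free(i) before each refill and finish() at the end.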
Quote Originally Posted by Matt Mahoney View Post:
I was testing tree v0.12 in 64 bit Linux. On a small text file (enwik6, 1 MB), TreeCompress.exe under Wine 1.6 gives an out of memory error. Next I compiled with gcc 4.8.2 -O3 and I get a segmentation fault. Then I tried options -O3 -m32 and finally it runs, but enwik8 gives fast but poor compression and enwik9 gives an error (symbol limit exceeded).
Also, times are not calculated correctly by the programs. When I compile I get warnings about printf format errors (long passed as "%u"), which matters because in 64 bit gcc, long is 64 bits.

Thanks for the detailed problem report. I am sorry trying to build a 64-bit executable didn't go very well. The attached version of TreeCompress has updated code; it works when a 64-bit executable is built with MinGW64, and it still works with MinGW. I can't build 32-bit executables with MinGW64, but I think that is a setup problem. I do not know if it will work under Wine - I used %Iu in the printf's. The pointers could use some clean-up work, and this version uses the same memory size for 32 and 64 bit code. The MAX_MEMORY_USAGE constant can be increased before making a 64-bit build if you want to try more memory, which does seem to work. The results are not exactly the same, probably because memory structure sizes have changed even though overall memory allocation is the same for the 32 and 64 bit executables.
I compiled the new TreeCompress64 in Linux and it works.
enwik8 -> 21,974,704, 0.4+7299+3.4 sec, 1.8 sec. (2.67 GHz M620).
Input:
1,000,000,000 bytes - enwik9
Output:
Harddisk:
269,503,577 bytes, 20.647 sec. - 9.669 sec., Lzturbo -32 -p1
250,730,516 bytes, 1083.884 sec. - 11.688 sec., Lzturbo -29 -p1
202,241,389 bytes, 1124.972 sec. - 12.045 sec., Lzturbo -39 -p1
193,605,881 bytes, 1317.063 sec. - 19.420 sec., Lzturbo -49 -p1
177,321,380 bytes, xxxx.xxx sec. - 13.581 sec., TreeComp v0.12 -t1
SSD:
269,503,577 bytes, 19.497 sec. - 3.622 sec., Lzturbo -32 -p1
250,730,516 bytes, 1073.668 sec. - 3.672 sec., Lzturbo -29 -p1
202,241,389 bytes, 1114.174 sec. - 4.169 sec., Lzturbo -39 -p1
193,605,881 bytes, 1305.626 sec. - 11.493 sec., Lzturbo -49 -p1
177,321,380 bytes, xxxx.xxx sec. - 11.215 sec., TreeComp v0.12 -t1
RAM DISK:
269,503,577 bytes, 19.235 sec. - 1.981 sec., Lzturbo -32 -p1
250,730,516 bytes, 1069.845 sec. - 1.694 sec., Lzturbo -29 -p1
202,241,389 bytes, 1111.677 sec. - 2.477 sec., Lzturbo -39 -p1
193,605,881 bytes, 1302.485 sec. - 9.624 sec., Lzturbo -49 -p1
177,321,380 bytes, xxxx.xxx sec. - 11.240 sec., TreeComp v0.12 -t1
cleanmem before every run.
Input:
1,000,000,000 bytes - enwik9
Output:
Harddisk:
269,503,577 bytes, 13.549 sec. - 9.404 sec., Lzturbo -32 -p2
250,730,516 bytes, 1083.920 sec. - 11.536 sec., Lzturbo -29 -p2
202,241,389 bytes, 1126.518 sec. - 12.110 sec., Lzturbo -39 -p2
193,605,881 bytes, 1316.625 sec. - 19.524 sec., Lzturbo -49 -p2
177,321,380 bytes, xxxx.xxx sec. - 10.681 sec., TreeComp v0.12
SSD:
269,503,577 bytes, 11.220 sec. - 2.377 sec., Lzturbo -32 -p2
250,730,516 bytes, 1074.534 sec. - 3.560 sec., Lzturbo -29 -p2
202,241,389 bytes, 1114.699 sec. - 4.213 sec., Lzturbo -39 -p2
193,605,881 bytes, 1305.269 sec. - 11.452 sec., Lzturbo -49 -p2
177,321,380 bytes, xxxx.xxx sec. - 7.465 sec., TreeComp v0.12
RAM DISK:
269,503,577 bytes, 11.060 sec. - 1.269 sec., Lzturbo -32 -p2
250,730,516 bytes, 1071.082 sec. - 1.705 sec., Lzturbo -29 -p2
202,241,389 bytes, 1114.523 sec. - 2.398 sec., Lzturbo -39 -p2
193,605,881 bytes, 1303.752 sec. - 9.625 sec., Lzturbo -49 -p2
177,321,380 bytes, xxxx.xxx sec. - 7.436 sec., TreeComp v0.12
Tested tree 0.12 compiled with gcc 4.8.2 -O3 on enwik9 in 64 bit Ubuntu. TreeCompress gave a segmentation fault after 2.6 days but the output was still OK.
enwik9.tpre -> 1,041,507,613, 4.4 sec
enwik9.tcom -> 371,204,373, 226246 sec, gives segmentation fault
enwik9.tree -> 177,542,372, 29 sec (note different size)
enwik9.out -> 16.8 sec (verifies OK)
http://mattmahoney.net/dc/text.html#1775
Quote Originally Posted by Matt Mahoney View Post:
Tested tree 0.12 compiled with gcc 4.8.2 -O3 on enwik9 in 64 bit Ubuntu. TreeCompress gave a segmentation fault after 2.6 days but the output was still OK.
enwik9.tpre -> 1,041,507,613, 4.4 sec
enwik9.tcom -> 371,204,373, 226246 sec, gives segmentation fault
enwik9.tree -> 177,542,372, 29 sec (note different size)
enwik9.out -> 16.8 sec (verifies OK)
http://mattmahoney.net/dc/text.html#1775

Is this the same build that ran okay for enwik8? If so, that is odd.
I have not seen this problem on Windows. Since the output gets saved periodically during compression, there are two lines of code at the end of TreeCompress.c that get executed for the first time when the program finishes. The first is the free(char_buffer) call. The second is the fprintf for the execution time. I suspect the free call is causing the segmentation fault, but I cannot be sure without more information or installing Ubuntu. If you still have the enwik9.tcom file, it would be interesting to run it through TreeCompress again with and without the free call commented out.
The different/larger size was caused by the program having fewer tree nodes available, since pointers are now 8 bytes instead of 4. It tries to compensate by building the common prefix tree in smaller pieces, but in this case there probably weren't enough nodes available to fit some of the individual pieces (for instance, it may not have been able to fit all the common prefixes starting with two spaces).
Memory allocation problems like this are often easy to spot with Valgrind.
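For example, a typical Memcheck invocation looks like this (the flags shown are standard Valgrind options; the file names are from the run above):
Code:
valgrind --leak-check=full --track-origins=yes ./TreeCompress enwik9.tpre enwik9.tcom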
I didn't look at the source code but the end of output for TreeCompress looked like this:
Code:
Common prefix scan 16/19 4d6238.0 - 57d925.78b78b
Created 4791050 of 21753068 string nodes
Common prefix scan 17/19 57d926.0 - 641ed3.78b78b
Created 4773095 of 21753068 string nodes
Common prefix scan 18/19 641ed4.0 - 731f64.78b78b
Created 4756924 of 21753068 string nodes
Common prefix scan 19/19 731f65.0 - 78b78b.78b78b
Created 1638985 of 21753068 string nodes
Common prefix scan 19/19 731f65.0 - 78b78b.78b78b, score[30000] = 22.65
/home/matt/bin/testtree: line 4: 19521 Segmentation fault (core dumped) TreeCompress 1ドル.tpre 1ドル.tcom
real 3770m46.907s
user 3757m50.924s
sys 2m20.234s
I compiled the new version of TreeCompress for enwik9. I didn't test enwik8 again with it.
It seems like since your program is designed for 32 bits you could use ints instead of pointers so the 64 bit version doesn't take more space.
Quote Originally Posted by Matt Mahoney View Post:
I didn't look at the source code but the end of output for TreeCompress looked like this: ...
It seems like since your program is designed for 32 bits you could use ints instead of pointers so the 64 bit version doesn't take more space.

Thanks, that's all I needed to understand what happened. The program crashed when building the data structure for one-pass substitution of new dictionary entries because the pointers in the structure are twice as big as before.
I'm not sure that 64 bit programs are guaranteed to allocate memory with addresses <4G. I have changed most of the structure pointers to structure indexes, which gets the memory usage back down but requires an address calculation when the index is used.
I will try to post an updated 64-bit version, but it won't be right away. I need to make a few more changes to make the process identical.
Quote Originally Posted by Kennon Conrad View Post:
Thanks, that's all I needed to understand what happened. The program crashed when building the data structure for one-pass substitution of new dictionary entries because the pointers in the structure are twice as big as before.
I'm not sure that 64 bit programs are guaranteed to allocate memory with addresses <4G. I have changed most of the structure pointers to structure indexes, which gets the memory usage back down but requires an address calculation when the index is used.
I will try to post an updated 64-bit version, but it won't be right away. I need to make a few more changes to make the process identical.

I don't see any way around that when building for 64-bit. Pointers will be 8 bytes.
Quote Originally Posted by nburns View Post:
I don't see any way around that when building for 64-bit. Pointers will be 8 bytes.

The problem with creating a 64-bit version of TreeCompress v0.12 is that it won't produce identical results without increasing RAM usage and/or run time. The main reason for going to 64-bit code was to provide more RAM for compression. One of these days, though, I need to try making a 64-bit version of TreeDecode. Using 64 bit values would make the bit stream handling much easier.
Quote Originally Posted by Kennon Conrad View Post:
The problem with creating a 64-bit version of TreeCompress v0.12 is that it won't produce identical results without increasing RAM usage and/or run time. ...

What's wrong with replacing pointers with 32-bit array indexes? That won't work if the array has more than 4G elements, but how often does that happen, anyway?
Array indexing might be slower than using pointers directly, but I'd try it before assuming it will be too slow. The machine code honestly might change less than you think and come out almost equivalent. Even if your assumptions are on target, it might cost less than you expect.
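To make the suggestion concrete, here is a generic sketch (not Tree's actual node layout) of links stored as 32-bit indexes into one pre-allocated array instead of pointers:
Code:
#include <stdint.h>

#define NIL 0xFFFFFFFFu /* sentinel index in place of a NULL pointer */

/* 8 bytes of links on both 32- and 64-bit builds; pointers would be 16 on 64-bit. */
struct node {
 uint32_t child; /* index of first child, or NIL */
 uint32_t sibling; /* index of next sibling, or NIL */
};

static struct node *nodes; /* one big array allocated up front */
static uint32_t num_nodes;

static uint32_t new_node(void)
{
 nodes[num_nodes].child = NIL;
 nodes[num_nodes].sibling = NIL;
 return num_nodes++; /* the "address" is just the index */
}

/* Dereferencing becomes base + index: one extra address calculation per access. */
static struct node *deref(uint32_t i)
{
 return &nodes[i];
}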
Quote Originally Posted by nburns View Post:
What's wrong with replacing pointers with 32-bit array indexes? That won't work if the array has more than 4G elements, but how often does that happen, anyway?
Array indexing might be slower than using pointers directly, but I'd try it before assuming it will be too slow. ...

My only issue is that I think slowing down 32 bit code so that 64 bit code (that won't run faster) can produce the same results is silly. The time penalty is not that much.
Quote Originally Posted by Kennon Conrad View Post:
My only issue is that I think slowing down 32 bit code so that 64 bit code (that won't run faster) can produce the same results is silly. The time penalty is not that much.

When AMD came out with the 64-bit extensions to x86, they were supposed to have other overdue improvements, like access to additional registers. So there may be other benefits to building for 64 bits.
Quote Originally Posted by nburns View Post:
When AMD came out with the 64-bit extensions to x86, they were supposed to have other overdue improvements, like access to additional registers. So there may be other benefits to building for 64 bits.

For either my A8-5500 or i7-4790K, with TreeCompress and TreeDecode built with MinGW, execution time has been longer with 64 bit executables. There are extra calculations for addresses that I haven't been able to get around yet. For TreeDecode, all I really want is to be able to shift 64 bit registers, but I can't seem to get there without ending up with a lot more overhead from what appears to be relative addressing.
Quote Originally Posted by Kennon Conrad View Post:
For either my A8-5500 or i7-4790K, with TreeCompress and TreeDecode built with MinGW, execution time has been longer with 64 bit executables. ...

That's possible. But it can be hard to tell where the problem really is without lots of digging and experimentation. The calculations happen on the CPU. Going out to memory is dramatically slower, so usually optimizing memory accesses is the biggest concern.
Hi Kennon,
I have mostly been a reader of this forum for many years.
You have written one of the most exciting algorithms/programs of recent years, in my opinion.
It's something original and in its own league performance-wise at what it does.
Because of this, please focus on optimizing its strengths instead of turning it into something we already have a dozen of.
If you were able to cut compression time in half (which probably isn't realistic) but lost up to 20% compression, this algorithm wouldn't be interesting any more.
A fork of this with new ideas would surely be cool, but please stick with the strengths it has. More compression, faster decompression, and faster compression without losing in either of the first two categories would be welcome. ;)
EDIT:
By the way, why hasn't anyone moved this thread to General yet?
Quote Originally Posted by Simon Berger View Post:
Hi Kennon,
I have mostly been a reader of this forum for many years. You have written one of the most exciting algorithms/programs of recent years, in my opinion. It's something original and in its own league performance-wise at what it does. ...

Thanks for the feedback. I do intend to continue to optimize Tree's strengths, but also to investigate improving compression speed. I am convinced significantly faster algorithms that can produce files with nearly as much compression are possible. You are right, a 20% loss in compression would cause Tree to lose its best/most unique quality, so that is not interesting to me.
I found a bug in TreeBitEncode that caused decoding of small files to crash; it has been fixed. TreeDecode has been modified to use smaller and more appropriately sized variables, decreasing RAM usage and improving decoding speed by 3-4%.
Instead of making TreeCompress64 run like TreeCompress, just slower, I went in a completely different direction. TreeCompress64 is a very experimental version that represents a side branch on the path toward linear decision making. The intent is to provide an indication of how much the compression ratio will be impacted by making decisions using only partial data and by using a new scoring formula more tailored to linear compression.
Main functional changes:
1. All input symbols are converted to Tree's internal 32-bit token codes when the input is read and converted back when the output is written, rather than converting symbols on the fly when the common prefix trees are built. This makes the program use significantly more RAM to store the data but allows the code to be simpler and improves execution speed (a minimal sketch of the idea follows this list).
2. Maximum RAM usage is approximately 6x the input file size rather than about 1.7 GB maximum.
3. Common prefix trees are not built in sections. Instead, the program builds as much of a common prefix tree as will fit in memory and makes tokenization decisions based on only the window of data that was used to build the tree.
4. The scoring formula has been adjusted to promote high frequency strings more/long strings less than before.
5. Token creation management only allows an average of about 3000 unusable symbols per cycle instead of 7500.
6. Adjustments for two instance matches that are near each other have been eliminated.
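A minimal sketch of the widening described in change #1, assuming plain byte input; the function names are illustrative, and the real code also handles UTF-8 sequences and the capital-encode symbol:
Code:
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Widen every input symbol to a fixed 32-bit token code up front, so later
 passes index and compare fixed-width tokens instead of decoding on the fly.
 This alone accounts for ~4 bytes of RAM per input byte. */
uint32_t *tokenize(const unsigned char *data, size_t len)
{
 uint32_t *tokens = malloc(len * sizeof(uint32_t));
 size_t i;
 if (tokens == NULL)
 return NULL;
 for (i = 0; i < len; i++)
 tokens[i] = data[i]; /* real code maps multi-byte symbols to one token */
 return tokens;
}

/* Inverse mapping when the output is written. */
void detokenize(const uint32_t *tokens, size_t len, FILE *out)
{
 size_t i;
 for (i = 0; i < len; i++)
 fputc((int)(tokens[i] & 0xffu), out);
}
Storing 4 bytes per input symbol is consistent with the roughly 6x-input RAM figure in change #2.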
When compressing enwik9, the first 5 symbols created represent the following strings. The window size for the first decision cycle is about 19 MB. "C" is the capital encode symbol.
1:
</text>
</revision>
</page>
<page>
<title>C
2:
"e;
3:
</comment>
<text xml:space="preserve">
4:
Cz</timestamp>
<contributor>
<
5:
[http://www.
One other note is that my results have not been 100% consistent. I think they may depend on the runtime libraries.
Results for enwik8:
TreeCompress => 21,976,316 bytes, encode 3916 sec., decode 0.78 sec.
TreeParagraphs + TreeWords + TreeCompress => 22,075,700 bytes, encode 1906 sec., decode 0.76 sec.
TreeCompress64 => 22,196,288 bytes, encode 931 sec., decode 0.76 sec.
TreeParagraphs + TreeWords + TreeCompress64 => 22,304,524 bytes, encode 730 sec., decode 0.74 sec.
Results for enwik9:
TreeCompress => 177,340,072 bytes, encode 116,473 sec., decode 7.5 sec.
TreeParagraphs + TreeWords + TreeCompress => 178,774,864 bytes, encode 36,652 sec., decode 7.3 sec.
TreeCompress64 => 180,516,660 bytes, encode 30,147 sec., decode 7.6 sec.
TreeParagraphs + TreeWords + TreeCompress64 => 180,941,504 bytes, encode 20,834 sec., decode 7.4 sec.
TreeDecode.zip is 10,525 bytes.
Overall results are 45% - 75% faster compression at the cost of 1% - 2% of compressed file size and more RAM for enwik9, but less RAM for enwik8. This is encouraging for single pass compression because it appears that making tokenization decisions based on a small initial window does not hurt compression too much. This is significant for the algorithm I have in mind because I don't have nearly enough RAM to fit all of enwik9's common prefix tree in memory until a lot of token substitutions have been made.
Do not try TreeCompress64 if your computer doesn't easily have 6x the file size available. It will pretty much freeze everything up.
A 64 bit shift should be faster with a 64 bit compile. The function shift() is 10 instructions in 32 bits and 3 instructions in 64 bit. I cheated a bit by making the shift amount the first parameter so it is already in RCX (at least in Windows). Otherwise there is an extra instruction to move it there.
Also, I posted your results to LTCB. http://mattmahoney.net/dc/text.html#1773
Code:
C:\tmp>type a.c
unsigned long long shift(int x, unsigned long long a) {
 return a << x;
}

C:\tmp>gcc -O3 -S -m32 a.c

C:\tmp>type a.s
 .file "a.c"
 .text
 .p2align 4,,15
 .globl _shift
 .def _shift; .scl 2; .type 32; .endef
_shift:
 movl 4(%esp), %ecx
 movl 8(%esp), %eax
 movl 12(%esp), %edx
 shldl %cl, %eax, %edx
 sall %cl, %eax
 testb 32,ドル %cl
 je L2
 movl %eax, %edx
 xorl %eax, %eax
L2:
 rep ret
 .ident "GCC: (rev5, Built by MinGW-W64 project) 4.8.1"

C:\tmp>gcc -O3 -S -m64 a.c

C:\tmp>type a.s
 .file "a.c"
 .text
 .p2align 4,,15
 .globl shift
 .def shift; .scl 2; .type 32; .endef
 .seh_proc shift
shift:
 .seh_endprologue
 movq %rdx, %rax
 salq %cl, %rax
 ret
 .seh_endproc
 .ident "GCC: (rev5, Built by MinGW-W64 project) 4.8.1"
Quote Originally Posted by Matt Mahoney View Post:
A 64 bit shift should be faster with a 64 bit compile. The function shift() is 10 instructions in 32 bits and 3 instructions in 64 bit. I cheated a bit by making the shift amount the first parameter so it is already in RCX (at least in Windows). Otherwise there is an extra instruction to move it there.
Also, I posted your results to LTCB. http://mattmahoney.net/dc/text.html#1773 ...

Thanks. I would like to use 64 bit code for shifts, but the problem is that 64-bit code is expanding the code in lots of other places. For instance, the following code
Code:
temp_input_word = temp_input_word >> 21;
symbol_bits += symbol_bits_table[temp_input_word];
produces this when compiled for 32 bits:
Code:
shrl 21,ドル %esi
movzbl _symbol_bits_table(%esi), %eax
addb %al, _symbol_bits
and this when compiled for 64 bits:
Code:
leaq symbol_bits_table(%rip), %rax
shrl 21,ドル %ebx
movzbl (%rax,%rbx), %eax
addb %al, symbol_bits(%rip)
Quote Originally Posted by Kennon Conrad View Post:
Thanks. I would like to use 64 bit code for shifts, but the problem is that 64-bit code is expanding the code in lots of other places. ...

What's wrong with the 64-bit version? Does that code run slower? The difference is that the 64-bit version moves the table address into a register, and the 32-bit version uses it directly from memory. I'm not sure why, but it might reflect a deliberate change in the instruction set. The new version is more RISC-y.
Quote Originally Posted by nburns View Post:
What's wrong with the 64-bit version? Does that code run slower? ...

The 64-bit version of TreeDecode takes 7% longer to run. I suspect it is mostly from the extra instructions like the one above. There are a lot of them. I think it would more than wipe out the savings from being able to efficiently perform shifts on 64 bit values. I wonder if I need a different compiler.
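One thing that might be worth trying (an assumption on my part, not something Tree is known to do): keep the table base in a local pointer so the RIP-relative leaq is emitted once outside the hot loop rather than at every access. Whether this helps depends on what the optimizer already does with the surrounding code:
Code:
#include <stdint.h>

extern uint8_t symbol_bits_table[]; /* names taken from the snippet above */
extern uint8_t symbol_bits;

void count_bits(const uint32_t *words, int n)
{
 const uint8_t *table = symbol_bits_table; /* one leaq, hoisted out of the loop */
 uint8_t bits = symbol_bits; /* keep the accumulator in a register */
 int i;
 for (i = 0; i < n; i++)
 bits += table[words[i] >> 21];
 symbol_bits = bits; /* single RIP-relative store at the end */
}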