9/30/2011
09-30-11 - String Match Results Part 5 + Conclusion
As they should, they have very constant time operation that goes up pretty steadily from Tornado -3 to -7, because there's a constant number of hash probes per match attempt.
totals : match len : clocks Test_MMC1 : 0.663028 , 231.254818 Test_Hash1 : 0.662718 , 64.888003 Test_Tornado_3 : 0.630377 , 19.658834 Test_Tornado_4 : 0.593174 , 28.456055 Test_Tornado_5 : 0.586540 , 40.546146 Test_Tornado_6 : 0.580042 , 56.841156 Test_Tornado_7 : 0.596584 , 141.432393There may be something wrong with my Tornado wrapper as the -3 matcher actually finds the longest total length. I dunno. The speeds look reasonable. I don't really care much about these approximate matchers because the loss is hard to quantify, so there you go (normally when I see an anomaly like that I would investigate it to make sure I understand why it's happening).
0 = ares_merged.gr2_sec_5.dat 1 = bigship1.x 2 = BOOK1 3 = combined.gr2_sec_3.dat 4 = GrannyRocks_wot.gr2 5 = Gryphon_stripped.gr2 6 = hetero50k 7 = lzt20 8 = lzt21 9 = lzt22 10 = lzt23 11 = lzt24 12 = lzt25 13 = lzt27 14 = lzt28 15 = lzt29 16 = movie_headers.bin 17 = orange_purple.BMP 18 = predsave.bin
Conclusion : I've got to get off string matching so this is probably the end of posts on this topic.
MMC looks promising but has some flaws. There are some cases that trigger a slowness spike in it. Also it has some bad O(N^2) with unbounded match length ("MMC2") so I have to run it with a limit ("MMC1") which removes some of its advantage over LzFind and Hash1 and other approximate matchers. (without the limit it has the advantage of being exact). It's also a GPL at the moment which is a killer.
LzFind doesn't have anything going for it really.
For approximate/small-window matching I don't see any reason to not use the classic Zip hash chain method. I tried a few variants of this, like doing a hash chain to match the first 4 bytes and then link listing off that, and all the variants were worse than the classic way.
For large window / exact matching / optimal parsing, a correct O(N) matcher is the way to go. The suffix-array based matcher is by far the easiest for your kids to implement at home.
old rants
-
2018
(34)
- October (8)
- August (1)
- July (1)
- June (3)
- May (7)
- April (3)
- March (5)
- February (5)
- January (1)
-
2015
(33)
- December (1)
- November (1)
- October (1)
- September (1)
- July (1)
- June (1)
- May (8)
- March (6)
- February (11)
- January (2)
-
2014
(41)
- December (8)
- November (2)
- September (1)
- August (2)
- July (2)
- June (4)
- March (6)
- February (14)
- January (2)
-
2013
(41)
- December (1)
- November (2)
- October (2)
- September (2)
- August (5)
- June (1)
- May (3)
- April (7)
- March (11)
- February (6)
- January (1)
-
2012
(78)
- December (8)
- November (5)
- October (11)
- September (21)
- August (5)
- July (7)
- June (7)
- May (3)
- April (2)
- March (8)
- January (1)
-
2011
(134)
- December (7)
- November (9)
- October (13)
-
September
(16)
- 09-30-11 - String Match Results Part 5 + Conclusion
- 09-30-11 - String Match Results Part 4
- 09-30-11 - String Match Results Part 3
- 09-30-11 - String Match Results Part 2b
- 09-30-11 - String Match Results Part 2
- 09-30-11 - String Match Results Part 1
- 09-30-11 - Don't use memset to zero
- 09-29-11 - Suffix Tries 3 - On Follows with Path C...
- 09-28-11 - String Matching with Suffix Arrays
- 09-28-11 - Algorithm - Next Index with Lower Value
- 09-27-11 - String Match Stress Test
- 09-26-11 - Tiny Suffix Note
- 09-25-11 - More on LZ String Matching
- 09-24-11 - Suffix Tries 2
- 09-24-11 - Suffix Tries 1
- 09-23-11 - Morphing Matching Chain
- August (8)
- July (31)
- June (13)
- May (4)
- April (6)
- March (7)
- February (8)
- January (12)
-
2010
(154)
- December (7)
- November (14)
- October (27)
- September (20)
- August (26)
- July (14)
- June (15)
- May (16)
- April (2)
- March (4)
- February (2)
- January (7)
No comments:
Post a Comment