

edited Jun 10, 2020 at 13:24

Sort your corpus

Before processing your wordlist, index your corpus in a dictionary keyed by character. For each character, every corpus entry that contains that character gets added to the dictionary entry for that character.

[A]: A, AB, ABC, ABCD
[B]: AB, ABC, BC, BCD, ABCD
[C]: ABC, BC, C, CD, BCD, ABCD
[D]: CD, BCD, ABCD, D, DD
etc
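As a rough illustration, the index above could be built along these lines in Python. The function name `build_index` and the sample corpus are assumptions made for this sketch; they are not taken from the question.

```python
from collections import defaultdict

def build_index(corpus):
    """Map each character to the corpus entries that contain it."""
    index = defaultdict(list)
    for entry in corpus:
        # set() so an entry like "DD" is filed under D only once
        for ch in set(entry):
            index[ch].append(entry)
    return index

# Hypothetical corpus, chosen to mirror the listing above
corpus = ["A", "AB", "ABC", "ABCD", "BC", "BCD", "C", "CD", "D", "DD"]
index = build_index(corpus)
print(index["D"])  # ['ABCD', 'BCD', 'CD', 'D', 'DD']
```

The index is built once up front, so its cost is spread across every word you process afterwards.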

For each word you process, get the unique/distinct list of characters (for example, by throwing it into a set).

When checking for matches, you only need to look at the dictionary entries for those characters. That is a much smaller list than the full corpus, which should speed things up a little.
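A minimal sketch of the lookup step, assuming that "match" means a corpus entry occurs as a substring of the word (the actual rule depends on your problem). The names `build_index`, `candidates` and `matches`, and the sample corpus, are hypothetical.

```python
from collections import defaultdict

def build_index(corpus):
    """Map each character to the set of corpus entries that contain it."""
    index = defaultdict(set)
    for entry in corpus:
        for ch in set(entry):
            index[ch].add(entry)
    return index

def candidates(word, index):
    """Union of the index buckets for the word's distinct characters."""
    found = set()
    for ch in set(word):
        found |= index.get(ch, set())
    return found

def matches(word, index):
    # Assumed matching rule: the entry occurs as a substring of the word.
    return sorted(e for e in candidates(word, index) if e in word)

# Hypothetical corpus for illustration
corpus = ["A", "AB", "ABC", "ABCD", "BC", "BCD", "C", "CD", "D", "DD"]
index = build_index(corpus)
print(matches("BCD", index))  # ['BC', 'BCD', 'C', 'CD', 'D']
```

With a corpus this small the candidate set is still most of the corpus; the saving should be larger when the alphabet is bigger and each word uses only a few of its characters.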

answered Mar 1, 2016 at 13:59

Zack
