Code Review


Sort your corpus

Before processing your word list, sort your corpus into a dictionary keyed by character: for each character, every corpus entry that contains that character is added to that character's list.

[A]: A, AB, ABC, ABCD
[B]: AB, ABC, BC, BCD, ABCD
[C]: ABC, BC, C, CD, BCD, ABCD
[D]: CD, BCD, ABCD, D, DD
etc.
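A minimal Python sketch of building this index, assuming the corpus is a plain list of strings. The function name `build_char_index` is hypothetical, chosen for illustration:

```python
from collections import defaultdict

def build_char_index(corpus):
    """Map each character to the list of corpus entries that contain it."""
    index = defaultdict(list)
    for entry in corpus:
        # set(entry) ensures an entry like "DD" is added to [D] only once
        for ch in set(entry):
            index[ch].append(entry)
    return dict(index)

corpus = ["A", "AB", "ABC", "ABCD", "BC", "BCD", "C", "CD", "D", "DD"]
index = build_char_index(corpus)
print(index["B"])  # ['AB', 'ABC', 'ABCD', 'BC', 'BCD']
```

The index is built once, up front, so its cost is paid a single time no matter how many words you process afterwards.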

For each word you process, get its distinct characters (for example, by converting the word to a set).

When checking for matches, you now only need to compare the word against the corpus entries listed under its characters. That is a much smaller list than the whole corpus, which should speed things up a little.
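A sketch of the lookup step in Python. It assumes that a "match" requires a corpus entry to contain every distinct character of the word, so it intersects the per-character lists. If sharing any character is enough, take the union instead. The names `build_char_index` and `candidate_entries` are hypothetical, and your real match test would still run on the returned candidates:

```python
from collections import defaultdict

def build_char_index(corpus):
    """Map each character to the list of corpus entries that contain it."""
    index = defaultdict(list)
    for entry in corpus:
        for ch in set(entry):
            index[ch].append(entry)
    return dict(index)

def candidate_entries(word, index):
    """Return corpus entries containing every distinct character of word.

    Assumption: a match needs all of the word's characters. For
    "shares any character", union the lists instead of intersecting.
    """
    chars = set(word)  # distinct characters of the word
    if not chars:
        return []
    # Start from the shortest list so the intersection stays small
    lists = sorted((index.get(ch, []) for ch in chars), key=len)
    result = set(lists[0])
    for entries in lists[1:]:
        result &= set(entries)
    return sorted(result)

index = build_char_index(["A", "AB", "ABC", "ABCD", "BC", "BCD", "C", "CD", "D", "DD"])
print(candidate_entries("CB", index))  # ['ABC', 'ABCD', 'BC', 'BCD']
```

A character that never appears in the corpus yields an empty list, so the word is rejected without scanning anything.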

Zack
