Return to Answer

Commonmark migration

edited Jun 10, 2020 at 13:24

There's no link to the original problem description and so it is difficult to judge how tight the solution has to be algorithmically in order to meet the requirements.

200_success has already done an excellent job on the algorithmics. Should the performance of that solution not be enough (competition, coding challenge above beginner level) then there are several improvements that can be applied.

The approach chosen by 200_success divides the problem into two distinct subproblems:

find a possible starting location - i.e., an occurrence of any of the words
for each such candidate location, see if the tail contains the rest of the words in any order

The first part can be sped up by employing any of the usual search algorithms that are capable of handling multiple search patterns simultaneously, like the [Knuth-Morris-Pratt][1]Knuth-Morris-Pratt or [Boyer-Moore][2]Boyer-Moore. If there's no implementation of those at hand, performance can still be improved with a few heuristics inspired by these algorithms. Also, there are other algorithms - like [Shift-And][3]Shift-And a.k.a. bitap - that can deliver decent speed but are easier to implement.

The second part - matching the list of words at the candidate location - is most succinctly formulated in recursive form, as 200_success did: we declare a match if the word at the current location is in the list of words and its tail matches the list of words minus the current word.

A possible improvement here is to track the current state explicitly - for example in the form of a bitmap - instead of rewriting the list of words physically.

For small numbers of words (up to 64) this bitmap can be held in a machine word or integer variable. The zero-based word list index of the word at the current location can thus easily be checked; if that bit in the state bitmap is not yet set then we set it and continue matching the rest. If it is set already then we have a sequence error (two occurrences of the same word). At this point the whole match procedure needs to be reset and restarted at the position after the first occurrence of this word in the current working window.

Of course, there's no need to actually do all the work again, since it can be simulated by manipulating the bitmap and working position. At least that's the basic idea. The devil is in the details; before this optimisation can applied the list of words needs to be checked regarding the consequences of shifted self-overlap. Otherwise the safe route should be chosen, which is to restart the matching procedure at the character right after the current start position.

Efficient matching of words against the list can be arranged in any number of ways; the easiest would be the use of a hash map (so that each of the w words can be associated with an index in the range 0 to w-1). An efficiently implemented word matching logic can be a good approximation for a full-scale optimised search algorithm like KMP for finding candidate locations; it all depends on the runtime requirements which were not specified. [1]: https://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm [2]: https://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string_search_algorithm [3]: https://en.wikipedia.org/wiki/Bitap_algorithm

There's no link to the original problem description and so it is difficult to judge how tight the solution has to be algorithmically in order to meet the requirements.

The approach chosen by 200_success divides the problem into two distinct subproblems:

find a possible starting location - i.e., an occurrence of any of the words
for each such candidate location, see if the tail contains the rest of the words in any order

The first part can be sped up by employing any of the usual search algorithms that are capable of handling multiple search patterns simultaneously, like the [Knuth-Morris-Pratt][1] or [Boyer-Moore][2]. If there's no implementation of those at hand, performance can still be improved with a few heuristics inspired by these algorithms. Also, there are other algorithms - like [Shift-And][3] a.k.a. bitap - that can deliver decent speed but are easier to implement.

A possible improvement here is to track the current state explicitly - for example in the form of a bitmap - instead of rewriting the list of words physically.

There's no link to the original problem description and so it is difficult to judge how tight the solution has to be algorithmically in order to meet the requirements.

The approach chosen by 200_success divides the problem into two distinct subproblems:

find a possible starting location - i.e., an occurrence of any of the words
for each such candidate location, see if the tail contains the rest of the words in any order

The first part can be sped up by employing any of the usual search algorithms that are capable of handling multiple search patterns simultaneously, like the Knuth-Morris-Pratt or Boyer-Moore. If there's no implementation of those at hand, performance can still be improved with a few heuristics inspired by these algorithms. Also, there are other algorithms - like Shift-And a.k.a. bitap - that can deliver decent speed but are easier to implement.

A possible improvement here is to track the current state explicitly - for example in the form of a bitmap - instead of rewriting the list of words physically.

Source Link

answered Mar 31, 2016 at 9:47

DarthGizka

answered Mar 31, 2016 at 9:47

DarthGizka

2.8k
1
13
30

There's no link to the original problem description and so it is difficult to judge how tight the solution has to be algorithmically in order to meet the requirements.

The approach chosen by 200_success divides the problem into two distinct subproblems:

find a possible starting location - i.e., an occurrence of any of the words
for each such candidate location, see if the tail contains the rest of the words in any order

The first part can be sped up by employing any of the usual search algorithms that are capable of handling multiple search patterns simultaneously, like the [Knuth-Morris-Pratt][1] or [Boyer-Moore][2]. If there's no implementation of those at hand, performance can still be improved with a few heuristics inspired by these algorithms. Also, there are other algorithms - like [Shift-And][3] a.k.a. bitap - that can deliver decent speed but are easier to implement.

A possible improvement here is to track the current state explicitly - for example in the form of a bitmap - instead of rewriting the list of words physically.

default