I have a degenerate string S1=xadcdax
, where x
can be replaced by any of the four characters a
, b
, c
, or d
. This S1
string is match with another non-degerate string S2=dcbaa
.
I can first use brute force to find all possible strings of S1
, which requires 4^n
(n
is the number of degenerates in a string), so the complexity is 4^2
. After that, for each string of S1
matched with S2
by using dynamic programming which requires (i x j)
running time (where i
is the length of S1
and j
is the length of S2
).
But the total running time will be 4^n x ij = 4ij^n
, which is quite slow.
Is there a more efficient algorithm for this?
-
2What is a degenerate string? I haven't heard that term before.Pubby– Pubby2011年10月07日 02:33:33 +00:00Commented Oct 7, 2011 at 2:33
-
degenerate mean certain position of a string is dont know, however, its position can be replaced by certain number of charater.heppy– heppy2011年10月07日 03:08:22 +00:00Commented Oct 7, 2011 at 3:08
-
Hi heppy, I tried to clean up your question to be clearer about what you're asking, but I'm not 100% sure on the title. Feel free to revise.user8– user82011年10月07日 03:46:04 +00:00Commented Oct 7, 2011 at 3:46
-
To those who voted to migrate this to Stack Overflow: questions about algorithms are explicitly on-topic here. Stack Overflow is for programming, not concept questions.user8– user82011年10月07日 04:19:26 +00:00Commented Oct 7, 2011 at 4:19
-
@MarkTrapp - According to the Stack Overflow, the site is also for "a software algorithm". Looks like we have two places for the same thing.rjzii– rjzii2011年10月07日 17:28:53 +00:00Commented Oct 7, 2011 at 17:28
1 Answer 1
EDIT:
This feels like a modified Boyer-Moore string search:
http://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string_search_algorithm
You're basically trying to prove that matching S2 to different positions of S1 won't work with any set of substitutions (or won't work with the known characters), which allows you to skip them and move on.
In your specific example, you know that you need to match 5 chars. Your string S1 has 2 places where it could be anything, with 5 places in the middle. So, if you can match the left 4 characters in S2 to something that doesn't include an x in S1, or the right 4, you can then test the last character only. (Neither works here.)
xadcdax
?cbaa
dcba?
If you had a string of 5 x's "xxxxx" and all of the characters in S2 where also found in the list of acceptable characters, then you have an automatic match.
xxxxx
?????
To make this generic, you could search for all occurrences of x in S1, then attempt to match on the left or right side to S2, then fill in the blanks as needed. If you had 2 occurrences of x in a row, you'd only need to search the left or right 3 chars of S2 (in your specific case.) To search in the middle, you end up sliding S2 across the x found and attempting to match to the left and right.
??????x??????
dcba?
dcb?a
dc?aa
d?baa
?cbaa
????x??x???
dcba?
dcb?a
dc?aa
d?ba?
?cb?a
dc?aa
d?baa
?cbaa
Only when the characters in S1 that are specific match do you then need to do substitutions.
If all characters in your substitution list are in S2, you don't need to do substitutions at all, the string matches the position.
You can skip lining up positions where S2 has a character not in the substitution list that lands on a degenerate character.
I don't have any pseudo code, so I'm not sure of running time.
-
I'm not really understand the answer you wrote...the answer u wrote is just specify for the example S1 and S2 i gave...how about if i change the S1 and S2? eg: S1:xcax S2: bcbad...please clear my doubt..thanks..heppy– heppy2011年10月07日 04:08:54 +00:00Commented Oct 7, 2011 at 4:08
-
With s1:xcax and s2 of length 5, I wouldn't think that could match. Unless you are trying to find s1 inside of s2? In which case I wrote about the opposite, finding s2 inside of s1.McLeopold– McLeopold2011年10月07日 17:43:39 +00:00Commented Oct 7, 2011 at 17:43