Find all occurrences of all permutations of a shorter string within a longer string

Question 1

Input is two strings, the first string being the long string and the second smaller string being the permuted string.

The output should return the starting indices of all occurrences as well as the permutation of the substring in the longer string.

Case 1)

permutationsInString('abcdcbadefbac', 'abc')
sample output: { '0': 'abc', '4': 'cba', '10': 'bac' }

Case 2)

 permutationsInString('abcabacba', 'abc')
sample output: { '0': 'abc',
'1': 'bca',
'2': 'cab',
'4': 'bac',
'5': 'acb',
'6': 'cba' }

Case 3)

permutationsInString('a', 'abc')
sample output: {}

My solution is Naive and Brute Force, was wondering if there are any other possible ways to reduce the time and space complexity.

function permutationsInString(longString, shortString) {
 let hash = {}; // let hash store the resulting found permutation strings and their respective index in the long string
 let permutations_shortStrings = permutationArr(shortString) ; // permute the short string and store the values in an array called permutations_shortStrings
 for(let i = 0 ; i < permutations_shortStrings.length ; i++) { // iterate through the array of permutations to check if the long string contains any of the values
 if(longString.indexOf(permutations_shortStrings[i]) !== -1) // need to check for -1 as it indicates the substring cannot be found
 hash[longString.indexOf(permutations_shortStrings[i])] = permutations_shortStrings[i];
 }
 return hash ; 
}
function permutationArr(str) 
{ 
 var arr = str.split(''),
 permutations = []; 
 function swap(a, b)
 {
 var tmp = arr[a];
 arr[a] = arr[b];
 arr[b] = tmp;
 }
// Heap's algorithm implementation
 function generate(n) {
 if (n == 1) {
 permutations.push(arr.join().replace(/,/g, ''));
 } else {
 for (var i = 0; i < n; ++i) {
 generate(n - 1);
 swap(n % 2 ? 0 : i, n - 1);
 }
 }
 }
 generate(arr.length);
 return permutations;
}

Question 2

EDIT: (updated solution): I solved this problem using a sliding window approach. A good template for substring problems can be found here SlidingWindow. I also found this helpful for anagram verification between two strings.AnagramCheck.

Question 3

Whitespace, variable extraction, etc.

function permutationsInString(longString, shortString) {
 let hash = {}; // let hash store the resulting found permutation strings and their respective index in the long string
 let permutations_shortStrings = permutationArr(shortString) ; // permute the short string and store the values in an array called permutations_shortStrings
 for(let i = 0 ; i < permutations_shortStrings.length ; i++) { // iterate through the array of permutations to check if the long string contains any of the values
 if(longString.indexOf(permutations_shortStrings[i]) !== -1) // need to check for -1 as it indicates the substring cannot be found
 hash[longString.indexOf(permutations_shortStrings[i])] = permutations_shortStrings[i];
 }
 return hash ; 
}

What happened to the indentation? Why do some ; have a space before them and others not? Why the blank line before the closing }?

permutations_shortStrings[i] is a mouthful. Extracting a local variable with a short intelligible name would improve readability.

longString.indexOf(permutations_shortStrings[i]) is not only a bigger mouthful but also potentially very expensive. A common subexpression which takes non-constant time should definitely be extracted to a local variable.

The comments are largely noise. The only one which contributes to my understanding of the code is the one saying what hash is supposed to contain.

Bug

 permutations.push(arr.join().replace(/,/g, ''));

What output should I expect if I call permutationsArr(",;")? What do I actually get?

This bug can be fixed, and performance improved, as:

 permutations.push(arr.join(""));

Algorithm

My solution is Naive and Brute Force, was wondering if there are any other possible ways to reduce the time and space complexity.

There most certainly are.

Step back to a simpler task: given strings a and b which are guaranteed to be of the same length, is a a permutation of b? Write a solution which doesn't calculate any permutations. Hint:

Count the number of times each character occurs

Now apply that to your search problem. Consider longString.substring(0, shortString.length): is it a permutation of shortString? Now consider advancing one character through longString: you remove longString.charAt(i) and add longString.charAt(shortString.length + i). How can you update the data structures used to test whether it's a permutation of shortString in constant time? Hint:

In addition to counts of the characters, you might find it useful to have a count of the number of characters for which the difference is non-zero.

More sophisticated approaches (modelled on e.g. KMP) can allow you to skip more than one character at a time, but the one I've outlined already gets you down to linear time (subject to certain assumptions) so beyond that it's just a case of improving the constant factor.

Question 4

How would you use KMP to solve this problem? What kind of modifications would you have to make to the algorithm, specifically with the partial match table? I solved the problem using a modification of Rabin-Karp's algorithm

Question 5

When I was in college we used to do that with the so called KMP algorithm. It stands for Knuth-Morris-Pratt, it's an algorithm that does text research aka looking for string into string.

Well I think it's a bit complicated to implement, since I've done it once in C. But it's very efficient since the complexity is in worst case scenario Big oh O(length(longer_string)).

Once you got KMP's algorithm working, you'll just have to search for KMP's algorithm of all your permutation.

Question 6

That's \$O(m!\;n)\$ time, if I've understood correctly what you mean by the last sentence. The problem can be solved in \$O(m+n)\$ time.

Peter Taylor Peter Taylor 24.5k1 gold badge49 silver badges94 bronze badges · Accepted Answer · 2017-06-28 10:57:52Z

Whitespace, variable extraction, etc.

function permutationsInString(longString, shortString) {
 let hash = {}; // let hash store the resulting found permutation strings and their respective index in the long string
 let permutations_shortStrings = permutationArr(shortString) ; // permute the short string and store the values in an array called permutations_shortStrings
 for(let i = 0 ; i < permutations_shortStrings.length ; i++) { // iterate through the array of permutations to check if the long string contains any of the values
 if(longString.indexOf(permutations_shortStrings[i]) !== -1) // need to check for -1 as it indicates the substring cannot be found
 hash[longString.indexOf(permutations_shortStrings[i])] = permutations_shortStrings[i];
 }
 return hash ; 
}

What happened to the indentation? Why do some ; have a space before them and others not? Why the blank line before the closing }?

permutations_shortStrings[i] is a mouthful. Extracting a local variable with a short intelligible name would improve readability.

longString.indexOf(permutations_shortStrings[i]) is not only a bigger mouthful but also potentially very expensive. A common subexpression which takes non-constant time should definitely be extracted to a local variable.

The comments are largely noise. The only one which contributes to my understanding of the code is the one saying what hash is supposed to contain.

Bug

 permutations.push(arr.join().replace(/,/g, ''));

What output should I expect if I call permutationsArr(",;")? What do I actually get?

This bug can be fixed, and performance improved, as:

 permutations.push(arr.join(""));

Algorithm

My solution is Naive and Brute Force, was wondering if there are any other possible ways to reduce the time and space complexity.

There most certainly are.

Step back to a simpler task: given strings a and b which are guaranteed to be of the same length, is a a permutation of b? Write a solution which doesn't calculate any permutations. Hint:

Count the number of times each character occurs

Now apply that to your search problem. Consider longString.substring(0, shortString.length): is it a permutation of shortString? Now consider advancing one character through longString: you remove longString.charAt(i) and add longString.charAt(shortString.length + i). How can you update the data structures used to test whether it's a permutation of shortString in constant time? Hint:

In addition to counts of the characters, you might find it useful to have a count of the number of characters for which the difference is non-zero.

More sophisticated approaches (modelled on e.g. KMP) can allow you to skip more than one character at a time, but the one I've outlined already gets you down to linear time (subject to certain assumptions) so beyond that it's just a case of improving the constant factor.

How would you use KMP to solve this problem? What kind of modifications would you have to make to the algorithm, specifically with the partial match table? I solved the problem using a modification of Rabin-Karp's algorithm

Stack Exchange Network

Find all occurrences of all permutations of a shorter string within a longer string

2 Answers 2

Whitespace, variable extraction, etc.

Bug

Algorithm

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

Find all occurrences of all permutations of a shorter string within a longer string

2 Answers 2

Whitespace, variable extraction, etc.

Bug

Algorithm

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions