The purpose of this code is to find the largest palindrome in a string. It works fine when the string is short, but every time I test this code with 1000+ characters in a string, it takes forever to give me the palindrome in it.
PalindromeFinder::PalindromeFinder(){}
string PalindromeFinder::toString(){
stringstream ss;
ss << "largest palindrome seen so far is \"" << this->
getLargestPalindromeFound();
ss << "\" with size " << this->getSizeOfLargestPalindromeFound();
return ss.str();
}
string PalindromeFinder::getLargestPalindromeFound(){
return this->largestPalindromeFound;
}
int PalindromeFinder::getSizeOfLargestPalindromeFound(){
return this->largestPalindromeFound.length();
}
string str="djclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjssssssevessssssfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjfdjclsodjf";
truncateToLargestPalindrome(str);
void PalindromeFinder::truncateToLargestPalindrome(string& inputString){
string empty="";
inputString.erase(remove_if(inputString.begin(), inputString.end(), ::isspace), inputString.end());
inputString.erase(remove_if(inputString.begin(), inputString.end(), ::ispunct), inputString.end());
inputString.erase(remove_if(inputString.begin(), inputString.end(), ::isdigit), inputString.end());
for(int i = 0; i < inputString.length(); i++){
for (int j = inputString.length(); j>=i; j--) {
string subStr = inputString.substr(i, j-i);
if(isPalindrome(subStr) && subStr.length() > empty.length())
empty = subStr;
}
}
}
How can I improve my code so it can handle large strings more quickly?
2 Answers 2
Let's see how your code computes out...
You loop for every position in the input string. Then, you find each possible string starting from there. So, for 1000 characters, that's 1000 + 999 + 998, etc. Which turns out to be \$\frac{1000^2}{2}\$ - or a half-million strings that you check.
Now, for each of those string values, you then check to see if they are a palindrome. Presumably, for each of those strings, in the isPlaindrome
method, you compare the string to the reverse of the string, which means checking each character again.
That's another few iterations of each string, meaning your code ends up with a complexity in the order of cubic: \$O(N^3)\$.
Shavings
Now, there's ways to improve the code quite easily, for example, there's no point in checking (or even substringing) anything that's shorter than the current longest palindrome. In other words, why test 5 letter substrings if you've already found a 6 letter one? That will reduce the work you do, but, it will still be pretty slow.
Creating a string instance for each candidate is also not a great solution - you should find a way to do the palindrome checking on the original string, without making copies of sub-parts.
Still, I can't help but feel that the better solution is to reduce the computational complexity a lot, which makes scaling the code much better.
Algorithm
My suggestion is to change the algorithm. Your current one checks every possible substring in the input to find a palindrome. A better solution would be to only test strings which are "almost known" to be palindromes.
A palindrome is a word where the last half is a reflection of the first (in terms of letters). This implies that there is a middle of the reflection. If you can start at the middle of the reflection, and work outwards, until the reflection is broken, then you can compare what works with what you have already found, and move on.
Since there are even and odd sized palindromes, there are two types of "middle": two characters at the middle for even, and 1 character for odd sized palindromes.
There can be just one "middle" at each character in the input. If you can determine how wide the palindrome is spanning each middle, then you are one step ahead. At worst case, if the string is entirely 1 letter, like aaaaaaaaaaaaa
, then you will have to check a lot of palindromes, and the worst-case performance is \$O(n^2)\$ (which is still a bunch better than your cubic approach). On the other hand, a reasonably random input will typically have mostly 1-letter palindromes, and they will test out very quickly, so a best-case complexity is just \$O(n)\$
I took the liberty of writing up a solution using the approach I describe:
std::string largestPalindrome(std::string input) {
std::string largest = "";
// offset is O or 1 for odd, and even sized palindromes.
for (int offset = 0; offset <= 1; offset++) {
for (int mid = 0; mid + offset < input.length(); mid++) {
// max is the space between the mid and the end
int max = input.length() - offset - mid;
// ... unless the mid is closer to the beginning than the end.
max = max > mid ? mid : max;
for (int i = 0; i < max; i++) {
if (input[mid - i] == input[mid + offset + i]) {
if (i + i + offset + 1 > largest.length()) {
largest = input.substr(mid - i, i + i + offset + 1);
}
} else {
break;
}
}
}
}
return largest;
}
Note, the mid-loop runs only twice (once for even, once for odd). The inner i
loop will tend to be pretty small because it only runs while the strings centered on the mid point are actually palindromes.... and large palindromes will be the exception, not the rule.
There's only 1 big loop, in other words.
Additionally, substr
is only called when there's a larger palindrome than previously seen... it's an uncommon call.
The above code runs the example str
input in 3 milliseconds on my computer (and you can see it running in ideone too)
truncateToLargestPalindrome
void PalindromeFinder::truncateToLargestPalindrome(string& inputString);
And let's start with... why? Your class is called PalindromeFinder. Presumably it should find things. Why is it altering input? And your code isn't truncating inputString
to the "largest palindrome" anyway, it's dropping spaces, digits, and punctuation marks. The signature of this function should likely be:
std::string PaldinromeFinder::findLargestPalindrome(std::string inputString);
Avoid using namespace std;
We take by value so that we can modify it internally.
Now, once we're there, you have three erase-remove-if calls:
inputString.erase(remove_if(inputString.begin(), inputString.end(), ::isspace), inputString.end());
inputString.erase(remove_if(inputString.begin(), inputString.end(), ::ispunct), inputString.end());
inputString.erase(remove_if(inputString.begin(), inputString.end(), ::isdigit), inputString.end());
It will be more efficient to simply do this once and just aggregate the three predicates together. In C++03, just write your own function. In C++11, we can use a lambda. Also, if you add spacing, it'll make the code much easier to read:
inputString.erase(
std::remove_if(inputString.begin(),
inputString.end(),
[](char c){
return ::isdigit(c) || ::isspace(c) || ::ispunct(c);
}),
inputString.end()
);
You probably also want to make it lower-case for good measure, unless you don't want to consider something like "Bb"
a palindrome.
The algorithm also has serious issues, and rolfl covered an example improvement. You can also look into Manacher's algorithm, which will solve this problem in linear time but is certainly not an intuitive solution to the problem.
Rest of interface
You have these two other member functions:
string PalindromeFinder::getLargestPalindromeFound();
int PalindromeFinder::getSizeOfLargestPalindromeFound();
First, I would argue that neither should exist as you should just have a single find
function that takes an input and returns it. However, if you insist on storing the result locally (which makes your class harder to use), then you should still only have one single getter:
std::string const& PalindromeFinder::getLargestPalindromeFound() const;
Notice my different signature. I'm returning the string
by reference-to-const, because there's no need to copy it. The function itself is const
, as it will not modify the class' internals and you should make functions const
where possible. And lastly, you don't need the additional size()
function. The string
returned will know its own size, so it adds no value while adding potential for bugs in the future.
Explore related questions
See similar questions with these tags.
PalindromeFinder
class, so that users can better review your program? \$\endgroup\$isPalindrome
method? \$\endgroup\$