I've been tasked to solve this exercise lately, and this is what I wrote:
object Main {
def isSubstringPalindrome(inp: String)(start: Int, end: Int): Boolean =
(start.compareTo(end), inp(start)==inp(end)) match {
case (0, _) => true
case (1, _) => true
case (-1, true) => isSubstringPalindrome(inp)(start+1, end-1)
case _ => false
}
def vecToPair[T](vec: IndexedSeq[T]): (T, T) = vec match {
case Vector(a, b) => (a, b)
}
def isContained(slice1: (Int, Int))(slice2: (Int, Int)): Boolean = (slice1, slice2) match{
case ((x, y), (w, z)) => x > w && y < z
}
def palindromes(inp: String): Seq[(Int, Int)] = {
val isPalindrome: ((Int, Int)) => Boolean = (isSubstringPalindrome(inp)_).tupled
val palindromeIndexes = inp.zipWithIndex
.groupBy(_._1).values // group by char
.map(_.map(_._2)).filter(_.length > 1) // retain sequence of indexes
.flatMap(_.combinations(2).map(vecToPair))
.filter(isPalindrome).toSeq
.sortBy{case (start, end)=> -(end+1-start)}
palindromeIndexes.filter(slice => ! palindromeIndexes.exists(isContained(slice)_))
}
def main(args: Array[String]) = {
val inp = if (args.isEmpty) "sqrrqabccbatudefggfedvwhijkllkjihxymnnmzpop" else args(0)
palindromes(inp).take(3)
.map{case (start, end) => (inp.slice(start, end+1), start, end+1-start)}.foreach{
case (str, start, length) => println(s"Text: $str, Index: $start, Length: $length")
}
}
}
I tried to care about conciseness and good functional style, tests/TDD wasn't supposedly a focus for this, and my algorithm should already be better than a naive \$ O \left( n^3 \right) \$ (the worst case of the string composed of the same letter all over is still pretty bad, though). The partially applied isSubstringPalindrome
also avoids to wastefully reallocate substrings, and only needs the indexes to work upon.
If performance is a concern, isSubstringPalindrome
can be trivially memoized, which should trade off memory for faster execution on very large strings.
With this taken care of, I've been told:
the solution should be correct, reliable, maintainable, reusable, portable and efficient.
My solution is obviously not ideal: I could've added some scaladoc to explain the code. Another trivial improvement could've been to factorize out end+1-start
into a length
function... that way I could've written something like .sortBy(length).reverse
or .sortBy(-length(_))
.
Also, it would've been nicer if .compareTo
in Scala would return a data Ordering = LT | EQ | GT
just like in Haskell: that way isSubstringPalindrome
would be more explicitly/guarantee in the fact that it's handling all cases: (string not fully checked for palindromeness yet: start < end
, odd length palindrome: start==end
, even length palindrome: end < start
)... but this is not really a shortcoming of my code.
Anyhow, given the algorithm, I think that the code itself is somewhat readable and clean. Functions of only 2/3 statements each, no temporal coupling, etc.
That said, it wasn't appreciated. My suspicion is that they expected some classes to be defined, and gain some insight on my data modeling thought process. If that's so, I'm a bit disappointed, because a test that asks to solve a principally algorithmic problem wouldn't/shouldn't naturally lead to define/create your own data types (anything already available in the stdlib should be enough)... something more real world might've been more appropriate.
I've been given this task by people using Scala professionally. Conversely, I never used Scala at work, so I'd expect that my code might be unidiomatic.
Do you have any other idea on how to improve, even by rewriting it completely with a different algorithm? One of the goals was efficiency, but performance shouldn't be a concern. What I'd like to see this code improved upon is in its readability, cleanliness and general good design.
2 Answers 2
EDIT First attempt at answering was not optimal, so you should read to the end EDIT
Well, it is a bit hard to say, what the people who tasked you with that wanted to see, but
imho that is a lot of code for a relatively simple problem. So, I guess the maintainability requirement might be the culprit.
Also, what is pretty important when doing scala professionally is a solid knowledge of scala collections, their API and maybe how to work with Iterators. E.g a palindrome check is easily done by
string == string.reverse
When working with tuples, accessing values with
._1
._2
and so on is horrible to read. I would almost always prefer to have the values extracted to local vals with meaningful names.Premature optimization is the root of all evil!
That said, here is how I would do something like this:
EDIT BEGIN
BAD, DON'T COPY, MUCH LESS PASTE!
EDIT END
@inline
private[this] def isPalindrome(inp: String): Boolean = inp == inp.reverse
def palindromes(inp: String) = {
(inp.length until 2 by -1).foldLeft(IndexedSeq.empty[String]) {
case (pals, palindromeLength) =>
pals ++ inp.sliding(palindromeLength, 1).collect {
case subString if isPalindrome(subString) && !pals.exists(_.contains(subString)) =>
// only retain substring if it is a palindrome
// and not a substring of an already found palindrome
subString
}.toSeq
}
}
I admit the foldLeft is not exactly easy to read, too, but it is quickly clarified in a review. Otherwise, for a first implementation it should be efficient enough.
There are probably far more string allocations, but unless reducing string allocations is not explicitly required, this code is, well, shorter and more maintainable.
Update/Edit
I'm sorry if I'm annoying anyone, but this didn't leave me alone because something just didn't feel right.
Probably, because my answer is incomplete at best. So here is what got me thinking:
My example code above is in fact quite inefficient as it does a lot of unnecessary work. It searches for palindromes by size starting at 2 ending at input string length. So the lookups for all sizes from longest palindrome length to input string length are just burning cpu time.
The original code is much more efficient. However, there is still one advantage to my code: During a review process (which should be pretty common by now) the flaw is obvious enough that there is a good chance a reviewer will find it. So (imo) it is no crime to write bad code as long as you write it readable.
Thus, from what you wrote in your comments I guess that this
val palindromeIndexes = inp.zipWithIndex
.groupBy(_._1).values // group by char
.map(_.map(_._2)).filter(_.length > 1) // retain sequence of indexes
.flatMap(_.combinations(2).map(vecToPair))
.filter(isPalindrome).toSeq
.sortBy{case (start, end)=> -(end+1-start)}
is what got the bad evaluation, because one has to burn a lot of sugar (and more importantly time) to wrap ones brain around what you are doing here (well, at least I had to, and for my first attempt at an answer I kinda didn't). Now I even did some mini benches.
Now, here are some additional review points on your code:
- The least you could do to improve readability of the code is to extract each statement to a local val with a meaningful name.
- Concerning efficiency, you are instantiating a lot of Vectors, that should be avoided (for input string lengths around 5000 you quickly run into an OOME).
- I already said something about the tuples, but there is another point to the; Tuples are
case classes
, too. Using simple classes to wrap the indices might be slightly cheaper. - Your
printlns
cut of the last character of each palindrome.
And I cannot help myself, I have to offer an alternative solution, that, in my opinion, might have served you better.
/** Extends strings with functions to lookup all palindromes. */
implicit class PalindromeLookup(in: String) {
private[this] val stringLength = in.length
class Palindrome(val start: Int, val end: Int) {
val length = end - start
override def toString = s"Palindrome(${in.substring(start, end)}) from $start to $end, length = $length"
}
object Palindrome {
// doesn't cost much and makes the code look better
def apply(start: Int, end: Int): Palindrome = new Palindrome(start, end)
}
private[this] def pivots = {
@tailrec
def findPalindromePivots(index: Int, result: List[Palindrome]): List[Palindrome] = {
if (index >= stringLength - 1) {
result
} else {
val char = in(index)
val nextIndex = index + 1
val nextNextIndex = index + 2
// TODO: the 'oo' in 'fooof' would be ingored.
// (who needs palindromes of length 2 that are sub-palindromes of longer palindromes?) ;)
if (nextNextIndex < stringLength && char == in(nextNextIndex)) {
findPalindromePivots(nextIndex, Palindrome(index, nextNextIndex + 1) :: result)
} else if (char == in(nextIndex)) {
findPalindromePivots(nextIndex, Palindrome(index, nextIndex + 1) :: result)
} else {
findPalindromePivots(nextIndex, result)
}
}
}
findPalindromePivots(0, Nil)
}
@tailrec
private[this] def expand(palindrome: Palindrome): Palindrome = {
val newStart = palindrome.start - 1
val newEnd = palindrome.end + 1
if (newStart < 0 || newEnd > stringLength) {
palindrome
} else if (in(newStart) == in(newEnd - 1)) {
expand(Palindrome(newStart, newEnd))
} else {
palindrome
}
}
/**
* Finds all palindromes in this classes input string.
*
* The algorithm iterates through the string once to find all palindrome pivots
* and then iterates through the pivots and expands them to their full size.
*/
def palindromes = pivots.map(expand)
}
Using the implicit class you can get the longest palindromes of any input string by e.g.
inputString.palindromes.sortBy(p => -p.length)
Hope this is more helpful.
-
\$\begingroup\$ Thank you. Points 1 and 3 are good ones... that said, your solution is a naive brute force. I'd have probably gone for a naive solution as well, weren't they asking for an
efficient
one. When I wroteperformance shouldn't be a concern
I meant for improvements on my solution or algorithms of comparable efficiency... ignoring efficiency completely obviously leads to much leaner code. I don't really like your point number 2: knowing that you can do a palindromicity check withstring == string.reverse
is not indicative at all of knowledge of the scala collections, api, etc... \$\endgroup\$berdario– berdario2016年02月27日 11:22:35 +00:00Commented Feb 27, 2016 at 11:22 -
\$\begingroup\$ That said, I quite like the fact that your
sliding
goes in decreasingpalindromeLength
, which helps in retaining only the longest palindromes... but I also noticed that your solution gives different outputs for inputs likeabcba_bcb
... it was unclear on the text I received if this should yieldabcba, bcb
(like in my code) or onlyabcba
. I guess that I shouldn't bother that much with such ambiguous and poorly defined exercises \$\endgroup\$berdario– berdario2016年02月27日 11:25:54 +00:00Commented Feb 27, 2016 at 11:25 -
\$\begingroup\$ @berdario concerning point 2, I still think knowledge of collections and their API is a plus, though maybe
foldLeft
would have been a better example. Concerning efficiency, well, as far as I know, efficient is not an absolute term. I think, my approach is not inefficient, that is it does not do anything terribly wasteful using standard API. I never claimed it would be optimal. Hence point 4 ;) \$\endgroup\$Sascha Kolberg– Sascha Kolberg2016年02月27日 12:17:00 +00:00Commented Feb 27, 2016 at 12:17 -
\$\begingroup\$ Concerning the different outputs: I don't see a very complete description of the task you were given in your question nor do you mention anywhere whether or not your solution was judged correct. That's why I didn't bother testing my solution against yours. \$\endgroup\$Sascha Kolberg– Sascha Kolberg2016年02月27日 12:19:25 +00:00Commented Feb 27, 2016 at 12:19
-
\$\begingroup\$ You're right... efficient can be a bit subjective, my personal threshold lays somewhere near choosing brute force over the alternatives. I never got any feedback on the correctness of the code, just that they didn't like its look and structure... Thank you again for giving me your perspective on this :) \$\endgroup\$berdario– berdario2016年02月27日 19:58:56 +00:00Commented Feb 27, 2016 at 19:58
As already mentioned, your code is not easy to read. Also, your algorithm probably isn't nowhere near \$ O \left( n^3 \right) \$ in practice because of groupBy
, combinations
, and many calls to exists
. All those introduce large overhead.
Here's my attempt (with a bug in findLongestPalindromeAtPos
, but the general idea should be clear and I'll try to fix it in my free time):
object BruteForce {
def apply(text: String): Seq[(Int, Int)] = {
val all = findAllPalindromes(text)
removeContainedPalindromes(all).sortBy(palindromeLength)
}
private def findAllPalindromes(text: String) = {
for {
i ← 0.5f until text.length by 0.5f
(from, to) ← findLongestPalindromeAtPos(text, i)
} yield (from, to)
}
private def removeContainedPalindromes(all: Seq[(Int, Int)]) = {
var currentMaxTo = 0
for {
(from, to) ← all.sortBy(palindromeLength).sortBy(palindromePosition)
if to > currentMaxTo
_ = currentMaxTo = to
} yield (from, to)
}
private def findLongestPalindromeAtPos(text: String, pos: Float): Option[(Int, Int)] = {
var from = math.floor(pos - 0.5f).toInt
var to = math.round(pos + 0.5f)
while (palindromeProperty(text, from - 1, to + 1)) {
from -= 1
to += 1
}
if (palindromeProperty(text, from, to)) Some((from, to)) else None
}
private def palindromeProperty(text: String, from: Int, to: Int) =
from >= 0 && to < text.length && to - from > 1 && text(from) == text(to)
private def palindromeLength(range: (Int, Int)) = {
val (start, end) = range
start - end
}
private def palindromePosition(range: (Int, Int)) = {
val (start, _) = range
start
}
}
-
\$\begingroup\$ A thing that I overlooked in my implementation is that
Seq
is apparently not lazy (I got a bit confused sinceseq
s in F# are lazy)... if working onStream
s, thefilter
overexists
would only be executed a constant (with linear worst case) number of times. Similarly, with constant time insertion into aMap
,groupBy
is just linear time (but if that's a tree, it'd be probably n*log(n))... finally,combinations
is actually the main responsible for the complexity of my algorithm, which is to be expected that it'll use quadratic steps in the pathological case. \$\endgroup\$berdario– berdario2016年03月03日 21:48:21 +00:00Commented Mar 3, 2016 at 21:48 -
\$\begingroup\$ My previous comment here is a comment re. your comment that my algorithm is "nowhere near \$ O \left( n^3 \right) \$". In fact I stand by my assertion that (with a lazy implementation for
.filter
, which alas my implementation lacks) it's better than \$ O \left( n^2 \right) \$ but I guess that to formally talk about this, I'd need to do some amortized analysis. (and again: I should stop focusing on performance) \$\endgroup\$berdario– berdario2016年03月03日 22:06:10 +00:00Commented Mar 3, 2016 at 22:06 -
\$\begingroup\$ As for complexity, you find every letter that occurs more than once and for each letter you create every pair of positions. That would be \$ O \left( n^2 \right) \$ operations. And then you check letters between those indices - that gives another power of n (checking from middle would be more efficient). And then you still have to filter smaller palindromes contained in longer ones - that seems to be additional \$ O \left( n^2 \right) \$ in you case. And all those operations on immutable collections in scala probably add the most complexity. \$\endgroup\$Nebril– Nebril2016年03月03日 22:59:40 +00:00Commented Mar 3, 2016 at 22:59
-
\$\begingroup\$ True, but the \$n\$ is not the length of the input, except in the case in which it's just the same letter all over, the combinations would be something more like \$\max O \left(count\left(letter\right)^2\right)\$ \$\endgroup\$berdario– berdario2016年03月03日 23:15:38 +00:00Commented Mar 3, 2016 at 23:15
Explore related questions
See similar questions with these tags.
aab
and(1,8), (0,4)
forabcba_bcb
\$\endgroup\$findAllPalindromes
is \$ O \left(findLongestPalindromeAtPos\right) \times O \left( n \right) \$ (andfindLongestPalindromeAtPost
has worst case \$ O \left( n \right) \,ドル but it's actually going to be way less expensive in non-pathological cases) ... brute force would be generating all the possible substrings, and checking for their palindromicity. You're instead generating only the palindrome substrings pivoted around some characters. Calling it brute force would be a bit of stretch imho. \$\endgroup\$