I an a newcomer in the Scala world, and I would like some advice from more experienced people to find out whether what I am writing goes in the direction of idiomatic code.
In particular, I have to write a simple algorithm that will do the following. We are given an integer number of, say, candies to be distributed, and a list of proportions, which are also integers. We should split the candies into groups, such that the proportions are respected as accurately as possible, given that a single candy cannot be broken up.
For instance, given 17
candies to distribute in proportions of (3, 2, 5)
we should obtain (5, 4, 8)
.
Here is my code, including a quick ScalaCheck test.
import org.scalacheck.Prop._
import org.scalacheck.Gen
object Distribution {
private class IntWithDivision(val self: Int) {
def @/(other: Int): Float = self.toFloat / other
}
private implicit def divisibleInt(int: Int) = new IntWithDivision(int)
def distribute(amount: Int, proportions: Seq[Int]): Seq[Int] = {
val sum = proportions.sum
val byDifect = proportions map { x => (amount * x @/ sum).toInt }
val approximations = (byDifect, proportions).zipped map { (x, y) => x @/ y }
val lowValuesIndices = (
approximations.view.zipWithIndex
sortBy { _._1 }
take (amount - byDifect.sum)
map { _._2 }
).force
val remainders = (
approximations.view.zipWithIndex
map { x => if (lowValuesIndices contains x._2) 1 else 0 }
).force
(byDifect, remainders).zipped map { _ + _ }
}
def main(args: Array[String]) {
val pos = Gen.choose(1, 50000)
val posList = Gen.containerOf[List, Int](pos)
val propSum = forAll(pos, posList) { (amount: Int, proportions: List[Int]) =>
(proportions.size > 0) ==> (amount == distribute(amount, proportions).sum)
}
propSum.check
}
}
In particular, I would like advice on the following points:
- The algorithm goes over the list of proportions and derived lists a few times. I would like to use more laziness, but using
proportions.view
at the beginning and then returning aforce
gives a type error. - The code is more convoluted than I would like. I think it may be simplified knowing the Collections API better.
Any help or suggestion is welcomed.
1 Answer 1
Ok, I'm changing the answer now that I understand what you are doing.
The main problem here is @/
-- while Scala people, in general, don't mind special operators, they don't add operators just because they can either. You can replace @/
with the existing /
just by adding .toFloat
to any one of the terms.
Views aren't often used either, and it's important to have a very good understanding of how they work if you are going to use them, and it's not that easy to gain performance with them, since the machinery they use to support non-strictness is quite heavy, and not everything takes advantage of it. For example, sortBy
will create a new collection before take
and map
are applied.
Views can gain when you have many mapping/slicing steps, and few elements of it are ever used. Most of the time, iterators will gain you much more performance, at the cost of the mutability problems iterators have.
If you want to reduce the number of times you iterate through the list of proportions, there's at least one place where you can simplify:
val remainders = (
approximations.view.zipWithIndex
map { x => if (lowValuesIndices contains x._2) 1 else 0 }
).force
(byDifect, remainders).zipped map { _ + _ }
Can be reduced to
byDifect.zipWithIndex map {
case (bd, i) => bd + (if (lowValuesIndices contains i) 1 else 0)
}
One could also keep byDifect
a Float
, then either use it alone when computing approximation
(instead of zipping stuff), or skip that altogether and put that computation on sortBy
-- incurring the cost of computation O(nlogn) times instead of O(n) times. It would make the code shorter, but whether it would be faster or not is something I'd leave to a benchmark with a real application -- I'm guessing it would depend on actual sizes for proportions
.
So, let's talk a bit about performance. Before Scala 2.10, if you want performance you should avoid methods added through implicits on critical paths. The code you wrote will probably get inlined by JIT. You can also reduce the number of computations by pre-computing amount / sum
, and if you make that amount.toFloat / sum
, then you don't need /@
.
More specifically, views are not guarantees of speed, particularly if the computations are light, such as here. I'd not use them at all, unless I'm specifically optimizing the code.
Doing a fixed size of multiple passes on small data structures is often not a problem. You are not changing the complexity, just losing memory locality. If the data is bigger, you can incur in gc overheads, which are more substantial. If maximum performance is required, just drop immutability and go to mutable arrays.
Finally, contains
is faster on Set
than Seq
-- and, in this particular case, a BitSet
would be way faster. Call it apply
, however, since contains
is a general method on traversables, while set's apply is a fundamental operation. If one of them is less than optimized, it will be contains
.
This is the most idiomatic beginner's code I have ever seen... do you come from another functional language?
-
\$\begingroup\$ Of course we are not allowed to leave any candies, otherwise the problem would be trivial. In any case, I would be happy to know whether the way I have written the algorithm in Scala is idiomatic. \$\endgroup\$Andrea– Andrea2012年08月10日 16:33:07 +00:00Commented Aug 10, 2012 at 16:33
-
\$\begingroup\$ @Andrea Ok, revised the answer. \$\endgroup\$Daniel C. Sobral– Daniel C. Sobral2012年08月10日 20:29:21 +00:00Commented Aug 10, 2012 at 20:29
-
\$\begingroup\$ Thank you very much for your advice! I also do not like an exceeding use of operators in public APIs, but I figured in this case it was all kept private, just make some lines more readable. As you suggest, it is better to keep
amount/sum
as a float and avoid the issue altogether. As for my background, I mostly come from Python and Javascript, but I have dabbled with Clojure, Scheme and Haskell before. \$\endgroup\$Andrea– Andrea2012年08月11日 09:38:21 +00:00Commented Aug 11, 2012 at 9:38 -
\$\begingroup\$ @DanielC.Sobral, I'd like to abuse of that answer to bring this question to your attention: codereview.stackexchange.com/questions/101339/kosaraju-in-scala. Thanks in advance if you have any time for that :-) \$\endgroup\$Bacon– Bacon2015年08月28日 01:55:51 +00:00Commented Aug 28, 2015 at 1:55
assignedCandies/assignedProportion
is lowest, and repeat. It is easy to see that in this way, the no kid will get more than one leftover, so I directly compute those kids having the lowest value and assing one candy to all of them. \$\endgroup\$