I have a rating system where users can vote between 1-6. I'd like to normalize the votes a bit to better accommodate for extreme voting, where individuals tend to vote as high or as low as possible. To do this, I have the following algorithm which seems to work well. However, it's extremely inefficient and not very well written.
What might I do to improve it?
protected function calculateWeightedAverage($ratings)
{
$values = array_values($ratings);
sort($values);
$count = count($values);
$weights = array_fill(0, $count, 1);
$out = (int) ($count / 3);
$out2 = pow($out + 1, 2);
$max = $count - 1;
$min = 0;
for ($i = 0; $i < $out; $i++) {
$j = $i + 1;
$weights[$min + $i] = $j * $j / $out2;
$weights[$max - $i] = $j * $j / $out2;
}
$sum = array_sum($weights);
for ($i = 0; $i < $count; $i++) {
$weights[$i] /= $sum;
}
$rating = 0;
for ($i = 0; $i < $count; $i++) {
$rating += $values[$i] * $weights[$i];
}
return $rating;
}
3 Answers 3
Your code is really good and I can't see any significant way to improve it. One minor improvement is to reduce the number of loops from 3 to 2:
protected function calculateWeightedAverage($ratings)
{
$sum = 0;
$rating = 0;
$values = array_values($ratings);
sort($values);
$count = count($values);
$out = (int) ($count / 3);
$out2 = pow($out + 1, 2);
for ($i=0; $i<$count; $i++) {
if ($i < $out) {
$weights[$i] = pow(($i+1),2)/$out2;
} elseif ($i > $count-$out-1) {
$weights[$i] = pow($count-$i,2)/$out2;
} else {
$weights[$i] = 1;
}
$sum += $weights[$i];
}
for ($i = 0; $i < $count; $i++) {
$rating += $values[$i] * $weights[$i]/$sum;
}
return $rating;
}
protected function calculateWeightedAverage($ratings)
{
$values = array_values($ratings);
sort($values);
$count = count($values);
$out = (int) ($count / 3);
$out2 = pow($out + 1, 2);
$max = $count - 1;
$min = 0;
$weights = array_fill(0, $count, 1);
for ($i = 0; $i < $out; $i++) {
$j = $i + 1;
$weights[$min + $i] = $j * $j / $out2;
$weights[$max - $i] = $j * $j / $out2;
}
Try removing the $min +
. It is nice for the reader but may be slower for the computer to execute.
$sum = array_sum($weights);
for ($i = 0; $i < $count; $i++) {
$weights[$i] /= $sum;
}
This loop is unnecessary since you can simply return $rating / $sum
.
$rating = 0;
for ($i = 0; $i < $count; $i++) {
$rating += $values[$i] * $weights[$i];
}
Since you are looping over the arrays anyway, you could also compute the $sum
in this loop. Measure it which one is faster or more readable.
return $rating;
}
Since this algorithm is pretty straight-forward and not really slow, maybe the slow part is the part where you get your $ratings
from. When they come from a database, I would worry more about that part.
I don't know how well PHP optimizes the code. If it doesn't, it may even help to extract the $j * $j / $out2
(which you use twice) into a temporary variable.
As with all performance-related topics: measure, and make sure you measure the realistic and relevant things.
It's pretty hard to understand what this code does. There's a lot going on and the code doesn't communicate well what it's doing. Like the variable names $out
and $out2
look completely arbitrary.
My main suggestion is to split this function into two:
- one where you calculate the weights, and
- another where you calculate the average using these weights.
This way it would become clear that the creation of $weights
array only depends on the count of values, not on the values themselves.
Additionally the function only uses $ratings
assoc-array to extract values from it - therefore I would recommend only passing in the values in the first place (it shouldn't be the job of the averaging function to extract the values to be worked on).
$ratings
and give us something concrete. \$\endgroup\$$ratings
comes in as an array[userid => vote, ... ]
but we're usingarray_values()
so$values
is just an array of integers 1-6. \$\endgroup\$count / 3
) and lower third items in the array, giving majority weight to values in the middle. For instance, if the votes were[1, 4, 7]
, the weight of the 1 and 7 would be decreased before calculating the average. If it were[2, 2, 3, 4, 4, 7]
the weights of the 2, 2, second 4, and 7 would be lowered. \$\endgroup\$