1
\$\begingroup\$

I am doing a college assignment where I have to count the number of inversions in an array/list. An inversion is defined as a tuple where i<j and a[i]>a[j]. For example in this array

val arr = Array(3, 1, 2, 4)

The inversions are (3,1), (3,2). So total count of inversions is 2.

I wrote a divide and conquer algorithm to find this. it works for small arrays. but for large inputs like this one

https://github.com/abhsrivastava/ArrayInversions/blob/master/src/main/resources/inversions.txt

there is always an out of memory error. My code is below

import scala.io.Source
import Sort._
object CountInversions extends App {
 val data = Source.fromResource("inversions.txt").getLines.map(_.toInt).toList
 val inversions = countInversions(data)
 println(s"number of inversions ${inversions}")
 def countInversions(input: List[Int]): Int = {
 input match {
 case Nil => 0
 case List(a) => 0
 case _ =>
 val mid = input.size / 2
 val left = input.slice(0, mid)
 val right = input.slice(mid, input.size)
 val l1 = countInversions(left)
 val l2 = countInversions(right)
 val l3 = splitInversions(left, right)
 l1 + l2 + l3
 }
 }
 // assuming l1 and l2 are almost of same size.
 // total complexity 2(nlogn + n)
 def splitInversions(l1: List[Int], l2: List[Int]): Int = {
 val sortedL1 = mergeSort(l1) // nlogn
 val sortedL2 = mergeSort(l2) // nlogn
 (sortedL1, sortedL2) match {
 case (Nil, Nil) => 0
 case (Nil, _) => 0
 case (_, Nil) => 0
 case (_, _) if sortedL1.head > sortedL2.head =>
 val result = splitInversions(sortedL1, sortedL2.tail)
 sortedL1.size + result
 case (_, _) =>
 splitInversions(sortedL1.tail, sortedL2)
 }
 }
}

I am not posting the code for mergeSort here. it is just a simple merge sort.

My objective is to be able to determine the inversions in O(nlogn) time and be able to process the large file. I also want to keep my code functional.

How can I optimize my code?

asked Jul 8, 2018 at 17:02
\$\endgroup\$
0

1 Answer 1

2
\$\begingroup\$

A few things I've noticed.

  1. You use List to represent the input data. Lists are inefficient (linear) for things like input.size (which you do twice on the same input) and input.slice().
  2. Both countInversions() and splitInversions() are recursive but they're not tail recursive, so that's going to eat up stack space.
  3. splitInversions() sends its passed parameters to mergeSort() which means that every time it calls itself (recurses) it re-sorts already sorted data.
  4. Your calculations return an Int, which is going to be too small for a large data set such as the inversions.txt you've linked to.

But generally I find the whole thing rather too complicated.

Here's an alternative algorithm that is smaller, faster, and more memory efficient.

def countInversions(input :Vector[Int]) :Long = {
 val idxBuf = input.indices.sortBy(input).toBuffer
 input.indices.foldLeft(0L){ case (sum,x) =>
 val idx = idxBuf.indexOf(x)
 idxBuf.remove(idx) //side effect
 sum + idx
 }
}
answered Aug 10, 2018 at 22:21
\$\endgroup\$

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.