2
\$\begingroup\$

I would like a code-review. Not so much on if the implementation is good or effecient, it's probably not, more on code style and readability.

import java.lang.Exception
import java.nio.ByteBuffer
import java.security.MessageDigest
import java.util.*
import kotlin.math.abs
fun main() {
 val filterSize = 1_000_000
 val numberOfEntries = 100_000
 val filter = BloomFilter(filterSize, numberOfHashes = 4)
 val entriesInFilter = Array(numberOfEntries) { randomString() }
 val entriesNotInFilter = Array(numberOfEntries) { randomString() }
 for (entry in entriesInFilter)
 filter.add(entry)
 val confusionMatrix = ConfusionMatrix(filter, entriesInFilter, entriesNotInFilter)
 confusionMatrix.printReport()
 if (confusionMatrix.falseNegativeRate > 0.0) {
 throw Exception("This should not happen, if it does the implementation of the bloom filter is wrong.")
 }
}
class BloomFilter(private val size: Int, numberOfHashes: Int) {
 private val flags = BitSet(size)
 private val salts = IntArray(numberOfHashes) { it }.map { it.toString() }
 private val sha = MessageDigest.getInstance("SHA-1")
 fun add(entry: String) {
 for (salt in salts) {
 val index = hashedIndex(entry, salt)
 flags.set(index)
 }
 }
 fun maybeExists(entry: String): Boolean {
 for (salt in salts) {
 val index = hashedIndex(entry, salt)
 if (!flags[index]) {
 return false
 }
 }
 return true
 }
 private fun hashedIndex(entry: String, salt: String): Int {
 val salted = entry + salt
 val hash = sha.digest(salted.toByteArray())
 val wrapped = ByteBuffer.wrap(hash)
 return abs(wrapped.int) % size
 }
}
class ConfusionMatrix(filter: BloomFilter, entriesInFilter: Array<String>, entriesNotInFilter: Array<String>) {
 private val inFilterCount = entriesInFilter.size
 private val notInFilterCount = entriesNotInFilter.size
 private var truePositiveCount = 0
 private var trueNegativeCount = 0
 private var falsePositiveCount = 0
 private var falseNegativeCount = 0
 val accuracyRate by lazy { (truePositiveCount + trueNegativeCount).toDouble() / (notInFilterCount + inFilterCount) }
 val misclassificationRate by lazy { 1.0 - accuracyRate }
 val truePositiveRate by lazy { truePositiveCount.toDouble() / inFilterCount }
 val trueNegativeRate by lazy { trueNegativeCount.toDouble() / notInFilterCount }
 val falsePositiveRate by lazy { falsePositiveCount.toDouble() / notInFilterCount }
 val falseNegativeRate by lazy { falseNegativeCount.toDouble() / inFilterCount }
 init {
 countTruePositiveAndFalseNegative(entriesInFilter, filter)
 countFalsePositiveAndTrueNegative(entriesNotInFilter, filter)
 }
 private fun countTruePositiveAndFalseNegative(entriesInFilter: Array<String>, filter: BloomFilter) {
 for (entryInFilter in entriesInFilter) {
 if (filter.maybeExists(entryInFilter)) {
 truePositiveCount++
 } else {
 falseNegativeCount++
 }
 }
 }
 private fun countFalsePositiveAndTrueNegative(entriesNotInFilter: Array<String>, filter: BloomFilter) {
 for (entryNotInFilter in entriesNotInFilter) {
 if (filter.maybeExists(entryNotInFilter)) {
 falsePositiveCount++
 } else {
 trueNegativeCount++
 }
 }
 }
 fun printReport() {
 val dataRows = mapOf(
 "Accuracy" to accuracyRate,
 "Misclassification rate" to misclassificationRate,
 "True positive rate" to truePositiveRate,
 "True negative rate" to trueNegativeRate,
 "False positive rate" to falsePositiveRate,
 "False negative rate" to falseNegativeRate
 )
 val printer = Printer(dataRows)
 printer.print()
 }
}
class Printer(private val dataRows: Map<String, Double>) {
 private val spacing = 2
 private val longestLabelLength = getLongestString(dataRows.keys, default=50) + spacing
 private val stringBuilder = StringBuilder()
 private fun getLongestString(labels: Set<String>, default: Int): Int {
 return labels.map { it.length }.max() ?: default
 }
 fun print() {
 for ((label, value) in dataRows) {
 printLabel(label)
 printPadding(label)
 printFormattedValue(value)
 println()
 }
 }
 private fun printLabel(label: String) {
 print("$label:")
 }
 private fun printPadding(label: String) {
 val paddingNeeded = longestLabelLength - label.length
 stringBuilder.clear()
 for (x in 0 until paddingNeeded) stringBuilder.append(" ")
 print(stringBuilder.toString())
 }
 private fun printFormattedValue(value: Double) {
 val width6digits2 = "%6.2f"
 val percentage = String.format(width6digits2, value * 100) + "%"
 print(percentage)
 }
}
private fun randomString(): String {
 return UUID.randomUUID().toString()
}
asked Aug 22, 2020 at 18:44
\$\endgroup\$
6
  • 1
    \$\begingroup\$ Why are you using lazy for trivial division operations? \$\endgroup\$ Commented Aug 26, 2020 at 18:18
  • \$\begingroup\$ Yes it's probably overkill. I don't like them to be var or even private set as var, as they are only set once which is exactly was lazy does when called and then saves for later. In C# we have the readonly modifier where fields can be set only in object initialization, but then still remain public and communicate clearly that this is constant after initialization. Does Kotlin have similar, what would you suggest? \$\endgroup\$ Commented Aug 27, 2020 at 19:16
  • 1
    \$\begingroup\$ = is uglier than by lazy { }? Or if you just define them as Doubles, you can set them one time in the init block, like you described doing in C#. \$\endgroup\$ Commented Aug 27, 2020 at 19:18
  • \$\begingroup\$ But then I have to make them var with private set. Which 1) gives two lines instead of one for each property and 2) gives the impression that accuracyRate can change after initialization. var accuracyRate = 0.0 private set \$\endgroup\$ Commented Aug 27, 2020 at 19:28
  • 1
    \$\begingroup\$ You don't have to make it a var. pl.kotl.in/1yNSwYOwh \$\endgroup\$ Commented Aug 27, 2020 at 19:31

2 Answers 2

2
\$\begingroup\$

Here's how I'd clean up the ConfusionMatrix class. I don't know anything about this algorithm, but this should be equivalent code. You can calculate and set these read-only values at their declaration sites if you do them in order. So all parameters can be val and you don't need lazy, which wraps your property in a Lazy class. There are no custom getters and there are no setters, so the whole class is immutable and compact with no references to anything else once it's instantiated.

class ConfusionMatrix(filter: BloomFilter, entriesInFilter: Array<String>, entriesNotInFilter: Array<String>) {
 private val inFilterCount = entriesInFilter.size
 private val notInFilterCount = entriesNotInFilter.size
 private val truePositiveCount = entriesInFilter.count { filter.maybeExists(it) }
 private val falseNegativeCount = entriesInFilter.size - truePositiveCount
 private val falsePositiveCount = entriesNotInFilter.count { filter.maybeExists(it) }
 private val trueNegativeCount = entriesNotInFilter.size - truePositiveCount
 val accuracyRate = (truePositiveCount + trueNegativeCount).toDouble() / (notInFilterCount + inFilterCount)
 val misclassificationRate = 1.0 - accuracyRate
 val truePositiveRate = truePositiveCount.toDouble() / inFilterCount 
 val trueNegativeRate = trueNegativeCount.toDouble() / notInFilterCount
 val falsePositiveRate = falsePositiveCount.toDouble() / notInFilterCount
 val falseNegativeRate = falseNegativeCount.toDouble() / inFilterCount
 fun printReport() {
 val dataRows = mapOf(
 "Accuracy" to accuracyRate,
 "Misclassification rate" to misclassificationRate,
 "True positive rate" to truePositiveRate,
 "True negative rate" to trueNegativeRate,
 "False positive rate" to falsePositiveRate,
 "False negative rate" to falseNegativeRate
 )
 val printer = Printer(dataRows)
 printer.print()
 }
}

Knowing nothing of the algorithm, I'd say BloomFilter is pretty clean, but you could more naturally write the declaration of salts like this:

private val salts = (0..numberOfHashes).map { it.toString() }

or

private val salts = (0..numberOfHashes).map(Int::toString)

The second form is usually preferred to lambdas when there's a function that exactly matches the required signature because it shows the type. Not really helpful here, but helpful in a chain of functional calls to make it more readable later.

In your main method, a couple of little tips...

When you want to do some sort of logging type of action without side effects as you are assigning something to a variable, you can use also. It kind of de-emphasizes it for someone reading your code, especially if it's some action that takes a few lines of code. It's not all that useful here since this is so simple, but might be handy for you in other situations.

val confusionMatrix = ConfusionMatrix(filter, entriesInFilter, entriesNotInFilter)
 also { it.printReport() }

And there's a function for asserting something and throwing a runtime exception if it fails, so your last bit can be cleaned up:

require(confusionMatrix.falseNegativeRate > 0.0) {
 "This should not happen, if it does the implementation of the bloom filter is wrong."
}
answered Aug 27, 2020 at 20:04
\$\endgroup\$
0
\$\begingroup\$

After looking at it a bit

hashedIndex does many things. It salts the input, hashes it, wraps it and makes sure it fits into the size. Could it be split up and more clear what is happening?

The confusion matrix seems like a general mathy thing, why does it have a direct dependency on a BloomFilter and its data? Try to come up with some way of decoupling these so the confusion matrix can be reused for other statistical purposes.

countTruePositiveAndFalseNegative and countFalsePositiveAndTrueNegative looks a lot like repetition, can the logic be moved to a single implementation?

None of the classes implement interfaces or abstract methods, so using them would require a dependency to the concrete implementation, making the depende unnecessarily difficult to test and change.

There is a possible divide by zero issue if inFilterCount or notInFilterCount is zero.

answered Aug 23, 2020 at 16:32
\$\endgroup\$
1
  • \$\begingroup\$ I got around the dependency from ConfusionMatrix to Bloomfilter by using a lambda that "predicts", which I think is a good use of lambdas compared to a solution using e.g. adapter pattern, where a lot of boilerplate would be created. \$\endgroup\$ Commented Aug 25, 2020 at 19:02

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.