5
\$\begingroup\$

I recently implemented a Bloom Filter in Haskell and, although I am not new to functional programming, I am a beginner when it comes to Haskell itself.

I'll gladly take any feedback regarding the implementation or the code style. I tried to stick to the Haskell conventions regarding function and variable naming, but I may have gotten some stuff wrong.

You will notice that I'm using my own implementation of a Bitset, you can assume it behaves like a normal one (I sure hope so).

module DataStructures.BloomFilter (empty, insert, member, DataStructures.BloomFilter.null) where
import System.Random (random, StdGen)
import qualified DataStructures.Bitset as Bitset
import Data.Hashable (hashWithSalt)
-- n: expected number of items in the Bloom Filter
-- p: acceptable probability of a false positive
-- m: max number of bits the Bloom Filter will use
-- k: number of hashing functions
data BloomFilter = BloomFilter {
 n :: Int, p :: Float, bitset :: Bitset.Bitset, m :: Int, k :: Int, hashSeed :: Int
} deriving (Eq, Show)
getMaxSize :: Int -> Float -> Int
getMaxSize n p = abs $ ceiling $ fromIntegral n * (log p) / (log (1 / (log 2 ^ 2)))
getNumberOfHashFunctions :: Int -> Int -> Int
getNumberOfHashFunctions n m = round $ fromIntegral (m `div` n) * log 2
empty :: Int -> Float -> StdGen -> BloomFilter
empty n p randGen =
 let m = getMaxSize n p
 k = getNumberOfHashFunctions n m
 hashSeed = fst $ random randGen
 in BloomFilter n p Bitset.empty m k hashSeed
null :: BloomFilter -> Bool
null = Bitset.null . bitset
getHashes :: Show a => BloomFilter -> a -> [Int]
getHashes bloomFilter elem =
 let str = show elem
 seed = hashSeed bloomFilter
 maxSize = m bloomFilter
 in (`mod` maxSize) . abs . (`hashWithSalt` str) . (seed +) <$> [1..(k bloomFilter)]
insert :: Show a => BloomFilter -> a -> BloomFilter
insert bloomFilter elem =
 let hashes = getHashes bloomFilter elem
 newBitset = Bitset.insertMany (bitset bloomFilter) hashes
 in bloomFilter { bitset = newBitset }
-- Returns whether an element *may be* present in the bloom filter.
-- This function can yield false positives, but not false negatives.
member :: Show a => BloomFilter -> a -> Bool
member bloomFilter elem =
 let hashes = getHashes bloomFilter elem
 bs = bitset bloomFilter
 in all (Bitset.member bs) hashes

I would have liked to use a murmur3 hashing function but all Haskell implementations use types that I'm not yet familiar with (Word32, ByteString).

asked Mar 21, 2020 at 18:52
\$\endgroup\$

2 Answers 2

3
\$\begingroup\$

Instead of transforming everything to String using Show just for the purpose of hashing it, you should constraint element types to be Hashable instead:

getHashes :: Hashable a => BloomFilter -> a -> [Int]
getHashes bloomFilter elem =
 let seed = hashSeed bloomFilter
 maxSize = m bloomFilter
 in (`mod` maxSize) . abs . (`hashWithSalt` elem) . (seed +) <$> [1..(k bloomFilter)]

I'd also use maxSize and numFuns as the field names in BloomFilter and then use RecordWildCards:

getHashes :: Hashable a => BloomFilter -> a -> [Int]
getHashes BloomFilter{..} elem = map nthHash [1..numFuns]
 where
 nthHash n = abs (hashWithSalt (n + hashSeed) elem) `mod` maxSize 

Or maybe even nicer:

getHashes :: Hashable a => BloomFilter -> a -> [Int]
getHashes BloomFilter{..} elem = 
 [ abs (hashWithSalt (i + hashSeed) elem) `mod` maxSize | i <- [1..numFuns] ]
answered May 25, 2020 at 11:40
\$\endgroup\$
1
  • \$\begingroup\$ Oh I did not know about the RecordWildCards thing. Very cool, thanks! \$\endgroup\$ Commented May 26, 2020 at 14:53
2
\$\begingroup\$

I only have one minor nitpick: the definition of getMaxSize becomes clearer if we use logBase:

abs $ ceiling $ fromIntegral n * (log p) / (log (1 / (log 2 ^ 2)))

becomes

abs $ ceiling $ fromIntegral n * (- 0.5) * logBase (log 2) p

We can use the identity ceiling (-x) == - floor(x) to get

abs . floor $ fromIntegral n * 0.5 * logBase (log 2) p
answered May 1, 2020 at 12:08
\$\endgroup\$

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.