I have this little exercise writing a binary trie in F#. In C# it can be done in few lines, but here it became more complicated than I had expected (especially the add
function):
open System
type BitTrie =
| BitTrie of Zero:BitTrie * One:BitTrie
| NullTrie
module BitTrie =
let ubound = 31 // 32 bits integer
let add value trie =
let rec adder shift curTrie =
if shift < 0 then curTrie
else
match (value >>> shift) &&& 1, curTrie with
| 0, BitTrie(zero, one) ->
match zero with
| BitTrie(_) -> BitTrie(adder (shift - 1) zero, one)
| NullTrie -> BitTrie(adder (shift - 1) (BitTrie(NullTrie, NullTrie)), one)
| 1, BitTrie(zero, one) ->
match one with
| BitTrie(_) -> BitTrie(zero, adder (shift - 1) one)
| NullTrie -> BitTrie(zero, adder (shift - 1) (BitTrie(NullTrie, NullTrie)))
| bit, _ -> raise (InvalidOperationException(sprintf "Invalid bit value: %d" bit))
adder ubound trie
let contains value trie =
let rec searcher shift curTrie =
if shift < 0 then curTrie <> NullTrie
else
match (value >>> shift) &&& 1, curTrie with
| _, NullTrie -> false
| 0, BitTrie(zero, _) -> searcher (shift - 1) zero
| 1, BitTrie(_, one) -> searcher (shift - 1) one
| bit, _ -> raise (InvalidOperationException(sprintf "Invalid bit value: %d" bit))
searcher ubound trie
let isEmpty trie =
match trie with
| NullTrie -> true
| BitTrie(NullTrie, NullTrie) -> true
| BitTrie(_) -> false
let create () = BitTrie(NullTrie, NullTrie)
let addValues values trie =
values |> Seq.fold (fun accTrie value -> add value accTrie) trie
let fromValues values = create() |> addValues values
let toSeq trie =
let rec iterator value curTrie =
seq {
match curTrie with
| NullTrie -> ()
| BitTrie(NullTrie, NullTrie) -> yield value
| BitTrie(zero, one) ->
yield! (iterator (value <<< 1) zero)
yield! (iterator ((value <<< 1) + 1) one)
}
iterator 0 trie
let toList trie = trie |> toSeq |> Seq.toList
let toArray trie = trie |> toSeq |> Seq.toArray
let iter action trie = trie |> toSeq |> Seq.iter action
I know this pattern match
bit, _ -> raise (InvalidOperationException(sprintf "Invalid bit value: %d" bit))
will never happen, but hate to leave the compiler unsatisfied.
Here are some test cases:
let testBitTrie () =
let values = [ 0b1011001; 0b1101101; 0b1011011; Int32.MaxValue; (Int32.MaxValue >>> 16) - 1; 0; -1; Int32.MinValue ]
let trie = values |> BitTrie.fromValues
trie |> BitTrie.iter (fun v -> printfn "%s -> %d" (Convert.ToString(v, 2)) v)
printfn ""
trie
|> BitTrie.addValues [ 35; 45 ]
|> BitTrie.iter (fun v -> printfn "%s -> %d" (Convert.ToString(v, 2)) v)
//values @ [1; 2; 4; 100] |> Seq.iter (fun v -> printfn "%d: %b" v (trie |> BitTrie.contains (v)))
let solveXorMax () =
//let rand = new Random(1);
//let values = Enumerable.Range(0, 5000).Select(fun x -> rand.Next(rand.Next(0, 10000000))).ToList();
//let trie = values |> BitTrie.fromValues
let values = [ 3; 10; 5; 2; 25; 8 ]
let trie = values |> BitTrie.fromValues
let folder (max: int) value =
let rec iter shift xor curTrie : int =
if shift <= 0 then Math.Max(max, xor)
else
match (value >>> shift) &&& 1, curTrie with
| 0, BitTrie(zero, one) ->
match zero, one with
| _, BitTrie(_) -> iter (shift - 1) (xor + (1 <<< shift)) one
| BitTrie(_), _ -> iter (shift - 1) xor zero
| 1, BitTrie(zero, one) ->
match zero, one with
| BitTrie(_), _ -> iter (shift - 1) (xor + (1 <<< shift)) zero
| _, BitTrie(_) -> iter (shift - 1) xor one
iter BitTrie.ubound 0 trie
let max = values |> Seq.fold folder (Int32.MinValue)
printfn "%d" max
solveXorMax ()
finds the maximum xor value of two values in the input set - taken from this question
I'm interested in any comment, but especially if you can see a smarter/shorter approach for the add
function. My self-imposed restriction was to use a discriminated union type as data type for the trie node.
1 Answer 1
Big Issue
You have a problem with empty Tries: toSeq
thinks they contain one element: it returns seq { 0 }
for an empty Trie.
Seq.length (BitTrie.toSeq (BitTrie.create ())) = 1
This 0
value then goes away when you add anything. I think create
should probably be producing a NullTrie
. There's also no need for it to be a method; I'd prefer an empty
value, but at that point you could just rename
NullTrieto
EmptyTrieand everyone should be happy. Moving the union into the module would let you just use
Empty` without stepping on any toes.
I can't work out what NullTrie
is really meant to do otherwise; except produce confusing exceptions: BitTrie.add 0 NullTrie
will throw your InvalidOperationException(sprintf "Invalid bit value: %d" bit)
.
I think you can fix this by 'ensuring' curTrie
is not a NullTrie
. You can probably use active patterns to resolve this nicely, but I forgot how to use those long ago, so instead I would add an ensure
(I'm sure you can think of a better name, but it's late and I can't) function...
let private ensure trie =
match trie with
| Empty -> BitTrie(Empty, Empty)
| _ -> trie
... and then modify the match
in add
(which allows us to simplify it somewhat):
match (value >>> shift) &&& 1, (ensure curTrie) with
| 0, BitTrie(zero, one) ->
BitTrie(adder (shift - 1) zero, one)
| 1, BitTrie(zero, one) ->
BitTrie(zero, adder (shift - 1) one)
| bit, _ -> raise (InvalidOperationException(sprintf "Invalid bit value: %d" bit))
But there is still a big problem, because BitTrie(Empty, Empty)
contains nothing, but apparently isn't empty. This reveals a serious design issue: the meaning of the Trie
is tied to ubound
but you can produce 'invalid' Tries, and trivially so. At this point I would give up on the public BitTrie
DU, and instead implement a self-contained class where I can encapsulate and control everything, providing methods for producing Tries with a valid and meaningful state.
One option you could consider is making BitTrie
a ternary union, where you have a type indicating whether the value is present or not, so that it is unambigous. This, however, goes funny because you are using a classical Trie for integers. For example, 00000
= 0
, but these sequences would have different positions in a Trie
. I think your self-impose constraint might need to be relaxed if you want a sound API.
Other
Using a variant of the
ensure
method above which returns just a tuple rather than aBitTrie
, you can remove the need to cover theempty
case inadd
. Then if you really wanted to get rid of the exception, you could go with(value >>> shift) &&& 1 > 0
and usefalse
andtrue
instead of0
and1
. The same can be done incontains
, but looking at it I think it is less clear.It seems odd to provide
BitTrie.iter
specifically (as opposed tomap
orfold
or whatever); I'd be inclined to only providetoSeq
, and just leave the consumer to callSeq.iter
themselves.As always, I would prefer all public types and functions had inline documentation.
toSeq
, for example, probably wants to document any guarantees you will make about ordering (which presently is based on the bit values, so starts with 0 and ends with -1, which some might think odd).
-
\$\begingroup\$ Thanks a lot. I've read and learned. Your suggestions about
add
and an introduction of aBit
DU have removed the complexity from that function. I've also introduced a leaf node - indicating a valid right most bit of a contained number. You may be right about making a class-type implementation - but it was not the objective of this exercise - and besides that I like the DU/module approach. I hope you get a little more up votes... \$\endgroup\$user73941– user739412019年07月28日 17:43:56 +00:00Commented Jul 28, 2019 at 17:43