Optimize hash OCaml

Question 1

I am running a simple OCaml program which opens a CSV file with a pseudo dict-reader and hashes "key" + "value" (where key and values are strings). Then some counts are evaluated on the hashes (but it is not really relevant for what follows).

After a quick look at the default OCaml profiler (gprof), I noticed that my program was mostly spending time in hashing elements (I don't know what caml_page_table_lookup does though).

Each sample counts as 0.01 seconds.
 % cumulative self self total 
 time seconds seconds calls s/call s/call name 
 8.27 4.12 4.12 97999951 0.00 0.00 caml_hash
 7.53 7.88 3.76 97999951 0.00 0.00 caml_hash_mix_string
 5.12 10.43 2.55 136894800 0.00 0.00 caml_page_table_lookup

The hashtables are seldomly called in my code :

let get_indices dict n = Hashtbl.fold (fun k v acc -> ((Hashtbl.hash (k^v)) mod n) :: acc) dict [] ;;

And here :

let dict_reader file_path = 
 let line_stream = read_lines file_path in
 let header = split_line (Stream.next line_stream) in
 Stream.from
 (fun _ ->
 try Some (to_dict header (split_line (Stream.next line_stream))) with End_of_file -> None)

Is there a smart way to optimize these bottlenecks ? I feel like there is something redundant in evaluating hashes and storing values in hashtables...

Question 2

The hash function by itself is already optimized and very fast. It is hard to make it any faster.

In the code snippets, that you provided, I see only one optimization opportunity: instead of first concatenating a key and values and then hashing the result, it might be more optimal to hash the parts separately and then merge them with xor:

 Hashtbl.hash k lxor Hashtbl.hash v

Also, other than explicit calls to Hashtbl.hash each search on a Hashtbl (e.g., Hashtbl.find also calls the hash function). So you might be the case, that you actually didn't show the actual place, that is responsible for the most calls to hash. (Look at the detailed gprof output, it will provide a context for the calls).

The caml_page_table_lookup is an internal OCaml function for memory management (it also uses hashing, but it doesn't call caml_hash function). An optimization with the string concatenation should also lessen the impact from this function.

Finally, the hash function is only responsible for the 8% of the performance cost. It is not the bottleneck. If you think that your program is too slow, then possibly you need to find ways to make it more efficient algorithmically, rather then trying to apply microoptmizations.

If you will post the whole code, it might be possible, that we will point to a real problem in your code, that causes the performance problems.

Question 3

Wow thanks ! the speedup was impressive. The execution gained roughly 10% - the amount shown in the profiler !

Question 4

I posted the complete version of the code here : codereview.stackexchange.com/questions/141418/…

ivg ivg 3411 silver badge5 bronze badges · Accepted Answer · 2016-09-02 16:53:26Z

The hash function by itself is already optimized and very fast. It is hard to make it any faster.

In the code snippets, that you provided, I see only one optimization opportunity: instead of first concatenating a key and values and then hashing the result, it might be more optimal to hash the parts separately and then merge them with xor:

 Hashtbl.hash k lxor Hashtbl.hash v

Also, other than explicit calls to Hashtbl.hash each search on a Hashtbl (e.g., Hashtbl.find also calls the hash function). So you might be the case, that you actually didn't show the actual place, that is responsible for the most calls to hash. (Look at the detailed gprof output, it will provide a context for the calls).

The caml_page_table_lookup is an internal OCaml function for memory management (it also uses hashing, but it doesn't call caml_hash function). An optimization with the string concatenation should also lessen the impact from this function.

Finally, the hash function is only responsible for the 8% of the performance cost. It is not the bottleneck. If you think that your program is too slow, then possibly you need to find ways to make it more efficient algorithmically, rather then trying to apply microoptmizations.

If you will post the whole code, it might be possible, that we will point to a real problem in your code, that causes the performance problems.

Wow thanks ! the speedup was impressive. The execution gained roughly 10% - the amount shown in the profiler !
I posted the complete version of the code here : codereview.stackexchange.com/questions/141418/…

Stack Exchange Network

Optimize hash OCaml

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

Optimize hash OCaml

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions