Addons/math/mt/Benchmarking
Objective
Benchmark the GEMM, TRSM and GESV methods provided by J primitives, the mt addon, and the BLIS and OpenBLAS library wrappers, in both single-threaded and multi-threaded environments.
Preparation
Prepare external libraries with methods to compare:
user@host:~/lib> ls -1
libblis_threads=1.so
libblis_threads=n.so
libopenblas_threads=1.so
libopenblas_threads=n.so
By handiwork
To estimate performance, raw data from the test log can be used, e.g.:
   load 'math/mt'
   mkmat=. _1 1 0 3 _6 4&gemat_mt_
   log=. mkmat testbasicmm_mt_ 2 # 1000
   'sts tms'=. 0 4 { log
In the code snippet above, various matrix-multiply methods were tested on random floating-point 1000×1000 matrices in a single-threaded environment. The executed sentences were saved into the rank-2 character array sts (one sentence per row), and the estimated execution durations were saved into the vector tms (one atom per sentence):
   sts ; ,. tms
+-----------------+--------+
|(+/ .*)          |0.000345|
|mp               |0.000346|
|dgemmnn_mtbla_   |0.003074|
|...              |...     |
+-----------------+--------+
The log format is described in the mt.ijs file. The execution duration of each sentence is estimated as proposed in [1]: "the minimum run-time of 3-5 executions of the program when the machine is lightly loaded."
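The minimum-of-several-runs estimate quoted above can be sketched with the timing foreign 6!:2. The helper name mintime and the matrix size are assumptions made for illustration; this is not necessarily how the mt addon performs its own measurement.

```j
NB. Sketch: minimum wall-clock time (seconds) of x runs of sentence y.
NB. mintime is a hypothetical helper name, not part of the mt addon.
mintime=: 4 : '<./ (6!:2)"1 x # ,: y'

n=: 200
A=: ? (2 # n) $ 0       NB. random float n*n matrix
B=: ? (2 # n) $ 0
t=: 5 mintime 'A (+/ .*) B'   NB. best of 5 runs
```

Taking the minimum rather than the mean discards runs that were slowed down by other activity on the machine.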
Given the problem sizes and the measured execution durations, other indicators can be computed, e.g. FLOPS or "duration per value".
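As an illustration of deriving such indicators, the following sketch converts durations into GFLOPS and into seconds per result value. It assumes the conventional operation count of about 2n^3 for multiplying two n×n matrices. The verb names gflops and perval and the sample durations are invented for the example and do not come from a real run.

```j
NB. x: matrix order n, y: durations in seconds
NB. assumes about 2*n^3 floating-point operations per n*n GEMM
gflops=: 4 : '1e_9 * (2 * x ^ 3) % y'
NB. seconds spent per element of the n*n result
perval=: 4 : 'y % *: x'

tms=: 0.05 0.1          NB. hypothetical durations, not measured
1000 gflops tms         NB. 40 20
1000 perval tms         NB. 5e_8 1e_7
```

TRSM and GESV have different operation counts, so the constant in gflops would need to be adapted for them.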
By customized script
However, a specialized script makes benchmarking far simpler and more convenient. Save the script File:Bmk.ijs as ~temp/bmk.ijs and run it:
user@host:~/j9.6> ./jconsole.sh
   load '~temp/bmk.ijs'
   nn=. 100 liso4dhs_mt_ 100 60  NB. repeat for n=100..6000 with step 100
   bmk_mttmp_ nn
... (output is skipped)
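For readers without the addon at hand, the problem sizes named in the comment (n = 100..6000 with step 100) can be written with primitives alone. That this matches what liso4dhs_mt_ returns is an assumption based on that comment, not on the addon's source.

```j
NB. assumed equivalent of the sizes named in the comment: 100 200 ... 6000
nn=: 100 + 100 * i. 60
# nn                    NB. 60
({. , {:) nn            NB. 100 6000
```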
Running the script creates 6 text files with numeric data (3 matrix methods × 2 thread modes, single and multi) and 6 corresponding graph files (.pdf when run within jconsole, .png when run within Qt Jconsole):
user@host:~/j-user/temp> ls -1 bmk_*
bmk_GEMM_threads=1.dat
bmk_GEMM_threads=1.pdf
bmk_GEMM_threads=n.dat
bmk_GEMM_threads=n.pdf
bmk_GESV_threads=1.dat
bmk_GESV_threads=1.pdf
bmk_GESV_threads=n.dat
bmk_GESV_threads=n.pdf
bmk_TRSM_threads=1.dat
bmk_TRSM_threads=1.pdf
bmk_TRSM_threads=n.dat
bmk_TRSM_threads=n.pdf
[Figures: Bmk GEMM threads=1.png, Bmk GEMM threads=n.png, Bmk TRSM threads=1.png, Bmk TRSM threads=n.png, Bmk GESV threads=1.png, Bmk GESV threads=n.png]
References
- [1] Magne Haveraaen, Hogne Hundvebakke. Some Statistical Performance Estimation Techniques for Dynamic Machines. In Weihai Yu et al. (eds.): Norsk Informatikk-konferanse 2001, Tapir, Trondheim, Norway, 2001, pp. 176-185. URL: https://www.ii.uib.no/saga/papers/perfor-5d.pdf