Importance of cleanup
In analyzing the importance of good cleanup for performance, it is necessary
to recognize the various types that can occur. The cleanup that user's can
supply to ATLAS is
one dimensional cleanup, i.e., only one of the three
possible dimensions is less than
$N_B$. There is also 2 and 3 dimensional
cleanup. To give an idea of the relative importance of various catagories
of computation, it is roughly true that the matmul kernel is a cubic cost,
the one dimensional cleanup is a square cost, the two dimensional cleanup
is a linear cost, and the three dimensional cleanup is
$O(1)$.
This is shown more formally below. Define $M_r = M$ mod $N_B$, let
$m, n, k$ be the dimensional arguments to the gemmK and/or cleanup,
and remember that matrix multiplication takes 2ドル M N K$ flops, and we see
that the flop count for each catagory is:
- $\bf m=n=k=N_B$:
[画像:$\lfloor \frac{M}{N_B} \rfloor \lfloor \frac{N}{N_B} \rfloor \lfloor \frac{K}{N_B} \rfloor 2 {N_B}^3 \Longrightarrow \lfloor \frac{N}{N_B} \rfloor ^3 ~2 {N_B}^3$]
-
$\bf m < N_B, n = k = N_B$:
[画像:$\lfloor \frac{N}{N_B} \rfloor \lfloor \frac{K}{N_B} \rfloor 2 M_r {N_B}^2 \Longrightarrow \lfloor \frac{N}{N_B} \rfloor ^2 ~2 N_r {N_B}^2$]
-
$\bf n < N_B, m = k = N_B$:
[画像:$\lfloor \frac{M}{N_B} \rfloor \lfloor \frac{K}{N_B} \rfloor 2 N_r {N_B}^2 \Longrightarrow \lfloor \frac{N}{N_B} \rfloor ^2 ~2 N_r {N_B}^2$]
-
$\bf k < N_B, m = n = N_B$:
[画像:$\lfloor \frac{M}{N_B} \rfloor \lfloor \frac{N}{N_B} \rfloor 2 K_r {N_B}^2 \Longrightarrow \lfloor \frac{N}{N_B} \rfloor ^2 ~2 N_r {N_B}^2$]
-
$\bf m < N_B, n < N_B, k = N_B$:
[画像:$\lfloor \frac{K}{N_B} \rfloor 2 M_r N_r {N_B} \Longrightarrow \lfloor \frac{N}{N_B} \rfloor 2 {N_r}^2 N_B$]
-
$\bf m < N_B, k < N_B, n = N_B$:
[画像:$\lfloor \frac{N}{N_B} \rfloor 2 M_r K_r {N_B} \Longrightarrow \lfloor \frac{N}{N_B} \rfloor 2 {N_r}^2 N_B$]
-
$\bf n < N_B, k < N_B, m = N_B$:
[画像:$\lfloor \frac{M}{N_B} \rfloor 2 N_r K_r {N_B} \Longrightarrow \lfloor \frac{N}{N_B} \rfloor 2 {N_r}^2 N_B$]
-
$\bf m < N_B, n < N_B, k < N_B$:
[画像:2ドル M_r N_r K_r \Longrightarrow 2 {N_r}^3$]
Note that the simplified equations to the right of
$\Longrightarrow$
assume the square case, i.e. $M = K = N$. The above analysis can now
be grouped into the catagories of interest as in:
The simplified equations to the right of the
$<$ above provide a safe
upper bound on cleanup cost by setting
$N_r = N_B$
(in reality,
0ドル < N_r < N_B$, of course).
With this analysis, we can easily see why it is not important for the
user to be able to contribute 2D and 3D cleanup cases: remember that
all of these kernels are for ATLAS's large-case gemm. ATLAS has a
seperate small-case gemm, which is invoked when the problem is so small
that the 2ドル N^2$ copy cost is significant compared to the 2ドル N^3$ computational
costs. So, in the cases where the $O(N)$ 2D cleanup or $O(1)$ 3D cleanup
costs are prohibitive, this large-case gemm will probably not be used anyway.
Clint Whaley
2012年07月10日