So, there are actually three gemmK kernels, one for each class of $\beta$ value ($\beta = 0$, $\beta = 1$, and arbitrary $\beta$). They perform the operations $C \leftarrow A^T B$, $C \leftarrow A^T B + C$, and $C \leftarrow A^T B + \beta C$, respectively. All input arrays ($A$, $B$, $C$) are column-major. They are still used as the performance kernels for row-major BLAS as well, so don't worry. Additionally, $A^T$ and $B$ are in block-major format, so that $lda = ldb = M = N = K = N_B$.
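To make the three variants concrete, here is a minimal, unoptimized reference sketch in C. It assumes double precision, a hypothetical compile-time block size `NB` (set to 4 only for illustration), and hypothetical function names `gemmK_b0`, `gemmK_b1` and `gemmK_bX` with a simplified argument list; this is not ATLAS's actual kernel interface. It shows what the layout implies: with $lda = ldb = K = N_B$, each entry of $C$ is a dot product of a contiguous column of $A$ (a row of $A^T$) and a contiguous column of $B$.

```c
/* Hypothetical block size for illustration only; ATLAS picks N_B by tuning. */
#define NB 4

/* Dot product of two contiguous length-NB vectors. */
static double dot_nb(const double *a, const double *b)
{
    double s = 0.0;
    for (int k = 0; k < NB; k++)
        s += a[k] * b[k];
    return s;
}

/* Layout assumed by all three sketches (lda = ldb = K = NB):
 *   A is K x M, column-major, so A^T(i,k) = A[k + i*NB]  (contiguous in k)
 *   B is K x N, column-major, so B(k,j)   = B[k + j*NB]  (contiguous in k)
 *   C is M x N, column-major, so C(i,j)   = C[i + j*ldc]
 */

/* beta = 0:  C <- A^T B  (C is only written, never read) */
void gemmK_b0(const double *A, const double *B, double *C, int ldc)
{
    for (int j = 0; j < NB; j++)
        for (int i = 0; i < NB; i++)
            C[i + j*ldc] = dot_nb(A + i*NB, B + j*NB);
}

/* beta = 1:  C <- A^T B + C  (no multiply by beta needed) */
void gemmK_b1(const double *A, const double *B, double *C, int ldc)
{
    for (int j = 0; j < NB; j++)
        for (int i = 0; i < NB; i++)
            C[i + j*ldc] += dot_nb(A + i*NB, B + j*NB);
}

/* general beta:  C <- A^T B + beta*C */
void gemmK_bX(const double *A, const double *B, double beta,
              double *C, int ldc)
{
    for (int j = 0; j < NB; j++)
        for (int i = 0; i < NB; i++)
            C[i + j*ldc] = dot_nb(A + i*NB, B + j*NB) + beta * C[i + j*ldc];
}
```

The three sketches differ only in how the existing contents of $C$ are combined: the $\beta = 0$ version never reads $C$, and the $\beta = 1$ version skips the multiply. That is one reason to keep specialized kernels rather than always using the general-$\beta$ form.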
Clint Whaley
2012-07-10