343 questions
0 votes · 0 answers · 92 views
Run R with cuBLAS backend on Ubuntu
I'm trying to follow this guide to have R use the cuBLAS BLAS library, but I seem to be failing: when I run sessionInfo(), R is still linked against the OpenBLAS package.
What am I doing wrong?
...
1 vote · 1 answer · 125 views
Unexpected result with cublasStrmm (cublas triangular matmul)
With the following test example, the output matrix doesn't match the expected result, or maybe I'm misunderstanding certain parameters:
#include <cstdio>
#include <cublas_v2.h>
#include <...
3 votes · 1 answer · 194 views
Using cublas<T>gemmStridedBatched with row major matrices
I'm trying to use cublasSgemmStridedBatched in C++ to compute batched matrix multiplications between two sets of matrices (inputs x1 and x2), but I am struggling to match the expected output. I cannot ...
2 votes · 0 answers · 155 views
Does cuBLAS support mixed precision matrix multiplication in the form C[f32] = A[bf16] * B[f32]?
I'm concerned with mixed precision in deep-learning LLMs. The intermediates are mostly F32, and the weights could be any other type like BF16 or F16, or even a quantized type such as Q8_0 or Q4_0. It would be very useful if ...
1 vote · 1 answer · 88 views
How does cublasComputeType_t affect the input and output data types of the tensor cores?
I am a bit confused about the impact of cublasComputeType_t on computation when using the cublasGemmEx API.
For example, my A, B, and C matrices are all of type float.
When cublasComputeType_t=...
0 votes · 0 answers · 50 views
Does cuBLAS perform a real transpose operation when executing the gemm API (such as non-linear reading or transposing into LDS)? [duplicate]
I know that executing a standalone transpose operation in CUDA is expensive, so I'm curious: What is the meaning of the cublasOperation_t parameter in the cublas gemm API? Does cublas perform a real ...
0 votes · 1 answer · 162 views
Eigenvector mismatch between cuBLAS and the Eigen lib
I want to do the EVD with this 4x4 covariance matrix:
cuDoubleComplex m_cov_[16] = {
make_cuDoubleComplex(2.0301223848037391, 3.4235008150792548e-17),
make_cuDoubleComplex(1....
0 votes · 1 answer · 429 views
CUBLAS matrix multiplication with row-major data
I read some related posts here and succeeded in doing row-major matrix multiplication with cuBLAS:
A*B (column-major) = B*A (row-major)
I wrote a wrapper to do this so that I can pass row ...
1 vote · 1 answer · 753 views
Comparing performance among custom cuda kernel, cublas and cutensor
I've made the following CUDA tests to compare the performance numbers of (square) matrix multiplication, running on Ubuntu 24.04 with the GPU card Quadro T1000 Mobile of compute capability 7.5 (arch=...
0 votes · 0 answers · 97 views
cuBLAS matrix multiplication results differ from NumPy [duplicate]
I'm trying to perform matrix multiplication using cuBLAS and compare the results with NumPy. However, I'm getting different results between the two. Here's my code:
#include <iostream>
#include &...
-1 votes · 1 answer · 2k views
How can I fix the GPU error of llama_cpp_python?
When I set n_gpu_layer to 1, I can see the following response:
To learn Python, you can consider the following options:
1. Online Courses: Websites like Coursera, edX, Codecadem♠♦♥◄!▬$▲さんかく▅
`▅☻↑↨►☻...
4 votes · 1 answer · 4k views
No GPU support while running llama-cpp-python inside a docker container
I'm trying to run llama index with llama cpp by following the installation docs but inside a docker container.
Following this repo for installation of llama_cpp_python==0.2.6.
DOCKERFILE
# Use the ...
4 votes · 1 answer · 3k views
How can I install llama-cpp-python with cuBLAS using poetry?
I can install llama cpp with cuBLAS using pip as below:
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python
However, I don't know how to install it with cuBLAS when ...
0 votes · 1 answer · 250 views
Compiling CUDA sample program
I have a very simple CUDA program that refuses to compile.
This is main.cpp:
#include <iostream>
#include <cstdlib>
#include "/opt/cuda/targets/x86_64-linux/include/cuda_runtime.h"
...
0 votes · 1 answer · 205 views
cuBLAS direct Fortran C-binding using cublas.lib
I'm attempting to set up an interface to use cublas.lib in Fortran without any separate C code. I have seen a few examples of this and tried to duplicate them, but I'm having trouble.
Both of these ...