343 questions
0 votes · 0 answers · 92 views
Run R with cuBLAS backend on Ubuntu
I'm trying to follow this guide to have R use the cuBLAS BLAS library, but I seem to be failing: when I run sessionInfo(), R is still linked against the OpenBLAS package.
What am I doing wrong?
...
1 vote · 1 answer · 125 views
Unexpected result with cublasStrmm (cublas triangular matmul)
With the following test example, the output matrix doesn't match the expected result, or maybe I'm misunderstanding certain parameters:
#include <cstdio>
#include <cublas_v2.h>
#include <...
3 votes · 1 answer · 194 views
Using cublas<T>gemmStridedBatched with row major matrices
I'm trying to use cublasSgemmStridedBatched in C++ to compute batched matrix multiplications between two sets of matrices (inputs x1 and x2), but I am struggling to match the expected output. I cannot ...
2 votes · 0 answers · 155 views
Does cuBLAS support mixed precision matrix multiplication in the form C[f32] = A[bf16] * B[f32]?
I'm concerned with mixed precision in deep-learning LLMs. The intermediates are mostly F32, and the weights could be any other type like BF16 or F16, or even a quantized type such as Q8_0 or Q4_0. It would be very useful if ...
1 vote · 1 answer · 88 views
How does cublasComputeType_t affect the input and output data types of the tensor cores?
I am a bit confused about the impact of cublasComputeType_t on computation when using the cublasGemmEx API.
For example, my A, B, and C matrices are all of type float.
When cublasComputeType_t=...
0 votes · 0 answers · 50 views
Does cuBLAS perform a real transpose operation when executing the gemm API (such as non-linear reading or transposing into LDS)? [duplicate]
I know that executing a standalone transpose operation in CUDA is expensive, so I'm curious: What is the meaning of the cublasOperation_t parameter in the cublas gemm API? Does cublas perform a real ...
0 votes · 1 answer · 162 views
Eigenvector mismatch between cuBLAS and the Eigen lib
I want to do the EVD with this 4x4 covariance matrix:
cuDoubleComplex m_cov_[16] = {
make_cuDoubleComplex(2.0301223848037391, 3.4235008150792548e-17),
make_cuDoubleComplex(1....
0 votes · 1 answer · 429 views
CUBLAS matrix multiplication with row-major data
I read some related posts here and succeeded in doing row-major matrix multiplication with cuBLAS:
A*B (column-major) = B*A (row-major)
I wrote a wrapper to do this so that I can pass row ...
1 vote · 1 answer · 753 views
Comparing performance among custom cuda kernel, cublas and cutensor
I've made the following CUDA tests to compare the performance numbers of (square) matrix multiplication, running on Ubuntu 24.04 with the GPU card Quadro T1000 Mobile of compute capability 7.5 (arch=...
0 votes · 0 answers · 97 views
cuBLAS matrix multiplication results differ from NumPy [duplicate]
I'm trying to perform matrix multiplication using cuBLAS and compare the results with NumPy. However, I'm getting different results between the two. Here's my code:
#include <iostream>
#include &...
-1 votes · 1 answer · 2k views
How can I fix the GPU error of llama_cpp_python?
When I set n_gpu_layer to 1, I can see the following response:
To learn Python, you can consider the following options:
1. Online Courses: Websites like Coursera, edX, Codecadem♠♦♥◄!▬$▲さんかく▅
`▅☻↑↨►☻...
4 votes · 1 answer · 4k views
No GPU support while running llama-cpp-python inside a docker container
I'm trying to run llama index with llama cpp by following the installation docs but inside a docker container.
Following this repo for installation of llama_cpp_python==0.2.6.
DOCKERFILE
# Use the ...
4 votes · 1 answer · 3k views
How can I install llama-cpp-python with cuBLAS using poetry?
I can install llama cpp with cuBLAS using pip as below:
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python
However, I don't know how to install it with cuBLAS when ...
0 votes · 1 answer · 250 views
Compiling CUDA sample program
I have a very simple CUDA program that refuses to compile.
This is main.cpp:
#include <iostream>
#include <cstdlib>
#include "/opt/cuda/targets/x86_64-linux/include/cuda_runtime.h"
...
0 votes · 1 answer · 205 views
cuBLAS direct Fortran C-binding using cublas.lib
I'm attempting to set up an interface to use cublas.lib in Fortran without any separate C code. I have seen a few examples of this and tried to duplicate them, but I'm having trouble.
Both of these ...