Stack Overflow
0 votes · 0 answers · 92 views

I'm trying to follow this guide to have R use cuBLAS as its BLAS library, but I seem to be failing: when I run sessionInfo(), R is still linked against OpenBLAS. What am I doing wrong? ...
1 vote · 1 answer · 125 views

With the following test example, the output matrix doesn't contain the expected values, or maybe I'm misunderstanding certain parameters: #include <cstdio> #include <cublas_v2.h> #include <...
A. K. (39.4k)
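The excerpt above is cut off, so the asker's actual parameters aren't visible, but the most frequent cause of "unexpected output" with cuBLAS GEMM is its column-major convention. Below is a minimal, self-contained sketch (not the asker's code; sizes and values are invented) of a cublasSgemm call whose m, n, k and leading dimensions are consistent with column-major storage. It should build with something like nvcc example.cu -lcublas.

```cpp
// Minimal sketch: C = A * B for 2x2 matrices, everything in cuBLAS's
// native column-major layout. Error checking omitted for brevity.
#include <cstdio>
#include <cuda_runtime.h>
#include <cublas_v2.h>

int main() {
    const int n = 2;
    // Column-major: element (r, c) is stored at index r + c*ld.
    // A = |1 3|   B = |5 7|   expected C = A*B = |23 31|
    //     |2 4|       |6 8|                      |34 46|
    float hA[4] = {1, 2, 3, 4};
    float hB[4] = {5, 6, 7, 8};
    float hC[4] = {0};

    float *dA, *dB, *dC;
    cudaMalloc((void**)&dA, sizeof(hA));
    cudaMalloc((void**)&dB, sizeof(hB));
    cudaMalloc((void**)&dC, sizeof(hC));
    cudaMemcpy(dA, hA, sizeof(hA), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, sizeof(hB), cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);
    const float alpha = 1.0f, beta = 0.0f;
    // C = alpha * op(A) * op(B) + beta * C; lda/ldb/ldc are the row counts
    // of A, B, C as stored (no transposition requested here).
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                n, n, n, &alpha, dA, n, dB, n, &beta, dC, n);

    cudaMemcpy(hC, dC, sizeof(hC), cudaMemcpyDeviceToHost);
    printf("C = [[%g %g], [%g %g]]\n", hC[0], hC[2], hC[1], hC[3]);

    cublasDestroy(handle);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    return 0;
}
```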
3 votes · 1 answer · 194 views

I'm trying to use cublasSgemmStridedBatched in C++ to compute batched matrix multiplications between two sets of matrices (inputs x1 and x2), but I am struggling to match the expected output. I cannot ...
mantle core
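The question's code is truncated, so the sketch below only illustrates the general shape of a cublasSgemmStridedBatched call rather than the asker's x1/x2 setup: one contiguous allocation per operand, with strideA/strideB/strideC giving the element offset between consecutive matrices. The sizes and data are invented for illustration.

```cpp
// Hedged sketch: batched C[i] = A[i] * B[i] for `batch` independent 2x2
// products; matrix i of each operand starts at basePointer + i * stride.
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>
#include <cublas_v2.h>

int main() {
    const int n = 2, batch = 3;
    const long long stride = (long long)n * n;   // elements between consecutive matrices
    std::vector<float> hA(n * n * batch), hB(n * n * batch, 1.0f), hC(n * n * batch, 0.0f);
    for (size_t i = 0; i < hA.size(); ++i) hA[i] = float(i);

    float *dA, *dB, *dC;
    size_t bytes = hA.size() * sizeof(float);
    cudaMalloc((void**)&dA, bytes); cudaMalloc((void**)&dB, bytes); cudaMalloc((void**)&dC, bytes);
    cudaMemcpy(dA, hA.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB.data(), bytes, cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);
    const float alpha = 1.0f, beta = 0.0f;
    // All matrices are column-major; lda/ldb/ldc = n, strides are in elements.
    cublasSgemmStridedBatched(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                              n, n, n, &alpha,
                              dA, n, stride,
                              dB, n, stride,
                              &beta,
                              dC, n, stride,
                              batch);

    cudaMemcpy(hC.data(), dC, bytes, cudaMemcpyDeviceToHost);
    for (int b = 0; b < batch; ++b)
        printf("batch %d: %g %g %g %g\n", b,
               hC[b * stride], hC[b * stride + 1], hC[b * stride + 2], hC[b * stride + 3]);

    cublasDestroy(handle);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    return 0;
}
```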
2 votes · 0 answers · 155 views

I'm looking at mixed precision in deep-learning LLMs. The intermediates are mostly F32, while the weights can be of any other type such as BF16 or F16, or even a quantized type like Q8_0 or Q4_0. It would be very useful if ...
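The excerpt is cut off, but as far as plain cuBLAS is concerned, mixed storage types are expressed through cublasGemmEx. Below is a hedged sketch assuming the "F16 weights, F32 intermediates" case maps to half-precision A and B with a float C and FP32 accumulation; quantized formats such as Q8_0/Q4_0 are ggml/llama.cpp conventions rather than cuBLAS storage types, so they would have to be dequantized before a call like this.

```cpp
// Hedged sketch: A and B stored as FP16, C stored as FP32, accumulation in
// FP32 (CUBLAS_COMPUTE_32F). The cudaDataType arguments describe storage;
// the compute type describes the internal math.
#include <cstdio>
#include <cuda_runtime.h>
#include <cuda_fp16.h>
#include <cublas_v2.h>

int main() {
    const int n = 2;
    __half hA[4], hB[4];
    for (int i = 0; i < 4; ++i) {
        hA[i] = __float2half(float(i + 1));   // A = 1..4, column-major
        hB[i] = __float2half(1.0f);           // B = all ones
    }
    float hC[4] = {0};

    __half *dA, *dB;
    float *dC;
    cudaMalloc((void**)&dA, sizeof(hA));
    cudaMalloc((void**)&dB, sizeof(hB));
    cudaMalloc((void**)&dC, sizeof(hC));
    cudaMemcpy(dA, hA, sizeof(hA), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, sizeof(hB), cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);
    const float alpha = 1.0f, beta = 0.0f;   // host scalars are float for CUBLAS_COMPUTE_32F
    cublasGemmEx(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n, &alpha,
                 dA, CUDA_R_16F, n,          // A stored as half
                 dB, CUDA_R_16F, n,          // B stored as half
                 &beta,
                 dC, CUDA_R_32F, n,          // C stored as float
                 CUBLAS_COMPUTE_32F, CUBLAS_GEMM_DEFAULT);

    cudaMemcpy(hC, dC, sizeof(hC), cudaMemcpyDeviceToHost);
    printf("C = %g %g %g %g\n", hC[0], hC[1], hC[2], hC[3]);
    cublasDestroy(handle);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    return 0;
}
```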
1 vote · 1 answer · 88 views

I am a bit confused about the impact of cublasComputeType_t on computation when using the cublasGemmEx API. For example, my A, B, and C matrices are all of type float. When cublasComputeType_t=...
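A hedged sketch of the situation described above: A, B, and C all stored as float (CUDA_R_32F), with cublasComputeType_t selecting the internal math path. Switching the compute type (for example to CUBLAS_COMPUTE_32F_FAST_TF32) does not change how the matrices are stored or the result type; it only changes the precision/speed trade-off of the internal multiply-accumulate.

```cpp
// Hedged sketch: cublasGemmEx with FP32 storage for A, B, C. The cudaDataType
// arguments describe how the matrices are stored; cublasComputeType_t selects
// the internal math path (e.g. strict FP32 vs TF32 tensor cores).
#include <cstdio>
#include <cuda_runtime.h>
#include <cublas_v2.h>

int main() {
    const int n = 2;
    float hA[4] = {1, 2, 3, 4}, hB[4] = {5, 6, 7, 8}, hC[4] = {0};
    float *dA, *dB, *dC;
    cudaMalloc((void**)&dA, sizeof(hA));
    cudaMalloc((void**)&dB, sizeof(hB));
    cudaMalloc((void**)&dC, sizeof(hC));
    cudaMemcpy(dA, hA, sizeof(hA), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, sizeof(hB), cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);
    const float alpha = 1.0f, beta = 0.0f;
    cublasGemmEx(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n, &alpha,
                 dA, CUDA_R_32F, n,
                 dB, CUDA_R_32F, n,
                 &beta,
                 dC, CUDA_R_32F, n,
                 CUBLAS_COMPUTE_32F,          // or CUBLAS_COMPUTE_32F_FAST_TF32
                 CUBLAS_GEMM_DEFAULT);

    cudaMemcpy(hC, dC, sizeof(hC), cudaMemcpyDeviceToHost);
    printf("C(0,0)=%g C(1,1)=%g\n", hC[0], hC[3]);
    cublasDestroy(handle);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    return 0;
}
```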
0 votes · 0 answers · 50 views

I know that executing a standalone transpose operation in CUDA is expensive, so I'm curious: what is the meaning of the cublasOperation_t parameter in the cuBLAS GEMM API? Does cuBLAS perform a real ...
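Not an authoritative answer to the question, but a sketch of what the flag means operationally: CUBLAS_OP_N, CUBLAS_OP_T, and CUBLAS_OP_C tell GEMM how to read each operand, so the caller never has to materialize a transposed copy. The call below computes A^T·A for a 3x2 matrix A using CUBLAS_OP_T on the first operand; m, n, k refer to op(A) and op(B), while lda/ldb stay the row counts of the matrices as stored.

```cpp
// Hedged sketch: C = A^T * A for a 3x2 column-major A, requested through
// the cublasOperation_t flag rather than by transposing A ourselves.
#include <cstdio>
#include <cuda_runtime.h>
#include <cublas_v2.h>

int main() {
    // A is 3x2, column-major: columns {1,2,3} and {4,5,6}. Expected A^T*A:
    // |14 32|
    // |32 77|
    float hA[6] = {1, 2, 3, 4, 5, 6};
    float hC[4] = {0};
    float *dA, *dC;
    cudaMalloc((void**)&dA, sizeof(hA));
    cudaMalloc((void**)&dC, sizeof(hC));
    cudaMemcpy(dA, hA, sizeof(hA), cudaMemcpyHostToDevice);

    cublasHandle_t h;
    cublasCreate(&h);
    const float alpha = 1.0f, beta = 0.0f;
    // m, n, k describe op(A) (2x3) and op(B) (3x2); lda/ldb remain the row
    // counts of A as stored (3), ldc is the row count of C (2).
    cublasSgemm(h, CUBLAS_OP_T, CUBLAS_OP_N, 2, 2, 3,
                &alpha, dA, 3, dA, 3, &beta, dC, 2);

    cudaMemcpy(hC, dC, sizeof(hC), cudaMemcpyDeviceToHost);
    printf("A^T*A = [[%g %g],[%g %g]]\n", hC[0], hC[2], hC[1], hC[3]);
    cublasDestroy(h);
    cudaFree(dA); cudaFree(dC);
    return 0;
}
```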
0 votes · 1 answer · 162 views

I want to do the EVD with this 4x4 covariance matrix: cuDoubleComplex m_cov_[16] = { make_cuDoubleComplex(2.0301223848037391, 3.4235008150792548e-17), make_cuDoubleComplex(1....
Weimin Chan
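The matrix in the question is truncated, and the eigendecomposition itself lives in cuSOLVER rather than cuBLAS, so the following is only a hedged sketch under the assumption that the asker wants eigenvalues and eigenvectors of a Hermitian 4x4 matrix via cusolverDnZheevd. The matrix below is a made-up real-symmetric example, not the one from the question; link with -lcusolver.

```cpp
// Hedged sketch: eigenvalues/eigenvectors of a 4x4 Hermitian matrix with
// cuSOLVER's cusolverDnZheevd. Error checking omitted for brevity.
#include <cstdio>
#include <cuda_runtime.h>
#include <cuComplex.h>
#include <cusolverDn.h>

int main() {
    const int n = 4;
    // Column-major Hermitian matrix (here simply real symmetric for brevity).
    double vals[16] = {4,1,0,0, 1,3,1,0, 0,1,2,1, 0,0,1,1};
    cuDoubleComplex hA[16];
    for (int i = 0; i < 16; ++i) hA[i] = make_cuDoubleComplex(vals[i], 0.0);

    cuDoubleComplex *dA;
    double *dW;        // eigenvalues, returned in ascending order
    int *devInfo;
    cudaMalloc((void**)&dA, sizeof(hA));
    cudaMalloc((void**)&dW, n * sizeof(double));
    cudaMalloc((void**)&devInfo, sizeof(int));
    cudaMemcpy(dA, hA, sizeof(hA), cudaMemcpyHostToDevice);

    cusolverDnHandle_t h;
    cusolverDnCreate(&h);
    int lwork = 0;
    cusolverDnZheevd_bufferSize(h, CUSOLVER_EIG_MODE_VECTOR, CUBLAS_FILL_MODE_LOWER,
                                n, dA, n, dW, &lwork);
    cuDoubleComplex *work;
    cudaMalloc((void**)&work, lwork * sizeof(cuDoubleComplex));
    // On exit dA holds the eigenvectors (column-major) and dW the eigenvalues.
    cusolverDnZheevd(h, CUSOLVER_EIG_MODE_VECTOR, CUBLAS_FILL_MODE_LOWER,
                     n, dA, n, dW, work, lwork, devInfo);

    double hW[4]; int info;
    cudaMemcpy(hW, dW, sizeof(hW), cudaMemcpyDeviceToHost);
    cudaMemcpy(&info, devInfo, sizeof(int), cudaMemcpyDeviceToHost);
    printf("info=%d eigenvalues: %g %g %g %g\n", info, hW[0], hW[1], hW[2], hW[3]);

    cudaFree(work); cudaFree(dA); cudaFree(dW); cudaFree(devInfo);
    cusolverDnDestroy(h);
    return 0;
}
```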
0 votes · 1 answer · 429 views

I read some related posts here and succeeded in doing row-major matrix multiplication with cuBLAS: A*B (column-major) = B*A (row-major). I wrote a wrapper to do this so that I can pass row ...
Weimin Chan
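A hedged sketch of the wrapper idea the post describes: since a row-major matrix reinterpreted as column-major is its transpose, a row-major C = A·B can be obtained by asking column-major cuBLAS for B·A with the operands swapped. The sgemm_row_major helper name and the 2x2 data are invented for illustration.

```cpp
// Hedged sketch of the "swap the operands" trick: a row-major C = A*B is
// computed by asking column-major cuBLAS for B*A, because a row-major matrix
// reinterpreted as column-major is its transpose.
#include <cstdio>
#include <cuda_runtime.h>
#include <cublas_v2.h>

// A: m x k, B: k x n, C: m x n, all row-major device pointers.
void sgemm_row_major(cublasHandle_t h, int m, int n, int k,
                     const float *A, const float *B, float *C) {
    const float alpha = 1.0f, beta = 0.0f;
    // Column-major view: B is n x k (ld = n), A is k x m (ld = k), and the
    // n x m column-major result is exactly the row-major m x n matrix C.
    cublasSgemm(h, CUBLAS_OP_N, CUBLAS_OP_N, n, m, k,
                &alpha, B, n, A, k, &beta, C, n);
}

int main() {
    // Row-major A = |1 2|, B = |5 6|, expected C = |19 22|
    //               |3 4|      |7 8|               |43 50|
    float hA[4] = {1, 2, 3, 4}, hB[4] = {5, 6, 7, 8}, hC[4] = {0};
    float *dA, *dB, *dC;
    cudaMalloc((void**)&dA, sizeof(hA));
    cudaMalloc((void**)&dB, sizeof(hB));
    cudaMalloc((void**)&dC, sizeof(hC));
    cudaMemcpy(dA, hA, sizeof(hA), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, sizeof(hB), cudaMemcpyHostToDevice);

    cublasHandle_t h;
    cublasCreate(&h);
    sgemm_row_major(h, 2, 2, 2, dA, dB, dC);
    cudaMemcpy(hC, dC, sizeof(hC), cudaMemcpyDeviceToHost);
    printf("C = [[%g %g],[%g %g]]\n", hC[0], hC[1], hC[2], hC[3]);

    cublasDestroy(h);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    return 0;
}
```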
1 vote · 1 answer · 753 views

I've made the following CUDA tests to compare the performance numbers of (square) matrix multiplication, running on Ubuntu 24.04 with a Quadro T1000 Mobile GPU of compute capability 7.5 (arch=...
sof (9,777)
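The benchmark code above is truncated, so the following is only a generic sketch of one reasonable way to time cuBLAS SGEMM on any card: do an untimed warm-up call (the first call absorbs handle and kernel-selection overhead), then measure with CUDA events, which report GPU time rather than host wall-clock time. The matrix size is an arbitrary assumption.

```cpp
// Hedged sketch of a minimal SGEMM timing harness using CUDA events.
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>
#include <cublas_v2.h>

int main() {
    const int n = 1024;                 // square matrices; size is an assumption
    std::vector<float> h(n * n, 1.0f);
    float *dA, *dB, *dC;
    size_t bytes = (size_t)n * n * sizeof(float);
    cudaMalloc((void**)&dA, bytes); cudaMalloc((void**)&dB, bytes); cudaMalloc((void**)&dC, bytes);
    cudaMemcpy(dA, h.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, h.data(), bytes, cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);
    const float alpha = 1.0f, beta = 0.0f;

    // Warm-up call (not timed).
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                &alpha, dA, n, dB, n, &beta, dC, n);
    cudaDeviceSynchronize();

    cudaEvent_t start, stop;
    cudaEventCreate(&start); cudaEventCreate(&stop);
    cudaEventRecord(start);
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                &alpha, dA, n, dB, n, &beta, dC, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    double gflops = 2.0 * n * n * n / (ms * 1e6);   // 2*n^3 flops per GEMM
    printf("n=%d: %.3f ms, %.1f GFLOP/s\n", n, ms, gflops);

    cudaEventDestroy(start); cudaEventDestroy(stop);
    cublasDestroy(handle);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    return 0;
}
```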
0 votes · 0 answers · 97 views

I'm trying to perform matrix multiplication using cuBLAS and compare the results with NumPy. However, I'm getting different results between the two. Here's my code: #include <iostream> #include <...
musako (1,367)
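The posted code is cut off, so this is only a generic sketch of how such a comparison can be made self-contained in C++: run cublasSgemm, then compare against a naive double-precision CPU loop using the same column-major indexing. Small differences (on the order of float rounding) are expected because the GPU accumulates in a different order; large differences usually point to a row-major/column-major mix-up rather than a cuBLAS bug.

```cpp
// Hedged sketch: verify a cuBLAS SGEMM against a double-precision CPU
// reference that uses the same column-major indexing.
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>
#include <cublas_v2.h>

int main() {
    const int n = 64;
    std::vector<float> A(n * n), B(n * n), C(n * n);
    for (int i = 0; i < n * n; ++i) { A[i] = (i % 7) * 0.5f; B[i] = (i % 5) * 0.25f; }

    float *dA, *dB, *dC;
    size_t bytes = A.size() * sizeof(float);
    cudaMalloc((void**)&dA, bytes); cudaMalloc((void**)&dB, bytes); cudaMalloc((void**)&dC, bytes);
    cudaMemcpy(dA, A.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, B.data(), bytes, cudaMemcpyHostToDevice);

    cublasHandle_t h;
    cublasCreate(&h);
    const float alpha = 1.0f, beta = 0.0f;
    cublasSgemm(h, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n, &alpha, dA, n, dB, n, &beta, dC, n);
    cudaMemcpy(C.data(), dC, bytes, cudaMemcpyDeviceToHost);

    // CPU reference in double precision, same column-major convention.
    double max_err = 0.0;
    for (int col = 0; col < n; ++col)
        for (int row = 0; row < n; ++row) {
            double ref = 0.0;
            for (int k = 0; k < n; ++k)
                ref += (double)A[row + k * n] * (double)B[k + col * n];
            max_err = std::max(max_err, std::fabs(ref - C[row + col * n]));
        }
    printf("max abs difference vs CPU reference: %g\n", max_err);

    cublasDestroy(h);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    return 0;
}
```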
-1 votes · 1 answer · 2k views

When I set n_gpu_layer to 1, I can see the following response: To learn Python, you can consider the following options: 1. Online Courses: Websites like Coursera, edX, Codecadem♠♦♥◄!▬$さんかく▅ `▅☻↑↨►☻...
4 votes · 1 answer · 4k views

I'm trying to run LlamaIndex with llama.cpp by following the installation docs, but inside a Docker container. I'm using this repo for the installation of llama_cpp_python==0.2.6. DOCKERFILE # Use the ...
4 votes · 1 answer · 3k views

I can install llama.cpp with cuBLAS using pip as below: CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python. However, I don't know how to install it with cuBLAS when ...
0 votes · 1 answer · 250 views

I have a very simple CUDA program that refuses to compile. This is main.cpp: #include <iostream> #include <cstdlib> #include "/opt/cuda/targets/x86_64-linux/include/cuda_runtime.h" ...
Sean (881)
0 votes · 1 answer · 205 views

I'm attempting to set up an interface to use cublas.lib in Fortran without any separate C code. I have seen a few examples of this and tried to duplicate them, but I'm having trouble. Both of these ...
js1 (3)
