738 questions
Best practices
0
votes
0
replies
29
views
How to extract nested loop features from CUDA kernels for LLM-based optimization?
Question
I am working on an experimental project where I aim to have a large language model (LLM) automatically optimize CUDA kernels’ nested loops. The key requirement is to extract static loop and ...
-2
votes
1
answer
80
views
CMake generating a bad command line option for CUDA in MSVC on Windows [closed]
A CMake build is producing this error message:
nvcc fatal : A single input file is required for a non-link phase when an outputfile is specified
when running this command line, which CMake itself generates:...
0
votes
1
answer
125
views
Why does NVCC not optimize ldexpf with a constexpr power-of-two exponent into a simple fmul?
Consider the following CUDA code:
enum { p = 5 };
__device__ float adjust_mul(float x) { return x * (1 << p); }
__device__ float adjust_ldexpf(float x) { return ldexpf(x, p); }
I would expect ...
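For readers comparing the two functions in the excerpt above, here is a minimal host-only C++ sketch of why they are expected to agree. It is an assumption-laden analogue, not the asker's device code: it uses `std::ldexp` from `<cmath>` in place of the CUDA device `ldexpf`, and it says nothing about which instructions NVCC actually emits.

```cpp
#include <cmath>

// Same exponent as in the excerpt.
enum { p = 5 };

// Scales by 2^p with an ordinary multiplication (1 << p is 32 here).
float adjust_mul(float x) { return x * (1 << p); }

// Scales by 2^p through the standard library; host analogue of the
// device-side ldexpf call in the excerpt (an assumption of this sketch).
float adjust_ldexpf(float x) { return std::ldexp(x, p); }
```

Multiplying a finite float by a power of two only changes its exponent, so both functions should return the same value for inputs that do not overflow, for example 48.0f for an input of 1.5f.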
5
votes
3
answers
193
views
Why don't multi-character literals respect architecture endianness?
With GCC on Intel x86, and similarly with NVCC (CUDA),
#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>
int main() {
uint32_t v = 'abcd';
uint32_t w = ...
Basj (47.6k)
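The distinction this question turns on, the value of a constant versus its byte layout in memory, can be sketched in portable C++. The sketch below does not use a multi-character literal, whose value is implementation-defined. Instead it assumes the packing GCC documents (each character shifted in, first character in the most significant byte) and an ASCII character set. The helper names `pack_chars` and `lowest_address_byte` are hypothetical.

```cpp
#include <cstdint>
#include <cstring>

// Packs four characters so the first one lands in the most significant
// byte. This mirrors the rule GCC documents for multi-character
// constants; other compilers may choose differently.
constexpr std::uint32_t pack_chars(char a, char b, char c, char d) {
    return (static_cast<std::uint32_t>(static_cast<unsigned char>(a)) << 24) |
           (static_cast<std::uint32_t>(static_cast<unsigned char>(b)) << 16) |
           (static_cast<std::uint32_t>(static_cast<unsigned char>(c)) << 8) |
           static_cast<std::uint32_t>(static_cast<unsigned char>(d));
}

// Returns the byte stored at the lowest address of `value`. Only this
// step depends on the endianness of the machine.
inline unsigned char lowest_address_byte(std::uint32_t value) {
    unsigned char bytes[sizeof value];
    std::memcpy(bytes, &value, sizeof value);
    return bytes[0];
}
```

Under these assumptions the packed value is the same number on every architecture. Endianness only decides which of its bytes sits first in memory: 'd' on a little-endian machine, 'a' on a big-endian one.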
4
votes
1
answer
224
views
How are constexpr device variables accessible from host?
My colleague came across this situation where global __device__ constexpr variables are accessible from both the host and the device.
#include <array>
#include <cstdio>
__device__ ...
7
votes
1
answer
2k
views
NVCC compilation error: exception specification is incompatible with that of previous function "cospi"
When I try to compile a simple CUDA program, e.g. the vectorAdd sample, I get errors about incompatible exception specifications:
$ nvcc -I ../../../Common/ -ccbin g++-13 -Wno-deprecated-gpu-targets ...
3
votes
0
answers
85
views
How can I use NVCC to detect signed/unsigned issues in CUDA kernel code?
When working with large datasets, illegal memory access errors often arise because of truncated integer ranges. Catching sign changes as well as signed-vs-unsigned comparisons could help prevent this.
...
Richard (62.9k)
1
vote
2
answers
169
views
CUDA: curand_uniform() distribution not as random as expected
My goal is to use curand_uniform() to have every kernel thread generate a single random number. I am testing the randomness my program generates by treating each generated number as an index into a ...
1
vote
1
answer
193
views
How can I get NVCC to error upon implicit integer downcasting/truncation?
In CUDA it's both easy to implicitly downcast/truncate integers and surprisingly common for programmers to do so.
I would like CUDA to raise an error when implicit downcasting occurs.
Consider this ...
Richard (62.9k)
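Where a compiler diagnostic is not available, one common workaround for the truncation problem described above is to make each narrowing conversion explicit and checked. The following is a minimal host-side sketch of that idea. `checked_narrow` is a hypothetical helper, not an NVCC or CUDA facility. It reports failure by throwing, which CUDA device code does not support, so it would only suit host-side index and size computations.

```cpp
#include <stdexcept>
#include <type_traits>

// Converts between integer types and throws if the value does not
// survive the round trip or changes sign (hypothetical helper).
template <typename To, typename From>
To checked_narrow(From value) {
    static_assert(std::is_integral<To>::value && std::is_integral<From>::value,
                  "checked_narrow is meant for integer types only");
    const To narrowed = static_cast<To>(value);
    // Round-trip check catches truncation; the sign check catches
    // signed/unsigned conversions that keep the bits but flip the meaning.
    if (static_cast<From>(narrowed) != value ||
        ((narrowed < To{}) != (value < From{}))) {
        throw std::range_error("integer conversion lost information");
    }
    return narrowed;
}
```

A call such as `checked_narrow<int>(n)` returns `n` unchanged when it fits in an `int` and throws `std::range_error` otherwise, turning a silent truncation into a visible failure at the point of conversion.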
3
votes
0
answers
69
views
How to Compile a .cuh File with Template Functions Using NVCC and a .cpp File with MSVC in a Single Project?
I have an A.cuh file that contains template functions and their implementations. I also have a file main.cpp, which calls some functions from A.cuh.
For the program to work correctly, A.cuh must be ...
3
votes
0
answers
152
views
Different behaviors of clang/nvcc when compiling CUDA code
I have a file named max.cu:
int main() {
int a = max(1, 2);
}
When I compile it with nvcc, it seems to find the builtin max function in the global namespace and succeeds. The output of objdump is:
...
0
votes
1
answer
117
views
Does nvcc mangle names differently in its error messages than it does in the compiled code it creates?
I am on Windows using VS2022. I have built a library using CUDA built in the MSVC GUI (not executing directly from a command line) and in it I export a device function. Then I have another .cu file ...
-3
votes
1
answer
78
views
Ubuntu24.04 cudaGetDevice Failed
My program looks like this:
#include <cuda_runtime_api.h>
#include <iostream>
int main() {
int id = -1;
int state = cudaGetDevice(&id);
if (state != cudaSuccess) std::cout << "Failed!";
...
0
votes
1
answer
140
views
Allocate executable memory and execute it in CUDA
I'd like to allocate executable memory in CUDA, write SASS/CUBIN code there, and then execute this code. On the CPU for Linux systems, this is quite easy and well-documented -- just a combination of ...
0
votes
1
answer
540
views
If I tell NVCC to -gencode arch=native, what do I use for the code= argument?
Suppose my machine has GPUs with compute capabilities XX and YY. Having read:
https://stackoverflow.com/a/35657430/1593077
I know I can call nvcc like so:
nvcc \
-o myapp \
-gencode arch=...