Best practices
0 votes
0 replies
29 views

Question I am working on an experimental project where I aim to have a large language model (LLM) automatically optimize CUDA kernels’ nested loops. The key requirement is to extract static loop and ...
-2 votes
1 answer
80 views

A CMake build produces the error message "nvcc fatal : A single input file is required for a non-link phase when an outputfile is specified" when running a command that CMake itself generates:...
alfC's user avatar
  • 16.8k
0 votes
1 answer
125 views

Consider the following CUDA code: enum { p = 5 }; __device__ float adjust_mul(float x) { return x * (1 << p); } __device__ float adjust_ldexpf(float x) { return ldexpf(x, p); } I would expect ...
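The excerpt is cut off before it states what the asker expected, so the following is only a host-side sketch of the two functions, with `__device__` dropped so that an ordinary C++ compiler accepts it. It shows that, numerically, multiplying by `1 << p` and calling `ldexp` with exponent `p` agree for ordinary finite inputs, since both scale by an exact power of two. The helper `same_result` is a hypothetical name introduced for illustration, and nothing here says anything about the code nvcc generates for either version.

```cpp
#include <cmath>

// Host-side stand-ins for the device functions in the excerpt.
enum { p = 5 };

// Multiply by the integer 2^p (32); the int is converted to float first.
float adjust_mul(float x) { return x * (1 << p); }

// Scale x by 2^p by adjusting the exponent.
float adjust_ldexpf(float x) { return std::ldexp(x, p); }

// Hypothetical helper: true when both formulations give the same value.
bool same_result(float x) { return adjust_mul(x) == adjust_ldexpf(x); }
```

For example, both functions map `1.5f` to `48.0f`.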
5 votes
3 answers
193 views

With GCC on Intel x86, and similarly with NVCC (CUDA), #include <stdio.h> #include <stdint.h> #include <stdbool.h> int main() { uint32_t v = 'abcd'; uint32_t w = ...
Basj's user avatar
  • 47.6k
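The excerpt is truncated, so the actual question is not visible here. As background, the value of a multi-character constant such as `'abcd'` is implementation-defined. The sketch below assumes GCC's documented rule (shift the running value left by one character width, then OR in the next character) and rebuilds the value explicitly instead of relying on the literal. `pack4` and `lowest_address_byte_le` are hypothetical helpers, not part of any library.

```cpp
#include <cstdint>

// Assumption: mirrors how GCC evaluates 'abcd', one character at a time,
// with the first character ending up in the most significant byte.
constexpr std::uint32_t pack4(char a, char b, char c, char d) {
    return (static_cast<std::uint32_t>(static_cast<unsigned char>(a)) << 24) |
           (static_cast<std::uint32_t>(static_cast<unsigned char>(b)) << 16) |
           (static_cast<std::uint32_t>(static_cast<unsigned char>(c)) << 8) |
            static_cast<std::uint32_t>(static_cast<unsigned char>(d));
}

// On a little-endian machine the least significant byte is stored at the
// lowest address, so this is the first byte seen when dumping memory.
constexpr unsigned char lowest_address_byte_le(std::uint32_t v) {
    return static_cast<unsigned char>(v & 0xFFu);
}
```

Under that assumption `pack4('a', 'b', 'c', 'd')` is `0x61626364`, and on little-endian x86 the byte stored first in memory is `'d'`.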
4 votes
1 answer
224 views

My colleague came across this situation where global __device__ constexpr variables are accessible from both the host and the device. #include <array> #include <cstdio> __device__ ...
Hari's user avatar
  • 1,995
7 votes
1 answer
2k views

When I try to compile a simple CUDA program, e.g. the vectorAdd sample, I get errors about incompatible exception specifications: $ nvcc -I ../../../Common/ -ccbin g++-13 -Wno-deprecated-gpu-targets ...
3 votes
0 answers
85 views

When working with large datasets, illegal memory access errors often arise because of truncated integer ranges. Catching sign changes as well as signed-vs-unsigned comparisons could help prevent this. ...
Richard's user avatar
  • 62.9k
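One common way a truncated integer range shows up is in flat index arithmetic. The sketch below is plain host C++, not taken from the question: the two hypothetical functions compute the same block-times-size-plus-thread index, once in 32 bits and once after widening to 64 bits.

```cpp
#include <cstdint>

// 32-bit arithmetic: the product silently wraps modulo 2^32 once it
// exceeds the range of std::uint32_t.
std::uint32_t flat_index_32(std::uint32_t block, std::uint32_t block_dim,
                            std::uint32_t thread) {
    return block * block_dim + thread;
}

// Widening one operand first makes the whole expression 64-bit.
std::uint64_t flat_index_64(std::uint32_t block, std::uint32_t block_dim,
                            std::uint32_t thread) {
    return static_cast<std::uint64_t>(block) * block_dim + thread;
}
```

With block 5,000,000 and block size 1,024, the 64-bit version yields 5,120,000,000, while the 32-bit version wraps to 825,032,704, an index that points at unrelated data.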
1 vote
2 answers
169 views

My goal is to use curand_uniform() to have every kernel thread generate a single random number. I am testing the randomness my program generates by treating each generated number as an index into a ...
1 vote
1 answer
193 views

In CUDA it's both easy to implicitly downcast/truncate integers and surprisingly common for programmers to do so. I would like CUDA to raise an error when implicit downcasting occurs. Consider this ...
Richard's user avatar
  • 62.9k
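The question asks for a compile-time error from the CUDA toolchain, which this listing does not answer. As a runtime alternative, a checked conversion can make each narrowing explicit and fail loudly when a value does not fit. The sketch below is host C++ with a hypothetical helper `checked_narrow` (similar in spirit to the `narrow` function in the C++ Guidelines Support Library); throwing exceptions is not available in device code, so this form only applies on the host side.

```cpp
#include <cstdint>
#include <stdexcept>
#include <type_traits>

// Hypothetical helper: convert between integer types, throwing when the
// value changes or its sign flips during the conversion.
template <typename To, typename From>
To checked_narrow(From value) {
    static_assert(std::is_integral<To>::value && std::is_integral<From>::value,
                  "checked_narrow only handles integer types");
    const To narrowed = static_cast<To>(value);
    const bool round_trips = static_cast<From>(narrowed) == value;
    const bool same_sign = (narrowed < To{}) == (value < From{});
    if (!round_trips || !same_sign) {
        throw std::range_error("checked_narrow: value does not fit in target type");
    }
    return narrowed;
}
```

A call such as `checked_narrow<std::int32_t>(n)` returns `n` unchanged when it fits and throws `std::range_error` when `n` is, say, 5,000,000,000.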
3 votes
0 answers
69 views

I have an A.cuh file that contains template functions and their implementations. I also have a file main.cpp, which calls some functions from A.cuh. For the program to work correctly, A.cuh must be ...
Drem Lit's user avatar
3 votes
0 answers
152 views

I have a file named max.cu: int main() { int a = max(1, 2); } When I compile it with nvcc, it seems to find the built-in max function in the global namespace and succeeds. The output of objdump is: ...
0 votes
1 answer
117 views

I am on Windows using VS2022. I have built a CUDA library from the MSVC GUI (not by executing directly from a command line), and in it I export a device function. Then I have another .cu file ...
-3 votes
1 answer
78 views

My program looks like this: #include <cuda_runtime_api.h> int main() { int id = -1; int state = cudaGetDevice(&id); if (state != cudaSuccess) std::cout << "Failed!"; ...
Ustinian's user avatar
  • 121
0 votes
1 answer
140 views

I'd like to allocate executable memory in CUDA, write SASS/CUBIN code there, and then execute this code. On the CPU for Linux systems, this is quite easy and well-documented -- just a combination of ...
aahlback's user avatar
  • 130
0 votes
1 answer
540 views

Suppose my machine has GPUs with compute capabilities XX and YY. Having read: https://stackoverflow.com/a/35657430/1593077 I know I can call nvcc like so: nvcc \ -o myapp \ -gencode arch=...
