Newest 'nvcc' Questions

738 questions

Best practices

0 votes

0 replies

29 views

How to extract nested loop features from CUDA kernels for LLM-based optimization?

I am working on an experimental project where I aim to have a large language model (LLM) automatically optimize the nested loops of CUDA kernels. The key requirement is to extract static loop and ...

yuxuan-z

asked Nov 25, 2025 at 16:34

-2 votes

1 answer

80 views

cmake generating a bad command line option for CUDA in MSVC on Windows [closed]

The CMake build produces this error message: nvcc fatal : A single input file is required for a non-link phase when an outputfile is specified, when running this command line, which CMake itself generates:...

alfC

16.8k

asked Aug 2, 2025 at 1:18

0 votes

1 answer

125 views

Why does NVCC not optimize ldexpf with a constexpr power-of-two exponent into a simple fmul?

Consider the following CUDA code: enum { p = 5 }; __device__ float adjust_mul(float x) { return x * (1 << p); } __device__ float adjust_ldexpf(float x) { return ldexpf(x, p); } I would expect ...

einpoklum

138k

asked Jul 20, 2025 at 15:09
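
For the ldexpf question above, a host-side C++ sketch can show why one might expect the two functions to reduce to the same operation: scaling by 2^5 through ldexp and multiplying by (1 << p) produce the same float. This is only an illustration under stated assumptions. It uses std::ldexp from <cmath> on the host as a stand-in for the device ldexpf call, the helper check_scalings is hypothetical, and it says nothing about the instructions NVCC actually emits.

```cpp
#include <cassert>
#include <cmath>

enum { p = 5 };

// Scale by 2^p with an ordinary multiplication, as in the excerpt.
inline float adjust_mul(float x) { return x * (1 << p); }

// Scale by 2^p with ldexp (assumption: host-side stand-in for device ldexpf).
inline float adjust_ldexpf(float x) { return std::ldexp(x, p); }

// Hypothetical helper: checks that both scalings agree on a few sample inputs.
inline void check_scalings() {
    const float samples[] = {0.0f, 1.5f, -0.375f, 1024.0f};
    for (float x : samples) {
        assert(adjust_mul(x) == adjust_ldexpf(x));
    }
}
```

Calling check_scalings() from a main function runs the comparison; the sample inputs are arbitrary and do not cover overflow or subnormal results.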

5 votes

3 answers

193 views

Why don't multi-character literals respect architecture endianness?

With GCC on Intel x86, and similarly with NVCC (CUDA), #include <stdio.h> #include <stdint.h> #include <stdbool.h> int main() { uint32_t v = 'abcd'; uint32_t w = ...

Basj

47.6k

asked Jun 8, 2025 at 8:52
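
As background for the multi-character literal question above, the value of such a literal is implementation-defined. GCC documents that it builds the value one character at a time, shifting the previous value left by the width of a character and OR-ing in the next one. The sketch below reproduces that rule by hand, without using a multi-character literal, so the first character lands in the most significant byte regardless of how the integer is later stored. The names pack_chars and first_byte_in_memory are hypothetical, and the sketch assumes 8-bit chars and an ASCII execution character set.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Combine four characters with shift-and-OR, first character most significant.
// This mirrors GCC's documented rule for multi-character constants.
constexpr std::uint32_t pack_chars(unsigned char a, unsigned char b,
                                   unsigned char c, unsigned char d) {
    return (std::uint32_t{a} << 24) | (std::uint32_t{b} << 16) |
           (std::uint32_t{c} << 8)  |  std::uint32_t{d};
}

// Return the byte stored at the lowest address of v.
// Endianness only shows up here, in the memory layout, not in the value:
// a little-endian host stores the least significant byte first.
inline unsigned char first_byte_in_memory(std::uint32_t v) {
    unsigned char bytes[sizeof v];
    std::memcpy(bytes, &v, sizeof v);
    return bytes[0];
}
```

With ASCII, pack_chars('a', 'b', 'c', 'd') is 0x61626364. Which byte first_byte_in_memory returns for that value depends on the host's byte order.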

4 votes

1 answer

224 views

How are constexpr device variables accessible from host?

My colleague came across this situation where global __device__ constexpr variables are accessible from both the host and the device. #include <array> #include <cstdio> __device__ ...

Hari

1,995

asked May 14, 2025 at 18:34

7 votes

1 answer

2k views

NVCC compilation error: exception specification is incompatible with that of previous function "cospi"

When I try to compile a simple CUDA program, e.g. the vectorAdd sample, I get errors about incompatible exception specifications: $ nvcc -I ../../../Common/ -ccbin g++-13 -Wno-deprecated-gpu-targets ...

einpoklum

138k

asked Apr 26, 2025 at 20:52

3 votes

0 answers

85 views

How can I use NVCC to detect signed/unsigned issues in CUDA kernel code?

When working with large datasets, illegal memory access errors often arise because of truncated integer ranges. Catching sign changes as well as signed-vs-unsigned comparisons could help prevent this. ...

Richard

62.9k

asked Mar 22, 2025 at 0:12

1 vote

2 answers

169 views

CUDA: curand_uniform() distribution not as random as expected

My goal is to use curand_uniform() to have every kernel thread generate a single random number. I am testing the randomness my program generates by treating each generated number as an index into a ...

wxz

2,636

asked Mar 16, 2025 at 3:19

1 vote

1 answer

193 views

How can I get NVCC to error upon implicit integer downcasting/truncation?

In CUDA it's both easy to implicitly downcast/truncate integers and surprisingly common for programmers to do so. I would like CUDA to raise an error when implicit downcasting occurs. Consider this ...

Richard

62.9k

asked Mar 14, 2025 at 1:00
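
The truncation question above asks for a compile-time error, which this listing does not answer. As a runtime fallback, one common idea is a checked conversion that verifies the value survives a round trip. The sketch below is a minimal host-side version of that idea, assuming integer types only. The name checked_narrow is hypothetical, and because it throws, it is not usable in device code, where exceptions are not supported.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <type_traits>

// Hypothetical helper: convert between integer types, throwing if the value
// does not fit in the destination type.
template <typename To, typename From>
To checked_narrow(From value) {
    static_assert(std::is_integral<To>::value && std::is_integral<From>::value,
                  "checked_narrow handles integer types only");
    const To result = static_cast<To>(value);
    // The value must convert back unchanged...
    const bool round_trips = static_cast<From>(result) == value;
    // ...and must not flip sign (catches e.g. -1 becoming a large unsigned value).
    const bool same_sign = (value < From{}) == (result < To{});
    if (!round_trips || !same_sign) {
        throw std::range_error("checked_narrow: value does not fit");
    }
    return result;
}
```

For example, checked_narrow<std::int32_t>(std::int64_t{42}) returns 42, while passing std::int64_t{1} << 40 throws std::range_error.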

3 votes

0 answers

69 views

How to Compile a .cuh File with Template Functions Using NVCC and a .cpp File with MSVC in a Single Project?

I have an A.cuh file that contains template functions and their implementations. I also have a file main.cpp, which calls some functions from A.cuh. For the program to work correctly, A.cuh must be ...

Drem Lit

asked Jan 26, 2025 at 16:20

3 votes

0 answers

152 views

Different behaviors of clang/nvcc when compiling CUDA code

I have a file named max.cu: int main() { int a = max(1, 2); } When I use nvcc to compile it, it seems that it finds the builtin max function in the global namespace and succeeds. The output of objdump is: ...

Conless

asked Jan 1, 2025 at 4:34

0 votes

1 answer

117 views

Does nvcc mangle names differently in its error messages than it does in the compiled code it creates

I am on Windows using VS2022. I have built a library using CUDA in the MSVC GUI (not executing directly from a command line), and in it I export a device function. Then I have another .cu file ...

Rich Tanenbaum

asked Dec 12, 2024 at 6:25

-3 votes

1 answer

78 views

Ubuntu24.04 cudaGetDevice Failed

My program looks like this: #include <cuda_runtime_api.h> int main() { int id = -1; int state = cudaGetDevice(&id); if (state != cudaSuccess) std::cout << "Failed!"; ...

Ustinian

asked Dec 2, 2024 at 15:21

0 votes

1 answer

140 views

Allocate executable memory and execute it in CUDA

I'd like to allocate executable memory in CUDA, write SASS/CUBIN code there, and then execute this code. On the CPU for Linux systems, this is quite easy and well-documented -- just a combination of ...

aahlback

asked Nov 22, 2024 at 15:05

0 votes

1 answer

540 views

If I tell NVCC to -gencode arch=native, what do I use for the code= argument?

Suppose my machine has GPUs with compute capabilities XX and YY. Having read: https://stackoverflow.com/a/35657430/1593077 I know I can call nvcc like so: nvcc \ -o myapp \ -gencode arch=...

einpoklum

138k

asked Nov 15, 2024 at 20:52
