Stack Overflow
2 votes
1 answer
199 views

I’m trying to launch a captured CUDA Graph from inside a regular CUDA kernel (i.e., device-side graph launch). From the NVIDIA blog on device graph launch, it seems this should be supported on newer ...
0 votes
1 answer
204 views

I am capturing some work on a CUDA stream, then instantiating the resulting graph (or graph template rather) and running it again. I am running CUDA 12.6.85 with driver version 535.54.03 (the driver ...
0 votes
2 answers
308 views

I want to use CUDA Graph in my CUDA project, but there aren’t many complete examples online. So, I directly referred to the official API to implement it, but I keep encountering a segmentation fault. ...
1 vote
1 answer
770 views

I have a loop where I launch multiple kernels with interdependencies using events and streams. Here’s the original loop without CUDA graphs: for (int i= 1; i<= 1024 ; i++) { // origin stream ...
2 votes
2 answers
879 views

I have a CUDA program with multiple interdependent streams, and I want to convert it to use CUDA graphs to reduce launch overhead and improve performance. My program involves launching three kernels (...
0 votes
1 answer
163 views

My understanding is that cudaGraphInstantiateFlagUseNodePriority prioritizes kernel calls, i.e., we have three independent kernels in a CUDA graph (first, second and third), and let each kernel wait ...
0 votes
1 answer
788 views

Investigating possible solutions for this problem, I thought about using CUDA graphs' host execution nodes (cudaGraphAddHostNode). I was hoping to have the option to block and unblock streams on the ...
3 votes
1 answer
593 views

I want to check for an error flag living in managed memory that might have been written by a kernel running on a certain stream. Depending on the error flag I need to throw an exception. I would ...
0 votes
1 answer
115 views

The CUDA graph API exposes a function call for adding a "batch memory operations" node to a graph: CUresult cuGraphAddBatchMemOpNode ( CUgraphNode* phGraphNode, CUgraph hGraph, ...
0 votes
1 answer
59 views

cuDeviceGetGraphMemAttribute() takes a void pointer to a result variable. But - what type does it expect the pointed-to value to be? The documentation (for CUDA v12.0) doesn't say. I'm guessing it's ...
0 votes
1 answer
63 views

Consider the CUDA graphs API function cuFindNodeInClone(). The documentation says that it returns: CUDA_SUCCESS, CUDA_ERROR_INVALID_VALUE. This seems problematic to me. How can I tell whether the ...
-1 votes
1 answer
489 views

I'm using the following code to learn how to use "CUDA graphs". The parameter NSTEP is set to 1000, and the parameter NKERNEL is set to 20. The kernel function shortKernel has ...
0 votes
1 answer
646 views

I am testing out CUDA graphs. My graph is as follows. The code for this is as follows: #include <cstdio> #include <cstdlib> #include <fstream> #include <iostream> #include <...
0 votes
1 answer
3k views

I am trying to implement a CUDA graph experiment. There are three kernels: kernel_0, kernel_1, and kernel_2. They are executed sequentially and have dependencies. Right now I am going to only ...
2 votes
2 answers
2k views

I am using the CUDA graph stream capture API to implement a small demo with multiple streams. Following the CUDA Programming Guide here, I wrote the complete code. As far as I know, kernelB should execute ...
