Newest 'cuda-graphs' Questions

23 questions

2 votes

1 answer

199 views

Executing a CUDA Graph from a CUDA kernel

I’m trying to launch a captured CUDA Graph from inside a regular CUDA kernel (i.e., device-side graph launch). From the NVIDIA blog on device graph launch, it seems this should be supported on newer ...

Mohammad Siavashi

1,292

asked Oct 13 at 11:38

0 votes

1 answer

204 views

Trying to end stream capture fails due to "unjoined work"; but synchronizing fails when capture is in progress

I am capturing some work on a CUDA stream, then instantiating the resulting graph (or graph template rather) and running it again. I am running CUDA 12.6.85 with driver version 535.54.03 (the driver ...

einpoklum

138k

asked Mar 9 at 15:21

0 votes

2 answers

308 views

CUDA graph cudaKernelNodeParams kernelParams

I want to use CUDA Graph in my CUDA project, but there aren’t many complete examples online. So, I directly referred to the official API to implement it, but I keep encountering a segmentation fault. ...

Weimin Chan

asked Oct 8, 2024 at 6:11

1 vote

1 answer

770 views

CUDA Graph Execution Taking Longer Than Original Kernel Launch Loop

I have a loop where I launch multiple kernels with interdependencies using events and streams. Here’s the original loop without CUDA graphs: for (int i= 1; i<= 1024 ; i++) { // origin stream ...

Photos

asked Jun 20, 2024 at 15:55

2 votes

2 answers

879 views

How to Use CUDA Graphs with Interdependent Streams and Dynamic Parameters?

I have a CUDA program with multiple interdependent streams, and I want to convert it to use CUDA graphs to reduce launch overhead and improve performance. My program involves launching three kernels (...

Photos

asked Jun 20, 2024 at 7:06

0 votes

1 answer

163 views

Behavior of cudaGraphInstantiateFlagUseNodePriority

My understanding is that cudaGraphInstantiateFlagUseNodePriority prioritizes kernel calls, i.e. we have three independent kernels in a CUDA graph (first, second and third), and let each kernel wait ...

user9684153

asked Aug 21, 2023 at 11:21

0 votes

1 answer

788 views

Is it possible to execute more than one CUDA graph's host execution node in different streams concurrently?

Investigating possible solutions for this problem, I thought about using CUDA graphs' host execution nodes (cudaGraphAddHostNode). I was hoping to have the option to block and unblock streams on the ...

surabax

asked Mar 15, 2023 at 2:17

3 votes

1 answer

593 views

Catching an exception thrown from a callback in cudaLaunchHostFunc

I want to check for an error flag living in managed memory that might have been written by a kernel running on a certain stream. Depending on the error flag I need to throw an exception. I would ...

Raul

asked Jan 17, 2023 at 11:15

0 votes

1 answer

115 views

What should I set the flags field of CUDA_BATCH_MEM_OP_NODE_PARAMS?

The CUDA graph API exposes a function call for adding a "batch memory operations" node to a graph: CUresult cuGraphAddBatchMemOpNode ( CUgraphNode* phGraphNode, CUgraph hGraph, ...

einpoklum

138k

asked Jan 15, 2023 at 21:48

0 votes

1 answer

59 views

What type should be pointed to for the result of cuDeviceGetGraphMemAttribute()?

cuDeviceGetGraphMemAttribute() takes a void pointer to a result variable. But - what type does it expect the pointed-to value to be? The documentation (for CUDA v12.0) doesn't say. I'm guessing it's ...

einpoklum

138k

asked Dec 24, 2022 at 20:15

0 votes

1 answer

63 views

How can I tell whether a copy-node search failed, or whether my node or graph are invalid?

Consider the CUDA graphs API function cuFindNodeInClone(). The documentation says that it returns: CUDA_SUCCESS, CUDA_ERROR_INVALID_VALUE. This seems problematic to me. How can I tell whether the ...

einpoklum

138k

asked Dec 17, 2022 at 22:22

-1 votes

1 answer

489 views

CUDA graph does not run as expected

I'm using the following code to learn how to use "CUDA graphs". The parameter NSTEP is set to 1000, and the parameter NKERNEL is set to 20. The kernel function shortKernel has ...

Jackson

asked Dec 16, 2022 at 3:59

0 votes

1 answer

646 views

Simple CUDA graph example doesn't produce expected result

I am testing out CUDA graphs. My graph is as follows. The code for this is as follows: #include <cstdio> #include <cstdlib> #include <fstream> #include <iostream> #include <...

M46f988b814

asked Nov 3, 2022 at 21:37

0 votes

1 answer

3k views

Error with a captured CUDA graph and asynchronous memory allocations in a loop [closed]

I am trying to implement a cuda graph experiment. There are three kernels, kernel_0, kernel_1, and kernel_2. They will be executed sequentially and have dependencies. Right now I am going to only ...

kingwales

asked Jul 23, 2022 at 3:30

2 votes

2 answers

2k views

Using multi streams in cuda graph, the execution order is uncontrolled

I am using the CUDA graph stream capture API to implement a small demo with multiple streams. Following the CUDA Programming Guide, I wrote the complete code. To my knowledge, kernelB should execute ...

poohRui

asked May 17, 2022 at 3:27
