3,752 questions
-5 votes · 1 answer · 56 views
Performance Degradation of LAMMPS with Increased MPI Ranks on an A100 GPU [closed]
I tested the performance of LAMMPS with DeepMD-kit for MD simulations on an HPC cluster.
The job was allocated 8 CPUs, 64 GB of RAM, and one A100 GPU.
I observed that when running with mpirun -np 1 ...
Tooling
0 votes · 1 answer · 60 views
How to dynamically estimate maximum number of cameras my GPU can handle for YOLOv8 inference?
I’m trying to simulate multiple camera streams feeding into a YOLOv8l model on a single GPU and monitor real-time hardware utilization. My setup:
Single GPU (48GB VRAM, CUDA-enabled)
YOLOv8l model
...
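There is no API that reports a hard "maximum number of cameras"; the practical approach is to ramp up simulated streams while polling the GPU. A minimal sketch of that loop follows, assuming the ultralytics package and NVML via pynvml; the weights file, dummy frame, and saturation thresholds are placeholders rather than anything from the question.

```python
# Minimal sketch: add simulated streams one at a time and watch GPU
# utilization / VRAM with pynvml (NVML). Thresholds, stream count, and
# the dummy-frame source are placeholders, not from the question.
import numpy as np
import pynvml
from ultralytics import YOLO

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
model = YOLO("yolov8l.pt")  # assumed weights file

def gpu_stats():
    util = pynvml.nvmlDeviceGetUtilizationRates(handle)  # % over last sample window
    mem = pynvml.nvmlDeviceGetMemoryInfo(handle)          # bytes
    return util.gpu, mem.used / mem.total * 100

frame = np.zeros((640, 640, 3), dtype=np.uint8)  # stand-in for one camera frame

for n_streams in range(1, 65):
    # one inference per simulated stream; a real test would run them concurrently
    for _ in range(n_streams):
        model(frame, verbose=False)
    sm, vram = gpu_stats()
    print(f"{n_streams} streams -> SM {sm}%  VRAM {vram:.0f}%")
    if sm > 95 or vram > 90:  # arbitrary cut-off for "saturated"
        print(f"Estimated capacity: about {n_streams - 1} streams")
        break

pynvml.nvmlShutdown()
```

In a real test the streams would run concurrently (threads or batched inference), so a sequential loop like this only gives a rough lower bound on capacity.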
1 vote · 1 answer · 476 views
How to correctly install JAX with CUDA on Linux when `jax[cuda12_pip]` consistently falls back to the CPU version?
I am trying to install JAX with GPU support on a powerful, dedicated Linux server, but I am stuck in what feels like a Catch-22 where every official installation method fails in a different way, ...
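A quick way to tell whether such an install actually produced a CUDA-enabled jaxlib is to ask JAX which backend it loaded. A minimal check, assuming the currently documented `pip install -U "jax[cuda12]"` extra (the `cuda12_pip` spelling in the title is an older form):

```python
# Minimal post-install check, assuming `pip install -U "jax[cuda12]"` was used.
# If this still prints "cpu", the CUDA wheel did not install or cannot see the driver.
import jax

print(jax.__version__)
print(jax.default_backend())   # expected: 'gpu'
print(jax.devices())           # expected: [CudaDevice(id=0), ...]
```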
3 votes · 1 answer · 156 views
Unable to run a CUDA program in Google Colab
I am trying to run a basic CUDA program in Google Colab, but it's not giving kernel output.
Below are the steps I tried:
Changed runtime type to T4 GPU.
!pip install nvcc4jupyter
%load_ext ...
1 vote · 1 answer · 66 views
How to debug CUDA in Visual Studio with "step over"
I installed NVIDIA Nsight Visual Studio Edition 2025.01 in Visual Studio 2022.
I want to debug code, but I can't step over (F10); the debugger always stops at a location without a breakpoint....
0 votes · 1 answer · 465 views
TensorFlow not detecting NVIDIA GPU (RTX 3050, CUDA 12.7, TF 2.20.0) [duplicate]
I’ve been trying to get TensorFlow to use my GPU on Windows, and even though everything seems installed correctly, it shows 0 available GPUs.
System setup
Windows 11
RTX 3050 Laptop GPU
NVIDIA driver ...
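The usual culprit for this combination is that native Windows builds of TensorFlow stopped shipping GPU support after 2.10, so TF 2.20 on Windows 11 only sees the GPU from inside WSL2. A minimal diagnostic, assuming the setup described above:

```python
# Quick diagnostic, assuming TensorFlow 2.20 on Windows 11 as in the question.
# Native Windows builds of TensorFlow stopped shipping GPU support after 2.10,
# so on Windows the GPU is typically only visible from inside WSL2.
import tensorflow as tf

print(tf.__version__)
print(tf.config.list_physical_devices("GPU"))   # [] on a native Windows install >= 2.11
print(tf.test.is_built_with_cuda())             # False for the plain Windows pip wheel
```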
1 vote · 0 answers · 216 views
Why does "Command Buffer Full" appear in PyTorch CUDA kernel launches?
I’m using the PyTorch profiler to analyze sglang, and I noticed that in the CUDA timeline, some kernels show "Command Buffer Full". This causes the cudaLaunchKernel time to become very long, as shown ...
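For readers unfamiliar with where that annotation shows up: it appears on the CUDA timeline of a trace captured with torch.profiler. A minimal capture sketch is below; the matmul loop is a placeholder workload, not sglang.

```python
# Minimal profiling sketch for reproducing such a trace; the workload here is a
# placeholder matmul loop, not sglang. Open trace.json in chrome://tracing or
# Perfetto to inspect cudaLaunchKernel durations on the CUDA timeline.
import torch
from torch.profiler import profile, ProfilerActivity

x = torch.randn(4096, 4096, device="cuda")
with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
    for _ in range(100):
        x = x @ x
    torch.cuda.synchronize()

prof.export_chrome_trace("trace.json")
```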
2 votes · 0 answers · 326 views
jax plugin configuration error: Exception when calling jax_plugins.xla_cuda12.initialize()
I am using WSL2 on Windows 10 and have an NVIDIA graphics card. I recently installed GPU JAX using the command pip install -U "jax[cuda12]". This completed successfully, but when I run any jax ...
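With the cuda12 plugin wheels, initialization failures like this are usually driver-visibility problems inside WSL2 rather than the pip install itself. A small diagnostic sketch, assuming a recent JAX that provides jax.print_environment_info() and the Windows driver's WSL2 CUDA support:

```python
# Diagnostic sketch for a plugin initialization failure on WSL2 (assumptions:
# jax installed via `pip install -U "jax[cuda12]"`, NVIDIA driver exposed to
# WSL2 through the Windows driver's CUDA support).
import subprocess
import jax

# Dumps jax/jaxlib versions, the detected backend, and nvidia-smi output.
jax.print_environment_info()

# The driver must be visible inside WSL2 itself, not just on the Windows host.
print(subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout)
```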
2 votes · 1 answer · 197 views
Executing a CUDA Graph from a CUDA kernel
I’m trying to launch a captured CUDA Graph from inside a regular CUDA kernel (i.e., device-side graph launch).
From the NVIDIA blog on device graph launch, it seems this should be supported on newer ...
0 votes · 1 answer · 103 views
CPU-GPU producer-consumer pattern using unified memory but GPU is in spin loop
I am trying to implement a producer-consumer pattern between the GPU and CPU, which is required for another project. The GPU requests some data from the CPU via unified memory, and the CPU copies that data to a specific location in global ...
3 votes · 1 answer · 125 views
TensorRT PWC-Net Causing 2.4km Trajectory Error in iSLAM - Original PyTorch Works Fine
Problem Statement
My iSLAM system works correctly with the original PyTorch PWC-Net but produces catastrophic trajectory errors (2.4km ATE RMSE) when I replace it with a TensorRT-converted version. ...
0 votes · 0 answers · 153 views
TensorRT DLA Engine Build Fails for PWC-Net on Jetson NX - Missing Layer Support?
I'm converting a PWC-Net optical flow model to run on Jetson NX DLA using the iSLAM framework, but the TensorRT engine build fails during DLA optimization.
Environment
Hardware: NVIDIA Jetson NX
...
0 votes · 0 answers · 73 views
Using a scalar tensor as image source for a Holoscan HolovizOp
I am attempting to write my own holoscan::Operator for creating some images that should be displayed as a short video using a holoscan::ops::HolovizOp.
So I compose()-d an application flow: add_flow(...
0 votes · 1 answer · 170 views
How to correctly monitor a program’s GPU memory bandwidth utilization and SM utilization? (DCGM DRAM_ACTIVE vs in-program bandwidth differs a lot)
I want to quantitatively measure the memory bandwidth utilization and SM utilization of a CUDA program for performance analysis and regression testing.
My approach so far:
Compute the theoretical ...
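For the in-program half of that comparison, effective bandwidth is just bytes moved divided by kernel time, which can then be expressed as a fraction of a theoretical peak derived from device properties. A minimal sketch using CuPy as a stand-in for the actual program (the DDR factor of 2 is an approximation, and the DCGM/DRAM_ACTIVE side is not shown):

```python
# Minimal sketch of in-program effective bandwidth: time a memory-bound copy,
# divide bytes moved by elapsed time, and compare against a theoretical peak
# derived from device properties. The DDR factor of 2 is an approximation.
import cupy as cp

props = cp.cuda.runtime.getDeviceProperties(0)
peak_gbps = props["memoryClockRate"] * 1e3 * (props["memoryBusWidth"] / 8) * 2 / 1e9

n = 1 << 28                               # 256M float32 elements (~1 GiB)
src = cp.ones(n, dtype=cp.float32)
dst = cp.empty_like(src)

start, end = cp.cuda.Event(), cp.cuda.Event()
start.record()
cp.copyto(dst, src)                       # memory-bound: reads src, writes dst
end.record()
end.synchronize()

ms = cp.cuda.get_elapsed_time(start, end)
eff_gbps = 2 * src.nbytes / (ms / 1e3) / 1e9   # read + write traffic
print(f"effective {eff_gbps:.0f} GB/s of ~{peak_gbps:.0f} GB/s peak "
      f"({eff_gbps / peak_gbps:.0%})")
```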
2 votes · 1 answer · 157 views
How to define "pool" for Nvidia holoscan::ops::FormatConverterOp
I am trying to get the holoscan example "bring your own model"
https://docs.nvidia.com/holoscan/sdk-user-guide/examples/byom.html
to run, translating it from Python into C++.
One necessary ...