Stack Overflow
-5 votes
1 answer
56 views

I tested the performance of LAMMPS with DeepMD-kit for MD simulations on an HPC cluster. The job was allocated 8 CPUs, 64 GB of RAM, and one A100 GPU. I observed that when running with mpirun -np 1 ...
Asked by link89 (reputation 2,007)
Tooling
0 votes
1 reply
60 views

I’m trying to simulate multiple camera streams feeding into a YOLOv8l model on a single GPU and monitor real-time hardware utilization. My setup: single GPU (48 GB VRAM, CUDA-enabled); YOLOv8l model ...
1 vote
1 answer
476 views

I am trying to install JAX with GPU support on a powerful, dedicated Linux server, but I am stuck in what feels like a Catch-22 where every official installation method fails in a different way, ...
3 votes
1 answer
156 views

I am trying to run a basic CUDA program in Google Colab, but it's not giving any kernel output. Below are the steps I tried: Changed run type to T4 GPU. !pip install nvcc4jupyter %load_ext ...
1 vote
1 answer
66 views

I installed NVIDIA Nsight Visual Studio Edition 2025.01 in Visual Studio 2022. I want to debug code, but I can't use Step Over (F10): the debugger always stops at a location without a breakpoint....
0 votes
1 answer
465 views

I’ve been trying to get TensorFlow to use my GPU on Windows, and even though everything seems installed correctly, it shows 0 available GPUs. System setup: Windows 11; RTX 3050 Laptop GPU; NVIDIA driver ...
1 vote
0 answers
216 views

I’m using the PyTorch profiler to analyze sglang, and I noticed that in the CUDA timeline, some kernels show "Command Buffer Full". This causes the cudaLaunchKernel time to become very long, as shown ...
2 votes
0 answers
326 views

I am using WSL2 on Windows 10 with an NVIDIA graphics card. I recently installed GPU-enabled JAX using the command pip install -U "jax[cuda12]". This completed successfully, but when I run any jax ...
2 votes
1 answer
197 views

I’m trying to launch a captured CUDA Graph from inside a regular CUDA kernel (i.e., device-side graph launch). From the NVIDIA blog on device graph launch, it seems this should be supported on newer ...
0 votes
1 answer
103 views

I am trying to implement the producer-consumer problem between the GPU and the CPU, as required for another project. The GPU requests some data from the CPU via unified memory. The CPU copies that data to a specific location in global ...
3 votes
1 answer
125 views

Problem statement: My iSLAM system works correctly with the original PyTorch PWC-Net but produces catastrophic trajectory errors (2.4 km ATE RMSE) when I replace it with a TensorRT-converted version. ...
0 votes
0 answers
153 views

I'm converting a PWC-Net optical flow model to run on Jetson NX DLA using the iSLAM framework, but the TensorRT engine build fails during DLA optimization. Environment: Hardware: NVIDIA Jetson NX ...
0 votes
0 answers
73 views

I am attempting to write my own holoscan::Operator for creating some images that should be displayed as a short video using a holoscan::ops::HolovizOp. So I compose()-d an application flow: add_flow(...
0 votes
1 answer
170 views

I want to quantitatively measure the memory bandwidth utilization and SM utilization of a CUDA program for performance analysis and regression testing. My approach so far: Compute the theoretical ...
2 votes
1 answer
157 views

I am trying to get the Holoscan example "bring your own model" (https://docs.nvidia.com/holoscan/sdk-user-guide/examples/byom.html) to run, translating it from Python into C++. One necessary ...
