Real-time NVIDIA GPU monitoring dashboard. Web-based, no SSH required.
Monitor a single machine or an entire cluster with the same Docker image.
Single machine:
docker run -d --gpus all -p 1312:1312 ghcr.io/psalias2006/gpu-hot:latest
Multiple machines:
# On each GPU server
docker run -d --gpus all -p 1312:1312 -e NODE_NAME=$(hostname) ghcr.io/psalias2006/gpu-hot:latest

# On a hub machine (no GPU required)
docker run -d -p 1312:1312 -e GPU_HOT_MODE=hub -e NODE_URLS=http://server1:1312,http://server2:1312,http://server3:1312 ghcr.io/psalias2006/gpu-hot:latest
Open http://localhost:1312
Older GPUs: Add -e NVIDIA_SMI=true if metrics don't appear.
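In the multi-machine setup, the hub receives its node list through NODE_URLS and combines the nodes into one view. The sketch below shows one way to implement that fan-out, under stated assumptions. It assumes each node serves its metrics at /api/gpu-data, which is the endpoint used in the troubleshooting section. It also assumes an unreachable node should show as offline rather than break the whole dashboard. `aggregate_nodes` and `fetch_json` are hypothetical names, not gpu-hot's actual code.

```python
import json
import urllib.request
from typing import Any, Callable, Dict, List


def fetch_json(url: str, timeout: float = 2.0) -> Any:
    """GET a URL and decode its JSON body (stdlib only)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def aggregate_nodes(
    node_urls: List[str],
    fetch: Callable[[str], Any] = fetch_json,
) -> Dict[str, Dict[str, Any]]:
    """Poll every node once and merge the results, keyed by node URL.

    Hypothetical sketch: assumes each node exposes /api/gpu-data.
    A failing node is reported as offline instead of aborting the poll.
    """
    merged: Dict[str, Dict[str, Any]] = {}
    for base in node_urls:
        endpoint = base.rstrip("/") + "/api/gpu-data"
        try:
            merged[base] = {"online": True, "data": fetch(endpoint)}
        except Exception as exc:  # refused connections, timeouts, bad JSON
            merged[base] = {"online": False, "error": str(exc)}
    return merged
```

The sketch polls nodes one after another to stay short. A hub with many nodes would more likely poll them concurrently, for example with a thread pool, so that one slow node does not delay every update.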
From source:
git clone https://github.com/psalias2006/gpu-hot
cd gpu-hot
docker-compose up --build
Requirements: Docker + NVIDIA Container Toolkit
Features:
- Real-time metrics (sub-second)
- Automatic multi-GPU detection
- Process monitoring (PID, memory usage)
- Historical charts (utilization, temperature, power, clocks)
- System metrics (CPU, RAM)
- Scale from 1 to 100+ GPUs
Metrics: Utilization, temperature, memory, power draw, fan speed, clock speeds, PCIe info, P-State, throttle status, encoder/decoder sessions
Environment variables:
NVIDIA_VISIBLE_DEVICES=0,1          # Specific GPUs (default: all)
NVIDIA_SMI=true                     # Force nvidia-smi mode for older GPUs
GPU_HOT_MODE=hub                    # Set to 'hub' for multi-node aggregation (default: single node)
NODE_NAME=gpu-server-1              # Node display name (default: hostname)
NODE_URLS=http://host:1312...       # Comma-separated node URLs (required for hub mode)
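These variables depend on each other: NODE_URLS only matters in hub mode, and hub mode cannot work without it. Here is a minimal sketch of how such settings could be read and validated, assuming exactly the names and defaults listed above. `load_settings`, `Settings`, and the `"node"` label for the default single-node mode are hypothetical, and gpu-hot's own parsing may differ. NVIDIA_VISIBLE_DEVICES is left out because the NVIDIA Container Toolkit typically consumes it when it decides which GPUs to expose to the container.

```python
import os
import socket
from dataclasses import dataclass, field
from typing import List, Mapping, Optional


@dataclass
class Settings:
    mode: str                 # "hub", or the hypothetical label "node" for single-node mode
    node_name: str            # display name, defaults to the hostname
    force_nvidia_smi: bool    # NVIDIA_SMI=true -> use nvidia-smi mode
    node_urls: List[str] = field(default_factory=list)


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read gpu-hot-style settings from environment variables (hypothetical sketch)."""
    env = os.environ if env is None else env
    mode = "hub" if env.get("GPU_HOT_MODE", "").strip().lower() == "hub" else "node"
    # Split the comma-separated list and drop blanks (e.g. a trailing comma).
    urls = [u.strip() for u in env.get("NODE_URLS", "").split(",") if u.strip()]
    if mode == "hub" and not urls:
        raise ValueError("GPU_HOT_MODE=hub requires NODE_URLS")
    return Settings(
        mode=mode,
        node_name=env.get("NODE_NAME") or socket.gethostname(),
        force_nvidia_smi=env.get("NVIDIA_SMI", "").strip().lower() == "true",
        node_urls=urls,
    )
```

Failing fast on a hub with no nodes gives an immediate, readable error instead of an empty dashboard.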
Backend (core/config.py):
UPDATE_INTERVAL = 0.5  # Polling interval in seconds
PORT = 1312            # Server port
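UPDATE_INTERVAL sets how often metrics are collected and pushed to clients (every 0.5 s by default). One common way to honor such an interval without drift is a fixed-rate loop. It schedules each tick from a monotonic deadline instead of sleeping a constant amount after each collection. This is a sketch only. `poll`, `collect`, and `emit` are hypothetical stand-ins for the real metrics collector and WebSocket broadcast, and the README does not say gpu-hot schedules its loop this way.

```python
import time
from typing import Any, Callable, Optional


def poll(
    collect: Callable[[], Any],
    emit: Callable[[Any], None],
    interval: float = 0.5,             # seconds, mirrors UPDATE_INTERVAL
    max_ticks: Optional[int] = None,   # None = run forever
) -> int:
    """Call collect() and pass the result to emit() every `interval` seconds.

    Returns the number of ticks executed.
    """
    ticks = 0
    next_deadline = time.monotonic()
    while max_ticks is None or ticks < max_ticks:
        emit(collect())
        ticks += 1
        next_deadline += interval
        delay = next_deadline - time.monotonic()
        if delay > 0:
            time.sleep(delay)
        else:
            # Collection overran the interval: restart the schedule from now
            # instead of firing a burst of catch-up ticks.
            next_deadline = time.monotonic()
    return ticks
```

Deadline scheduling keeps the average rate at 1/interval (2 updates per second at 0.5 s), even when a collection takes a noticeable fraction of the interval.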
HTTP API:
GET /                # Dashboard
GET /api/gpu-data    # JSON metrics
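The JSON endpoint lets scripts and other monitoring tools pull metrics without a browser. Below is a minimal stdlib-only client sketch. It assumes the response is a JSON object carrying the same top-level fields as the WebSocket payload (`gpus`, `processes`, `system`). The layout of each GPU entry is not documented here, so the sketch only checks the top-level shape. `fetch_gpu_data` is a hypothetical helper name.

```python
import json
import urllib.request
from typing import Any, Dict


def fetch_gpu_data(base_url: str = "http://localhost:1312", timeout: float = 5.0) -> Dict[str, Any]:
    """GET <base_url>/api/gpu-data and return the decoded JSON object."""
    url = base_url.rstrip("/") + "/api/gpu-data"
    req = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        payload = json.load(resp)
    # Assumption: the endpoint returns an object, not a bare list.
    if not isinstance(payload, dict):
        raise ValueError(f"expected a JSON object from {url}, got {type(payload).__name__}")
    return payload
```

For example, `fetch_gpu_data("http://server1:1312")` queries one node from the multi-machine setup directly. This is the same check as the `curl` command in the troubleshooting section.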
WebSocket:
socket.on('gpu_data', (data) => {
  // Updates every 0.5s (configurable)
  // Contains: data.gpus, data.processes, data.system
});
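The browser code listens with `socket.on`, which suggests the server speaks Socket.IO. If so, a non-browser client can subscribe to the same `gpu_data` event. The sketch below assumes that and uses the third-party python-socketio client (`pip install "python-socketio[client]"`), whose major version must be compatible with the server's Socket.IO version. `summarize` and `listen` are hypothetical helpers. The payload keys come from the comment above, but the structure inside each key is not documented here.

```python
from typing import Any, Dict


def summarize(data: Dict[str, Any]) -> str:
    """One-line summary of a gpu_data payload (gpus/processes/system keys assumed)."""
    gpus = data.get("gpus") or {}
    processes = data.get("processes") or []
    return f"{len(gpus)} GPU(s), {len(processes)} process(es)"


def listen(url: str = "http://localhost:1312") -> None:
    """Subscribe to gpu_data pushes and print a summary of each one."""
    import socketio  # third-party: pip install "python-socketio[client]"

    sio = socketio.Client()
    sio.on("gpu_data", lambda data: print(summarize(data)))
    sio.connect(url)
    sio.wait()  # block until the connection closes (Ctrl+C to stop)
```

If the assumption holds, `listen("http://server1:1312")` should print one line per update, roughly every UPDATE_INTERVAL seconds.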
Project structure:
gpu-hot/
├── app.py                    # Flask + WebSocket server
├── core/
│   ├── config.py             # Configuration
│   ├── monitor.py            # NVML GPU monitoring
│   ├── handlers.py           # WebSocket handlers
│   ├── routes.py             # HTTP routes
│   └── metrics/
│       ├── collector.py      # Metrics collection
│       └── utils.py          # Metric utilities
├── static/
│   ├── js/
│   │   ├── charts.js         # Chart configs
│   │   ├── gpu-cards.js      # UI components
│   │   ├── socket-handlers.js  # WebSocket + rendering
│   │   ├── ui.js             # View management
│   │   └── app.js            # Init
│   └── css/styles.css
├── templates/index.html
├── Dockerfile
└── docker-compose.yml
No GPUs detected:
nvidia-smi  # Verify drivers work
docker run --rm --gpus all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi  # Test Docker GPU access
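If `nvidia-smi` works but the dashboard still shows wrong or missing values, query a few fields in machine-readable form and compare them with the dashboard. This sketch uses nvidia-smi's `--query-gpu` and `--format=csv,noheader,nounits` options. The field list is only an example, and this is not necessarily how gpu-hot collects its own metrics. `parse_query_csv` and `query_gpus` are hypothetical helper names.

```python
import subprocess
from typing import Dict, List

# Example field selection; `nvidia-smi --help-query-gpu` lists all valid names.
FIELDS = ["index", "name", "temperature.gpu", "utilization.gpu", "memory.used", "memory.total"]


def parse_query_csv(text: str, fields: List[str] = FIELDS) -> List[Dict[str, str]]:
    """Turn `--format=csv,noheader,nounits` output into one dict per GPU."""
    rows = []
    for line in text.strip().splitlines():
        values = [v.strip() for v in line.split(",")]
        rows.append(dict(zip(fields, values)))
    return rows


def query_gpus(fields: List[str] = FIELDS) -> List[Dict[str, str]]:
    """Run nvidia-smi once and return its parsed per-GPU rows."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={','.join(fields)}", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if nvidia-smi fails
    ).stdout
    return parse_query_csv(out, fields)
```

Values stay as strings because nvidia-smi prints placeholders such as `[N/A]` or `[Not Supported]` for fields a GPU does not report. This is common on older cards.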
Hub can't connect to nodes:
curl http://node-ip:1312/api/gpu-data  # Test connectivity from the hub
sudo ufw allow 1312/tcp                # Open port 1312 if the firewall (ufw) blocks it
Performance issues: Increase UPDATE_INTERVAL in core/config.py to poll less often.
PRs welcome. Open an issue for major changes.
MIT - see LICENSE