HAMi-core: Hook library for CUDA Environments

English | 中文

Introduction

HAMi-core is an in-container GPU resource controller. It has been adopted by HAMi and Volcano.

Features

HAMi-core has the following features:

  1. Virtualize device memory
  2. Limit device utilization with a self-implemented time-slicing mechanism
  3. Monitor device utilization in real time


Design

HAMi-core operates by hijacking the API calls between the CUDA Runtime (`libcudart.so`) and the CUDA Driver (`libcuda.so`).
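
Because the interception happens at the dynamic-linker level via `LD_PRELOAD`, you can observe the mechanism from the shell. A minimal sketch, assuming the library has been built to `./libvgpu.so` (the exact exported symbols and the placeholder `./your_cuda_app` are illustrative):

```bash
# List CUDA driver entry points exported by the hook library; under
# LD_PRELOAD these resolve before the same symbols in the real libcuda.so
# (illustrative check; symbol names depend on the build)
nm -D ./libvgpu.so | grep -E 'cuMemAlloc|cuLaunchKernel'

# Optionally, ask the loader to show which library wins symbol binding
# for a CUDA binary (./your_cuda_app is a hypothetical placeholder)
LD_PRELOAD=./libvgpu.so LD_DEBUG=bindings ./your_cuda_app 2>&1 | grep libvgpu | head
```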

Build in Docker

```bash
make build-in-docker
```
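
A quick sanity check on the result, assuming the library lands at `./build/libvgpu.so` (the path used by the Docker example below; adjust if your build tree differs):

```bash
# Confirm the hook library was produced and is a shared object
file ./build/libvgpu.so
```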

Usage

`CUDA_DEVICE_MEMORY_LIMIT` sets the upper limit of device memory (e.g. `1g`, `1024m`, `1048576k`, `1073741824`)

`CUDA_DEVICE_SM_LIMIT` sets the SM utilization percentage allowed on each device

```bash
# Add a 1 GiB memory limit and cap SM utilization at 50% for all devices
export LD_PRELOAD=./libvgpu.so
export CUDA_DEVICE_MEMORY_LIMIT=1g
export CUDA_DEVICE_SM_LIMIT=50
```
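
The same limits can also be applied to a single run without exporting anything; `./your_cuda_app` below is a placeholder for any CUDA binary:

```bash
# One-off run with a 1 GiB memory cap and a 50% SM utilization cap
LD_PRELOAD=./libvgpu.so \
CUDA_DEVICE_MEMORY_LIMIT=1g \
CUDA_DEVICE_SM_LIMIT=50 \
./your_cuda_app
```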

If you run CUDA applications locally, create the lock directory first:

```bash
mkdir /tmp/vgpulock/
```

If you have updated `CUDA_DEVICE_MEMORY_LIMIT` or `CUDA_DEVICE_SM_LIMIT`, delete the local cache file so the new limits take effect:

```bash
rm /tmp/cudevshr.cache
```
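
Putting the steps together, changing limits between runs typically looks like this (a sketch; `./your_cuda_app` remains a placeholder):

```bash
# Drop the cached limits, then relaunch with new values
rm -f /tmp/cudevshr.cache
export CUDA_DEVICE_MEMORY_LIMIT=2g
export CUDA_DEVICE_SM_LIMIT=30
LD_PRELOAD=./libvgpu.so ./your_cuda_app
```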


## Docker Images
```bash
# Build docker image
docker build . -f=dockerfiles/Dockerfile -t cuda_vmem:tf1.8-cu90
# Configure GPU device and library mounts for container
export DEVICE_MOUNTS="--device /dev/nvidia0:/dev/nvidia0 --device /dev/nvidia-uvm:/dev/nvidia-uvm --device /dev/nvidiactl:/dev/nvidiactl"
export LIBRARY_MOUNTS="-v /usr/cuda_files:/usr/cuda_files -v $(which nvidia-smi):/bin/nvidia-smi"
# Run container and check nvidia-smi output
docker run ${LIBRARY_MOUNTS} ${DEVICE_MOUNTS} -it \
 -e CUDA_DEVICE_MEMORY_LIMIT=2g \
 -e LD_PRELOAD=/libvgpu/build/libvgpu.so \
 cuda_vmem:tf1.8-cu90 \
 nvidia-smi
```

After running, you will see nvidia-smi output similar to the following, showing memory limited to 2GiB:

```
...
[HAMI-core Msg(1:140235494377280:libvgpu.c:836)]: Initializing.....
Mon Dec  2 04:38:12 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.107.02             Driver Version: 550.107.02     CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3060        Off |   00000000:03:00.0 Off |                  N/A |
| 30%   36C    P8              7W /  170W |       0MiB /  2048MiB  |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
+-----------------------------------------------------------------------------------------+
[HAMI-core Msg(1:140235494377280:multiprocess_memory_limit.c:497)]: Calling exit handler 1
```

Log

Use the environment variable `LIBCUDA_LOG_LEVEL` to control the verbosity of the logs:

| LIBCUDA_LOG_LEVEL | Description                        |
|-------------------|------------------------------------|
| 0                 | errors only                        |
| 1 (default), 2    | errors, warnings, messages         |
| 3                 | infos, errors, warnings, messages  |
| 4                 | debugs, errors, warnings, messages |
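
For example, to enable debug-level logging for a single run:

```bash
# Raise verbosity to debug (level 4) for one invocation
# (./your_cuda_app is a placeholder for any CUDA binary)
LIBCUDA_LOG_LEVEL=4 LD_PRELOAD=./libvgpu.so ./your_cuda_app
```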

Test Raw APIs

```bash
./test/test_alloc
```
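
To observe enforcement, the test can also be run under a limit (a sketch; the exact failure behavior depends on the test's allocation pattern):

```bash
# Run the raw-API allocation test with a 1 GiB cap; allocations beyond
# the cap should fail once the limit is hit (illustrative)
LD_PRELOAD=./libvgpu.so CUDA_DEVICE_MEMORY_LIMIT=1g ./test/test_alloc
```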
