ROCm runtime: configurable weight cache limit and arena chunk size via environment variables #397

Open

gundemirbas wants to merge 1 commit into antirez:main from gundemirbas:pr/rocm-runtime-changes


gundemirbas commented Jun 11, 2026 (edited)

Summary

Make cuda_model_cache_limit_bytes() and cuda_model_arena_chunk_bytes() configurable through environment variables, replacing the previous hardcoded defaults. This improves flexibility for systems with constrained VRAM (e.g., Strix Halo with 128 GB unified memory).

Changes

`cuda_model_cache_limit_bytes()`

Added DS4_CUDA_WEIGHT_CACHE_LIMIT_GB environment variable support.
When set, the cache limit is value * 1 GiB.
Falls back to UINT64_MAX (unlimited) when unset, preserving the original behavior.
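
The lookup described above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the helper name `cache_limit_from_string` and the self-check function are hypothetical, and the handling of empty, non-numeric, zero, or overflowing values (all treated as "unlimited") is an assumption that the PR text does not confirm.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define GIB (1024ull * 1024ull * 1024ull)

/* Hypothetical helper: convert the raw value of
 * DS4_CUDA_WEIGHT_CACHE_LIMIT_GB into a byte limit.
 * Assumption: NULL, empty, non-numeric, zero, or overflowing input
 * falls back to UINT64_MAX (unlimited). */
static uint64_t cache_limit_from_string(const char *s) {
    if (s == NULL || *s < '0' || *s > '9') return UINT64_MAX;
    errno = 0;
    char *end = NULL;
    unsigned long long gb = strtoull(s, &end, 10);
    if (errno != 0 || *end != '\0' || gb == 0) return UINT64_MAX;
    /* Guard the multiplication below against 64-bit overflow. */
    if (gb > UINT64_MAX / GIB) return UINT64_MAX;
    return (uint64_t)gb * GIB;
}

/* Same name as in the PR description; the body is only a sketch. */
uint64_t cuda_model_cache_limit_bytes(void) {
    return cache_limit_from_string(getenv("DS4_CUDA_WEIGHT_CACHE_LIMIT_GB"));
}

/* Small self-check of the parsing rules assumed above. */
void cache_limit_self_check(void) {
    assert(cache_limit_from_string(NULL) == UINT64_MAX);   /* unset */
    assert(cache_limit_from_string("40") == 40 * GIB);     /* 40 GiB */
    assert(cache_limit_from_string("12abc") == UINT64_MAX);
    assert(cache_limit_from_string("0") == UINT64_MAX);
}
```

Splitting the string parsing from the `getenv` call keeps the parsing rules testable without modifying the process environment.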

`cuda_model_arena_chunk_bytes(uint64_t need)`

Added DS4_CUDA_WEIGHT_ARENA_CHUNK_MB environment variable support.
Default remains 1792 MB when unset.
Clamped to [256 MB, 8192 MB] range to prevent pathological allocations.
New alignment logic: when the requested need exceeds half the chunk size, the function now aligns to 64 MB boundaries instead of the previous 256 MB alignment. This reduces wasted padding for large allocation requests.
The existing 256 MB alignment for need > bytes is preserved as a fallback.
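
To make the padding argument concrete, the sketch below compares rounding a request up to 64 MiB versus 256 MiB boundaries. It only illustrates the arithmetic; the helper names `align_up` and `align_padding` are hypothetical, "MB" is assumed to mean MiB (2^20 bytes), and the exact branch order inside `cuda_model_arena_chunk_bytes()` is not reproduced because the description does not spell it out.

```c
#include <assert.h>
#include <stdint.h>

#define MIB (1024ull * 1024ull)

/* Round x up to the next multiple of align (align must be non-zero). */
static uint64_t align_up(uint64_t x, uint64_t align) {
    assert(align != 0);
    uint64_t rem = x % align;
    return rem == 0 ? x : x + (align - rem);
}

/* Bytes of padding added when `need` is rounded up to `align`. */
uint64_t align_padding(uint64_t need, uint64_t align) {
    return align_up(need, align) - need;
}

/* Worked example: a 1100 MiB request, which exceeds half of the
 * default 1792 MiB chunk and so would take the 64 MiB path. */
void align_padding_self_check(void) {
    uint64_t need = 1100 * MIB;
    assert(align_up(need, 64 * MIB) == 1152 * MIB);
    assert(align_up(need, 256 * MIB) == 1280 * MIB);
    assert(align_padding(need, 64 * MIB) == 52 * MIB);
    assert(align_padding(need, 256 * MIB) == 180 * MIB);
}
```

For this request the 64 MiB boundary wastes 52 MiB, while the 256 MiB boundary wastes 180 MiB.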

Motivation

On systems with limited unified memory like the Strix Halo, a hardcoded 1792 MB chunk size and unlimited cache can quickly exhaust available memory. These environment variables allow users to tune allocation behavior without recompilation.

Testing

DS4_CUDA_WEIGHT_CACHE_LIMIT_GB=40 limits total weight cache to 40 GiB (40 * 1 GiB).
DS4_CUDA_WEIGHT_ARENA_CHUNK_MB=512 sets arena chunk size to 512 MB.
Values outside [256, 8192] are silently clamped.
Unset variables retain the original defaults (UINT64_MAX and 1792 MB).
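
The clamping cases listed above could be checked with a small parser like the one below. This is a sketch under stated assumptions rather than the PR's code: `arena_chunk_bytes_from_string` is a hypothetical name, "MB" is treated as MiB, and non-numeric input is assumed to fall back to the 1792 MB default.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define MIB (1024ull * 1024ull)
#define CHUNK_DEFAULT_MB 1792ull
#define CHUNK_MIN_MB 256ull
#define CHUNK_MAX_MB 8192ull

/* Hypothetical helper: convert the raw value of
 * DS4_CUDA_WEIGHT_ARENA_CHUNK_MB into a chunk size in bytes,
 * clamped to [256, 8192] MiB. */
uint64_t arena_chunk_bytes_from_string(const char *s) {
    unsigned long long mb = CHUNK_DEFAULT_MB;
    if (s != NULL && *s >= '0' && *s <= '9') {
        char *end = NULL;
        /* strtoull saturates at ULLONG_MAX on overflow, which the
         * clamp below then reduces to the maximum. */
        unsigned long long v = strtoull(s, &end, 10);
        if (*end == '\0') mb = v; /* assumption: trailing junk keeps the default */
    }
    if (mb < CHUNK_MIN_MB) mb = CHUNK_MIN_MB;
    if (mb > CHUNK_MAX_MB) mb = CHUNK_MAX_MB;
    return (uint64_t)mb * MIB;
}

/* Mirrors the test cases listed in the PR description. */
void arena_chunk_self_check(void) {
    assert(arena_chunk_bytes_from_string(NULL) == 1792 * MIB);     /* unset */
    assert(arena_chunk_bytes_from_string("512") == 512 * MIB);     /* in range */
    assert(arena_chunk_bytes_from_string("64") == 256 * MIB);      /* clamped up */
    assert(arena_chunk_bytes_from_string("100000") == 8192 * MIB); /* clamped down */
}
```

In the runtime itself the argument would come from `getenv("DS4_CUDA_WEIGHT_ARENA_CHUNK_MB")`.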


 span chunk aligment fix

3ae966a

@gundemirbas gundemirbas mentioned this pull request

Jun 11, 2026

AMD strix-halo coordinator/worker OOM #387

Open


yiakwy-xpu-ml-framework-team commented Jun 13, 2026

We have resolved the cache problem in #402.
