ROCm runtime: configurable weight cache limit and arena chunk size via environment variables #397
Open
gundemirbas wants to merge 1 commit into
yiakwy-xpu-ml-framework-team
commented
Jun 13, 2026
We have resolved the cache problem in #402.
Summary
Make `cuda_model_cache_limit_bytes()` and `cuda_model_arena_chunk_bytes()` configurable through environment variables, replacing the previously hardcoded values. This improves flexibility for systems with constrained VRAM (e.g., Strix Halo with 128 GB unified memory).
Changes
`cuda_model_cache_limit_bytes()`
- Adds `DS4_CUDA_WEIGHT_CACHE_LIMIT_GB` environment variable support.
- When the variable is set, the limit is `value * 1 GiB`.
- Falls back to `UINT64_MAX` (unlimited) when unset, preserving the original behavior.
`cuda_model_arena_chunk_bytes(uint64_t need)`
- Adds `DS4_CUDA_WEIGHT_ARENA_CHUNK_MB` environment variable support.
- When `need` exceeds half the chunk size, the function now aligns to 64 MB boundaries instead of the previous 256 MB alignment. This reduces wasted padding for large allocation requests.
- The `need > bytes` check is preserved as a fallback.
Motivation
On systems with limited unified memory like the Strix Halo, a hardcoded 1792 MB chunk size and unlimited cache can quickly exhaust available memory. These environment variables allow users to tune allocation behavior without recompilation.
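The cache limit lookup described under Changes might look roughly like the sketch below. This is not the code from the PR: the helper `env_u64` is hypothetical, and treating malformed, zero, or overflowing values as "unset" is an assumption, since the description only specifies the set (`value * 1 GiB`) and unset (`UINT64_MAX`) cases.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define DS4_GIB (1024ull * 1024ull * 1024ull)

/* Hypothetical helper (not from the PR): parse a non-negative decimal
 * integer from an environment variable. Returns 0 when the variable is
 * unset, empty, or not a plain decimal number. */
static uint64_t env_u64(const char *name) {
    const char *s = getenv(name);
    if (s == NULL || *s < '0' || *s > '9') return 0;
    errno = 0;
    char *end = NULL;
    unsigned long long v = strtoull(s, &end, 10);
    if (errno != 0 || *end != '\0') return 0;
    return (uint64_t)v;
}

/* Sketch of the described behavior: value * 1 GiB when the variable is
 * set, UINT64_MAX (unlimited) otherwise. */
static uint64_t cuda_model_cache_limit_bytes(void) {
    uint64_t gb = env_u64("DS4_CUDA_WEIGHT_CACHE_LIMIT_GB");
    if (gb == 0) return UINT64_MAX;              /* unset: unlimited */
    if (gb > UINT64_MAX / DS4_GIB) return UINT64_MAX; /* assumed overflow guard */
    return gb * DS4_GIB;
}
```

Reading the variable on every call keeps the sketch short; a real runtime would more likely cache the parsed value once at startup.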
Testing
- `DS4_CUDA_WEIGHT_CACHE_LIMIT_GB=40` limits total weight cache to 40 GB.
- `DS4_CUDA_WEIGHT_ARENA_CHUNK_MB=512` sets arena chunk size to 512 MB.
- With neither variable set, the defaults apply (`UINT64_MAX` and 1792 MB).
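For the arena chunk size, one possible reading of the described logic is sketched below. Several points are assumptions rather than facts from the PR: that "MB" means MiB (by analogy with `value * 1 GiB`), that the 64 MB alignment rounds `need` itself up, and that invalid values fall back to the 1792 MB default. The helpers `env_u64` and `align_up` are hypothetical.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define DS4_MIB (1024ull * 1024ull)

/* Hypothetical helper (not from the PR): parse a non-negative decimal
 * integer from an environment variable; 0 means unset or invalid. */
static uint64_t env_u64(const char *name) {
    const char *s = getenv(name);
    if (s == NULL || *s < '0' || *s > '9') return 0;
    errno = 0;
    char *end = NULL;
    unsigned long long v = strtoull(s, &end, 10);
    if (errno != 0 || *end != '\0') return 0;
    return (uint64_t)v;
}

/* Round n up to a multiple of a; returns n unchanged if that would overflow. */
static uint64_t align_up(uint64_t n, uint64_t a) {
    uint64_t rem = n % a;
    if (rem == 0) return n;
    uint64_t pad = a - rem;
    return (n > UINT64_MAX - pad) ? n : n + pad;
}

/* Sketch: the chunk size comes from DS4_CUDA_WEIGHT_ARENA_CHUNK_MB, or
 * 1792 MiB by default. Requests larger than half a chunk are assumed to be
 * sized to the request itself, rounded up to 64 MiB. */
static uint64_t cuda_model_arena_chunk_bytes(uint64_t need) {
    uint64_t mb = env_u64("DS4_CUDA_WEIGHT_ARENA_CHUNK_MB");
    if (mb == 0 || mb > UINT64_MAX / DS4_MIB) mb = 1792; /* default */
    uint64_t bytes = mb * DS4_MIB;
    if (need > bytes / 2) {
        bytes = align_up(need, 64 * DS4_MIB);
    }
    if (need > bytes) bytes = need; /* fallback kept from the original */
    return bytes;
}
```

Under these assumptions, with `DS4_CUDA_WEIGHT_ARENA_CHUNK_MB=512` a small request gets a 512 MiB chunk, while a 300 MiB request (more than half the chunk) gets 320 MiB, the next 64 MiB boundary.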