
ROCm runtime: configurable weight cache limit and arena chunk size via environment variables #397

Open: gundemirbas wants to merge 1 commit into antirez:main from gundemirbas:pr/rocm-runtime-changes


gundemirbas commented on Jun 11, 2026 (edited)

Summary

Make cuda_model_cache_limit_bytes() and cuda_model_arena_chunk_bytes() configurable through environment variables, so that their previously hardcoded defaults can be overridden at runtime. This gives more flexibility on systems where GPU memory is constrained or shared (e.g., Strix Halo with 128 GB of unified memory).

Changes

cuda_model_cache_limit_bytes()

  • Added DS4_CUDA_WEIGHT_CACHE_LIMIT_GB environment variable support.
  • When set, the cache limit is value × 1 GiB (2^30 bytes).
  • Falls back to UINT64_MAX (unlimited) when unset, preserving the original behavior.
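A minimal sketch of how the cache-limit lookup could work, assuming the variable holds a whole number of GiB. The helper cache_limit_from() and the GIB macro are hypothetical names introduced here, and treating empty, non-numeric, zero, or overflowing values as "unset" is an assumption rather than the PR's confirmed behavior.

```c
#include <stdint.h>
#include <stdlib.h>

#define GIB (1ULL << 30) /* 1 GiB in bytes */

/* Hypothetical helper: converts a GiB string into a byte limit.
 * Returns UINT64_MAX (unlimited) for NULL, empty, non-numeric, zero,
 * or overflowing input (assumption: invalid values behave as unset). */
static uint64_t cache_limit_from(const char *gb_str) {
    if (gb_str == NULL || *gb_str < '0' || *gb_str > '9')
        return UINT64_MAX;              /* unset, empty, or signed/garbage */
    char *end = NULL;
    unsigned long long gb = strtoull(gb_str, &end, 10);
    if (*end != '\0' || gb == 0)
        return UINT64_MAX;              /* trailing junk or zero: unlimited */
    if (gb > UINT64_MAX / GIB)
        return UINT64_MAX;              /* value * 1 GiB would overflow */
    return (uint64_t)gb * GIB;
}

/* Sketch of the env-driven entry point named in the PR. */
uint64_t cuda_model_cache_limit_bytes(void) {
    return cache_limit_from(getenv("DS4_CUDA_WEIGHT_CACHE_LIMIT_GB"));
}
```

With DS4_CUDA_WEIGHT_CACHE_LIMIT_GB=40 this sketch yields 40 × 2^30 = 42,949,672,960 bytes; with the variable unset it returns UINT64_MAX, matching the original unlimited behavior. Splitting the parsing into a pure helper keeps it testable without touching the process environment.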

cuda_model_arena_chunk_bytes(uint64_t need)

  • Added DS4_CUDA_WEIGHT_ARENA_CHUNK_MB environment variable support.
  • Default remains 1792 MB when unset.
  • Clamped to [256 MB, 8192 MB] range to prevent pathological allocations.
  • New alignment logic: when the requested need exceeds half the chunk size, the function now rounds up to a 64 MB boundary instead of the previous 256 MB boundary. This reduces the padding wasted on large allocation requests.
  • The existing 256 MB alignment for need > bytes (i.e., a request larger than the chunk size) is preserved as a fallback.
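The chunk-size rules above can be sketched as follows. This is an illustration under stated assumptions, not the PR's code: "MB" is taken as MiB (2^20 bytes), the need > bytes check is assumed to run before the need > bytes/2 check (otherwise the 256 MB path could never be reached), requests of at most half a chunk are assumed to get one full chunk, and an unparsable value is assumed to fall back to the default. The helpers arena_chunk_bytes_from() and align_up() and the MIB macro are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define MIB (1ULL << 20) /* assumption: "MB" in the PR means MiB */

/* Round x up to a multiple of align; align must be a power of two. */
static uint64_t align_up(uint64_t x, uint64_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return (x + align - 1) & ~(align - 1);
}

/* Hypothetical helper: chunk size for `need` bytes, given the raw value
 * of DS4_CUDA_WEIGHT_ARENA_CHUNK_MB (NULL when unset). */
static uint64_t arena_chunk_bytes_from(const char *mb_str, uint64_t need) {
    uint64_t mb = 1792;                     /* default when unset */
    if (mb_str != NULL && *mb_str >= '0' && *mb_str <= '9') {
        char *end = NULL;
        unsigned long long v = strtoull(mb_str, &end, 10);
        if (*end == '\0')
            mb = v;                         /* overflow gives ULLONG_MAX, clamped below */
    }
    if (mb < 256) mb = 256;                 /* silently clamp to [256, 8192] */
    if (mb > 8192) mb = 8192;
    uint64_t bytes = mb * MIB;

    if (need > bytes)                       /* larger than a chunk: 256 MB round-up */
        return align_up(need, 256 * MIB);
    if (need > bytes / 2)                   /* over half a chunk: tighter 64 MB round-up */
        return align_up(need, 64 * MIB);
    return bytes;                           /* small request: one full chunk */
}

/* Sketch of the env-driven entry point named in the PR. */
uint64_t cuda_model_arena_chunk_bytes(uint64_t need) {
    return arena_chunk_bytes_from(getenv("DS4_CUDA_WEIGHT_ARENA_CHUNK_MB"), need);
}
```

Under these assumptions, with the default 1792 MB chunk a 1000 MB request rounds up to 1024 MB rather than 1024 + padding to a 256 MB multiple, while a 2000 MB request still rounds up to 2048 MB via the 256 MB path.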

Motivation

On systems with limited unified memory like the Strix Halo, a hardcoded 1792 MB chunk size and unlimited cache can quickly exhaust available memory. These environment variables allow users to tune allocation behavior without recompilation.

Testing

  • DS4_CUDA_WEIGHT_CACHE_LIMIT_GB=40 limits the total weight cache to 40 GiB.
  • DS4_CUDA_WEIGHT_ARENA_CHUNK_MB=512 sets arena chunk size to 512 MB.
  • DS4_CUDA_WEIGHT_ARENA_CHUNK_MB values outside [256, 8192] are silently clamped.
  • Unset variables retain the original defaults (UINT64_MAX and 1792 MB).


We have resolved the cache problem in #402.


Reviewers: No reviews
Assignees: No one assigned
Labels: None yet
Projects: None yet
Milestone: No milestone

