Set explicit GPU defaults in ResourcesSpec and improve default GPU vendor selection #3573

Open
peterschmidt85 wants to merge 2 commits into master from resources-gpu-default


peterschmidt85 commented Feb 13, 2026

Motivation

Previously, ResourcesSpec.gpu defaulted to None, making GPU the only resource without an explicit default (CPU defaults to 2.., memory to 8GB.., disk to 100GB..). This caused the GPU field to be hidden in the CLI plan output. Additionally, the NVIDIA vendor default was silently applied deep in the validation code regardless of whether the user set a custom image.

Changes

  • Set ResourcesSpec.gpu default to GPUSpec(count=0..) instead of None, aligning GPU with other resource defaults
  • Default GPU vendor to nvidia only when using the default CUDA image (no custom image or docker). With a custom image, any vendor is allowed
  • Server-side vendor inference via set_gpu_vendor_default() with full backward compatibility across old/new CLI and server combinations
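The vendor rule in the second bullet can be sketched roughly as follows. This is an illustration only, not the PR's `set_gpu_vendor_default()`: the `GPURequest` class, the `apply_vendor_default` function, and its parameters are hypothetical names, and the sketch ignores vendor inference from GPU names such as `A100`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GPURequest:
    """Hypothetical stand-in for a GPU spec; not dstack's GPUSpec."""

    vendor: Optional[str] = None  # None means "any vendor"
    count_min: int = 0


def apply_vendor_default(
    gpu: GPURequest, image: Optional[str], docker: bool
) -> GPURequest:
    """Set vendor to "nvidia" only when the default CUDA image would be used."""
    if gpu.vendor is not None:
        # An explicit vendor is always kept as-is.
        return gpu
    uses_default_image = image is None and not docker
    if uses_default_image:
        return GPURequest(vendor="nvidia", count_min=gpu.count_min)
    # Custom image or docker: leave the vendor open.
    return gpu
```

With no image and no docker, `apply_vendor_default(GPURequest(), None, False)` yields a request with `vendor="nvidia"`; passing a custom image or `docker=True` leaves `vendor` as `None`.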

Before / After

# Before (no GPU, no image)
Resources cpu=2.. mem=8GB.. disk=100GB..
# After
Resources cpu=2.. mem=8GB.. disk=100GB.. gpu=nvidia:0..
# Before (gpu: 1, no image)
Resources cpu=2.. mem=8GB.. disk=100GB.. gpu:1
# After
Resources cpu=2.. mem=8GB.. disk=100GB.. gpu=nvidia:1
# Before (gpu: 1, custom image)
Resources cpu=2.. mem=8GB.. disk=100GB.. gpu:1
# After - any vendor allowed since user provides their own image
Resources cpu=2.. mem=8GB.. disk=100GB.. gpu=1
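For reference, the "After" strings above follow a `vendor:name:count` pattern with an open-ended range written as `N..`. A minimal sketch of how such a string could be composed is shown below; `format_range` and `format_gpu` are hypothetical helpers written for this illustration, not the CLI's actual pretty-printing code.

```python
from typing import Optional


def format_range(min_count: int, max_count: Optional[int]) -> str:
    """Render a count range: "0.." (open-ended), "1" (exact), or "1..4"."""
    if max_count is None:
        return f"{min_count}.."
    if min_count == max_count:
        return str(min_count)
    return f"{min_count}..{max_count}"


def format_gpu(
    vendor: Optional[str],
    name: Optional[str],
    min_count: int,
    max_count: Optional[int],
) -> str:
    """Join the parts that are set with ":" and prefix the result with "gpu="."""
    parts = [p for p in (vendor, name) if p]
    parts.append(format_range(min_count, max_count))
    return "gpu=" + ":".join(parts)


print(format_gpu("nvidia", None, 0, None))  # gpu=nvidia:0..
print(format_gpu(None, None, 1, 1))         # gpu=1
```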

Backward compatibility

Tested across all CLI/server combinations (new CLI 0.21-dev, old CLI 0.20.9, new server, old server on sky.dstack.ai). The GCP backend was used for the tests (it has both NVIDIA and TPU offers).

dstack apply

| Row | Config | CLI | Server | Display | TPU included? |
|-----|--------|-----|--------|---------|---------------|
| 1 | No GPU, no image | New | New | gpu=nvidia:0.. | No |
| 2 | No GPU, custom image | New | New | gpu=0.. | Yes |
| 3 | No GPU, docker=true | New | New | gpu=0.. | Yes |
| 4 | gpu: 1, no image | New | New | gpu=nvidia:1 | No |
| 5 | gpu: 1, custom image | New | New | gpu=1 | Yes |
| 6 | gpu: 1, docker=true | New | New | gpu=1 | Yes |
| 7 | gpu: A100 | New | New | gpu=A100:1.. | No |
| 8 | gpu: MI300X, no image | New | New | Error: image required | N/A |
| 9 | gpu: MI300X, image | New | New | gpu=MI300X:1.. | No |
| 10 | gpu: 1..4, no image | New | New | gpu=nvidia:1..4 | No |
| 11 | gpu: nvidia:1 | New | New | gpu=nvidia:1 | No |
| 12 | gpu: amd:1, image | New | New | gpu=amd:1 | No |
| 13 | No GPU, no image | Old 0.20.9 | New | (no gpu shown) | Yes |
| 14 | gpu: 1, no image | Old 0.20.9 | New | gpu:1 | No |
| 15 | gpu: 1, custom image | Old 0.20.9 | New | gpu:1 | No |
| 16 | gpu: A100 | Old 0.20.9 | New | A100:1.. | No |
| 17 | No GPU, no image | New | Old (sky) | gpu=0.. | Yes |
| 18 | gpu: 1, no image | New | Old (sky) | gpu=1 | No |
| 19 | gpu: 1, custom image | New | Old (sky) | gpu=1 | No |

dstack offer

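| Row | Config | CLI | Server | Display | TPU included? |
|-----|--------|-----|--------|---------|---------------|
| 20 | --gpu 1 | New | New | gpu=1 | Yes |
| 21 | --gpu nvidia:1 | New | New | gpu=nvidia:1 | No |
| 22 | --gpu tpu:1 | New | New | gpu=google:1 | Yes (TPU only) |
| 23 | --gpu 1 | New | Old (sky) | gpu=1 | Yes |
| 24 | default | Old 0.20.9 | New | (no gpu shown) | Yes |
| 25 | --gpu 1 | Old 0.20.9 | New | gpu:1 | No |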
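
One way to read the "TPU included?" column for the new CLI + new server rows: an unset vendor matches every offer, while an explicit vendor narrows the list. The sketch below models only that reading. The offer records, `VENDOR_ALIASES`, and `filter_offers` are hypothetical, and the `tpu` to `google` alias is an assumption drawn from row 22's display value; this is not dstack's offer-matching code.

```python
from typing import List, Optional, Tuple

# Hypothetical offer records: (instance label, accelerator vendor).
Offer = Tuple[str, str]

OFFERS: List[Offer] = [
    ("gpu-instance", "nvidia"),
    ("tpu-instance", "google"),
]

# Assumption based on row 22, where `--gpu tpu:1` is displayed as gpu=google:1.
VENDOR_ALIASES = {"tpu": "google"}


def filter_offers(offers: List[Offer], vendor: Optional[str]) -> List[Offer]:
    """Return all offers when vendor is None, else only the matching vendor."""
    if vendor is None:
        return list(offers)
    vendor = VENDOR_ALIASES.get(vendor, vendor)
    return [offer for offer in offers if offer[1] == vendor]
```

Under this model, `filter_offers(OFFERS, None)` keeps the TPU offer (rows 20 and 23), `filter_offers(OFFERS, "nvidia")` drops it (row 21), and `filter_offers(OFFERS, "tpu")` returns only the TPU offer (row 22).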

Key observations

  • No regressions in any old CLI + old server combination
  • Row 15: old CLI always sets vendor=nvidia regardless of image (pre-existing behavior)
  • Row 19: old server always infers nvidia regardless of image (pre-existing behavior)

- Default to NVIDIA only if user has no image
- Keep backward compatibility with old/new server/CLI
- Make `dstack offer` consistent with `dstack apply`
Co-authored-by: Cursor <cursoragent@cursor.com>
peterschmidt85 changed the title from "Set explicit GPU default (0..) in ResourcesSpec and minor improvements in resource pretty-printing" to "Set explicit GPU defaults in ResourcesSpec and improve default GPU vendor selection" Feb 13, 2026

TODOs

Docs updates:

  • Update gpu property description in concepts pages (dev-environments.md, tasks.md, services.md) and protips.md to reflect:
    • No gpu specified → defaults to 0.. (any vendor)
    • No count specified → defaults to 1..
    • Vendor defaults to nvidia only when no custom image is set; with custom image, any vendor is allowed
  • Update reference schema docs accordingly

Future scope (outside this PR):

  • Consider making TPU an explicit opt-in for GCP backend, since TPU requires specific setup and silently offering TPU instances may lead to unexpected behavior


Reviewers

r4victor (awaiting requested review)
