Description
What would you like to be added:
Flavor-aware Fair Sharing calculation that accounts for the different values/costs of resource flavors when computing Dominant Resource Share (DRS).
Currently, Fair Sharing aggregates resource usage across all flavors (e.g., all nvidia.com/gpu regardless of whether they're T4 or A100). This treats heterogeneous resources as equivalent, leading to unfair preemption decisions.
One solution: Add optional cost weights to ResourceFlavor spec to represent relative resource value:
apiVersion: kueue.x-k8s.io/v1beta2
kind: ResourceFlavor
metadata:
  name: a100-gpu
spec:
  nodeLabels:
    accelerator: nvidia-tesla-a100
  cost: # New field (optional)
    nvidia.com/gpu: 8.0 # 8x more valuable than baseline

When calculating DRS, borrowing would be weighted by cost:
weighted_borrowing = (borrowed_t4 × 1.0) + (borrowed_a100 × 8.0)
Why is this needed:
Organizations with heterogeneous GPU clusters (H100, A100, T4) or different CPU generations (performance tiers) face unfair resource allocation:
Example:
Team A: borrows 20 T4 GPUs (cheap, low-power)
Team B: borrows 20 A100 GPUs (expensive, high-power)
Current Fair Sharing treats both teams as equal, even though Team B is consuming far more resources by cost and should therefore be more preemptible.
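To make the gap concrete, the example can be worked through with the illustrative weights from the proposal (T4 at the 1.0 baseline, A100 at 8.0; these are assumed values, not measured costs):

```latex
\begin{aligned}
\text{Team A: } & 20 \times 1.0 = 20 \\
\text{Team B: } & 20 \times 8.0 = 160
\end{aligned}
```

Under these assumptions, Team B's weighted borrowing is 8 times Team A's, whereas the unweighted calculation counts both as 20 borrowed GPUs.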
Completion requirements:
This enhancement requires the following artifacts:
- Design doc
- API change
- Docs update
The artifacts should be linked in subsequent comments.