Flavor-Aware Fair Sharing for Heterogeneous Resources (Weighted flavours) #8902

@monabil08

What would you like to be added:

Flavor-aware Fair Sharing calculation that accounts for the different values/costs of resource flavors when computing Dominant Resource Share (DRS).

Currently, Fair Sharing aggregates resource usage across all flavors (e.g., all nvidia.com/gpu regardless of whether they're T4 or A100). This treats heterogeneous resources as equivalent, leading to unfair preemption decisions.

One solution: add optional cost weights to the ResourceFlavor spec to represent relative resource value:

apiVersion: kueue.x-k8s.io/v1beta2
kind: ResourceFlavor
metadata:
  name: a100-gpu
spec:
  nodeLabels:
    accelerator: nvidia-tesla-a100
  cost:  # New field (optional)
    nvidia.com/gpu: 8.0  # 8x more valuable than baseline

When calculating DRS, borrowing would be weighted by cost:
weighted_borrowing = (borrowed_t4 × 1.0) + (borrowed_a100 × 8.0)
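A minimal Python sketch of the weighted sum above, for illustration only. The function name, the flavor names, and the choice to default missing weights to 1.0 are assumptions made here, not part of Kueue or of the proposed API.

```python
from typing import Mapping


def weighted_borrowing(
    borrowed: Mapping[str, float],
    cost: Mapping[str, float],
    default_cost: float = 1.0,
) -> float:
    """Sum the borrowed quantity of each flavor, scaled by that flavor's cost weight.

    Flavors without an entry in `cost` fall back to `default_cost`
    (an assumption: the proposal describes the cost field as optional).
    """
    return sum(qty * cost.get(flavor, default_cost) for flavor, qty in borrowed.items())


# Hypothetical weights mirroring the example: T4 as baseline, A100 at 8x.
COST = {"t4-gpu": 1.0, "a100-gpu": 8.0}

# 4 borrowed T4s and 2 borrowed A100s: 4 * 1.0 + 2 * 8.0
print(weighted_borrowing({"t4-gpu": 4, "a100-gpu": 2}, COST))  # 20.0
```

Defaulting to 1.0 keeps the result identical to today's unweighted sum when no flavor declares a cost.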

Why is this needed:
Organizations with heterogeneous GPU clusters (H100, A100, T4) or different CPU generations (performance tiers) face unfair resource allocation:

Example:
Team A: borrows 20 T4 GPUs (cheap, low-power)
Team B: borrows 20 A100 GPUs (expensive, high-power)
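Current Fair Sharing treats both teams as equal, even though Team B is consuming far more in cost terms, so Team B should be more preemptable. With the example weights above (T4 = 1.0, A100 = 8.0), Team A's weighted borrowing is 20 and Team B's is 160.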
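To make the Team A / Team B comparison concrete, here is a rough Python sketch of a DRS-like share: borrowing divided by lendable capacity, scaled to per-mille. The cohort capacity (40 T4 and 40 A100 lendable) is invented for illustration, and applying the cost weights to the lendable side as well is an assumption; the proposal only specifies weighting the borrowed side.

```python
def share_per_mille(borrowed, lendable, cost=None):
    """Ratio of borrowed to lendable resources, scaled to 0..1000.

    cost=None gives the unweighted (flavor-blind) ratio; otherwise each
    flavor's quantity is multiplied by its weight (default 1.0) on both sides.
    """
    cost = cost or {}

    def total(amounts):
        return sum(qty * cost.get(flavor, 1.0) for flavor, qty in amounts.items())

    capacity = total(lendable)
    return 0 if capacity == 0 else round(1000 * total(borrowed) / capacity)


# Hypothetical cohort capacity and weights (not taken from a real cluster).
LENDABLE = {"t4": 40, "a100": 40}
COST = {"t4": 1.0, "a100": 8.0}
TEAM_A = {"t4": 20}    # borrows 20 T4 GPUs
TEAM_B = {"a100": 20}  # borrows 20 A100 GPUs

# Flavor-blind: both teams look identical.
print(share_per_mille(TEAM_A, LENDABLE), share_per_mille(TEAM_B, LENDABLE))  # 250 250

# Cost-weighted: Team B's share is far larger, so it would be preempted first.
print(share_per_mille(TEAM_A, LENDABLE, COST), share_per_mille(TEAM_B, LENDABLE, COST))  # 56 444
```

Under these assumptions the two teams tie at 250 without weights, while with weights Team B's share is roughly eight times Team A's.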

Completion requirements:

This enhancement requires the following artifacts:

  • Design doc
  • API change
  • Docs update

The artifacts should be linked in subsequent comments.

Labels: kind/feature