WIP: Updated NodeFeatureRules for newer chips #1973
base: main
Signed-off-by: Zvonko Kaiser <[email protected]>
Pull request overview
This PR consolidates GPU node labeling rules by replacing specific H100/H800 GPU model rules with broader family-based rules for Hopper and Blackwell architectures. The changes use PCI ID range matching via regex patterns instead of individual device IDs, and extend Confidential Computing (CC) capability support to include both Hopper and Blackwell GPU families.
Key changes:
- Consolidated 5 specific Hopper-based rules (H100, H100 PCIe, H100 80GB HBM3, H800, H800 PCIE) into a single "NVIDIA Hopper GPU" rule using regex pattern for PCI ID range 0x2300-0x23ff
- Added new "NVIDIA Blackwell GPU" rule covering PCI ID range 0x2b00-0x33ff
- Updated CC capability rules to recognize both "hopper" and "blackwell" GPU families with TDX/SEV-SNP support
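A minimal Python sketch of the family and CC matching described above, assuming device IDs are compared as 4-digit lowercase hex strings without a `0x` prefix (like `"10de"` and `"2322"` in this PR's diff). The Blackwell pattern is the one quoted in the review. The Hopper pattern `^23[0-9a-f]{2}$` is an assumed encoding of the 0x2300-0x23ff range, and `gpu_family` and `cc_capable` are hypothetical helpers, not NFD or GPU Operator APIs. The CC check also assumes that "TDX/SEV-SNP support" means the host CPU must offer either one.

```python
import re
from typing import Optional

NVIDIA_VENDOR = "10de"
# Assumed pattern for the Hopper range 0x2300-0x23ff (the PR's exact regex is not shown here).
HOPPER_RE = re.compile(r"^23[0-9a-f]{2}$")
# Blackwell pattern as quoted in the review: 0x2b00-0x33ff.
BLACKWELL_RE = re.compile(r"^(2[b-f]|3[0-3])[0-9a-f]{2}$")


def gpu_family(vendor: str, device: str) -> Optional[str]:
    """Hypothetical helper: map a PCI vendor/device pair to 'hopper', 'blackwell', or None."""
    if vendor != NVIDIA_VENDOR:
        return None
    if HOPPER_RE.match(device):
        return "hopper"
    if BLACKWELL_RE.match(device):
        return "blackwell"
    return None


def cc_capable(family: Optional[str], cpu_tdx: bool, cpu_sev_snp: bool) -> bool:
    """Hypothetical helper: CC-capable if the GPU family is hopper/blackwell and the CPU has TDX or SEV-SNP."""
    return family in ("hopper", "blackwell") and (cpu_tdx or cpu_sev_snp)


if __name__ == "__main__":
    print(gpu_family("10de", "2322"))  # ID from the old single-device rule
    print(cc_capable(gpu_family("10de", "2b00"), cpu_tdx=False, cpu_sev_snp=True))
```

Consolidating on ranges means a newly released device inside 0x2300-0x23ff or 0x2b00-0x33ff is labeled without a rule change, at the cost of also matching IDs that may not be assigned yet.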
matchExpressions:
  vendor: {op: In, value: ["10de"]}
  device: {op: In, value: ["2322"]}
# GB202-GB207, GB110-GB120 (0x2b00-0x33ff) from https://admin.pci-ids.ucw.cz/read/PC/10de
Copilot (AI) · Dec 8, 2025
The comment names specific chip models ("GB202-GB207, GB110-GB120"), but the regex pattern ^(2[b-f]|3[0-3])[0-9a-f]{2}$ matches a much broader, contiguous range (0x2b00-0x33ff), which also includes device IDs that don't correspond to the listed chips. Consider either:
- Making the comment more generic (e.g., "Blackwell family (0x2b00-0x33ff)")
- Narrowing the regex pattern to match only the specific chip ranges if that's the intent
The current discrepancy between the comment and the pattern could be confusing for future maintainers.
Suggested change, replacing
# GB202-GB207, GB110-GB120 (0x2b00-0x33ff) from https://admin.pci-ids.ucw.cz/read/PC/10de
with
# Blackwell family (0x2b00-0x33ff) from https://admin.pci-ids.ucw.cz/read/PC/10de
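To see exactly what the Blackwell pattern covers, one can enumerate every 16-bit device ID, format it the way the rule compares it (assumed here to be 4-digit lowercase hex without `0x`), and keep the matches. This is an illustrative check, not part of the PR; `matched_ids` is a hypothetical helper.

```python
import re

# Blackwell pattern as quoted in the review.
BLACKWELL_RE = re.compile(r"^(2[b-f]|3[0-3])[0-9a-f]{2}$")


def matched_ids(pattern: re.Pattern) -> list:
    """Return every 16-bit device ID (as int) whose 4-digit lowercase hex form matches pattern."""
    return [i for i in range(0x10000) if pattern.match(f"{i:04x}")]


ids = matched_ids(BLACKWELL_RE)
print(hex(ids[0]), hex(ids[-1]), len(ids))  # 0x2b00 0x33ff 2304
```

The result is one contiguous block of 2304 IDs (prefixes 2b-2f and 30-33, 256 IDs each), which is why the comment text and the pattern disagree: the pattern is a family-wide range, not a list of named chips.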
We recently tweaked the NodeFeatureRules to work with a B200 cluster we received (should be GB100, IIRC), and its device ID is 0x2901, which wouldn't be matched by the config in this PR. Any chance this can be changed?
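A quick check of the reported ID against the patterns, written as a Python sketch. The Hopper pattern is an assumed encoding of 0x2300-0x23ff, and `BLACKWELL_WIDE_RE` is a hypothetical widening that adds only 0x2900-0x29ff; the actual GB100 ID range would need to be confirmed against the PCI ID database before changing the rule.

```python
import re

HOPPER_RE = re.compile(r"^23[0-9a-f]{2}$")                   # assumed 0x2300-0x23ff pattern
BLACKWELL_RE = re.compile(r"^(2[b-f]|3[0-3])[0-9a-f]{2}$")   # pattern from this PR
# Hypothetical widening: also accept 0x2900-0x29ff (exact GB100 range unverified).
BLACKWELL_WIDE_RE = re.compile(r"^(29|2[b-f]|3[0-3])[0-9a-f]{2}$")


def matches(pattern: re.Pattern, device_id: int) -> bool:
    # Compare as 4-digit lowercase hex without "0x", like "2322" in the rule excerpt.
    return pattern.match(f"{device_id:04x}") is not None


b200 = 0x2901
print(matches(HOPPER_RE, b200), matches(BLACKWELL_RE, b200), matches(BLACKWELL_WIDE_RE, b200))
# False False True
```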
All Hopper and Hopper+ architectures support CC.