@anders-elastisys anders-elastisys commented Nov 10, 2025

Warning

This is a public repository; take care not to disclose:

  • personal data beyond what is necessary for interacting with this pull request, nor
  • business confidential information, such as customer names.

What kind of PR is this?

Required: Mark one of the following that is applicable:

  • kind/feature
  • kind/improvement
  • kind/deprecation
  • kind/documentation
  • kind/clean-up
  • kind/bug
  • kind/other

Optional: Mark one or more of the following that are applicable:

Important

Breaking changes should be marked kind/admin-change or kind/dev-change depending on type
Critical security fixes should be marked with kind/security

  • kind/admin-change
  • kind/dev-change
  • kind/security
  • [kind/adr](set-me)

Platform Administrator notice

The CPUThrottlingHigh alert is now disabled by default and is instead available as an opt-in alert.

What does this PR do / why do we need this PR?

This PR disables the CPUThrottlingHigh alert by default. The alert is already ignored by the OpsGenie integration, but it is still delivered when integrating with other receivers such as Slack or Teams.

Information to reviewers

I plan to discuss the disabled option further with GoTos, as we might want the option to disable other alerts as well. The config is largely copied from the upstream Prometheus chart, in case we ever decide to move to the upstream chart for some alerts instead of our own.
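
For illustration, a minimal sketch of how an administrator might opt back in, assuming the new `disabled` key behaves like a map of alert names to booleans, as the `disabled` map in the upstream chart does. The enclosing key is not visible in this PR excerpt and is left out here, and the exact semantics of `false` are an assumption, not something this PR confirms.

```yaml
# Sketch only: the parent key of `disabled` is not shown in this excerpt.
# Assumed default introduced by this PR: the alert is switched off.
disabled:
  CPUThrottlingHigh: true

---
# Assumed opt-in override: setting the entry to false re-enables the alert.
disabled:
  CPUThrottlingHigh: false
```

If the semantics differ from upstream, the override may look different; the schema added in this PR is the authoritative reference.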

Checklist

  • Proper commit message prefix on all commits
  • Change checks:
    • The change is transparent
    • The change is disruptive
    • The change requires no migration steps
    • The change requires migration steps
    • The change updates CRDs
    • The change updates the config and the schema
  • Documentation checks:
  • Metrics checks:
    • The metrics are still exposed and present in Grafana after the change
    • The metrics names didn't change (Grafana dashboards and Prometheus alerts required no updates)
    • The metrics names did change (Grafana dashboards and Prometheus alerts required an update)
  • Logs checks:
    • The logs do not show any errors after the change
  • PodSecurityPolicy checks:
    • Any changed Pod is covered by Kubernetes Pod Security Standards
    • Any changed Pod is covered by Gatekeeper Pod Security Policies
    • The change does not cause any Pods to be blocked by Pod Security Standards or Policies
  • NetworkPolicy checks:
    • Any changed Pod is covered by Network Policies
    • The change does not cause any dropped packets in the NetworkPolicy Dashboard
  • Audit checks:
    • The change does not cause any unnecessary Kubernetes audit events
    • The change requires changes to Kubernetes audit policy
  • Falco checks:
    • The change does not cause any alerts to be generated by Falco
  • Bug checks:
    • The bug fix is covered by regression tests

@anders-elastisys anders-elastisys force-pushed the anders-elastisys/disable-cputhrottling-alert-by-default branch from d966837 to a64a147 Compare November 10, 2025 12:32
thanos: {}
webhook: {}

disabled:
Contributor

Thoughts on `disabled: true` vs `enabled: false # true if omitted`?

We have a lot more enabled than disabled in the config so far, so perhaps more consistent to go with enabled, which also feels like less of a double negative to me.
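
To make the two alternatives concrete, here is a hedged side-by-side sketch. The key layout in both options is hypothetical and only illustrates the naming question; it does not reflect the final schema.

```yaml
# Option A (as proposed in this PR, upstream-style): list alerts to turn off.
# An alert missing from the map is assumed to stay enabled.
disabled:
  CPUThrottlingHigh: true

---
# Option B (suggested here): a per-alert `enabled` flag.
# Hypothetical layout; the flag is treated as true when omitted.
CPUThrottlingHigh:
  enabled: false
```

Option A stays closer to the upstream chart, while Option B matches the `enabled` convention used elsewhere in the config and avoids the double negative.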

Contributor Author

I am also not a fan of double negatives, but I wanted it to match the upstream. However, our helmfile config does not need to match this anyway. If people have strong opinions on this, I can change it.

Contributor

No strong opinion; I have just been confused by double negatives in the past.

3 participants