@kristiangronas
Contributor

Warning

This is a public repository, ensure not to disclose:

  • personal data beyond what is necessary for interacting with this pull request, nor
  • business confidential information, such as customer names.

What kind of PR is this?

Required: Mark one of the following that is applicable:

  • kind/feature
  • kind/improvement
  • kind/deprecation
  • kind/documentation
  • kind/clean-up
  • kind/bug
  • kind/other

Optional: Mark one or more of the following that are applicable:

Important

Breaking changes should be marked kind/admin-change or kind/dev-change depending on type
Critical security fixes should be marked with kind/security

  • kind/admin-change
  • kind/dev-change
  • kind/security
  • [kind/adr](set-me)

What does this PR do / why do we need this PR?

The Kubernetes / API server dashboard still uses this data. The rules were originally removed because they were believed to be unused.

The rule was copied from https://github.com/prometheus-operator/kube-prometheus/blob/main/manifests/kubernetesControlPlane-prometheusRule.yaml#L877

Information to reviewers

Checklist

  • Proper commit message prefix on all commits
  • Change checks:
    • The change is transparent
    • The change is disruptive
    • The change requires no migration steps
    • The change requires migration steps
    • The change updates CRDs
    • The change updates the config and the schema
  • Documentation checks:
  • Metrics checks:
    • The metrics are still exposed and present in Grafana after the change
    • The metrics names didn't change (Grafana dashboards and Prometheus alerts required no updates)
    • The metrics names did change (Grafana dashboards and Prometheus alerts required an update)
  • Logs checks:
    • The logs do not show any errors after the change
  • PodSecurityPolicy checks:
    • Any changed Pod is covered by Kubernetes Pod Security Standards
    • Any changed Pod is covered by Gatekeeper Pod Security Policies
    • The change does not cause any Pods to be blocked by Pod Security Standards or Policies
  • NetworkPolicy checks:
    • Any changed Pod is covered by Network Policies
    • The change does not cause any dropped packets in the NetworkPolicy Dashboard
  • Audit checks:
    • The change does not cause any unnecessary Kubernetes audit events
    • The change requires changes to Kubernetes audit policy
  • Falco checks:
    • The change does not cause any alerts to be generated by Falco
  • Bug checks:
    • The bug fix is covered by regression tests

@kristiangronas kristiangronas requested a review from a team as a code owner September 25, 2025 12:12
Comment on lines +9 to +10
{{- if .Values.defaultRules.recordlabels }}
{{ toYaml .Values.defaultRules.recordlabels | indent 4 }}
Contributor

For these to be picked up by our Prometheus we need the evaluate_prometheus label, which was not being added because the variable is defined in camel case:

Suggested change
- {{- if .Values.defaultRules.recordlabels }}
- {{ toYaml .Values.defaultRules.recordlabels | indent 4 }}
+ {{- if .Values.defaultRules.recordLabels }}
+ {{ toYaml .Values.defaultRules.recordLabels | indent 4 }}
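
As a minimal sketch of why the casing matters: Helm value lookups are case-sensitive, so a template guard on `recordlabels` never matches a value defined as `recordLabels`, and the labels block is silently skipped. The values fragment below is illustrative only; the label value `"1"` is an assumption and is not taken from this repository.

```yaml
# values.yaml (illustrative fragment; the label value is an assumption)
defaultRules:
  recordLabels:              # camel case: the template must reference this exact key
    evaluate_prometheus: "1"
```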

Contributor Author

@kristiangronas kristiangronas Oct 2, 2025

Thanks, that explains why it was only working in wc. Also, I had to keep the apiserver_request_sli_duration_seconds_bucket and apiserver_request_sli_duration_seconds_count metrics for half of the panels, but I'm not sure the space usage is worth it. What do you think?

Contributor

@anders-elastisys anders-elastisys Oct 15, 2025

Could you check how many extra time series this adds, and how Prometheus resource usage differs after adding it?
These rules were most likely removed in the past to reduce Prometheus resource usage, and the dashboard was never removed as part of that change.

If this is something you really want then you could always add templating to be able to toggle these record rules as well as the dashboard, and keep it default false.
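
A rough sketch of what such a toggle could look like, assuming a values key named `defaultRules.rules.kubeApiserverHistogram` and a rule name chosen purely for illustration (neither is confirmed to match this chart). The values side would default to false:

```yaml
# values.yaml (assumed key name, disabled by default)
defaultRules:
  rules:
    kubeApiserverHistogram: false
```

The rule template would then be wrapped in a guard on that key. The rule groups are left as a placeholder here rather than reproduced:

```yaml
{{- if .Values.defaultRules.rules.kubeApiserverHistogram }}
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: kube-apiserver-histogram.rules   # illustrative name
  labels:
    {{- with .Values.defaultRules.recordLabels }}
    {{- toYaml . | nindent 4 }}
    {{- end }}
spec:
  groups: []   # placeholder: the record rules copied from kube-prometheus would go here
{{- end }}
```

The dashboard could be guarded by the same key, so that enabling the flag turns on the record rules and the panels that depend on them together.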

@simonklb simonklb requested a review from a team October 2, 2025 11:08