Describe the bug
The retina-agent service account in kube-system seemingly has no permission to list Retina's own CRs, which are defined in the retina.sh API group.
[EDIT] Below, retina-agent-init is actually running the operator image instead of the init image. My bad! The apigroups are, nonetheless, wrong.
Here's an excerpt from retina-agent-init container of the retina-agent-* pod (it never reaches the retina-agent container):
[EDIT] This is from the retina-operator image, by accident. The bug still applies.
E1118 23:39:21.428138 1 reflector.go:166] "Unhandled Error" err="pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:251: Failed to watch *v1alpha1.MetricsConfiguration: failed to list *v1alpha1.MetricsConfiguration: metricsconfigurations.retina.sh is forbidden: User \"system:serviceaccount:kube-system:retina-agent\" cannot list resource \"metricsconfigurations\" in API group \"retina.sh\" at the cluster scope" logger="UnhandledError"
W1118 23:39:22.001828 1 reflector.go:569] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:251: failed to list *v1alpha1.Capture: captures.retina.sh is forbidden: User "system:serviceaccount:kube-system:retina-agent" cannot list resource "captures" in API group "retina.sh" at the cluster scope
E1118 23:39:22.001924 1 reflector.go:166] "Unhandled Error" err="pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:251: Failed to watch *v1alpha1.Capture: failed to list *v1alpha1.Capture: captures.retina.sh is forbidden: User \"system:serviceaccount:kube-system:retina-agent\" cannot list resource \"captures\" in API group \"retina.sh\" at the cluster scope" logger="UnhandledError"
W1118 23:39:24.764710 1 reflector.go:569] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:251: failed to list *v1.Job: jobs.batch is forbidden: User "system:serviceaccount:kube-system:retina-agent" cannot list resource "jobs" in API group "batch" at the cluster scope
E1118 23:39:24.765201 1 reflector.go:166] "Unhandled Error" err="pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:251: Failed to watch *v1.Job: failed to list *v1.Job: jobs.batch is forbidden: User \"system:serviceaccount:kube-system:retina-agent\" cannot list resource \"jobs\" in API group \"batch\" at the cluster scope" logger="UnhandledError"
W1118 23:39:26.796528 1 reflector.go:569] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:251: failed to list *v1alpha1.MetricsConfiguration: metricsconfigurations.retina.sh is forbidden: User "system:serviceaccount:kube-system:retina-agent" cannot list resource "metricsconfigurations" in API group "retina.sh" at the cluster scope
E1118 23:39:26.796687 1 reflector.go:166] "Unhandled Error" err="pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:251: Failed to watch *v1alpha1.MetricsConfiguration: failed to list *v1alpha1.MetricsConfiguration: metricsconfigurations.retina.sh is forbidden: User \"system:serviceaccount:kube-system:retina-agent\" cannot list resource \"metricsconfigurations\" in API group \"retina.sh\" at the cluster scope" logger="UnhandledError"
W1118 23:39:26.887540 1 reflector.go:569] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:251: failed to list *v1alpha1.Capture: captures.retina.sh is forbidden: User "system:serviceaccount:kube-system:retina-agent" cannot list resource "captures" in API group "retina.sh" at the cluster scope
E1118 23:39:26.888200 1 reflector.go:166] "Unhandled Error" err="pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:251: Failed to watch *v1alpha1.Capture: failed to list *v1alpha1.Capture: captures.retina.sh is forbidden: User \"system:serviceaccount:kube-system:retina-agent\" cannot list resource \"captures\" in API group \"retina.sh\" at the cluster scope" logger="UnhandledError"
This error is similar to #1122, but actually unrelated.
Examining the relevant ClusterRole object, retina-cluster-reader, I can see that it references retina.io in multiple places. That API group has been gone since pull request #26, which replaced retina.io with retina.sh in most places:
retina/deploy/hubble/manifests/controller/helm/retina/templates/agent/clusterrole.yaml
Lines 17 to 24 in f525540
- apiGroups:
    - retina.io
  resources:
    - retinaendpoints
  verbs:
    - get
    - list
    - watch
retina/deploy/hubble/manifests/controller/helm/retina/templates/agent/clusterrole.yaml
Lines 45 to 82 in f525540
- apiGroups:
    - retina.io
  resources:
    - retinaendpoints
  verbs:
    - create
    - delete
    - get
    - list
    - patch
    - update
    - watch
- apiGroups:
    - retina.io
  resources:
    - metricsconfigurations
  verbs:
    - create
    - delete
    - get
    - list
    - patch
    - update
    - watch
- apiGroups:
    - retina.io
  resources:
    - retinaendpoints/finalizers
  verbs:
    - update
- apiGroups:
    - retina.io
  resources:
    - retinaendpoints/status
  verbs:
    - get
    - patch
    - update
This is also happening in the operator ClusterRole object, retina-operator-role:
retina/deploy/hubble/manifests/controller/helm/retina/templates/operator/clusterrole.yaml
Lines 54 to 79 in f2da04b
- apiGroups:
    - retina.io
  resources:
    - captures
  verbs:
    - create
    - delete
    - get
    - list
    - patch
    - update
    - watch
- apiGroups:
    - retina.io
  resources:
    - captures/finalizers
  verbs:
    - update
- apiGroups:
    - retina.io
  resources:
    - captures/status
  verbs:
    - get
    - patch
    - update
- Is the stack mostly launched without use of helm charts created from this repo?
- I've followed the documentation which suggests charts pushed into GHCR are meant to be used to deploy Retina, both hubble and non-hubble variants
- https://retina.sh/docs/Installation/Config
- https://retina.sh/docs/Installation/Setup
The 'standard' (non-hubble) helm deployment seems to define RBACs with the correct API group:
retina/deploy/standard/manifests/controller/helm/retina/templates/rbac.yaml
Lines 20 to 27 in 37d1566
- apiGroups:
    - retina.sh
  resources:
    - retinaendpoints
  verbs:
    - get
    - list
    - watch
retina/deploy/standard/manifests/controller/helm/retina/templates/rbac.yaml
Lines 38 to 75 in 37d1566
- apiGroups:
    - retina.sh
  resources:
    - retinaendpoints
  verbs:
    - create
    - delete
    - get
    - list
    - patch
    - update
    - watch
- apiGroups:
    - retina.sh
  resources:
    - metricsconfigurations
  verbs:
    - create
    - delete
    - get
    - list
    - patch
    - update
    - watch
- apiGroups:
    - retina.sh
  resources:
    - retinaendpoints/finalizers
  verbs:
    - update
- apiGroups:
    - retina.sh
  resources:
    - retinaendpoints/status
  verbs:
    - get
    - patch
    - update
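For illustration, here is a minimal Python sketch of why rules written for retina.io grant nothing for retina.sh resources. It is a simplified model of RBAC rule matching (exact string or "*" wildcard per field), not the actual Kubernetes authorizer: it ignores resourceNames, nonResourceURLs, and aggregation. The function names rule_allows and is_allowed are hypothetical, and the two rule lists are copied from the metricsconfigurations rules quoted above.

```python
def rule_allows(rule, group, resource, verb):
    """Simplified check: each field must contain the wanted value or the "*" wildcard."""
    def matches(allowed_values, wanted):
        return "*" in allowed_values or wanted in allowed_values

    return (
        matches(rule.get("apiGroups", []), group)
        and matches(rule.get("resources", []), resource)
        and matches(rule.get("verbs", []), verb)
    )


def is_allowed(rules, group, resource, verb):
    """A request is allowed if at least one rule matches it (RBAC is purely additive)."""
    return any(rule_allows(rule, group, resource, verb) for rule in rules)


ALL_VERBS = ["create", "delete", "get", "list", "patch", "update", "watch"]

# Rule as found in the hubble chart's agent ClusterRole (stale API group).
hubble_rules = [
    {"apiGroups": ["retina.io"], "resources": ["metricsconfigurations"], "verbs": ALL_VERBS},
]

# Rule as found in the standard chart's rbac.yaml (current API group).
standard_rules = [
    {"apiGroups": ["retina.sh"], "resources": ["metricsconfigurations"], "verbs": ALL_VERBS},
]

# The agent lists metricsconfigurations in the retina.sh group.
print(is_allowed(hubble_rules, "retina.sh", "metricsconfigurations", "list"))    # False
print(is_allowed(standard_rules, "retina.sh", "metricsconfigurations", "list"))  # True
```

Because there is no rule that denies anything, the only way the request can succeed is for some rule to name the retina.sh group explicitly (or use a wildcard).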
Even after addressing the issue in the agent role, there are still rules missing for retina-agent-init:
[EDIT] This is from the retina-operator image, by accident. The bug still applies.
W1119 00:09:25.596718 1 reflector.go:569] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:251: failed to list *v1.Job: jobs.batch is forbidden: User "system:serviceaccount:kube-system:retina-agent" cannot list resource "jobs" in API group "batch" at the cluster scope
E1119 00:09:25.597944 1 reflector.go:166] "Unhandled Error" err="pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:251: Failed to watch *v1.Job: failed to list *v1.Job: jobs.batch is forbidden: User \"system:serviceaccount:kube-system:retina-agent\" cannot list resource \"jobs\" in API group \"batch\" at the cluster scope" logger="UnhandledError"
W1119 00:09:25.601708 1 reflector.go:569] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:251: failed to list *v1alpha1.Capture: captures.retina.sh is forbidden: User "system:serviceaccount:kube-system:retina-agent" cannot list resource "captures" in API group "retina.sh" at the cluster scope
E1119 00:09:25.602039 1 reflector.go:166] "Unhandled Error" err="pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:251: Failed to watch *v1alpha1.Capture: failed to list *v1alpha1.Capture: captures.retina.sh is forbidden: User \"system:serviceaccount:kube-system:retina-agent\" cannot list resource \"captures\" in API group \"retina.sh\" at the cluster scope" logger="UnhandledError"
W1119 00:09:26.899665 1 reflector.go:569] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:251: failed to list *v1alpha1.Capture: captures.retina.sh is forbidden: User "system:serviceaccount:kube-system:retina-agent" cannot list resource "captures" in API group "retina.sh" at the cluster scope
E1119 00:09:26.899927 1 reflector.go:166] "Unhandled Error" err="pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:251: Failed to watch *v1alpha1.Capture: failed to list *v1alpha1.Capture: captures.retina.sh is forbidden: User \"system:serviceaccount:kube-system:retina-agent\" cannot list resource \"captures\" in API group \"retina.sh\" at the cluster scope" logger="UnhandledError"
W1119 00:09:26.914116 1 reflector.go:569] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:251: failed to list *v1.Job: jobs.batch is forbidden: User "system:serviceaccount:kube-system:retina-agent" cannot list resource "jobs" in API group "batch" at the cluster scope
E1119 00:09:26.914710 1 reflector.go:166] "Unhandled Error" err="pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:251: Failed to watch *v1.Job: failed to list *v1.Job: jobs.batch is forbidden: User \"system:serviceaccount:kube-system:retina-agent\" cannot list resource \"jobs\" in API group \"batch\" at the cluster scope" logger="UnhandledError"
W1119 00:09:30.043144 1 reflector.go:569] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:251: failed to list *v1.Job: jobs.batch is forbidden: User "system:serviceaccount:kube-system:retina-agent" cannot list resource "jobs" in API group "batch" at the cluster scope
E1119 00:09:30.043616 1 reflector.go:166] "Unhandled Error" err="pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:251: Failed to watch *v1.Job: failed to list *v1.Job: jobs.batch is forbidden: User \"system:serviceaccount:kube-system:retina-agent\" cannot list resource \"jobs\" in API group \"batch\" at the cluster scope" logger="UnhandledError"
W1119 00:09:30.049414 1 reflector.go:569] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:251: failed to list *v1alpha1.Capture: captures.retina.sh is forbidden: User "system:serviceaccount:kube-system:retina-agent" cannot list resource "captures" in API group "retina.sh" at the cluster scope
E1119 00:09:30.050010 1 reflector.go:166] "Unhandled Error" err="pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:251: Failed to watch *v1alpha1.Capture: failed to list *v1alpha1.Capture: captures.retina.sh is forbidden: User \"system:serviceaccount:kube-system:retina-agent\" cannot list resource \"captures\" in API group \"retina.sh\" at the cluster scope" logger="UnhandledError"
W1119 00:09:33.326252 1 reflector.go:569] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:251: failed to list *v1.Job: jobs.batch is forbidden: User "system:serviceaccount:kube-system:retina-agent" cannot list resource "jobs" in API group "batch" at the cluster scope
E1119 00:09:33.326494 1 reflector.go:166] "Unhandled Error" err="pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:251: Failed to watch *v1.Job: failed to list *v1.Job: jobs.batch is forbidden: User \"system:serviceaccount:kube-system:retina-agent\" cannot list resource \"jobs\" in API group \"batch\" at the cluster scope" logger="UnhandledError"
W1119 00:09:33.636222 1 reflector.go:569] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:251: failed to list *v1alpha1.Capture: captures.retina.sh is forbidden: User "system:serviceaccount:kube-system:retina-agent" cannot list resource "captures" in API group "retina.sh" at the cluster scope
E1119 00:09:33.636389 1 reflector.go:166] "Unhandled Error" err="pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:251: Failed to watch *v1alpha1.Capture: failed to list *v1alpha1.Capture: captures.retina.sh is forbidden: User \"system:serviceaccount:kube-system:retina-agent\" cannot list resource \"captures\" in API group \"retina.sh\" at the cluster scope" logger="UnhandledError"
The standard deployment does not list these rules for retina-cluster-reader either, so it is likely also broken right now, unless this feature is not enabled in the standard deployment:
- apigroup=batch/v1, resource=jobs, op=list (presumably also get, watch)
- apigroup=retina.sh/v1alpha1, resource=captures, op=list (presumably also get, watch)
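The list above was read off the logs by hand. As a rough sketch, the same information can be extracted mechanically with a small Python script. The regular expression is my own assumption about the shape of the "is forbidden" message, based only on the lines quoted in this issue, and the sample lines are shortened copies of those lines; missing_permissions is a hypothetical helper name.

```python
import re

# Matches: User "<user>" cannot <verb> resource "<resource>" in API group "<group>"
# Quotes may be backslash-escaped when the message sits inside an err="..." field.
FORBIDDEN_RE = re.compile(
    r'User \\?"(?P<user>[^"\\]+)\\?" cannot (?P<verb>\w+) resource '
    r'\\?"(?P<resource>[^"\\]+)\\?" in API group \\?"(?P<group>[^"\\]*)\\?"'
)


def missing_permissions(log_lines):
    """Return sorted, de-duplicated (user, group, resource, verb) tuples denied in the logs."""
    found = set()
    for line in log_lines:
        for match in FORBIDDEN_RE.finditer(line):
            found.add((match["user"], match["group"], match["resource"], match["verb"]))
    return sorted(found)


# Shortened copies of two log lines from this issue (one plain, one with escaped quotes).
sample = [
    'W1119 00:09:25.596718 1 reflector.go:569] failed to list *v1.Job: jobs.batch is forbidden: '
    'User "system:serviceaccount:kube-system:retina-agent" cannot list resource "jobs" '
    'in API group "batch" at the cluster scope',
    'E1119 00:09:25.602039 1 reflector.go:166] "Unhandled Error" err="captures.retina.sh is forbidden: '
    'User \\"system:serviceaccount:kube-system:retina-agent\\" cannot list resource \\"captures\\" '
    'in API group \\"retina.sh\\" at the cluster scope" logger="UnhandledError"',
]

for user, group, resource, verb in missing_permissions(sample):
    print(f"{user}: missing {verb} on {resource} in API group {group}")
```

Feeding it the full output of kubectl logs should collapse the repeated warnings into one line per missing permission.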
I've added this:
- apiGroups:
    - batch
  resources:
    - jobs
  verbs:
    - get
    - watch
    - list
and updated this:
- apiGroups:
    - retina.sh
  resources:
    - retinaendpoints
    - captures
  verbs:
    - get
    - list
    - watch
I have not observed errors in the operator's logs due to the wrong ClusterRole apigroup (retina.io); the pod seems to be reporting healthy. I may have missed something though.
To Reproduce
Steps to reproduce the behavior:
- Install the Helm chart oci://ghcr.io/microsoft/retina/charts/retina-hubble, version v0.0.33-dev-rc1 (declared in https://api.github.com/repos/microsoft/retina/releases/latest)
- Use kubectl logs to observe the error in the pod
- Use kubectl edit clusterrole retina-cluster-reader -n kube-system and kubectl edit clusterrole retina-operator-role -n kube-system to see the incorrect apigroup used
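As an optional extra check, the denied permissions can also be confirmed without reading pod logs by using kubectl auth can-i with impersonation. The Python sketch below only builds and prints the commands; it does not run them. It assumes that kubectl is configured for the cluster and that your own user is allowed to impersonate service accounts. The helper name can_i_command is hypothetical; the three checks are the permissions reported as forbidden in the logs above.

```python
import shlex

# Service account named in the "forbidden" log lines.
SERVICE_ACCOUNT = "system:serviceaccount:kube-system:retina-agent"


def can_i_command(verb, resource, group, user=SERVICE_ACCOUNT):
    """Build a `kubectl auth can-i` argv that checks a cluster-wide permission as `user`."""
    # kubectl accepts the RESOURCE.GROUP form; the core group has an empty name.
    target = f"{resource}.{group}" if group else resource
    return ["kubectl", "auth", "can-i", verb, target, "--as", user, "--all-namespaces"]


checks = [
    ("list", "metricsconfigurations", "retina.sh"),
    ("list", "captures", "retina.sh"),
    ("list", "jobs", "batch"),
]

for verb, resource, group in checks:
    # Print copy-pasteable commands instead of executing them.
    print(shlex.join(can_i_command(verb, resource, group)))
```

Each printed command answers yes or no when run, so it can be repeated after editing the ClusterRole to confirm the fix.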
Expected behavior
A clear and concise description of what you expected to happen.
The same apigroup is used consistently across the CRDs, the operator and agent, and the ClusterRoles that grant the agent and operator the rights to make changes to the cluster.
Screenshots
If applicable, add screenshots to help explain your problem.
n/a
Platform (please complete the following information):
- OS: Debian 12
- Kubernetes Version: 1.33
- Host: local k3s
- Retina Version: v0.0.33-dev-rc1 (helmchart version declared in https://api.github.com/repos/microsoft/retina/releases/latest)
Additional context
Add any other context about the problem here.
This also breaks hubble-relay since it can't talk to the agent.