Skip to content

Conversation

@antonipp
Copy link
Contributor

@antonipp antonipp commented Dec 1, 2025

Description

This PR adds the possibility of enabling profiling. We would like to use the controller to run at a fairly large scale (pools of 1000s-10000s of sandboxes) so it would be great to have extra observability to detect potential bottlenecks.

I also noticed that --metrics-bind-address was not used at all so I fixed that too. If enabled, the pprof endpoints will be exposed on the same port.

Testing

Verified that this worked by running a forked version of the controller and enabling pprof. The profiles were properly collected (in our case by Datadog):
image

@netlify
Copy link

netlify bot commented Dec 1, 2025

Deploy Preview for agent-sandbox canceled.

Name Link
🔨 Latest commit 8751811
🔍 Latest deploy log https://app.netlify.com/projects/agent-sandbox/deploys/6936a65b8009e300084b20ec

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: antonipp
Once this PR has been reviewed and has the lgtm label, please assign janetkuo for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Dec 1, 2025
@k8s-ci-robot
Copy link
Contributor

Hi @antonipp. Thanks for your PR.

I'm waiting for a github.com member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Dec 1, 2025
@janetkuo
Copy link
Member

janetkuo commented Dec 2, 2025

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Dec 2, 2025
if enablePprof {
setupLog.Info("pprof enabled")
metricsOpts.ExtraHandlers = map[string]http.Handler{
"/debug/pprof/": http.HandlerFunc(pprof.Index),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think we can enable just pprof profile and rest under a different flag ?

Copy link
Contributor

@barney-s barney-s left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding pprof support and fixing the metrics address! This will be very helpful for performance analysis.

I have a few suggestions to improve this further, mainly around separating the health probe endpoint and considering the security implications of exposing all pprof endpoints. Please see the detailed comments.

"Enable leader election for controller manager. "+
"Enabling this will ensure there is only one active controller manager.")
flag.BoolVar(&extensions, "extensions", false, "Enable extensions controllers.")
flag.BoolVar(&enablePprof, "enable-pprof", false, "Enable pprof profiling endpoints on the metrics server.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The help text could be more descriptive. For example: "Enable profiling via pprof handlers on the metrics endpoint. All pprof endpoints will be exposed under /debug/pprof/."

}
if enablePprof {
setupLog.Info("pprof enabled")
metricsOpts.ExtraHandlers = map[string]http.Handler{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exposing all pprof endpoints, especially the index, with a single flag in a production environment can be a security concern as it might leak sensitive information. It is better to have separate flags for different profiles or make it clear this is for debugging purposes only.

if enablePprof {
setupLog.Info("pprof enabled")
metricsOpts.ExtraHandlers = map[string]http.Handler{
"/debug/pprof/": http.HandlerFunc(pprof.Index),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pprof.Index handler provides links to all available profiles. If this controller is used in a multi-tenant environment, this could potentially leak information. Consider exposing only specific, non-sensitive profiles like profile and trace if this is intended for production use.

metricsOpts.ExtraHandlers = map[string]http.Handler{
"/debug/pprof/": http.HandlerFunc(pprof.Index),
"/debug/pprof/cmdline": http.HandlerFunc(pprof.Cmdline),
"/debug/pprof/profile": http.HandlerFunc(pprof.Profile),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The /debug/pprof/cmdline endpoint exposes the command-line arguments of the process. In a Kubernetes environment, this could leak sensitive information passed as flags. It would be safer to not expose this particular endpoint by default.

"/debug/pprof/": http.HandlerFunc(pprof.Index),
"/debug/pprof/cmdline": http.HandlerFunc(pprof.Cmdline),
"/debug/pprof/profile": http.HandlerFunc(pprof.Profile),
"/debug/pprof/symbol": http.HandlerFunc(pprof.Symbol),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The /debug/pprof/symbol endpoint can be used to look up program counters, potentially giving an attacker more information about the binary's structure to aid in exploiting other vulnerabilities. This endpoint is not typically needed for standard profiling and should be disabled unless required for a specific debugging scenario.

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Dec 3, 2025
@antonipp
Copy link
Contributor Author

antonipp commented Dec 3, 2025

Ok I split the flags into two: just profile + trace by default and then other endpoints behind a separate flag.

tbh I just initially copied what Cilium had here: https://github.com/cilium/cilium/blob/3f34d20b73548b6fb68df9b3e79198178e2e1939/pkg/pprof/cell.go#L125-L129
but I guess better safe than sorry

Signed-off-by: Anton Ippolitov <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants