@gianlucam76
Member

This PR introduces a dynamic, randomized backoff mechanism for ClusterSummary reconciliations. Currently, when a deployment fails, the controller retries using a static normalRequeueAfter interval.

The new logic observes the consecutiveFailures count across all features (Helm, Resources, Kustomize) and scales the requeue interval accordingly.

@gianlucam76 gianlucam76 merged commit 627016b into projectsveltos:main Jan 6, 2026
8 checks passed
@gianlucam76 gianlucam76 deleted the deployment-errors branch January 6, 2026 10:22
@vaibhavd21

Thank you

@vaibhavd21

This is better than MaxConsecutiveFailures:

	// The maximum number of consecutive deployment failures that Sveltos will permit.
	// After this many consecutive failures, the deployment will be considered failed, and Sveltos will stop retrying.
	// This setting applies only to feature deployments, not resource removal.
	// This field is optional. If not set, Sveltos default behavior is to keep retrying.
	// +optional
	MaxConsecutiveFailures *uint `json:"maxConsecutiveFailures,omitempty"`
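For comparison, a hard cap like the field quoted above could be checked roughly as follows. This is only a sketch: the spec type and the shouldStopRetrying helper are hypothetical, and the use of >= is an assumption based on the field's comment, not taken from the Sveltos source.

```go
package main

import "fmt"

// spec is a hypothetical stand-in that holds only the quoted field.
type spec struct {
	MaxConsecutiveFailures *uint
}

// shouldStopRetrying reports whether retries should stop.
// A nil cap means "keep retrying", matching the field's documented default.
func shouldStopRetrying(s spec, consecutiveFailures uint) bool {
	if s.MaxConsecutiveFailures == nil {
		return false
	}
	return consecutiveFailures >= *s.MaxConsecutiveFailures
}

func main() {
	limit := uint(3)
	fmt.Println(shouldStopRetrying(spec{}, 100))                            // no cap set
	fmt.Println(shouldStopRetrying(spec{MaxConsecutiveFailures: &limit}, 2)) // below cap
	fmt.Println(shouldStopRetrying(spec{MaxConsecutiveFailures: &limit}, 3)) // cap reached
}
```

A cap of this kind stops retries outright once it is reached, whereas the backoff in this PR keeps retrying at a growing interval.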
