Skip to content

Conversation

@ArkaSaha30
Copy link
Contributor

@ArkaSaha30 ArkaSaha30 commented Oct 23, 2025

This PR will add AutoProvider certificate manager as a part of generating default self-signed certificates:

  • interface functions
  • reconciler logic
  • e2e tests
etcd-operator on  auto-cert-provider [$?] via 🐳 colima via 🐹 v1.25.3 on ☁️  (us-east-1) on ☁️  [email protected] 
❯ k get pods
NAME                   READY   STATUS    RESTARTS   AGE
etcdcluster-sample-0   1/1     Running   0          34m
etcdcluster-sample-1   1/1     Running   0          34m
etcdcluster-sample-2   1/1     Running   0          33m


etcd-operator on  auto-cert-provider [$?] via 🐳 colima via 🐹 v1.25.3 on ☁️  (us-east-1) on ☁️  [email protected] 
❯ k describe pod etcdcluster-sample-0  
Name:             etcdcluster-sample-0
Namespace:        default
Priority:         0
Service Account:  default
Node:             kind-control-plane/172.18.0.2
Start Time:       Mon, 27 Oct 2025 23:47:43 +0530
Labels:           app=etcdcluster-sample
                  apps.kubernetes.io/pod-index=0
                  controller=etcdcluster-sample
                  controller-revision-hash=etcdcluster-sample-754c4cfbcd
                  statefulset.kubernetes.io/pod-name=etcdcluster-sample-0
Annotations:      <none>
Status:           Running
IP:               10.244.0.9
IPs:
  IP:           10.244.0.9
Controlled By:  StatefulSet/etcdcluster-sample
Containers:
  etcd:
    Container ID:  containerd://faa47a05746374496ff1b3d0147a93efc4d72bc07572e68183df1df9decdcdbf
    Image:         gcr.io/etcd-development/etcd:v3.5.21
    Image ID:      gcr.io/etcd-development/etcd@sha256:fd158fbe55240e252947bbd2e8dddc217997ff43978071fac2bd202b6ad15c03
    Ports:         2379/TCP, 2380/TCP
    Host Ports:    0/TCP, 0/TCP
    Command:
      /usr/local/bin/etcd
    Args:
      --name=$(POD_NAME)
      --listen-peer-urls=http://0.0.0.0:2380
      --listen-client-urls=http://0.0.0.0:2379
      --initial-advertise-peer-urls=http://$(POD_NAME).etcdcluster-sample.$(POD_NAMESPACE).svc.cluster.local:2380
      --advertise-client-urls=http://$(POD_NAME).etcdcluster-sample.$(POD_NAMESPACE).svc.cluster.local:2379
    State:          Running
      Started:      Mon, 27 Oct 2025 23:47:52 +0530
    Ready:          True
    Restart Count:  0
    Environment Variables from:
      etcdcluster-sample-state  ConfigMap  Optional: false
    Environment:
      POD_NAME:       etcdcluster-sample-0 (v1:metadata.name)
      POD_NAMESPACE:  default (v1:metadata.namespace)
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-kwksc (ro)
Conditions:
  Type                        Status
  PodReadyToStartContainers   True 
  Initialized                 True 
  Ready                       True 
  ContainersReady             True 
  PodScheduled                True 
Volumes:
  server-secret:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  etcdcluster-sample-server-tls
    Optional:    false
  peer-secret:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  etcdcluster-sample-peer-tls
    Optional:    false
  kube-api-access-kwksc:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  34m   default-scheduler  Successfully assigned default/etcdcluster-sample-0 to kind-control-plane
  Normal  Pulling    34m   kubelet            Pulling image "gcr.io/etcd-development/etcd:v3.5.21"
  Normal  Pulled     34m   kubelet            Successfully pulled image "gcr.io/etcd-development/etcd:v3.5.21" in 8.646s (8.646s including waiting). Image size: 20653564 bytes.
  Normal  Created    34m   kubelet            Created container: etcd
  Normal  Started    34m   kubelet            Started container etcd

etcd-operator on  auto-cert-provider [$?] via 🐳 colima via 🐹 v1.25.3 on ☁️  (us-east-1) on ☁️  [email protected] 
❯ k get secrets
NAME                            TYPE                DATA   AGE
etcdcluster-sample-client-tls   kubernetes.io/tls   3      35m
etcdcluster-sample-peer-tls     kubernetes.io/tls   3      35m
etcdcluster-sample-server-tls   kubernetes.io/tls   3      35m

etcd-operator on  auto-cert-provider [$?] via 🐳 colima via 🐹 v1.25.3 on ☁️  (us-east-1) on ☁️  [email protected] 
❯ k describe secret etcdcluster-sample-client-tls
Name:         etcdcluster-sample-client-tls
Namespace:    default
Labels:       <none>
Annotations:  <none>

Type:  kubernetes.io/tls

Data
====
ca.crt:   769 bytes
tls.crt:  769 bytes
tls.key:  365 bytes

etcd-operator on  auto-cert-provider [$?] via 🐳 colima via 🐹 v1.25.3 on ☁️  (us-east-1) on ☁️  [email protected] 
❯ k describe secret etcdcluster-sample-peer-tls  
Name:         etcdcluster-sample-peer-tls
Namespace:    default
Labels:       <none>
Annotations:  <none>

Type:  kubernetes.io/tls

Data
====
tls.crt:  774 bytes
tls.key:  365 bytes
ca.crt:   774 bytes

etcd-operator on  auto-cert-provider [$?] via 🐳 colima via 🐹 v1.25.3 on ☁️  (us-east-1) on ☁️  [email protected] 
❯ k describe secret etcdcluster-sample-server-tls
Name:         etcdcluster-sample-server-tls
Namespace:    default
Labels:       <none>
Annotations:  <none>

Type:  kubernetes.io/tls

Data
====
ca.crt:   769 bytes
tls.crt:  769 bytes
tls.key:  365 bytes

@ArkaSaha30 ArkaSaha30 force-pushed the auto-cert-provider branch 3 times, most recently from f819caf to 82651bc Compare October 27, 2025 18:17
@ArkaSaha30 ArkaSaha30 changed the title [WIP] Add AutoProvider interface functions Add AutoProvider interface functions Oct 27, 2025
@ArkaSaha30 ArkaSaha30 requested review from ahrtr and ivanvc October 27, 2025 18:49
@ahrtr
Copy link
Member

ahrtr commented Oct 28, 2025

@abdurrehman107 @ballista01 @ivanvc @hakman PTAL.

Reviewing this PR is also an opportunity to get familiar with the certificate design & implementation.

Reference: https://docs.google.com/document/d/1k8os77Hqxe36nUaZ31YqBGBuXcgdozVPppJYOfwUDFA/edit?tab=t.0

@ahrtr
Copy link
Member

ahrtr commented Oct 28, 2025

cc @frederiko

@ArkaSaha30 ArkaSaha30 changed the title Add AutoProvider interface functions Add AutoProvider certificate manager Oct 28, 2025
return nil, fmt.Errorf("failed to get certificate: %w", err)
}

return &interfaces.Config{}, nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should get the info from the certificate included in the secret.

Copy link
Contributor Author

@ArkaSaha30 ArkaSaha30 Nov 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @ahrtr , I explored getting the original config from the secret itself but it will be a complicated one since we are manipulating the config and passing it to transport.SelfCert() which in turn creats the certificate.
Another workaround is to retrieve the certificateConfig from the etcdCluster object via the certificateSecret.ownerRef.Name, but it will cause a circular dependency

What can be the suggested solution here?
cc @ivanvc

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but it will be a complicated one

Where is the complication? I don't see a PoC so far, nor a reasonable explanation.

The easiest solution is to just save the config passed to EnsureCertificateSecret, and return it in GetCertificateConfig. The problem is that it isn't the final real configuration that we used to create the certificate.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I have updated the GetCertificateConfig to return the saved user-defined config

Copy link
Member

@ivanvc ivanvc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the pull request, Arka. I left some observations, comments, and suggestions.

@ArkaSaha30 ArkaSaha30 force-pushed the auto-cert-provider branch 4 times, most recently from 9d90c9e to e2c11fd Compare October 30, 2025 09:56
@ArkaSaha30
Copy link
Contributor Author

/test pull-etcd-operator-test-e2e

@ivanvc
Copy link
Member

ivanvc commented Nov 9, 2025

@ArkaSaha30, where do we stand in this pull request? Is this ready for another review?

@ArkaSaha30 ArkaSaha30 force-pushed the auto-cert-provider branch 3 times, most recently from 6027cab to ccf14e3 Compare November 30, 2025 16:10
@ArkaSaha30 ArkaSaha30 requested a review from ivanvc November 30, 2025 16:11
Copy link

@neolit123 neolit123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the updates
generally LGTM, just a few minor comments from me

autoConfig := ec.Spec.TLS.ProviderCfg.AutoCfg
duration, err := time.ParseDuration(autoConfig.ValidityDuration)
if err != nil {
log.Printf("Failed to parse ValidityDuration: %s", err)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should it return an error here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am currently handling the scenario in auto/provider.go - https://github.com/ArkaSaha30/etcd-operator/blob/auto-cert-provider/pkg/certificate/auto/provider.go#L217-L220
Currently, it will log the error but fall back to the default duration of 1yr and let the user bring up the etcdCluster.
Please let me know if this is not the desired behaviour, and I will update it to throw an error.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should return an error here, we only use the default value if not provided. Pls raise a separate ticket to track this.


var userAutoConfig *interfaces.Config

func (ac *Provider) EnsureCertificateSecret(ctx context.Context, secretName, namespace string,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whenever the "name, namespace" pattern is used, an object key can be used. This ensures consistency. See github.com/kubernetes-sigs/controller-runtime/blob/main/pkg/client/example_test.go for example.

Suggested change
func (ac *Provider) EnsureCertificateSecret(ctx context.Context, secretName, namespace string,
func (ac *Provider) EnsureCertificateSecret(ctx context.Context, secretKey ctrlruntimeclient.ObjectKey

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is an interface change, I will change and refactor it for both certManager and auto certificate provider in a separate follow-up PR

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, Pls raise a separate ticket to track this change

return interfaces.ErrTLSCert
}

decodeCertificatePem, _ := pem.Decode(certificateData)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we not need an err check?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are already checking the error in the next line.
pem.Decode returns (p *Block, rest []byte) - pointer to decoded PEM block and the remaining bytes after the block.
if decodeCertificatePem == nil checks if the decoding failed and returns error ErrDecodeCert


// parsePrivateKey parses the private key from the PEM-encoded data.
func parsePrivateKey(privateKeyData []byte) (crypto.PrivateKey, error) {
block, _ := pem.Decode(privateKeyData)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same question as above

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same pem.Decode error handling style: #227 (comment)

@ahrtr
Copy link
Member

ahrtr commented Dec 2, 2025

@ArkaSaha30 pls feedback after you resolve all the review comments, thx

@ArkaSaha30
Copy link
Contributor Author

@ArkaSaha30 pls feedback after you resolve all the review comments, thx

Sure @ahrtr , I'll address all the review comments this week.

@ArkaSaha30 ArkaSaha30 force-pushed the auto-cert-provider branch 2 times, most recently from d8aab27 to 6513f37 Compare December 7, 2025 18:43
This commit will add the initial scaffolding for AutoProvider.

Signed-off-by: ArkaSaha30 <[email protected]>
Copy link
Member

@ahrtr ahrtr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with two followups.

Thanks @ArkaSaha30 for the great work!


var userAutoConfig *interfaces.Config

func (ac *Provider) EnsureCertificateSecret(ctx context.Context, secretName, namespace string,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, Pls raise a separate ticket to track this change

autoConfig := ec.Spec.TLS.ProviderCfg.AutoCfg
duration, err := time.ParseDuration(autoConfig.ValidityDuration)
if err != nil {
log.Printf("Failed to parse ValidityDuration: %s", err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should return an error here, we only use the default value if not provided. Pls raise a separate ticket to track this.

@k8s-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ahrtr, ArkaSaha30

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

}
}

var userAutoConfig *interfaces.Config
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why define a global variable? Please add it as a field in Provider struct (line 34 above).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls also resolve this in a followup PR, thx

@ahrtr ahrtr merged commit 0be3efb into etcd-io:main Dec 9, 2025
6 checks passed
Comment on lines +122 to +124
if parseCert.NotAfter.Before(time.Now()) {
return interfaces.ErrCertExpired
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be a good idea also to check parseCert.NotBefore?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! Adding it in the refactor issue: #251

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants