@xing-yang
Contributor

@xing-yang xing-yang commented Dec 10, 2025

What this PR does / why we need it:

Which issue this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged): fixes #

Testing done:
WCP pre-check pipeline: https://jenkins-vcf-csifvt.devops.broadcom.net/job/wcp-instapp-e2e-pre-checkin/729/ (failed for unrelated reasons; this PR only changes pvCSI)
VKS pre-check pipeline: https://jenkins-vcf-csifvt.devops.broadcom.net/view/instapp/job/vks-instapp-e2e-pre-checkin/671/ (passed)

Manual testing:
Positive testing:

In VKS cluster:
Create a PVC.
root@4202083ec3827361bf3d46cab573c369 [ ~ ]# kubectl get pvc -n test-ns
NAME       STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                VOLUMEATTRIBUTESCLASS   AGE
test-pvc   Bound    pvc-e24bd3f7-3da7-409a-a6cc-1d06c192ddad   50Gi       RWO            wcpglobal-storage-profile   <unset>                 6s

Create a Pod writing data to the PVC.
root@4202083ec3827361bf3d46cab573c369 [ ~ ]# kubectl get pod -n test-ns
NAME          READY   STATUS    RESTARTS   AGE
data-writer   1/1     Running   0          5m25s

Create a VolumeSnapshot.
root@4202083ec3827361bf3d46cab573c369 [ ~ ]# kubectl create -f snapshot.yaml
volumesnapshot.snapshot.storage.k8s.io/test-snapshot created

root@4202083ec3827361bf3d46cab573c369 [ ~ ]# kubectl get volumesnapshot -n test-ns
NAME            READYTOUSE   SOURCEPVC   SOURCESNAPSHOTCONTENT   RESTORESIZE   SNAPSHOTCLASS                SNAPSHOTCONTENT                                    CREATIONTIME   AGE
test-snapshot   false        test-pvc                                          volumesnapshotclass-delete   snapcontent-20870e28-c873-42a5-a609-ec91362efa91                  9s

root@4202083ec3827361bf3d46cab573c369 [ ~ ]# kubectl get volumesnapshot -n test-ns
NAME            READYTOUSE   SOURCEPVC   SOURCESNAPSHOTCONTENT   RESTORESIZE   SNAPSHOTCLASS                SNAPSHOTCONTENT                                    CREATIONTIME   AGE
test-snapshot   true         test-pvc                            50Gi          volumesnapshotclass-delete   snapcontent-20870e28-c873-42a5-a609-ec91362efa91   5s             16s

Delete VolumeSnapshot.
root@4202083ec3827361bf3d46cab573c369 [ ~ ]# kubectl delete -f snapshot.yaml
volumesnapshot.snapshot.storage.k8s.io "test-snapshot" deleted
root@4202083ec3827361bf3d46cab573c369 [ ~ ]# kubectl get vs -n test-ns
No resources found in test-ns namespace.

Negative testing:

Scale down vSphere CSI Controller in Supervisor to simulate a timeout.

VKS
Create a VolumeSnapshot. It stays pending (READYTOUSE is false).
root@4202083ec3827361bf3d46cab573c369 [ ~ ]# kubectl create -f snapshot.yaml
volumesnapshot.snapshot.storage.k8s.io/test-snapshot created
root@4202083ec3827361bf3d46cab573c369 [ ~ ]# kubectl get vs -n test-ns
NAME            READYTOUSE   SOURCEPVC   SOURCESNAPSHOTCONTENT   RESTORESIZE   SNAPSHOTCLASS                SNAPSHOTCONTENT                                    CREATIONTIME   AGE
test-snapshot   false        test-pvc                                          volumesnapshotclass-delete   snapcontent-4ea3ce83-94c1-40a3-a423-4f7a89861300                  3s

Supervisor
root@4202083ec3827361bf3d46cab573c369 [ ~ ]# kubectl get vs -n test-vks
NAME                                                                        READYTOUSE   SOURCEPVC                                                                   SOURCESNAPSHOTCONTENT   RESTORESIZE   SNAPSHOTCLASS                SNAPSHOTCONTENT                                    CREATIONTIME   AGE
16947ac0-efa3-4641-86a6-b96c78dc8164-4ea3ce83-94c1-40a3-a423-4f7a89861300   false        16947ac0-efa3-4641-86a6-b96c78dc8164-e24bd3f7-3da7-409a-a6cc-1d06c192ddad                                         volumesnapshotclass-delete   snapcontent-b27d8275-329d-4456-8b94-c78446fc52c0                  11s

VKS
Wait until the timeout error shows up in the pvCSI logs.
{"level":"error","time":"2025-12-11T21:46:49.384508262Z","caller":"common/common_controller_helper.go:332","msg":"unable to fetch volumesnapshot \"test-vks\"/\"16947ac0-efa3-4641-86a6-b96c78dc8164-4ea3ce83-94c1-40a3-a423-4f7a89861300\" from supervisor cluster with err: client rate limiter Wait returned an error: context deadline exceeded"......

{"level":"error","time":"2025-12-11T21:46:49.407895132Z","caller":"wcpguest/controller.go:1649","msg":"volumesnapshot: 16947ac0-efa3-4641-86a6-b96c78dc8164-4ea3ce83-94c1-40a3-a423-4f7a89861300 on namespace: test-vks in supervisor cluster was not Ready. Error: volumesnapshot 16947ac0-efa3-4641-86a6-b96c78dc8164-4ea3ce83-94c1-40a3-a423-4f7a89861300 in namespace test-vks not in ReadyToUse within 60 seconds.....

Scale up vSphere CSI Controller in Supervisor.

VKS
Delete VolumeSnapshot.
root@4202083ec3827361bf3d46cab573c369 [ ~ ]# kubectl delete -f snapshot.yaml
volumesnapshot.snapshot.storage.k8s.io "test-snapshot" deleted
root@4202083ec3827361bf3d46cab573c369 [ ~ ]# kubectl get vs -n test-ns
No resources found in test-ns namespace.

Supervisor
root@4202083ec3827361bf3d46cab573c369 [ ~ ]# kubectl get vs -n test-vks
No resources found in test-vks namespace.

Special notes for your reviewer:

Release note:

Return a non-final error code from pvCSI when snapshot creation times out, to avoid leaking volume snapshots.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress label (indicates that a PR should not merge because it is a work in progress) Dec 10, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: xing-yang

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the following labels Dec 10, 2025: approved (indicates a PR has been approved by an approver from all required OWNERS files), size/L (denotes a PR that changes 100-499 lines, ignoring generated files), cncf-cla: yes (indicates the PR's author has signed the CNCF CLA)
@xing-yang xing-yang changed the title from "WIP: Change error code for create snapshot to a non-final error" to "WIP: Change error code for create snapshot in pvCSI to a non-final error" Dec 10, 2025
@deepakkinni
Collaborator

Triggering CSI-TKG Pre-checkin Pipeline for this PR... Job takes approximately an hour to complete
Jenkins Build #674

@deepakkinni
Collaborator

Triggering CSI-WCP Pre-checkin Pipeline for this PR... Job takes approximately an hour to complete
Jenkins Build #734

@xing-yang xing-yang changed the title from "WIP: Change error code for create snapshot in pvCSI to a non-final error" to "Change error code for create snapshot in pvCSI to a non-final error" Dec 12, 2025
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress label (indicates that a PR should not merge because it is a work in progress) Dec 12, 2025
