-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Add revision number to Version Data #8722
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
# Conflicts: # config/dynamicconfig/development-sql.yaml
Shivs11
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did one pass of this PR - i think it's great from an initial scan but i think we should sync up to discuss a few things since if we miss out something, this could be breaking
| if cleanupOldDeletedVersions(tqWorkerDeploymentData) { | ||
| changed = true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe we could just have this function placed inside of removeDeploymentVersions and call it something like removeAndCleanUpDeploymentVersions. It shall then have the contained logic of deleting old versions, marking the version in the new format as deleted and also Garbage Collecting any versions if present.
| // TODO: improve this logic to remove even more old versions if we're reaching the limit of max versions per task queue. | ||
| // max versions per task queue is not implemented yet but we should have it because as of now a task queue can be polled from | ||
| // numerous deployment versions (accross different deployments) and that leads to a very large user data size. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thinking about this a bit more - I think we should have a safeguard (dynamic config) for this prior to launching this as a GA. If folks use versioning with their CI/CD systems (as is done by our internal saas team) on the same task-queue, what you have placed in the comment can 100% come to life.
wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will follow up with a PR to clean up more TQs when reaching a limit
| // If the routing config changed after applying this operation. Compared base on revision number. | ||
| bool routing_config_changed = 2; | ||
| // Deprecated. using this is not totaly safe in case of retries. | ||
| bool routing_config_changed = 2 [deprecated = true]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this has confused me a little....might be worth discussing on call.
| // Tracks the version of the deployment version workflow when a particular run of a workflow starts base on the dynamic config of the | ||
| // worker who completes the first task of the workflow. `workflowVersion` remains the same until the workflow CaNs when it | ||
| // will get another chance to pick the latest manager version. | ||
| workflowVersion DeploymentWorkflowVersion |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
// workflowVersion is set at workflow start based on the dynamic config of the worker
// that completes the first task. It remains constant for the lifetime of the run and
// only updates when the workflow performs continue-as-new.
| // Still need to check RoutingConfig not being nil because of edge cases during enabling dynamic config. | ||
| // i.e. the deployment workflow might run old version and not send the routing config. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as my understanding goes, you have placed this check since you are worried that the deployment workflow and the version workflow may not be in full sync when it comes to the version of the code they are running.
| // syncRegisteredTaskQueueAsync syncs the routing config and version data to the new task queue. | ||
| // This method does not increment version data revision number. | ||
| // Note: task queue registration does not affect WorkerDeploymentInfo.RoutingConfigUpdateState hence we do not signal | ||
| // the deployment workflow about propagation completion. | ||
| // TODO: should RoutingConfigUpdateState become IN_PROGRESS while registration is propagating? Current thoughts: It | ||
| // doesn't seem useful because the following setCurrent/Ramping will set the status to IN_PROGRESS. The only case is if | ||
| // user is calling setCurrent/Ramping before registration passing IgnoreMissingTaskQueues, but in that case still the | ||
| // registration is likely not in control of the operator but is async all the way from the worker side. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lets discuss this on a call; have some concerns/thoughts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW - I think we should be a bit conservative here and have the routing config state to be in progress. I think convoluting different API calls further with an in-progress propagation is more error prone
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I thought more about this and I think we should do in the case that version is already ramping or current (but not generally). Updated my comment. I think it's fine to do it later since it's a very small edge case.
| // Ensure CurrentDeploymentVersion is populated from deprecated field if needed for backward compatibility | ||
| currentDeploymentVersion := d.State.RoutingConfig.CurrentDeploymentVersion | ||
| if currentDeploymentVersion == nil { | ||
| currentDeploymentVersion = worker_versioning.ExternalWorkerDeploymentVersionFromStringV31(d.State.RoutingConfig.CurrentVersion) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should just make our life simpler by always populating both values while the workflow CAN's. It's simpler since then, in all our API handlers, we can just focus on using the new value and not worry about workflow versions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will follow up in another PR
What changed?
Add revision number to Worker Deployment Version Data that is synced to Task Queues.
Why?
This allows making task queue registration also async, but more importantly, it prevent's race conditions between concurrent registrations, setCurrent/Ramping, drainage, and deletion.
How did you test it?
Potential risks
None.