-
Notifications
You must be signed in to change notification settings - Fork 124
fix(DENG-10248): update shredder to support usage reporting derived #8589
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
fix(DENG-10248): update shredder to support usage reporting derived #8589
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not actually sure this test_config does anything or if these changes are even needed / useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The derived tables need to be added to the FakeClient in this file. This should be tested since it's easy to get wrong (as it is now)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just took a closer look at the tests and the setup isn't trivial so I can update the tests if you don't want to do it
| field=(USAGE_PROFILE_ID,) * len(sources[table.dataset_id]), | ||
| ): sources[table.dataset_id] | ||
| for table in glean_derived_tables | ||
| if any(field.name == USAGE_PROFILE_ID for field in table.schema) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Event though it's unlikely, I suggest adding a check for client_id as well to ensure it doesn't duplicate any table added above.
| if any(field.name == USAGE_PROFILE_ID for field in table.schema) | |
| if any(field.name == USAGE_PROFILE_ID for field in table.schema) | |
| and all(field.name != CLIENT_ID for field in table.schema) |
| field=(USAGE_PROFILE_ID,) * len(sources[table.dataset_id]), | ||
| ): sources[table.dataset_id] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would use the deletion request ping. The derived datasets needs to be added to usage_reporting_sources above
| field=(USAGE_PROFILE_ID,) * len(sources[table.dataset_id]), | |
| ): sources[table.dataset_id] | |
| field=(USAGE_PROFILE_ID,) * len(usage_reporting_sources[table.dataset_id]), | |
| ): usage_reporting_sources[table.dataset_id] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The derived tables need to be added to the FakeClient in this file. This should be tested since it's easy to get wrong (as it is now)
fix(DENG-10248): update shredder to support usage reporting derived
This change is to ensure usage_reporting derived datasets get shredded.