Skip to content

Conversation

@kik-kik
Copy link
Contributor

@kik-kik kik-kik commented Dec 11, 2025

fix(DENG-10248): update shredder to support usage reporting derived

This change is to ensure usage_reporting derived datasets get shredded.

@kik-kik kik-kik self-assigned this Dec 11, 2025
@kik-kik kik-kik requested a review from a team as a code owner December 11, 2025 16:18
@kik-kik kik-kik added the bug Something isn't working label Dec 11, 2025
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not actually sure this test_config does anything or if these changes are even needed / useful.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The derived tables need to be added to the FakeClient in this file. This should be tested since it's easy to get wrong (as it is now)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just took a closer look at the tests and the setup isn't trivial so I can update the tests if you don't want to do it

field=(USAGE_PROFILE_ID,) * len(sources[table.dataset_id]),
): sources[table.dataset_id]
for table in glean_derived_tables
if any(field.name == USAGE_PROFILE_ID for field in table.schema)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Event though it's unlikely, I suggest adding a check for client_id as well to ensure it doesn't duplicate any table added above.

Suggested change
if any(field.name == USAGE_PROFILE_ID for field in table.schema)
if any(field.name == USAGE_PROFILE_ID for field in table.schema)
and all(field.name != CLIENT_ID for field in table.schema)

Comment on lines +984 to +985
field=(USAGE_PROFILE_ID,) * len(sources[table.dataset_id]),
): sources[table.dataset_id]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would use the deletion request ping. The derived datasets needs to be added to usage_reporting_sources above

Suggested change
field=(USAGE_PROFILE_ID,) * len(sources[table.dataset_id]),
): sources[table.dataset_id]
field=(USAGE_PROFILE_ID,) * len(usage_reporting_sources[table.dataset_id]),
): usage_reporting_sources[table.dataset_id]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The derived tables need to be added to the FakeClient in this file. This should be tested since it's easy to get wrong (as it is now)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants