-
Notifications
You must be signed in to change notification settings - Fork 124
DSRE-1779 Add FxA monitoring query #6511
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
9433224 to
f39c372
Compare
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
|
yessss. Where is the setting for how often it runs? |
It's configured to run in bqetl_accounts_derived DAG, which is scheduled to run daily. DAG generation magic will make sure that this runs after MySQL tables are synced to BQ (expand |
| Note: because its source tables are overwritten daily, query used to | ||
| populate this table is not idempotent so this table should not be backfilled. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As it is, if someone re-runs a historical DAG run then historical data would be incorrectly overwritten with current data (which is highly probable since other normal daily-incremental ETLs exist in the bqetl_accounts_derived DAG).
I'd suggest using BigQuery's time travel feature in this ETL to select the data as it existed at data_interval_end so that any DAG run within the last ~7 days could potentially be backfilled, but trying to backfill further than time travel allows will fail outright rather than incorrectly overwriting data.
Here's an example of an ETL using time travel like that (though this doesn't need to be a script, and can remain a normal query ETL; a script was needed in that case because of the discrepancy between the monthly scheduling and the date partitioning setup).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, thanks for this idea!
| ( | ||
| SELECT | ||
| "accounts_linked_to_google" AS table_name, | ||
| COUNT(uid) AS total_rows |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there cases where uid is null in this table?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unlikely, IIUC this query has been running for a while in a custom environment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've run it by hand a few times but not "awhile". A NULL there would be a bug in the system. There shouldn't be any.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Usually the only reason to use COUNT(expression) rather than COUNT(*) is if the expression might be null and shouldn't be counted in that case.
Or if it was intended to count distinct uid values then it should be COUNT(DISTINCT uid).
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
8aa5367 to
c98a15f
Compare
This comment has been minimized.
This comment has been minimized.
c98a15f to
ba2bd7c
Compare
This comment has been minimized.
This comment has been minimized.
sql/moz-fx-data-shared-prod/accounts_backend_derived/monitoring_db_counts_v1/metadata.yaml
Outdated
Show resolved
Hide resolved
sql/moz-fx-data-shared-prod/accounts_backend_derived/monitoring_db_counts_v1/query.sql
Outdated
Show resolved
Hide resolved
| SELECT | ||
| "accounts_with_secondary_emails" AS table_name, | ||
| COUNT( | ||
| DISTINCT `moz-fx-data-shared-prod.accounts_db_external.fxa_accounts_v1`.uid | ||
| ) AS total_rows | ||
| FROM | ||
| `moz-fx-data-shared-prod.accounts_db_external.fxa_accounts_v1` FOR SYSTEM_TIME AS OF TIMESTAMP( | ||
| @as_of_date, | ||
| 'UTC' | ||
| ) | ||
| JOIN | ||
| `moz-fx-data-shared-prod.accounts_db_external.fxa_emails_v1` FOR SYSTEM_TIME AS OF TIMESTAMP( | ||
| @as_of_date, | ||
| 'UTC' | ||
| ) | ||
| ON `moz-fx-data-shared-prod.accounts_db_external.fxa_accounts_v1`.uid = `moz-fx-data-shared-prod.accounts_db_external.fxa_emails_v1`.uid | ||
| WHERE | ||
| `moz-fx-data-shared-prod.accounts_db_external.fxa_emails_v1`.isPrimary = FALSE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using table aliases would make this less verbose/more readable.
| SELECT | |
| "accounts_with_secondary_emails" AS table_name, | |
| COUNT( | |
| DISTINCT `moz-fx-data-shared-prod.accounts_db_external.fxa_accounts_v1`.uid | |
| ) AS total_rows | |
| FROM | |
| `moz-fx-data-shared-prod.accounts_db_external.fxa_accounts_v1` FOR SYSTEM_TIME AS OF TIMESTAMP( | |
| @as_of_date, | |
| 'UTC' | |
| ) | |
| JOIN | |
| `moz-fx-data-shared-prod.accounts_db_external.fxa_emails_v1` FOR SYSTEM_TIME AS OF TIMESTAMP( | |
| @as_of_date, | |
| 'UTC' | |
| ) | |
| ON `moz-fx-data-shared-prod.accounts_db_external.fxa_accounts_v1`.uid = `moz-fx-data-shared-prod.accounts_db_external.fxa_emails_v1`.uid | |
| WHERE | |
| `moz-fx-data-shared-prod.accounts_db_external.fxa_emails_v1`.isPrimary = FALSE | |
| SELECT | |
| "accounts_with_secondary_emails" AS table_name, | |
| COUNT(DISTINCT fxa_accounts_v1.uid) AS total_rows | |
| FROM | |
| `moz-fx-data-shared-prod.accounts_db_external.fxa_accounts_v1` AS fxa_accounts_v1 FOR SYSTEM_TIME AS OF TIMESTAMP( | |
| @as_of_date, | |
| 'UTC' | |
| ) | |
| JOIN | |
| `moz-fx-data-shared-prod.accounts_db_external.fxa_emails_v1` AS fxa_emails_v1 FOR SYSTEM_TIME AS OF TIMESTAMP( | |
| @as_of_date, | |
| 'UTC' | |
| ) | |
| ON fxa_accounts_v1.uid = fxa_emails_v1.uid | |
| WHERE | |
| fxa_emails_v1.isPrimary = FALSE |
(same goes for the similar subquery below)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would unfortunately fail on formatting. Sqlgot seems not to be able to handle table aliases combined with another expression following them (FOR SYSTEM_TIME here).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dang. I'd suggest reporting that SQLGlot issue, as they're very responsive and have fixed every issue I've reported to them (often the same day on main, to be included in the next release).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another option is if you don't quote the whole table identifier (e.g. `moz-fx-data-shared-prod`.accounts_db_external.fxa_accounts_v1) then BigQuery allows you to refer to just the final table name segment, like an automatic alias. (though I do generally prefer to quote the entire table identifier)
ba2bd7c to
cb9478e
Compare
This comment has been minimized.
This comment has been minimized.
…g_db_counts_v1/metadata.yaml Co-authored-by: Sean Rose <[email protected]>
This comment has been minimized.
This comment has been minimized.
…g_db_counts_v1/query.sql Co-authored-by: Sean Rose <[email protected]>
This comment has been minimized.
This comment has been minimized.
| COUNT(*) AS total_rows | ||
| FROM | ||
| `moz-fx-data-shared-prod.accounts_db_external.fxa_account_groups_v1` FOR SYSTEM_TIME AS OF TIMESTAMP( | ||
| @as_of_date, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of the other time travel expressions in this query should also use @as_of_date + 1 like the first one does, otherwise the data will be inconsistent.
Integration report for "s/@as_of_date,/@as_of_date+1,/g"
|
|
Thanks for landing. I see data in the table. I created mozilla/lookml-generator#1116 to be able to access it in Looker |
Supersedes mozilla/docker-etl#295
/cc @clouserw