Let soda scan do a "dry run" #2473

@csar398

Currently it's possible to examine a scan's queries by adding the --verbose flag to soda scan.

This is useful for seeing which queries were sent, but it would be great if you could also see which queries will be sent without actually executing them against the data source.

I'm thinking about a --dry-run flag for soda scan which would just return the rendered SQL queries. As a user, you could then get a cost estimate from the returned SQL (see e.g. https://cloud.google.com/bigquery/docs/best-practices-costs#perform-dry-run).

Is this something that could fit in your roadmap?

I've been looking at the soda-core codebase and would be interested in contributing such a feature. At the moment it's not clear to me where to start, since SQL queries are resolved over a succession of steps and only become available just before they are run against the data source. The scan logs would of course also need to be adapted to account for the empty query results of a dry run. So any advice would be great :)
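
To make the idea more concrete, here is a minimal sketch of the behaviour I have in mind. It does not use soda-core's actual internals: `QueryRunner`, `row_count_check`, and the "not evaluated" outcome are hypothetical names invented for illustration, and sqlite3 only stands in for a real data source. The point is that the rendered SQL is always recorded, execution is skipped when the dry-run switch is on, and the check has to cope with a missing result.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class QueryRunner:
    """Hypothetical stand-in for the layer that sends rendered SQL to a data source."""

    connection: sqlite3.Connection
    dry_run: bool = False
    rendered_queries: list = field(default_factory=list)

    def run(self, sql: str):
        # Always record the rendered SQL so it can be reported afterwards.
        self.rendered_queries.append(sql)
        if self.dry_run:
            # Dry run: skip execution and signal "no result" to the caller.
            return None
        return self.connection.execute(sql).fetchall()


def row_count_check(runner: QueryRunner, table: str) -> str:
    """Hypothetical check: passes when the table has at least one row."""
    rows = runner.run(f"SELECT COUNT(*) FROM {table}")
    if rows is None:
        # No query result in a dry run, so the check cannot be evaluated.
        return "not evaluated"
    return "pass" if rows[0][0] > 0 else "fail"


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER)")
    conn.execute("INSERT INTO orders VALUES (1)")

    runner = QueryRunner(conn, dry_run=True)
    print(row_count_check(runner, "orders"))
    print(runner.rendered_queries)
```

With something like this, the collected `rendered_queries` could be printed or handed to a data source's own dry-run facility for a cost estimate, while the scan logs would report the checks as not evaluated instead of passed or failed.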
