chore: Add Iceberg TPC-H benchmarking scripts #3294
base: main
Add scripts to benchmark TPC-H queries against Iceberg tables using Comet's native iceberg-rust integration:
- create-iceberg-tpch.py: Convert Parquet TPC-H data to Iceberg tables
- tpcbench-iceberg.py: Run TPC-H queries against Iceberg catalog tables
- comet-tpch-iceberg.sh: Shell script to run the benchmark with Comet

Also updates README.md with Iceberg benchmarking documentation.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
…h.py

Merge tpcbench-iceberg.py into tpcbench.py using mutually exclusive args:
- --data for Parquet files
- --catalog/--database for Iceberg tables

Co-Authored-By: Claude Opus 4.5 <[email protected]>
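The mutually exclusive arguments described in this commit could be wired up roughly as follows. This is a minimal sketch, not the actual contents of tpcbench.py: the function names `build_parser` and `parse_source` are hypothetical, and the rule that `--catalog` and `--database` must be given together is an assumption made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Illustrative parser only; the real tpcbench.py may define more options.
    parser = argparse.ArgumentParser(description="TPC benchmark runner (sketch)")
    # Exactly one data source must be chosen: Parquet files or an Iceberg catalog.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--data", help="Path to Parquet data files")
    source.add_argument("--catalog", help="Iceberg catalog name")
    # argparse groups cannot express "these two go together", so --database
    # is validated by hand in parse_source().
    parser.add_argument("--database", help="Iceberg database (used with --catalog)")
    return parser


def parse_source(argv):
    """Return ("parquet", path) or ("iceberg", "catalog.database")."""
    parser = build_parser()
    args = parser.parse_args(argv)
    # Assumption: --catalog and --database are only meaningful as a pair.
    if bool(args.catalog) != bool(args.database):
        parser.error("--catalog and --database must be used together")
    if args.catalog:
        return ("iceberg", f"{args.catalog}.{args.database}")
    return ("parquet", args.data)
```

Passing both `--data` and `--catalog` makes argparse exit with a usage error, so the benchmark never has to guess which source was intended.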
@mbutrovich I can now run TPC-H w/ Iceberg native scan locally

Resolve conflict in tpcbench.py by combining:
- Upstream: --format and --options for multiple file formats
- Branch: --catalog and --database for Iceberg tables

Co-Authored-By: Claude Opus 4.5 <[email protected]>
dev/benchmarks/README.md
Outdated
    $SPARK_HOME/bin/spark-submit \
        --master $SPARK_MASTER \
        --jars $ICEBERG_JAR \
It should work either way, but this doesn't match the usage in create-iceberg-tpch.py. There we use --packages, here we're defining the jar. Both should work, I think, but maybe best to be consistent.
dev/benchmarks/README.md
Outdated
        --conf spark.cores.max=8 \
        --conf spark.executor.memory=16g \
        --conf spark.sql.catalog.local=org.apache.iceberg.spark.SparkCatalog \
        --conf spark.sql.catalog.local.type=hadoop \
This hardcodes the catalog. Above you have ICEBERG_CATALOG=${ICEBERG_CATALOG:-local}. I'd be consistent.
dev/benchmarks/README.md
Outdated
        --conf spark.sql.catalog.local.warehouse=$ICEBERG_WAREHOUSE \
        create-iceberg-tpch.py \
        --parquet-path $TPCH_DATA \
        --catalog local \
Same hardcoded catalog.
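To illustrate the consistency point in these two comments, here is a small sketch of deriving every catalog-related setting from a single catalog name, so that the `spark.sql.catalog.<name>.*` keys and the `--catalog` argument cannot drift apart. The helper names `iceberg_catalog_conf` and `to_spark_submit_args` are hypothetical and are not part of the PR; the configuration keys and the `local` default are taken from the snippets under review.

```python
import os


def iceberg_catalog_conf(warehouse, catalog=None):
    """Build Spark settings for a Hadoop-type Iceberg catalog.

    Mirrors the shell default ICEBERG_CATALOG=${ICEBERG_CATALOG:-local}:
    an unset or empty variable falls back to "local".
    """
    name = catalog or os.environ.get("ICEBERG_CATALOG") or "local"
    prefix = f"spark.sql.catalog.{name}"
    return {
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "hadoop",
        f"{prefix}.warehouse": warehouse,
    }


def to_spark_submit_args(conf):
    """Flatten a settings dict into repeated --conf key=value arguments."""
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args
```

For example, `to_spark_submit_args(iceberg_catalog_conf("/tmp/warehouse", catalog="demo"))` yields `--conf` pairs whose keys all start with `spark.sql.catalog.demo`, with no literal `local` left to fall out of sync.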
Codecov Report
✅ All modified and coverable lines are covered by tests.

@@             Coverage Diff              @@
##               main    #3294      +/-   ##
============================================
+ Coverage     56.12%   59.95%   +3.82%
- Complexity      976     1473     +497
============================================
  Files           119      175      +56
  Lines         11743    16167    +4424
  Branches       2251     2682     +431
============================================
+ Hits           6591     9693    +3102
- Misses         4012     5126    +1114
- Partials       1140     1348     +208
- Use --packages instead of --jars for table creation to match create-iceberg-tpch.py usage
- Use $ICEBERG_CATALOG variable instead of hardcoding 'local' in spark.sql.catalog config to be consistent with comet-tpch-iceberg.sh
- Clarify that JAR download is only needed for benchmark execution

Co-Authored-By: Claude Opus 4.5 <[email protected]>

Thanks @mbutrovich! I pushed a commit to address the feedback.
Summary
- create-iceberg-tpch.py: Convert Parquet TPC-H data to Iceberg tables
- tpcbench-iceberg.py: Run TPC-H queries against Iceberg catalog tables
- comet-tpch-iceberg.sh: Shell script to run the benchmark with Comet

Test plan
- Run create-iceberg-tpch.py to create Iceberg tables from Parquet data
- Run comet-tpch-iceberg.sh and verify CometIcebergNativeScanExec appears in plans

🤖 Generated with Claude Code
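The second test-plan step (checking that CometIcebergNativeScanExec appears in plans) could be automated along these lines. This is a hedged sketch rather than anything in the PR: the helper names are hypothetical, and it assumes the DataFrame's `explain()` prints the plan to Python's standard output, as PySpark's `DataFrame.explain()` does.

```python
import contextlib
import io

# Operator name taken from the test plan above.
NATIVE_SCAN_OPERATOR = "CometIcebergNativeScanExec"


def explain_text(df) -> str:
    """Capture whatever df.explain() prints and return it as a string."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        df.explain()
    return buffer.getvalue()


def uses_native_iceberg_scan(df) -> bool:
    """True if the printed plan mentions the native Iceberg scan operator."""
    return NATIVE_SCAN_OPERATOR in explain_text(df)
```

In a benchmark run this could be called on the DataFrame for each query, for instance `uses_native_iceberg_scan(spark.sql(query_text))`, logging a warning for any query whose plan does not mention the native scan operator.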