
Commit 6e70f76

Storage-Partitioned Joins (supported in Iceberg, not in Delta)
1 parent 0a990a4 commit 6e70f76

6 files changed: +84 −7 lines changed

docs/SQLConf.md

Lines changed: 8 additions & 0 deletions

@@ -1053,6 +1053,14 @@ Used when `CacheManager` is requested to [cache a structured query](CacheManager
 
 Used when [Aggregation](execution-planning-strategies/Aggregation.md) execution planning strategy is executed (and uses `AggUtils` to [create an aggregation physical operator](aggregations/AggUtils.md#createAggregate)).
 
+## <span id="V2_BUCKETING_PARTIALLY_CLUSTERED_DISTRIBUTION_ENABLED"> v2BucketingPartiallyClusteredDistributionEnabled { #v2BucketingPartiallyClusteredDistributionEnabled }
+
+[spark.sql.sources.v2.bucketing.partiallyClusteredDistribution.enabled](configuration-properties.md#spark.sql.sources.v2.bucketing.partiallyClusteredDistribution.enabled)
+
+## <span id="V2_BUCKETING_PUSH_PART_VALUES_ENABLED"> v2BucketingPushPartValuesEnabled { #v2BucketingPushPartValuesEnabled }
+
+[spark.sql.sources.v2.bucketing.pushPartValues.enabled](configuration-properties.md#spark.sql.sources.v2.bucketing.pushPartValues.enabled)
+
 ## <span id="VARIABLE_SUBSTITUTE_ENABLED"><span id="variableSubstituteEnabled"><span id="spark.sql.variable.substitute"> variableSubstituteEnabled
 
 [spark.sql.variable.substitute](configuration-properties.md#spark.sql.variable.substitute)

docs/configuration-properties.md

Lines changed: 17 additions & 1 deletion

@@ -1282,7 +1282,7 @@ Used when:
 
 **spark.sql.sources.v2.bucketing.partiallyClusteredDistribution.enabled**
 
-During a Storage-Partitioned Join, whether to allow input partitions to be partially clustered, when both sides of the join are of `KeyGroupedPartitioning`.
+During a [Storage-Partitioned Join](storage-partitioned-joins/index.md), whether to allow input partitions to be partially clustered, when both sides of the join are of [KeyGroupedPartitioning](connector/KeyGroupedPartitioning.md).
 
 Default: `false`
 
@@ -1292,6 +1292,14 @@ This is an optimization on skew join and can help to reduce data skewness when c
 
 Requires both [spark.sql.sources.v2.bucketing.enabled](#spark.sql.sources.v2.bucketing.enabled) and [spark.sql.sources.v2.bucketing.pushPartValues.enabled](#spark.sql.sources.v2.bucketing.pushPartValues.enabled) to be enabled
 
+Use [SQLConf.v2BucketingPartiallyClusteredDistributionEnabled](SQLConf.md#v2BucketingPartiallyClusteredDistributionEnabled) for the current value
+
+Used when:
+
+* `BatchScanExec` physical operator is requested for the [input RDD](physical-operators/BatchScanExec.md#inputRDD)
+* `DataSourceV2ScanExecBase` physical operator is requested for [groupPartitions](physical-operators/DataSourceV2ScanExecBase.md#groupPartitions)
+* [EnsureRequirements](physical-optimizations/EnsureRequirements.md) physical optimization is executed (to [checkKeyGroupCompatible](physical-optimizations/EnsureRequirements.md#checkKeyGroupCompatible))
+
 ### <span id="V2_BUCKETING_PUSH_PART_VALUES_ENABLED"> v2.bucketing.pushPartValues.enabled { #spark.sql.sources.v2.bucketing.pushPartValues.enabled }
 
 **spark.sql.sources.v2.bucketing.pushPartValues.enabled**
@@ -1303,6 +1311,14 @@ Default: `false`
 When enabled, if both sides of a join are of `KeyGroupedPartitioning` and they share compatible partition keys, even if they don't have the exact same partition values, Spark will calculate a superset of partition values and push that information down to the scan nodes, which will use empty partitions for the missing partition values on either side.
 This could help to eliminate unnecessary shuffles.
 
+Use [SQLConf.v2BucketingPushPartValuesEnabled](SQLConf.md#v2BucketingPushPartValuesEnabled) for the current value
+
+Used when:
+
+* `DataSourceV2ScanExecBase` physical operator is requested to [groupPartitions](physical-operators/DataSourceV2ScanExecBase.md#groupPartitions)
+* `BatchScanExec` physical operator is requested for the [inputRDD](physical-operators/BatchScanExec.md#inputRDD)
+* `EnsureRequirements` physical optimization is requested to [checkKeyGroupCompatible](physical-optimizations/EnsureRequirements.md#checkKeyGroupCompatible)
+
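The "superset of partition values" behavior described above can be sketched in plain Scala. This is a hypothetical model for illustration only (not Spark's actual implementation): each side's partitions are aligned to the union of both sides' partition values, with empty partitions filling the gaps so the sides stay co-partitioned.

```scala
// Hypothetical model of spark.sql.sources.v2.bucketing.pushPartValues.enabled
// (not Spark's internals): align both join sides to the superset of their
// partition values, using empty partitions for values missing on either side.
object PushPartValuesSketch {
  def alignPartitions[K: Ordering, R](
      left: Map[K, Seq[R]],
      right: Map[K, Seq[R]]): Seq[(K, Seq[R], Seq[R])] = {
    // Superset of the partition values seen on either side
    val allValues = (left.keySet ++ right.keySet).toSeq.sorted
    // A value missing on one side becomes an empty partition there,
    // so the join can proceed partition-by-partition without a shuffle.
    allValues.map(k => (k, left.getOrElse(k, Nil), right.getOrElse(k, Nil)))
  }
}
```

For example, with left partition values {1, 2} and right partition values {2, 3}, both sides end up aligned on {1, 2, 3}, with an empty right partition for 1 and an empty left partition for 3.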
 ## <span id="spark.sql.objectHashAggregate.sortBased.fallbackThreshold"> spark.sql.objectHashAggregate.sortBased.fallbackThreshold
 
 **(internal)** The number of entries in an in-memory hash map (to store aggregation buffers per grouping keys) before [ObjectHashAggregateExec](physical-operators/ObjectHashAggregateExec.md) ([ObjectAggregationIterator](aggregations/ObjectAggregationIterator.md#processInputs), precisely) falls back to sort-based aggregation

docs/physical-operators/SortMergeJoinExec.md

Lines changed: 4 additions & 0 deletions

@@ -1,3 +1,7 @@
+---
+title: SortMergeJoinExec
+---
+
 # SortMergeJoinExec Physical Operator
 
 `SortMergeJoinExec` is a [shuffle-based join physical operator](ShuffledJoin.md) for [sort-merge join](#doExecute) (with the [left join keys](#leftKeys) being [orderable](../expressions/RowOrdering.md#isorderable)).

docs/physical-optimizations/EnsureRequirements.md

Lines changed: 22 additions & 0 deletions

@@ -83,6 +83,28 @@ ensureDistributionAndOrdering(
 
 `ensureDistributionAndOrdering` is...FIXME
 
+### checkKeyGroupCompatible { #checkKeyGroupCompatible }
+
+```scala
+checkKeyGroupCompatible(
+  left: SparkPlan,
+  right: SparkPlan,
+  joinType: JoinType,
+  requiredChildDistribution: Seq[Distribution]): Option[Seq[SparkPlan]]
+checkKeyGroupCompatible(
+  parent: SparkPlan,
+  left: SparkPlan,
+  right: SparkPlan,
+  requiredChildDistribution: Seq[Distribution]): Option[Seq[SparkPlan]] // (1)!
+```
+
+1. Uses the `JoinType` of either [SortMergeJoinExec](../physical-operators/SortMergeJoinExec.md) or [ShuffledHashJoinExec](../physical-operators/ShuffledHashJoinExec.md) physical operator
+
+!!! note
+    Only [SortMergeJoinExec](../physical-operators/SortMergeJoinExec.md) and [ShuffledHashJoinExec](../physical-operators/ShuffledHashJoinExec.md) physical operators are considered.
+
+`checkKeyGroupCompatible`...FIXME
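A toy model may help fix intuition for what this check decides. The sketch below is hypothetical (the names `Partitioning`, `KeyGrouped`, and `keyGroupCompatible` are made up; the real `checkKeyGroupCompatible` inspects `SparkPlan` children and required distributions): based on the property descriptions, both join sides must report key-grouped partitioning over compatible grouping expressions for the join to qualify.

```scala
// Hypothetical, simplified model of a key-group compatibility check
// (illustration only; not Spark's EnsureRequirements implementation).
sealed trait Partitioning
case class KeyGrouped(expressions: Seq[String]) extends Partitioning
case object Unknown extends Partitioning

def keyGroupCompatible(left: Partitioning, right: Partitioning): Boolean =
  (left, right) match {
    // Both sides key-grouped over the same expressions: shuffle can be avoided
    case (KeyGrouped(l), KeyGrouped(r)) => l == r
    // Anything else: fall back to requiring an exchange
    case _ => false
  }
```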
 ## OptimizeSkewedJoin { #OptimizeSkewedJoin }
 
 `EnsureRequirements` is used to create an [OptimizeSkewedJoin](OptimizeSkewedJoin.md) physical optimization.
docs/storage-partitioned-joins/index.md

Lines changed: 30 additions & 3 deletions

@@ -1,12 +1,39 @@
 # Storage-Partitioned Joins
 
-**Storage-Partitioned Joins** (_SPJ_) are a new type of [join](../joins.md) in Spark SQL that use the existing storage layout for a partitioned join to avoid expensive shuffles (similarly to [Bucketing](../bucketing/index.md)).
+**Storage-Partitioned Join** (_SPJ_) is a new type of [join](../joins.md) in Spark SQL that uses the existing storage layout for a partitioned join to avoid expensive shuffles (similarly to [Bucketing](../bucketing/index.md)).
 
 !!! note
     The Storage-Partitioned Joins feature was added in Apache Spark 3.3.0 ([\[SPARK-37375\] Umbrella: Storage Partitioned Join (SPJ)]({{ spark.jira }}/SPARK-37375)).
 
-Storage-Partitioned Join is meant mainly, if not exclusively, for [Spark SQL connectors](../connector/index.md) (_v2 data sources_).
+Storage-Partitioned Join is based on [KeyGroupedPartitioning](../connector/KeyGroupedPartitioning.md) to determine partitions.
+
+Out of the available built-in [DataSourceV2ScanExecBase](../physical-operators/DataSourceV2ScanExecBase.md) physical operators, only [BatchScanExec](../physical-operators/BatchScanExec.md) supports storage-partitioned joins.
+
+Storage-Partitioned Join is meant for [Spark SQL connectors](../connector/index.md) (yet there are none built-in at the moment).
 
 Storage-Partitioned Join was proposed in this [SPIP](https://docs.google.com/document/d/1foTkDSM91VxKgkEcBMsuAvEjNybjja-uHk-r3vtXWFE).
 
-Storage-Partitioned Join uses [KeyGroupedPartitioning](../connector/KeyGroupedPartitioning.md) to determine partitions.
+!!! note
+    It [appears](../physical-optimizations/EnsureRequirements.md#checkKeyGroupCompatible) that [SortMergeJoinExec](../physical-operators/SortMergeJoinExec.md) and [ShuffledHashJoinExec](../physical-operators/ShuffledHashJoinExec.md) physical operators are the only candidates for Storage-Partitioned Joins.
+
+## Configuration Properties
+
+* [spark.sql.sources.v2.bucketing.enabled](../configuration-properties.md#spark.sql.sources.v2.bucketing.enabled)
+* [spark.sql.sources.v2.bucketing.pushPartValues.enabled](../configuration-properties.md#spark.sql.sources.v2.bucketing.pushPartValues.enabled)
+* [spark.sql.sources.v2.bucketing.partiallyClusteredDistribution.enabled](../configuration-properties.md#spark.sql.sources.v2.bucketing.partiallyClusteredDistribution.enabled)
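These properties are typically enabled together before running a storage-partitioned join. A minimal sketch follows; the `demo` catalog and table names are hypothetical, and it assumes a DataSource V2 connector whose scans report `KeyGroupedPartitioning` (e.g. Apache Iceberg, per the section below):

```scala
// Hypothetical setup (illustration only): catalog and table names are made up.
spark.conf.set("spark.sql.sources.v2.bucketing.enabled", "true")
spark.conf.set("spark.sql.sources.v2.bucketing.pushPartValues.enabled", "true")
spark.conf.set("spark.sql.sources.v2.bucketing.partiallyClusteredDistribution.enabled", "true")

// If both scans report KeyGroupedPartitioning over compatible partition keys,
// the join below can be planned without an exchange (shuffle).
val joined = spark.table("demo.db.orders")
  .join(spark.table("demo.db.customers"), Seq("customer_id"))
joined.explain() // inspect the plan for the absence of Exchange nodes
```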
+## Apache Iceberg
+
+Storage-Partitioned Join is supported in [Apache Iceberg 1.2.0](https://iceberg.apache.org/releases/#121-release):
+
+> Added support for storage partition joins to improve read and write performance ([#6371](https://github.com/apache/iceberg/pull/6371))
+
+## Delta Lake
+
+Storage-Partitioned Join is not supported in Delta Lake yet (as per [this feature request](https://github.com/delta-io/delta/issues/1698)).
+
+## Learn More
+
+1. [What's new in Apache Spark 3.3 - joins](https://www.waitingforcode.com/apache-spark-sql/what-new-apache-spark-3.3-joins/read) by Bartosz Konieczny
+1. (video) [Storage-Partitioned Join for Apache Spark](https://youtu.be/ioLeHZDMSuU)
+1. (video) [Eliminating Shuffles in Delete Update, and Merge](https://youtu.be/AIZjy6_K0ws)

mkdocs.yml

Lines changed: 3 additions & 3 deletions

@@ -163,7 +163,6 @@ nav:
   - ... | bloom-filter-join/**.md
   - ... | bucketing/**.md
   - ... | cache-serialization/**.md
-  - ... | storage-partitioned-joins/**.md
   - Catalog Plugin API:
     - connector/catalog/index.md
     - CatalogExtension: connector/catalog/CatalogExtension.md
@@ -224,10 +223,11 @@ nav:
   - Partition File Metadata Caching:
     - partition-file-metadata-caching/index.md
   # FIXME Rename to spark-connect?
-  - ... | connect/**.md
   - ... | runtime-filtering/**.md
+  - ... | connect/**.md
   - ... | thrift-server/**.md
-  - Statistics: new-and-noteworthy/statistics.md
+  - new-and-noteworthy/statistics.md
+  - ... | storage-partitioned-joins/**.md
   - ... | subexpression-elimination/**.md
   - ... | subqueries/**.md
   - ... | table-valued-functions/**.md
