### Number of Bits Expression { #numBitsExpression }
`BloomFilterAggregate` can be given **Number of Bits** (as an [Expression](Expression.md)) when [created](#creating-instance).
The number of bits expression is the [third](#third) expression (in this `TernaryLike` tree node).
## Number of Bits { #numBits }
```scala
numBits: Long
```

The `numBits` value [must be a positive value](#checkInputDataTypes).
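The number of bits is typically derived from the expected number of items and the target false-positive probability. A minimal sketch of the standard sizing formula (the same one Guava-style bloom filters use; the object and method names here are illustrative, not Spark's API):

```scala
object BloomFilterSizing {
  // Standard bloom filter sizing: numBits = -n * ln(p) / (ln 2)^2,
  // where n is the expected number of items and p the false-positive rate.
  def optimalNumOfBits(expectedItems: Long, fpp: Double): Long =
    math.ceil(-expectedItems * math.log(fpp) / (math.log(2) * math.log(2))).toLong
}
```

For example, one million items at a 3% false-positive rate needs roughly 7.3 million bits.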
`eval` is part of the [TypedImperativeAggregate](TypedImperativeAggregate.md#eval) abstraction.
`eval`[serializes](#serialize) the given `buffer` (unless the [cardinality](../bloom-filter-join/BloomFilter.md#cardinality) of this `BloomFilter` is `0` and `eval` returns `null`).
??? note "FIXME Why does `eval` return `null`?"
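The control flow can be sketched with the Spark types stubbed out (a toy model, not the actual implementation; `SimpleBloomFilter` is a stand-in for Spark's `BloomFilter`):

```scala
object EvalSketch {
  // Stand-in for Spark's BloomFilter with only the two members eval cares about.
  trait SimpleBloomFilter {
    def cardinality: Long          // number of items added so far
    def serialized: Array[Byte]    // stand-in for the serialize step
  }

  // Mirrors the documented behavior: an empty filter evaluates to null,
  // otherwise the serialized filter is returned.
  def eval(buffer: SimpleBloomFilter): Any =
    if (buffer.cardinality == 0) null
    else buffer.serialized
}
```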
## Serializing Aggregate Buffer { #serialize }
??? note "TypedImperativeAggregate"
```scala
serialize(
  obj: BloomFilter): Array[Byte]
```
`serialize` is part of the [TypedImperativeAggregate](TypedImperativeAggregate.md#serialize) abstraction.
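The round trip to a byte array can be sketched as follows, assuming a filter that can write itself to an `OutputStream` (as `org.apache.spark.util.sketch.BloomFilter.writeTo` does); the trait below is a stand-in, not Spark's class:

```scala
import java.io.{ByteArrayOutputStream, OutputStream}

object SerializeSketch {
  // Stand-in for a bloom filter that knows how to write itself out.
  trait WritableBloomFilter {
    def writeTo(out: OutputStream): Unit
  }

  // The aggregation buffer travels between executors as a plain byte array.
  def serialize(obj: WritableBloomFilter): Array[Byte] = {
    val out = new ByteArrayOutputStream()
    obj.writeTo(out)
    out.toByteArray
  }
}
```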
??? note "Two `serialize`s"

    There is another `serialize` (in `BloomFilterAggregate` companion object) that just makes unit testing easier.
## serializeAggregateBufferInPlace { #serializeAggregateBufferInPlace }

`serializeAggregateBufferInPlace` is a procedure (returns `Unit`) so _whatever happens inside, stays inside_ (paraphrasing the [former advertising slogan of Las Vegas, Nevada](https://idioms.thefreedictionary.com/what+happens+in+Vegas+stays+in+Vegas)).
`serializeAggregateBufferInPlace` [gets the aggregate buffer](#getBufferObject) from the given `buffer` and [serializes it](#serialize).
In the end, `serializeAggregateBufferInPlace` stores the serialized aggregate buffer back to the given `buffer` at [mutableAggBufferOffset](ImperativeAggregate.md#mutableAggBufferOffset).
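These two steps can be sketched with the row modeled as a plain array (a toy model; Spark operates on an `InternalRow`, and the `serialize` parameter here stands in for the [serialize](#serialize) step):

```scala
object InPlaceSketch {
  // Toy model: read the buffer object at the offset, serialize it, and
  // store the resulting bytes back at the same offset (hence Unit).
  def serializeAggregateBufferInPlace(
      row: Array[Any],
      mutableAggBufferOffset: Int,
      serialize: Any => Array[Byte]): Unit = {
    val bufferObject = row(mutableAggBufferOffset)
    row(mutableAggBufferOffset) = serialize(bufferObject)
  }
}
```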
---
`serializeAggregateBufferInPlace` is used when:
* `AggregatingAccumulator` is requested to [withBufferSerialized](../AggregatingAccumulator.md#withBufferSerialized)
* `AggregationIterator` is requested to [generateResultProjection](../aggregations/AggregationIterator.md#generateResultProjection)
* `ObjectAggregationMap` is requested to [dumpToExternalSorter](../aggregations/ObjectAggregationMap.md#dumpToExternalSorter)
---

# InjectRuntimeFilter
`InjectRuntimeFilter` is part of [InjectRuntimeFilter](../SparkOptimizer.md#InjectRuntimeFilter) fixed-point batch of rules.
!!! note "Runtime Filter"

    **Runtime Filter** can be a [BloomFilter](#hasBloomFilter) (with [spark.sql.optimizer.runtime.bloomFilter.enabled](../configuration-properties.md#spark.sql.optimizer.runtime.bloomFilter.enabled) enabled) or [InSubquery](#hasInSubquery) filter.

!!! note "Noop"

    `InjectRuntimeFilter` is a _noop_ (and does nothing) for the following cases:
## injectInSubqueryFilter { #injectInSubqueryFilter }

```scala
injectInSubqueryFilter(
  filterApplicationSideExp: Expression,
  filterApplicationSidePlan: LogicalPlan,
  filterCreationSideExp: Expression,
  filterCreationSidePlan: LogicalPlan): LogicalPlan
```
!!! note "The same `DataType`s"

    `injectInSubqueryFilter` requires that the [DataType](../expressions/Expression.md#dataType)s of the given `filterApplicationSideExp` and `filterCreationSideExp` are the same.
`injectInSubqueryFilter` creates an [Aggregate](../logical-operators/Aggregate.md) logical operator with the following:
Property | Value
---------|------
[Grouping Expressions](../logical-operators/Aggregate.md#groupingExpressions) | The given `filterCreationSideExp` expression
[Aggregate Expressions](../logical-operators/Aggregate.md#aggregateExpressions) | An `Alias` expression for the `filterCreationSideExp` expression (possibly [mayWrapWithHash](#mayWrapWithHash))
[Child Logical Operator](../logical-operators/Aggregate.md#child) | The given `filterCreationSidePlan` logical operator
`injectInSubqueryFilter` executes [ColumnPruning](../logical-optimizations/ColumnPruning.md) logical optimization on the `Aggregate` logical operator.
Unless the `Aggregate` logical operator [canBroadcastBySize](../JoinSelectionHelper.md#canBroadcastBySize), `injectInSubqueryFilter` returns the given `filterApplicationSidePlan` logical plan (and basically throws away all the work so far).
!!! note

    `injectInSubqueryFilter` skips the `InSubquery` filter if the size of the `Aggregate` is beyond [broadcast join threshold](../JoinSelectionHelper.md#canBroadcastBySize) and the semi-join will be a shuffle join, which is not worthwhile.
`injectInSubqueryFilter` creates an `InSubquery` expression with the following:
* The given `filterApplicationSideExp` (possibly [mayWrapWithHash](#mayWrapWithHash))
* [ListQuery](../expressions/ListQuery.md) expression with the `Aggregate`
In the end, `injectInSubqueryFilter` creates a `Filter` logical operator with the `InSubquery` expression (as the filter condition) and the given `filterApplicationSidePlan` (as the child).
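The overall rewrite can be sketched with a toy plan algebra (illustrative case classes, not Spark's operators): the creation side is aggregated on the creation-side key, and the application side is wrapped in a filter with an IN-subquery over that aggregate.

```scala
object InSubqueryRewriteSketch {
  sealed trait Plan
  final case class Relation(name: String) extends Plan
  // Distinct values of the creation-side key (grouping expression == output).
  final case class Aggregate(groupingExpr: String, child: Plan) extends Plan
  // applySideExpr IN (subquery), applied on top of the application side.
  final case class InSubqueryFilter(applySideExpr: String, subquery: Plan, child: Plan) extends Plan

  def injectInSubqueryFilter(
      filterApplicationSideExp: String,
      filterApplicationSidePlan: Plan,
      filterCreationSideExp: String,
      filterCreationSidePlan: Plan): Plan = {
    val aggregate = Aggregate(filterCreationSideExp, filterCreationSidePlan)
    InSubqueryFilter(filterApplicationSideExp, aggregate, filterApplicationSidePlan)
  }
}
```

This toy model omits the hashing (`mayWrapWithHash`), column pruning, and the broadcast-size bailout described above.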
!!! note

    `injectInSubqueryFilter` is used when `InjectRuntimeFilter` is requested to [injectFilter](#injectFilter) with the [spark.sql.optimizer.runtime.bloomFilter.enabled](../configuration-properties.md#spark.sql.optimizer.runtime.bloomFilter.enabled) configuration property disabled (unlike [spark.sql.optimizer.runtimeFilter.semiJoinReduction.enabled](../configuration-properties.md#spark.sql.optimizer.runtimeFilter.semiJoinReduction.enabled)).
`hasDynamicPruningSubquery` checks if there is a `Filter` logical operator with a [DynamicPruningSubquery](../expressions/DynamicPruningSubquery.md) expression on the `left` or `right` side (of a join).
## hasRuntimeFilter { #hasRuntimeFilter }
```scala
hasRuntimeFilter(
  left: LogicalPlan,
  right: LogicalPlan,
  leftKey: Expression,
  rightKey: Expression): Boolean
```
`hasRuntimeFilter` checks if there is already a runtime filter, i.e. a [bloom filter](#hasBloomFilter) (with [spark.sql.optimizer.runtime.bloomFilter.enabled](../configuration-properties.md#spark.sql.optimizer.runtime.bloomFilter.enabled) enabled) or an [InSubquery](#hasInSubquery) filter, on the `left` or `right` side (of a join).
## hasBloomFilter { #hasBloomFilter }
```scala
hasBloomFilter(
  left: LogicalPlan,
  right: LogicalPlan,
  leftKey: Expression,
  rightKey: Expression): Boolean
```
`hasBloomFilter` checks if [findBloomFilterWithExp](#findBloomFilterWithExp) finds a bloom filter on the `left` or `right` side (of a join).
`findBloomFilterWithExp` tries to find a `Filter` logical operator with a [BloomFilterMightContain](../expressions/BloomFilterMightContain.md) expression (and `XxHash64`) among the nodes of the given [LogicalPlan](../logical-operators/LogicalPlan.md).
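The search can be sketched as a recursive scan over a toy plan tree (illustrative classes, not Spark's; `might_contain` stands in for a `BloomFilterMightContain` condition):

```scala
object FindBloomFilterSketch {
  sealed trait Plan { def children: Seq[Plan] }
  final case class Scan(table: String) extends Plan {
    val children: Seq[Plan] = Nil
  }
  final case class FilterNode(condition: String, child: Plan) extends Plan {
    val children: Seq[Plan] = Seq(child)
  }

  // Return the first Filter whose condition looks like a bloom filter probe;
  // otherwise keep scanning the children.
  def findBloomFilter(plan: Plan): Option[FilterNode] = plan match {
    case f: FilterNode if f.condition.contains("might_contain") => Some(f)
    case other =>
      other.children.iterator.map(findBloomFilter).collectFirst { case Some(f) => f }
  }
}
```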