* [BloomFilterAggregate](../expressions/BloomFilterAggregate.md) expression (as an [aggregation buffer](../expressions/BloomFilterAggregate.md#createAggregationBuffer))
* `BloomFilterImpl` is requested to [mightContain](BloomFilterImpl.md#mightContain)
* `BloomFilterMightContain` is requested to [evaluate](../expressions/BloomFilterMightContain.md#eval) and [doGenCode](../expressions/BloomFilterMightContain.md#doGenCode)
`create` creates a [BloomFilterImpl](BloomFilterImpl.md) for the given `expectedNumItems`.
Unless the **False Positive Probability** (`fpp`) is given, `create` uses the [DEFAULT_FPP](#DEFAULT_FPP) value to [determine the optimal number of bits](#optimalNumOfBits).
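For reference, the sizing an `optimalNumOfBits`-style helper performs can be sketched with the standard Bloom-filter formula `m = -n * ln(p) / (ln 2)^2`. This is an illustration of the formula, not a verbatim copy of Spark's `org.apache.spark.util.sketch` sources:

```java
// Sketch of the standard Bloom-filter sizing formula: m = -n * ln(p) / (ln 2)^2.
// Illustrative only; it mirrors what an optimalNumOfBits-style helper computes.
public class BloomFilterSizing {

    // expectedNumItems = n (items to insert), fpp = p (target false positive probability)
    static long optimalNumOfBits(long expectedNumItems, double fpp) {
        return (long) (-expectedNumItems * Math.log(fpp) / (Math.log(2) * Math.log(2)));
    }
}
```

For 1,000,000 expected items at a 3% false positive probability this yields roughly 7.3 million bits (about 0.9 MB), which shows why a good `expectedNumItems` estimate matters.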
**docs/bloom-filter-join/BloomFilterImpl.md** (+13 −8)
* `BloomFilter` is requested to [create a BloomFilter](BloomFilter.md#create)
## mightContainLong { #mightContainLong }

??? note "BloomFilter"

    ```java
    boolean mightContainLong(
      long item)
    ```

    `mightContainLong` is part of the [BloomFilter](BloomFilter.md#mightContainLong) abstraction.
`mightContainLong` uses `Murmur3_x86_32` to generate two hashes of the given `item` with two different seeds: `0` and the result of the first hash.

`mightContainLong` requests the [BitArray](#bits) for the number of bits (`bitSize`).

In the end, `mightContainLong` checks whether the bit for the (combined) hashes is set (non-zero) in the [BitArray](#bits), up to [numHashFunctions](#numHashFunctions) times.

With all the bits checked and set, `mightContainLong` is positive. Otherwise, `mightContainLong` is negative.
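The scheme described above can be illustrated with a minimal, self-contained sketch. This is not Spark's `BloomFilterImpl` (which uses `Murmur3_x86_32` over the item's bytes); the hash function below is a hypothetical stand-in, but the combine-and-check structure follows the description:

```java
import java.util.BitSet;

// Minimal sketch of the double-hashing scheme described above. The hash
// function is a hypothetical stand-in (Spark uses Murmur3_x86_32 over the
// item's bytes); only the combine-and-check structure matches the text.
public class SimpleBloomFilter {
    private final BitSet bits;
    private final int bitSize;
    private final int numHashFunctions;

    public SimpleBloomFilter(int bitSize, int numHashFunctions) {
        this.bits = new BitSet(bitSize);
        this.bitSize = bitSize;
        this.numHashFunctions = numHashFunctions;
    }

    // Stand-in for a Murmur3-style hash of a long with a seed; a simple bit mixer.
    private int hash(long item, int seed) {
        long h = item ^ (seed * 0x9E3779B97F4A7C15L);
        h ^= h >>> 33;
        h *= 0xFF51AFD7ED558CCDL;
        h ^= h >>> 33;
        return (int) h;
    }

    public void putLong(long item) {
        int h1 = hash(item, 0);   // first hash: seed 0
        int h2 = hash(item, h1);  // second hash: seeded with the first hash
        int combined = h1;
        for (int i = 1; i <= numHashFunctions; i++) {
            bits.set(Math.floorMod(combined, bitSize));
            combined += h2;
        }
    }

    public boolean mightContainLong(long item) {
        int h1 = hash(item, 0);
        int h2 = hash(item, h1);
        int combined = h1;
        for (int i = 1; i <= numHashFunctions; i++) {
            if (!bits.get(Math.floorMod(combined, bitSize))) {
                return false;  // a single unset bit means "definitely absent"
            }
            combined += h2;
        }
        return true;  // all checked bits set: "might contain" (false positives possible)
    }
}
```

Because `putLong` sets exactly the bits `mightContainLong` checks, inserted items are always reported as possibly present; only non-inserted items can trigger false positives.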
**docs/bloom-filter-join/index.md** (+5 −3)
# Bloom Filter Join
**Bloom Filter Join** is an optimization of join queries that pre-filters one side of a join using a [BloomFilter](BloomFilter.md) or `InSubquery` predicate based on the values from the other side of the join.

Bloom Filter Join uses [BloomFilter](BloomFilter.md)s as runtime filters when the [spark.sql.optimizer.runtime.bloomFilter.enabled](../configuration-properties.md#spark.sql.optimizer.runtime.bloomFilter.enabled) configuration property is enabled.

Bloom Filter Join uses the [InjectRuntimeFilter](../logical-optimizations/InjectRuntimeFilter.md) logical optimization to inject up to [spark.sql.optimizer.runtimeFilter.number.threshold](../configuration-properties.md#spark.sql.optimizer.runtimeFilter.number.threshold) filters ([BloomFilter](BloomFilter.md)s or `InSubquery`s).
??? note "SPARK-32268"

    Bloom Filter Join was introduced in [SPARK-32268]({{ spark.jira }}/SPARK-32268).
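For example, the relevant properties could be set per-session in SQL (property names as on this page; the values shown are only an example):

```sql
-- Enable BloomFilter-based runtime filters (InSubquery is used when disabled)
SET spark.sql.optimizer.runtime.bloomFilter.enabled=true;
-- Cap how many runtime filters InjectRuntimeFilter may inject
SET spark.sql.optimizer.runtimeFilter.number.threshold=10;
```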
**docs/logical-optimizations/InjectRuntimeFilter.md** (+13 −5)

With [runtimeFilterSemiJoinReductionEnabled](../SQLConf.md#runtimeFilterSemiJoinReductionEnabled) enabled, and when the new and the initial logical plans are not equal, `apply` executes the [RewritePredicateSubquery](RewritePredicateSubquery.md) logical optimization with the new logical plan. Otherwise, `apply` returns the new logical plan.
`tryInjectRuntimeFilter` transforms the given [LogicalPlan](../logical-operators/LogicalPlan.md) with regard to [equi-joins](../ExtractEquiJoinKeys.md#unapply).

For every equi-join, `tryInjectRuntimeFilter` [injects a runtime filter](#injectFilter) (on the left side first, and on the right side if the left was not successful) when all of the following requirements are met:
1. A join side has no [DynamicPruningSubquery](#hasDynamicPruningSubquery) filter already
1. A join side has no [RuntimeFilter](#hasRuntimeFilter)
1. The left and right keys (pair-wise) are [simple expression](#isSimpleExpression)s
1. [canPruneLeft](../JoinSelectionHelper.md#canPruneLeft) or [canPruneRight](../JoinSelectionHelper.md#canPruneRight)
1. [filteringHasBenefit](#filteringHasBenefit)

`tryInjectRuntimeFilter` tries to inject up to [spark.sql.optimizer.runtimeFilter.number.threshold](../configuration-properties.md#spark.sql.optimizer.runtimeFilter.number.threshold) filters.
## Injecting Filter Operator { #injectFilter }

```scala
injectFilter(
  ...
  filterCreationSidePlan: LogicalPlan): LogicalPlan
```

With [spark.sql.optimizer.runtime.bloomFilter.enabled](../configuration-properties.md#spark.sql.optimizer.runtime.bloomFilter.enabled) enabled, `injectFilter` [injects a filter using a BloomFilter](#injectBloomFilter).

Otherwise, `injectFilter` [injects a filter using an InSubquery](#injectInSubqueryFilter).
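The decision amounts to a simple conditional, sketched here as pseudocode (not Spark's actual source; the parameter list is elided):

```scala
// Pseudocode sketch only; parameters elided
def injectFilter(...): LogicalPlan =
  if (bloomFilterEnabled)         // spark.sql.optimizer.runtime.bloomFilter.enabled
    injectBloomFilter(...)        // BloomFilter-based runtime filter
  else
    injectInSubqueryFilter(...)   // InSubquery-based runtime filter
```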
**docs/spark-sql-Dataset-basic-actions.md** (+4 −0)

---
title: Basic Actions
---
# Dataset API — Basic Actions
**Basic actions** are a set of operators (_methods_) of the [Dataset API](spark-sql-dataset-operators.md) for transforming a `Dataset` into a session-scoped or global temporary view and _other basic actions_ (FIXME).
**docs/spark-sql-dataset-operators.md** (+4 −0)

---
title: Operators
---
# Dataset API — Dataset Operators
Dataset API is a [set of operators](#methods) with typed and untyped transformations, and actions to work with a structured query (as a [Dataset](Dataset.md)) as a whole.