
Commit d932d98: FileTable

Parent: 21c8bc2

File tree: 9 files changed (+109, -48 lines)


docs/connector/SupportsRead.md

Lines changed: 1 addition & 1 deletion

@@ -23,7 +23,7 @@ Used when:
 
 ## Implementations
 
-* [FileTable](FileTable.md)
+* [FileTable](../datasources/FileTable.md)
* `JDBCTable`
 * [KafkaTable](../kafka/KafkaTable.md)
 * `MemoryStreamTable` ([Spark Structured Streaming]({{ book.structured_streaming }}/datasources/memory))

docs/connector/SupportsWrite.md

Lines changed: 1 addition & 1 deletion

@@ -24,7 +24,7 @@ Used when:
 ## Implementations
 
 * ConsoleTable (Spark Structured Streaming)
-* [FileTable](FileTable.md)
+* [FileTable](../datasources/FileTable.md)
 * ForeachWriterTable (Spark Structured Streaming)
 * [KafkaTable](../kafka/KafkaTable.md)
 * MemorySink (Spark Structured Streaming)

docs/connector/Table.md

Lines changed: 1 addition & 1 deletion

@@ -78,7 +78,7 @@ Used when:
 ## Implementations
 
 * `ConsoleTable` (Spark Structured Streaming)
-* [FileTable](FileTable.md)
+* [FileTable](../datasources/FileTable.md)
 * `ForeachWriterTable` (Spark Structured Streaming)
 * [KafkaTable](../kafka/KafkaTable.md)
 * `MemorySink` (Spark Structured Streaming)
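
The three connector pages touched above describe mix-in interfaces: a `FileTable` is a `Table` that also mixes in `SupportsRead` and `SupportsWrite`. A minimal sketch of that composition with toy traits (the names mirror the docs, but this is not Spark's actual `org.apache.spark.sql.connector` API):

```scala
// Toy model of the connector abstraction: a Table with optional
// SupportsRead / SupportsWrite mix-ins (illustrative only, not Spark's API).
trait Table {
  def name: String
  def capabilities: Set[String]
}

trait SupportsRead extends Table {
  def newScanBuilder(options: Map[String, String]): String =
    s"ScanBuilder for $name"
}

trait SupportsWrite extends Table {
  def newWriteBuilder(options: Map[String, String]): String =
    s"WriteBuilder for $name"
}

// A FileTable-like class supports reading and writing in one type.
class FileLikeTable(val name: String) extends SupportsRead with SupportsWrite {
  val capabilities: Set[String] = Set("BATCH_READ", "BATCH_WRITE", "TRUNCATE")
}

val t = new FileLikeTable("parquet")
println(t.newScanBuilder(Map.empty))  // ScanBuilder for parquet
println(t.capabilities)
```

Because both mix-ins extend `Table`, one `FileLikeTable` instance can be handed to code that only needs the read side or only the write side.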

docs/datasources/FileIndex.md

Lines changed: 1 addition & 1 deletion

@@ -59,7 +59,7 @@ Used when:
 * `DataSource` is requested to [getOrInferFileFormatSchema](../DataSource.md#getOrInferFileFormatSchema) and [resolve a FileFormat-based relation](../DataSource.md#resolveRelation)
 * `FallBackFileSourceV2` logical resolution rule is executed
 * [FileScanBuilder](FileScanBuilder.md) is created
-* `FileTable` is requested for [dataSchema](../connector/FileTable.md#dataSchema) and [partitioning](../connector/FileTable.md#partitioning)
+* `FileTable` is requested for [dataSchema](FileTable.md#dataSchema) and [partitioning](FileTable.md#partitioning)
 
 ### <span id="refresh"> Refreshing Cached File Listings

docs/datasources/FileTable.md

Lines changed: 54 additions & 28 deletions

@@ -1,28 +1,28 @@
 # FileTable
 
-`FileTable` is an [extension](#contract) of the [Table](Table.md) abstraction for [file-backed tables](#implementations) with support for [read](SupportsRead.md) and [write](SupportsWrite.md).
+`FileTable` is an [extension](#contract) of the [Table](../connector/Table.md) abstraction for [file-based tables](#implementations) with support for [read](../connector/SupportsRead.md) and [write](../connector/SupportsWrite.md).
 
 ## Contract
 
-### <span id="fallbackFileFormat"> fallbackFileFormat
+### <span id="fallbackFileFormat"> Fallback FileFormat
 
 ```scala
 fallbackFileFormat: Class[_ <: FileFormat]
 ```
 
-Fallback V1 [FileFormat](../datasources/FileFormat.md)
+Fallback V1 [FileFormat](FileFormat.md)
 
 Used when `FallBackFileSourceV2` extended resolution rule is executed (to resolve an `InsertIntoStatement` with a [DataSourceV2Relation](../logical-operators/DataSourceV2Relation.md) with a `FileTable`)
 
-### <span id="formatName"> formatName
+### <span id="formatName"> Format Name
 
 ```scala
 formatName: String
 ```
 
 Name of the file table (_format_)
 
-### <span id="inferSchema"> inferSchema
+### <span id="inferSchema"> Schema Inference
 
 ```scala
 inferSchema(
@@ -53,7 +53,7 @@ Default: All [DataType](../types/DataType.md)s are supported by default
 * `CSVTable`
 * `JsonTable`
 * `OrcTable`
-* [ParquetTable](../datasources/parquet/ParquetTable.md)
+* [ParquetTable](parquet/ParquetTable.md)
 * `TextTable`
 
 ## Creating Instance
@@ -73,15 +73,17 @@ Default: All [DataType](../types/DataType.md)s are supported by default
 capabilities: java.util.Set[TableCapability]
 ```
 
-`capabilities` are the following [TableCapabilities](TableCapability.md):
+`capabilities` is part of the [Table](../connector/Table.md#capabilities) abstraction.
 
-* [BATCH_READ](TableCapability.md#BATCH_READ)
-* [BATCH_WRITE](TableCapability.md#BATCH_WRITE)
-* [TRUNCATE](TableCapability.md#TRUNCATE)
+---
 
-`capabilities` is part of the [Table](Table.md#capabilities) abstraction.
+`capabilities` are the following [TableCapabilities](../connector/TableCapability.md):
 
-## <span id="dataSchema"> dataSchema
+* [BATCH_READ](../connector/TableCapability.md#BATCH_READ)
+* [BATCH_WRITE](../connector/TableCapability.md#BATCH_WRITE)
+* [TRUNCATE](../connector/TableCapability.md#TRUNCATE)
+
+## <span id="dataSchema"> Data Schema
 
 ```scala
 dataSchema: StructType
@@ -92,47 +94,71 @@ dataSchema: StructType
 ??? note "Lazy Value"
     `dataSchema` is a Scala **lazy value** to guarantee that the code to initialize it is executed once only (when accessed for the first time) and cached afterwards.
 
+---
+
 `dataSchema` is used when:
 
 * `FileTable` is requested for a [schema](#schema)
 * _others_ (in [FileTables](#implementations))
 
-## fileIndex
+## <span id="partitioning"> Partitioning
 
 ```scala
-fileIndex: PartitioningAwareFileIndex
+partitioning: Array[Transform]
 ```
 
-`fileIndex`...FIXME
+`partitioning` is part of the [Table](../connector/Table.md#partitioning) abstraction.
+
+---
 
-`fileIndex` is used when...FIXME
+`partitioning`...FIXME
 
-## partitioning
+## <span id="properties"> Properties
 
 ```scala
-partitioning: Array[Transform]
+properties: util.Map[String, String]
 ```
 
-`partitioning`...FIXME
+`properties` is part of the [Table](../connector/Table.md#properties) abstraction.
 
-`partitioning` is part of the [Table](Table.md#partitioning) abstraction.
+---
 
-## properties
+`properties` returns the [options](#options).
+
+## <span id="schema"> Table Schema
 
 ```scala
-properties: util.Map[String, String]
+schema: StructType
 ```
 
-`properties` is simply the [options](#options).
+`schema` is part of the [Table](../connector/Table.md#schema) abstraction.
 
-`properties` is part of the [Table](Table.md#properties) abstraction.
+---
 
-## <span id="schema"> schema
+`schema`...FIXME
+
+## <span id="fileIndex"> PartitioningAwareFileIndex
 
 ```scala
-schema: StructType
+fileIndex: PartitioningAwareFileIndex
 ```
 
-`schema`...FIXME
+??? note "Lazy Value"
+    `fileIndex` is a Scala **lazy value** to guarantee that the code to initialize it is executed once only (when accessed for the first time) and the computed value never changes afterwards.
+
+    Learn more in the [Scala Language Specification]({{ scala.spec }}/05-classes-and-objects.html#lazy).
+
+`fileIndex` creates one of the following [PartitioningAwareFileIndex](PartitioningAwareFileIndex.md)s:
+
+* `MetadataLogFileIndex` when reading from the results of a streaming query
+* [InMemoryFileIndex](InMemoryFileIndex.md)
+
+---
+
+`fileIndex` is used when:
 
-`schema` is part of the [Table](Table.md#schema) abstraction.
+* [FileTable](FileTable.md#implementations)s are requested for [FileScanBuilder](FileScanBuilder.md#fileIndex)s
+* `Dataset` is requested for the [inputFiles](../Dataset.md#inputFiles)
+* `CacheManager` is requested to [lookupAndRefresh](../CacheManager.md#lookupAndRefresh)
+* `FallBackFileSourceV2` is created
+* `FileTable` is requested to [dataSchema](#dataSchema), [schema](#schema), [partitioning](#partitioning)
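
The "Lazy Value" notes added in this file hinge on Scala's `lazy val` semantics. A small self-contained sketch (toy code, not the Spark implementation) showing that a `lazy val` initializer runs only on first access and the result is cached:

```scala
// Demonstrates the lazy-val behavior the notes describe: the initializer
// (e.g. expensive schema inference or file listing) runs exactly once.
class FileTableLike {
  var initCount = 0  // counts how many times the initializer runs

  lazy val dataSchema: String = {
    initCount += 1   // the expensive work would go here
    "struct<id:long,name:string>"
  }
}

val t = new FileTableLike
println(t.initCount)   // 0, nothing computed yet
println(t.dataSchema)  // first access triggers initialization
println(t.dataSchema)  // cached, the initializer does not run again
println(t.initCount)   // 1
```

This is why `dataSchema` and `fileIndex` can be freely referenced from `schema`, `partitioning`, and the scan builders without redoing the underlying work.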

docs/datasources/InMemoryFileIndex.md

Lines changed: 1 addition & 1 deletion

@@ -21,7 +21,7 @@ While being created, `InMemoryFileIndex` [refresh0](#refresh0).
 * `HiveMetastoreCatalog` is requested to [inferIfNeeded](../hive/HiveMetastoreCatalog.md#inferIfNeeded)
 * `CatalogFileIndex` is requested for the [partitions by the given predicate expressions](CatalogFileIndex.md#filterPartitions) for a non-partitioned Hive table
 * `DataSource` is requested to [createInMemoryFileIndex](../DataSource.md#createInMemoryFileIndex)
-* `FileTable` is requested for a [PartitioningAwareFileIndex](../connector/FileTable.md#fileIndex)
+* `FileTable` is requested for a [PartitioningAwareFileIndex](FileTable.md#fileIndex)
 
 ## <span id="refresh"> Refreshing Cached File Listings

docs/datasources/PartitioningAwareFileIndex.md

Lines changed: 1 addition & 1 deletion

@@ -61,7 +61,7 @@ allFiles(): Seq[FileStatus]
 
 * `DataSource` is requested to [getOrInferFileFormatSchema](../DataSource.md#getOrInferFileFormatSchema) and [resolveRelation](../DataSource.md#resolveRelation)
 * `PartitioningAwareFileIndex` is requested for [files matching filters](#listFiles), [input files](#inputFiles), and [size](#sizeInBytes)
-* `FileTable` is requested for a [data schema](../connector/FileTable.md#dataSchema)
+* `FileTable` is requested for a [data schema](FileTable.md#dataSchema)
 
 ## <span id="listFiles"> Files Matching Filters

docs/datasources/parquet/ParquetTable.md

Lines changed: 47 additions & 12 deletions

@@ -1,6 +1,6 @@
 # ParquetTable
 
-`ParquetTable` is a [FileTable](../../connector/FileTable.md).
+`ParquetTable` is a [FileTable](../FileTable.md).
 
 ## Creating Instance
 
@@ -15,28 +15,36 @@
 
 `ParquetTable` is created when:
 
-* `ParquetDataSourceV2` is requested to [getTable](ParquetDataSourceV2.md#getTable)
+* `ParquetDataSourceV2` is requested for a [Table](ParquetDataSourceV2.md#getTable)
 
-## <span id="formatName"> formatName
+## <span id="formatName"> Format Name
 
 ```scala
 formatName: String
 ```
 
-`formatName` is `Parquet`.
+`formatName` is part of the [FileTable](../FileTable.md#formatName) abstraction.
 
-`formatName` is part of the [FileTable](../../connector/FileTable.md#formatName) abstraction.
+---
 
-## <span id="inferSchema"> inferSchema
+`formatName` is the following text:
+
+```text
+Parquet
+```
+
+## <span id="inferSchema"> Schema Inference
 
 ```scala
 inferSchema(
   files: Seq[FileStatus]): Option[StructType]
 ```
 
-`inferSchema` [infers the schema](ParquetUtils.md#inferSchema) (with the [options](#options) and the input Hadoop `FileStatus`es).
+`inferSchema` is part of the [FileTable](../FileTable.md#inferSchema) abstraction.
 
-`inferSchema` is part of the [FileTable](../../connector/FileTable.md#inferSchema) abstraction.
+---
+
+`inferSchema` [infers the schema](ParquetUtils.md#inferSchema) (with the [options](#options) and the input Hadoop `FileStatus`es).
 
 ## <span id="newScanBuilder"> newScanBuilder
 
@@ -45,9 +53,16 @@ newScanBuilder(
   options: CaseInsensitiveStringMap): ParquetScanBuilder
 ```
 
-`newScanBuilder` creates a [ParquetScanBuilder](ParquetScanBuilder.md) (with the [fileIndex](../../connector/FileTable.md#fileIndex), the [schema](../../connector/FileTable.md#schema) and the [dataSchema](../../connector/FileTable.md#dataSchema)).
+`newScanBuilder` is part of the [FileTable](../FileTable.md#newScanBuilder) abstraction.
+
+---
 
-`newScanBuilder` is part of the [FileTable](../../connector/FileTable.md#newScanBuilder) abstraction.
+`newScanBuilder` creates a [ParquetScanBuilder](ParquetScanBuilder.md) with the following:
+
+* [fileIndex](../FileTable.md#fileIndex)
+* [schema](../FileTable.md#schema)
+* [dataSchema](../FileTable.md#dataSchema)
+* [options](#options)
 
 ## <span id="newWriteBuilder"> newWriteBuilder
 
@@ -56,6 +71,26 @@ newWriteBuilder(
   info: LogicalWriteInfo): WriteBuilder
 ```
 
-`newWriteBuilder` creates a [WriteBuilder](../../connector/WriteBuilder.md) with [build](../../connector/WriteBuilder.md#build) that, when executed, creates a [ParquetWrite](ParquetWrite.md).
+`newWriteBuilder` is part of the [FileTable](../FileTable.md#newWriteBuilder) abstraction.
+
+---
+
+`newWriteBuilder` creates a [WriteBuilder](../../connector/WriteBuilder.md) that creates a [ParquetWrite](ParquetWrite.md) (when requested to [build a Write](../../connector/WriteBuilder.md#build)).
+
+## <span id="supportsDataType"> supportsDataType
+
+```scala
+supportsDataType(
+  dataType: DataType): Boolean
+```
+
+`supportsDataType` is part of the [FileTable](../FileTable.md#supportsDataType) abstraction.
+
+---
+
+`supportsDataType` supports all [AtomicType](../../types/AtomicType.md)s and the following complex [DataType](../../types/DataType.md)s with `AtomicType`s:
 
-`newWriteBuilder` is part of the [FileTable](../../connector/FileTable.md#newWriteBuilder) abstraction.
+* [StructType](../../types/StructType.md)
+* [ArrayType](../../types/ArrayType.md)
+* `MapType`
+* `UserDefinedType`
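
The `supportsDataType` section added above describes a recursive check over nested types. A hedged sketch of that traversal over a toy `DataType` ADT (illustrative only, not Spark's `org.apache.spark.sql.types` hierarchy, and `CalendarIntervalType` here merely stands in for some unsupported type):

```scala
// Toy DataType hierarchy; Spark's real types differ in detail.
sealed trait DataType
case object AtomicType extends DataType
case object CalendarIntervalType extends DataType  // example unsupported type
final case class StructField(name: String, dataType: DataType)
final case class StructType(fields: Seq[StructField]) extends DataType
final case class ArrayType(elementType: DataType) extends DataType
final case class MapType(keyType: DataType, valueType: DataType) extends DataType

// Accept atomic types, and complex types whose element/field/key/value
// types are themselves supported; reject everything else.
def supportsDataType(dt: DataType): Boolean = dt match {
  case AtomicType         => true
  case StructType(fields) => fields.forall(f => supportsDataType(f.dataType))
  case ArrayType(elem)    => supportsDataType(elem)
  case MapType(k, v)      => supportsDataType(k) && supportsDataType(v)
  case _                  => false
}

// An array of structs of atomics is supported ...
println(supportsDataType(ArrayType(StructType(Seq(StructField("id", AtomicType))))))  // true
// ... but an unsupported type anywhere in the tree fails the whole check.
println(supportsDataType(MapType(AtomicType, CalendarIntervalType)))  // false
```

The key point the sketch captures is that a complex type is only supported when every type it contains is, which is why the check recurses all the way down.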

mkdocs.yml

Lines changed: 2 additions & 2 deletions

@@ -149,7 +149,6 @@ nav:
 - DataWriter: connector/DataWriter.md
 - DataWriterFactory: connector/DataWriterFactory.md
 - InputPartition: connector/InputPartition.md
-- FileTable: connector/FileTable.md
 - MetadataColumn: connector/MetadataColumn.md
 - MetadataColumnHelper: connector/MetadataColumnHelper.md
 - MetadataColumnsHelper: connector/MetadataColumnsHelper.md
@@ -832,7 +831,7 @@ nav:
 - CSV:
 - CSVFileFormat: datasources/csv/CSVFileFormat.md
 - CSVScanBuilder: datasources/csv/CSVScanBuilder.md
-- Files:
+- File-Based:
 - BasicWriteJobStatsTracker: datasources/BasicWriteJobStatsTracker.md
 - BasicWriteTaskStats: datasources/BasicWriteTaskStats.md
 - FileBatchWrite: datasources/FileBatchWrite.md
@@ -850,6 +849,7 @@ nav:
 - FilePartitionReaderFactory: datasources/FilePartitionReaderFactory.md
 - FileScan: datasources/FileScan.md
 - FileScanBuilder: datasources/FileScanBuilder.md
+- FileTable: datasources/FileTable.md
 - FileWrite: datasources/FileWrite.md
 - FileWriterFactory: datasources/FileWriterFactory.md
 - HadoopFileLinesReader: datasources/HadoopFileLinesReader.md
