
Commit d932d98: FileTable

Parent: 21c8bc2

File tree: 9 files changed (+109, -48 lines)


docs/connector/SupportsRead.md

Lines changed: 1 addition & 1 deletion

@@ -23,7 +23,7 @@ Used when:
 
 ## Implementations
 
-* [FileTable](FileTable.md)
+* [FileTable](../datasources/FileTable.md)
* `JDBCTable`
 * [KafkaTable](../kafka/KafkaTable.md)
 * `MemoryStreamTable` ([Spark Structured Streaming]({{ book.structured_streaming }}/datasources/memory))

docs/connector/SupportsWrite.md

Lines changed: 1 addition & 1 deletion

@@ -24,7 +24,7 @@ Used when:
 ## Implementations
 
 * ConsoleTable (Spark Structured Streaming)
-* [FileTable](FileTable.md)
+* [FileTable](../datasources/FileTable.md)
 * ForeachWriterTable (Spark Structured Streaming)
 * [KafkaTable](../kafka/KafkaTable.md)
 * MemorySink (Spark Structured Streaming)

docs/connector/Table.md

Lines changed: 1 addition & 1 deletion

@@ -78,7 +78,7 @@ Used when:
 ## Implementations
 
 * `ConsoleTable` (Spark Structured Streaming)
-* [FileTable](FileTable.md)
+* [FileTable](../datasources/FileTable.md)
 * `ForeachWriterTable` (Spark Structured Streaming)
 * [KafkaTable](../kafka/KafkaTable.md)
 * `MemorySink` (Spark Structured Streaming)
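
The three connector pages touched above describe mix-in interfaces: a `FileTable` is a `Table` that also mixes in `SupportsRead` and `SupportsWrite`. A minimal sketch of that composition with toy traits (the names mirror the docs, but this is not Spark's actual `org.apache.spark.sql.connector` API):

```scala
// Toy model of the connector abstraction: a Table with optional
// SupportsRead / SupportsWrite mix-ins (illustrative only, not Spark's API).
trait Table {
  def name: String
  def capabilities: Set[String]
}

trait SupportsRead extends Table {
  def newScanBuilder(options: Map[String, String]): String =
    s"ScanBuilder for $name"
}

trait SupportsWrite extends Table {
  def newWriteBuilder(options: Map[String, String]): String =
    s"WriteBuilder for $name"
}

// A FileTable-like class supports reading and writing in one type.
class FileLikeTable(val name: String) extends SupportsRead with SupportsWrite {
  val capabilities: Set[String] = Set("BATCH_READ", "BATCH_WRITE", "TRUNCATE")
}

val t = new FileLikeTable("parquet")
println(t.newScanBuilder(Map.empty))  // ScanBuilder for parquet
println(t.capabilities)
```

Because both mix-ins extend `Table`, one `FileLikeTable` instance can be handed to code that only needs the read side or only the write side.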

docs/datasources/FileIndex.md

Lines changed: 1 addition & 1 deletion

@@ -59,7 +59,7 @@ Used when:
 * `DataSource` is requested to [getOrInferFileFormatSchema](../DataSource.md#getOrInferFileFormatSchema) and [resolve a FileFormat-based relation](../DataSource.md#resolveRelation)
 * `FallBackFileSourceV2` logical resolution rule is executed
 * [FileScanBuilder](FileScanBuilder.md) is created
-* `FileTable` is requested for [dataSchema](../connector/FileTable.md#dataSchema) and [partitioning](../connector/FileTable.md#partitioning)
+* `FileTable` is requested for [dataSchema](FileTable.md#dataSchema) and [partitioning](FileTable.md#partitioning)
 
 ### <span id="refresh"> Refreshing Cached File Listings

docs/datasources/FileTable.md

Lines changed: 54 additions & 28 deletions

@@ -1,28 +1,28 @@
 # FileTable
 
-`FileTable` is an [extension](#contract) of the [Table](Table.md) abstraction for [file-backed tables](#implementations) with support for [read](SupportsRead.md) and [write](SupportsWrite.md).
+`FileTable` is an [extension](#contract) of the [Table](../connector/Table.md) abstraction for [file-based tables](#implementations) with support for [read](../connector/SupportsRead.md) and [write](../connector/SupportsWrite.md).
 
 ## Contract
 
-### <span id="fallbackFileFormat"> fallbackFileFormat
+### <span id="fallbackFileFormat"> Fallback FileFormat
 
 ```scala
 fallbackFileFormat: Class[_ <: FileFormat]
 ```
 
-Fallback V1 [FileFormat](../datasources/FileFormat.md)
+Fallback V1 [FileFormat](FileFormat.md)
 
 Used when `FallBackFileSourceV2` extended resolution rule is executed (to resolve an `InsertIntoStatement` with a [DataSourceV2Relation](../logical-operators/DataSourceV2Relation.md) with a `FileTable`)
 
-### <span id="formatName"> formatName
+### <span id="formatName"> Format Name
 
 ```scala
 formatName: String
 ```
 
 Name of the file table (_format_)
 
-### <span id="inferSchema"> inferSchema
+### <span id="inferSchema"> Schema Inference
 
 ```scala
 inferSchema(
@@ -53,7 +53,7 @@ Default: All [DataType](../types/DataType.md)s are supported by default
 * `CSVTable`
 * `JsonTable`
 * `OrcTable`
-* [ParquetTable](../datasources/parquet/ParquetTable.md)
+* [ParquetTable](parquet/ParquetTable.md)
 * `TextTable`
 
 ## Creating Instance
@@ -73,15 +73,17 @@ Default: All [DataType](../types/DataType.md)s are supported by default
 capabilities: java.util.Set[TableCapability]
 ```
 
-`capabilities` are the following [TableCapabilities](TableCapability.md):
+`capabilities` is part of the [Table](../connector/Table.md#capabilities) abstraction.
 
-* [BATCH_READ](TableCapability.md#BATCH_READ)
-* [BATCH_WRITE](TableCapability.md#BATCH_WRITE)
-* [TRUNCATE](TableCapability.md#TRUNCATE)
+---
 
-`capabilities` is part of the [Table](Table.md#capabilities) abstraction.
+`capabilities` are the following [TableCapabilities](../connector/TableCapability.md):
 
-## <span id="dataSchema"> dataSchema
+* [BATCH_READ](../connector/TableCapability.md#BATCH_READ)
+* [BATCH_WRITE](../connector/TableCapability.md#BATCH_WRITE)
+* [TRUNCATE](../connector/TableCapability.md#TRUNCATE)
+
+## <span id="dataSchema"> Data Schema
 
 ```scala
 dataSchema: StructType
@@ -92,47 +94,71 @@ dataSchema: StructType
 ??? note "Lazy Value"
     `dataSchema` is a Scala **lazy value** to guarantee that the code to initialize it is executed once only (when accessed for the first time) and cached afterwards.
 
+---
+
 `dataSchema` is used when:
 
 * `FileTable` is requested for a [schema](#schema)
 * _others_ (in [FileTables](#implementations))
 
-## fileIndex
+## <span id="partitioning"> Partitioning
 
 ```scala
-fileIndex: PartitioningAwareFileIndex
+partitioning: Array[Transform]
 ```
 
-`fileIndex`...FIXME
+`partitioning` is part of the [Table](../connector/Table.md#partitioning) abstraction.
+
+---
 
-`fileIndex` is used when...FIXME
+`partitioning`...FIXME
 
-## partitioning
+## <span id="properties"> Properties
 
 ```scala
-partitioning: Array[Transform]
+properties: util.Map[String, String]
 ```
 
-`partitioning`...FIXME
+`properties` is part of the [Table](../connector/Table.md#properties) abstraction.
 
-`partitioning` is part of the [Table](Table.md#partitioning) abstraction.
+---
 
-## properties
+`properties` returns the [options](#options).
+
+## <span id="schema"> Table Schema
 
 ```scala
-properties: util.Map[String, String]
+schema: StructType
 ```
 
-`properties` is simply the [options](#options).
+`schema` is part of the [Table](../connector/Table.md#schema) abstraction.
 
-`properties` is part of the [Table](Table.md#properties) abstraction.
+---
 
-## <span id="schema"> schema
+`schema`...FIXME
+
+## <span id="fileIndex"> PartitioningAwareFileIndex
 
 ```scala
-schema: StructType
+fileIndex: PartitioningAwareFileIndex
 ```
 
-`schema`...FIXME
+??? note "Lazy Value"
+    `fileIndex` is a Scala **lazy value** to guarantee that the code to initialize it is executed once only (when accessed for the first time) and the computed value never changes afterwards.
+
+    Learn more in the [Scala Language Specification]({{ scala.spec }}/05-classes-and-objects.html#lazy).
+
+`fileIndex` creates one of the following [PartitioningAwareFileIndex](PartitioningAwareFileIndex.md)s:
+
+* `MetadataLogFileIndex` when reading from the results of a streaming query
+* [InMemoryFileIndex](InMemoryFileIndex.md)
+
+---
+
+`fileIndex` is used when:
 
-`schema` is part of the [Table](Table.md#schema) abstraction.
+* [FileTable](FileTable.md#implementations)s are requested for [FileScanBuilder](FileScanBuilder.md#fileIndex)s
+* `Dataset` is requested for the [inputFiles](../Dataset.md#inputFiles)
+* `CacheManager` is requested to [lookupAndRefresh](../CacheManager.md#lookupAndRefresh)
+* `FallBackFileSourceV2` is created
+* `FileTable` is requested to [dataSchema](#dataSchema), [schema](#schema), [partitioning](#partitioning)
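
The "Lazy Value" notes added in this file hinge on Scala's `lazy val` semantics. A small self-contained sketch (toy code, not the Spark implementation) showing that a `lazy val` initializer runs only on first access and the result is cached:

```scala
// Demonstrates the lazy-val behavior the notes describe: the initializer
// (e.g. expensive schema inference or file listing) runs exactly once.
class FileTableLike {
  var initCount = 0  // counts how many times the initializer runs

  lazy val dataSchema: String = {
    initCount += 1   // the expensive work would go here
    "struct<id:long,name:string>"
  }
}

val t = new FileTableLike
println(t.initCount)   // 0, nothing computed yet
println(t.dataSchema)  // first access triggers initialization
println(t.dataSchema)  // cached, the initializer does not run again
println(t.initCount)   // 1
```

This is why `dataSchema` and `fileIndex` can be freely referenced from `schema`, `partitioning`, and the scan builders without redoing the underlying work.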

docs/datasources/InMemoryFileIndex.md

Lines changed: 1 addition & 1 deletion

@@ -21,7 +21,7 @@ While being created, `InMemoryFileIndex` [refresh0](#refresh0).
 * `HiveMetastoreCatalog` is requested to [inferIfNeeded](../hive/HiveMetastoreCatalog.md#inferIfNeeded)
 * `CatalogFileIndex` is requested for the [partitions by the given predicate expressions](CatalogFileIndex.md#filterPartitions) for a non-partitioned Hive table
 * `DataSource` is requested to [createInMemoryFileIndex](../DataSource.md#createInMemoryFileIndex)
-* `FileTable` is requested for a [PartitioningAwareFileIndex](../connector/FileTable.md#fileIndex)
+* `FileTable` is requested for a [PartitioningAwareFileIndex](FileTable.md#fileIndex)
 
 ## <span id="refresh"> Refreshing Cached File Listings

docs/datasources/PartitioningAwareFileIndex.md

Lines changed: 1 addition & 1 deletion

@@ -61,7 +61,7 @@ allFiles(): Seq[FileStatus]
 
 * `DataSource` is requested to [getOrInferFileFormatSchema](../DataSource.md#getOrInferFileFormatSchema) and [resolveRelation](../DataSource.md#resolveRelation)
 * `PartitioningAwareFileIndex` is requested for [files matching filters](#listFiles), [input files](#inputFiles), and [size](#sizeInBytes)
-* `FileTable` is requested for a [data schema](../connector/FileTable.md#dataSchema)
+* `FileTable` is requested for a [data schema](FileTable.md#dataSchema)
 
 ## <span id="listFiles"> Files Matching Filters

docs/datasources/parquet/ParquetTable.md

Lines changed: 47 additions & 12 deletions

@@ -1,6 +1,6 @@
 # ParquetTable
 
-`ParquetTable` is a [FileTable](../../connector/FileTable.md).
+`ParquetTable` is a [FileTable](../FileTable.md).
 
 ## Creating Instance
 
@@ -15,28 +15,36 @@
 
 `ParquetTable` is created when:
 
-* `ParquetDataSourceV2` is requested to [getTable](ParquetDataSourceV2.md#getTable)
+* `ParquetDataSourceV2` is requested for a [Table](ParquetDataSourceV2.md#getTable)
 
-## <span id="formatName"> formatName
+## <span id="formatName"> Format Name
 
 ```scala
 formatName: String
 ```
 
-`formatName` is `Parquet`.
+`formatName` is part of the [FileTable](../FileTable.md#formatName) abstraction.
 
-`formatName` is part of the [FileTable](../../connector/FileTable.md#formatName) abstraction.
+---
 
-## <span id="inferSchema"> inferSchema
+`formatName` is the following text:
+
+```text
+Parquet
+```
+
+## <span id="inferSchema"> Schema Inference
 
 ```scala
 inferSchema(
   files: Seq[FileStatus]): Option[StructType]
 ```
 
-`inferSchema` [infers the schema](ParquetUtils.md#inferSchema) (with the [options](#options) and the input Hadoop `FileStatus`es).
+`inferSchema` is part of the [FileTable](../FileTable.md#inferSchema) abstraction.
 
-`inferSchema` is part of the [FileTable](../../connector/FileTable.md#inferSchema) abstraction.
+---
+
+`inferSchema` [infers the schema](ParquetUtils.md#inferSchema) (with the [options](#options) and the input Hadoop `FileStatus`es).
 
 ## <span id="newScanBuilder"> newScanBuilder
 
@@ -45,9 +53,16 @@ newScanBuilder(
   options: CaseInsensitiveStringMap): ParquetScanBuilder
 ```
 
-`newScanBuilder` creates a [ParquetScanBuilder](ParquetScanBuilder.md) (with the [fileIndex](../../connector/FileTable.md#fileIndex), the [schema](../../connector/FileTable.md#schema) and the [dataSchema](../../connector/FileTable.md#dataSchema)).
+`newScanBuilder` is part of the [FileTable](../FileTable.md#newScanBuilder) abstraction.
+
+---
 
-`newScanBuilder` is part of the [FileTable](../../connector/FileTable.md#newScanBuilder) abstraction.
+`newScanBuilder` creates a [ParquetScanBuilder](ParquetScanBuilder.md) with the following:
+
+* [fileIndex](../FileTable.md#fileIndex)
+* [schema](../FileTable.md#schema)
+* [dataSchema](../FileTable.md#dataSchema)
+* [options](#options)
 
 ## <span id="newWriteBuilder"> newWriteBuilder
 
@@ -56,6 +71,26 @@ newWriteBuilder(
   info: LogicalWriteInfo): WriteBuilder
 ```
 
-`newWriteBuilder` creates a [WriteBuilder](../../connector/WriteBuilder.md) with [build](../../connector/WriteBuilder.md#build) that, when executed, creates a [ParquetWrite](ParquetWrite.md).
+`newWriteBuilder` is part of the [FileTable](../FileTable.md#newWriteBuilder) abstraction.
+
+---
+
+`newWriteBuilder` creates a [WriteBuilder](../../connector/WriteBuilder.md) that creates a [ParquetWrite](ParquetWrite.md) (when requested to [build a Write](../../connector/WriteBuilder.md#build)).
+
+## <span id="supportsDataType"> supportsDataType
+
+```scala
+supportsDataType(
+  dataType: DataType): Boolean
+```
+
+`supportsDataType` is part of the [FileTable](../FileTable.md#supportsDataType) abstraction.
+
+---
+
+`supportsDataType` supports all [AtomicType](../../types/AtomicType.md)s and the following complex [DataType](../../types/DataType.md)s with `AtomicType`s:
 
-`newWriteBuilder` is part of the [FileTable](../../connector/FileTable.md#newWriteBuilder) abstraction.
+* [StructType](../../types/StructType.md)
+* [ArrayType](../../types/ArrayType.md)
+* `MapType`
+* `UserDefinedType`
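
The `supportsDataType` section added above describes a recursive check over nested types. A hedged sketch of that traversal over a toy `DataType` ADT (illustrative only, not Spark's `org.apache.spark.sql.types` hierarchy, and `CalendarIntervalType` here merely stands in for some unsupported type):

```scala
// Toy DataType hierarchy; Spark's real types differ in detail.
sealed trait DataType
case object AtomicType extends DataType
case object CalendarIntervalType extends DataType  // example unsupported type
final case class StructField(name: String, dataType: DataType)
final case class StructType(fields: Seq[StructField]) extends DataType
final case class ArrayType(elementType: DataType) extends DataType
final case class MapType(keyType: DataType, valueType: DataType) extends DataType

// Accept atomic types, and complex types whose element/field/key/value
// types are themselves supported; reject everything else.
def supportsDataType(dt: DataType): Boolean = dt match {
  case AtomicType         => true
  case StructType(fields) => fields.forall(f => supportsDataType(f.dataType))
  case ArrayType(elem)    => supportsDataType(elem)
  case MapType(k, v)      => supportsDataType(k) && supportsDataType(v)
  case _                  => false
}

// An array of structs of atomics is supported ...
println(supportsDataType(ArrayType(StructType(Seq(StructField("id", AtomicType))))))  // true
// ... but an unsupported type anywhere in the tree fails the whole check.
println(supportsDataType(MapType(AtomicType, CalendarIntervalType)))  // false
```

The key point the sketch captures is that a complex type is only supported when every type it contains is, which is why the check recurses all the way down.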

mkdocs.yml

Lines changed: 2 additions & 2 deletions

@@ -149,7 +149,6 @@ nav:
 - DataWriter: connector/DataWriter.md
 - DataWriterFactory: connector/DataWriterFactory.md
 - InputPartition: connector/InputPartition.md
-- FileTable: connector/FileTable.md
 - MetadataColumn: connector/MetadataColumn.md
 - MetadataColumnHelper: connector/MetadataColumnHelper.md
 - MetadataColumnsHelper: connector/MetadataColumnsHelper.md
@@ -832,7 +831,7 @@ nav:
 - CSV:
 - CSVFileFormat: datasources/csv/CSVFileFormat.md
 - CSVScanBuilder: datasources/csv/CSVScanBuilder.md
-- Files:
+- File-Based:
 - BasicWriteJobStatsTracker: datasources/BasicWriteJobStatsTracker.md
 - BasicWriteTaskStats: datasources/BasicWriteTaskStats.md
 - FileBatchWrite: datasources/FileBatchWrite.md
@@ -850,6 +849,7 @@ nav:
 - FilePartitionReaderFactory: datasources/FilePartitionReaderFactory.md
 - FileScan: datasources/FileScan.md
 - FileScanBuilder: datasources/FileScanBuilder.md
+- FileTable: datasources/FileTable.md
 - FileWrite: datasources/FileWrite.md
 - FileWriterFactory: datasources/FileWriterFactory.md
 - HadoopFileLinesReader: datasources/HadoopFileLinesReader.md
