diff --git a/PROTOCOL.md b/PROTOCOL.md
index 07df3add9a2..4119bde42d5 100644
--- a/PROTOCOL.md
+++ b/PROTOCOL.md
@@ -521,7 +521,7 @@ Specifically, to read the row-level changes made in a version, the following str
 Field Name | Data Type | Description
 -|-|-
 _commit_version|`Long`| The table version containing the change. This can be derived from the name of the Delta log file that contains actions.
-_commit_timestamp|`Timestamp`| The timestamp associated when the commit was created. This can be derived from the file modification time of the Delta log file that contains actions.
+_commit_timestamp|`Timestamp`| The timestamp associated when the commit was created. Depending on whether [In-Commit Timestamps](#in-commit-timestamps) are enabled, this is derived from either the `inCommitTimestamp` field of the `commitInfo` action of the version's Delta log file, or from the Delta log file's modification time.

 ##### Note for non-change data readers

@@ -620,6 +620,8 @@ A delta file can optionally contain additional provenance information about what

 Implementations are free to store any valid JSON-formatted data via the `commitInfo` action.

+When [In-Commit Timestamps](#in-commit-timestamps) are enabled, writers are required to include a `commitInfo` action with every commit, which must include the `inCommitTimestamp` field. Also, the `commitInfo` action must be first action in the commit.
+
 An example of storing provenance information related to an `INSERT` operation:
 ```json
 {
@@ -1255,6 +1257,41 @@ The example above converts `configuration` field into JSON format, including esc
 }
 ```

+# In-Commit Timestamps
+
+The In-Commit Timestamps writer feature strongly associates a monotonically increasing timestamp with each commit by storing it in the commit's metadata.
+
+Enablement:
+- The table must be on Writer Version 7.
+- The feature `inCommitTimestamps` must exist in the table `protocol`'s `writerFeatures`.
+- The table property `delta.enableInCommitTimestamps` must be set to `true`.
+
+## Writer Requirements for In-Commit Timestamps
+
+When In-Commit Timestamps is enabled, then:
+1. Writers must write the `commitInfo` (see [Commit Provenance Information](#commit-provenance-information)) action in the commit.
+2. The `commitInfo` action must be the first action in the commit.
+3. The `commitInfo` action must include a field named `inCommitTimestamp`, of type `long` (see [Primitive Types](#primitive-types)), which represents the time (in milliseconds since the Unix epoch) when the commit is considered to have succeeded. It is the larger of two values:
+   - The time, in milliseconds since the Unix epoch, at which the writer attempted the commit
+   - One millisecond later than the previous commit's `inCommitTimestamp`
+4. If the table has commits from a period when this feature was not enabled, provenance information around when this feature was enabled must be tracked in table properties:
+   - The property `delta.inCommitTimestampEnablementVersion` must be used to track the version of the table when this feature was enabled.
+   - The property `delta.inCommitTimestampEnablementTimestamp` must be the same as the `inCommitTimestamp` of the commit when this feature was enabled.
+5. The `inCommitTimestamp` of the commit that enables this feature must be greater than the file modification time of the immediately preceding commit.
+
+## Recommendations for Readers of Tables with In-Commit Timestamps
+
+For tables with In-Commit timestamps enabled, readers should use the `inCommitTimestamp` as the commit timestamp for operations like time travel and [`DESCRIBE HISTORY`](https://docs.delta.io/latest/delta-utility.html#retrieve-delta-table-history).
+If a table has commits from a period before In-Commit timestamps were enabled, the table properties `delta.inCommitTimestampEnablementVersion` and `delta.inCommitTimestampEnablementTimestamp` would be set and can be used to identify commits that don't have `inCommitTimestamp`.
+To correctly determine the commit timestamp for these tables, readers can use the following rules:
+1. For commits with version >= `delta.inCommitTimestampEnablementVersion`, readers should use the `inCommitTimestamp` field of the `commitInfo` action.
+2. For commits with version < `delta.inCommitTimestampEnablementVersion`, readers should use the file modification timestamp.
+
+Furthermore, when attempting timestamp-based time travel where table state must be fetched as of `timestamp X`, readers should use the following rules:
+1. If `timestamp X` >= `delta.inCommitTimestampEnablementTimestamp`, only table versions >= `delta.inCommitTimestampEnablementVersion` should be considered for the query.
+2. Otherwise, only table versions less than `delta.inCommitTimestampEnablementVersion` should be considered for the query.
+
+
 # Requirements for Writers
 This section documents additional requirements that writers must follow in order to preserve some of the higher level guarantees that Delta provides.
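The writer and reader rules added to PROTOCOL.md in the hunk above can be sketched in Java roughly as follows. This is an illustration under stated assumptions, not code from Delta Kernel or Delta Spark: the class `InCommitTimestampRules`, its method names, and its parameters are all hypothetical, and the sketch assumes the table currently has In-Commit Timestamps enabled.

```java
import java.util.OptionalLong;

/** Hypothetical helper illustrating the In-Commit Timestamps rules; not part of Delta Kernel. */
final class InCommitTimestampRules {
  private InCommitTimestampRules() {}

  /**
   * Writer requirement 3: the larger of the commit attempt time and one millisecond
   * after the previous commit's inCommitTimestamp. An empty previous value models the
   * first commit of a table (an assumption of this sketch).
   */
  static long nextInCommitTimestamp(long attemptTimeMillis, OptionalLong previousInCommitTimestamp) {
    if (!previousInCommitTimestamp.isPresent()) {
      return attemptTimeMillis;
    }
    return Math.max(attemptTimeMillis, previousInCommitTimestamp.getAsLong() + 1);
  }

  /**
   * Reader rules: commits at or after the enablement version use inCommitTimestamp,
   * older commits fall back to the file modification time. An empty enablementVersion
   * is taken to mean the feature was enabled for every commit of the table.
   */
  static long commitTimestamp(
      long version,
      OptionalLong enablementVersion,
      OptionalLong inCommitTimestamp,
      long fileModificationTimeMillis) {
    boolean expectsInCommitTimestamp =
        !enablementVersion.isPresent() || version >= enablementVersion.getAsLong();
    if (!expectsInCommitTimestamp) {
      return fileModificationTimeMillis;
    }
    if (!inCommitTimestamp.isPresent()) {
      // Writers are required to set the field once the feature is enabled.
      throw new IllegalStateException("inCommitTimestamp missing from commit version " + version);
    }
    return inCommitTimestamp.getAsLong();
  }

  /** Time travel rules: whether a version may be considered for a query as of timestamp X. */
  static boolean isTimeTravelCandidate(
      long version, long timestampX, long enablementVersion, long enablementTimestamp) {
    return timestampX >= enablementTimestamp
        ? version >= enablementVersion
        : version < enablementVersion;
  }
}
```

For example, with a previous `inCommitTimestamp` of 1500 and a commit attempted at 1000, `nextInCommitTimestamp` yields 1501, which keeps timestamps strictly increasing even when the writer's clock lags behind.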
diff --git a/kernel/kernel-api/src/main/java/io/delta/kernel/internal/TableConfig.java b/kernel/kernel-api/src/main/java/io/delta/kernel/internal/TableConfig.java
index a78af3bb259..b00be334998 100644
--- a/kernel/kernel-api/src/main/java/io/delta/kernel/internal/TableConfig.java
+++ b/kernel/kernel-api/src/main/java/io/delta/kernel/internal/TableConfig.java
@@ -85,7 +85,7 @@ public class TableConfig<T> {
    */
   public static final TableConfig<Boolean> IN_COMMIT_TIMESTAMPS_ENABLED =
       new TableConfig<>(
-          "delta.enableInCommitTimestamps-preview",
+          "delta.enableInCommitTimestamps",
           "false", /* default values */
           (engineOpt, v) -> Boolean.valueOf(v),
           value -> true,
@@ -97,7 +97,7 @@ public class TableConfig<T> {
    */
   public static final TableConfig<Optional<Long>> IN_COMMIT_TIMESTAMP_ENABLEMENT_VERSION =
       new TableConfig<>(
-          "delta.inCommitTimestampEnablementVersion-preview",
+          "delta.inCommitTimestampEnablementVersion",
           null, /* default values */
           (engineOpt, v) -> Optional.ofNullable(v).map(Long::valueOf),
           value -> true,
@@ -110,7 +110,7 @@ public class TableConfig<T> {
    */
   public static final TableConfig<Optional<Long>> IN_COMMIT_TIMESTAMP_ENABLEMENT_TIMESTAMP =
       new TableConfig<>(
-          "delta.inCommitTimestampEnablementTimestamp-preview",
+          "delta.inCommitTimestampEnablementTimestamp",
           null, /* default values */
           (engineOpt, v) -> Optional.ofNullable(v).map(Long::valueOf),
           value -> true,
diff --git a/kernel/kernel-api/src/main/java/io/delta/kernel/internal/TableFeatures.java b/kernel/kernel-api/src/main/java/io/delta/kernel/internal/TableFeatures.java
index 271621bb28b..d8b709acaff 100644
--- a/kernel/kernel-api/src/main/java/io/delta/kernel/internal/TableFeatures.java
+++ b/kernel/kernel-api/src/main/java/io/delta/kernel/internal/TableFeatures.java
@@ -38,7 +38,7 @@ public class TableFeatures {
           new HashSet<String>() {
             {
               add("appendOnly");
-              add("inCommitTimestamp-preview");
+              add("inCommitTimestamp");
               add("columnMapping");
             }
           });
@@ -84,8 +84,8 @@ public static void validateReadSupportedTable(
    * <ul>
    *   <li>protocol writer version 1.
    *   <li>protocol writer version 2 only with appendOnly feature enabled.
-   *   <li>protocol writer version 7 with {@code appendOnly}, {@code inCommitTimestamp-preview},
-   *       {@code columnMapping} feature enabled.
+   *   <li>protocol writer version 7 with {@code appendOnly}, {@code inCommitTimestamp}, {@code
+   *       columnMapping} feature enabled.
    * </ul>
    *
    * @param protocol Table protocol
@@ -121,7 +121,7 @@ public static void validateWriteSupportedTable(
           // Only supported writer features as of today in Kernel
           case "appendOnly":
             break;
-          case "inCommitTimestamp-preview":
+          case "inCommitTimestamp":
             break;
           case "columnMapping":
             break;
@@ -158,9 +158,9 @@ public static Tuple2<Integer, Integer> minProtocolVersionFromAutomaticallyEnable

   /**
    * Extract the writer features that should be enabled automatically based on the metadata which
-   * are not already enabled. For example, the {@code inCommitTimestamp-preview} feature should be
-   * enabled when the delta property name (delta.enableInCommitTimestamps-preview) is set to true in
-   * the metadata if it is not already enabled.
+   * are not already enabled. For example, the {@code inCommitTimestamp} feature should be enabled
+   * when the delta property name (delta.enableInCommitTimestamps) is set to true in the metadata if
+   * it is not already enabled.
    *
    * @param engine the engine to use for IO operations
    * @param metadata the metadata of the table
@@ -184,7 +184,7 @@ public static Set<String> extractAutomaticallyEnabledWriterFeatures(
    */
   private static int getMinReaderVersion(String feature) {
     switch (feature) {
-      case "inCommitTimestamp-preview":
+      case "inCommitTimestamp":
         return 3;
       default:
         return 1;
@@ -199,7 +199,7 @@ private static int getMinReaderVersion(String feature) {
    */
   private static int getMinWriterVersion(String feature) {
     switch (feature) {
-      case "inCommitTimestamp-preview":
+      case "inCommitTimestamp":
         return 7;
       default:
         return 2;
@@ -218,7 +218,7 @@ private static int getMinWriterVersion(String feature) {
   private static boolean metadataRequiresWriterFeatureToBeEnabled(
       Engine engine, Metadata metadata, String feature) {
     switch (feature) {
-      case "inCommitTimestamp-preview":
+      case "inCommitTimestamp":
         return TableConfig.isICTEnabled(engine, metadata);
       default:
         return false;
diff --git a/kernel/kernel-api/src/main/java/io/delta/kernel/internal/actions/CommitInfo.java b/kernel/kernel-api/src/main/java/io/delta/kernel/internal/actions/CommitInfo.java
index 222d4693fee..b9908deb913 100644
--- a/kernel/kernel-api/src/main/java/io/delta/kernel/internal/actions/CommitInfo.java
+++ b/kernel/kernel-api/src/main/java/io/delta/kernel/internal/actions/CommitInfo.java
@@ -177,7 +177,7 @@ public static long getRequiredInCommitTimestamp(
                 new InvalidTableException(
                     dataPath.toString(),
                     String.format(
-                        "This table has the feature inCommitTimestamp-preview "
+                        "This table has the feature inCommitTimestamp "
                             + "enabled which requires the presence of the CommitInfo action "
                             + "in every commit. However, the CommitInfo action is "
                             + "missing from commit version %s.",
@@ -187,7 +187,7 @@ public static long getRequiredInCommitTimestamp(
                 new InvalidTableException(
                     dataPath.toString(),
                     String.format(
-                        "This table has the feature inCommitTimestamp-preview "
+                        "This table has the feature inCommitTimestamp "
                             + "enabled which requires the presence of inCommitTimestamp in the "
                             + "CommitInfo action. However, this field has not "
                             + "been set in commit version %s.",
diff --git a/kernel/kernel-api/src/test/scala/io/delta/kernel/internal/TableFeaturesSuite.scala b/kernel/kernel-api/src/test/scala/io/delta/kernel/internal/TableFeaturesSuite.scala
index 73da8b7f171..4330a8b455d 100644
--- a/kernel/kernel-api/src/test/scala/io/delta/kernel/internal/TableFeaturesSuite.scala
+++ b/kernel/kernel-api/src/test/scala/io/delta/kernel/internal/TableFeaturesSuite.scala
@@ -68,7 +68,7 @@ class TableFeaturesSuite extends AnyFunSuite {
     checkSupported(createTestProtocol(minWriterVersion = 7))
   }

-  Seq("appendOnly", "inCommitTimestamp-preview", "columnMapping")
+  Seq("appendOnly", "inCommitTimestamp", "columnMapping")
     .foreach { supportedWriterFeature =>
     test(s"validateWriteSupported: protocol 7 with $supportedWriterFeature") {
       checkSupported(createTestProtocol(minWriterVersion = 7, supportedWriterFeature))
diff --git a/kernel/kernel-defaults/src/test/scala/io/delta/kernel/defaults/InCommitTimestampSuite.scala b/kernel/kernel-defaults/src/test/scala/io/delta/kernel/defaults/InCommitTimestampSuite.scala
index bf2651b4e5b..4d3cd179674 100644
--- a/kernel/kernel-defaults/src/test/scala/io/delta/kernel/defaults/InCommitTimestampSuite.scala
+++ b/kernel/kernel-defaults/src/test/scala/io/delta/kernel/defaults/InCommitTimestampSuite.scala
@@ -77,7 +77,7 @@ class InCommitTimestampSuite extends DeltaTableWriteSuiteBase {
       assert(ver0Snapshot.getTimestamp(engine) === beforeCommitAttemptStartTime + 1)
       assert(
         getInCommitTimestamp(engine, table, version = 0).get === ver0Snapshot.getTimestamp(engine))
-      assertHasWriterFeature(ver0Snapshot, "inCommitTimestamp-preview")
+      assertHasWriterFeature(ver0Snapshot, "inCommitTimestamp")
     }
   }

@@ -94,7 +94,7 @@ class InCommitTimestampSuite extends DeltaTableWriteSuiteBase {

       val ver0Snapshot = table.getLatestSnapshot(engine).asInstanceOf[SnapshotImpl]
       assertMetadataProp(engine, ver0Snapshot, IN_COMMIT_TIMESTAMPS_ENABLED, false)
-      assertHasNoWriterFeature(ver0Snapshot, "inCommitTimestamp-preview")
+      assertHasNoWriterFeature(ver0Snapshot, "inCommitTimestamp")
       assert(getInCommitTimestamp(engine, table, version = 0).isEmpty)

       setTablePropAndVerify(
@@ -106,7 +106,7 @@ class InCommitTimestampSuite extends DeltaTableWriteSuiteBase {
         expectedValue = true)

       val ver1Snapshot = table.getLatestSnapshot(engine).asInstanceOf[SnapshotImpl]
-      assertHasWriterFeature(ver1Snapshot, "inCommitTimestamp-preview")
+      assertHasWriterFeature(ver1Snapshot, "inCommitTimestamp")
       assert(ver1Snapshot.getTimestamp(engine) > ver0Snapshot.getTimestamp(engine))
       assert(
         getInCommitTimestamp(engine, table, version = 1).get === ver1Snapshot.getTimestamp(engine))
@@ -168,7 +168,7 @@ class InCommitTimestampSuite extends DeltaTableWriteSuiteBase {
       assert(ex.getMessage.contains(String.format(
         "This table has the feature %s enabled which requires the presence of the " +
           "CommitInfo action in every commit. However, the CommitInfo action is " +
-          "missing from commit version %s.", "inCommitTimestamp-preview", "0")))
+          "missing from commit version %s.", "inCommitTimestamp", "0")))
     }
   }

@@ -214,7 +214,7 @@ class InCommitTimestampSuite extends DeltaTableWriteSuiteBase {
       assert(ex.getMessage.contains(String.format(
         "This table has the feature %s enabled which requires the presence of " +
           "inCommitTimestamp in the CommitInfo action. However, this field has not " +
-          "been set in commit version %s.", "inCommitTimestamp-preview", "0")))
+          "been set in commit version %s.", "inCommitTimestamp", "0")))
     }
   }

@@ -299,7 +299,7 @@ class InCommitTimestampSuite extends DeltaTableWriteSuiteBase {
         expectedValue = true)
       val protocol = getProtocolActionFromCommit(engine, table, 0)
       assert(protocol.isDefined)
-      assert(VectorUtils.toJavaList(protocol.get.getArray(3)).contains("inCommitTimestamp-preview"))
+      assert(VectorUtils.toJavaList(protocol.get.getArray(3)).contains("inCommitTimestamp"))

       setTablePropAndVerify(
         engine = engine,
@@ -349,9 +349,9 @@ class InCommitTimestampSuite extends DeltaTableWriteSuiteBase {
           " \"name\" : \"id\",\n \"type\" : \"integer\",\n \"nullable\" : true, \n" +
           " \"metadata\" : {}\n} ]\n}', " +
           "partitionColumns=List(), createdTime=Optional[%s], " +
-          "configuration={delta.enableInCommitTimestamps-preview=true, " +
-          "delta.inCommitTimestampEnablementVersion-preview=1, " +
-          "delta.inCommitTimestampEnablementTimestamp-preview=%s}}",
+          "configuration={delta.inCommitTimestampEnablementTimestamp=%s, " +
+          "delta.enableInCommitTimestamps=true, " +
+          "delta.inCommitTimestampEnablementVersion=1}}",
         metadata.getId,
         metadata.getCreatedTime.get,
         inCommitTimestamp.toString))
@@ -397,7 +397,7 @@ class InCommitTimestampSuite extends DeltaTableWriteSuiteBase {
       verifyWrittenContent(tablePath, testSchema, expData)
       verifyTableProperties(tablePath,
         ListMap(IN_COMMIT_TIMESTAMPS_ENABLED.getKey -> true,
-          "delta.feature.inCommitTimestamp-preview" -> "supported",
+          "delta.feature.inCommitTimestamp" -> "supported",
           IN_COMMIT_TIMESTAMP_ENABLEMENT_TIMESTAMP.getKey
             -> getInCommitTimestamp(engine, table, version = 1).get,
           IN_COMMIT_TIMESTAMP_ENABLEMENT_VERSION.getKey -> 1L),
@@ -542,7 +542,7 @@ class InCommitTimestampSuite extends DeltaTableWriteSuiteBase {
       assert(ex.getMessage.contains(String.format(
         "This table has the feature %s enabled which requires the presence of the " +
           "CommitInfo action in every commit. However, the CommitInfo action is " +
-          "missing from commit version %s.", "inCommitTimestamp-preview", "2")))
+          "missing from commit version %s.", "inCommitTimestamp", "2")))
     }
   }

diff --git a/protocol_rfcs/README.md b/protocol_rfcs/README.md
index f42282bf10d..d5f1fadc2d0 100644
--- a/protocol_rfcs/README.md
+++ b/protocol_rfcs/README.md
@@ -18,7 +18,6 @@ Here is the history of all the RFCs propose/accepted/rejected since Feb 6, 2024,

 | Date proposed | RFC file | Github issue | RFC title |
 |:--------------|:---------------------------------------------------------------------------------------------------------------------------------|:----------------------------------------------|:---------------------------------------|
-| 2023-02-02 | [in-commit-timestamps.md](https://github.com/delta-io/delta/blob/master/protocol_rfcs/in-commit-timestamps.md) | https://github.com/delta-io/delta/issues/2532 | In-Commit Timestamps |
 | 2023-02-09 | [type-widening.md](https://github.com/delta-io/delta/blob/master/protocol_rfcs/type-widening.md) | https://github.com/delta-io/delta/issues/2623 | Type Widening |
 | 2023-02-14 | [managed-commits.md](https://github.com/delta-io/delta/blob/master/protocol_rfcs/managed-commits.md) | https://github.com/delta-io/delta/issues/2598 | Managed Commits |
 | 2023-02-26 | [column-mapping-usage.tracking.md](https://github.com/delta-io/delta/blob/master/protocol_rfcs/column-mapping-usage-tracking.md) | https://github.com/delta-io/delta/issues/2682 | Column Mapping Usage Tracking |
@@ -30,6 +29,7 @@ Here is the history of all the RFCs propose/accepted/rejected since Feb 6, 2024,
 | Date proposed | Date accepted | RFC file | Github issue | RFC title |
 |:-|:-|:-|:-|:-|
 | 2023-02-28 | 2023-03-26 |[vacuum-protocol-check.md](https://github.com/delta-io/delta/blob/master/protocol_rfcs/vacuum-protocol-check.md)| https://github.com/delta-io/delta/issues/2630 | Enforce Vacuum Protocol Check |
+| 2023-02-02 | 2023-07-24 |[in-commit-timestamps.md](https://github.com/delta-io/delta/blob/master/protocol_rfcs/in-commit-timestamps.md) | https://github.com/delta-io/delta/issues/2532 | In-Commit Timestamps |

 ### Rejected RFCs

diff --git a/protocol_rfcs/in-commit-timestamps.md b/protocol_rfcs/accepted/in-commit-timestamps.md
similarity index 100%
rename from protocol_rfcs/in-commit-timestamps.md
rename to protocol_rfcs/accepted/in-commit-timestamps.md
diff --git a/spark/src/main/scala/org/apache/spark/sql/delta/DeltaConfig.scala b/spark/src/main/scala/org/apache/spark/sql/delta/DeltaConfig.scala
index e5b79a8ad17..486408607ba 100644
--- a/spark/src/main/scala/org/apache/spark/sql/delta/DeltaConfig.scala
+++ b/spark/src/main/scala/org/apache/spark/sql/delta/DeltaConfig.scala
@@ -766,7 +766,7 @@ trait DeltaConfigsBase extends DeltaLogging {
       " commit-coordinator.")

   val IN_COMMIT_TIMESTAMPS_ENABLED = buildConfig[Boolean](
-    "enableInCommitTimestamps-preview",
+    "enableInCommitTimestamps",
     false.toString,
     _.toBoolean,
     validationFunction = _ => true,
@@ -778,7 +778,7 @@ trait DeltaConfigsBase extends DeltaLogging {
    * inCommitTimestamps were enabled.
    */
   val IN_COMMIT_TIMESTAMP_ENABLEMENT_VERSION = buildConfig[Option[Long]](
-    "inCommitTimestampEnablementVersion-preview",
+    "inCommitTimestampEnablementVersion",
     null,
     v => Option(v).map(_.toLong),
     validationFunction = _ => true,
@@ -791,7 +791,7 @@ trait DeltaConfigsBase extends DeltaLogging {
    * the version specified in [[IN_COMMIT_TIMESTAMP_ENABLEMENT_VERSION]].
    */
   val IN_COMMIT_TIMESTAMP_ENABLEMENT_TIMESTAMP = buildConfig[Option[Long]](
-    "inCommitTimestampEnablementTimestamp-preview",
+    "inCommitTimestampEnablementTimestamp",
     null,
     v => Option(v).map(_.toLong),
     validationFunction = _ => true,
diff --git a/spark/src/main/scala/org/apache/spark/sql/delta/TableFeature.scala b/spark/src/main/scala/org/apache/spark/sql/delta/TableFeature.scala
index a18c71ecedb..751666e2d97 100644
--- a/spark/src/main/scala/org/apache/spark/sql/delta/TableFeature.scala
+++ b/spark/src/main/scala/org/apache/spark/sql/delta/TableFeature.scala
@@ -803,7 +803,7 @@ object TypeWideningTableFeature
  * every writer write a monotonically increasing timestamp inside the commit file.
  */
 object InCommitTimestampTableFeature
-  extends WriterFeature(name = "inCommitTimestamp-preview")
+  extends WriterFeature(name = "inCommitTimestamp")
   with FeatureAutomaticallyEnabledByMetadata
   with RemovableFeature {