Call DeltaLog.update in Delta streaming source to ensure we use latest table schema
When a `DeltaLog` instance is cached, `DeltaSource` gets the cached `DeltaLog` when calling `DeltaLog.forTable`, but it never calls `DeltaLog.update`. This means that if nothing else on the same cluster updates that `DeltaLog`, a streaming query running on this cluster will keep using the stale `Snapshot` held by the cached `DeltaLog`.
This breaks one use case: when a streaming query detects a schema change in a Delta table, it fails. When the query is restarted on the same cluster, it should recover and continue to run, just as it would on a different cluster. Because of the bug above, the restarted query cannot get the latest schema (`DeltaSource.schema` reads the schema from the stale `Snapshot`) and fails again during restart.
This PR adds the missing `update` calls to make sure `DeltaDataSource.sourceSchema` and `DeltaSource.schema` always get the latest table schema.
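The stale-cache behavior and the effect of the added `update` calls can be illustrated with a small self-contained model. This is only a sketch under stated assumptions: `ToySnapshot`, `ToyTableStorage`, `ToyLog`, `ToyLogCache`, and `StaleSnapshotDemo` are hypothetical stand-ins invented for illustration, not the real Delta Lake classes, and the table path is made up.

```scala
import scala.collection.mutable

// Hypothetical stand-ins for illustration only; not the real Delta Lake classes.
final case class ToySnapshot(version: Long, schema: Seq[String])

// Simulates the table state on storage, which other clusters may change.
final class ToyTableStorage {
  private var latest = ToySnapshot(0L, Seq("id"))

  def commitSchema(schema: Seq[String]): Unit =
    latest = ToySnapshot(latest.version + 1, schema)

  def readLatest(): ToySnapshot = latest
}

// Holds a snapshot in memory; only update() re-reads the storage.
final class ToyLog(storage: ToyTableStorage) {
  private var current = storage.readLatest()

  def snapshot: ToySnapshot = current

  def update(): ToySnapshot = {
    current = storage.readLatest()
    current
  }
}

// Per-cluster cache: returns the cached instance without refreshing it.
object ToyLogCache {
  private val cache = mutable.Map.empty[String, ToyLog]

  def forTable(path: String, storage: ToyTableStorage): ToyLog =
    cache.getOrElseUpdate(path, new ToyLog(storage))
}

object StaleSnapshotDemo {
  // Returns (schema read without update, schema read after update).
  def run(): (Seq[String], Seq[String]) = {
    val storage = new ToyTableStorage
    val path = "/tmp/toy-table" // made-up path

    // The first streaming query populates the cache at version 0.
    ToyLogCache.forTable(path, storage)

    // The schema changes through a writer that does not touch this cache.
    storage.commitSchema(Seq("id", "name"))

    // A restarted query that skips update() still sees the old schema.
    val stale = ToyLogCache.forTable(path, storage).snapshot.schema

    // Calling update() first picks up the latest schema.
    val fresh = ToyLogCache.forTable(path, storage).update().schema

    (stale, fresh)
  }
}
```

In this model, reading the schema straight from the cached instance returns the old columns, while calling `update()` first returns the new ones, which mirrors the difference this change is meant to make for `DeltaDataSource.sourceSchema` and `DeltaSource.schema`.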
Tested with the newly added unit test.
GitOrigin-RevId: b5488671ceaf942e48c4cbdb068b305fdc582d46