
[Feature] Support default values for specified fields #5023

Open
wants to merge 8 commits into master
Conversation

davidyuan1223

Purpose

Linked issue: close #5015

Paimon Spark supports default values for specified fields (currently only the string type).
(Testing; no need to review yet.)

Tests

API and Format

Documentation

@Zouxxyy
Contributor

Zouxxyy commented Feb 7, 2025

Thanks for the implementation. We hope its final form follows https://issues.apache.org/jira/browse/SPARK-38334:

  1. Allow table creation using Spark's default value syntax, while also saving it in Paimon metadata format to facilitate recognition by other clients.
  2. Refer to Spark's own implementation of default values as much as possible.

@davidyuan1223
Author

> Thanks for the implementation. We hope its final form follows https://issues.apache.org/jira/browse/SPARK-38334:
>
>   1. Allow table creation using Spark's default value syntax, while also saving it in Paimon metadata format to facilitate recognition by other clients.
>   2. Refer to Spark's own implementation of default values as much as possible.

LGTM, I will try to implement this.

@davidyuan1223
Author

davidyuan1223 commented Feb 10, 2025

@Zouxxyy Hello, I'd like to ask a question: when we load a CREATE TABLE logical plan with default values, where should the default value be stored? In fields?

{
  "version" : 3,
  "id" : 0,
  "fields" : [ {
    "id" : 0,
    "name" : "id",
    "type" : "INT"
  }, {
    "id" : 1,
    "name" : "t1",
    "type" : "INT"
  }, {
    "id" : 2,
    "name" : "t2",
    "type" : "INT"
  } ],
  "highestFieldId" : 2,
  "partitionKeys" : [ ],
  "primaryKeys" : [ ],
  "options" : {
    "owner" : "xxx"
  },
  "timeMillis" : 1739187234928
}

Like this? Because the Spark field has a metadata property, I think we should follow it. WDYT?

{
  "version" : 3,
  "id" : 0,
  "fields" : [ {
    "id" : 0,
    "name" : "id",
    "type" : "INT",
    "metadata": {}
  }, {
    "id" : 1,
    "name" : "t1",
    "type" : "INT",
    "metadata": {}
  }, {
    "id" : 2,
    "name" : "t2",
    "type" : "INT",
    "metadata": {}
  } ],
  "highestFieldId" : 2,
  "partitionKeys" : [ ],
  "primaryKeys" : [ ],
  "options" : {
    "owner" : "xxx"
  },
  "timeMillis" : 1739187234928
}
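The metadata-based layout in question can be sketched in plain Python. This is a discussion aid, not Paimon's actual serialization format: Spark 3.4+ records a column's default in the StructField metadata (under keys such as "CURRENT_DEFAULT", holding the default's SQL expression text), and the sketch mirrors that idea for Paimon's field JSON; `with_default` is a hypothetical helper.

```python
import json

# Hypothetical sketch of the "metadata"-based layout discussed above.
# Spark 3.4+ keeps column defaults in StructField metadata (e.g. a
# "CURRENT_DEFAULT" key with the default's SQL expression text); this
# mirrors that idea for Paimon's field JSON. Not Paimon's actual format.
def with_default(field: dict, default_sql: str) -> dict:
    """Return a copy of a field entry whose metadata records a default."""
    out = dict(field)
    metadata = dict(out.get("metadata") or {})
    metadata["CURRENT_DEFAULT"] = default_sql  # e.g. "'default v'" for strings
    out["metadata"] = metadata
    return out

fields = [
    {"id": 0, "name": "id", "type": "INT", "metadata": {}},
    with_default({"id": 1, "name": "t1", "type": "INT", "metadata": {}}, "0"),
]
print(json.dumps(fields[1], indent=2))
```

A copy is returned so the original schema entry is left untouched, which matters if the same field object is shared across schema versions.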

@davidyuan1223
Author

davidyuan1223 commented Feb 10, 2025

> Thanks for the implementation. We hope its final form follows https://issues.apache.org/jira/browse/SPARK-38334:
>
>   1. Allow table creation using Spark's default value syntax, while also saving it in Paimon metadata format to facilitate recognition by other clients.
>   2. Refer to Spark's own implementation of default values as much as possible.
>
> LGTM, I will try to implement this.

Based on this, I implemented a simple PR; we can discuss how to improve it.

@Zouxxyy
Contributor

Zouxxyy commented Feb 24, 2025

> @Zouxxyy Hello, I'd like to ask a question: when we load a CREATE TABLE logical plan with default values, where should the default value be stored? In fields?

@davidyuan1223 Since Paimon already has a property for default values, we can just use that. What do you think, @JingsongLi?

So I think we can implement it in the following steps:

  1. Support Paimon's built-in default value syntax for Spark reads (this is actually your first PR):

CREATE TABLE t (id INT, name STRING) TBLPROPERTIES ('fields.name.default-value' = 'default v');

-- check
INSERT INTO t (id) VALUES (1);
SELECT * FROM t;

  2. Support Spark's DDL syntax (for Spark 3.4 and above), which will save 'fields.name.default-value' to Paimon's properties:

CREATE TABLE t (id INT, name STRING DEFAULT 'default v');
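The intended behavior of step 1 can be sketched in Python, assuming only the existing property convention 'fields.<column>.default-value'. `parse_defaults` and `pad_row` are hypothetical helpers illustrating the INSERT/SELECT semantics, not Paimon code:

```python
# Minimal sketch of step 1, assuming Paimon's existing property convention
# 'fields.<column>.default-value'. pad_row is a hypothetical helper showing
# the intended INSERT behavior: columns omitted by the writer are filled
# with their configured default.
def parse_defaults(props: dict) -> dict:
    """Extract {column: default} from table properties."""
    prefix, suffix = "fields.", ".default-value"
    return {
        k[len(prefix):-len(suffix)]: v
        for k, v in props.items()
        if k.startswith(prefix) and k.endswith(suffix)
    }

def pad_row(row: dict, columns: list, defaults: dict) -> list:
    """Fill columns missing from `row` with their default (or None)."""
    return [row.get(c, defaults.get(c)) for c in columns]

props = {"fields.name.default-value": "default v", "owner": "xxx"}
defaults = parse_defaults(props)
# INSERT INTO t (id) VALUES (1);  then  SELECT * FROM t;
print(pad_row({"id": 1}, ["id", "name"], defaults))  # [1, 'default v']
```

Step 2 would then only need the DDL layer to translate `DEFAULT 'default v'` into the same property, so both syntaxes converge on one storage format.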

@davidyuan1223
Author

> > @Zouxxyy Hello, I'd like to ask a question: when we load a CREATE TABLE logical plan with default values, where should the default value be stored? In fields?
>
> @davidyuan1223 Since Paimon already has a property for default values, we can just use that. What do you think, @JingsongLi?
>
> So I think we can implement it in the following steps:
>
>   1. Support Paimon's built-in default value syntax for Spark reads (this is actually your first PR):
>
> CREATE TABLE t (id INT, name STRING) TBLPROPERTIES ('fields.name.default-value' = 'default v');
>
> -- check
> INSERT INTO t (id) VALUES (1);
> SELECT * FROM t;
>
>   2. Support Spark's DDL syntax (for Spark 3.4 and above), which will save 'fields.name.default-value' to Paimon's properties:
>
> CREATE TABLE t (id INT, name STRING DEFAULT 'default v');

Because I saw the Spark issue https://issues.apache.org/jira/browse/SPARK-38334, which implements a metadata field, I implemented a PR that stores the default in a metadata field in Paimon's metadata file. Shall we use this, or keep the properties?

@JingsongLi
Contributor

Our previous default value design was flawed, and we may need to consider refactoring it. But this refactoring requires consideration of:

  1. Compatibility
  2. Multi engines
  3. Practices in other formats

This may require a larger design, preferably with a PIP to discuss it in detail.

@Zouxxyy
Contributor

Zouxxyy commented Feb 24, 2025

Agree, we can also consider supporting schema evolution, like changing the default value. These need to be carefully designed.

@davidyuan1223
Author

LGTM. I think adding metadata or other information to the field details could help us stay compatible with more components; we can discuss it.
