
[Specifications] Transformation artifacts #3

Open
tchataigner opened this issue Aug 26, 2021 · 9 comments

@tchataigner
Contributor

When compiling Rust code to generate the Wasm bytecode, we should generate some artifacts to retrieve metadata from the base code. The format for the metadata should be human-readable and editable, as some documentation might have to be completed, for example.

Here is a first proposal to open the discussion:

Artifacts generation

Two files: bytecode & metadata

Metadata file

Format: TOML

Objects

Metadata

Contains metadata to associate with a transformation bytecode

  • Name: String
  • Documentation: String
  • Version: String

Structure

Structure represents custom types defined by the transformation developer

  • Name: String
  • Documentation: String
  • Fields: Fields

Transformation

Transformation represents a function in the library

  • Name: String
  • Documentation: String
  • Inputs: Fields
  • Outputs: Fields

Fields

  • Key: String
  • Type: Enum(Bool, Integer, Float, Bytes, Text, <any_structure>)

Sample

[metadata]
version = "1.0.0"
name = "Library name"
documentation = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Praesent quis eros quis ante facilisis ultrices id at neque. Praesent justo metus, volutpat sit amet nisi vel, molestie sagittis quam. Vivamus ante neque, sollicitudin vel viverra quis, ultricies ac ante. Donec et gravida purus. Vestibulum pharetra leo vel scelerisque maximus. Vivamus purus nisl, mollis in ante at, egestas euismod erat. Pellentesque ullamcorper bibendum orci a posuere"

[[transformations]]
name = "main"
documentation = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Praesent quis eros quis ante facilisis ultrices id at neque. Praesent justo metus, volutpat sit amet nisi vel, molestie sagittis quam. Vivamus ante neque, sollicitudin vel viverra quis, ultricies ac ante. Donec et gravida purus. Vestibulum pharetra leo vel scelerisque maximus. Vivamus purus nisl, mollis in ante at, egestas euismod erat. Pellentesque ullamcorper bibendum orci a posuere"
  
  [[transformations.inputs]]
  key = "key0"
  type = "Null"
  
  [[transformations.outputs]]
  key = "key1"
  type = "Structure"
  
[[structures]]
name = "Structure"
documentation = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Praesent quis eros quis ante facilisis ultrices id at neque. Praesent justo metus, volutpat sit amet nisi vel, molestie sagittis quam. Vivamus ante neque, sollicitudin vel viverra quis, ultricies ac ante. Donec et gravida purus. Vestibulum pharetra leo vel scelerisque maximus. Vivamus purus nisl, mollis in ante at, egestas euismod erat. Pellentesque ullamcorper bibendum orci a posuere"
  
  [[structures.fields]]
  key = "key0"
  type = "Text"
  
  
@PhilippeMts
Contributor

Thank you Thomas for sharing these elements that early. I'm confident in our ability to iterate rapidly and formalize an interesting specification for these artifacts.

Choice of serialization format

I am not 100% aligned with your reasoning and its output on the choice of the file format.

You say that the file should be human-editable, which I disagree with. To me, the file will most likely be generated by the SDK (parsing code, special comments, and options), which is the best way to enable the creation of rich artifacts; allowing for or encouraging human editing will most likely open the door to more trouble than interesting use cases.

In this context, I think that JSON could be a better option:

  • it is human-readable too
  • closer to CBOR
  • closer to JSON Schema which I think will likely be a good starting point to describe transformation signatures and related constraints
  • even more widely used

Structure section

I do not think a Structure section has a place in such a format, but that is no problem to me because I don't think it is required.

NB : I understand that you added it in the context of a human-editable file, but my comment here suggests a human-readable-only file.

To make it short, I think that factorization of code (allowed by structures) is not a goal of ours, and that our artifacts have to focus on one major objective: standard representation of transformation signatures.

In fact, if we add to this goal an objective of determinism in signature representations, such a Structure section could, to my mind, unnecessarily make things harder or even hardly possible.

Transformation input/output key

Why keys ?

I strongly disagree with the presence of any key field related to parameter "names" coming from the source code. The protocol does not rely on any key, but only on scalar and key-less recursive values.

If these keys are just indicative names/tags, they should have a different name.

Ambiguous name

I would rather change the transformation name from main to something else in order to clarify things and help make the distinction between a user-developed transformation and the library entry point standardized by the protocol and generated by the SDK.

Types and JSON-schema

Here is the part I think we will probably discuss and build upon the longest.

To me, it is clearly in our interest to stay as close as possible to JSON-schema specifications to describe I/O types and other constraints.

I shared some hopefully useful links on this topic at the end of the description of this ticket.

@tchataigner
Contributor Author

Thank you for the quick feedback @PhilippeMts .

Choice of serialization format

encouraging human edition will most likely open the door to more troubles

I agree with that statement. But I still think that we need an "easy" way for developers to document their transformations and associate that documentation with the metadata.

We can use JSON as the metadata format and maybe associate a dedicated command to insert some documentation from a Markdown file into the JSON object. What do you think of that proposal?
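As a sketch of that idea (hypothetical helper name and metadata layout, assuming a top-level transformations array as in the proposal above):

```python
import json

def attach_documentation(metadata: dict, name: str, markdown_doc: str) -> None:
    """Hypothetical helper: copy a Markdown documentation string into the
    metadata entry of the transformation with the given name."""
    for transformation in metadata.get("transformations", []):
        if transformation.get("name") == name:
            transformation["documentation"] = markdown_doc
            return
    raise KeyError(f"no transformation named {name!r}")

metadata = {"transformations": [{"name": "main"}]}
attach_documentation(metadata, "main", "# main\n\nConcatenates two strings.")
print(json.dumps(metadata, indent=2))
```

A dedicated CLI command could simply read the Markdown file and call such a merge step before writing the artifact back to disk.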

Structure section

On that point, I think we have different approaches to what metadata is supposed to accomplish.

standard representation of transformation signatures

I completely agree with that. It's THE main task of the file. But I also think that another of its tasks is to help anybody understand what the transformation is about. And that requires enough information on all types, signatures, and transformation names. That's why I chose to represent client-implemented types and not only trees of node & primitive types.

But if we choose to do so, we will need to find a way to ensure the second point I just stated.

Transformation input/output key

Why keys ?

Same logic as the previous section. If we switch to JSON they will most likely be directly included in the schema so it is not a problem anymore.

Ambiguous name

The main name that I used is the name of the function implemented by the transformation developer. If the name in Rust had been from_cbor, then it would have been the same here. With that in mind, I do not quite get your comment. Did you understand it that way?

Types and JSON-schema

To me, it is clearly in our interest to stay as close as possible to JSON-schema specifications to describe I/O types and other constraints.

OK, I get that this is the main point of your comments! While you answer my answers (:D), I will write down a new proposal based on it.

@PhilippeMts
Contributor

Thanks Thomas. I think we are on our way to finding a consensus on each of these points.

Choice of serialization format

But I still think that we need an "easy" way for developers to document their transformation.

I of course agree, but I think that the primary source of trustworthy information should always be the source code (as it is obviously a required component of the compilation process).

Your solution seems to be a good one to ensure some level of data validation while opening the door to the scenario you describe, but I would assign it a lower priority than the default path to the generation of this artifact.

Structure section

I also think that another of its tasks is to help anybody understand what the transformation is about. And that requires enough information on all types, signatures and transformation names.

You will be pleased to know that I totally agree with that. Of course, we have to provide a way to document transformation I/O parameters, in particular when they are organized around custom types (and Rust structures).

But my remark deals more specifically with the place of this information in your document. Why put it into a new Holium-custom dedicated section? Why not use the descriptive fields of JSON Schema?

Transformation input/output key

Same logic as the previous section. If we switch to JSON they will most likely be directly included in the schema so it is not a problem anymore.

Sweet. But do we agree on the fact that the keys (I mean their names here) have no place in this document, apart from documentation purposes?

The main name that I used is the name of the function that was implemented by the transformation developer.

Nice, that is exactly what I needed to know.

What I wanted to say in my comment is that the approximate compilation workflow is the following:

  • the developer develops a library L containing multiple functions, like f0 and f1.
  • when using the SDK to compile it, a unique binary, executable in a Wasm runtime, is built, whose entry point ep is mainly made of a big switch that runs either f0 or f1.

I think main is a proper name for ep (but that is just my opinion); here, however, from what I understand, you use main as f0's name. I think naming it something more explicitly identifiable as f0's name, like my_lib::my_func, could improve clarity.
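To illustrate this workflow (a sketch only: the names my_lib, f0, and f1 are hypothetical, and the real entry point would be compiled to Wasm rather than written in Python):

```python
# The developer's transformations (hypothetical f0 and f1 from library my_lib).
def f0(payload):
    return payload["a"] + payload["b"]

def f1(payload):
    return payload["a"] * payload["b"]

# The SDK-generated entry point ep: one exported symbol whose body is
# essentially a big switch dispatching to the user-developed functions.
def ep(transformation: str, payload: dict):
    if transformation == "my_lib::f0":
        return f0(payload)
    if transformation == "my_lib::f1":
        return f1(payload)
    raise ValueError(f"unknown transformation {transformation!r}")
```

The naming discussion above is then about keeping ep (the generated switch) and f0 (a user function) clearly distinguishable in the metadata.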

@tchataigner
Contributor Author

tchataigner commented Aug 26, 2021

Thank you again for your feedback ! I tried to assemble a better spec by taking into account your review. I am posting it here. Please feel free to comment !

Artifacts generation

Second proposal

Metadata file

Format: JSON

The base JSON definition would be an object with only one key: transformations. All structures exported from the code should be defined in the root $defs. The transformations are defined by the following objects.

Transformation objects

The transformation object would have a name composed as <lib_name>::<function_name>. The following properties would exist on the object.

Inputs

Key: inputs

The first property would be inputs, describing the function arguments. All values would be represented inside a tuple.

So that main(structure: ToConcatenate, done: bool), with the ToConcatenate structure represented as follows:

{
  "type": "object",
  "properties": {
    "a": {
      "type": "string"
    },
    "b": {
      "type": "string"
    }
  }
}

would become:

{
  "type": "array",
  "prefixItems": [
    {
      "type": "object",
      "properties": {
        "a": {
          "type": "string"
        },
        "b": {
          "type": "string"
        }
      }
    },
    {
      "type": "boolean"
    }
  ]
}
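The wrapping step can be sketched with a small helper (hypothetical name; it only illustrates the inputs-as-tuple convention described above):

```python
def tuple_schema(*parameter_schemas: dict) -> dict:
    """Wrap per-parameter schemas into a single array schema using
    prefixItems, so a whole argument list is represented as one tuple."""
    return {"type": "array", "prefixItems": list(parameter_schemas)}

to_concatenate = {
    "type": "object",
    "properties": {"a": {"type": "string"}, "b": {"type": "string"}},
}
# Schema for the inputs of main(structure: ToConcatenate, done: bool)
inputs_schema = tuple_schema(to_concatenate, {"type": "boolean"})
```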

Outputs

Key: outputs

Same as inputs

Sample

Let's say that we have the following function signatures in our code:

pub fn add(values: Values) -> u32 {
  ...
}

pub fn subtract(values: Values) -> u32 {
  ...
}

with the following structure:

struct Values {
  a: u32,
  b: u32
}

The following JSON would be generated:

{
  "title": "mylib",
  "version": "1.0.0",
  "properties": {
    "transformations": {
      "type": "array",
      "description": "All transformation available in mylib",
      "prefixItems": [
        {
          "title": "mylib::add",
          "type": "object",
          "properties": {
            "inputs": {
              "type": "array",
              "description": "Input payload for the add transformation",
              "prefixItems": [
                { "$ref": "#/$defs/values" }
              ]
            },
            "outputs": {
              "type": "array",
              "description": "Output payload for the add transformation",
              "prefixItems": [
                {
                  "type": "number",
                  "description": "Result of the addition"
                }
              ]
            }
          }
        },
        {
          "title": "mylib::subtract",
          "type": "object",
          "properties": {
            "inputs": {
              "type": "array",
              "description": "Input payload for the subtract transformation",
              "prefixItems": [
                { "$ref": "#/$defs/values" }
              ]
            },
            "outputs": {
              "type": "array",
              "description": "Output payload for the subtract transformation",
              "prefixItems": [
                {
                  "type": "number",
                  "description": "Result of the subtraction"
                }
              ]
            }
          }
        }
      ]
    }
  },
  "$defs": {
    "values": {
      "type": "object",
      "description": "Value structure, containing the two values to add",
      "properties": {
        "a": {
          "type": "number",
          "description": "First value"
        },
        "b": {
          "type": "number",
          "description": "Second value"
        }
      }
    }
  }
}

@tchataigner
Contributor Author

I will answer here the questions from your previous comment:

Choice of serialization format

N/A

Structure section

Why not using the descriptive fields of JSON schema ?

Tried to do so !

Transformation input/output key

But do we agree on the fact that the keys (I here mean their names) do not have a place in this document, apart from for documentation purposes ?

Yes I do !

@PhilippeMts
Contributor

PhilippeMts commented Aug 31, 2021

Sorry Thomas for taking so much time since your last comment.

I need to take a step back to really frame what we are currently working on.

Here is the result of my personal reflection on this frame. As you will understand, elements of answer to your last comment will logically come in a second comment of mine.


Context: previous elements from the design under construction

References

IPLD Schema documentation

### Standard metadata

Metadata objects are basically made of key-free mappings to maximize flexibility at the schema-level. Here are some
recommended keys.

#### Transformation metadata

| Key | Value | Example |
|---|---|---|
| `name` | Package name. | `date-and-time` |
| `version` | Package version. | `1.0.1` |

#### Wet execution metadata

| Key | Value | Example |
|---|---|---|
| `time` | Date and time of the execution in ISO 8601 format. | `2021-07-14T10:12:37Z` |

Schemata themselves

## -----
## Metadata
## -----
##
## In the Holium Framework, optional metadata schemas include links to the kinds they refer to, and not the other way
## around. This design choice ensures that core objects' content identifiers (CIDs) do not depend on more volatile
## metadata.
## TransformationMetadata_v0 adds metadata to some transformation bytecode or some data source (the `transformation`
## field would be null in the latter case).
type TransformationMetadata_v0 struct {
    transformation  nullable    &TransformationBytecode
    metadata                    {String:String}
} representation tuple

Key insights

  • The protocol should allow schema-free metadata so that any user/application can add/remove their own metadata fields.
  • Some levels of standardization of metadata schemas should be enabled, so that common keys (for all metadata, or in some sectors, etc) are used when possible.
  • A common structure should be used for transformation metadata and data source metadata if possible.

Why transformation handles should be indices, not strings

There are many advantages to integer indices over string handles for identifying a transformation inside a compiled package bytecode.

  • everything that is prone to error and that can be removed from the protocol should be removed from the protocol
  • a layered architecture allows:
    • to edit/fix one part without touching the other
    • for easy & efficient internationalization of metadata
  • the protocol should be kept as minimal as possible
  • there is a distinction between what is necessary to the machine (the protocol) and what is useful to the human user (metadata, which may include descriptive strings)
  • everything that helps to run the protocol with no documentation is good to have

Recommendation: use the [0..n[ list of indices as handles to identify, at the protocol level, each transformation in a compiled package bytecode.
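The recommendation can be sketched as follows (illustrative only; in practice the dispatch would happen inside the compiled Wasm package, and the example transformations are hypothetical):

```python
# At the protocol level, transformations are identified by their integer
# index in [0..n[; human-readable names live only in the metadata layer.
TRANSFORMATIONS = [
    lambda values: values["a"] + values["b"],  # handle 0 ("add" in metadata)
    lambda values: values["a"] - values["b"],  # handle 1 ("subtract" in metadata)
]

def run(handle: int, payload: dict):
    if not 0 <= handle < len(TRANSFORMATIONS):
        raise IndexError(f"no transformation with handle {handle}")
    return TRANSFORMATIONS[handle](payload)
```

The metadata list order then gives the index-to-name mapping for free, keeping string names entirely out of the protocol.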

What should and shouldn't the metadata object stored on IPFS/IPLD contain?

Most elements shared in the Context section are still true and ensure a coherent higher architecture.

Thus, once stored on IPLD, it still seems interesting for the metadata object to be made of 3 parts :

  1. the holium IPLD version header
  2. an IPLD link to the bytecode
  3. metadata

However, to really meet the expected insights shared earlier, a {String : String} metadata field is neither efficient nor sufficient, and could be improved to enable recursive values.

To meet expected insights, what we need is broadly:

  1. valid CBORed JSON objects
    • this allows schema-free metadata
  2. and schemas for them
    • this enables standardization of metadata schemas

Why not include links to the bytecode inside the metadata field

In fact, it comes in very handy to state a clear distinction between two worlds:

  1. the world of pipelines, with 'JSON' and Wasm bytecode
  2. the world of identifiable resources, with IPLD, CIDs and CBOR.

The strength of this distinction is verified in its ability to use the same structure for the metadata field, either to describe transformation metadata or data source metadata (same structure, one linked to bytecode in the world of identifiable resources, the other not).

Why use CBOR to store the 3 fields (including the metadata field) and not JSON

  • The previous distinction holds true: at the applicative layer, the metadata can still really be 'JSON', validating JSON schemas, while taking a CBOR format when stored in IPLD.

  • All JSON can be CBORed

    The JSON generic data model (implicit in [RFC8259]) is a subset of
    the generic data model of CBOR.

  • Binary representation is more efficient for storage and bandwidth consumption

  • Requires only one IPLD driver to develop / maintain, and only one driver throughout the framework
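To make the "all JSON can be CBORed" point concrete, here is a deliberately minimal CBOR encoder sketch (small values only; a real implementation would of course use a full CBOR library):

```python
def cbor_encode(value) -> bytes:
    """Minimal CBOR encoder sketch: handles only small non-negative ints,
    short strings, short lists/maps, booleans, and null. Enough to show
    that plain JSON-style values map directly onto CBOR major types."""
    if isinstance(value, bool):  # must be checked before int (bool is an int)
        return b"\xf5" if value else b"\xf4"
    if value is None:
        return b"\xf6"
    if isinstance(value, int) and 0 <= value < 24:
        return bytes([value])  # major type 0, value embedded in the header
    if isinstance(value, str):
        data = value.encode("utf-8")
        assert len(data) < 24, "sketch only covers short strings"
        return bytes([0x60 + len(data)]) + data  # major type 3
    if isinstance(value, list):
        assert len(value) < 24
        return bytes([0x80 + len(value)]) + b"".join(map(cbor_encode, value))
    if isinstance(value, dict):
        assert len(value) < 24
        return bytes([0xA0 + len(value)]) + b"".join(
            cbor_encode(k) + cbor_encode(v) for k, v in value.items()
        )
    raise TypeError(f"unsupported type: {type(value)}")
```

For example, the JSON object {"a": 1} becomes the three bytes a1 61 61 01, noticeably shorter than its textual form, which illustrates the storage and bandwidth argument above.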

Should JSON Schema be used to describe metadata, or CDDL, or BSON schema,… ?

The metadata field only holds informative data, not data formally required by the protocol. This question should thus be solved in the human, transformation world, not in the IPLD world.

In this context, JSON Schema, way more popular than other options, looks like a good candidate. But when we choose a format here, we make a very important choice which impacts (or is imposed by) the type of data we may store within the Holium Framework.

  • should any distinction be made between different types of numbers, as is the case in CDDL or BSON schema, but not in JSON Schema?
  • should we handle a type dedicated to binary data, as is the case in CDDL or BSON schema, but not in JSON Schema?

Only answers to these questions may guide our design.

Sub-rationale: why have a type system that includes bin data?

  • because a lot of languages (most of them) that would be used to develop transformations have a way to represent arrays of bytes, and (de)serialization from/to them can and should be facilitated (it would benefit the protocol). There is no reason to represent them as base64 strings if it is possible to store them more efficiently.
  • there is a need for binary data that does not exist when specifying the JSON format

Sub-rationale: why not other more precise number types, as in BSON or CDDL?

  • we try to unify the representation (the shortest deterministic representation) of values, not to impose a schema on data based on transformation signatures.
  • we want to enable standard representation across the languages used to develop transformations, not to have the most efficient way of storing with-schema data

TL;DR

My recommendations are thus the following:

  • keep a 3-parted (IPLD Holium object identifier header // optional link to bytecode // metadata) CBOR representation on IPLD
  • move away from the {String : String} format for metadata to enable any valid CBORed 'JSON' value
  • standardization of metadata formats is just a standardization of 'JSON' schemas
  • when we talk about 'JSON' schema, we in fact need something in between JSON Schema (too restricted) and BSON schema (with too many types); JSON Schema is a good base specification that could and should easily be lightly extended to meet our precise requirements.

@tchataigner what do you think of these rationales and final recommendations?

If you agree, in particular with the fact that standardizing metadata formats only means standardizing 'JSON' schemas, then our next step would logically be to formalize a first useful schema integrating the elements you shared in your last comment.

@PhilippeMts
Contributor

PhilippeMts commented Aug 31, 2021

This comment tries to provide a follow-up to the last pending question, in the context described in the previous comment.

Main interrogation

I feel like we may not be aligned on what kind of objects we are dealing with:

  • JSON objects
  • or JSON Schemas
  • or a meta schema (i.e. a JSON schema validating JSON schemas)

From my point of view, most of a metadata object should be a simple JSON object. When we want to document a transformation or data source with names, long documentation, version numbers, we only need JSON objects. And to standardize and document these objects, we need JSON schemas.

One part of this metadata could benefit from being a bit more than just a JSON object: transformation signatures. Indeed, we can envision using JSON Schema keys like minItems and maxItems to describe some dynamic-length parameter arrays. We may also wish to validate input data, for instance, against the input part of a transformation signature. In such cases, what we need is for transformation signatures to be JSON schemas, and to define that, we need a meta schema.

However, we are not sure yet of the complexity we would like to implement in transformation signatures, and in any case, JSON schemas are still valid JSON objects. We can thus start with only the first part (JSON objects validated by JSON schemas) and delay the implementation of the second part (transformation signatures as JSON schemas validated by meta schemas). This is my recommendation.

I would recommend to:

  • only define metadata as JSON objects, and standardization of metadata formats as an effort to standardize JSON schemas
  • open a ticket to tackle the design and implementation of transformation signatures as 'JSON' schemas validated by meta schemas

First intuition of a metadata schema

In that context, here is a first intuition of a JSON schema for these transformation / data source metadata.

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "https://holium.org/holium-transformation-metadata.schema.json",
  "title": "Holium Transformation Metadata",
  "description": "Metadata complementary to some transformation bytecode or data source in the Holium Framework.",
  "type": "object",
  "properties": {
    "author": {
      "type": "string",
      "description": "The author of the package."
    },
    "name": {
      "type": "string",
      "description": "The name of the package."
    },
    "version": {
      "type": "string",
      "description": "The version of the package. Semantic versioning should be preferred."
    },
    "transformations": {
      "type": "array",
      "description": "Ordered list of all transformations included in the package.",
      "items": {
        "$ref": "#/$defs/transformation"
      }
    }
  },
  "$defs": {
    "transformation": {
      "type": "object",
      "description": "Metadata related to a transformation.",
      "properties": {
        "name": {
          "type": "string",
          "description": "The name of the transformation."
        },
        "description": {
          "type": "string",
          "description": "A description documenting the use of the transformation."
        },
        "deprecated": {
          "type": "boolean",
          "description": "Specifies that the transformation has been deprecated."
        },
        "inputs": {
          "type": "array",
          "description": "Input part of a transformation signature.",
          "items": {
            "$ref": "#/$defs/parameter"
          }
        },
        "outputs": {
          "type": "array",
          "description": "Output part of a transformation signature.",
          "items": {
            "$ref": "#/$defs/parameter"
          }
        }
      }
    },
    "parameter": {
      "type": "object",
      "description": "Describes an input or output parameter.",
      "properties": {
        "name": {
          "type": "string",
          "description": "The name of the parameter."
        },
        "description": {
          "type": "string",
          "description": "A long description of the parameter"
        },
        "holiumType": {
          "description": "The type of the parameter.",
          "oneOf": [
            {
              "$ref": "#/$defs/scalarParameterType"
            },
            {
              "$ref": "#/$defs/recursiveParameterType"
            }
          ]
        }
      }
    },
    "scalarParameterType": {
      "type": "string",
      "enum": [
        "null",
        "boolean",
        "number",
        "string",
        "binData"
      ]
    },
    "recursiveParameterType": {
      "type": "array",
      "items": {
        "$ref": "#/$defs/parameter"
      }
    }
  }
}

NB: deprecated could be associated with other lifecycle elements (planned deprecation, experimental,…). Still, it seems useful to ship an option like deprecated at quite an early stage of development of the Holium Framework.

Example

Using your examples, and omitting the subtract transformation, the following metadata would be considered valid:

{
  "$schema": "https://holium.org/holium-transformation-metadata.schema.json",
  "name": "mylib",
  "version": "1.0.0",
  "transformations": [
    {
      "name": "add",
      "description": "Description of the add transformation",
      "inputs": [
        {
          "name": "values",
          "description": "Input payload for the add transformation",
          "holiumType": [
            {
              "name": "a",
              "description": "First value",
              "holiumType": "number"
            },
            {
              "name": "b",
              "description": "Second value",
              "holiumType": "number"
            }
          ]
        }
      ],
      "outputs": [
        {
          "description": "Output payload for the add transformation",
          "holiumType": "number"
        }
      ]
    }
  ]
}

If the ticket about signatures as JSON schemas were implemented, we could also add a $defs section and reference it to prevent repeating details about the Values structure.
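The holiumType convention used in this schema can be checked with a short recursive sketch (a hand-rolled validator for illustration only; a full JSON Schema validator library would derive this from the schema itself):

```python
# Scalar type names allowed by the proposed scalarParameterType definition.
SCALAR_TYPES = {"null", "boolean", "number", "string", "binData"}

def valid_holium_type(holium_type) -> bool:
    """Return True if the value matches the proposed convention: either a
    scalar type name, or a list of parameter objects (recursive type)."""
    if isinstance(holium_type, str):
        return holium_type in SCALAR_TYPES
    if isinstance(holium_type, list):
        return all(
            isinstance(parameter, dict)
            and valid_holium_type(parameter.get("holiumType"))
            for parameter in holium_type
        )
    return False
```

Against the example above, the inputs entry for add would validate because its holiumType is a list of parameters whose own holiumType values are the scalar "number".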

Other comments on your suggestions

Comment #0

All structures exported from the code should be defined in the root $defs.

I don't think we need SHOULD keywords in this part of our spec, in a broad sense. I think we are just shipping a first JSON schema to provide the community with and to build upon.

One may or may not use $defs in their metadata. As long as it validates against the schema, it should be considered a proper metadata value, in my opinion.

Comment #1

The transformation object would have a name composed as <lib_name>::<function_name>.

I don't quite get the reasoning behind this assertion.

To me, names should be free fields, with one name for the package, and one name for transformations, all unformatted to start with.

Comment #2

All values would be represented inside a tuple

Not sure what tuple you are talking about, as I can't find one in your example.

@tchataigner
Contributor Author

tchataigner commented Sep 1, 2021

Thanks for the nice answers Philippe! I think we will be able to finalize these specifications.

1st post


Why transformation handles should be indices, not strings

Recommendation: use the [0..n[ list of indices as handles to identify, at the protocol level, each transformation in a compiled package bytecode.

OK, that sounds good to me. I wanted to do so with my first proposal, but by reading what you had written I thought you were against it.

What should and shouldn't the metadata object stored on IPFS/IPLD contain?

👍

Why not include links to the bytecode inside the metadata field

👍

Why use CBOR to store the 3 fields (including the metadata field) and not JSON

👍

Should JSON Schema be used to describe metadata, or CDDL, or BSON schema,… ?

The metadata field only holds informative data, not any data formally required by the protocol.

I do not get that statement. When you connect two transformations, how is the protocol supposed to know that the connection is a good one? Run the pipeline and check if it fails? Isn't that rough around the edges?

Sub-rationale : why having a type system that includes bin data ?

I guess it's alright to add a type, as it is allowed by the specifications.

Sub-rationale : why not other more precise number types, like for BSON or CDDL ?

On this I agree: we do not need them.


2nd post


First intuition of a metadata schema

In that context, here is a first intuition of a JSON schema for these transformation / data source metadata.

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "https://holium.org/holium-transformation-metadata.schema.json",
  "title": "Holium Transformation Metadata",
  "description": "Metadata complementary to some transformation bytecode or data source in the Holium Framework.",
  "type": "object",
  "properties": {
    "author": {
      "type": "string",
      "description": "The author of the package."
    },
    "name": {
      "type": "string",
      "description": "The name of the package."
    },
    "version": {
      "type": "string",
      "description": "The version of the package. Semantic versioning should be preferred."
    },
    "transformations": {
      "type": "array",
      "description": "Ordered list of all transformations included in the package.",
      "items": {
        "$ref": "#/$defs/transformation"
      }
    }
  },
  "$defs": {
    "transformation": {
      "type": "object",
      "description": "Metadata related to a transformation.",
      "properties": {
        "name": {
          "type": "string",
          "description": "The name of the transformation."
        },
        "description": {
          "type": "string",
          "description": "This description documenting the use of the transformation."
        },
        "deprecated": {
          "type": "boolean",
          "description": "Specifies that the transformation has been deprecated."
        },
        "inputs": {
          "type": "array",
          "description": "Input part of a transformation signature.",
          "items": {
            "$ref": "#/$defs/parameter"
          }
        },
        "outputs": {
          "type": "array",
          "description": "Output part of a transformation signature.",
          "items": {
            "$ref": "#/$defs/parameter"
          }
        }
      }
    },
    "parameter": {
      "type": "object",
      "description": "Describes an input or output parameter.",
      "properties": {
        "name": {
          "type": "string",
          "description": "The name of the parameter."
        },
        "description": {
          "type": "string",
          "description": "A long description of the parameter"
        },
        "holiumType": {
          "description": "The type of the parameter.",
          "oneOf": [
            {
              "$ref": "#/$defs/scalarParameterType"
            },
            {
              "$ref": "#/$defs/recursiveParameterType"
            }
          ]
        }
      }
    },
    "scalarParameterType": {
      "type": "string",
      "enum": [
        "null",
        "boolean",
        "number",
        "string",
        "binData"
      ]
    },
    "recursiveParameterType": {
      "type": "array",
      "items": {
        "$ref": "#/$defs/parameter"
      }
    }
  }
}

NB: deprecated could be associated with other lifecycle elements (planned deprecation, experimental, …). Still, it seems useful to ship an option like deprecated at quite an early stage of development of the Holium Framework.

Example

Using your examples, and omitting the subtract transformation, the following metadata would be considered valid:

{
  "$schema": "https://holium.org/holium-transformation-metadata.schema.json",
  "name": "mylib",
  "version": "1.0.0",
  "transformations": [
    {
      "name": "add",
      "description": "Description of the add transformation",
      "inputs": [
        {
          "name": "values",
          "description": "Input payload for the add transformation",
          "holiumType": [
            {
              "name": "a",
              "description": "First value",
              "holiumType": "number"
            },
            {
              "name": "b",
              "description": "Second value",
              "holiumType": "number"
            }
          ]
        }
      ],
      "outputs": [
        {
          "description": "Output payload for the subtract transformation",
          "holiumType": "number"
        }
      ]
    }
  ]
}

If the ticket about signatures as JSON schemas were implemented, we could also add a $defs section and reference it to avoid repeating details about the Value structure.
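To make that concrete, here is a hypothetical sketch of what the same metadata could look like with a $defs section, assuming the schema were extended to allow a $ref object wherever a holiumType is expected (that extension is not part of the schema above):

```json
{
  "$schema": "https://holium.org/holium-transformation-metadata.schema.json",
  "name": "mylib",
  "version": "1.0.0",
  "$defs": {
    "values": [
      {
        "name": "a",
        "description": "First value",
        "holiumType": "number"
      },
      {
        "name": "b",
        "description": "Second value",
        "holiumType": "number"
      }
    ]
  },
  "transformations": [
    {
      "name": "add",
      "description": "Description of the add transformation",
      "inputs": [
        {
          "name": "values",
          "description": "Input payload for the add transformation",
          "holiumType": { "$ref": "#/$defs/values" }
        }
      ],
      "outputs": [
        {
          "description": "Output payload for the add transformation",
          "holiumType": "number"
        }
      ]
    }
  ]
}
```

The Value structure is then declared once and shared by every transformation that uses it.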

Other comments on your suggestions

Comment #0

One may or may not use $defs in their metadata. As long as it validates against the schema, it should be considered proper metadata, in my opinion.

Alright, sounds good to me.

Comment #1

I don't quite get the reasoning behind this assertion.

The reasoning was based on one of your comments. But if it is alright with you, we will use package and transformation names as is.

Comment #2

All values would be represented inside a tuple

In JSON Schema, this kind of specification:

{
    "type": "array",
    "description": "Input payload for the add transformation",
    "prefixItems": [
        { "$ref": "#/$defs/values" }
     ]
}

can be considered a tuple. It can be found under inputs and outputs in my proposal.
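For completeness, draft 2020-12 also lets such a tuple be closed, so that no positional items beyond those listed in prefixItems are accepted, by adding "items": false — e.g., extending your snippet:

```json
{
  "type": "array",
  "description": "Input payload for the add transformation",
  "prefixItems": [
    { "$ref": "#/$defs/values" }
  ],
  "items": false
}
```

Without "items": false, instances may carry extra trailing elements that prefixItems simply does not constrain.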


Conclusion

I think we can take what you propose and start with a base implementation. I will get to that once I've handled all the feedback from you on this PR.

@PhilippeMts
Contributor

Thanks Thomas. Here are some of the (hopefully) last answers.

1st post

Should JSON Schema be used to describe metadata, or CDDL, or BSON schema,… ?

Sub-rationale: why have a type system that includes bin data?

I guess it's alright to add a type as it is allowed by the specifications

To be sure that I was clear enough, I support the integration of such a type in our system. So we agree.

(I'm not sure the section of the specifications you link to really relates to this issue, though, as the format keyword can be used to express more restrictive rules, not more permissive ones. But the important part is that we agree on the need for a binData type.)


Conclusion

I think we can take what you propose and start with a base implementation. I will get to that once I've handled all the feedback from you on this PR.

Neat! 📐
