
Prefixes for extensions #36

Open
m-mohr opened this issue Jun 24, 2024 · 12 comments · May be fixed by #39
Assignees
Labels
help wanted (Extra attention is needed)

Comments


m-mohr commented Jun 24, 2024

From the fiboa Slack:

@cholmes wrote:

I'm working on a Planet converter (got to a PR that needs some help) and started on an extension for Planet's extra values, to try to do it a bit more 'proper'. I did them as planet:qa and planet:mcid, but I'm questioning if we should bring our stac convention of the : with a prefix over or not. It doesn't work great in SQL (you have to quote it), and I feel like it's a special character in other places too. I'm sorta contemplating just not doing (as much?) prefixing. Like it'll be easier to have Planet's geopackage (that's not on fiboa) have an equivalent geoparquet/fiboa that has mostly the same fields, especially for the ones where we don't have a 'standard'. And then also wondering about just doing like tillage_occurred instead of tillage:occurred for places where a prefix does make some sense.

@m-mohr wrote:

The : is indeed not ideal, but on the other hand _ is too generic and may lead to conflicts with existing fields much more often. Also, the _ might be ambiguous... crop_type_identifier_code - What's the prefix? crop? crop_type? What if someone defines an extension crop with a field type_identifier and an extension crop_type with a field identifier?

@andyjenkinson wrote:

I think the colon has a clear place as a URI prefix. However one thing we could do is to actually make the spec JSON-LD. That means the schema properties are full URIs, and an extension is just a URI prefix (namespace) for the properties within that extension. Then you use a JSON-LD context object (which can also be shifted to an external URL) that maps property keys to a full URI. That makes your JSON look like normal JSON (and can even be a way of keeping all the proprietary property keys you already have) but automatically convertible to the common standard. Basically you could largely retrofit FIBOA extension compatibility in an almost invisible way. By the way it also means you can literally reuse concepts that already exist in other ontologies, like DCAT for datasets, PROV-O, agri domain ontologies etc.
To me, a great target would be to aim for an implementation that can hit FIBOA, JSON-LD and OGC Features API compatibility all at the same time. The actual spec for an extension would be expressed in JSON-LD, but all of the examples would look like plain old JSON with one ‘@context’ property that contains the schema mapping.

@m-mohr m-mohr added this to the 0.3.0 milestone Jun 24, 2024
@m-mohr m-mohr added this to fiboa Jun 24, 2024
@github-project-automation github-project-automation bot moved this to Backlog in fiboa Jun 24, 2024
@m-mohr m-mohr self-assigned this Jun 24, 2024
@m-mohr m-mohr moved this from Backlog to Todo in fiboa Jun 24, 2024

cholmes commented Jun 24, 2024

Also, the _ might be ambiguous... crop_type_identifier_code - What's the prefix? crop? crop_type? What if someone defines an extension crop with a field type_identifier and an extension crop_type with a field identifier?

Yeah, I think the essence of the idea to me is more to stop worrying about prefixes. Like letting Planet do 'mcid' instead of 'planet:mcid' or 'planet_mcid'. This may just be a bad idea - but I'm just sorta wondering what we've really gained by having all the prefixes. If you have a data model and want to validate it with a few different extensions then you'll be making choices about what to validate it with. The chances of overlap seem small, and if there's a set of 'known' extensions then people introducing new extensions that might need to be compatible can tweak their names. It would essentially just be a 'looser' approach to the ecosystem - here's a set of attributes that mean these things, but they're not trying to define things for all times.

So Planet would have 'mcid' defined at https://planet.github.io/fiboa/planet-fb-extension/v0.1.0/schema.yaml, and some other org could have 'mcid' (maybe meaning something different...) defined at https://company.com/fiboa/company-extension/v0.1.0/schema.yaml. But if they wanted to make the ecosystem more compatible, they could just establish a new community extension at https://fiboa.org/mcid-extension/v1.0.0/schema.yaml. The field name would stay 'mcid' - it'd just use the community-built JSON schema to validate.

I think the colon has a clear place as a URI prefix. However one thing we could do is to actually make the spec JSON-LD. That means the schema properties are full URIs, and an extension is just a URI prefix (namespace) for the properties within that extension.

And yeah, I think this is the other extreme of the approach, attempting to give the prefix 'real' meaning. The original idea of the prefix in STAC was inspired by JSON-LD, with the intent to do just what you're saying @andyjenkinson - tie the prefixes to full URIs with well-known meaning in JSON-LD. I think one thing that threw it off is that the 'geo' representation in JSON-LD wasn't great if I remember right, and very few tools had support for it. But it's probably worth taking another run at figuring out if we could fully support it - I agree that hitting 'FIBOA, JSON-LD and OGC Features API compatibility all at the same time' would be really great.

The actual spec for an extension would be expressed in JSON-LD, but all of the examples would look like plain old JSON with one ‘@context’

But for extensions wouldn't you need one 'context' per extension? Unless you put all the extensions into a single 'fiboa' context? If you don't put them all in a single context, there'd still be prefixes for all the ones that aren't the primary / default context?

It also seems like when you map from JSON-LD to GeoParquet you'd need to bring the prefixes back in consistently, or else try to fully represent the URIs in Parquet.

@andyjenkinson

The context is a property included at the root of each payload, the value of which is either a context object or a URL of one, rather like a hypermedia link. So it can be unique to each implementation/dataset, and can include terms from any number of extensions. The extensions would give example contexts that correspond to example payloads, but when you implement FIBOA you can either:

  • merge the contexts from the extensions you want verbatim into your own context file, and then name all your properties like in the examples
  • rename some properties in your context to tailor your format to the property names you want (particularly useful if you have an existing implementation you don’t want to change)

Bear in mind JSON-LD contexts can remap any property to a URI, not only expand a prefix. This would allow Planet to have whatever terms it wanted in its payloads, they wouldn't even have to match the name of the property in the FIBOA spec and don't have to contain any prefixes; the mapping to FIBOA would be done entirely by the context file. All the machine readable stuff like validation, conversion etc can use the processed JSON-LD representation, but the files look completely normal to users and 'FIBOA unaware' software.
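The "entirely by the context file" remapping described above can be sketched in a few lines. This is a hypothetical illustration only: the term names and URIs below are invented, and real JSON-LD processors (e.g. the pyld library) do far more than a flat dict lookup.

```python
# Hypothetical sketch of what a JSON-LD context does for flat properties:
# every property key is replaced by the URI the context maps it to.
# Term names and URIs are invented for illustration.
context = {
    "id": "http://purl.org/dc/terms/identifier",
    "harvest_date": "https://fiboa.org/schema/core#determination_date",
}

def expand(properties: dict, context: dict) -> dict:
    """Replace implementation-specific keys with their canonical URIs."""
    return {context.get(key, key): value for key, value in properties.items()}

# A publisher's 'native' property names...
native = {"id": "ABCD1234", "harvest_date": "2024-06-29"}
# ...become canonical URI-keyed properties after applying the context.
canonical = expand(native, context)
```

The point of the sketch: two publishers with completely different key names produce identical canonical output as long as their context files map to the same URIs.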

Regarding parquet, yes: either you define them as full URIs, or you carry forward the JSON-LD context mappings into the headers (and vice versa of course) so that they look ‘normal’ in any existing software that processes parquet data. Basically the headers would have to say "these are the property names, and these are their equivalent URIs."

The one thing I’m unsure about is the geo stuff. You may know there is a GeoJSON-LD context but I have not looked at it in detail. I would not want to abandon GeoJSON for some other random representation of geometry in JSON-LD, it’s about making a standard GeoJSON payload processable as JSON-LD. In fact that context is a good example explaining what I mean above about using the context object to essentially make JSON-LD look like completely normal JSON unchanged from its original format. All it is is a context, which maps all the original GeoJSON schema items to URIs exactly as they are.

Personally I am not a fan of going to the other end of the scale and just allowing a free-for-all on names. I get that namespaces are annoying, but I can see clashes happening, especially for terms like "crop", and in particular it's useful to distinguish 'uncontrolled' terms. Personally I'm not sure it's necessary to make a Planet extension, as by definition there won't be any terms in common with anyone else. So long as FIBOA allows additional properties, just document your schema; anything proprietary doesn't need a prefix. Then focus on trying to standardise things that seem common in a topic-specific, not vendor-specific, extension. Unless you adopt JSON-LD or something like it, someone's going to have to change their schema anyway.


cholmes commented Jun 28, 2024

@m-mohr - why didn't you use a prefix on flik-extension?


m-mohr commented Jun 28, 2024

Good question - it was my first one and more of an example. The fields in the original had no prefix; I guess I either forgot it or thought it was simpler, can't remember 😅


cholmes commented Jun 28, 2024

Cool. Yeah, both reasons to me point a bit to how it could be nicer to not have to think about them.

Curious what you think about JSON-LD and contexts, and if you'd be up to dig into it a bit - like whether there is a way to pass the 'schema' information through without including the prefix, all the way through GeoParquet. I'm a bit less sure how much GeoParquet metadata should really handle - I wonder if there are any other examples of JSON-LD -> Parquet. And whether it'll work when pulling a few different 'contexts' into one.


m-mohr commented Jun 28, 2024

I'm travelling next week but can dig into it afterwards; it will likely take three weeks or so. It doesn't seem to solve the colon / quote issue though. I'm not sure whether we can solve that if the allowed set of characters for SQL names is A-Z, 0-9 and _.

I worry a bit that without prefixes we'll end up with various extensions that use crop_id, and maybe even datasets that use no extension and have crop_id. If you want to merge them, what do you do with the field names? You can't merge them because the fields are defined differently.

In STAC we see that many clients actually don't check the stac_extensions array and just use the fields, because they can be sure there are usually no conflicts. Clients would need to be developed with more care; they may even need to read all schemas.

So I tend towards a prefix at least for "common"/debatable field names. If you have names like "flik" that are unlikely to conflict, I could see that we allow them without prefixes. We also sometimes do that in STAC.

JSON-LD is an open question for now.

@andyjenkinson

andyjenkinson commented Jun 29, 2024

It could solve the colon issue because there won’t be any colons in the payload any more, only in the context file which only ever needs to be read when doing things like validating or converting. And they’ll be in values, not keys. You can put whatever property names you like in your implementation, they don’t have to be named the same as the ‘standard’ ones, the context file provides a mapping. So the GeoJSON file literally looks like any normal JSON with one extra property ‘@context’ that links to the context. Everything else can be your native implementation, and if you want that to be a translation of a SQL schema, have at it. Think of the context as a set of instructions of how to convert the GeoJSON Feature/FeatureCollection to a FIBOA JSON-LD schema. It’s pretty much a glorified ‘find and replace’.

So for example in your JSON you could have:

{
  "@context": "https://planet.com/path/to/my/context.json",
  "id": "ABCD1234",
  "date": "2024-06-29"
}

And after mapping it would look something like:

{
  "http://purl.org/dc/terms/identifier": "ABCD1234",
  "https://fiboa.org/schema/core#determination_date": "2024-06-29"
}

Here, I use an example where the FIBOA core specification would define the 'id' property, as a mapping to the existing Dublin core vocabulary term 'identifier', as well as its own unique properties.

Meanwhile the FIBOA examples (and the format you'd use if you were creating a file from scratch) could look like:

{
  "@context": "https://fiboa.org/schema/context.json",
  "id": "ABCD1234",
  "determination_date": "2024-06-29",
  "myextension:image_resolution": "10m"
}

And that would map to an identical RDF graph as the Planet example:


{
  "http://purl.org/dc/terms/identifier": "ABCD1234",
  "https://fiboa.org/schema/core#determination_date": "2024-06-29",
  "https://fiboa.org/schema/myextension#image_resolution": "10m"
}

I've simplified the structure of all these of course. I'm typing on my phone.

The standard context file would contain all the mappings from the core and extensions. Here the only reason for the colon is the same reason we have it today: to allow independent development of extensions which might simultaneously use the same property name. But if this isn't important (e.g. each extension is allowed to basically claim a property name by using it first), it could be removed. Either way, each extension would have to provide a standard JSON-LD context that would translate the "nice plain JSON" to full messy URIs. The key to this is to hide as much as possible the mechanics of JSON-LD, to make creating and maintaining extensions and making compatible features as easy as possible.
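The prefix behaviour described above can also be sketched: in JSON-LD a context entry may map a prefix to a namespace URI, so a "prefix:suffix" term expands automatically. This is a simplified illustration with invented names; real JSON-LD term resolution has more rules.

```python
# Simplified sketch of JSON-LD prefix expansion. A context entry is either a
# direct term mapping or a namespace for "prefix:suffix" terms. The extension
# name and URIs here are illustrative, not actual fiboa definitions.
context = {
    "determination_date": "https://fiboa.org/schema/core#determination_date",
    "myextension": "https://fiboa.org/schema/myextension#",
}

def expand_term(term: str, context: dict) -> str:
    if term in context:                      # direct term mapping
        return context[term]
    if ":" in term:                          # prefix:suffix -> namespace + suffix
        prefix, suffix = term.split(":", 1)
        if prefix in context:
            return context[prefix] + suffix
    return term                              # unknown terms pass through unchanged

full = expand_term("myextension:image_resolution", context)
# -> "https://fiboa.org/schema/myextension#image_resolution"
```

Either way the payload keeps short, readable keys; only machines that process the context ever see the full URIs.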


m-mohr commented Jul 15, 2024

Okay, I'm currently trying to wrap my head around this. I might have misunderstood parts of JSON-LD, please let me know if that's the case.

Note

Our extension mechanism doesn't by any means require a colon; it's a (pretty undocumented) best practice.

We are used to it through STAC, where it's primarily used to easily distinguish the fields without looking at the list of implemented extensions. All fields can be unambiguously identified through their name + extension URI, but many STAC readers (except for the validators) do not actually use stac_extensions to verify which extension the fields belong to. They simply use the field name as-is and assume that field names are unique. If someone created a new extension with names overlapping an existing well-known extension (e.g. an eo2 extension with eo:cloud_cover from 0 - 1), most STAC implementations would not differentiate it and would assume it's a percentage from 0-100.

JSON-LD vs fiboa extensions

My current understanding is that JSON-LD and our current extension mechanism in fiboa are conceptually very similar.
JSON-LD is a bit more advanced, but is also more complex to implement.
The fields @context and fiboa_extensions are pretty much the same: they define through URIs how certain fields are named and their corresponding semantics. Through that, the fields can be identified unambiguously, if clients actually make use of the URIs.

This seems to be rarely the case (in STAC and fiboa) because it makes implementations more complex. While we can provide reference implementations that do this, user-land implementations have proven through STAC that people take the simplest route and don't actually check against the provided extensions whether their assumptions about a field are true.

Let's say we have three extensions that all define a crop field, but one of the extensions is very dominant and popular. Tools can distinguish the fields if implemented properly. But people would usually just address it through the crop name and not always check whether it's actually the right extension. With crop you may figure it out due to unexpected values, but the cloud cover example above is not so obvious.

For JSON there is LD tooling that could probably mitigate this, but for GeoParquet there's not. The question is whether people would use such tooling or whether they'd just use their normal JSON or Parquet reader. In that case they won't read the URIs (neither context nor fiboa extensions) and then you run into potential issues. For Parquet, you could prefix all fields by URI, e.g. https://planet.con/ext/crop.yml:crop, but that looks even worse than planet:crop.

Prefixes

I think JSON-LD (or fiboa extensions) without prefixes looks nice, but has quite a number of hurdles.
While quoting fields with prefixes is an annoyance, it's likely nothing that actually affects the analysis, and people are used to this, I think.
On the other hand, the crop or cloud cover examples could actually negatively affect the analysis when conflicts are not properly resolved.
I feel like we should provide a framework that people can work with easily and that prevents errors. As such I'd keep the prefixes for fields where conflicts are likely to happen. Having said that, you can deviate from this: I don't expect conflicts for flik, and as such having no prefix, for the sake of keeping the files as close to the source as possible, is an option.

Merging files

The "merge" issue that I spoke about in another comment can be resolved by implementing the merge tool in a way that, if conflicts arise, the full URI is used as a prefix. So for example the following two files would be merged as follows:

  • File 1 (columns: id, geometry, crop [from https://example.com/crop.yml])
  • File 2 (columns: id, geometry, crop [from https://planet.con/ext/crop.yml])

Merged file has columns: id, geometry, https://example.com/crop.yml:crop, https://planet.con/ext/crop.yml:crop
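A minimal sketch of that merge rule, hedged: no such tool is established in this thread, and the column names and schema URLs are taken from the example above purely for illustration.

```python
# Sketch of the proposed merge rule: a column name is kept as-is unless
# another file defines the same name under a different extension schema,
# in which case the column is disambiguated with its schema URI as prefix.
def merged_columns(files):
    """files: list of dicts mapping column name -> defining schema URI (or None)."""
    merged = []
    for columns in files:
        for name, schema in columns.items():
            clash = any(
                name == other_name and schema != other_schema
                for other in files if other is not columns
                for other_name, other_schema in other.items()
            )
            qualified = f"{schema}:{name}" if clash and schema else name
            if qualified not in merged:
                merged.append(qualified)
    return merged

file1 = {"id": None, "geometry": None, "crop": "https://example.com/crop.yml"}
file2 = {"id": None, "geometry": None, "crop": "https://planet.con/ext/crop.yml"}
# merged_columns([file1, file2]) keeps id and geometry unprefixed, and
# URI-prefixes the two conflicting crop columns.
```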

Renaming (for compatibility)

The "rename" mechanism in JSON-LD looks good at first glance, as it allows using fiboa without changing the actual field names in an existing implementation, e.g. OGC API - Features. Users need good clients or a good understanding of LD to make the connection though.

What about the values? If you need to change the values, e.g. from area in meters to area in hectares, how do you express in JSON-LD that you need to divide by a certain value? I couldn't find anything about it, so I assume that's not a thing. So the rename covers just a small part of being compatible with an existing response and as such I feel like it's not worth the hassle.

Compatibility

The compatibility between fiboa, JSON-LD and OGC API - Features would indeed be nice.
I don't quite see issues with OGC API compatibility at this point though. It might conflict with specific implementations and content schemas that providers offer, but fiboa is not incompatible with OGC API - Features afaik.

The field renaming could solve that partially, but I think in many cases it would just cover parts of it (see the area example above). So you may still not be compatible with specific implementations and content schemes although you renamed everything nicely.

So it leaves us with JSON-LD vs fiboa. I'm not quite decided on this yet. I generally find it hard to navigate the JSON-LD vocabularies. Like, how can I verify whether there is a field-boundary-related vocabulary? There's GeoJSON-LD, but is there more? I'm not sure whether compatibility with JSON-LD would help us a lot. Is anyone asking for it? The fiboa extension mechanism is very similar. We'd need to define JSON-LD vocabularies in addition to the relatively simple fiboa Schemas; I'm not sure whether they could be generated from the fiboa Schemas.

We could probably go for JSON-LD if we think it's worth it, but currently I'm not sure whether it's worth the effort. It might be worth it if we do more than field boundaries in the future, but I feel like field boundaries wouldn't benefit a lot from it unless we find existing LD vocabularies for them.

@andyjenkinson

I think it's advisable to some extent to try to "unlearn" some STAC idioms - just because it's something you worked on before doesn't mean that anyone expects it to work the same way. People will intuitively understand that there is a need to define a group of unique terms for the properties of a GeoJSON feature / parquet object, and they will just read the docs explaining the terms constrained by the schema and what they mean, exactly like it works in all OGC specs.

Having said all that, it's for sure a useful observation from your experience of STAC that developers will tend to follow the path of least resistance and make assumptions wherever possible, and the examples you give of that are good ones. I think it would be a mistake not to make clear that uniqueness is a necessity between extensions, so I would not be in favour of a solution that is neither JSON-LD nor uses a prefix. But the beauty of JSON-LD is that you can make terms unique whilst not looking strange to humans who are reading plain JSON.

For that reason of "the simple thing should just work", if you did it as JSON-LD (which I don't see as "vs" FIBOA btw, just the way FIBOA would be implemented) then I think the key is to get across that all of the terms defined in a FIBOA schema - whether core or extension - are unequivocally URIs first and foremost, and the JSON must be valid GeoJSON-LD. Those URIs are strings just like eo:cloud_cover and cloud_cover are strings, and developers can treat them as such.

But when all the JSON examples and others' implementations shorten or change those URIs to something else and add an @context property, it will be plainly obvious that all machine-based interpretation of a boundary GeoJSON object must first use the context to turn them into the 'canonical' URIs listed in the specification. The developers don't have to know how that works at all, and any FIBOA libraries or validators would do it automatically as a matter of course using the plethora of standard JSON-LD processing libraries. The validator would always be validating the URIs; it wouldn't care what the unprocessed JSON looked like, but would fail if the payload didn't provide a valid context that produced correct FIBOA schema URIs.

There is a bit more to understand for data publishers (actually, mainly extension developers choosing those URIs), but typically there are fewer data publishers than consumers, so I don't think this is much of a problem - it will be much easier to do in practice once a couple of examples are done. I expect that 9 times out of 10 developers will simply copy the terms used in the examples; the only case where you would need to change the context in any way is when trying to make an existing JSON structure compatible with FIBOA without making a breaking change to an existing property name. However, some of us do have that problem and I don't see how else you can tackle it.

I should probably point out also that eo:cloud_cover is already a valid URI (a URN is a type of URI) so this really isn't as strange as it may appear. A URI simply requires a namespace (prefix) and identifier. It's just that by convention, JSON-LD encourages reuse of existing terms and for those to be resolvable, and hence be not only URIs but URLs. I agree that finding suitable existing vocabularies for the terms we might want to create in extensions can be difficult, but I don't think that is a function of the JSON-LD technology but of the core problem of standardising semantics. We should be looking to adopt those vocabularies anyway (we should not be inventing our own crop vocabulary if one exists already!), it's just that using JSON-LD makes that more visible and concrete a requirement - you can adopt those terms directly in a FIBOA schema, rather than just copying them. To me this is a pure pro, not a con.

I think geoparquet is certainly a different story; there is no concept of converting short property names to URIs in parquet like there is for JSON (i.e. JSON-LD). In that case what I would probably do is just always use the native URIs for geoparquet files - they're just unique strings after all, and as I mentioned above these would be the 'normative' definitions that the schema is constraining. I think this is workable, especially as the use cases for geoparquet are typically more analytical than visual - nobody reads geoparquet files like humans do with JSON, and it also touches a more specialised technical community than JSON does, so the requirement of mapping from URIs to friendly quasi-readable property names just isn't as important, one may argue? I'm also not aware of anyone actively publishing geoparquet field boundaries today, so the backwards compatibility requirement does not seem to exist there like it does for JSON.


m-mohr commented Jul 15, 2024

Fair points, thank you. I have some additional questions and comments.

In that case what I would probably do is just always use the native URIs for geoparquet files

Not sure I follow: You mean we should use column names such as https://planet.con/ext/crop.yml:crop?

[T]he beauty of JSON-LD is that you can both make them unique whilst not looking strange to humans who reading plain JSON.
[...]
I think geoparquet is certainly a different story, there is no concept of converting between short property names to URIs in parquet like there is for JSON (ie JSON-LD).

This brings up the question of which encoding is the priority. Do we optimize for JSON or for tabular formats such as GeoParquet or FlatGeobuf? The way the work is going right now, it seems tabular is the priority, and there we could end up with very weird behavior.
If I load the geoparquet into DuckDB and want to write a query, I end up with something like:

  • SELECT "https://planet.con/ext/crop.yml:crop" FROM ...

This looks really annoying and is something I'd want to avoid for sure.

nobody is reading geoparquet files like humans do with JSON and it's also a more specialised technical community than JSON is touching upon, so the requirements of mapping from URIs to friendly quasi-readable property names just isn't as important, one may argue?

I think I disagree with regards to the more specialised technical community. The reading part is true, but then you are writing these column names in SQL, for example - see the example above.

I'm also not aware of anyone actively publishing geoparquet field boundaries today either, so the backwards compatibility requirement does not seem to exist there like it does for JSON.

True, but then on the other hand for some tooling it's irrelevant which file format they read from. But if the structure of the file changes, it makes a difference. On the other hand, we don't really consider that right now either - the structure also often changes with the current extension approach.

if you did it as JSON-LD (which I don't see as "vs" FIBOA btw, just the way FIBOA would be implemented)

Me neither by the way, that was poorly phrased.

then I think the key is to get across that all of the terms defined in a FIBOA schema - whether core or extension - are unequivocally URIs first and foremost

That's also the case with the current extension approach!

the JSON must be valid GeoJSON-LD. Those URIs are strings just like eo:cloud_cover and cloud_cover are strings and developers can treat them as such. But when all the JSON examples and others' implementations shorten or change those URIs to something else and add a @context property

But if they don't resolve the context, then the names are different and the interoperability gets lost?! Isn't it annoying if people see different names everywhere and first need to verify via context what it is?

it will be plainly obvious that all machine-based interpretation of a boundary GeoJson object must first use the context to turn them into the 'canonical' URIs listed in the specification - the developers don't have to know how that works at all

Doesn't that need a very specialised technical community? Why doesn't the developer need to know how it works?

the only case where you would need to change the context in any way is when trying to make an existing JSON structure compatible with FIBOA without making a breaking change to an existing property name. However, some of us do have that problem and I don't see how else you can tackle it.

Yes, I see how the property name thing could be solved, but that seems to be only a small part of the game.
If your area property is in square meters and fiboa uses hectares, how would that be solved?
If that is not solved, then I feel like it may be breaking in some/many cases anyway?
Or is the idea here "just" to avoid conflicts because you can rename additional fiboa properties to something else because they may conflict with existing properties?

I should probably point out also that eo:cloud_cover is already a valid URI (a URN is a type of URI)

Yeah, I'm aware from OGC work, and URNs are from hell...

We should be looking to adopt those vocabularies anyway (we should not be inventing our own crop vocabulary if one exists already!)

Agreed, but that's independent of whether we adopt JSON-LD or not. In any case we should reuse existing vocabularies and not invent our own. Unfortunately, I haven't found a lot of related vocabularies or standards yet. I still have to look into ADAPT...

For crop classification, we already start by adopting HCAT through an extension, but there are so many out there and none is commonly used, it seems.

it's just that using JSON-LD makes that more visible and concrete a requirement

How does it do that? I feel like it's not very obvious. You could also always just define your own. I don't really see yet how JSON-LD encourages or requires reuse more than anything else we've discussed or are using so far.

@m-mohr m-mohr moved this from Todo to Blocked in fiboa Aug 1, 2024

m-mohr commented Aug 30, 2024

Results from the discussion yesterday:

With regards to extension prefixes, we keep it open. Although our extensions currently use the colon, we don't require it.
Users could, for example, also use one or two underscores as a separator for the prefix.
Although in SQL field names with a colon need to be quoted, we didn't identify that as a major annoyance. We'll keep listening to user feedback and may change this later.

@m-mohr m-mohr moved this from Blocked to Backlog in fiboa Aug 30, 2024
@m-mohr m-mohr removed this from the 0.3.0 milestone Aug 30, 2024
@m-mohr m-mohr added the help wanted (Extra attention is needed) label and removed the discussion needed label Aug 30, 2024
@andyjenkinson

Sorry I lost track of this issue and didn't address your previous questions @m-mohr.

Regarding geoparquet, what I was suggesting is that you do what JSON-LD does for JSON to convert strings into URIs: you simply define a mapping table (that's what the context object is, effectively). This means that in the data they are just normal strings, but if you really needed to convert to URIs (which only makes sense when you're doing something programmatic that depends on the data being Linked Data/RDF) then the converter pulls the 'context' property from the header and maps them. Essentially replicating what JSON-LD adds to JSON by doing the same thing for geoparquet.
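The "mapping table in the header" idea could look like the sketch below. This is an illustration only: the "fiboa:context" metadata key and the URIs are invented, and Parquet key-value metadata is shown with plain Python stand-ins rather than an actual Parquet writer.

```python
import json

# Sketch of carrying a column -> URI mapping in Parquet key-value metadata,
# analogous to how GeoParquet stores its own "geo" metadata as a JSON string.
# The "fiboa:context" key and the URIs below are hypothetical.
column_uris = {
    "crop": "https://fiboa.org/schema/myextension#crop",
    "determination_date": "https://fiboa.org/schema/core#determination_date",
}

# What a writer would place in the file header: the mapping serialized to
# a JSON string, since Parquet metadata values are plain strings/bytes.
header_metadata = {"fiboa:context": json.dumps(column_uris)}

# A 'fiboa aware' reader restores the mapping from the header; ordinary
# Parquet tools just see plain, unprefixed column names.
restored = json.loads(header_metadata["fiboa:context"])
```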

However to be honest this whole topic is to me not a very useful one - by definition JSON-LD only makes sense for JSON and I see no reason to replicate it inside a parquet file anyway. It's not a format you'd ever expect to combine directly with non-spatial semantic data so I don't see much value, and even if you did you can still use the JSON-LD context from the spec.

all of the terms defined in a FIBOA schema - whether core or extension - are unequivocally URIs first and foremost

That's also the case with the current extension approach!

The FIBOA core schema doesn't contain URIs. For example, 'id' is not a URI; it's just a string.

it's just that using JSON-LD makes [using external vocabularies] more visible and concrete a requirement

How does it do that?

Because JSON-LD schemas are already expressed in RDF like ontologies such as DCAT are. Instead of copy-pasting values from an external vocabulary under a different property name (effectively 'forking' it) and mentioning in the human-readable documentation that it's borrowed from another place, you simply link to it directly and now when you merge into a graph containing other data already mapped to that ontology or with software that understands those ontology terms they are natively integrated using exactly the same URI.
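As a small illustration of that "same URI" point (the provider names and keys here are hypothetical): two providers can keep their own local property names, but if both contexts map those names to the same DCMI term URI, the values integrate natively once expanded.

```python
# Hypothetical contexts for two different data providers. Each maps its
# own local key to the *same* well-known term URI (dcterms:license).
provider_a_context = {"license": "http://purl.org/dc/terms/license"}
provider_b_context = {"myextension:license": "http://purl.org/dc/terms/license"}

def expand_properties(props: dict, context: dict) -> dict:
    """Replace each known key with its URI from the context."""
    return {context.get(k, k): v for k, v in props.items()}

a = expand_properties({"license": "CC-BY-4.0"}, provider_a_context)
b = expand_properties({"myextension:license": "CC-BY-4.0"}, provider_b_context)
# After expansion both datasets use the identical term URI, so merging
# them into one graph needs no renaming or 'forking' of the vocabulary.
```

No copy-pasting of vocabulary definitions is needed; the link to the external ontology is the property name itself after expansion.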

Perhaps to try to put a lid on this for now whilst providing a better summary for the future... basically, if you were to implement JSON-LD as the schema engine for FIBOA JSON, what I'm suggesting is you'd do it like this, preserving the benefits of the current implementation:

1. The core schema and the examples all look as they do today.
2. Extensions can still have an extension prefix to avoid conflicts between extensions in 'human readable' JSON and in geoparquet, if you want:

   {
     "id": "ABCD1234",
     "myextension:coolness_factor": 4
   }

   However, for FIBOA this choice is entirely orthogonal - using JSON-LD guarantees globally unique terms anyway.
3. The core specification adds a JSON-LD context property, which extends the GeoJSON-LD context to add the FIBOA-specific terms. This 'context' is used only by machines to map each property in the schema to a fully-qualified URL, and transforms all FIBOA JSON into valid RDF, with all of the consequences implied there.
4. Every extension adds to the JSON-LD context definition. Most of the time (for extensions already using a prefix) this is as simple as one property:

   { "myextension": "https://fiboa.org/schema/myextension#" }

   which basically just means "any time you see the prefix 'myextension', you can replace it with this to turn it into a URL".
5. If an extension re-uses terms from other vocabularies, they can be used directly without having to 'fork' them:

   { "myextension": "https://fiboa.org/schema/myextension#", "myextension:license": "http://purl.org/dc/terms/license" }

6. When validating JSON, the validator uses the URL form (e.g. https://fiboa.org/schema/myextension#coolness_factor), not the plain string form (myextension:coolness_factor). By the way, this makes the validation much more powerful: you can use semantic reasoning like type hierarchies, and the tools already exist - you can validate it as RDF using any ontology you like, not only as FIBOA and GeoJSON.
7. There is no effect on geoparquet - since it isn't JSON, it doesn't have anything to do with JSON-LD, and all the column names look like plain strings exactly as they did before.
8. Anyone creating FIBOA datasets from scratch doesn't have to know anything about JSON-LD; they just follow the documentation to create files with normal-looking JSON properties, plus a '@context' property that references the standard context hosted on fiboa.org, like all the examples do.
9. However, existing GeoJSON implementations are able to replace the 'default' context with a modified version, which allows them to keep their existing property names (e.g. from their API) and be automatically compatible with FIBOA.
10. The CLI can also very easily convert this back to 'normal' FIBOA files with 'normal' property names, as well as convert to geoparquet.
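The expansion mechanics behind steps 2-6 can be sketched in a few lines. This is a hand-rolled illustration, not a real JSON-LD processor (a compliant implementation would use an existing JSON-LD library); the context entries are the hypothetical ones from the list above:

```python
# Illustrative context combining a prefix entry (step 4) with a term
# that is aliased directly to an external vocabulary (step 5).
CONTEXT = {
    "myextension": "https://fiboa.org/schema/myextension#",
    "myextension:license": "http://purl.org/dc/terms/license",
}

def expand_key(key: str, context: dict) -> str:
    """Turn a 'human readable' property key into a fully-qualified URI."""
    # A direct alias wins (step 5)...
    if key in context:
        return context[key]
    # ...otherwise resolve the prefix against the context (step 4).
    prefix, sep, suffix = key.partition(":")
    if sep and prefix in context:
        return context[prefix] + suffix
    return key  # unprefixed core properties are left alone in this sketch

feature_properties = {
    "id": "ABCD1234",
    "myextension:coolness_factor": 4,
    "myextension:license": "CC-BY-4.0",
}
# The URL form is what a validator would see (step 6); the plain form
# is what humans and geoparquet columns see (step 7).
expanded = {expand_key(k, CONTEXT): v for k, v in feature_properties.items()}
```

A converter in the CLI (step 10) would simply apply the inverse mapping to get back to the 'normal' property names.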

Basically what you'd be doing is defining the FIBOA specification in such a way that FIBOA-compliant payloads are compatible with all of these at the same time:

  • FIBOA
  • GeoJSON
  • OGC Features API
  • JSON-LD
  • RDF
  • Dublin Core
  • DCAT

Bear in mind that the purpose for suggesting JSON-LD in the first place is to enable point 9, e.g. Chris wanting to use a Planet format that is Planet-first, instead of having to rename all their properties "planet:prop1", "planet:prop2" etc. This is the main issue with namespacing: FIBOA right now takes a "FIBOA is the centre of everything" approach, which works fine for its core academic community creating greenfield FIBOA-compliant files from scratch, but is problematic for existing data providers, to whom FIBOA is more like a 'bridge' into which they need to fit their existing data. What JSON-LD provides is a standardised, well-adopted mechanism (i.e. not re-inventing the wheel) for adding an explicit semantic schema on top of 'normal' JSON that happens to also be compatible with RDF and with existing RDF-described ontologies and vocabularies.

JSON-LD and RDF already exist as W3C specs with a huge arsenal of supporting code for semantic validation, inferencing, format conversion, data import etc., whereas FIBOA is basically just adding proprietary rules about what certain strings mean on top of GeoJSON. Functionally, for FIBOA it provides a way for e.g. Planet to make their datasets FIBOA-compliant by adding one 'context' property to the JSON, whilst appearing to a human exactly as it always did and as it is described in Planet's documentation. It also happens to mean that, at a stroke, all FIBOA data will also be valid RDF with zero effort from developers, meaning you can import it into a graph database with zero code, linking the geospatial world with all the other properties associated with the field that are not part of the boundary itself.

I myself don't particularly care if you do or don't do it, we probably can't support FIBOA natively anyway as its schema is too simplistic to accommodate the deduplication and merging of datasets we do, so it's only ever going to be an interchange format for us. If we do it, it will only ever be an 'additional optional format' or maybe some sort of lossy converter. I just find myself in the position of posting about it because I have used it before and understand its relevance outside of the (sometimes insular) geospatial community and I can see an opportunity to re-use a lot of stuff.

@m-mohr m-mohr linked a pull request Mar 10, 2025 that will close this issue
2 tasks
@m-mohr m-mohr moved this from Backlog to Review in fiboa Mar 12, 2025
@m-mohr m-mohr linked a pull request Mar 12, 2025 that will close this issue
2 tasks