Exposing "Links" harvested from DCAT sources #237

Open
rhodges opened this issue Feb 25, 2025 · 0 comments

In GeoPortal Harvester v2.7.2 (and v2.6.5; both tested), the list of links for DCAT records (as served by Esri HUB, CKAN, Socrata, et al.) is not being populated in the catalog's 'links' dropdown.

The Southern California Coastal Water Research Project (SCCWRP) has an Esri HUB instance. We harvest from it as a DCAT source using the URL https://dataportal.sccwrp.org/api/feed/dcat-us/1.1.json. The metadata it serves holds a long list of useful links, similar to the links offered when harvesting other metadata formats.

My research indicates that the 'distribution' section is the correct place to store these sorts of links in DCAT: https://www.w3.org/TR/vocab-dcat-2/#Class:Distribution

An example:
Here is a record called "Bight 18 Sediment Toxicity Summary Results" (https://dataportal.sccwrp.org/datasets/sccwrp::bight-18-sediment-toxicity-summary-results/about). When harvested, the source JSON includes the following:

{
        "@type": "dcat:Dataset",
        "identifier": "https://www.arcgis.com/home/item.html?id=c71310596dae42efa5d076f993bdbb37&sublayer=0",
        "landingPage": "https://dataportal.sccwrp.org/datasets/sccwrp::bight-18-sediment-toxicity-summary-results",
        "title": "Bight 18 Sediment Toxicity Summary Results",
        …
        "distribution": [
            {
                "@type": "dcat:Distribution",
                "title": "ArcGIS Hub Dataset",
                "format": "Web Page",
                "mediaType": "text/html",
                "accessURL": "https://dataportal.sccwrp.org/datasets/sccwrp::bight-18-sediment-toxicity-summary-results"
            },
            {
                "@type": "dcat:Distribution",
                "title": "ArcGIS GeoService",
                "format": "ArcGIS GeoServices REST API",
                "mediaType": "application/json",
                "accessURL": "https://gis.sccwrp.org/arcserver/rest/services/Bight2018ToxicitySummaryResults/FeatureServer//0"
            },
            {
                "@type": "dcat:Distribution",
                "title": "CSV",
                "format": "CSV",
                "mediaType": "text/csv",
                "accessURL": "https://dataportal.sccwrp.org/api/download/v1/items/c71310596dae42efa5d076f993bdbb37/csv?layers=0"
            },
            {
                "@type": "dcat:Distribution",
                "title": "Shapefile",
                "format": "ZIP",
                "mediaType": "application/zip",
                "accessURL": "https://dataportal.sccwrp.org/api/download/v1/items/c71310596dae42efa5d076f993bdbb37/shapefile?layers=0"
            },
            {
                "@type": "dcat:Distribution",
                "title": "GeoJSON",
                "format": "GeoJSON",
                "mediaType": "application/vnd.geo+json",
                "accessURL": "https://dataportal.sccwrp.org/api/download/v1/items/c71310596dae42efa5d076f993bdbb37/geojson?layers=0"
            },
            {
                "@type": "dcat:Distribution",
                "title": "KML",
                "format": "KML",
                "mediaType": "application/vnd.google-earth.kml+xml",
                "accessURL": "https://dataportal.sccwrp.org/api/download/v1/items/c71310596dae42efa5d076f993bdbb37/kml?layers=0"
            }
        ],
        …
},...

When harvested, only the following is presented in the Catalog UI:

Image

This ignores all of the useful links to "ArcGIS Hub Dataset", "ArcGIS GeoService", "CSV", "Shapefile", "GeoJSON", and "KML".
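As a rough illustration of the mapping being requested, here is a minimal Python sketch that turns a DCAT dataset's 'distribution' array into a flat list of links. It is not the harvester's code: the function name `distribution_links` and the output keys (`label`, `url`, `mediaType`) are hypothetical, and the fallback to `downloadURL` is an assumption based on DCAT-US 1.1 allowing either field on a distribution.

```python
def distribution_links(dataset):
    """Collect one link per DCAT distribution that carries a URL.

    Hypothetical helper for illustration; the output keys are assumptions,
    not the harvester's actual 'links' schema.
    """
    links = []
    for dist in dataset.get("distribution", []):
        # A distribution may carry accessURL or downloadURL; take whichever exists.
        url = dist.get("accessURL") or dist.get("downloadURL")
        if not url:
            continue  # nothing to link to
        links.append({
            "label": dist.get("title") or dist.get("format") or url,
            "url": url,
            "mediaType": dist.get("mediaType", ""),
        })
    return links


# Two of the distributions from the record above, trimmed for brevity.
record = {
    "title": "Bight 18 Sediment Toxicity Summary Results",
    "distribution": [
        {
            "@type": "dcat:Distribution",
            "title": "CSV",
            "format": "CSV",
            "mediaType": "text/csv",
            "accessURL": "https://dataportal.sccwrp.org/api/download/v1/items/c71310596dae42efa5d076f993bdbb37/csv?layers=0",
        },
        {
            "@type": "dcat:Distribution",
            "title": "KML",
            "format": "KML",
            "mediaType": "application/vnd.google-earth.kml+xml",
            "accessURL": "https://dataportal.sccwrp.org/api/download/v1/items/c71310596dae42efa5d076f993bdbb37/kml?layers=0",
        },
    ],
}

for link in distribution_links(record):
    print(link["label"], link["url"])
```

Using the distribution 'title' as the label keeps the dropdown entries readable ("CSV", "KML") rather than showing bare URLs.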

A user IS able to find these links, but only by digging through the raw JSON provided by the "JSON" link above and navigating to:

{
	…
	"_source": {
		…
		"_json": {
			…
			"distribution": [ {in here} ],
		}
	}
}
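For anyone who needs the links in the meantime, the path above can be walked in a few lines of Python. This is only a workaround sketch: it assumes the catalog's JSON has already been fetched and parsed into a dict, and that '_source' and '_json' are nested objects as shown. The function name `harvested_distributions` is hypothetical.

```python
def harvested_distributions(catalog_doc):
    """Return the DCAT distributions stored under _source._json, or [] if absent."""
    source = catalog_doc.get("_source") or {}
    original = source.get("_json") or {}
    return original.get("distribution") or []


# Trimmed stand-in for the document behind the catalog's "JSON" link.
doc = {
    "_source": {
        "_json": {
            "distribution": [
                {
                    "title": "GeoJSON",
                    "accessURL": "https://dataportal.sccwrp.org/api/download/v1/items/c71310596dae42efa5d076f993bdbb37/geojson?layers=0",
                }
            ]
        }
    }
}

for dist in harvested_distributions(doc):
    print(dist["title"], dist["accessURL"])
```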

Examples of the same functionality working for other Metadata formats

ISO 19139/19115-2:

NCEI's Passive Acoustic Sanctsound data:

Image

The links are stored under an element named much like its DCAT counterpart: "MD_Distribution" as opposed to "distribution":

<gmd:distributionInfo>
<gmd:MD_Distribution>
	<gmd:distributor>
		<gmd:MD_Distributor>
			<gmd:distributor{X}>
				<gmd:...>
					<gmd:onLine>
						<gmd:CI_OnlineResource>
							<gmd:linkage>
								<gmd:URL>
								    {LINK HERE}
								</gmd…>
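For comparison with the DCAT case, the ISO links can be located with a short namespace-aware search. The sketch below uses Python's standard `xml.etree.ElementTree`. The function name `iso_online_urls` is hypothetical, and the sample document is a simplified stand-in with a placeholder URL, not the NCEI record. The intermediate elements elided above are left out of the sample, which is why the search matches 'gmd:CI_OnlineResource' at any depth.

```python
import xml.etree.ElementTree as ET

# Namespace URI used by the gmd: prefix in ISO 19139 XML.
GMD = "http://www.isotc211.org/2005/gmd"


def iso_online_urls(xml_text):
    """Return the text of every gmd:URL under gmd:CI_OnlineResource/gmd:linkage."""
    root = ET.fromstring(xml_text)
    path = f".//{{{GMD}}}CI_OnlineResource/{{{GMD}}}linkage/{{{GMD}}}URL"
    return [el.text.strip() for el in root.iterfind(path) if el.text and el.text.strip()]


# Simplified sample; the URL is a placeholder.
sample = """<gmd:distributionInfo xmlns:gmd="http://www.isotc211.org/2005/gmd">
  <gmd:MD_Distribution>
    <gmd:onLine>
      <gmd:CI_OnlineResource>
        <gmd:linkage>
          <gmd:URL>
            https://example.org/data.nc
          </gmd:URL>
        </gmd:linkage>
      </gmd:CI_OnlineResource>
    </gmd:onLine>
  </gmd:MD_Distribution>
</gmd:distributionInfo>"""

print(iso_online_urls(sample))
```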

FGDC:

Image

The metadata looks roughly like this:

<metadata ….>
	<idinfo>
		<citation>
			<citeinfo>
				…
				<onlink>https://www.boem.gov/atl-5yr-2019-2024.zip</onlink>
				<onlink>https://www.boem.gov/Five-Year-Program/</onlink>
				<onlink>https://www.boem.gov/ak-5yr-2019-2024.zip/</onlink>
				<onlink>https://www.boem.gov/pac-5yr-2019-2024.zip</onlink>
				<onlink>https://www.data.boem.gov/Mapping/Files/Gom_5yr_2019_2024.zip</onlink>
			</citeinfo>
		</citation>
	</idinfo>
</metadata>
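
The FGDC case is the simplest of the three, since each link is a plain 'onlink' element. A minimal Python sketch, assuming the structure shown above with no XML namespace (the function name `fgdc_onlinks` is hypothetical):

```python
import xml.etree.ElementTree as ET


def fgdc_onlinks(xml_text):
    """Return every <onlink> value under idinfo/citation/citeinfo."""
    root = ET.fromstring(xml_text)  # root is the <metadata> element
    return [
        el.text.strip()
        for el in root.iterfind("./idinfo/citation/citeinfo/onlink")
        if el.text and el.text.strip()
    ]


# Two of the onlink entries from the record above.
sample = """<metadata>
  <idinfo>
    <citation>
      <citeinfo>
        <onlink>https://www.boem.gov/atl-5yr-2019-2024.zip</onlink>
        <onlink>https://www.boem.gov/Five-Year-Program/</onlink>
      </citeinfo>
    </citation>
  </idinfo>
</metadata>"""

for url in fgdc_onlinks(sample):
    print(url)
```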