Search / Multilingual support for contact, links and overview description (indexing and aggregations) #6588

Merged
merged 11 commits into from
Jan 16, 2023

@fxprunayre fxprunayre commented Oct 5, 2022

Multilingual contact organisation support (in aggregation and record view)

An index field used in an aggregation may be multilingual. For example, when the field is based on a multilingual thesaurus,
the field key can be used in the facet for translation (e.g. th_httpinspireeceuropaeutheme-theme_tree.key); the same applies to a codelist field. In such cases, the translations are loaded client side. This was not supported for fields unrelated to a thesaurus or a codelist, e.g. OrgForResource or tag.

For organisation fields, the required changes are:

  • Index the organisation name as a multilingual field (previously only
    the main language was indexed)
  • Change the index field type to object (with each language field typed as keyword so that it can be used in aggregations)
  • Translate the search response recursively (previously only first-level multilingual fields were translated, e.g. resourceTitleObject). Based on geoadmin/geocat@68d60be
  • Add a language replacer for aggregations, based on the language strategy
"tag": {
  terms: {
    field: "tag.${aggLang}",
    // ...
  }
},
OrgForResource: {
  terms: {
    field: "OrgForResourceObject.${aggLang}",
    // field: "OrgForResourceObject.default",
    // field: "OrgForResourceObject.langfre",
    // ...
  }
}

Based on the strategy, aggLang is:

  • if forcedLanguage or searchInThatLanguage, then the forced language
  • if searchInDetectedLanguage, then the detected language
  • if searchInUILanguage or searchInAllLanguages, then the search UI language
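
The mapping from strategy to aggregation language can be sketched roughly as follows. This is an illustration only, not the code from this PR: `resolveAggLang`, `applyAggLang` and the `context` properties are hypothetical names, treating `forcedLanguage` as a strategy value is an assumption, and the `lang` + 3-letter-code naming of the sub-field is inferred from the `langfre` example above.

```javascript
// Hypothetical helper: picks the language used for aggregations
// from the language strategy, following the three rules listed above.
function resolveAggLang(strategy, context) {
  switch (strategy) {
    case "forcedLanguage": // assumption: handled like a strategy value
    case "searchInThatLanguage":
      return context.forcedLanguage;
    case "searchInDetectedLanguage":
      return context.detectedLanguage;
    case "searchInUILanguage":
    case "searchInAllLanguages":
      return context.uiLanguage;
    default:
      // Assumption: unknown strategies fall back to the UI language.
      return context.uiLanguage;
  }
}

// Hypothetical helper: returns a copy of a facet configuration in which every
// "${aggLang}" placeholder is replaced by the language sub-field name.
function applyAggLang(facetConfig, lang3) {
  // Assumption: sub-fields are named "lang" + 3-letter code (e.g. "langfre").
  const subField = "lang" + lang3;
  const json = JSON.stringify(facetConfig).split("${aggLang}").join(subField);
  return JSON.parse(json);
}

const facetConfig = {
  OrgForResource: { terms: { field: "OrgForResourceObject.${aggLang}" } },
};
const aggLang = resolveAggLang("searchInDetectedLanguage", {
  forcedLanguage: "eng",
  detectedLanguage: "fre",
  uiLanguage: "eng",
});
const resolvedConfig = applyAggLang(facetConfig, aggLang);
console.log(resolvedConfig.OrgForResource.terms.field); // OrgForResourceObject.langfre
```

Working on a copy keeps the original configuration, with its placeholder, available for the next search, when the detected language may differ.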

Note: We can't use OrgForResourceObject.* to create an aggregation
combining all values.

Index Changes:

  • organisation fields are now multilingual fields eg. OrgForResource > OrgForResourceObject
    "OrgForResourceObject": {
      "default": "FPS Finance - General Administration of Patrimonial Documentation (GAPD)",
      "langeng": "FPS Finance - General Administration of Patrimonial Documentation (GAPD)",
      "langfre": "SPF Finances - Administration Générale de la Documentation Patrimoniale (AGDP)",
      "langdut": "FOD Financien - Algemene Administratie van de Patrimoniumdocumentatie (AAPD)",
      "langger": "FOD Finanzen - Generalverwaltung Vermögensdokumentation (GVVD)"
    },
    "custodianOrgForResourceObject": {
      "default": "FPS Finance - General Administration of Patrimonial Documentation (GAPD)",
      "langeng": "FPS Finance - General Administration of Patrimonial Documentation (GAPD)",
      "langfre": "SPF Finances - Administration Générale de la Documentation Patrimoniale (AGDP)",
      "langdut": "FOD Financien - Algemene Administratie van de Patrimoniumdocumentatie (AAPD)",
      "langger": "FOD Finanzen - Generalverwaltung Vermögensdokumentation (GVVD)"
    },
    "contactForResource": [
      {
        "organisationObject": {
          "default": "FPS Finance - General Administration of Patrimonial Documentation (GAPD)",
          "langeng": "FPS Finance - General Administration of Patrimonial Documentation (GAPD)",
          "langfre": "SPF Finances - Administration Générale de la Documentation Patrimoniale (AGDP)",
          "langdut": "FOD Financien - Algemene Administratie van de Patrimoniumdocumentatie (AAPD)",
          "langger": "FOD Finanzen - Generalverwaltung Vermögensdokumentation (GVVD)"
        },
        "role": "custodian",

No changes are required in the client app, as multilingual fields are
converted to simple fields based on the UI language. The OrgForResourceObject
field is translated, and a field OrgForResource is created containing the
value in the UI language or, if there is none, the default value. The record
view therefore now also supports organisation names using multilingual encoding.
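
A minimal sketch of this conversion, assuming a multilingual field is any key ending in `Object` whose value carries a `default` entry. The function name `translateFields` and the shortened sample values are made up for illustration; the actual implementation in the PR may differ.

```javascript
// Hypothetical sketch: for every key ending in "Object" whose value looks like a
// multilingual object (it has a "default" entry), add a sibling key without the
// "Object" suffix holding the UI-language value, or the default value otherwise.
function translateFields(node, uiLangField) {
  if (Array.isArray(node)) {
    node.forEach((item) => translateFields(item, uiLangField));
    return node;
  }
  if (node === null || typeof node !== "object") {
    return node;
  }
  for (const [key, value] of Object.entries(node)) {
    const isMultilingual =
      key.endsWith("Object") &&
      value !== null &&
      typeof value === "object" &&
      !Array.isArray(value) &&
      "default" in value;
    if (isMultilingual) {
      const simpleKey = key.slice(0, -"Object".length);
      node[simpleKey] = value[uiLangField] ?? value.default;
    } else {
      // Recurse so that nested fields (e.g. inside contactForResource) are handled too.
      translateFields(value, uiLangField);
    }
  }
  return node;
}

// Shortened sample values, for illustration only.
const record = {
  OrgForResourceObject: { default: "FPS Finance", langfre: "SPF Finances" },
  contactForResource: [
    {
      organisationObject: { default: "FPS Finance", langfre: "SPF Finances" },
      role: "custodian",
    },
  ],
};
translateFields(record, "langfre");
console.log(record.OrgForResource); // SPF Finances
console.log(record.contactForResource[0].organisation); // SPF Finances
```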

  • UI in English, organisation filter in English

image

  • UI in French, organisation filter in French

image

  • With language detection enabled, UI in English but search in French: the filter is in French (as the search will probably return mainly French content)

image

Inspired by some work done by @fgravin and @cmangeat for geocat.ch, which chooses the facet in the UI language.

Multilingual links support

In ISO19139, name and description can be multilingual. ISO19115-3 adds the possibility to also provide one URL per language.

image

With the migration to Elasticsearch, only the main language was indexed and used in the UI. As for the organisation name, the record view now displays link details (URL, name and description) in the UI language, if available.

image

Multilingual overview support

An overview can have a description. It is now displayed based on the UI language.

image

ISO19115-3 / Editor / Fix update and indexing of links (not in distribution)

E.g. data quality reports and legends can now be updated properly in multilingual records.

fgravin and others added 3 commits October 3, 2022 15:27
If the term field ends with Object, it is considered multilingual and the facet is created on the field indexed for the UI language.
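
The rule in this commit message could look roughly like the sketch below. The helper name `facetFieldForUiLanguage` is hypothetical, and the `lang` + 3-letter-code sub-field naming is an assumption based on the `langfre` field seen elsewhere in this PR.

```javascript
// Hypothetical helper: derives the field to aggregate on from a facet's term field.
function facetFieldForUiLanguage(termField, uiLang3) {
  if (termField.endsWith("Object")) {
    // Assumption: per-language sub-fields are named "lang" + 3-letter code.
    return `${termField}.lang${uiLang3}`;
  }
  // Not multilingual: aggregate on the field as is.
  return termField;
}

console.log(facetFieldForUiLanguage("OrgForResourceObject", "fre")); // OrgForResourceObject.langfre
console.log(facetFieldForUiLanguage("tag.key", "fre")); // tag.key
```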
@fxprunayre fxprunayre added this to the 4.2.2 milestone Oct 5, 2022
@fxprunayre fxprunayre added the "index structure change" label (Indicate that this work introduces an index change.) Oct 5, 2022
@fxprunayre fxprunayre changed the title from "Search / Multilingual aggregation support" to "Search / Multilingual support for contact and links (indexing and aggregations)" Oct 5, 2022
@fxprunayre fxprunayre added the "schema plugin change" label (Indicate that this work introduces a schema plugin change.) Oct 5, 2022
@fxprunayre fxprunayre changed the title from "Search / Multilingual support for contact and links (indexing and aggregations)" to "Search / Multilingual support for contact, links and overview description (indexing and aggregations)" Oct 5, 2022
@fxprunayre fxprunayre modified the milestones: 4.2.2, 4.2.3 Dec 7, 2022
@fxprunayre fxprunayre force-pushed the 422-multilingualcontact branch from 56b9d84 to da8448c December 9, 2022 09:54
@fxprunayre fxprunayre requested a review from josegar74 January 10, 2023 08:14
<element>additionalDocumentation</element>
<element>specification</element>
<element>reportReference</element>
</doc>
</xsl:variable>

<xsl:template name="collect-documents">
<xsl:variable name="root" select="."/>
<xsl:param name="forIndexing" select="false()" as="xs:boolean"/>
Member

Can you add a code comment to explain what this parameter is used for?

Member Author

Added description 80fbd5f

select="substring-before(., $valueSeparator)"/>
select="if ($valueSeparator != '')
then substring-before(., $valueSeparator)
else substring(., 1, 2)"/>
Member

I guess the value 2 is for the iso2lang code?

Member Author

Added description 80fbd5f

select="substring-after(., $valueSeparator)"/>
select="if ($valueSeparator != '')
then substring-after(., $valueSeparator)
else substring(., 4)"/>
Member

What does the value 4 mean? If not obvious, can you add a code comment to explain it?

Member Author

Added description 80fbd5f

It is a particular case for onlinesrc-add.xsl, which can add a URL containing the separator '#'.
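
For readers less familiar with XPath string functions, here is a rough JavaScript equivalent of the two expressions in the diff above. It is only a sketch: `splitPrefixedValue` and the sample inputs are invented for illustration, and what the two-character prefix holds (the reviewer guesses a two-letter language code) is not confirmed in this thread.

```javascript
// XPath strings are 1-indexed: substring(., 1, 2) is the first two characters
// and substring(., 4) is everything from the fourth character on.
function splitPrefixedValue(text, valueSeparator) {
  if (valueSeparator !== "") {
    const index = text.indexOf(valueSeparator);
    if (index === -1) {
      // XPath substring-before/substring-after return "" when the separator is absent.
      return { prefix: "", value: "" };
    }
    return {
      prefix: text.slice(0, index),
      value: text.slice(index + valueSeparator.length),
    };
  }
  // Empty separator: fixed-width prefix, and the character at position 3 is skipped.
  return { prefix: text.slice(0, 2), value: text.slice(3) };
}

// Invented sample inputs.
console.log(splitPrefixedValue("fr#Rapport", "#"));
console.log(splitPrefixedValue("fr#https://example.org/doc#section", ""));
```

With an empty separator the split is purely positional, so the rest of the string is kept whole regardless of which characters it contains.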

select="substring-after(., $valueSeparator)"/>
select="if ($valueSeparator != '')
then substring-after(., $valueSeparator)
else substring(., 4)"/>
Member

Same as the previous comment.

@josegar74
Member

josegar74 commented Jan 12, 2023

While testing the pull request, the Organisation facet list does not seem to be translated.

I've created a metadata record in English and French, using these values for the organisation name:

  • Test organisation (en)
  • Test organisation (fr)

I get the following in the French UI:

facet-organisation-fr

But the following steps in the French UI display only the French organisation, and none of the others that come from non-multilingual metadata:

  • Go to the home page and click the search button --> Search results / facets are like the previous image
  • Click the reload button in the browser, then:

facet-organisation-fr-2

@fxprunayre
Member Author

You have to test the various language options and find which one is best for your catalogue content.
image

Copied some info from this PR description into the code to explain how aggLang is set, in 321c552.

@josegar74
Member

josegar74 commented Jan 12, 2023

I see the results vary depending on that selection. That might make sense, but it's a bit confusing if you have multilingual and non-multilingual metadata and choose to search in the UI language when that language is the metadata's alternate language (for example, French). In this case you only get the organisations with a French value from the multilingual metadata, and none from the non-multilingual metadata (if defined in a language other than French).

It's not a bug, but I wonder whether it might be better to index the multilingual values for non-multilingual metadata with the primary language value. In any case, this can be done in a separate pull request, if required.


Independently of that, the case I described gives different results (I see that the "all languages" option is selected in both cases):

But the following steps in the French UI display only the French organisation, and none of the others that come from non-multilingual metadata:

  • Go to the home page and click the search button --> Search results / facets are like the previous image
  • Click the reload button in the browser, then:

@fxprunayre
Member Author

but it's a bit confusing if you have multilingual and non-multilingual metadata

It really depends on which UI languages you provide and on the content of your catalogue.

If the UI language list is set to English and French and the metadata is a mix in which not all records (or only a minority) are translated, then you would configure the facet as:

              OrgForResource: {
                terms: {
                  field: "OrgForResourceObject.default",

If all (or most) are translated, I would use:

            languageStrategy: "searchInDetectedLanguage",
            languageWhitelist: ["eng", "fre"],
            facetConfig: {
              OrgForResource: {
                terms: {
                  field: "OrgForResourceObject.${aggLang}",

If you offer the UI only in English and have a mix of non-English monolingual metadata harvested from various places:

            languageStrategy: "searchInAllLanguages",
            facetConfig: {
              OrgForResource: {
                terms: {
                  field: "OrgForResourceObject.default",

It really depends on the content and on the languages of the target audience.

languageStrategy is part of the search URL state, so it is set when you reload the search page:
http://localhost:8080/geonetwork/srv/eng/catalog.search#/search?isTemplate=n&resourceTemporalDateRange=%7B%22range%22:%7B%22resourceTemporalDateRange%22:%7B%22gte%22:null,%22lte%22:null,%22relation%22:%22intersects%22%7D%7D%7D&sortBy=relevance&sortOrder=&from=1&to=30&languageStrategy=searchInDetectedLanguage
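
As a small illustration of that URL state, the strategy can be read back from the hash fragment with standard URL APIs. The helper name `getLanguageStrategy` is made up for this sketch, and the URL below is a shortened form of the one above.

```javascript
// Reads languageStrategy from a search URL whose state lives in the hash fragment.
function getLanguageStrategy(searchUrl) {
  const hash = new URL(searchUrl).hash; // e.g. "#/search?from=1&languageStrategy=..."
  const queryStart = hash.indexOf("?");
  if (queryStart === -1) {
    return null; // no search parameters in the hash
  }
  return new URLSearchParams(hash.slice(queryStart + 1)).get("languageStrategy");
}

const searchUrl =
  "http://localhost:8080/geonetwork/srv/eng/catalog.search#/search" +
  "?isTemplate=n&sortBy=relevance&from=1&to=30&languageStrategy=searchInDetectedLanguage";
console.log(getLanguageStrategy(searchUrl)); // searchInDetectedLanguage
```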

@josegar74
Member

Thanks for the clarifications.
