feat: ADR for incremental algolia indexing
johnnagro committed Feb 14, 2024
1 parent 03c953d commit 08493ca
Showing 1 changed file with 67 additions and 0 deletions.
67 changes: 67 additions & 0 deletions docs/decisions/0009-incremental-algolia-indexing.rst
@@ -0,0 +1,67 @@
Incremental Algolia Indexing
============================


Status
------
Draft


Context
-------
The Enterprise Catalog Service produces an Algolia-based search index of its Content Metadata and Course Catalog
database. This index is entirely rebuilt at least nightly, resulting in a wholesale replacement of the prior
Algolia index. The rebuild job is time-consuming and memory-intensive, and it relies heavily on the `search/all`
functionality of Course Discovery to determine content metadata catalog membership. The job is also brittle: it is
either entirely successful or entirely unsuccessful.


Solution Approach
-----------------
The goals should include:

- Run alongside/augment the existing indexer until we’re able to entirely cut over
- Support all current metadata types, though not necessarily all of them on day 1
- Support multiple methods of triggering: event bus, on-demand from Django Admin, on a schedule, from the existing
  update_content_metadata job, etc.
- Higher parallelization factor, i.e. 1 content item per celery task worker (and no task group coordination required)
- Provide a content-oriented method of determining content catalog membership


Decision
--------
We want to follow updates to catalogs and content, then incrementally apply those updates to Algolia. To do this I
propose we both create new functionality and reuse some existing functionality of our Algolia indexing infrastructure.

First, the existing indexing process begins with executing catalog queries against `search/all` to determine which
courses exist and belong to which catalogs. In order for incremental updates to work we first need to provide the
opposite semantic and instead be able to determine catalog membership from a given course (rather than courses from a
given catalog). We can make use of the new `apps.catalog.filters` Python implementation, which can take a catalog query
and a piece of content metadata and determine if the content matches the query (without the use of Course Discovery).
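
As a rough illustration of this inverted lookup, the sketch below checks one piece of content against every catalog
query and returns the queries it belongs to. The matcher here is a hypothetical stand-in, not the actual
`apps.catalog.filters` API: it assumes a query is a flat dict of field constraints, where a list means "any of these
values". The function names and the query shape are assumptions made for the example.

```python
from typing import Any, Dict, List


def content_matches_query(content: Dict[str, Any], query: Dict[str, Any]) -> bool:
    """Hypothetical matcher: every field constraint in the query must hold."""
    for field, expected in query.items():
        actual = content.get(field)
        if isinstance(expected, list):
            # A list constraint means the content value must be one of the options.
            if actual not in expected:
                return False
        elif actual != expected:
            return False
    return True


def catalog_queries_for_content(
    content: Dict[str, Any],
    catalog_queries: Dict[str, Dict[str, Any]],
) -> List[str]:
    """Return the IDs of all catalog queries that the given content matches."""
    return sorted(
        query_id
        for query_id, query in catalog_queries.items()
        if content_matches_query(content, query)
    )
```

Because the check only needs the content record and the stored queries, it can run per content item without calling
`search/all`.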

Second, in order to incrementally update the Algolia index we need to build the ability to replace individual
object-shard documents in the index (today we just replace the whole index). This can be implemented by creating
methods to determine which Algolia object-shards exist for a piece of content. Once we have those IDs we are able to
determine if a create, update, or delete of them is required. For simplicity's sake, an update will likely be a delete
followed by the creation of new objects.
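
A minimal sketch of that decision, assuming we already know the IDs of the object-shards currently in the index and
have built the new shard objects for the content. The function name and the returned tuple are hypothetical; only the
``objectID`` attribute is Algolia's own naming.

```python
from typing import Any, Dict, Iterable, List, Tuple


def plan_shard_operations(
    existing_ids: Iterable[str],
    new_objects: List[Dict[str, Any]],
) -> Tuple[str, List[str], List[Dict[str, Any]]]:
    """Return (action, ids_to_delete, objects_to_create) for one piece of content."""
    ids_to_delete = sorted(set(existing_ids))
    if ids_to_delete and new_objects:
        action = "update"  # modelled as delete-then-create, per the text above
    elif new_objects:
        action = "create"
    elif ids_to_delete:
        action = "delete"
    else:
        action = "noop"
    # Every existing shard is removed and every new shard is written, which
    # avoids having to diff individual shard contents.
    return action, ids_to_delete, list(new_objects)
```

Treating an update as a full delete plus create costs a few extra operations per content item, but it keeps the logic
the same regardless of how the number of shards changes.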

Third, we need to provide new methods of indexing based on an individual object change. This method will determine if
the content metadata change should result in a create, update, or delete of object-shards in Algolia. If a create or
update action is required, it will determine catalog membership via the new `apps.catalog.filters` tooling. Then it
will re-use much of the existing Algolia indexing code to create the new set of document object shards to send to
Algolia. Finally, it will issue any required deletes of existing objects and creates of any new or updated objects.
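
The flow described above could look roughly like the following sketch. Everything here is illustrative: the
``InMemoryIndex`` class stands in for a real Algolia index, the ``<content key>-shard-<n>`` ID scheme and the shard
size are assumptions, and ``find_catalogs`` is a placeholder for the membership lookup built on
`apps.catalog.filters`.

```python
from typing import Any, Callable, Dict, List, Optional


class InMemoryIndex:
    """Hypothetical stand-in for an Algolia index, keyed by objectID."""

    def __init__(self) -> None:
        self.objects: Dict[str, Dict[str, Any]] = {}

    def ids_for_content(self, content_key: str) -> List[str]:
        prefix = f"{content_key}-shard-"
        return sorted(oid for oid in self.objects if oid.startswith(prefix))

    def delete_objects(self, object_ids: List[str]) -> None:
        for oid in object_ids:
            self.objects.pop(oid, None)

    def save_objects(self, objects: List[Dict[str, Any]]) -> None:
        for obj in objects:
            self.objects[obj["objectID"]] = obj


def build_shards(content: Dict[str, Any], catalog_ids: List[str], shard_size: int = 2) -> List[Dict[str, Any]]:
    """Split catalog membership across several objects (assumed ID scheme)."""
    return [
        {
            "objectID": f"{content['key']}-shard-{n}",
            "title": content["title"],
            "catalog_ids": catalog_ids[start:start + shard_size],
        }
        for n, start in enumerate(range(0, len(catalog_ids), shard_size))
    ]


def index_content_change(
    index: InMemoryIndex,
    content_key: str,
    content: Optional[Dict[str, Any]],
    find_catalogs: Callable[[Dict[str, Any]], List[str]],
) -> str:
    """Apply one content change; content=None means the content was removed."""
    existing_ids = index.ids_for_content(content_key)
    new_shards = build_shards(content, find_catalogs(content)) if content else []
    index.delete_objects(existing_ids)  # issue required deletes first
    index.save_objects(new_shards)      # then create new or updated shards
    if existing_ids and new_shards:
        return "update"
    if new_shards:
        return "create"
    return "delete" if existing_ids else "noop"
```

Each call touches only the shards of a single content item, which is what would let one Celery task handle one
content item independently.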

Lastly, incremental updates will need a trigger, such as polling Course Discovery for updated content, consuming
event-bus events, and/or triggering from a nightly Course Discovery crawl or a Django Admin button.


Consequences
------------
Ideally, this incremental process will allow us to provide a closer to real-time index using fewer resources. It will
also give us more flexibility to include non-course-discovery content in catalogs because we will no longer rely on
`search/all`.


Alternatives Considered
-----------------------
No alternatives were considered.
