Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change JobList ergnomics - add filtering by states, return cursor from JobList function #304

Merged
merged 7 commits into from
Apr 19, 2024

Conversation

bgentry
Copy link
Contributor

@bgentry bgentry commented Apr 19, 2024

A tweaked version of #236, removing queries which cannot be efficient on a large river_job table due to the lack of indexes.

Users are of course free to add their own indexes and write their own query functions, but we're a bit uncomfortable offering official APIs which will scale poorly.

@bgentry bgentry force-pushed the bg-jos-improvements branch from 0688e07 to 5d7d3e7 Compare April 19, 2024 00:59
@bgentry bgentry requested a review from brandur April 19, 2024 00:59
@bgentry bgentry marked this pull request as ready for review April 19, 2024 00:59
@bgentry bgentry added the breaking change Breaking API change label Apr 19, 2024
Copy link
Contributor

@brandur brandur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! IMO let's get this in. As noted in chat I wouldn't mind trying to take a stab at a couple more small tweaks, but let's do that in a follow up.

// This indicates the user overrode the States list with only non-finalized
// states prior to then requesting FinalizedAt ordering.
if len(currentNonFinalizedStates) == 0 {
return nil, errors.New("cannot order by finalized_at with non-finalized state filters")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we put the non-finalized states into the error message with a %+v or something? Just helps make the error quicker to resolve.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, tweaked ✌️

River does not keep indexes on `created_at` and `attempted_at` fields
and we do not want to do so for performance reasons. We shouldn't offer
a built-in API for sorting by these fields which will be guaranteed to
have poor performance on large tables.

Additionally, while `finalized_at` _is_ indexed, it's indexed via a
_partial_ index on non-null values (i.e. only for finalized jobs). We
can still make it possible to order based upon this field, but only when
also filtering to only finalized states. This still _may_ be inefficient
at times, but it should hopefully be able to at least avoid a full table
scan.
@bgentry bgentry force-pushed the bg-jos-improvements branch from 5d7d3e7 to f61b83b Compare April 19, 2024 01:56
@bgentry bgentry enabled auto-merge (squash) April 19, 2024 01:58
@bgentry bgentry merged commit b0fa941 into master Apr 19, 2024
10 checks passed
@bgentry bgentry deleted the bg-jos-improvements branch April 19, 2024 01:59
brandur added a commit that referenced this pull request Apr 20, 2024
Here, in addition to the new job list sort orders added by #304, add one
more for sorting by job ID, and make it the default.

This is another breaking change in the job list API, and I wouldn't
normally make it at this point, but our next release will contain a
number of breaking changes, including to the job list API, so the time
is now.

My rationale for the change:

* Listing and paginating by ID is an overwhelmingly common pattern in
  web and database APIs, and I think it being the default would be more
  intuitive for more people.

* Ordering by ID is more ergonomic because no `JobListParams.States`
  invocation needs to made. It's shorter, and especially when a fairly
  normal use case will be to iterate across all job rows, this is a
  minor nicety.

* Ordering by ID allows the entire job collection to be iterated
  regardless of job state. `JobListOrderByTime` requires a state, and
  some order is needed for cursors to work.

* The behavior of `JobListOrderByTime` is a little odd in that it
  changes dynamically based on the requested list states, and there's no
  way to intuit what the order will be without knowing a lot about River
  internals and thinking very carefully about it. Furthermore, the time
  that'd be chosen wasn't documented anywhere, so the only way to know
  for sure what it would be was to read River's source code.

* With the inclusion of #304, `JobListOrderByTime`'s behavior has gotten
  even a little more surprising because the state to be chosen to select
  a timestamp was the _first_ value sent to `JobListParams.States`, with
  any additional values sent ignored, also creating a somewhat nonsensical
  result (e.g. `States(running, available)` would select `attempted_at`,
  but would be `NULL` for any jobs in the `available` state). This
  behavior was not documented.

I also found a bug that was a hold over from #304 as more than one sort
order became available. The function `JobListCursorFromJob` took a sort
order, but would produce the wrong result unless the user remembered to
set the exact same sort order on their job list parameters. For example,
this would do the wrong thing:

    res, err = client.JobList(ctx, NewJobListParams().After(JobListCursorFromJob(job4, JobListOrderByScheduledAt)))

`JobListCursorFromJob` would extract `scheduled_at` from `job4`, but
then list using to the default job list order. Previously that was based
on time, so this result would've been wrong _unless_ the job list
parameters filtered to state `available`, `retryable`, or `scheduled` so
that `scheduled_at` was also used when comparing to other jobs.

A caller could compensate by specifying sort order in both places, but
this is pretty ugly, and there was no check to make sure that the list
paramaters and cursor shared the same order:

    res, err = client.JobList(ctx, NewJobListParams().OrderBy(JobListOrderByScheduledAt, SortOrderAsc).After(JobListCursorFromJob(job4, JobListOrderByScheduledAt)))

The corrected version of this doesn't use an order when initializing the
cursor, instead using the one from the job list params, meaning that the
same time field is always used between list query and what's extracted
from the cursor's job row.

    res, err = client.JobList(ctx, NewJobListParams().OrderBy(JobListOrderByScheduledAt, SortOrderAsc).After(JobListCursorFromJob(job4)))

This also makes the invocation shorter and more ergonomic to call.

Along with the above, we also make a few tweaks to documentation.
`JobListOrderByTime` now documents which timestamps it uses, and
indicates that only the first value sent to `JobListParams.States` will
be respected.
brandur added a commit that referenced this pull request Apr 21, 2024
Here, in addition to the new job list sort orders added by #304, add one
more for sorting by job ID, and make it the default.

This is another breaking change in the job list API, and I wouldn't
normally make it at this point, but our next release will contain a
number of breaking changes, including to the job list API, so the time
is now.

My rationale for the change:

* Listing and paginating by ID is an overwhelmingly common pattern in
  web and database APIs, and I think it being the default would be more
  intuitive for more people.

* Ordering by ID is more ergonomic because no `JobListParams.States`
  invocation needs to made. It's shorter, and especially when a fairly
  normal use case will be to iterate across all job rows, this is a
  minor nicety.

* Ordering by ID allows the entire job collection to be iterated
  regardless of job state. `JobListOrderByTime` requires a state, and
  some order is needed for cursors to work.

* The behavior of `JobListOrderByTime` is a little odd in that it
  changes dynamically based on the requested list states, and there's no
  way to intuit what the order will be without knowing a lot about River
  internals and thinking very carefully about it. Furthermore, the time
  that'd be chosen wasn't documented anywhere, so the only way to know
  for sure what it would be was to read River's source code.

* With the inclusion of #304, `JobListOrderByTime`'s behavior has gotten
  even a little more surprising because the state to be chosen to select
  a timestamp was the _first_ value sent to `JobListParams.States`, with
  any additional values sent ignored, also creating a somewhat nonsensical
  result (e.g. `States(running, available)` would select `attempted_at`,
  but would be `NULL` for any jobs in the `available` state). This
  behavior was not documented.

I also found a bug that was a hold over from #304 as more than one sort
order became available. The function `JobListCursorFromJob` took a sort
order, but would produce the wrong result unless the user remembered to
set the exact same sort order on their job list parameters. For example,
this would do the wrong thing:

    res, err = client.JobList(ctx, NewJobListParams().After(JobListCursorFromJob(job4, JobListOrderByScheduledAt)))

`JobListCursorFromJob` would extract `scheduled_at` from `job4`, but
then list using to the default job list order. Previously that was based
on time, so this result would've been wrong _unless_ the job list
parameters filtered to state `available`, `retryable`, or `scheduled` so
that `scheduled_at` was also used when comparing to other jobs.

A caller could compensate by specifying sort order in both places, but
this is pretty ugly, and there was no check to make sure that the list
paramaters and cursor shared the same order:

    res, err = client.JobList(ctx, NewJobListParams().OrderBy(JobListOrderByScheduledAt, SortOrderAsc).After(JobListCursorFromJob(job4, JobListOrderByScheduledAt)))

The corrected version of this doesn't use an order when initializing the
cursor, instead using the one from the job list params, meaning that the
same time field is always used between list query and what's extracted
from the cursor's job row.

    res, err = client.JobList(ctx, NewJobListParams().OrderBy(JobListOrderByScheduledAt, SortOrderAsc).After(JobListCursorFromJob(job4)))

This also makes the invocation shorter and more ergonomic to call.

Along with the above, we also make a few tweaks to documentation.
`JobListOrderByTime` now documents which timestamps it uses, and
indicates that only the first value sent to `JobListParams.States` will
be respected.
brandur added a commit that referenced this pull request Apr 21, 2024
Here, in addition to the new job list sort orders added by #304, add one
more for sorting by job ID, and make it the default.

This is another breaking change in the job list API, and I wouldn't
normally make it at this point, but our next release will contain a
number of breaking changes, including to the job list API, so the time
is now.

My rationale for the change:

* Listing and paginating by ID is an overwhelmingly common pattern in
  web and database APIs, and I think it being the default would be more
  intuitive for more people.

* Ordering by ID is more ergonomic because no `JobListParams.States`
  invocation needs to made. It's shorter, and especially when a fairly
  normal use case will be to iterate across all job rows, this is a
  minor nicety.

* Ordering by ID allows the entire job collection to be iterated
  regardless of job state. `JobListOrderByTime` requires a state, and
  some order is needed for cursors to work.

* The behavior of `JobListOrderByTime` is a little odd in that it
  changes dynamically based on the requested list states, and there's no
  way to intuit what the order will be without knowing a lot about River
  internals and thinking very carefully about it. Furthermore, the time
  that'd be chosen wasn't documented anywhere, so the only way to know
  for sure what it would be was to read River's source code.

* With the inclusion of #304, `JobListOrderByTime`'s behavior has gotten
  even a little more surprising because the state to be chosen to select
  a timestamp was the _first_ value sent to `JobListParams.States`, with
  any additional values sent ignored, also creating a somewhat nonsensical
  result (e.g. `States(running, available)` would select `attempted_at`,
  but would be `NULL` for any jobs in the `available` state). This
  behavior was not documented.

I also found a bug that was a hold over from #304 as more than one sort
order became available. The function `JobListCursorFromJob` took a sort
order, but would produce the wrong result unless the user remembered to
set the exact same sort order on their job list parameters. For example,
this would do the wrong thing:

    res, err = client.JobList(ctx, NewJobListParams().After(JobListCursorFromJob(job4, JobListOrderByScheduledAt)))

`JobListCursorFromJob` would extract `scheduled_at` from `job4`, but
then list using to the default job list order. Previously that was based
on time, so this result would've been wrong _unless_ the job list
parameters filtered to state `available`, `retryable`, or `scheduled` so
that `scheduled_at` was also used when comparing to other jobs.

A caller could compensate by specifying sort order in both places, but
this is pretty ugly, and there was no check to make sure that the list
paramaters and cursor shared the same order:

    res, err = client.JobList(ctx, NewJobListParams().OrderBy(JobListOrderByScheduledAt, SortOrderAsc).After(JobListCursorFromJob(job4, JobListOrderByScheduledAt)))

The corrected version of this doesn't use an order when initializing the
cursor, instead using the one from the job list params, meaning that the
same time field is always used between list query and what's extracted
from the cursor's job row.

    res, err = client.JobList(ctx, NewJobListParams().OrderBy(JobListOrderByScheduledAt, SortOrderAsc).After(JobListCursorFromJob(job4)))

This also makes the invocation shorter and more ergonomic to call.

Along with the above, we also make a few tweaks to documentation.
`JobListOrderByTime` now documents which timestamps it uses, and
indicates that only the first value sent to `JobListParams.States` will
be respected.
@jos-oai
Copy link
Contributor

jos-oai commented Apr 22, 2024

Thanks so much! ❤️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking change Breaking API change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants