Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PMs from new sources not immediately displayed in PMs list #3654

Closed
rk-for-zulip opened this issue Oct 18, 2019 · 14 comments · Fixed by #3877
Closed

PMs from new sources not immediately displayed in PMs list #3654

rk-for-zulip opened this issue Oct 18, 2019 · 14 comments · Fixed by #3877
Assignees
Labels
a-data-sync Zulip's event system, event queues, staleness/liveness P1 high-priority

Comments

@rk-for-zulip
Copy link
Contributor

When a user receives a PM from a new source (i.e., one that had not previously sent them a PM), the center icon will correctly display a notification for a new message... but when tapped, that conversation-source does not appear in the PMs list; it remains unrefreshed.

(Force-stopping and reentering the app will also force a refresh.)

@gnprice gnprice added a-data-sync Zulip's event system, event queues, staleness/liveness P1 high-priority labels Oct 18, 2019
@gnprice
Copy link
Member

gnprice commented Oct 18, 2019

Yikes, that's a pretty bad symptom!

Would definitely be good to fix. The bug will very probably be in one of the "reducer" files maintaining the data structures that PM conversation list relies on.

@chrisbobbe chrisbobbe self-assigned this Jan 10, 2020
@chrisbobbe
Copy link
Contributor

chrisbobbe commented Jan 11, 2020

Aha! The 'narrows' state is not being updated to reflect the new messages, when it should.

Here's why: When a PM arrives and an EVENT_NEW_MESSAGE action is dispatched, action.caughtUp[ALL_PRIVATE_NARROW_STR].newer is used by the narrowsReducer (in the eventNewMessage function) to decide whether to update the state. action.caughtUp[ALL_PRIVATE_NARROW_STR].newer is just taken from the previous state, seen in the 'message' case in eventToAction.js. If the user has seen the most recent message, i.e., if action.caughtUp[ALL_PRIVATE_NARROW_STR].newer is true for that narrow, the state updates. If not, then the user is scrolling through old messages, probably from a link to some ancient history, and it would be confusing to append new messages directly after those messages, so the state does not update:

(src/chat/narrowsReducer.js, in eventNewMessage)

    const isInNarrow = isMessageInNarrow(action.message, JSON.parse(key), action.ownEmail);
    const isCaughtUp = action.caughtUp[key] && action.caughtUp[key].newer;
    const messageDoesNotExist = state[key].find(id => action.message.id === id) === undefined;

    if (isInNarrow && isCaughtUp && messageDoesNotExist) {
      stateChange = true;
      newState[key] = [...state[key], action.message.id];
    } else {
      newState[key] = state[key];
    }

But action.caughtUp[ALL_PRIVATE_NARROW_STR].newer is always false, despite the fact that (I believe) it should actually be always true. There's an initial fetch specifically of private messages that should be getting all of the most recent ones, and should be changing that .newer to true. (/messages is called from fetchPrivateMessages which is dispatched during doInitialFetch in src/message/fetchActions.js.) But the response to that /messages call has false for found_newest, so perhaps there's a server-side bug, or something wrong with the /messages request.

Further investigation will almost certainly involve #3133 (currently open as P1 from November 2018); I just wanted to jot this finding down before I forget.

@gnprice
Copy link
Member

gnprice commented Jan 13, 2020

Thanks @chrisbobbe for the investigation and that writeup!

There's an initial fetch specifically of private messages that should be getting all of the most recent ones, [... b]ut the response to that /messages call has false for found_newest, so perhaps there's a server-side bug, or something wrong with the /messages request.

Yeah, this sounds like a point where something's already gone wrong, then.

The code that makes that request (fetchPrivateMessages which is dispatched during doInitialFetch in src/message/fetchActions.js, as you said) looks like this:

const fetchPrivateMessages = () => async (dispatch: Dispatch, getState: GetState) => {
  const auth = getAuth(getState());
  const { messages, found_newest, found_oldest } = await tryUntilSuccessful(() =>
    api.getMessages(auth, ALL_PRIVATE_NARROW, LAST_MESSAGE_ANCHOR, 100, 0),

where

export const LAST_MESSAGE_ANCHOR = Number.MAX_SAFE_INTEGER;

and that api.getMessages binding looks like

export default async (
  auth: Auth,
  narrow: Narrow,
  anchor: number,
  numBefore: number,
  numAfter: number,
  useFirstUnread: boolean = false,
): Promise<ApiResponseMessages> => {
  const response: ServerApiResponseMessages = await apiGet(auth, 'messages', {
    narrow: JSON.stringify(narrow),
    anchor,
    num_before: numBefore,
    num_after: numAfter,
    apply_markdown: true,
    use_first_unread_anchor: useFirstUnread,
  });

It sure looks to me like that ought to result in found_newest being true -- we're asking specifically for the very latest messages. Perhaps there's a subtlety in the API here, or perhaps just a server-side bug. @timabbott , do you have any ideas on why a request like this would produce found_newest false?

@rk-for-zulip
Copy link
Contributor Author

This may be due to our use of stringified JSON objects as keys in state.narrows – unless someone's imposed new restrictions while I wasn't looking, the JSON.stringify of two structurally identical objects is not guaranteed to be identical. (I'm not even sure two consecutive serializations of the same object are guaranteed to be identical.)

@timabbott
Copy link
Member

timabbott commented Jan 13, 2020

I'd certainly expect that the result for any query of that form to have found_newest=true; if it's not, I can investigate. It'd be pretty surprising if it didn't; let me know if you see that behavior through logging.

@chrisbobbe
Copy link
Contributor

chrisbobbe commented Jan 14, 2020

Hmm, so with some quick logging on the server (please let me know if there's a better way to do this or you need something more specific) --

(zerver/views/messages.py, get_messages_backend)
image

-- I see that it's showing False:

image

@timabbott
Copy link
Member

Interesting! I think what's going on is a consequence of your requesting 0 messages as num_after, and so the server didn't check whether indeed there are any messages with a newer message ID, and thus cannot prove that it had fetched the latest messages.

Writing more details now.

@timabbott
Copy link
Member

timabbott commented Jan 14, 2020

For now, I think against legacy Zulip servers with the current behavior, while the mobile app is using the specific type of query it's using, it should assume found_newest=true if it set a max_int style anchor value, because the current behavior is at best confusing, and maybe just wrong.

Here's the relevant code:

    # In a function just for setting the anchor when use_first_unread_anchor=True
    if len(first_unread_result) > 0:                                                                 
        anchor = first_unread_result[0][0]                                                           
    else:                                                                                            
        anchor = LARGER_THAN_MAX_MESSAGE_ID                                                          
    ...
    anchored_to_right = (anchor == LARGER_THAN_MAX_MESSAGE_ID)                                       
    ...
    found_newest = anchored_to_right or (len(after_rows) < num_after)                                

The logic is as follows, roughly:

  • We set found_newest using found_newest = anchored_to_right or (len(after_rows) < num_after). The second case is if we actually fetched messages for an "num_after>0` query; the first case is intended to cover the case that we're doing a max_int style anchor and didn't try to look for messages newer than the provided anchor.
  • The code for setting anchored_to_right is a bit hacky if use_first_unread_anchor=true. If we found at least 1 unread message, it assumes found_newest=False (since there could be more messages newer than the first unread message, and isn't willing to certify that). If there are none, it hackily sets anchor to LARGER_THAN_MAX_MESSAGE_ID = 10000000000000000 (which when written was intended to be a purely internal optimization within that code path).
  • The webapp, perhaps critically for this not having been noticed the server's behavior is bad, also uses 10000000000000000 as its message ID for !use_first_unread_anchor queries with num_after=0 (the webapp code to do that was added much after; not sure if this was intentional; the JS code was certainly added much later than the server code). So the webapp is as of November's merge of 452e226ea25c1315edb1f31be406475d82c56a88 accidentally relying on this weird behavior with that specific number, I think.

I haven't had time to think through this deeply but here are a few thoughts:

  • From a specification perspective, found_newest=False was meant to mean that the client needs to fetch from the relevant narrow again with num_after>0 in order to get the latest messages. I don't think we thought about defining what it should mean when you didn't ask for any newer messages. In theory, there could have been some with a higher message ID than the number passed, and the server didn't check.
  • I suspect the mobile app could switch to using that 10000000000000000 value as its anchor and have the problem go away; we should test that. This might be the best option for backward-compatibility with older servers giving us freedom to fix the server. And then maybe we should document that number as such if indeed it is meant to be.
  • For the server, I think there's a few options for how we could proceed.
    • I don't really want to require that the server do an extra database query by setting num_after=1 to check if there are any newer messages, but in theory it could do that for correctness' sake. There's unfortunately not a super cheap way to check if a value is above the maximum message ID in the database. But we could potentially declare that message IDs being less than 10000000000000000 is a documented invariant that clients can rely on.
    • We could consider having the server not send a found_newest at all when an anchor is specific and num_after=0 to be clear that the server didn't check.
    • We might want to use a >= check rather than an equal check if we're using this 10000000000000000 number at all.

@chrisbobbe
Copy link
Contributor

Thanks for the writeup and suggestions, @timabbott! I haven't forgotten about this, but I'd like to check in with @gnprice about it next week when he's in the office.

@gnprice
Copy link
Member

gnprice commented Jan 28, 2020

Thanks @timabbott for the investigation!

For now, I think against legacy Zulip servers with the current behavior, while the mobile app is using the specific type of query it's using, it should assume found_newest=true if it set a max_int style anchor value, because the current behavior is at best confusing, and maybe just wrong.

This sounds like a good client-side fix. I think what this comes out to is that whenever we use LAST_MESSAGE_ANCHOR as the anchor, we should ignore the found_newest the server returns and instead use true there.

(And when further we're passing use_first_unread_anchor: false? That should always be the case when using LAST_MESSAGE_ANCHOR anyway.)

... Oh aha, but:

  • I suspect the mobile app could switch to using that 10000000000000000 value as its anchor and have the problem go away; we should test that. This might be the best option for backward-compatibility with older servers giving us freedom to fix the server.

That sounds like a simpler workaround 🙂 We'd coax the server into giving the right answer in the first place, rather than fix up the answer with our own logic. I think this'd be just a one-line change to the LAST_MESSAGE_ANCHOR definition.


For the server-side fix:

  • I think there definitely ought to be a way to express a query "give me the very latest messages, whatever the message IDs might be." That's a very natural query to make.
    • It seems like both the webapp and mobile app have hacked this by passing a giant message ID that's assumed to be larger than any real one.
    • But better yet would be a way to express that that's explicitly understood by both client and server. This is stronger than e.g. "message IDs being less than 10000000000000000 is a documented invariant that clients can rely on" -- it'd mean the client can rely on the server relying on that fact.
    • In particular, the server is giving wrong answers here for found_newest. It's doing that because, as you say, to directly check in the DB whether there's a later message isn't free. But when the point of the query is "give me the very latest messages", no DB query is needed -- the server just needs a way to be able to rely on the fact that's the point, including in the bit of code that's computing found_newest.
  • One way to do that would be to declare that the value 10000000000000000 means that.
    • Better, all values at least that, as you say.
  • Another would be to pick some other value, perhaps a string constant like 'latest' (so anchor: 'latest'), as a way of expressing that.
    • This feels cleaner in some ways; it's more explicit. The type is then number | 'latest'.
    • We'd still use 10000000000000000 in the app for now for compatibility, but a comment would point to the update so in a few years it's easy to say "oh, Zulip 2.2, that's ancient, we can use the explicit form now."
      • For that matter, we could use number | 'latest' already when passing anchors around our own code, and just convert to the special number at the edge in the API binding.

@timabbott
Copy link
Member

Yeah, anchor: latest seems like a reasonable way to declare that interface; it seems a lot cleaner than adding a new parameter or something.

timabbott added a commit to timabbott/zulip that referenced this issue Jan 29, 2020
A wart that has long been present inin Zulip's get_messages API is how
to request "the latest messages" in the API.  Previously, the
recommendation was basically to pass anchor=10000000000000000 (for an
appropriately huge number). An accident of the server's implementation
meant that specific number of 0s was actually important to avoid a
buggy (or at least wasteful) value of found_newest=False if the query
had specified num_after=0 (since we didn't check).

This was the cause of the mobile issue
zulip/zulip-mobile#3654.

The solution is to allow passing a special value of anchor='newest',
basically a special string-type value that the server can interpret as
meaning the user precisely just wants the most recent messages.  We
also add an analogous anchor='oldest' or similar to avoid folks
needing to write a somewhat ugly anchor=0 for fetching the very first
messages.

We may want to also replace the use_first_unread_anchor argument to be
a "first_unread" value for the anchor parameter.

While it's not always ideal to make a value have a variable type like
this, in this case it seems like a really clean way to express the
idea of what the user is asking for in the API.
timabbott added a commit to timabbott/zulip that referenced this issue Jan 29, 2020
A wart that has long been present inin Zulip's get_messages API is how
to request "the latest messages" in the API.  Previously, the
recommendation was basically to pass anchor=10000000000000000 (for an
appropriately huge number). An accident of the server's implementation
meant that specific number of 0s was actually important to avoid a
buggy (or at least wasteful) value of found_newest=False if the query
had specified num_after=0 (since we didn't check).

This was the cause of the mobile issue
zulip/zulip-mobile#3654.

The solution is to allow passing a special value of anchor='newest',
basically a special string-type value that the server can interpret as
meaning the user precisely just wants the most recent messages.  We
also add an analogous anchor='oldest' or similar to avoid folks
needing to write a somewhat ugly anchor=0 for fetching the very first
messages.

We may want to also replace the use_first_unread_anchor argument to be
a "first_unread" value for the anchor parameter.

While it's not always ideal to make a value have a variable type like
this, in this case it seems like a really clean way to express the
idea of what the user is asking for in the API.
timabbott added a commit to timabbott/zulip that referenced this issue Jan 29, 2020
A wart that has long been present inin Zulip's get_messages API is how
to request "the latest messages" in the API.  Previously, the
recommendation was basically to pass anchor=10000000000000000 (for an
appropriately huge number). An accident of the server's implementation
meant that specific number of 0s was actually important to avoid a
buggy (or at least wasteful) value of found_newest=False if the query
had specified num_after=0 (since we didn't check).

This was the cause of the mobile issue
zulip/zulip-mobile#3654.

The solution is to allow passing a special value of anchor='newest',
basically a special string-type value that the server can interpret as
meaning the user precisely just wants the most recent messages.  We
also add an analogous anchor='oldest' or similar to avoid folks
needing to write a somewhat ugly anchor=0 for fetching the very first
messages.

We may want to also replace the use_first_unread_anchor argument to be
a "first_unread" value for the anchor parameter.

While it's not always ideal to make a value have a variable type like
this, in this case it seems like a really clean way to express the
idea of what the user is asking for in the API.
timabbott added a commit to zulip/zulip that referenced this issue Jan 29, 2020
A wart that has long been present inin Zulip's get_messages API is how
to request "the latest messages" in the API.  Previously, the
recommendation was basically to pass anchor=10000000000000000 (for an
appropriately huge number). An accident of the server's implementation
meant that specific number of 0s was actually important to avoid a
buggy (or at least wasteful) value of found_newest=False if the query
had specified num_after=0 (since we didn't check).

This was the cause of the mobile issue
zulip/zulip-mobile#3654.

The solution is to allow passing a special value of anchor='newest',
basically a special string-type value that the server can interpret as
meaning the user precisely just wants the most recent messages.  We
also add an analogous anchor='oldest' or similar to avoid folks
needing to write a somewhat ugly anchor=0 for fetching the very first
messages.

We may want to also replace the use_first_unread_anchor argument to be
a "first_unread" value for the anchor parameter.

While it's not always ideal to make a value have a variable type like
this, in this case it seems like a really clean way to express the
idea of what the user is asking for in the API.
@timabbott
Copy link
Member

The server-side change, zulip/zulip/#13747 is merged and deployed on chat.zulip.org. We went with oldest / newest / first_unread for the supported string parameter values for anchor.

Thanks for all the design work on this; that "giant message ID" business has since we first designed the GET /messages API the main thing I didn't like about it, and I'm very happy with where we've landed here.

As we discussed, to handle backwards-compatibility on mobile, you'll just want to use that special number for older servers, and then add a conditional for Zulip 2.2.x to use the new interface.

@chrisbobbe
Copy link
Contributor

Great! 🙂

and then add a conditional for Zulip 2.2.x to use the new interface.

We'll store the server version number on login and update it from a /server_settings request done along with every /register request, following #3839 (or something to that effect). 🙂

@gnprice
Copy link
Member

gnprice commented Jan 31, 2020

I think in fact the fix is simpler even than that: we should just start passing that special number, 10000000000000000 (that's 16 zeroes) when we want the latest, instead of the other giant number we use today.

Optionally in the future we can add a conditional to do something else for newer server versions. But the fix for this bug should not block on that.

(See also zulip/zulip#13747 (comment) where I said this earlier.)

chrisbobbe pushed a commit to chrisbobbe/zulip-mobile that referenced this issue Feb 6, 2020
Fixes: zulip#3654.

The server correctly interprets this special number as indicating
that we want the latest messages (0 is used to request the oldest
messages). Later, we can conditionalize on the server version to
support the updated API merged in
zulip/zulip#13747, that use the strings
'oldest' and 'newest' instead of 0 and 10000000000000000.
chrisbobbe pushed a commit to chrisbobbe/zulip-mobile that referenced this issue Feb 6, 2020
Fixes: zulip#3654.

The server correctly interprets this special number as indicating
that we want the latest messages (0 is used to request the oldest
unread messages). Later, we can conditionalize on the server version
to support the updated API merged in
zulip/zulip#13747, that use the strings
'oldest' and 'newest' instead of 0 and 10000000000000000.
@gnprice gnprice closed this as completed in afb320c Feb 7, 2020
Maskedman99 pushed a commit to Maskedman99/zulip-mobile that referenced this issue Feb 12, 2020
Fixes: zulip#3654.

The server correctly interprets this special number as indicating
that we want the latest messages.  The larger value we had been
using will certainly get the latest messages, because it's larger
than any possible message ID, but the server won't make that
inference and can incorrectly report `found_newest: false`.

Later, we can conditionalize on the server version to use the
updated API merged in zulip/zulip#13747,
which uses the string 'newest' instead of a special number.

[greg: revised commit message]
stableapple pushed a commit to stableapple/zulip that referenced this issue Mar 27, 2020
A wart that has long been present inin Zulip's get_messages API is how
to request "the latest messages" in the API.  Previously, the
recommendation was basically to pass anchor=10000000000000000 (for an
appropriately huge number). An accident of the server's implementation
meant that specific number of 0s was actually important to avoid a
buggy (or at least wasteful) value of found_newest=False if the query
had specified num_after=0 (since we didn't check).

This was the cause of the mobile issue
zulip/zulip-mobile#3654.

The solution is to allow passing a special value of anchor='newest',
basically a special string-type value that the server can interpret as
meaning the user precisely just wants the most recent messages.  We
also add an analogous anchor='oldest' or similar to avoid folks
needing to write a somewhat ugly anchor=0 for fetching the very first
messages.

We may want to also replace the use_first_unread_anchor argument to be
a "first_unread" value for the anchor parameter.

While it's not always ideal to make a value have a variable type like
this, in this case it seems like a really clean way to express the
idea of what the user is asking for in the API.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
a-data-sync Zulip's event system, event queues, staleness/liveness P1 high-priority
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants