PMs from new sources not immediately displayed in PMs list #3654

rk-for-zulip · 2019-10-18T01:02:34Z

When a user receives a PM from a new source (i.e., one that had not previously sent them a PM), the center icon will correctly display a notification for a new message... but when tapped, that conversation-source does not appear in the PMs list; it remains unrefreshed.

(Force-stopping and reentering the app will also force a refresh.)

gnprice · 2019-10-18T01:36:56Z

Yikes, that's a pretty bad symptom!

Would definitely be good to fix. The bug will very probably be in one of the "reducer" files maintaining the data structures that PM conversation list relies on.

chrisbobbe · 2020-01-11T02:22:47Z

Aha! The 'narrows' state is not being updated to reflect the new messages, when it should.

Here's why: When a PM arrives and an EVENT_NEW_MESSAGE action is dispatched, action.caughtUp[ALL_PRIVATE_NARROW_STR].newer is used by the narrowsReducer (in the eventNewMessage function) to decide whether to update the state. action.caughtUp[ALL_PRIVATE_NARROW_STR].newer is just taken from the previous state, seen in the 'message' case in eventToAction.js. If the user has seen the most recent message, i.e., if action.caughtUp[ALL_PRIVATE_NARROW_STR].newer is true for that narrow, the state updates. If not, then the user is scrolling through old messages, probably from a link to some ancient history, and it would be confusing to append new messages directly after those messages, so the state does not update:

(src/chat/narrowsReducer.js, in eventNewMessage)

    const isInNarrow = isMessageInNarrow(action.message, JSON.parse(key), action.ownEmail);
    const isCaughtUp = action.caughtUp[key] && action.caughtUp[key].newer;
    const messageDoesNotExist = state[key].find(id => action.message.id === id) === undefined;

    if (isInNarrow && isCaughtUp && messageDoesNotExist) {
      stateChange = true;
      newState[key] = [...state[key], action.message.id];
    } else {
      newState[key] = state[key];
    }

But action.caughtUp[ALL_PRIVATE_NARROW_STR].newer is always false, despite the fact that (I believe) it should actually be always true. There's an initial fetch specifically of private messages that should be getting all of the most recent ones, and should be changing that .newer to true. (/messages is called from fetchPrivateMessages which is dispatched during doInitialFetch in src/message/fetchActions.js.) But the response to that /messages call has false for found_newest, so perhaps there's a server-side bug, or something wrong with the /messages request.

Further investigation will almost certainly involve #3133 (currently open as P1 from November 2018); I just wanted to jot this finding down before I forget.

gnprice · 2020-01-13T21:24:19Z

Thanks @chrisbobbe for the investigation and that writeup!

There's an initial fetch specifically of private messages that should be getting all of the most recent ones, [... b]ut the response to that /messages call has false for found_newest, so perhaps there's a server-side bug, or something wrong with the /messages request.

Yeah, this sounds like a point where something's already gone wrong, then.

The code that makes that request (fetchPrivateMessages which is dispatched during doInitialFetch in src/message/fetchActions.js, as you said) looks like this:

const fetchPrivateMessages = () => async (dispatch: Dispatch, getState: GetState) => {
  const auth = getAuth(getState());
  const { messages, found_newest, found_oldest } = await tryUntilSuccessful(() =>
    api.getMessages(auth, ALL_PRIVATE_NARROW, LAST_MESSAGE_ANCHOR, 100, 0),

where

export const LAST_MESSAGE_ANCHOR = Number.MAX_SAFE_INTEGER;

and that api.getMessages binding looks like

export default async (
  auth: Auth,
  narrow: Narrow,
  anchor: number,
  numBefore: number,
  numAfter: number,
  useFirstUnread: boolean = false,
): Promise<ApiResponseMessages> => {
  const response: ServerApiResponseMessages = await apiGet(auth, 'messages', {
    narrow: JSON.stringify(narrow),
    anchor,
    num_before: numBefore,
    num_after: numAfter,
    apply_markdown: true,
    use_first_unread_anchor: useFirstUnread,
  });

It sure looks to me like that ought to result in found_newest being true -- we're asking specifically for the very latest messages. Perhaps there's a subtlety in the API here, or perhaps just a server-side bug. @timabbott , do you have any ideas on why a request like this would produce found_newest false?

rk-for-zulip · 2020-01-13T21:49:49Z

This may be due to our use of stringified JSON objects as keys in state.narrows – unless someone's imposed new restrictions while I wasn't looking, the JSON.stringify of two structurally identical objects is not guaranteed to be identical. (I'm not even sure two consecutive serializations of the same object are guaranteed to be identical.)

timabbott · 2020-01-13T23:16:09Z

I'd certainly expect that the result for any query of that form to have found_newest=true; if it's not, I can investigate. It'd be pretty surprising if it didn't; let me know if you see that behavior through logging.

chrisbobbe · 2020-01-14T00:39:16Z

Hmm, so with some quick logging on the server (please let me know if there's a better way to do this or you need something more specific) --

(zerver/views/messages.py, get_messages_backend)

-- I see that it's showing False:

timabbott · 2020-01-14T00:56:41Z

Interesting! I think what's going on is a consequence of your requesting 0 messages as num_after, and so the server didn't check whether indeed there are any messages with a newer message ID, and thus cannot prove that it had fetched the latest messages.

Writing more details now.

timabbott · 2020-01-14T01:18:54Z

For now, I think against legacy Zulip servers with the current behavior, while the mobile app is using the specific type of query it's using, it should assume found_newest=true if it set a max_int style anchor value, because the current behavior is at best confusing, and maybe just wrong.

Here's the relevant code:

    # In a function just for setting the anchor when use_first_unread_anchor=True
    if len(first_unread_result) > 0:                                                                 
        anchor = first_unread_result[0][0]                                                           
    else:                                                                                            
        anchor = LARGER_THAN_MAX_MESSAGE_ID                                                          
    ...
    anchored_to_right = (anchor == LARGER_THAN_MAX_MESSAGE_ID)                                       
    ...
    found_newest = anchored_to_right or (len(after_rows) < num_after)

The logic is as follows, roughly:

We set found_newest using found_newest = anchored_to_right or (len(after_rows) < num_after). The second case is if we actually fetched messages for an "num_after>0` query; the first case is intended to cover the case that we're doing a max_int style anchor and didn't try to look for messages newer than the provided anchor.
The code for setting anchored_to_right is a bit hacky if use_first_unread_anchor=true. If we found at least 1 unread message, it assumes found_newest=False (since there could be more messages newer than the first unread message, and isn't willing to certify that). If there are none, it hackily sets anchor to LARGER_THAN_MAX_MESSAGE_ID = 10000000000000000 (which when written was intended to be a purely internal optimization within that code path).
The webapp, perhaps critically for this not having been noticed the server's behavior is bad, also uses 10000000000000000 as its message ID for !use_first_unread_anchor queries with num_after=0 (the webapp code to do that was added much after; not sure if this was intentional; the JS code was certainly added much later than the server code). So the webapp is as of November's merge of 452e226ea25c1315edb1f31be406475d82c56a88 accidentally relying on this weird behavior with that specific number, I think.

I haven't had time to think through this deeply but here are a few thoughts:

From a specification perspective, found_newest=False was meant to mean that the client needs to fetch from the relevant narrow again with num_after>0 in order to get the latest messages. I don't think we thought about defining what it should mean when you didn't ask for any newer messages. In theory, there could have been some with a higher message ID than the number passed, and the server didn't check.
I suspect the mobile app could switch to using that 10000000000000000 value as its anchor and have the problem go away; we should test that. This might be the best option for backward-compatibility with older servers giving us freedom to fix the server. And then maybe we should document that number as such if indeed it is meant to be.
For the server, I think there's a few options for how we could proceed.
- I don't really want to require that the server do an extra database query by setting num_after=1 to check if there are any newer messages, but in theory it could do that for correctness' sake. There's unfortunately not a super cheap way to check if a value is above the maximum message ID in the database. But we could potentially declare that message IDs being less than 10000000000000000 is a documented invariant that clients can rely on.
- We could consider having the server not send a found_newest at all when an anchor is specific and num_after=0 to be clear that the server didn't check.
- We might want to use a >= check rather than an equal check if we're using this 10000000000000000 number at all.

chrisbobbe · 2020-01-16T22:48:25Z

Thanks for the writeup and suggestions, @timabbott! I haven't forgotten about this, but I'd like to check in with @gnprice about it next week when he's in the office.

gnprice · 2020-01-28T01:34:42Z

Thanks @timabbott for the investigation!

For now, I think against legacy Zulip servers with the current behavior, while the mobile app is using the specific type of query it's using, it should assume found_newest=true if it set a max_int style anchor value, because the current behavior is at best confusing, and maybe just wrong.

This sounds like a good client-side fix. I think what this comes out to is that whenever we use LAST_MESSAGE_ANCHOR as the anchor, we should ignore the found_newest the server returns and instead use true there.

(And when further we're passing use_first_unread_anchor: false? That should always be the case when using LAST_MESSAGE_ANCHOR anyway.)

... Oh aha, but:

I suspect the mobile app could switch to using that 10000000000000000 value as its anchor and have the problem go away; we should test that. This might be the best option for backward-compatibility with older servers giving us freedom to fix the server.

That sounds like a simpler workaround 🙂 We'd coax the server into giving the right answer in the first place, rather than fix up the answer with our own logic. I think this'd be just a one-line change to the LAST_MESSAGE_ANCHOR definition.

For the server-side fix:

I think there definitely ought to be a way to express a query "give me the very latest messages, whatever the message IDs might be." That's a very natural query to make.
- It seems like both the webapp and mobile app have hacked this by passing a giant message ID that's assumed to be larger than any real one.
- But better yet would be a way to express that that's explicitly understood by both client and server. This is stronger than e.g. "message IDs being less than 10000000000000000 is a documented invariant that clients can rely on" -- it'd mean the client can rely on the server relying on that fact.
- In particular, the server is giving wrong answers here for found_newest. It's doing that because, as you say, to directly check in the DB whether there's a later message isn't free. But when the point of the query is "give me the very latest messages", no DB query is needed -- the server just needs a way to be able to rely on the fact that's the point, including in the bit of code that's computing found_newest.
One way to do that would be to declare that the value 10000000000000000 means that.
- Better, all values at least that, as you say.
Another would be to pick some other value, perhaps a string constant like 'latest' (so anchor: 'latest'), as a way of expressing that.
- This feels cleaner in some ways; it's more explicit. The type is then number | 'latest'.
- We'd still use 10000000000000000 in the app for now for compatibility, but a comment would point to the update so in a few years it's easy to say "oh, Zulip 2.2, that's ancient, we can use the explicit form now."
  - For that matter, we could use number | 'latest' already when passing anchors around our own code, and just convert to the special number at the edge in the API binding.

timabbott · 2020-01-28T02:27:32Z

Yeah, anchor: latest seems like a reasonable way to declare that interface; it seems a lot cleaner than adding a new parameter or something.

A wart that has long been present inin Zulip's get_messages API is how to request "the latest messages" in the API. Previously, the recommendation was basically to pass anchor=10000000000000000 (for an appropriately huge number). An accident of the server's implementation meant that specific number of 0s was actually important to avoid a buggy (or at least wasteful) value of found_newest=False if the query had specified num_after=0 (since we didn't check). This was the cause of the mobile issue zulip/zulip-mobile#3654. The solution is to allow passing a special value of anchor='newest', basically a special string-type value that the server can interpret as meaning the user precisely just wants the most recent messages. We also add an analogous anchor='oldest' or similar to avoid folks needing to write a somewhat ugly anchor=0 for fetching the very first messages. We may want to also replace the use_first_unread_anchor argument to be a "first_unread" value for the anchor parameter. While it's not always ideal to make a value have a variable type like this, in this case it seems like a really clean way to express the idea of what the user is asking for in the API.

timabbott · 2020-01-29T20:23:44Z

The server-side change, zulip/zulip/#13747 is merged and deployed on chat.zulip.org. We went with oldest / newest / first_unread for the supported string parameter values for anchor.

Thanks for all the design work on this; that "giant message ID" business has since we first designed the GET /messages API the main thing I didn't like about it, and I'm very happy with where we've landed here.

As we discussed, to handle backwards-compatibility on mobile, you'll just want to use that special number for older servers, and then add a conditional for Zulip 2.2.x to use the new interface.

chrisbobbe · 2020-01-29T20:30:35Z

Great! 🙂

and then add a conditional for Zulip 2.2.x to use the new interface.

We'll store the server version number on login and update it from a /server_settings request done along with every /register request, following #3839 (or something to that effect). 🙂

gnprice · 2020-01-31T22:26:26Z

I think in fact the fix is simpler even than that: we should just start passing that special number, 10000000000000000 (that's 16 zeroes) when we want the latest, instead of the other giant number we use today.

Optionally in the future we can add a conditional to do something else for newer server versions. But the fix for this bug should not block on that.

(See also zulip/zulip#13747 (comment) where I said this earlier.)

Fixes: zulip#3654. The server correctly interprets this special number as indicating that we want the latest messages (0 is used to request the oldest messages). Later, we can conditionalize on the server version to support the updated API merged in zulip/zulip#13747, that use the strings 'oldest' and 'newest' instead of 0 and 10000000000000000.

Fixes: zulip#3654. The server correctly interprets this special number as indicating that we want the latest messages (0 is used to request the oldest unread messages). Later, we can conditionalize on the server version to support the updated API merged in zulip/zulip#13747, that use the strings 'oldest' and 'newest' instead of 0 and 10000000000000000.

Fixes: zulip#3654. The server correctly interprets this special number as indicating that we want the latest messages. The larger value we had been using will certainly get the latest messages, because it's larger than any possible message ID, but the server won't make that inference and can incorrectly report `found_newest: false`. Later, we can conditionalize on the server version to use the updated API merged in zulip/zulip#13747, which uses the string 'newest' instead of a special number. [greg: revised commit message]

A wart that has long been present inin Zulip's get_messages API is how to request "the latest messages" in the API. Previously, the recommendation was basically to pass anchor=10000000000000000 (for an appropriately huge number). An accident of the server's implementation meant that specific number of 0s was actually important to avoid a buggy (or at least wasteful) value of found_newest=False if the query had specified num_after=0 (since we didn't check). This was the cause of the mobile issue zulip/zulip-mobile#3654. The solution is to allow passing a special value of anchor='newest', basically a special string-type value that the server can interpret as meaning the user precisely just wants the most recent messages. We also add an analogous anchor='oldest' or similar to avoid folks needing to write a somewhat ugly anchor=0 for fetching the very first messages. We may want to also replace the use_first_unread_anchor argument to be a "first_unread" value for the anchor parameter. While it's not always ideal to make a value have a variable type like this, in this case it seems like a really clean way to express the idea of what the user is asking for in the API.

gnprice added a-data-sync Zulip's event system, event queues, staleness/liveness P1 high-priority labels Oct 18, 2019

chrisbobbe self-assigned this Jan 10, 2020

chrisbobbe mentioned this issue Jan 13, 2020

internal: private messages can exist in state.narrows, but not in state.messages #3749

Open

rk-for-zulip mentioned this issue Jan 14, 2020

People tab not updated on PMing someone for the first time #3799

Closed

timabbott mentioned this issue Jan 28, 2020

Provide a better API for anchor=10000000000 zulip/zulip#13747

Closed

This was referenced Feb 1, 2020

Sometimes the app doesn't pick up messages #1465

Closed

New private message/group does not appear after making a new message group #3460

Closed

chrisbobbe mentioned this issue Feb 6, 2020

Set LAST_MESSAGE_ANCHOR to 10000000000000000 (sixteen zeroes). #3877

Merged

gnprice closed this as completed in afb320c Feb 7, 2020

chrisbobbe mentioned this issue Feb 20, 2020

Messages disappear on send #3117

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

PMs from new sources not immediately displayed in PMs list #3654

PMs from new sources not immediately displayed in PMs list #3654

rk-for-zulip commented Oct 18, 2019

gnprice commented Oct 18, 2019

chrisbobbe commented Jan 11, 2020 •

edited

Loading

gnprice commented Jan 13, 2020

rk-for-zulip commented Jan 13, 2020

timabbott commented Jan 13, 2020 •

edited

Loading

chrisbobbe commented Jan 14, 2020 •

edited

Loading

timabbott commented Jan 14, 2020

timabbott commented Jan 14, 2020 •

edited

Loading

chrisbobbe commented Jan 16, 2020

gnprice commented Jan 28, 2020

timabbott commented Jan 28, 2020

timabbott commented Jan 29, 2020

chrisbobbe commented Jan 29, 2020

gnprice commented Jan 31, 2020 •

edited

Loading

PMs from new sources not immediately displayed in PMs list #3654

PMs from new sources not immediately displayed in PMs list #3654

Comments

rk-for-zulip commented Oct 18, 2019

gnprice commented Oct 18, 2019

chrisbobbe commented Jan 11, 2020 • edited Loading

gnprice commented Jan 13, 2020

rk-for-zulip commented Jan 13, 2020

timabbott commented Jan 13, 2020 • edited Loading

chrisbobbe commented Jan 14, 2020 • edited Loading

timabbott commented Jan 14, 2020

timabbott commented Jan 14, 2020 • edited Loading

chrisbobbe commented Jan 16, 2020

gnprice commented Jan 28, 2020

timabbott commented Jan 28, 2020

timabbott commented Jan 29, 2020

chrisbobbe commented Jan 29, 2020

gnprice commented Jan 31, 2020 • edited Loading

chrisbobbe commented Jan 11, 2020 •

edited

Loading

timabbott commented Jan 13, 2020 •

edited

Loading

chrisbobbe commented Jan 14, 2020 •

edited

Loading

timabbott commented Jan 14, 2020 •

edited

Loading

gnprice commented Jan 31, 2020 •

edited

Loading