Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PMs screen doesn't show many conversations #3133

Closed
gnprice opened this issue Nov 10, 2018 · 14 comments · Fixed by #4392
Closed

PMs screen doesn't show many conversations #3133

gnprice opened this issue Nov 10, 2018 · 14 comments · Fixed by #4392
Assignees
Labels
a-data-sync Zulip's event system, event queues, staleness/liveness P1 high-priority

Comments

@gnprice
Copy link
Member

gnprice commented Nov 10, 2018

The PMs screen shows the PM conversations we've been in recently, but "recently" may be as few as the last 100 PM messages. For a user that heavily uses PMs, this can leave out conversations even from a day or two ago.

One user reported an example as a typical experience: they're looking for a specific conversation, and the last message in it was 3pm yesterday. It's 3:30pm at the time they look, just over 24hr later. It doesn't show up on the PMs screen, though 6 other conversations do.

When there's plenty of blank space on the screen, this looks wrong and flaky/broken. It also makes it more annoying to get to the desired conversation.

In particular, for a group PM conversation, it can look like there's no way to do that at all. (The existing answer is the "Create group" button, which is highly misleading; see #2694.)

@timabbott
Copy link
Member

With in part this issue in mind, the Zulip backend as of Zulip 1.9 added a nice database index for PMs that a given user has received; so we can very quickly do queries on that data.

The really dumb version that could be done with the API today would be to do a GET /messages is:private narrow query, but that would be a lot of data over the network. Probably what we should actually do is write a function similar to the "more topics" function that takes advantage of this index to get you e.g. the (PM recipient User ID list, last_message_id) tuples covering e.g. the last 1K PMs you've had, which probably is a <10ms query on the database end, and either include the data in /register or a separate endpoint (depending what's better for mobile's architecture); I expect it'd be <1hr work for me to add backend support for this if we had confirmation of what format the mobile app wanted. And the app should be able to maintain that data set after /register as new messages come in, since it's basically just "look up the user_id_list, and bump the last_message_id".

(Until the database index work in 1.9, this would have been prohibitively expensive, since we'd need to scan all messages to find enough PMs, which for low-PM users in an organization with a lot of streams traffic could have easily involved the database needing to look at 50K+ messages of history and thus adding 100sms of latency).

I don't think this data set would need unreads data, since the client should have that and be able to splice it in directly and it's probably cleaner to have one source of truth there, but it wouldn't be hard to include.

@borisyankov
Copy link
Contributor

Yeah, indeed the mobile app does not need the unread data. One of the bug that @rishig encountered was that he knew there were unread PMs (correctly) but couldn't see them because of the 100 messages limit.

It would be great to return the first message's raw content in the response for each PM entry so we can include a 'conversation preview' in the list like most other apps do.

@timabbott
Copy link
Member

OK; the message raw content feature would be some extra work (it will probably require an extra DB query and thus have a small perf impact) but not too much. That probably suggests we should use a dict data structure, not a tuple, e.g.

[{
    user_ids: [1, 2, 3],
    last_message_id: 23451,
    ...
}]

In any case, I'd probably want to start with a v1 that just provides the data needed for doing the current UI correctly, and then extend from there.

@gnprice
Copy link
Member Author

gnprice commented Feb 15, 2019

In any case, I'd probably want to start with a v1 that just provides the data needed for doing the current UI correctly, and then extend from there.

@timabbott This sounds good to me! And that format looks good.

I think including the data in /register is best, so that it's all part of the consistent, live-updating snapshot of data.

One optimization: I think we probably want a bound on the number of conversations represented, as well as the number of PMs. E.g., the last <=20 conversations found in the last <=1000 (as you suggested) PMs. That just saves some network traffic, and client-side processing, in extreme cases where the user has tons of different short PM conversations. (With SQL cleverness, perhaps the savings can even extend all the way to the database, but that isn't essential.)

What's a good next step -- should we open an issue on the server repo?

@timabbott
Copy link
Member

Yes, that's a good next step. I think the query we'll want is roughly UserMessage.objects.filter(user_profile=user_profile, flags=UserMessage.flags.is_private).order_by("-id")[0:1000] will get the conversations we want, and then we just need to format the data.

@timabbott
Copy link
Member

I opened the serve issue for this as zulip/zulip#11944

@timabbott
Copy link
Member

I spent this afternoon/evening building the backend that the mobile app will need for this in zulip/zulip#11946. I want to add some unit tests and get some opinions on the "recent_conversations" name, since I'm worried it's ambiguous and should instead be something that explicitly mentions private messages.

In any case, I've test-deployed that PR on chat.zulip.org, so you should be able to start fetching these data by passing the "recent_conversations" event type in the app's list of requested event types.

The data structure for state['recent_conversations'] is of this format:

[{'max_message_id': 700175, 'recipient_id': 955, 'user_ids': [801]},
 {'max_message_id': 703206, 'recipient_id': 5734, 'user_ids': [5140]},
 ...
}]

Each row has a 3 attributes:

  • The list of other user IDs in the conversation. We never list the user's own user ID.
  • The highest message ID in the conversation, which you can use for sorting. (I think the server's initial data will be sorted by this key)
  • A recipient_id, which is a unique key for the conversation. It's possible we shouldn't send this key at all, because it is potentially confusing. For a group PM, it's the recipient ID for the group PM (which you may not have access to in a client?); for a 1:1 PM, it's the 1:1 recipient ID of the other user (regardless of who sent the message). We need it internally for calculation but could easily drop it in these data if it's not useful to clients.

After fetching, clients will need to correctly maintain these data structures as new messages are sent or deleted (Though for deleted, probably you can just not worry about it too much? Unclear).

@gnprice
Copy link
Member Author

gnprice commented Mar 21, 2019

I spent this afternoon/evening building the backend that the mobile app will need for this

Awesome, thanks!

I want to add some unit tests and get some opinions on the "recent_conversations" name, since I'm worried it's ambiguous and should instead be something that explicitly mentions private messages.

I agree -- recent_pm_conversations would be much better. Or possibly recent_private_conversations.

In general I like using "conversation" to mean basically a thread, or the smallest unit it makes sense to narrow to and reply to: it's either

  • a stream + topic; or
  • a group PM conversation, aka a huddle; or
  • a 1:1 PM conversation.

The list of other user IDs in the conversation. We never list the user's own user ID.

Perhaps it'd be helpful to call it other_user_ids? In general I've found it a common source of small confusion whether we're looking at the list of all user IDs or just the ones excluding this user; starting to be explicit might help. ... Then again if we don't migrate existing uses (which it's doubtful we would), it may mostly just add confusion by suggesting those might be inclusive. Ah well.

Separately: how does this behave for the user's PM thread with themself?

A recipient_id, which is a unique key for the conversation. It's possible we shouldn't send this key at all, because it is potentially confusing.

Hmm -- yeah, I think the server "recipient" concept may not be exposed in the API at all right now. So adding one place that refers to it is potentially confusing.

Do these correspond bijectively to values of user_ids? If so, I'd rather avoid duplication; and in the absence of some other data structure to map "recipients" to sets of user IDs, that means keep user_ids.

Of if they don't correspond bijectively with user_ids, what are the cases where they don't?

After fetching, clients will need to correctly maintain these data structures as new messages are sent or deleted (Though for deleted, probably you can just not worry about it too much? Unclear).

Hmm -- yeah, in fact we can't in general maintain them exactly on deletion. If the message that was latest is deleted, we don't know what the next-latest would be.

I think it's OK for us to just say that the semantics of our data structure are slightly relaxed: max_message_id is actually an upper bound on the message IDs, one that's "close enough" that it's suitable for use in sorting the conversations by recency.

@timabbott
Copy link
Member

timabbott commented Mar 21, 2019

Separately: how does this behave for the user's PM thread with themself?

I haven't tested this with the current code, but if I had to guess, it'll include that recipient_id if there are recent messages, and have an empty list of user_ids? I think including it would probably be the correct behavior, certainly (client views could choose to filter it, but one can imagine reasonable client views that display it).

Do these correspond bijectively to values of user_ids? If so, I'd rather avoid duplication; and in the absence of some other data structure to map "recipients" to sets of user IDs, that means keep user_ids.
If if they don't correspond bijectively with user_ids, what are the cases where they don't?

Yes, it's bijective with sets of user_ids as long as we're in a PM context (for a stream message, it's bijective with stream IDs). It's useful for server internals to have them in the intermediate form we use to make apply_events fast and understandable, so it's just a question of whether we include or drop that field in the final post-processing step that turns a dict mapping recipient_id to the other data before sending it down to clients.

I think it's OK for us to just say that the semantics of our data structure are slightly relaxed: max_message_id is actually an upper bound on the message IDs, one that's "close enough" that it's suitable for use in sorting the conversations by recency.

Yeah, that's what I had in mind as well.

So to summarize, we should:

  • Drop the recipient_id key from the data structure (added as a comment on the PR)
  • Rename the structure to recent_private_conversations (added as a comment on the PR)
  • Decide what we're doing with self-PMs.

@gnprice
Copy link
Member Author

gnprice commented Mar 23, 2019

I haven't tested this with the current code, but if I had to guess, it'll include that recipient_id if there are recent messages, and have an empty list of user_ids? I think including it would probably be the correct behavior, certainly (client views could choose to filter it, but one can imagine reasonable client views that display it).

Yeah, that seems like the right behavior to me.

It'd be good to pin down what it in fact does. :-)

In particular I think we've had bits of hairy logic where for the self-PM thread, in some interface or data structure, user_ids is not empty -- instead it contains the user's own ID -- and client-side app code dealing with that gets messy. That in fact seems like an easy result to end up with by accident given the way PMs are implemented.

Yes, it's bijective with sets of user_ids as long as we're in a PM context (for a stream message, it's bijective with stream IDs).

Cool, sgtm then.

@timabbott
Copy link
Member

I fixed the couple naming/structure changes we had planned for zulip/zulip#11946. Also tested self-PMs: the current behavior of the backend PR for self-PMs is that it will never return them. I think I know how to fix this; I'll make a quick attempt at that and then go from there.

@timabbott
Copy link
Member

OK, I've successfully implemented self-PMs as giving [] as the user IDs list, and added a test for that behavior.

@timabbott
Copy link
Member

And merged zulip/zulip#11946, so this should now be ready for mobile app side implementation to take advantage of it; you just need the recent_private_conversations key in what you request from /register.

@gnprice while doing this I added a comment in the fetch_initial_state_data part of zulip/zulip@4c3c669 describing the format of this structure. Would consistent use of comments like that be helpful as a next step for documenting /register?

@gnprice
Copy link
Member Author

gnprice commented Apr 11, 2020

I've just gone back through Isham's PR #3535 from mid-last year, and rebased it and revised to address most of the comments I had when I last reviewed it.

At this point I think the main thing it needs is to take care that we're correctly handling the different forms of ways we identify a PM thread, in our codebase and in the Zulip server API. That's been an ongoing background need; I've just filed #4035 to describe it specifically.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
a-data-sync Zulip's event system, event queues, staleness/liveness P1 high-priority
Projects
None yet
5 participants