Skip to content

fix: initial app status and zero-based id #18622

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

code-asher
Copy link
Member

@code-asher code-asher commented Jun 26, 2025

We are discarding all "working" updates from the screen watcher because we cannot tell the difference between the agent or user changing the screen, but it makes sense to accept it as the very first update, because the agent could be working but neglected to report that fact, so you would never get an initial "working" update (it would just eventually go straight to "idle").

Also messages can start at zero, so I made a fix for that as well, although I think the first message will be from the LLM and we ignore those anyway, so this probably has no actual effect, but seems more technically correct.

And it seems I forgot to actually update the last message ID, which I think also does not actually matter for user messages (since I think the SSE endpoint will not re-emit a user message it has already emitted), but seems more technically correct to check.

I still need to fix coder/internal#713 but I will do that in a separate PR. Going to do the one second wait thing we talked about for updates from the watcher.

We are discarding all "working" updates from the screen watcher that
does not include a new user message, but it makes sense to accept the
very first message, because the agent could be working but neglected to
report that fact.
@code-asher code-asher force-pushed the asher/fix-initial-status branch 2 times, most recently from 62b94b7 to 7454e49 Compare June 26, 2025 21:03
Two things:

1. Message IDs can be zero-based.
2. We were not actually updating the last user message ID.

I refactored a little to keep track of the last message in lastReport.
@code-asher code-asher force-pushed the asher/fix-initial-status branch from 7454e49 to 9cef0df Compare June 26, 2025 21:23
@code-asher code-asher requested a review from hugodutka June 26, 2025 22:58
@hugodutka
Copy link
Contributor

On second glance, there seems to be a problem when an AI agent reports a completed status but continues working on the task without providing a new working status. Our web UI would render an idle status until the next user message, even though it's apparent the agent is still working. I think we should treat the agent as completely unreliable and only trust it for summaries. Whenever we get an MCP update, we should report a "working" status. We'd consider the agent idle only once AgentAPI reports that the screen is stable.

@code-asher
Copy link
Member Author

Oooo yeah that makes sense. I will make that change.

@code-asher code-asher requested a review from hugodutka June 27, 2025 20:43
@code-asher
Copy link
Member Author

Simple change, but re-requested review just in case I did something brain-dead 😄

@code-asher
Copy link
Member Author

Ah wait need to update tests 🤦

@code-asher code-asher force-pushed the asher/fix-initial-status branch from 24bd249 to d1a09cb Compare June 27, 2025 21:09
@code-asher
Copy link
Member Author

code-asher commented Jun 27, 2025

lol with this change the PR now fixes coder/internal#713 (in the sense that it no longer triggers the race) but I still plan to fix the race and add a new test that can reproduce the race

Before, I was relying on the agent updating to "failure", using it as a sentinel value to make sure that the previous update was skipped, but that is no longer possible (and it was unreliable anyway due to the race). I think maybe I might have to mock the queue itself to properly test but open to ideas if you have any.

Another idea is that if AgentAPI could report a failure state, I could use that as a sentinel value in the test. Or if it could also send a summary, I could use that. Any kind of third thing I could send to prove the update was skipped.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

flake: TestExpMcpReporter
2 participants