
[BUG] Task stuck in active state forever #420

Open
mailbaoer opened this issue Mar 17, 2022 · 15 comments
Labels: bug Something isn't working

Comments

@mailbaoer

Describe the bug
Some of my tasks have no timeout set. After running for a long time, I found that some tasks are stuck in the running state. I tried to cancel them in the Web UI and the CLI, but they can't be canceled; the state changes back to running after canceling.
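
For reference, asynq lets you cap a task's runtime at enqueue time with the asynq.Timeout option; a minimal sketch (task type, payload, Redis address, and duration are all placeholders):

```go
// Minimal sketch: enqueue a task with an explicit timeout so its handler
// can't run forever. Task type, payload, and duration are placeholders.
package main

import (
	"log"
	"time"

	"github.com/hibiken/asynq"
)

func main() {
	client := asynq.NewClient(asynq.RedisClientOpt{Addr: "localhost:6379"})
	defer client.Close()

	task := asynq.NewTask("mytask", []byte(`{"id": 42}`))
	// The handler's context is canceled after 30 seconds.
	if _, err := client.Enqueue(task, asynq.Timeout(30*time.Second)); err != nil {
		log.Fatal(err)
	}
}
```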

To Reproduce
Steps to reproduce the behavior (Code snippets if applicable):
Sorry, I don't know how to reproduce this.


Screenshots
[screenshot-20220317-101327]


@mailbaoer mailbaoer added the bug Something isn't working label Mar 17, 2022
@hibiken
Owner

hibiken commented Mar 17, 2022

@mailbaoer Thank you for opening an issue!

Would you mind providing the version of the asynq package you are using? :)

@mailbaoer
Author

I'm using 0.22.1 now, but this bug may have existed before this version; I've seen it in other versions too, maybe as far back as 0.18 when I first used asynq.

@hibiken
Owner

hibiken commented Mar 17, 2022

I see.

We've made some improvements around orphaned task recovery in v0.22. If you are using the latest version of the Web UI (v0.6.0), you'll see the status of these tasks shown as "Orphaned". This happens when a worker starts working on a task but crashes before completing the processing.

If you run a server against the same queue, orphaned tasks are recovered automatically after some time period (i.e. after a few heartbeat misses). Once a task is orphaned, it is no longer cancelable (the latest Web UI disables the cancel button).
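
A minimal sketch of what "running a server against the same queue" can look like (Redis address, queue name, and handler body are placeholders):

```go
// Minimal sketch: a server processing the "default" queue, so orphaned
// tasks in it can be recovered after a few missed heartbeats.
package main

import (
	"context"
	"log"

	"github.com/hibiken/asynq"
)

func main() {
	srv := asynq.NewServer(
		asynq.RedisClientOpt{Addr: "localhost:6379"},
		asynq.Config{
			Concurrency: 10,
			Queues:      map[string]int{"default": 1},
		},
	)

	mux := asynq.NewServeMux()
	mux.HandleFunc("mytask", func(ctx context.Context, t *asynq.Task) error {
		// ... process the task ...
		return nil
	})

	if err := srv.Run(mux); err != nil {
		log.Fatal(err)
	}
}
```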

Follow up questions:

  • Are you running a server against the queue these tasks are in?
  • Would you mind running this redis command to see what's in the lease set? (ZRANGE asynq:{default}:lease 0 -1 WITHSCORES)

@mailbaoer
Author

  1. Yes, I started a worker service on my server that only runs tasks in the background, but the NewxxxTask calls run in another API service that adds the tasks.
  2. I've run this command, but it doesn't return anything.
     [screenshot-20220318-103721]

@hibiken
Owner

hibiken commented Mar 18, 2022

Ok, thanks for providing that info.

Would you mind running this command:

ZRANGE asynq:{default}:deadlines 0 -1 WITHSCORES

@mailbaoer
Author

[image]
All the tasks have stayed in the running state for many days; they are still running now, and I've upgraded asynqmon to 0.6.1.

[screenshot-20220318-120234]

@hibiken
Owner

hibiken commented Mar 18, 2022

That's very strange. I thought you'd have entries in either asynq:{default}:deadlines (used by v0.21.x or below) or asynq:{default}:lease (used by v0.22.x). These zsets are used to recover orphaned tasks in case of a worker crash, so the fact that there are no entries there suggests something unexpected happened.
I'll keep this bug open to see if others have encountered a similar issue and to gather more context.

Please let me know if you can reproduce this; I'd like to know how to reproduce this bug.


If you need to address this manually, you can get a list of "active" tasks and put their IDs back in the pending list (note: the IDs you see in the image above are just prefixes, so make sure to click into each row to get the full ID).
Once you have the IDs, you can (see the sketch after this list):

  1. delete them from the active list: LREM asynq:{default}:active 1 <task_id>
  2. put them back on the pending list: LPUSH asynq:{default}:pending <task_id>
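
A minimal sketch of those two steps using go-redis (assuming the full task IDs have already been collected; the Redis address and queue name are placeholders):

```go
// Minimal sketch: move stuck task IDs from the active list back to the
// pending list, mirroring the LREM/LPUSH steps above.
package main

import (
	"context"
	"log"

	"github.com/redis/go-redis/v9"
)

func main() {
	ctx := context.Background()
	rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"})

	taskIDs := []string{ /* full task IDs gathered from the Web UI */ }

	for _, id := range taskIDs {
		// 1. Remove the ID from the active list.
		if err := rdb.LRem(ctx, "asynq:{default}:active", 1, id).Err(); err != nil {
			log.Fatal(err)
		}
		// 2. Push it back onto the pending list.
		if err := rdb.LPush(ctx, "asynq:{default}:pending", id).Err(); err != nil {
			log.Fatal(err)
		}
	}
}
```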

@hibiken hibiken changed the title [BUG] Can't cancel task [BUG] Task stuck in active state forever Mar 18, 2022
@mailbaoer
Author

Thank you very much for your patience in answering. If I encounter this problem again, I will check to see if I can reproduce it.

@namhq1989

Any update? I ran into this bug too. asynq v0.22.1, redis v5.0.7

@hibiken
Owner

hibiken commented Mar 22, 2022

@namhq1989 Thanks for the comment. We're looking for a way to reproduce this.

For anyone who has experienced this bug, please provide the following:

  • If possible, the steps to reproduce this bug.
  • Asynq version
  • Output of the following command, replacing <qname> with your queue name (e.g. asynq:{default}:lease); a programmatic version is sketched after this list:
    • (for asynq v0.22.x or above): ZRANGE asynq:{<qname>}:lease 0 -1 WITHSCORES
    • (for asynq v0.21.x or below): ZRANGE asynq:{<qname>}:deadlines 0 -1 WITHSCORES
  • Whether the IDs of the orphaned tasks are in the output above
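
A minimal sketch of that diagnostic using go-redis (the key name depends on your asynq version; the Redis address and queue name are placeholders):

```go
// Minimal sketch: dump the lease/deadlines zset for a queue, equivalent
// to the ZRANGE ... WITHSCORES command above.
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/redis/go-redis/v9"
)

func main() {
	ctx := context.Background()
	rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"})

	// Use "asynq:{default}:deadlines" instead for asynq v0.21.x or below.
	entries, err := rdb.ZRangeWithScores(ctx, "asynq:{default}:lease", 0, -1).Result()
	if err != nil {
		log.Fatal(err)
	}
	for _, z := range entries {
		fmt.Printf("task %v expires at unix %v\n", z.Member, int64(z.Score))
	}
}
```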

@dokudoki

dokudoki commented Mar 31, 2022

  • I think I ran into this issue when the worker threw a "fatal error" and exited.
  • Asynq version v0.21.0
 1) "4d841b6c-1de5-4a70-9233-db16a2493831"
 2) "1648764430"
 3) "af985397-6ee3-4f4d-825d-e926a0a6d1cd"
 4) "1648764430"
 5) "10dff61b-9f5c-4d30-a2f6-42cdb9a05f91"
 6) "1648764440"
 7) "45a12b75-0b83-4e2b-bc7c-93357207ed36"
 8) "1648764440"
 9) "d40431d2-e634-4922-acd0-ab435ec6ac47"
10) "1648764440"
11) "2117339c-db7a-4790-a255-7c9ef0a19b74"
12) "1648765083"
13) "5c1fb99c-60ec-49b1-b7d1-04d953c9d9f0"
14) "1648765083"
15) "932ad61e-d2c1-4f7f-8719-9af9db04f37a"
16) "1648765882"
17) "e1ae65d7-978c-4ac0-a152-3312e458df5e"
18) "1648765882"

@piperck

piperck commented May 4, 2022

I ran into this too. If it happens again, I will try to investigate it.

@paveljanda

Hi, we've never had this problem, as we've never used asynq before. But if anyone could describe how to reproduce this bug and it gets fixed, that would help us choose the right distributed task queue for our projects. We're currently choosing from ~5 contenders.

Thanks a lot to everyone!

@zijiwork

I encountered this problem with the following versions; the task stays in the active state after the worker restarts, and it cannot be canceled.

asynq v0.19.0
asynqmon v0.4.0
redis_version 5.0.4
luban:0>ZRANGE asynq:{critical}:deadlines 0 -1 WITHSCORES
1) "bab56af5-f43e-4192-a5f3-b7eb03246b8b"
2) "1657874411"
3) "f559737f-f729-4716-bfc5-ed0fe6a720a4"
4) "1657874530"
5) "3b839bf5-9ba8-43c9-bc21-906a925e945e"
6) "1657875931"
7) "77147b6f-095d-47d6-a2ce-35d3ad3d80e2"
8) "1657875957"

@KrokoYR

KrokoYR commented Oct 8, 2024

We ran into this kind of issue accidentally. What we did:

  • deployed a new queue service (let's call it NEW) with this setup: redis.Options{Addr: "{addr}", DB: 10}
  • an already existing service (let's call it OLD) has the following setup: redis.Options{Addr: "{addr}", DB: 0}
  • both services have a queue with the same name, say some_queue

This led to the two services "stealing" each other's tasks. Our mistake was that we didn't pass the DB number into asynq.RedisClientOpt. Maybe it will help someone; a sketch of the corrected setup follows.
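
A minimal sketch of that fix: give asynq the same DB number that the service's redis.Options uses, so two services on one Redis instance stay isolated (addresses, DB numbers, and the queue name are placeholders):

```go
// Minimal sketch: the NEW service keeps both its asynq client and server
// on DB 10, so it no longer shares queue keys with OLD on DB 0.
package main

import "github.com/hibiken/asynq"

func main() {
	redisOpt := asynq.RedisClientOpt{Addr: "localhost:6379", DB: 10}

	client := asynq.NewClient(redisOpt)
	defer client.Close()

	srv := asynq.NewServer(redisOpt, asynq.Config{
		Queues: map[string]int{"some_queue": 1},
	})
	_ = srv // srv.Run(mux) would then process "some_queue" on DB 10 only.
}
```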
