
Fixes Request.json() Hangs Forever #944

Closed
wants to merge 28 commits

Conversation

nikordaris

@nikordaris nikordaris commented May 12, 2020

This fixes #847 by using the Request scope to store consumed stream data, so that new instances of Request can access the Request stream content. Previously, await Request(...).json() followed by another await Request(...).json() would hang forever, because the receive stream was exhausted and await receive() does not time out after the message {"more_body": False} is received. This PR is focused only on enabling Request.body(), Request.form(), and Request.json() to return the cached data as originally intended; it does not address the issue of await receive() hanging forever inside Request.stream().
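The hang described above can be reproduced without Starlette at all. Below is a minimal asyncio sketch (all names are illustrative, not Starlette's own) of the failure mode: once the ASGI receive channel has delivered a message with `{"more_body": False}`, a further `await receive()` never returns, so a second uncached body read blocks forever.

```python
import asyncio

def make_receive(chunks):
    """Simulate an ASGI receive callable: yields the chunks, then blocks forever."""
    messages = [
        {"type": "http.request", "body": c, "more_body": i < len(chunks) - 1}
        for i, c in enumerate(chunks)
    ]
    async def receive():
        if messages:
            return messages.pop(0)
        await asyncio.Event().wait()  # exhausted: never returns, like a real ASGI server
    return receive

async def read_body(receive):
    """Drain the stream into bytes, roughly what a first Request.body() call does."""
    body = b""
    while True:
        message = await receive()
        body += message.get("body", b"")
        if not message.get("more_body", False):
            return body

async def main():
    receive = make_receive([b'{"a":', b' 1}'])
    first = await read_body(receive)  # works: the stream is still live
    try:
        # A second Request instance sharing the same receive hangs here;
        # the timeout stands in for "forever" so the demo terminates.
        second = await asyncio.wait_for(read_body(receive), timeout=0.1)
    except asyncio.TimeoutError:
        second = None
    return first, second

first, second = asyncio.run(main())
```

Caching the drained bytes somewhere shared (this PR proposes the ASGI scope) is what lets the second read return instead of blocking.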

@nikordaris nikordaris changed the title Fixes !847 Request Stream Hangs Fixes #847 Request Stream Hangs May 12, 2020
@nikordaris nikordaris changed the title Fixes #847 Request Stream Hangs Fixes Request Stream Hangs May 12, 2020
@nikordaris nikordaris changed the title Fixes Request Stream Hangs Fixes Request.json() Hangs Forever May 12, 2020
@lwinkler-cloudflight

Is there a reason nobody has had a look at this yet? It seems fairly straightforward and fixes an issue that we keep stumbling over.

@akarpodinis

Any love for this issue? It's been quiet in here for months.

@graingert
Member

is this fixed by the anyio port?

@pawelswiecki

Any updates?

@gnat

gnat commented Sep 24, 2021

Would be really nice to see this go in so that multiple calls to await request.form() etc. are managed automatically. In the interim I'm storing the result in request.state so I can make use of it in middlewares as well as endpoints.
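For anyone hitting the same problem, the interim workaround mentioned above can be sketched as follows. This is a hedged illustration, not Starlette code: `SimpleNamespace` stands in for Starlette's `request.state`, and the `Request` class and `caching_middleware` function are invented names. The idea is to drain the body once and stash it on state, so later accesses reuse the cached value instead of touching the exhausted stream.

```python
import asyncio
from types import SimpleNamespace

class Request:
    """Illustrative stand-in for a Starlette Request sharing one receive channel."""
    def __init__(self, receive):
        self._receive = receive
        self.state = SimpleNamespace()  # mimics Starlette's request.state

    async def body(self):
        data = b""
        while True:
            msg = await self._receive()
            data += msg.get("body", b"")
            if not msg.get("more_body", False):
                return data

async def caching_middleware(request):
    # Read the body once and cache it on state; endpoints and later
    # middleware read request.state.body instead of re-draining the stream.
    if not hasattr(request.state, "body"):
        request.state.body = await request.body()
    return request.state.body

async def main():
    msgs = [{"type": "http.request", "body": b"payload", "more_body": False}]
    async def receive():
        return msgs.pop(0) if msgs else {"type": "http.disconnect"}
    req = Request(receive)
    a = await caching_middleware(req)  # drains the stream
    b = await caching_middleware(req)  # served from the cache, no hang
    return a, b

a, b = asyncio.run(main())
```

The same pattern works with the real `request.state` because state is shared across middleware and endpoint for a given request.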


@moonso

moonso commented Nov 24, 2021

Is there any progress, or are there discussions going on around this?

andrew-chang-dewitt added a commit to andrew-chang-dewitt/hoops-api that referenced this pull request Nov 29, 2021
Required removing request body from NoResultFound handler as Starlette
hangs on parsing body multiple times (see issue encode/starlette/issues/847
PR encode/starlette/pull/944).
@Kludex Kludex requested a review from tomchristie December 14, 2021 18:23
@Kludex
Member

Kludex commented Jan 29, 2022

@nikordaris Are you still interested in helping with this PR?

@nikordaris
Author

I left a comment on Tom's comment. From my perspective, I'm not seeing anything else for me to change with this PR. Let me know if there is anything simple I can tweak to help it get merged. As I'm no longer on this tech stack, I likely won't have time to do any major modifications to this PR.

@nikordaris
Author

Ok. Where did you want the cache stored in the scope? In extensions? Directly on the scope? If extensions, what namespace? starlette? request_cache? @tomchristie

I'll do one more pass at this and then I'm done. I'm not on this tech stack anymore and can't invest any more time in it. Someone else will have to fork my PR and resubmit it with any follow-up changes.

My approach will be to cache the raw stream only for body and form. If stream is used directly it will use the cache if it exists. If the stream is consumed and there is no cache then it will raise a runtime error.

@nikordaris
Author

Ok, @tomchristie, this is ready for review. Request.stream no longer caches. If Request.body or Request.form is called, it will cache the raw bytes stream into scope.extensions.starlette.stream. If Request.stream is called followed by Request.body, a runtime error will be thrown indicating the stream has been consumed. If Request.body or Request.form is called followed by Request.stream, no error is raised and the cached stream will be returned. I think this lines up with your concerns around a large stream getting stored in memory while still preserving the convenience of leveraging the cache from body or form. Once you decide where in the scope you want to store the cached stream, I'll update that location and then this PR should be complete.
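The behaviour described above can be sketched roughly as follows. This is a simplified simulation, not the PR's actual diff: the class, method bodies, and the `scope["extensions"]["starlette"]["stream"]` location are illustrative, pending the maintainers' decision on where the cache should live. `body()` drains the stream and caches the raw bytes into the shared ASGI scope; `stream()` serves from the cache when one exists, and raises `RuntimeError` when the stream was already consumed without a cache.

```python
import asyncio

class Request:
    """Illustrative stand-in: multiple instances may share one scope and receive."""
    def __init__(self, scope, receive):
        self.scope = scope
        self._receive = receive

    async def stream(self):
        cache = self.scope.get("extensions", {}).get("starlette", {})
        if "stream" in cache:
            yield cache["stream"]  # serve the cached raw bytes
            return
        if self.scope.get("_stream_consumed"):
            raise RuntimeError("Stream consumed")
        self.scope["_stream_consumed"] = True
        while True:
            msg = await self._receive()
            yield msg.get("body", b"")
            if not msg.get("more_body", False):
                return

    async def body(self):
        data = b"".join([chunk async for chunk in self.stream()])
        # Cache into the scope so later Request instances on the same
        # scope can read the body without touching the exhausted stream.
        self.scope.setdefault("extensions", {}).setdefault("starlette", {})["stream"] = data
        return data

async def main():
    scope = {}
    msgs = [{"body": b"hello", "more_body": False}]
    async def receive():
        return msgs.pop(0)
    first = await Request(scope, receive).body()   # drains and caches
    second = await Request(scope, receive).body()  # served from the scope cache
    return first, second

first, second = asyncio.run(main())
```

Calling `stream()` directly first and then `body()` would hit the `RuntimeError("Stream consumed")` branch, since direct streaming deliberately leaves nothing in the cache.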

@adriangb adriangb added the bug Something isn't working label Feb 2, 2022
@tomchristie
Member

Just dropping by to mention that this is on my personal backlog at the moment, but I don't want to deal with it until I can give it some proper attention, since it's a bit of a fiddly area.

Please do feel free to nudge me next week if I don't already get onto looking at it before then.

@tomchristie
Member

Okay, so I'd been pushing back against this approach because it changes the nature of streaming the request.

If we cache all the data we get when we stream a request, then we're always going to store the entire body in memory. That's fine in cases where you're accessing the body (because you needed it all in memory in that case), or in cases where you're accessing the request as JSON (because, same). Though it's not ideal in cases where you're accessing request.form(), because in that case we're careful that the request parsing deliberately doesn't store the entire body in memory.

There's an alternate approach that'd essentially be to raise an error if there's an attempt to stream the request more than once, but that does cache the result of .body and .form.

However, at this point I'm kinda thinking that we should possibly just suck it up and accept the trade-off, because it results in a much simpler set of rules for what you can and can't do when repeatedly accessing the request content.

Anyone?

@adriangb
Member

If we cache all the data we get when we stream a request, then we're always going to store the entire body in memory. That's fine in cases where you're accessing the body (because you needed it all in memory in that case), or in cases where you're accessing the request as JSON (because, same). Though it's not ideal in cases where you're accessing request.form(), because in that case we're careful that the request parsing deliberately doesn't store the entire body in memory.

I think I agree with what you say, but I'm not understanding the relationship to this PR.
Let me echo back to you my interpretation to see if I'm missing anything.

My understanding is that currently in Starlette, if you do Request.body() (or anything that calls it, like Request.json()), the body gets cached. On the other hand, if you do Request.stream() (or anything that calls it, like Request.form()), then the body is not cached in memory. From my understanding of this PR, that is preserved; the only difference is that when the body is cached, it is cached into the ASGI scope instead of the Request instance. Right, @nikordaris?

More generally, I think it would make sense to have an API like Request.form(consume=False) and Request.json(consume=True) (with defaults matching the current behavior) that allows users to control this. That would, I think, alleviate confusion, because instead of having to dig into the implementation details of Request.json or Request.form (or look at some footnote in the docs), you can see just from the function signature what's going on. Not only that, but if you're looking for this info it's probably because you want the opposite behavior, so you also already know how to change it. But again, this would have to be another PR.

@tomchristie
Member

tomchristie commented Feb 15, 2022

Okay, this looks good. I misunderstood slightly on first pass. (Thanks for clearing that up for me, @adriangb.)

I think we also need to be careful to preserve the RuntimeError("Stream consumed") case.

I've created #1519 as a similar approach that also handles the stream consumed state.

@nikordaris
Author

nikordaris commented Feb 17, 2022

@tomchristie It looks like this PR should be closed out in favor of #1519 now, correct?

@nikordaris nikordaris closed this Feb 18, 2022
@gnat

gnat commented Feb 19, 2022

Thank you @nikordaris for your valiant effort on this! Looking forward to seeing the resolution in #1519.


Successfully merging this pull request may close these issues.

Middleware Request parse Hangs forever