Cap preview cols to prevent stall when csv is parsed with the wrong delimiter #3786
Conversation
Thank you for digging into this! The way you've addressed it looks good. A couple of minor comments:
Is your thinking with the underscored _tr similar to this? https://github.com/ankitects/anki/pull/3640/files/14c022f6de65491d5832949df802ba2ae7c8074b#diff-5f735d59ca95151c8a3a6aca3a068eef5ea8a3a9d3ff9abdebd7040c9f6722e6
Yep, it'd make svelte treat
Included the
Would you mind elaborating on what you mean by that? I changed the _tr to tr in my local checkout, and it still passes tests/works correctly.
Well that's embarrassing. I tried it earlier in the morning and got a
I've experienced that many a time too - don't sweat it :-) Thank you for this fix!
Resolves #3588
Looking at what passes for the CSV spec, the example is considered invalid CSV when pipe-delimited. But Python's csv.reader parses it the same way as the crate does. I can't speak to how they handle quirks, but my guess is that the single unclosed quote after the last pipe in each row causes the record terminator to be treated as quoted and ignored, so the file ends up being one header and one record long.
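The quote-swallows-newline behaviour can be reproduced with Python's csv.reader directly. This is a minimal illustrative input I made up (not the actual file from #3588): a quote opens right after the last pipe of a row, so the quoted field consumes the newline and merges rows:

```python
import csv
import io

# Two "rows", each ending in a field that begins with an unclosed quote.
# When parsed pipe-delimited, the quote opens a quoted field that swallows
# the newline, so both lines collapse into a single record.
data = 'a|b|"x\nc|d|"y\n'

rows = list(csv.reader(io.StringIO(data), delimiter="|"))
# Only one record comes back; the third field contains the embedded newline
# and the pipes from what looked like the second row.
print(rows)
```

With real-world files the same mechanism can leave one header row whose merged fields balloon into tens of thousands of preview columns.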
The stall doesn't seem to happen on the backend (which parses and responds in ~100 ms), but on the webpage itself. Profiling in Firefox confirms that it's choking on rendering ~40,000 columns in the preview table.
This PR proposes capping the preview table at 1,000 columns: a number small enough that it should render reasonably quickly (~1 s, previously ~20 s), and ludicrously large enough that exceeding it almost certainly means a wrong delimiter and/or invalid CSV (who has notetypes with more than 1,000 fields?).
This lets the user quickly recover from a wrong guess by the delimiter heuristic, rather than having to wait ~20 seconds for the page to load.
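The capping logic might look something like the following sketch (names like capPreview and MAX_PREVIEW_COLS are illustrative, not the actual PR code): slice each preview row to the limit and remember whether anything was clipped, so the UI can show a warning.

```typescript
// Hypothetical sketch of the column cap; not the actual implementation.
const MAX_PREVIEW_COLS = 1000;

interface CappedPreview {
    rows: string[][];
    clipped: boolean;
}

function capPreview(rows: string[][]): CappedPreview {
    // Only pay the cost of copying when some row actually exceeds the cap.
    const clipped = rows.some((row) => row.length > MAX_PREVIEW_COLS);
    return {
        rows: clipped ? rows.map((row) => row.slice(0, MAX_PREVIEW_COLS)) : rows,
        clipped,
    };
}

const preview = capPreview([["front", "back"], Array(1500).fill("x")]);
console.log(preview.clipped, preview.rows[1].length);
```

The clipped flag is what would drive the warning shown below.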
A warning is shown when columns have been clipped:

EDIT: failing on an openssl advisory 👀