Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cap preview cols to prevent stall when csv is parsed with the wrong delimiter #3786

Merged
merged 6 commits into from
Feb 6, 2025

Conversation

iamllama
Copy link
Contributor

@iamllama iamllama commented Feb 2, 2025

Resolves #3588

Sure, here is an example which takes 25 seconds to load for my pc.

Looking at what passes for the csv spec, the example is considered invalid csv when pipe-delimited. But python's csv.reader parses it in the same way as the crate. I can't speak to how they handle quirks, but my guess is that the last pipe in each row having one unclosed quote after it causes the record terminator to be quoted and ignored, and the file ends up as being 1 header and 1 record long

The stall doesn't seem to be happening on the backend (which parses and responds in ~100ms), but on the webpage itself. Profiling on firefox confirms that it's choking on rendering ~40000 columns in the preview table

This pr proposes to cap the preview table's columns at 1000, a number small enough that it should render reasonably quickly (~1s, prev ~20s), and ludicrously large enough that exceeding it almost certainly means wrong delimiter and/or invalid csv (who has notetypes with >1000 fields?)

This'll allow the user to quickly recover from a wrong guess by the delimiter heuristic, rather than have to wait ~20 seconds for the page to load

A warning is shown when columns have been clipped:
image

EDIT: failing on an openssl advisory 👀

Copy link
Member

@dae dae left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for digging into this! The way you've addressed it looks good. A couple of minor comments:

Is your thinking with the underscored _tr similar to this? https://github.com/ankitects/anki/pull/3640/files/14c022f6de65491d5832949df802ba2ae7c8074b#diff-5f735d59ca95151c8a3a6aca3a068eef5ea8a3a9d3ff9abdebd7040c9f6722e6

@iamllama
Copy link
Contributor Author

iamllama commented Feb 3, 2025

Is your thinking with the underscored _tr similar to this?

Yep, it'd make svelte treat <tr> as a component otherwise. Might be cleaner to just import the one translation, but i figured having tr. in there would be more greppable 🤔

If this was just for the translation system, you could probably even drop the [one] case, and mention in the translation comment that $count will only ever be 1000 or a similarly-large number, which would allow translators to skip zero/one/few cases.

Included the [one] case for completeness' sake, dropping it seems better than changing it to something like "Only the first 1 column is shown", which sounds off 😅. Added a comment to clarify that as suggested

@dae
Copy link
Member

dae commented Feb 3, 2025

Yep, it'd make svelte treat as a component otherwise. Might be cleaner to just import the one translation, but i figured having tr. in there would be more greppable 🤔

Would you mind elaborating on what you mean by that? I changed the _tr to tr in my local checkout, and it still passes tests/works correctly.

@iamllama
Copy link
Contributor Author

iamllama commented Feb 3, 2025

Well that's embarrassing. I tried it earlier in the morning and got a component_name_lowercase svelte warning, along with the tags silently being removed from the rendered table, but now it seems fine? I must've done something wrong back then, sorry about that!

@dae
Copy link
Member

dae commented Feb 6, 2025

I've experienced that many a time too - don't sweat it :-)

Thank you for this fix!

@dae dae merged commit 2cbcd79 into ankitects:main Feb 6, 2025
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Bug with importer for files containing quotes
2 participants