-
Notifications
You must be signed in to change notification settings - Fork 130
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Storing unicode keys returns bytestrings that must be decoded as utf8. #19
Comments
Hmm, interesting. I don't remember why/where the keys become utf8. But it certainly makes sense if what you get out of the dictionary is what you put in in the first place (unicode => unicode, rather than unicode => str, as you say). I'd be a little careful with changes that break backward compatibility for |
Makes sense. You're right we need to be very careful for compatibility. This is actually quite complicated in such respects. You're looking for:
Thinking some more, downstream, I'm always explicitly decoding as utf-8, and such code would fail after the change: With a I've been in these situations a few times before, open source api libraries are the hardest... here are some suggestions. A. provide an entirely new package I think this is the only way not to cause anybody trouble, and new users of the library get the "unicode in, unicode out" behavior without having to research using any toggles or parameters to make it so.
This is nice, because the code change is minimal, it can simply naturally store by type, and return the same type. Easy to test and no magic about backwards compatibility. B. do not break compatibility
C. break compatibility I won't go into the details much more, because this one is probably the least likely. But basically document the change in Changelog, provide example error messages caused by the change which users might run into to help search engines find the cause, and the workaround (set |
I'm -1 on A, +1 on B or C. And if C: just mention the change in Changelog, bump up the version to 2.0.0 and call it a day (no new parameters). So that's basically A, but discerned by package version rather than new package name. In the end, the choice between B and C is yours Jeff, since you're the one who found the issue. Whatever suits you best! Storing the type means |
Closing -- subsumed by #25. |
I know sqlite is utf-8 internally, that's fine.
But I end up with boiler plate code that says:
This seems wrong. Shouldn't sqlitedict do this for us?
If you agree, that the .keys() and .items() and so on should automatically decode str-type as utf-8 into unicode-type, then I will be happy to submit a PR.
The text was updated successfully, but these errors were encountered: