Fixes bugs related to unicode support in polynomial-hash functions #304
+151
−117
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Details:
This Pull Request fixes bugs related to unicode support in polynomial-hash functions:
charToNumber
ofPolynomialHash
was wrongly computed the code point value of a character:codePointAt
is already aware of surrogate pairs (which is not the case ofcharCodeAt
)roll
ofPolynomialHash
ignored unicode characters totallySimplePolynomialHash
was not aware of unicodeHow did I find it?
Originally I wanted to check how I could rewrite the existing for-loop based tests into something based on property based. Then it clearly appeared that the test can be simplified given the switch to property based:
roll(hash(XYZ), XYZ, YZA) === hash(YZA)
whatever the values ofX
,Y
,Z
,A
.The final version of the test can be read as follow:
For any
A
single character,B
single character andW
string,roll(hash(A + W), A + W, W + B) === hash(W + B)
As a side note, I strongly believe that adding such kind of tests would highly help to detect such bugs easily. Up to now it has detected the following: #100, #101, #102, #110 and #129.
And also #305, #306, #307, #308.