-
-
Notifications
You must be signed in to change notification settings - Fork 30.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Inefficient Hash Table implementation #529
Comments
I am trying to implement the Pearson Hashing. |
The test coverage failed because test case is not covering a local variable inside a function. |
If you are asking me, the author of this suggestion, I have no answer. I am not familiar with the testing system that you are using. If a test case it missing, why not add it? I can answer only questions about the Pearson Hashing algorithm, such as how to extend it beyond its natural limit of 8 bits of key data. I have used Pearson Hashing in real products and find that it performs quite randomly, as desired, producing fewer collisions than more common types of hashing functions. |
The nature of the repo is to provide implementations for learning. They're not production-ready. That said, I agree that bad solutions shouldn't be here. If a solution is sub-optimal but provides insight into a technique, I think it can remain here under “Basic”, with a clear banner explaining why it's not optimal and where to look further to gain deeper understanding. |
@David263 Thanks, Because it seems like you have a rich experience in Pearson Hashing algorithm, could you give me some advise about my implementation? |
Right shifting can do different things on different hardware. Absolute value produces different bit results on 1's-complement and 2's-complement hardware. For these reasons, it's safer to mask the value using an AND bit operator to clear the sign bit. As to constructing a 31-bit hash, you should use shift and OR operators to move each byte to the proper place in the word. You will probably need a final mask to get the key in the domain of your table, since you probably do not want to devote a full 31 bits of contiguous memory space to the hash buckets (unless your hardware supports sparse arrays). You can find implementations of Pearson hashing for longer word lengths on the Web, when in doubt. |
@David263 , excellent you are gave me a good point thanks |
There is no good reason to publish poorly-performing algorithms, in my opinion. Hash Tables are frequently used in inner loops, so it is important that they run fast.
The hash function used here has much published criticism. It could be replaced by a variant of Pearson Hashing, which runs very quickly and has good statistical properties for avoiding collisions. Note also that careful use of JavaScript datatypes is needed for efficient and fast data storage.
The text was updated successfully, but these errors were encountered: