Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inefficient Hash Table implementation #529

Open
David263 opened this issue Aug 20, 2020 · 7 comments
Open

Inefficient Hash Table implementation #529

David263 opened this issue Aug 20, 2020 · 7 comments

Comments

@David263
Copy link

There is no good reason to publish poorly-performing algorithms, in my opinion. Hash Tables are frequently used in inner loops, so it is important that they run fast.

The hash function used here has much published criticism. It could be replaced by a variant of Pearson Hashing, which runs very quickly and has good statistical properties for avoiding collisions. Note also that careful use of JavaScript datatypes is needed for efficient and fast data storage.

@JackyYin
Copy link

I am trying to implement the Pearson Hashing.
Please check PR #596

@JackyYin
Copy link

The test coverage failed because test case is not covering a local variable inside a function.
I am not sure if it is a good idea to test the line coverage in this case.
Any suggestion to fix this problem?

@David263
Copy link
Author

If you are asking me, the author of this suggestion, I have no answer. I am not familiar with the testing system that you are using. If a test case it missing, why not add it? I can answer only questions about the Pearson Hashing algorithm, such as how to extend it beyond its natural limit of 8 bits of key data. I have used Pearson Hashing in real products and find that it performs quite randomly, as desired, producing fewer collisions than more common types of hashing functions.

@lazarljubenovic
Copy link

There is no good reason to publish poorly-performing algorithms, in my opinion.

The nature of the repo is to provide implementations for learning. They're not production-ready. That said, I agree that bad solutions shouldn't be here. If a solution is sub-optimal but provides insight into a technique, I think it can remain here under “Basic”, with a clear banner explaining why it's not optimal and where to look further to gain deeper understanding.

JackyYin added a commit to JackyYin/javascript-algorithms that referenced this issue Dec 19, 2020
@JackyYin
Copy link

JackyYin commented Dec 19, 2020

@David263 Thanks, Because it seems like you have a rich experience in Pearson Hashing algorithm, could you give me some advise about my implementation?
I will generate a 32-bits signed int hash, which is generated by ORing four 8-bits hash, and right shift 1 bit to make sure I get a positive number(the left most bit should be 0).
I am not sure whether it's a good idea to do the right shift, or I should just use an absolute function?

@David263
Copy link
Author

Right shifting can do different things on different hardware. Absolute value produces different bit results on 1's-complement and 2's-complement hardware. For these reasons, it's safer to mask the value using an AND bit operator to clear the sign bit. As to constructing a 31-bit hash, you should use shift and OR operators to move each byte to the proper place in the word. You will probably need a final mask to get the key in the domain of your table, since you probably do not want to devote a full 31 bits of contiguous memory space to the hash buckets (unless your hardware supports sparse arrays). You can find implementations of Pearson hashing for longer word lengths on the Web, when in doubt.

@ilyas-wise
Copy link

@David263 , excellent you are gave me a good point thanks
I got it some of before a mistake and today memorized where else's has to possible!
Thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants