Inefficient Hash Table implementation #529

David263 · 2020-08-20T17:26:06Z

There is no good reason to publish poorly-performing algorithms, in my opinion. Hash Tables are frequently used in inner loops, so it is important that they run fast.

The hash function used here has much published criticism. It could be replaced by a variant of Pearson Hashing, which runs very quickly and has good statistical properties for avoiding collisions. Note also that careful use of JavaScript datatypes is needed for efficient and fast data storage.

JackyYin · 2020-12-18T03:30:40Z

I am trying to implement the Pearson Hashing.
Please check PR #596

JackyYin · 2020-12-18T07:06:45Z

The test coverage failed because test case is not covering a local variable inside a function.
I am not sure if it is a good idea to test the line coverage in this case.
Any suggestion to fix this problem?

David263 · 2020-12-18T10:48:32Z

If you are asking me, the author of this suggestion, I have no answer. I am not familiar with the testing system that you are using. If a test case it missing, why not add it? I can answer only questions about the Pearson Hashing algorithm, such as how to extend it beyond its natural limit of 8 bits of key data. I have used Pearson Hashing in real products and find that it performs quite randomly, as desired, producing fewer collisions than more common types of hashing functions.

lazarljubenovic · 2020-12-18T15:36:03Z

There is no good reason to publish poorly-performing algorithms, in my opinion.

The nature of the repo is to provide implementations for learning. They're not production-ready. That said, I agree that bad solutions shouldn't be here. If a solution is sub-optimal but provides insight into a technique, I think it can remain here under “Basic”, with a clear banner explaining why it's not optimal and where to look further to gain deeper understanding.

JackyYin · 2020-12-19T16:57:43Z

@David263 Thanks, Because it seems like you have a rich experience in Pearson Hashing algorithm, could you give me some advise about my implementation?
I will generate a 32-bits signed int hash, which is generated by ORing four 8-bits hash, and right shift 1 bit to make sure I get a positive number(the left most bit should be 0).
I am not sure whether it's a good idea to do the right shift, or I should just use an absolute function?

David263 · 2020-12-19T19:22:31Z

Right shifting can do different things on different hardware. Absolute value produces different bit results on 1's-complement and 2's-complement hardware. For these reasons, it's safer to mask the value using an AND bit operator to clear the sign bit. As to constructing a 31-bit hash, you should use shift and OR operators to move each byte to the proper place in the word. You will probably need a final mask to get the key in the domain of your table, since you probably do not want to devote a full 31 bits of contiguous memory space to the hash buckets (unless your hardware supports sparse arrays). You can find implementations of Pearson hashing for longer word lengths on the Web, when in doubt.

ilyas-wise · 2021-04-06T05:27:52Z

@David263 , excellent you are gave me a good point thanks
I got it some of before a mistake and today memorized where else's has to possible!
Thanks

JackyYin added a commit to JackyYin/javascript-algorithms that referenced this issue Dec 18, 2020

fix: implementing Pearson Hashing in Hash Table (trekhleb#529)

11953b2

JackyYin mentioned this issue Dec 18, 2020

fix: implementing Pearson Hashing in Hash Table (#529) #596

Open

JackyYin added a commit to JackyYin/javascript-algorithms that referenced this issue Dec 19, 2020

fix: implementing Pearson Hashing in Hash Table (trekhleb#529)

e48f70d

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

GitHub Sponsors

Inefficient Hash Table implementation #529

Inefficient Hash Table implementation #529

David263 commented Aug 20, 2020

JackyYin commented Dec 18, 2020

JackyYin commented Dec 18, 2020

David263 commented Dec 18, 2020

lazarljubenovic commented Dec 18, 2020

JackyYin commented Dec 19, 2020 •

edited

Loading

David263 commented Dec 19, 2020

ilyas-wise commented Apr 6, 2021

Inefficient Hash Table implementation #529

Inefficient Hash Table implementation #529

Comments

David263 commented Aug 20, 2020

JackyYin commented Dec 18, 2020

JackyYin commented Dec 18, 2020

David263 commented Dec 18, 2020

lazarljubenovic commented Dec 18, 2020

JackyYin commented Dec 19, 2020 • edited Loading

David263 commented Dec 19, 2020

ilyas-wise commented Apr 6, 2021

JackyYin commented Dec 19, 2020 •

edited

Loading