Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hashes of numbers with identical final digits match #143

Open
ilyvion opened this issue Mar 8, 2025 · 2 comments
Open

Hashes of numbers with identical final digits match #143

ilyvion opened this issue Mar 8, 2025 · 2 comments

Comments

@ilyvion
Copy link

ilyvion commented Mar 8, 2025

This feels very problematic to me and is causing me to have to hash BigDecimal values using a different technique than their real Hash impl.

#[test]
fn big_decimal_hash_issue() {
    use bigdecimal::BigDecimal;
    use std::hash::{DefaultHasher, Hash, Hasher};

    let d1: BigDecimal = "1011".parse().unwrap();
    let d2: BigDecimal = "0.01011".parse().unwrap();

    let mut d1_hasher = DefaultHasher::new();
    let mut d2_hasher = DefaultHasher::new();

    d1.hash(&mut d1_hasher);
    d2.hash(&mut d2_hasher);

    let d1_hash = d1_hasher.finish();
    let d2_hash = d2_hasher.finish();

    assert_ne!(d1_hash, d2_hash, "{d1:?}, {d2:?}");
}

This test fails with

assertion `left != right` failed: BigDecimal(sign=Plus, scale=0, digits=[1011]), BigDecimal(sign=Plus, scale=5, digits=[1011])
  left: 1407011009942640606
 right: 1407011009942640606

You can change d1 and d2 to any two numbers that "end in" (or I guess, more accurately, have the significant digits) 1011, and you still get the panic, e.g. 10.11 or 0.00000001011 -- basically as long as digits=[1011], you'll get the panic.

It does not happen when you go in the other direction, like with 10110 because then you get scale=0, digits=[10110], which hashes differently. Unless you use .normalized() on them first; since you then get something like scale=-1, digits=[1011] instead, which once more gives an identical hash, (I was mistaken about that; while normalizing do give them identical digits, they still hash differently)

@akubera
Copy link
Owner

akubera commented Mar 8, 2025

Yeah it looks like a pretty naive algorithm, written some years ago. I think it’s just a hash of integer value?

Definitely needs fixed

@akubera
Copy link
Owner

akubera commented Mar 8, 2025

Easy solution would be to be format to scientific notation and hash that… but it’d be nicer of course to not have to allocate a string (but Hash has been doing that for 7 years and nobody has complained)

Hash should include precision, right? So hash(1.2) != hash(1.200), despite 1.2 == 1.200?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants