Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rabinKarp seems to miss some matches #102

Closed
dubzzz opened this issue Jul 21, 2018 · 5 comments
Closed

rabinKarp seems to miss some matches #102

dubzzz opened this issue Jul 21, 2018 · 5 comments
Labels
bug Something isn't working

Comments

@dubzzz
Copy link
Contributor

dubzzz commented Jul 21, 2018

Hello,

I recently tried to run some property based tests on the algorithms provided in javascript-algorithms.
I discovered the following failing case with rabinKarp algorithm:

expect(rabinKarp("^ !/'#'pp", " !/'#'pp")).toBe(2); // output: -1

In order to find it, I used fast-check framework with the property:

// for any strings a, b and c
// rabinKarp(a + b + c, b) should be different from -1 (indeed b is a substring of a+b+c)
fc.assert(
  fc.property(
    fc.string(), fc.string(), fc.string(),
    (a, b, c) => rabinKarp(a + b + c, b) !== -1
  )
);

I was not able to have a smaller example for this precise case :/

dubzzz referenced this issue in dubzzz/javascript-algorithms Jul 21, 2018
@dubzzz
Copy link
Contributor Author

dubzzz commented Jul 23, 2018

It seems to be some kind of overflow of hashWord/reHashWord. With previous strings the computed hash is above Number.MAX_SAFE_INTEGER.

Number.MAX_SAFE_INTEGER; //9007199254740991
hashWord(" !/'#'pp"); //9143038766935264
reHashWord(hashWord("^ !/'#'p"), "^ !/'#'p", " !/'#'pp"); //9143038766935266

Maybe we can try to mod the hash into [-2 ** 31, 2 ** 31 -1] space in order to get rid of the issue. I will give it a try and send you an update as soon as I get something.

@dubzzz
Copy link
Contributor Author

dubzzz commented Jul 23, 2018

Given the fact that the issue is coming from word[charIndex].charCodeAt(0) * PRIME ** charIndex going above Number.MAX_SAFE_INTEGER, it means we are safe up to charIndex == 4 because 0x10ffff * 97 ** 4 < Number.MAX_SAFE_INTEGER - false for 5.

The current implementation of the algorithm should be ok - will confirm my test passes with this limit - for any word having at most 5 characters otherwise there is no guarantee that it does not overflow and goes into rounding issues.

@trekhleb trekhleb added the bug Something isn't working label Jul 24, 2018
@trekhleb
Copy link
Owner

Good catch @dubzzz

@dubzzz
Copy link
Contributor Author

dubzzz commented Jul 24, 2018

@trekhleb Would you agree on a PR adding some property based tests in this repository?

All the issues I reported so far were found using this test method. For the moment I only added them for sorting, knuth and rabin in my local fork. I think I can spend some time to check the other algorithms too.

In a nutshell:

It is a testing approach complementary to examples based tests. The idea is to check for properties:
for all (x, y, ...) such as precondition(x, y, ...) holds property(x, y, ...) is true
The framework is then responsible to generate x,y..., such as precondition holds, run the check on them and build a smaller failing case in case of failure.

Running example: https://runkit.com/dubzzz/rabin

dubzzz referenced this issue in Bruce-Feldman/javascript-algorithms Jul 26, 2018
@dubzzz
Copy link
Contributor Author

dubzzz commented Jul 30, 2018

@trekhleb Fixed by #110

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants