Skip to content

Commit 89fb0e6

Browse files
committedJun 23, 2018
Add Levenshtein Distance algorithm explanations.
1 parent a950285 commit 89fb0e6

File tree

1 file changed

+69
-0
lines changed
  • src/algorithms/string/levenshtein-distance

1 file changed

+69
-0
lines changed
 

‎src/algorithms/string/levenshtein-distance/README.md

+69
Original file line numberDiff line numberDiff line change
@@ -40,7 +40,76 @@ three edits:
4040
2. sitt**e**n → sitt**i**n (substitution of "i" for "e")
4141
3. sittin → sittin**g** (insertion of "g" at the end).
4242

43+
## Applications
44+
45+
This has a wide range of applications, for instance, spell checkers, correction
46+
systems for optical character recognition, fuzzy string searching, and software
47+
to assist natural language translation based on translation memory.
48+
49+
## Dynamic Programming Approach Explanation
50+
51+
Let’s take a simple example of finding minimum edit distance between
52+
strings `ME` and `MY`. Intuitively you already know that minimum edit distance
53+
here is `1` operation and this operation. And it is a replacing `E` with `Y`. But
54+
let’s try to formalize it in a form of the algorithm in order to be able to
55+
do more complex examples like transforming `Saturday` into `Sunday`.
56+
57+
To apply the mathematical formula mentioned above to `ME → MY` transformation
58+
we need to know minimum edit distances of `ME → M`, `M → MY` and `M → M` transformations
59+
in prior. Then we will need to pick the minimum one and add _one_ operation to
60+
transform last letters `E → Y`. So minimum edit distance of `ME → MY` transformation
61+
is being calculated based on three previously possible transformations.
62+
63+
To explain this further let’s draw the following matrix:
64+
65+
![Levenshtein Matrix](https://cdn-images-1.medium.com/max/1600/1*2d46ug_PL5LfeqztckoYGw.jpeg)
66+
67+
- Cell `(0:1)` contains red number 1. It means that we need 1 operation to
68+
transform `M` to an empty string. And it is by deleting `M`. This is why this number is red.
69+
- Cell `(0:2)` contains red number 2. It means that we need 2 operations
70+
to transform `ME` to an empty string. And it is by deleting `E` and `M`.
71+
- Cell `(1:0)` contains green number 1. It means that we need 1 operation
72+
to transform an empty string to `M`. And it is by inserting `M`. This is why this number is green.
73+
- Cell `(2:0)` contains green number 2. It means that we need 2 operations
74+
to transform an empty string to `MY`. And it is by inserting `Y` and `M`.
75+
- Cell `(1:1)` contains number 0. It means that it costs nothing
76+
to transform `M` into `M`.
77+
- Cell `(1:2)` contains red number 1. It means that we need 1 operation
78+
to transform `ME` to `M`. And it is be deleting `E`.
79+
- And so on...
80+
81+
This looks easy for such small matrix as ours (it is only `3x3`). But here you
82+
may find basic concepts that may be applied to calculate all those numbers for
83+
bigger matrices (let’s say `9x7` one, for `Saturday → Sunday` transformation).
84+
85+
According to the formula you only need three adjacent cells `(i-1:j)`, `(i-1:j-1)`, and `(i:j-1)` to
86+
calculate the number for current cell `(i:j)`. All we need to do is to find the
87+
minimum of those three cells and then add `1` in case if we have different
88+
letters in `i`'s row and `j`'s column.
89+
90+
You may clearly see the recursive nature of the problem.
91+
92+
![Levenshtein Matrix](https://cdn-images-1.medium.com/max/2000/1*JdHQ5TeKiDlE-iKK1s_2vw.jpeg)
93+
94+
Let's draw a decision graph for this problem.
95+
96+
![Minimum Edit Distance Decision Graph](https://cdn-images-1.medium.com/max/1600/1*SGwYUpXH9H1xUeTvJk0e7Q.jpeg)
97+
98+
You may see a number of overlapping sub-problems on the picture that are marked
99+
with red. Also there is no way to reduce the number of operations and make it
100+
less then a minimum of those three adjacent cells from the formula.
101+
102+
Also you may notice that each cell number in the matrix is being calculated
103+
based on previous ones. Thus the tabulation technique (filling the cache in
104+
bottom-up direction) is being applied here.
105+
106+
Applying this principles further we may solve more complicated cases like
107+
with `Saturday → Sunday` transformation.
108+
109+
![Levenshtein distance](https://cdn-images-1.medium.com/max/1600/1*fPEHiImYLKxSTUhrGbYq3g.jpeg)
110+
43111
## References
44112

45113
- [Wikipedia](https://en.wikipedia.org/wiki/Levenshtein_distance)
46114
- [YouTube](https://www.youtube.com/watch?v=We3YDTzNXEk&list=PLLXdhg_r2hKA7DPDsunoDZ-Z769jWn4R8)
115+
- [ITNext](https://itnext.io/dynamic-programming-vs-divide-and-conquer-2fea680becbe)

0 commit comments

Comments
 (0)
Please sign in to comment.