@@ -40,7 +40,76 @@ three edits:
40
40
2 . sitt** e** n → sitt** i** n (substitution of "i" for "e")
41
41
3 . sittin → sittin** g** (insertion of "g" at the end).
42
42
43
+ ## Applications
44
+
45
+ This has a wide range of applications, for instance, spell checkers, correction
46
+ systems for optical character recognition, fuzzy string searching, and software
47
+ to assist natural language translation based on translation memory.
48
+
49
+ ## Dynamic Programming Approach Explanation
50
+
51
+ Let’s take a simple example of finding minimum edit distance between
52
+ strings ` ME ` and ` MY ` . Intuitively you already know that minimum edit distance
53
+ here is ` 1 ` operation and this operation. And it is a replacing ` E ` with ` Y ` . But
54
+ let’s try to formalize it in a form of the algorithm in order to be able to
55
+ do more complex examples like transforming ` Saturday ` into ` Sunday ` .
56
+
57
+ To apply the mathematical formula mentioned above to ` ME → MY ` transformation
58
+ we need to know minimum edit distances of ` ME → M ` , ` M → MY ` and ` M → M ` transformations
59
+ in prior. Then we will need to pick the minimum one and add _ one_ operation to
60
+ transform last letters ` E → Y ` . So minimum edit distance of ` ME → MY ` transformation
61
+ is being calculated based on three previously possible transformations.
62
+
63
+ To explain this further let’s draw the following matrix:
64
+
65
+ ![ Levenshtein Matrix] ( https://cdn-images-1.medium.com/max/1600/1*2d46ug_PL5LfeqztckoYGw.jpeg )
66
+
67
+ - Cell ` (0:1) ` contains red number 1. It means that we need 1 operation to
68
+ transform ` M ` to an empty string. And it is by deleting ` M ` . This is why this number is red.
69
+ - Cell ` (0:2) ` contains red number 2. It means that we need 2 operations
70
+ to transform ` ME ` to an empty string. And it is by deleting ` E ` and ` M ` .
71
+ - Cell ` (1:0) ` contains green number 1. It means that we need 1 operation
72
+ to transform an empty string to ` M ` . And it is by inserting ` M ` . This is why this number is green.
73
+ - Cell ` (2:0) ` contains green number 2. It means that we need 2 operations
74
+ to transform an empty string to ` MY ` . And it is by inserting ` Y ` and ` M ` .
75
+ - Cell ` (1:1) ` contains number 0. It means that it costs nothing
76
+ to transform ` M ` into ` M ` .
77
+ - Cell ` (1:2) ` contains red number 1. It means that we need 1 operation
78
+ to transform ` ME ` to ` M ` . And it is be deleting ` E ` .
79
+ - And so on...
80
+
81
+ This looks easy for such small matrix as ours (it is only ` 3x3 ` ). But here you
82
+ may find basic concepts that may be applied to calculate all those numbers for
83
+ bigger matrices (let’s say ` 9x7 ` one, for ` Saturday → Sunday ` transformation).
84
+
85
+ According to the formula you only need three adjacent cells ` (i-1:j) ` , ` (i-1:j-1) ` , and ` (i:j-1) ` to
86
+ calculate the number for current cell ` (i:j) ` . All we need to do is to find the
87
+ minimum of those three cells and then add ` 1 ` in case if we have different
88
+ letters in ` i ` 's row and ` j ` 's column.
89
+
90
+ You may clearly see the recursive nature of the problem.
91
+
92
+ ![ Levenshtein Matrix] ( https://cdn-images-1.medium.com/max/2000/1*JdHQ5TeKiDlE-iKK1s_2vw.jpeg )
93
+
94
+ Let's draw a decision graph for this problem.
95
+
96
+ ![ Minimum Edit Distance Decision Graph] ( https://cdn-images-1.medium.com/max/1600/1*SGwYUpXH9H1xUeTvJk0e7Q.jpeg )
97
+
98
+ You may see a number of overlapping sub-problems on the picture that are marked
99
+ with red. Also there is no way to reduce the number of operations and make it
100
+ less then a minimum of those three adjacent cells from the formula.
101
+
102
+ Also you may notice that each cell number in the matrix is being calculated
103
+ based on previous ones. Thus the tabulation technique (filling the cache in
104
+ bottom-up direction) is being applied here.
105
+
106
+ Applying this principles further we may solve more complicated cases like
107
+ with ` Saturday → Sunday ` transformation.
108
+
109
+ ![ Levenshtein distance] ( https://cdn-images-1.medium.com/max/1600/1*fPEHiImYLKxSTUhrGbYq3g.jpeg )
110
+
43
111
## References
44
112
45
113
- [ Wikipedia] ( https://en.wikipedia.org/wiki/Levenshtein_distance )
46
114
- [ YouTube] ( https://www.youtube.com/watch?v=We3YDTzNXEk&list=PLLXdhg_r2hKA7DPDsunoDZ-Z769jWn4R8 )
115
+ - [ ITNext] ( https://itnext.io/dynamic-programming-vs-divide-and-conquer-2fea680becbe )
0 commit comments