
Commit 871d20d

Authored Dec 16, 2020
Adding K Nearest Neighbor to ML folder in algorithms with README and tests (#592)
* Updated KNN and README
* Update README.md
* new
* new
* updated tests
* updated knn coverage
1 parent 802557f commit 871d20d

File tree: 4 files changed (+132 lines, -6 lines)


README.md (+7 -6)
@@ -23,7 +23,7 @@ _Read this in other languages:_
  [_Türk_](README.tr-TR.md),
  [_Italiana_](README.it-IT.md)

- *☝ Note that this project is meant to be used for learning and researching purposes
+ *☝ Note that this project is meant to be used for learning and researching purposes
  only, and it is **not** meant to be used for production.*

  ## Data Structures

@@ -64,7 +64,7 @@ a set of rules that precisely define a sequence of operations.

  * **Math**
    * `B` [Bit Manipulation](src/algorithms/math/bits) - set/get/update/clear bits, multiplication/division by two, make negative etc.
-   * `B` [Factorial](src/algorithms/math/factorial)
+   * `B` [Factorial](src/algorithms/math/factorial)
    * `B` [Fibonacci Number](src/algorithms/math/fibonacci) - classic and closed-form versions
    * `B` [Prime Factors](src/algorithms/math/prime-factors) - finding prime factors and counting them using Hardy-Ramanujan's theorem
    * `B` [Primality Test](src/algorithms/math/primality-test) (trial division method)

@@ -80,7 +80,7 @@ a set of rules that precisely define a sequence of operations.
    * `A` [Integer Partition](src/algorithms/math/integer-partition)
    * `A` [Square Root](src/algorithms/math/square-root) - Newton's method
    * `A` [Liu Hui π Algorithm](src/algorithms/math/liu-hui) - approximate π calculations based on N-gons
-   * `A` [Discrete Fourier Transform](src/algorithms/math/fourier-transform) - decompose a function of time (a signal) into the frequencies that make it up
+   * `A` [Discrete Fourier Transform](src/algorithms/math/fourier-transform) - decompose a function of time (a signal) into the frequencies that make it up
  * **Sets**
    * `B` [Cartesian Product](src/algorithms/sets/cartesian-product) - product of multiple sets
    * `B` [Fisher–Yates Shuffle](src/algorithms/sets/fisher-yates) - random permutation of a finite sequence

@@ -142,12 +142,13 @@ a set of rules that precisely define a sequence of operations.
    * `B` [Polynomial Hash](src/algorithms/cryptography/polynomial-hash) - rolling hash function based on polynomial
    * `B` [Caesar Cipher](src/algorithms/cryptography/caesar-cipher) - simple substitution cipher
  * **Machine Learning**
-   * `B` [NanoNeuron](https://github.com/trekhleb/nano-neuron) - 7 simple JS functions that illustrate how machines can actually learn (forward/backward propagation)
+   * `B` [NanoNeuron](https://github.com/trekhleb/nano-neuron) - 7 simple JS functions that illustrate how machines can actually learn (forward/backward propagation)
+   * `B` [KNN](src/algorithms/ML/KNN) - K Nearest Neighbors
  * **Uncategorized**
    * `B` [Tower of Hanoi](src/algorithms/uncategorized/hanoi-tower)
    * `B` [Square Matrix Rotation](src/algorithms/uncategorized/square-matrix-rotation) - in-place algorithm
-   * `B` [Jump Game](src/algorithms/uncategorized/jump-game) - backtracking, dynamic programming (top-down + bottom-up) and greedy examples
-   * `B` [Unique Paths](src/algorithms/uncategorized/unique-paths) - backtracking, dynamic programming and Pascal's Triangle based examples
+   * `B` [Jump Game](src/algorithms/uncategorized/jump-game) - backtracking, dynamic programming (top-down + bottom-up) and greedy examples
+   * `B` [Unique Paths](src/algorithms/uncategorized/unique-paths) - backtracking, dynamic programming and Pascal's Triangle based examples
    * `B` [Rain Terraces](src/algorithms/uncategorized/rain-terraces) - trapping rain water problem (dynamic programming and brute force versions)
    * `B` [Recursive Staircase](src/algorithms/uncategorized/recursive-staircase) - count the number of ways to reach to the top (4 solutions)
    * `A` [N-Queens Problem](src/algorithms/uncategorized/n-queens)

src/algorithms/ML/KNN/README.md (new file, +23)
# KNN Algorithm

KNN stands for K Nearest Neighbors. It is a supervised machine-learning classification algorithm: the class of an unlabelled sample vector is determined from a set of labelled sample data.

The idea is to measure the similarity between two data points with a distance metric, most commonly the Euclidean distance (a small worked sketch of this step follows after this file). The algorithm is:

1. Check for errors such as invalid data or labels.
2. Calculate the Euclidean distance from the classification point to every point in the training data.
3. Sort the distances, together with the classes of their points, in ascending order.
4. Take the first "K" classes and find their mode, i.e. the most frequent class.
5. Report that class as the prediction.

Here is a visualization for better understanding:

![KNN Visualization](https://media.geeksforgeeks.org/wp-content/uploads/graph2-2.png)

Here, as we can see, unknown points are classified by their proximity to the known ones.

Note that odd values of "K" are preferred in order to avoid ties; "K" is usually taken as 3 or 5.

## References

- [GeeksforGeeks](https://media.geeksforgeeks.org/wp-content/uploads/graph2-2.png)
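A minimal, self-contained sketch of the distance step referenced above (plain JavaScript; the helper name is our own and is not taken verbatim from the committed knn.js):

```javascript
// Euclidean distance between two equal-length vectors:
// the square root of the summed squared coordinate differences.
function euclideanDistance(a, b) {
  return Math.sqrt(a.reduce((sum, ai, i) => sum + (ai - b[i]) ** 2, 0));
}

// The query point [1.25, 1.25] is ~0.35 away from [1, 1] and ~2.47 away
// from [3, 3], so [1, 1] would be ranked as the nearer neighbour.
console.log(euclideanDistance([1.25, 1.25], [1, 1]).toFixed(2)); // '0.35'
console.log(euclideanDistance([1.25, 1.25], [3, 3]).toFixed(2)); // '2.47'
```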
KNN test file (new file, +42)
import KNN from '../knn';

describe('KNN', () => {
  it('should throw an error on invalid data', () => {
    expect(() => {
      KNN();
    }).toThrowError();
  });

  it('should throw an error on invalid labels', () => {
    const noLabels = () => {
      KNN([[1, 1]]);
    };
    expect(noLabels).toThrowError();
  });

  it('should throw an error when no classification vector is given', () => {
    const noClassification = () => {
      KNN([[1, 1]], [1]);
    };
    expect(noClassification).toThrowError();
  });

  it('should throw an error when the classification vector is inconsistent with the data', () => {
    const inconsistent = () => {
      KNN([[1, 1]], [1], [1]);
    };
    expect(inconsistent).toThrowError();
  });

  it('should find the nearest neighbour', () => {
    let dataX = [[1, 1], [2, 2]];
    let dataY = [1, 2];
    expect(KNN(dataX, dataY, [1, 1])).toBe(1);

    dataX = [[1, 1], [6, 2], [3, 3], [4, 5], [9, 2], [2, 4], [8, 7]];
    dataY = [1, 2, 1, 2, 1, 2, 1];
    expect(KNN(dataX, dataY, [1.25, 1.25])).toBe(1); // default k = 3
    expect(KNN(dataX, dataY, [1.25, 1.25], 5)).toBe(2); // k = 5 flips the majority
  });
});
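A note on the expectations in the last test: for the query point [1.25, 1.25], the seven training points sorted by Euclidean distance are [1, 1] (~0.35, class 1), [3, 3] (~2.47, class 1), [2, 4] (~2.85, class 2), [4, 5] (~4.65, class 2), [6, 2] (~4.81, class 2), [9, 2] (~7.79, class 1) and [8, 7] (~8.87, class 1). With k = 3 the nearest classes are 1, 1, 2, so the mode is 1; with k = 5 they are 1, 1, 2, 2, 2, so the prediction flips to 2.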

src/algorithms/ML/KNN/knn.js (new file, +60)
/**
 * Classifies a vector with the k-nearest-neighbours rule.
 *
 * @param {number[][]} dataX - training vectors
 * @param {number[]} dataY - class label of each training vector
 * @param {number[]} toClassify - the vector to classify
 * @param {number} [K] - number of neighbours to consider (defaults to 3)
 * @return {number} predicted class label
 */
export default function KNN(dataX, dataY, toClassify, K) {
  const k = K === undefined ? 3 : K;

  // Euclidean distance between two vectors of equal length.
  function euclideanDistance(x1, x2) {
    if (x1.length !== x2.length) {
      throw new Error('inconsistency between data and classification vector.');
    }
    let totalSSE = 0;
    for (let j = 0; j < x1.length; j += 1) {
      totalSSE += (x1[j] - x2[j]) ** 2;
    }
    return Number(Math.sqrt(totalSSE).toFixed(2));
  }

  // Calculate the distance from toClassify to every training vector and
  // store [distance, class label] pairs.
  let distanceList = [];
  for (let i = 0; i < dataX.length; i += 1) {
    distanceList[i] = [euclideanDistance(dataX[i], toClassify), dataY[i]];
  }

  // Sort the pairs by distance (numerically, not lexicographically) and
  // keep only the k nearest ones.
  distanceList = distanceList.sort((a, b) => a[0] - b[0]).slice(0, k);

  // Count how often each class occurs among the k nearest neighbours while
  // keeping track of the most frequent one seen so far.
  const modeK = {};
  const maxm = [-1, -1];
  for (let i = 0; i < distanceList.length; i += 1) {
    const label = distanceList[i][1];
    if (label in modeK) modeK[label] += 1;
    else modeK[label] = 1;
    if (modeK[label] > maxm[0]) {
      [maxm[0], maxm[1]] = [modeK[label], label];
    }
  }

  // Return the class with the highest count.
  return maxm[1];
}
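For completeness, a usage sketch of the new module; the import path below is illustrative and depends on where the calling file lives relative to src/algorithms/ML/KNN/knn.js:

```javascript
import KNN from './src/algorithms/ML/KNN/knn';

const features = [[1, 1], [2, 2], [3, 3]]; // training vectors
const labels = [1, 2, 2];                  // one class label per vector

// K defaults to 3 when the fourth argument is omitted, so all three
// training points vote here and class 2 wins two votes to one.
const predicted = KNN(features, labels, [2.5, 2.5]);
console.log(predicted); // 2
```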
