I implemented the k-nearest neighbours algorithm, but my experience with MATLAB is limited. Could you check this small portion of code and tell me what can be improved or modified? I hope it is a correct implementation of the algorithm.
function test_data = knn(test_data, tr_data, k)
    numoftestdata = size(test_data, 1);
    numoftrainingdata = size(tr_data, 1);
    for sample = 1:numoftestdata
        % Step 1: compute the Euclidean distance from this test sample to every training sample
        R = repmat(test_data(sample,:), numoftrainingdata, 1);
        euclideandistance = (R(:,1) - tr_data(:,1)).^2;
        % Step 2: compute the k nearest neighbours and store them in an array
        [dist, position] = sort(euclideandistance, 'ascend');
        knearestneighbors = position(1:k);
        knearestdistances = dist(1:k);
        % Step 3: voting
        for i = 1:k
            A(i) = tr_data(knearestneighbors(i), 2);
        end
        M = mode(A);
        if (M ~= 1)
            test_data(sample,2) = mode(A);
        else
            test_data(sample,2) = tr_data(knearestneighbors(1), 2);
        end
    end
To test it you can use:

- test_data = [6,0; 2,0; 5,0]
- tr_data = [1,1; 0,2; 3,2; 4,4; 5,3]
1 Answer
- Use consistent indentation.
- You switch from extremely verbose, all-lowercase variable names like numoftrainingdata to single-letter capitalized names like A. Make your variable names descriptive and no longer than necessary, and be consistent.
- Use consistent white space between operators.
- knn() doesn't need the second column of test_data, and the calling function doesn't need the first column of test_data. Rather than calling the function like this:

      test_data = knn(test_data, tr_data, k);

  call it like this:

      test_data(:,2) = knn(test_data(:,1), tr_data, k);

- Do you want to handle certain error conditions, like k <= 0 or k > size(tr_data,1)?
- Rather than squaring the distance, you can use abs().
- Remove the 'ascend' parameter from sort(); that is the default mode.
- knearestdistances is unused.
- You call mode() a second time rather than using M.
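As a sketch of the error handling suggested above (the error identifier and message text here are just examples, not the only choice), a guard at the top of knn() could look like:

```matlab
% reject a k that is not a positive whole number, or that exceeds
% the number of training samples available to vote
if ~isscalar(k) || k < 1 || k ~= fix(k) || k > size(tr_data, 1)
    error('knn:invalidK', ...
        'k must be a positive integer no larger than size(tr_data,1).');
end
```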
Simplify:
    [dist position] = sort(euclideandistance,'ascend');
    knearestneighbors = position(1:k);
    knearestdistances = dist(1:k);
    for i=1:k
        A(i) = tr_data(knearestneighbors(i),2);
    end
    M = mode(A);
    if (M~=1)
        test_data(sample,2) = mode(A);
    else
        test_data(sample,2) = tr_data(knearestneighbors(1),2);
    end
to
    [~, position] = sort(euclideandistance);
    A = tr_data(position(1:k), 2);
    M = mode(A);
    if (M ~= 1)
        test_data(sample,2) = M;
    else
        test_data(sample,2) = tr_data(position(1), 2);
    end
After applying the above suggestions and vectorizing the function, you could write it as:

function out_data = knn(test_data, tr_data, k)
    test_data_n = size(test_data, 1);
    tr_data_n = size(tr_data, 1);
    % absolute distance between all test and training data
    dist = abs(repmat(test_data, 1, tr_data_n) - repmat(tr_data(:,1)', test_data_n, 1));
    % indices of nearest neighbors
    [~, nearest] = sort(dist, 2);
    % k nearest
    nearest = nearest(:, 1:k);
    % mode of k nearest
    val = reshape(tr_data(nearest,2), [], k);
    out_data = mode(val, 2);
    % if mode is 1, output nearest instead
    out_data(out_data == 1) = val(out_data == 1, 1);
end
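As a quick sanity check (a sketch using the sample data from the question, assuming the vectorized knn() above is defined):

```matlab
k = 3;
tr_data = [1,1; 0,2; 3,2; 4,4; 5,3];
test_data = [6,0; 2,0; 5,0];
% pass only the first column of the test data, as suggested above
test_data(:,2) = knn(test_data(:,1), tr_data, k);
% for these inputs each test sample is labelled 2, so test_data
% ends up as [6,2; 2,2; 5,2]; note that mode() breaks ties by
% returning the smallest value, which decides the points whose
% three nearest labels are all distinct
```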
Edit
Regarding correctness, I'm not sure why you check whether the mode is 1. There is nothing unique about a mode of 1 in general.
- I made a mistake: if the frequency of the mode is one, that means all values of val are unique, so we choose the nearest. – user21479, Apr 3, 2014

      [M, F] = mode(knntrdata);
      if (F ~= 1)
          test_data(sample,2) = M;
      else
          test_data(sample,2) = tr_data(position(1),2);
      end

  – user21479, Apr 3, 2014
- Is it necessary to vectorize the function? – user21479, Apr 3, 2014
- @ALJIMohamed Yes, that code looks correct for checking whether all the values are unique. Vectorization is only necessary if your code is too verbose or slow. The only part of your code that needs to be vectorized is the assignment of A, because it is too verbose; I vectorized the rest as an example. – Bob65536, Apr 3, 2014