There are already a number of kNN implementations in R, and GBeliakov and I have already used his algorithms with a Choquet integral aggregation of the k-nearest neighbours. However, since any future kNN I write will need to be flexible, starting with basic code that I can then manipulate seemed like a reasonable idea.

The code needs a function for the statistical mode, which strangely isn’t built into R (the base `mode()` function returns an object’s storage type instead). kNN also usually needs a random tie-break when there are multiple modes, so the function should return all of them. Finally, the training points have to be sorted by distance once the distances are found.
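
As a small illustration of those three pieces, here is a minimal sketch on made-up values. The vectors `labels` and `d` are hypothetical and only there to show the behaviour; `table()`, `sample()` and `order()` are base R.

```r
# Return every value that attains the highest count (all modes).
# table() stores the values as names, so the result is a character vector.
all.modes <- function(x) {
  counts <- table(as.vector(x))
  names(counts)[counts == max(counts)]
}

labels <- c(1, 2, 2, 3, 3)      # hypothetical neighbour labels: 2 and 3 tie
modes <- all.modes(labels)      # "2" "3"

# Random tie-break: pick one of the tied modes and convert back to a number
winner <- as.numeric(sample(modes, 1))

# order() gives the row indices sorted by increasing distance
d <- c(0.9, 0.1, 0.5)           # hypothetical distances
nearest_first <- order(d)       # 2 3 1
```

Because the modes come back as character strings, they need `as.numeric()` before being stored as a numeric class label.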

**Some publications which used kNN**

- Using Choquet integrals for kNN approximation and Classification
- Texture recognition by using GLCM and various aggregation functions

**R-implementation**

Required input: training data, test data, and a value of k.

    # Mode function: returns all modes (every value with the highest count)
    all.modes <- function(x) {
      z <- table(as.vector(x))
      names(z)[z == max(z)]
    }

    # Read the data (the last column of each file is treated as the class label)
    train1 <- read.table("documents/R/train1.txt")
    test1 <- read.table("documents/R/test1.txt")
    k <- 5

    # Output: the test data plus one extra column for the predicted class
    output1 <- cbind(test1, NA)

    for (h in 1:nrow(test1)) {
      # Working matrix: column 1 = training label, column 2 = distance
      trainwork <- matrix(NA, nrow = nrow(train1), ncol = 2)
      trainwork[, 1] <- train1[, ncol(train1)]

      # Squared distance from test point h to each training point,
      # with the label column's contribution subtracted out again
      for (i in 1:nrow(trainwork)) {
        trainwork[i, 2] <- sum((test1[h, ] - train1[i, ])^2) -
          (test1[h, ncol(test1)] - train1[i, ncol(train1)])^2
      }

      # Reorder by distance
      ordtrain <- trainwork[order(trainwork[, 2]), ]

      # Keep only the k nearest rows
      ktrainwork <- matrix(NA, nrow = k, ncol = 2)
      for (i in 1:k) {
        ktrainwork[i, ] <- ordtrain[i, ]
      }

      # Take the modes, break ties at random, and add to the output
      output1[h, ncol(output1)] <- as.numeric(sample(all.modes(ktrainwork[, 1]), 1))
    }
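
Since the point is to have code that can be manipulated later, the same steps could also be wrapped in a function. The following is only a sketch under some assumptions: the name `knn_classify` and the toy data are hypothetical, both inputs are assumed to be numeric with the class label in the last column, and the inner distance loop is replaced by base R's `sweep()` and `rowSums()`.

```r
# Mode function: returns all modes as character strings
all.modes <- function(x) {
  counts <- table(as.vector(x))
  names(counts)[counts == max(counts)]
}

# Hypothetical wrapper: `train` and `test` are numeric data frames or matrices
# whose last column is the class label. Returns one predicted label per test row.
knn_classify <- function(train, test, k) {
  train <- as.matrix(train)
  test <- as.matrix(test)
  p <- ncol(train) - 1                          # number of feature columns
  train_x <- train[, seq_len(p), drop = FALSE]  # features only
  train_y <- train[, p + 1]                     # class labels
  vapply(seq_len(nrow(test)), function(h) {
    # Squared Euclidean distance from test row h to every training row
    diffs <- sweep(train_x, 2, test[h, seq_len(p)])
    d2 <- rowSums(diffs^2)
    # Labels of the k nearest training points
    nearest <- train_y[order(d2)[seq_len(k)]]
    # Mode of those labels, with a random tie-break
    as.numeric(sample(all.modes(nearest), 1))
  }, numeric(1))
}

# Toy data (made up): two well-separated clusters
train <- data.frame(x = c(0, 0.1, 0.2, 5, 5.1, 5.2),
                    y = c(0, 0.2, 0.1, 5, 5.2, 5.1),
                    class = c(1, 1, 1, 2, 2, 2))
test <- data.frame(x = c(0.05, 5.05), y = c(0.1, 5.1), class = c(1, 2))

pred <- knn_classify(train, test, k = 3)
```

With this toy data and `k = 3`, the three nearest neighbours of each test point all belong to one class, so no tie-break is triggered. Swapping the distance or the aggregation step then only means changing one line inside the function.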
