This is part of an algorithm developed by Wouter Duivesteijn as part of a Masters Thesis. Description of the algorithm can be found here. Duivesteijn look at the problem in terms of monotonicity constraints in the context of nearest neighbour classification. In the past, we’ve looked at nearest neighbour algorithms for data approximation, so we’ll probably look at the data towards that.

After the monotone violation graph is created, Duivesteijn and Feelders employ a max flow optimisation approach in order to relabel the monotone data.

The R-code needs a couple of things that might be automatically available in matlab, i.e. you need to define the partial order for multivariate data, you also need to make use of the setdiff function in order to filter out the data. With these, the code is reasonably easy to implement.

I know the pseudo-algorithm given by the original authors is better from a comprehension point of view, but i’ll just summarise for the sake of the R code:

**Pseudocode**

1. read the training data as x1 … xn y-label

2. normalise the xi (columns of the data – not the labels)

3. define a function which specifies the partial order over X^n

4. mark 1s in the MVG matrix based on the partial order

5. create a vector containing each of the non-violating points and filter these entries from the matrix

6. from the filtered matrix, create an edges array that can be used in the optomisation algorithm

**R-implementation**

required input: training data

# first read data D <- read.table("documents/R/train1.txt") # normalise columns normD <- D for(i in 1:nrow(D)) for(j in 1:(ncol(D)-1)) {normD[i,j]<-(D[i,j]-mean(D[,j]))/sd(D[,j])} #define the partial order parto<-function(x,y){ if(sum(as.numeric(x[1:n]<=y[1:n]))==n) {1} else {0} } # prepare mvg matrix mvgAdjMatrix <-matrix(0,nrow(D),nrow(D)) #fill mvgAdjMatrix for(i in 1:nrow(D)) for(j in 1:nrow(D)) { mvgAdjMatrix[i,j]<-parto(D[i,],D[j,])&(D[i,ncol(D)]>D[j,ncol(D)]) } # make the nonviolating points vector nonViolatingPoints<-array(0,0) for(i in 1:nrow(D)) { if( sum(mvgAdjMatrix[i,])+sum(mvgAdjMatrix[,i]) ==0) { nonViolatingPoints<-cbind(nonViolatingPoints,i)} } # remove the nonviolatingpoints from mvg matrix tBR<-array(1:nrow(D)) mvgAdjMatrix <- mvgAdjMatrix[setdiff(tBR,nonViolatingPoints),setdiff(tBR,nonViolatingPoints)] # create the edges array that is used by the opt alg mvgEdges<-matrix(0,0,2) a2<-array(0,2) for(i in 1:nrow(mvgAdjMatrix)) for (j in 1:nrow(mvgAdjMatrix)) { if(mvgAdjMatrix[i,j]==1) { a2[1]<-i ; a2[2]<-j ; mvgEdges<-rbind(mvgEdges,a2)} }

Pingback: Monotonicity Check | Aggregation Functions