Here is an algorithm from Feelders, who, together with various co-authors, has made some of the most important contributions to the optimal relabeling of data to make it monotone. Previous implementations required identifying the largest independent set in the graph of monotonicity violations; this one is more straightforward, I think, and lends itself quite easily to the regression case – although to prove that the solution is optimal there, I'll have to work out whether intermediate values between the set of outputs suffice or whether it's more complicated. My intuition is that relabeling over the set of outputs, or using an intermediate point such as the average, would be enough to minimize the loss in terms of least squares or LAD. At any rate, this algorithm takes a data set and, for each of the available class labels, finds a closed set of points (closed with respect to the monotone order on (**x**_1, **x**_2, etc.)) for which the minimum loss occurs if the whole set were relabeled one label higher. It then assigns the current label to all instances in the complement. The theory behind this is that the loss in going from one such set to the next should be convex, so the set "preceding" this one should be optimal.

In order to find the set, the algorithm makes use of a transformation of the maximum flow/min cut problem. Although a binary program is NP-hard to solve in general, this one can be posed as a linear (and binary) program whose relaxation still gives binary solutions, by making use of the fact that the constraint graph is defined by a partial order. All of this really is very clever in my opinion, and the algorithm is quite efficient. Also included below are some loops that list the monotonicity violations and the comparable pairs – this helps validate some of the analyses in Feelders' papers and elsewhere. In the future I'll aim to run these in the same loop to save on running time, but at the moment it's quick enough for it not to matter too much.
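As a small sanity check on that claim, here is a sketch (in Python with SciPy, and with made-up weights for a three-point chain) of the minimal-upper-set program: membership indicators must satisfy z_i ≤ z_j whenever x_i ≤ x_j, and because those constraints form a network matrix, the plain LP relaxation already comes back with a 0/1 solution.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical weights: increase in loss if point i is relabelled from l to l+1
w = np.array([1.0, -3.0, 2.0])

# Partial-order constraints for a chain x1 <= x2 <= x3:
# each row encodes z_i - z_j <= 0, i.e. z_i <= z_j
A = np.array([[1, -1, 0],
              [1, 0, -1],
              [0, 1, -1]], dtype=float)
b = np.zeros(3)

# LP relaxation: 0 <= z <= 1 instead of z in {0, 1}
res = linprog(w, A_ub=A, b_ub=b, bounds=[(0, 1)] * 3, method="highs")
print(res.x)    # integral despite the relaxation: [0. 1. 1.]
print(res.fun)  # -1.0, the weight of the upper set {2, 3}
```

The optimal upper set here is the last two points: raising them together costs −3 + 2 = −1, while including the first point would add +1.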

The pseudocode is in the original article, but I'll give a rephrased version here:

**Pseudocode**

1. Read the data

2. Calculate n, and add an ordering column so that the data can be re-sorted into its original order later (optional).

3. Define the partial order function (componentwise ≤ for vectors) and the empirical loss function, which in this case is just the squared difference between l and the original label. The y labels need to be 1, 2, 3, … etc.

4. For each label l, work out the cost if each data point is relabeled from l to l+1, set up a constraint matrix, and solve the binary program that minimizes the total weight and finds the minimal upper set.

5. For each element in the complement of the minimal upper set, add it to a new list with label l.

6. Repeat on the data that is left over.

7. Then there are various analyses – how many changes are made, which instances are comparable, and the monotonicity violations.
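The steps above can be sketched end to end. This is a toy Python version (the four data points and their labels are made up), which brute-forces the minimal upper set over all subsets instead of solving the binary program – fine for a handful of points, and it makes the logic of steps 4–6 easy to follow:

```python
from itertools import combinations

# Toy data: feature vectors and labels in 1..classes (made-up values)
X = [(1, 1), (1, 2), (2, 1), (2, 2)]
y = [2, 1, 1, 2]
classes = max(y)

def parto(a, b):
    """Partial order: a <= b componentwise."""
    return all(ai <= bi for ai, bi in zip(a, b))

def is_upper_set(S, idx):
    # S is upper-closed: if i is in S and x_i <= x_j, then j is in S too
    return all(j in S for i in S for j in idx if parto(X[i], X[j]))

idx = list(range(len(X)))
new_label = {}
for l in range(1, classes):
    # weight: change in squared loss if point i is relabelled from l to l+1
    w = {i: (l + 1 - y[i]) ** 2 - (l - y[i]) ** 2 for i in idx}
    # minimal upper set by brute force (smallest sets tried first)
    best = min((S for r in range(len(idx) + 1)
                for S in map(set, combinations(idx, r))
                if is_upper_set(S, idx)),
               key=lambda S: sum(w[i] for i in S))
    for i in set(idx) - best:      # the complement gets label l
        new_label[i] = l
    idx = sorted(best)             # repeat on what is left
for i in idx:                      # leftovers get the top label
    new_label[i] = classes
print([new_label[i] for i in range(len(X))])  # -> [1, 1, 1, 2]
```

On this toy set the point at (1, 1) with label 2 violates monotonicity against its two successors with label 1; lowering it to 1 costs 1, whereas raising the other two would cost 2, so the algorithm lowers it.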

**R implementation**

```r
## read the data and set up the number of variables and classes
## values are read as x1 x2 ... xn y
library(lpSolve)
D <- read.table("documents/R/train1.txt")
## store an original copy of the data
D.orig <- D
n <- ncol(D) - 1
ycol <- n + 1
classes <- max(D[, ycol])
## the output data set
Dnew <- array(0, 0)
## optional: an ordering column, if you want the output in the original order
D <- cbind(D, array(1:nrow(D)))

## empirical loss function
empLoss <- function(i, l) {(l - D[i, ycol])^2}
## partial order function (componentwise <=)
parto <- function(x, y) {if (sum(as.numeric(x[1:n] <= y[1:n])) == n) {1} else {0}}

## it's necessary for the classes to be 1,2,3,... etc.
for (l in 1:(classes - 1)) {
  ## cost array for each vector: increase in loss in going from l to l+1
  wtLoss <- array(0, nrow(D))
  for (i in 1:nrow(D)) {wtLoss[i] <- empLoss(i, (l + 1)) - empLoss(i, l)}
  ## set up the monotonicity constraints
  ## (this changes for every minimization problem, as D keeps changing)
  all.const <- array(0, 0)
  for (i in 1:nrow(D)) for (j in 1:nrow(D)) {
    if (i != j && parto(D[i, ], D[j, ]) == 1) {
      const <- array(0, nrow(D))
      const[i] <- 1
      const[j] <- -1
      all.const <- rbind(all.const, const)
    }
  }
  const.values <- array(0, nrow(all.const))
  ## obtain the minimal upper set
  MinUpperSet <- lp(direction = "min", wtLoss, all.const, "<=", const.values,
                    all.bin = TRUE)$solution
  ## lower set to relabel
  LowSet <- array(0, 0)
  for (i in 1:length(MinUpperSet)) {if (MinUpperSet[i] == 0) LowSet <- c(LowSet, i)}
  for (i in LowSet) {Dnew <- rbind(Dnew, c(D[i, 1:n], l, D[i, (ycol + 1)]))}
  D <- D[setdiff(1:nrow(D), LowSet), ]
}
Dnew <- rbind(Dnew, D)
## optional (if using an ordering column, reorder and then drop the column)
Dnew <- Dnew[order(Dnew[, ncol(Dnew)]), ]
Dnew <- Dnew[, 1:ycol]

## analysis of the changes made
relabel.vec <- D.orig[, ycol] - Dnew[, ycol]
## the "cost" (L1) of the relabelling
relabel.cost <- sum(abs(relabel.vec))
## the number of points relabelled
relabel.num <- nrow(Dnew) - length(relabel.vec[relabel.vec == 0])

## monotonicity pairs on the original data set
## violating points - includes same x with different labels
ViolationsList <- array(0, 0)
for (i in 1:nrow(D.orig)) for (j in 1:nrow(D.orig)) {
  if (parto(D.orig[i, ], D.orig[j, ]) == 1 && D.orig[i, ycol] > D.orig[j, ycol]) {
    ViolationsList <- rbind(ViolationsList, c(i, j))
  }
}
## number of monotonicity violations
violations.num <- nrow(ViolationsList)
## comparable pairs
ComparableList <- array(0, 0)
for (i in 1:nrow(D.orig)) for (j in 1:nrow(D.orig)) {
  if (i != j && parto(D.orig[i, ], D.orig[j, ]) == 1) {
    ComparableList <- rbind(ComparableList, c(i, j))
  }
}
Comparable.num <- nrow(ComparableList)
```
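For a quick cross-check of the comparable-pair and violation counts, here is a brute-force sketch in Python on a made-up three-point data set (values chosen only for illustration):

```python
# Toy data: three points, one monotonicity violation
X = [(1, 1), (1, 2), (2, 2)]
y = [2, 1, 2]

def parto(a, b):
    """Partial order: a <= b componentwise."""
    return all(ai <= bi for ai, bi in zip(a, b))

# comparable pairs: distinct i, j with x_i <= x_j
pairs = [(i, j) for i in range(len(X)) for j in range(len(X))
         if i != j and parto(X[i], X[j])]
# violations: comparable pairs where the smaller point has the larger label
violations = [(i, j) for i, j in pairs if y[i] > y[j]]
print(len(pairs), len(violations))  # -> 3 1
```

Here (1, 1) precedes both other points and (1, 2) precedes (2, 2), giving three comparable pairs, of which only (1, 1) with label 2 against (1, 2) with label 1 is a violation.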
