Piecewise Linear Data Scaling

The following was first developed to fit the Choquet integral to journal ranking data.  The Choquet integral requires its inputs to be commensurable. However, the distributions of journal indices (e.g. impact factor) are such that the usual scaling techniques, such as normalisation, standardisation or functional transformations, have little ability to distinguish between journals of differing quality.  What we want is for a journal with 0.8 across all categories to have an output of 0.8 (idempotency).


Relevant Papers

Pseudocode

1. Read the data as x1 … xn y

2. For each variable, calculate the median for each y-label, sort the medians, and associate each with the desired output.

3. Define the transformation function u(x) as the piecewise-linear function which interpolates the medians.

4. Transform the data and output as a new table.
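
As a small worked example of steps 2 and 3 for a single variable, the sketch below uses base R's approxfun. The data values and the desired outputs (0.25, 0.5, 0.75, 1) are made up for illustration, and anchoring the variable's minimum at 0 is an assumption taken from the first branch of the script further down, not something the steps above state.

```r
# Toy values for one variable and its labels (made up for illustration)
x <- c(0.5, 1.2, 2.0, 3.1, 4.0, 6.5, 8.0, 9.5)
y <- c("Y1", "Y1", "Y2", "Y2", "Y3", "Y3", "Y4", "Y4")

# Step 2: median of x within each label, sorted into ascending order,
# then paired with the desired outputs (assumed example values)
med <- sort(sapply(split(x, y), median))
out <- c(0.25, 0.5, 0.75, 1)

# Step 3: u(x) interpolates (min(x), 0), (med[1], out[1]), ..., (med[4], out[4]).
# rule = 2 holds the end values outside the knots, so u(x) = 1 above med[4].
u <- approxfun(c(min(x), med), c(0, out), rule = 2)

u(med)   # each class median maps to its desired output
u(3.9)   # halfway between the Y2 and Y3 medians, so 0.625
```

With this construction a value sitting exactly on a class median receives that class's output, which is what lets the scaled inputs be compared across variables.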

R-implementation

Required input:

Change the names of the input file and output file, and the class values.

#read the data (the file needs a header row, with the label column named y)
data<-read.table("documents/R/thedata.txt",header=TRUE)
#create the median/average matrix
mdat<-matrix(NA,nrow=4,ncol=ncol(data))
# fill the matrix with the class medians of each variable
# need to change Y1, Y2 etc. to your label values, in ascending order
# this is set for 4 labels (rows) and 4 variables (columns 1 to 4); to add more, add lines manually
for(i in 1:4){
mdat[1,i]<-median(split(data,data$y)$`Y1`[,i])
mdat[2,i]<-median(split(data,data$y)$`Y2`[,i])
mdat[3,i]<-median(split(data,data$y)$`Y3`[,i])
mdat[4,i]<-median(split(data,data$y)$`Y4`[,i])}
# enforce monotonicity of the matrices
for(i in 1:4){ mdat[,i]<-sort(mdat[,i])}
#alternative to this is this (j is number of iterations)
## for(i in 1:4 j in 1:10){ mdat[,i]<-sort(mdat[,i])}
#create new data set
data1<-data
n <- nrow(data1)
# desired outputs for the four class medians, in ascending order
# (example values only: replace with your own class values; the top value is 1)
yval<-c(0.25,0.5,0.75,1)
for(i in 1:n) for(j in 1:4) {
x<-data[i,j]
lo<-min(data[,j])
data1[i,j]<-
if(x<mdat[1,j])
yval[1]*(x-lo)/(mdat[1,j]-lo)
else if(x<mdat[2,j])
yval[1]+(yval[2]-yval[1])*(x-mdat[1,j])/(mdat[2,j]-mdat[1,j])
else if(x<mdat[3,j])
yval[2]+(yval[3]-yval[2])*(x-mdat[2,j])/(mdat[3,j]-mdat[2,j])
else if(x<mdat[4,j])
yval[3]+(yval[4]-yval[3])*(x-mdat[3,j])/(mdat[4,j]-mdat[3,j])
else yval[4]}
 #plot utility functions
plot(data[,1],data1[,1])
 #write to a file
write.table(data1,"newdata.txt")
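
The script above is written out by hand for four labels and four variables. A more general version could look like the sketch below. The function pw_scale and its arguments are hypothetical names, not part of the original script, and the desired outputs passed in are assumed example values. It relies on base R's approx, with ties = max so that when two knots coincide the higher output is used.

```r
# Hypothetical helper: scales every input column of `data` using the
# class medians computed from the label column `label`.
# `outputs` holds the desired values, one per class, in ascending order.
pw_scale <- function(data, outputs, label = "y") {
  vars <- setdiff(names(data), label)
  stopifnot(length(outputs) == length(unique(data[[label]])))
  scaled <- data
  for (v in vars) {
    # class medians, sorted to enforce monotonicity
    med <- sort(sapply(split(data[[v]], data[[label]]), median))
    # knots: (column minimum, 0) followed by (median_k, output_k);
    # rule = 2 keeps values above the top median at the last output
    scaled[[v]] <- approx(c(min(data[[v]]), med), c(0, outputs),
                          xout = data[[v]], rule = 2, ties = max)$y
  }
  scaled
}

# Usage on a small made-up table
toy <- data.frame(x1 = 1:8,
                  x2 = c(10, 12, 30, 34, 50, 58, 90, 94),
                  y  = rep(c("Y1", "Y2", "Y3", "Y4"), each = 2))
toy_scaled <- pw_scale(toy, outputs = c(0.25, 0.5, 0.75, 1))
toy_scaled
```

Adding a label or a variable then needs no change to the code, only a longer outputs vector or an extra column in the table.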