An extended paper on stability has been accepted to Fuzzy Sets and Systems. It is the full version of our conference paper submitted to AGOP in 2015. R code and some extra tables/data relating to the paper are available here.
Title: Approaches to learning strictly-stable weights for data with missing values
Authors: G. Beliakov, D. Gómez, S. James, J. Montero, J.T. Rodríguez
The problem of missing data is common in real-world applications of supervised machine learning such as classification and regression. Such data often gives rise to the need for functions defined for varying dimension. Here we propose optimization methods for learning the weights of quasi-arithmetic means in the context of data with missing values. We investigate some alternative approaches depending on the number of variables that have missing values and show results for several numerical experiments.
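To give a concrete flavour of the setting (this is an illustrative convention, not the learning method proposed in the paper), one simple way a weighted quasi-arithmetic mean can be extended to inputs of varying dimension is to drop missing values and renormalize the remaining weights. The sketch below assumes `None` marks a missing value; with the generator g = log it is the weighted geometric mean.

```python
import math

def qam_missing(x, w, g=math.log, g_inv=math.exp):
    """Weighted quasi-arithmetic mean that skips missing (None) inputs.

    The weights of the observed inputs are renormalized to sum to one,
    one natural way of defining the mean for varying dimension.
    With g = log this is the weighted geometric mean.
    """
    pairs = [(xi, wi) for xi, wi in zip(x, w) if xi is not None]
    total = sum(wi for _, wi in pairs)
    return g_inv(sum(wi / total * g(xi) for xi, wi in pairs))

full = qam_missing([2.0, 4.0, 8.0], [0.5, 0.25, 0.25])     # 2**1.75
partial = qam_missing([2.0, None, 8.0], [0.5, 0.25, 0.25])  # 2**(5/3)
```

Renormalizing in this way preserves idempotency, but (as the abstract suggests) which weights to learn, and how, changes depending on how many variables can go missing.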
A collaboration with Gleb and Marek has been accepted and published as part of a special issue on economic welfare in the International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems.
Title: Penalty-based and other representations of economic inequality
Authors: G. Beliakov, M. Gagolewski and S. James
Economic inequality measures are employed as a key component in various socio-demographic indices to capture the disparity between the wealthy and poor. Since their inception, they have also been used as a basis for modelling spread and disparity in other contexts. While recent research has identified that a number of classical inequality and welfare functions can be considered in the framework of OWA operators, here we propose a framework of penalty-based aggregation functions and their associated penalties as measures of inequality.
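The penalty-based view can be illustrated with the most familiar case: the arithmetic mean is the value minimizing the sum of squared deviations, and the minimized penalty itself (proportional to the variance) already behaves like a crude disparity measure. This toy sketch only conveys the idea; the specific inequality measures studied in the paper differ.

```python
def penalty(values, y):
    # squared-deviation penalty; the arithmetic mean is its minimizer
    return sum((v - y) ** 2 for v in values)

def min_penalty(values):
    """The penalty left over at the minimizing aggregate --
    zero under perfect equality, growing with disparity."""
    mean = sum(values) / len(values)
    return penalty(values, mean)  # equals n * variance

equal = min_penalty([5, 5, 5, 5])      # 0: no inequality
unequal = min_penalty([1, 2, 3, 14])   # positive: wealth is spread out
```

Different penalty functions lead to different aggregation functions and, correspondingly, different measures of spread, which is the framework the paper develops.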
Springer have released my book on aggregation functions. It is available for subscribing universities through SpringerLink and hardcover editions can also be ordered.
This is a book based on course material I’d prepared for a unit in our Master of Data Analytics at Deakin. The hope is that it’ll be accessible to those who don’t have a strong mathematics or programming background, but still treats the topic with some rigor, allowing intuition about some of the different concepts to be developed.
Title: An introduction to data analysis using aggregation functions in R
Download/access from Springer
Once again we were fortunate enough to be able to contribute to a special issue on consensus. This will be published in Fuzzy Optimization and Decision Making (a Springer journal), and the special issue is titled “Fuzzy Approaches in Intelligent Decision Making and Consensus”, edited by Enrique Herrera-Viedma, Francisco Chiclana, Francisco Cabrerizo and YuCheng Dong.
Title: Aggregation and consensus for preference relations based on fuzzy partial orders
Authors: G. Beliakov, S. James and T. Wilkin
We propose a framework for eliciting and aggregating pairwise preference relations based on the assumption of an underlying fuzzy partial order. We also propose some linear programming optimization methods for ensuring consistency either as part of the aggregation phase or as a pre- or post-processing task. We contend that this framework of pairwise-preference relations, based on the Kemeny distance, can be less sensitive to extreme or biased opinions and is also less complex to elicit from experts. We provide some examples and outline their relevant properties and associated concepts.
This is a talk I’ll be giving at ISAS in July this year (shortly after IPMU). I’m very much looking forward to the format of this symposium and to being able to share ideas with the other researchers in attendance.
Title: Elicitation of fuzzy partial orders from incomplete preferences
Author: S. James
Recently we have proposed the framework of fuzzy partial order based preference relations for expression and aggregation of preferences. For a set of n alternatives/options, a FPO-based preference relation is represented as an n × n matrix A where each entry aij represents the degree to which option i is preferred to option j. While this kind of representation has been researched extensively, e.g. with multiplicative and additive relations, the key difference here is that a value of aij = 1 is interpreted as indicating option i is preferred to j, a value of aij = 0 means that option i is not preferred to j and values in-between represent partial preference. We therefore have the restriction that aij > 0 implies aji = 0, and the maximum expression of strength of preference is only crisp preference.
The perceived advantage of such a representation is that the aggregation of such matrices is less susceptible to extreme opinions, corresponding with a fuzzy version of the Kemeny distance. It also should align more with a natural expression of preference that is not as dependent on individual interpretations of a ratings scale.
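A minimal sketch of these objects under the constraints stated above; the specific matrices are made up, and the entrywise-difference form of the fuzzy Kemeny distance is one illustrative choice (normalization conventions vary).

```python
def is_fpo_relation(A, tol=1e-9):
    """Check the FPO-based preference constraints: entries in [0, 1],
    and a[i][j] > 0 implies a[j][i] == 0."""
    n = len(A)
    for i in range(n):
        for j in range(n):
            if not (0.0 <= A[i][j] <= 1.0):
                return False
            if i != j and A[i][j] > tol and A[j][i] > tol:
                return False
    return True

def kemeny_distance(A, B):
    # a fuzzy analogue of the Kemeny distance: the sum of absolute
    # entrywise differences between two preference matrices
    return sum(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))

# two (hypothetical) experts' preferences over three options
A = [[0, 0.6, 1.0], [0, 0, 0.3], [0, 0, 0]]
B = [[0, 1.0, 1.0], [0, 0, 0.0], [0, 0.2, 0]]
```

Because every entry is capped at 1, a single expert expressing an extreme opinion can shift the aggregate by at most a crisp preference, which is the robustness the representation is after.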
While we have developed methods for obtaining final rankings of alternatives through aggregation and for repairing inconsistent matrices, a remaining problem is how to deal with large datasets involving many alternatives. In these situations, the elicitation of preferences becomes quite onerous on the decision maker and, on the computation side, the number of corresponding partial orders becomes prohibitively large. We propose to use a subset of triplets of comparison data, i.e. rankings provided between 3 alternatives, in order to obtain a final ranking of the alternatives. Our goal is to reduce the amount of information and effort required from the decision maker but still be able to obtain an acceptable ranking. Once the theory behind this process is developed, it can be evaluated on human subjects in terms of ease of preference elicitation and their agreement with the final ranking.
The FUZZ-IEEE conference is held during our teaching time so I wasn’t sure I would be able to attend this year. Fortunately my colleague and co-author, Tim Wilkin, is able to attend and present our paper there. It is based on research led by Gleb on robust aggregation. We have also submitted a paper on this topic to TFS. The R package used has been made available from the CRAN repository. Additional code can also be found here.
Title: Robust OWA-Based Aggregation for Data with Outliers
Authors: G. Beliakov, S. James, T. Wilkin and T. Calvo
We consider the problem of aggregating a large number of online ratings where there may be outliers representing biased, missing or erroneous evaluations. The penalty-based method we propose comprises both outlier detection and reallocation of weights. We focus on models dependent on the relative order of the inputs, i.e. those based on OWA operators; however, we also define the model for weighted means.
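As a simple illustration of why order-based weighting helps with outliers (the paper's method detects outliers and reallocates weight from the data; this sketch just fixes a trimmed-mean weighting in advance):

```python
def owa(x, w):
    """OWA operator: weights are applied to the inputs
    after sorting them in decreasing order."""
    xs = sorted(x, reverse=True)
    return sum(wi * xi for wi, xi in zip(w, xs))

def trimmed_mean_weights(n, k):
    """OWA weights for a k-trimmed mean: zero weight on the k largest
    and k smallest inputs, uniform weight on the rest -- one crude,
    order-based way to discount extreme ratings."""
    w = [0.0] * n
    for i in range(k, n - k):
        w[i] = 1.0 / (n - 2 * k)
    return w

ratings = [4.1, 4.3, 3.9, 4.2, 1.0]  # the last rating looks like an outlier
w = trimmed_mean_weights(len(ratings), 1)
robust = owa(ratings, w)  # close to 4, largely unaffected by the 1.0
```

Because the weights attach to positions in the sorted input rather than to particular raters, a single extreme rating lands in a zero-weight slot no matter who supplied it.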
Very much looking forward to attending the IPMU conference (Information Processing and Management of Uncertainty) in Eindhoven, Netherlands later in June this year. Over January I worked with Marek Gagolewski on two papers, and I have also submitted a paper based on the work in ecological optimisation done last year with Dale Nimmo and honours student Andrew Geschke.
Title: Fitting aggregation functions to data: Part I – Linearization and regularization
Authors: M. Bartoszuk, G. Beliakov, M. Gagolewski and S. James
The use of supervised learning techniques for fitting weights and/or generator functions of weighted quasi-arithmetic means – a special class of idempotent and nondecreasing aggregation functions – to empirical data has already been considered in a number of papers. Nevertheless, there are still some important issues that have not been discussed in the literature yet. In the first part of the two-part contribution we deal with the concept of regularization, a quite standard technique from machine learning applied so as to increase the fit quality on test and validation data samples. Due to the constraints on the weighting vector, it turns out that many more methods can be used in the current framework, as compared to regression models. Moreover, it is worth noting that so far fitting weighted quasi-arithmetic means to empirical data has only been performed approximately, via the so-called linearization technique. In this paper we consider exact solutions to such special optimization tasks and indicate cases where linearization leads to much worse solutions.
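To give a flavour of the optimization task in its simplest instance (a weighted arithmetic mean, i.e. the identity generator), the sketch below fits the weights by penalized least squares with projected gradient descent. The ridge penalty `lam` stands in generically for the regularization discussed in the paper, and the toy data are made up.

```python
def project_simplex(v):
    """Euclidean projection of v onto the probability simplex
    (nonnegative entries summing to one), via the standard sort-based rule."""
    u = sorted(v, reverse=True)
    css, theta = 0.0, 0.0
    for i, ui in enumerate(u):
        css += ui
        t = (css - 1.0) / (i + 1)
        if ui - t > 0:
            theta = t
    return [max(vi - theta, 0.0) for vi in v]

def fit_wam(X, y, lam=0.0, steps=2000, lr=0.05):
    """Fit the weights of a weighted arithmetic mean by least squares
    plus a ridge penalty lam, keeping w on the simplex throughout."""
    n = len(X[0])
    w = [1.0 / n] * n
    for _ in range(steps):
        grad = [2 * lam * wi for wi in w]
        for xk, yk in zip(X, y):
            err = sum(wi * xi for wi, xi in zip(w, xk)) - yk
            for i in range(n):
                grad[i] += 2 * err * xk[i]
        w = project_simplex([wi - lr * gi for wi, gi in zip(w, grad)])
    return w

# noiseless toy data generated from true weights (0.7, 0.3)
X = [[1.0, 0.0], [0.0, 1.0], [0.5, 1.0], [0.2, 0.9]]
y = [0.7 * a + 0.3 * b for a, b in X]
w = fit_wam(X, y)  # recovers approximately [0.7, 0.3]
```

The simplex constraint is exactly the feature the abstract points to: it rules out plain ridge/lasso regression machinery and opens the door to methods specific to constrained weight vectors.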
Title: Fitting aggregation functions to data: Part II – Idempotization
Authors: M. Bartoszuk, G. Beliakov, M. Gagolewski and S. James
The use of supervised learning techniques for fitting weights and/or generator functions of weighted quasi-arithmetic means – a special class of idempotent and nondecreasing aggregation functions – to empirical data has already been considered in a number of papers. Nevertheless, there are still some important issues that have not been discussed in the literature yet. In the second part of the two-part contribution we deal with a quite common situation in which we have inputs coming from different sources, describing a similar phenomenon, but which have not been properly normalized. In such a case, idempotent and nondecreasing functions cannot be used to aggregate them unless proper pre-processing is performed. The proposed idempotization method, based on the notion of B-splines, allows for an automatic calibration of independent variables. The introduced technique is applied in an R source code plagiarism detection system.
Title: Linear optimization for ecological indices based on aggregation functions
Authors: G. Beliakov, A. Geschke, S. James and D. Nimmo
We consider an optimization problem in ecology where our objective is to maximize biodiversity with respect to different land-use allocations. As it turns out, the main problem can be framed as learning the weights of a weighted arithmetic mean where the objective is the geometric mean of its outputs. We propose methods for approximating solutions to this and similar problems, which are non-linear by nature, using linear and bilevel techniques.
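The key transformation can be shown in miniature: taking the log of the geometric-mean objective turns the product into a sum of logs, a concave function of the weights. The sketch below maximizes it by brute-force grid search over a two-weight simplex (the species/land-use numbers are hypothetical, and the paper's linear and bilevel techniques are not what is shown here).

```python
import math

def wam(x, w):
    return sum(wi * xi for wi, xi in zip(w, x))

def log_geo_mean_objective(X, w):
    # log of the geometric mean of the WAM outputs; maximizing this is
    # equivalent to maximizing the geometric mean itself, and the log
    # turns the product into a concave sum
    return sum(math.log(wam(x, w)) for x in X) / len(X)

# hypothetical suitability of 3 species under 2 land uses;
# w allocates the landscape between the two uses
X = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]

best_w, best_val = None, -math.inf
for k in range(101):  # grid search over the weight simplex
    w = (k / 100, 1 - k / 100)
    val = log_geo_mean_objective(X, w)
    if val > best_val:
        best_w, best_val = w, val
```

The geometric mean of the outputs rewards balance: an allocation that drives any one species' score toward zero is heavily penalized, which is why it serves as a biodiversity objective.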