Author's Accepted Manuscript

On the Usefulness of One-Class Classifier Ensembles for Decomposition of Multi-Class Problems

Bartosz Krawczyk, Michał Woźniak, Francisco Herrera

PII: S0031-3203(15)00221-6
DOI: http://dx.doi.org/10.1016/j.patcog.2015.06.001
Reference: PR5444
To appear in: Pattern Recognition
Received date: 12 January 2014
Revised date: 24 April 2015
Accepted date: 7 June 2015

Cite this article as: Bartosz Krawczyk, Michał Woźniak, Francisco Herrera, On the Usefulness of One-Class Classifier Ensembles for Decomposition of Multi-Class Problems, Pattern Recognition, http://dx.doi.org/10.1016/j.patcog.2015.06.001


On the Usefulness of One-Class Classifier Ensembles for Decomposition of Multi-Class Problems

Bartosz Krawczyk (a,*), Michał Woźniak (a), Francisco Herrera (b,c)

(a) Department of Systems and Computer Networks, Wrocław University of Technology, Wybrzeże Wyspiańskiego 27, 50-370 Wrocław, Poland
(b) Department of Computer Science and Artificial Intelligence, University of Granada, P.O. Box 18071, Granada, Spain
(c) Faculty of Computing and Information Technology - North Jeddah, King Abdulaziz University, 21589, Jeddah, Saudi Arabia

Abstract

Multi-class classification can be addressed in a plethora of ways. One of the most promising research directions applies the divide-and-conquer principle: the given problem is decomposed into a set of simpler sub-problems, and the original decision space is then reconstructed from the local responses.

In this paper, we propose to investigate the usefulness of applying one-class classifiers to this task, by assigning a dedicated one-class descriptor to each class, with three main approaches: one-versus-one, one-versus-all and trained fusers.

Despite not using all the available knowledge, one-class classifiers display several desirable properties that may benefit the decomposition task. They can adapt to the unique properties of the target class, fitting the best possible description of its concept. They are therefore robust to many difficulties embedded in the nature of data, such as noisy, imbalanced, or complex distributions. We analyze the possibilities of applying an ensemble of one-class methods to tackle multi-class problems, with special attention paid to the final stage: reconstruction of the original multi-class problem. Although binary decomposition is more suitable for most standard datasets, we identify the specific areas of applicability of one-class classifier decomposition.

To do so, we develop a double study: first, for a given fusion method, we compare one-class and binary classifiers to find the correlations between classifier models and fusion algorithms. Then, we compare the best methods from each group (one-versus-one, one-versus-all, and trained fusers) to draw conclusions about the overall performance of one-class solutions. We show, backed up by a thorough statistical analysis, that one-class decomposition is a worthwhile approach, especially in the case of problems with complex distributions and a large number of classes.

Keywords: one-class classification, multi-class classification, classifier ensemble, decomposition strategies, classifier fusion, binary classification.

Preprint submitted to Pattern Recognition, June 12, 2015

1. Introduction

Multi-class problems are abundant in real-life applications, ranging from several classes in chemometrics [4] and medicine [10], through dozens in object recognition [16] and computer vision [17], to hundreds in biometrics [11]. As the number of classes grows, the complexity of the estimated decision rules often grows with it. This increases the risk of overfitting and the computational cost of the recognition system. Additionally, classification difficulties may be present only for some classes, while others can be separated with minimal error.

Building classifiers that each handle only a reduced subset of classes may be a solution to these problems. As binary classification itself has been thoroughly studied in recent years [47], binary decomposition methods have gained significant attention in the machine learning community [1]. Binary classifiers return simpler decision boundaries and reduce the area of competence of each classifier, thus producing locally specialized learners. This leads to the creation of an ensemble of binary learners [63], each dedicated to a sub-problem. From their local decisions, a dedicated fusion method must reconstruct the original multi-class problem. While binary decomposition has been shown to perform very well on most multi-class problems [21], it has some limitations, such as a strong dependence on the selected fusion method and low robustness to imbalanced or sparse distributions. That is why novel methods for tackling multi-class data need to be examined, keeping in mind that binary decomposition is a well-established point of reference.
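The decompose-then-fuse workflow described above can be sketched in a few lines. The sketch below illustrates one-versus-all decomposition only; the nearest-centroid "binary learner" and all names (train_ova, predict_ova, centroid) are illustrative stand-ins, not methods from the paper, and a real system would plug in any binary classifier producing a decision score.

```python
# One-versus-all (OVA) decomposition sketch: one binary scorer per class,
# fused by taking the class with the highest margin. Pure stdlib Python.

def centroid(points):
    """Mean vector of a list of equal-length feature tuples."""
    n, dim = len(points), len(points[0])
    return [sum(p[i] for p in points) / n for i in range(dim)]

def train_ova(data):
    """data: dict class_label -> list of feature vectors.
    Each 'binary model' is the (target centroid, rest centroid) pair."""
    models = {}
    for label, positives in data.items():
        negatives = [x for l, xs in data.items() if l != label for x in xs]
        models[label] = (centroid(positives), centroid(negatives))
    return models

def predict_ova(models, x):
    """Fuse local responses: each binary scorer returns the margin
    (distance-to-rest minus distance-to-target); pick the largest."""
    def score(model):
        c_pos, c_neg = model
        d_pos = sum((a - b) ** 2 for a, b in zip(x, c_pos))
        d_neg = sum((a - b) ** 2 for a, b in zip(x, c_neg))
        return d_neg - d_pos
    return max(models, key=lambda label: score(models[label]))

data = {
    "A": [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3)],
    "B": [(5.0, 5.0), (5.2, 4.9), (4.8, 5.1)],
    "C": [(0.0, 5.0), (0.1, 5.2), (0.3, 4.8)],
}
models = train_ova(data)
print(predict_ova(models, (0.1, 0.1)))  # prints "A"
```

One-versus-one decomposition differs only in the set of sub-problems (one learner per pair of classes) and in the fusion step, which typically aggregates pairwise votes rather than margins.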

In this work, we turn our attention towards one-class classification (OCC), a relatively young yet challenging machine learning domain [35]. OCC is a natural way of decomposing a multi-class dataset: we consider each class independently and train a one-class model on each of them. Then, we reconstruct the original problem with a dedicated fusion algorithm, just as in binary decomposition.
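A minimal sketch of this one-class decomposition scheme follows. The centroid-based descriptor stands in for a real one-class classifier (such as a one-class SVM or support vector data description), and fusion by maximum support is only one of the reconstruction strategies studied in the paper; the function names are assumptions for illustration.

```python
# One-class decomposition sketch: each class gets its own descriptor,
# trained only on that class's examples; the multi-class decision is
# reconstructed by taking the class with the highest support.

def train_occ(data):
    """data: dict class_label -> list of feature vectors.
    Each one-class model is simply the class centroid here."""
    models = {}
    for label, xs in data.items():
        n, dim = len(xs), len(xs[0])
        models[label] = [sum(p[i] for p in xs) / n for i in range(dim)]
    return models

def support(model, x):
    """Support of x for one class: higher means closer to the model.
    Note the model never saw examples of any other class."""
    return -sum((a - b) ** 2 for a, b in zip(x, model))

def predict_occ(models, x):
    """Fusion step: reconstruct the multi-class decision from the
    local one-class supports."""
    return max(models, key=lambda label: support(models[label], x))

occ_data = {
    "A": [(0.0, 0.0), (0.2, 0.1)],
    "B": [(5.0, 5.0), (5.2, 4.9)],
}
occ_models = train_occ(occ_data)
print(predict_occ(occ_models, (4.9, 5.1)))  # prints "B"
```

The key contrast with the binary scheme is that each model is trained without any counter-examples, which is what makes the approach robust to imbalanced class distributions but also discards inter-class information.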

The application of OCC to problems where generating negative examples can be costly, dangerous, or simply impossible [24] is obvious. However, one may ask: what is the point of applying OCC to multi-class problems, where representatives of all classes are given beforehand? Some studies show that when objects from all classes are available, it is preferable to use binary classifiers rather than their one-class counterparts [8; 26].