An efficient radius-incorporated MKL algorithm for Alzheimer's disease prediction
Xinwang Liu a, Luping Zhou b,*, Lei Wang b, Jian Zhang c, Jianping Yin a, Dinggang Shen d,e,*
a School of Computer, National University of Defense Technology, Changsha 410073, China
b School of Computer Science and Software Engineering, University of Wollongong, NSW 2522, Australia
c Faculty of Engineering and Information Technology, University of Technology, Sydney, NSW 2007, Australia
d The Department of Radiology and Biomedical Research Imaging Center (BRIC), University of North Carolina, Chapel Hill, NC 27599, USA
e Department of Brain and Cognitive Engineering, Korea University, Seoul, Republic of Korea
Article info
Received 28 January 2013
Received in revised form 11 August 2014
Accepted 9 December 2014
Available online 17 December 2014
Keywords: Multiple kernel learning; Support vector machines; Neuroimaging
Abstract
Integrating multi-source information has recently shown promising performance in predicting Alzheimer's disease (AD). Multiple kernel learning (MKL) plays an important role in this regard by learning the combination weights of a set of base kernels via the principle of margin maximisation. The latest research on MKL further incorporates the radius of the minimum enclosing ball (MEB) of the training data to improve kernel learning performance. However, we observe that directly applying these radius-incorporated MKL algorithms to AD prediction tasks does not necessarily improve, and sometimes even degrades, the prediction accuracy. In this paper, we propose an improved radius-incorporated MKL algorithm for AD prediction. First, we redesign the objective function by approximating the radius of the MEB with its upper bound, a linear function of the kernel weights. This approximation makes the resulting optimisation problem convex and globally solvable. Second, instead of using cross-validation, we model the regularisation parameter C of the SVM classifier as an extra kernel weight and tune it automatically within MKL. Third, we show theoretically that our algorithm can be reformulated into a form similar to that of the SimpleMKL algorithm and conveniently solved with off-the-shelf packages. We discuss the factors that contribute to the improved performance and apply our algorithm to discriminate between different clinical groups in the benchmark ADNI data set. As demonstrated experimentally, our algorithm makes better use of the radius information and achieves higher prediction accuracy than comparable MKL methods in the literature. In addition, our algorithm demonstrates the highest computational efficiency among all the compared methods.
© 2014 Elsevier Ltd. All rights reserved.
1. Introduction
Pattern recognition techniques have been applied extensively to the analysis and diagnosis of diseases, and their effectiveness and significance have been well demonstrated in the literature [1–3]. In particular, accurately distinguishing people who have contracted a disease from the healthy population helps not only treatment but also early prevention. Developing better classification methods for this purpose is therefore highly desirable. In this paper, we aim to develop a new pattern classification algorithm that achieves improved classification performance when applied to Alzheimer's disease.
Alzheimer's disease (AD for short) is the most common neurodegenerative disease, accounting for 60–70% of age-related dementia. It is a fatal disease that worsens as it progresses. Mild cognitive impairment (MCI) is a precursor of AD. It is heterogeneous, with a conversion rate to AD of 15% per year. Considering the immense cost of caring for AD patients, early identification of MCI and AD patients is of great significance. As a result, the following two classification tasks become important: (i) discriminating MCI patients from the healthy population; and (ii) discriminating the MCI patients who will convert to AD from those who will not. Since the two tasks can generally be viewed as predicting whether a person will develop towards or into AD, we call them collectively “AD prediction” for short in this paper.
Pattern Recognition 48 (2015) 2141–2150. http://dx.doi.org/10.1016/j.patcog.2014.12.007. Corresponding authors: L. Zhou and D. Shen.
Recent studies have demonstrated that neuroimaging techniques are an important means of AD analysis [6,7]. For example, magnetic resonance imaging (MRI) shows grey matter morphometry, and fluorodeoxyglucose (FDG) positron emission tomography (PET) shows metabolic activity. Accordingly, more effective AD prediction methods have been developed by combining the complementary information carried by these imaging modalities [8,9]. As seen in the recent literature, the combination can be performed at the feature level [8,10–14] or at the classifier level. A common practice in feature-level combination is to concatenate the features from different modalities into one long feature vector [10,11] and use it for classification. However, such concatenation usually requires proper normalisation of the features from the different modalities. Otherwise, classification could be dominated by features that have large variation but are not necessarily discriminative, leading to less satisfactory classification performance.
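The normalisation issue can be illustrated with a minimal sketch. It assumes z-score standardisation of each feature, which is one common choice and not necessarily the scheme used in the cited works. The function names and the toy "MRI" and "PET" values are hypothetical.

```python
from statistics import mean, pstdev


def zscore_columns(rows):
    """Standardise each column to zero mean and unit variance."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    # A constant column has zero deviation; fall back to 1.0 to avoid dividing by zero.
    sds = [pstdev(c) or 1.0 for c in cols]
    return [[(x - m) / s for x, m, s in zip(row, mus, sds)] for row in rows]


def concatenate_modalities(*modalities):
    """Normalise each modality separately, then join the features of each subject."""
    normalised = [zscore_columns(m) for m in modalities]
    return [sum(parts, []) for parts in zip(*normalised)]


# Hypothetical toy data: 3 subjects, one feature per modality, very different scales.
mri = [[1200.0], [1500.0], [1800.0]]
pet = [[0.9], [1.1], [1.0]]

features = concatenate_modalities(mri, pet)
for row in features:
    print([round(v, 3) for v in row])
```

Without the standardisation step, the first feature would vary by hundreds of units and the second by a tenth of a unit, so any distance-based classifier would effectively ignore the second modality.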
In the past several years, multiple kernel learning (MKL) has shown performance superior to that of feature-level combination methods on AD prediction. MKL is an important extension of support vector machines (SVM) for handling multiple information sources. By predefining one (or, in general, several) “base” kernel functions for each source, MKL aims to find the optimal linear combination weights of these kernels by optimising criteria related to classification performance, such as the margin between the two classes. One of the representative algorithms is SimpleMKL. It has been used for AD prediction by combining multiple modalities such as MRI, PET, and cerebrospinal fluid (CSF) parameters [12–14]. Due to its promising classification performance and solid theoretical foundation, SimpleMKL is regarded as the state of the art for AD prediction with multiple modalities.
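The kernel combination at the heart of MKL can be sketched as follows. This is only an illustration under stated assumptions: the weights are fixed by hand rather than learnt, the Gaussian (RBF) base kernel and its gamma values are arbitrary choices, and the constraint that the weights are non-negative and sum to one follows the SimpleMKL convention. All names and data are hypothetical.

```python
import math


def rbf_kernel(X, gamma):
    """Gram matrix of the Gaussian (RBF) kernel over the rows of X."""
    def k(a, b):
        return math.exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
    return [[k(a, b) for b in X] for a in X]


def combine_kernels(kernels, weights):
    """Weighted sum of Gram matrices: K = sum_m weights[m] * kernels[m]."""
    if any(w < 0 for w in weights) or not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must be non-negative and sum to one")
    n = len(kernels[0])
    return [
        [sum(w * K[i][j] for w, K in zip(weights, kernels)) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical, already normalised features for 3 subjects and 2 modalities.
mri = [[-1.2, 0.3], [0.0, -0.5], [1.2, 0.2]]
pet = [[-1.0], [1.3], [-0.3]]

K_mri = rbf_kernel(mri, gamma=0.5)
K_pet = rbf_kernel(pet, gamma=1.0)

# Hand-picked weights; an MKL solver would learn these from the training data.
K = combine_kernels([K_mri, K_pet], [0.7, 0.3])
```

Because each modality keeps its own kernel, no cross-modality feature scaling is needed, and the weights directly express how much each source contributes. The combined matrix K could then be passed to any SVM solver that accepts a precomputed Gram matrix.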