Analyzing Hadoop power consumption and impact on application QoS by Javier Conejero, Omer Rana, Peter Burnap, Jeffrey Morgan, Blanca Caminero, Carmen Carrión

Future Generation Computer Systems


Year: 2015
DOI: 10.1016/j.future.2015.03.009
Subject: Computer Networks and Communications / Hardware and Architecture / Software


Accepted Manuscript

Analysing Hadoop power consumption and impact on application QoS

Javier Conejero, Omer Rana, Peter Burnap, Jeffrey Morgan, Blanca Caminero, Carmen Carrión

PII: S0167-739X(15)00064-3

DOI: http://dx.doi.org/10.1016/j.future.2015.03.009

Reference: FUTURE 2728

To appear in: Future Generation Computer Systems

Received date: 28 June 2014

Revised date: 28 December 2014

Accepted date: 9 March 2015

Please cite this article as: J. Conejero, O. Rana, P. Burnap, J. Morgan, B. Caminero, C. Carrión, Analysing Hadoop power consumption and impact on application QoS, Future Generation Computer Systems (2015), http://dx.doi.org/10.1016/j.future.2015.03.009

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Highlights (for review)

- Power consumption characterisation of Hadoop Clouds (with a Social media use case).
- OpenNebula-based Cloud environments.
- Experimentation on two different Cloud infrastructures (single node and multi-node).
- Inclusion of the QoS related to power consumption (in terms of processing time).
- Extension of the experiments to compare the use of variable-sized VMs on one node (results included in the CLOSER paper) with fixed-sized VMs spread over more than one node.

Analysing Hadoop Power Consumption and Impact on Application QoS

Javier Conejero^a, Omer Rana^b, Peter Burnap^b, Jeffrey Morgan^c, Blanca Caminero^a, Carmen Carrión^a

^a Computing Systems Department, University of Castilla-La Mancha, Campus Universitario s/n, 02071 Albacete, Spain
^b School of Computing & Informatics, Cardiff University, Queens Buildings, 5 The Parade, Cardiff CF24 3AA, UK
^c School of Social Sciences, Cardiff University, Glamorgan Building, King Edward VII Avenue, Cardiff CF10 3WT, UK

Abstract

Energy efficiency is often identified as one of the key reasons for migrating to Cloud environments. It is often claimed that a data centre hosting a Cloud environment is likely to achieve greater energy efficiency (at reduced cost) than a local deployment. With rising energy prices, it is also estimated that a large percentage of the operational costs of a Cloud environment can be attributed to energy. In this work, we investigate and measure the energy consumption of a number of virtual machines running the Hadoop system over an OpenNebula Cloud. Our workload is based on sentiment analysis of Twitter messages. Our objective is to understand the trade-off between energy efficiency and performance for such a workload.

From our results we generalise and speculate on how such an analysis could be used as a basis for establishing a Service Level Agreement (SLA) with a Cloud provider, especially where there is likely to be a high level of variability (in both performance and energy use) over multiple runs of the same application at different times. Among the service level objectives that might be included in an SLA, Quality of Service (QoS) related metrics (e.g., latency) are among the most challenging to support. This work provides some insight into the relationship between power consumption and QoS-related metrics, describing how a combined consideration of these two metrics could be supported for a particular workload.

Email addresses: Francisco.Conejero@uclm.es (Javier Conejero), ranaof@cardiff.ac.uk (Omer Rana), burnapp@cardiff.ac.uk (Peter Burnap), MorganJ51@cardiff.ac.uk (Jeffrey Morgan), mariablanca.caminero@uclm.es (Blanca Caminero), Carmen.Carrion@uclm.es (Carmen Carrión)

Preprint submitted to Future Generation Computer Systems, December 28, 2014
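To make the idea of an SLA that combines a QoS objective (processing time) with an energy objective more concrete, the following minimal Python sketch checks a set of runs against such an agreement while tolerating run-to-run variability. It assumes each run is described by timestamped power readings in watts, integrated with the trapezoidal rule to obtain energy in joules, and that the SLA counts as met when at least a given fraction of runs satisfy both bounds. The names `Run`, `Sla`, `conformance` and `sla_met`, and the conformance-fraction rule itself, are hypothetical illustrations rather than the authors' method.

```python
"""Hypothetical sketch: per-run energy from power samples plus a combined QoS/energy SLA check."""
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Run:
    timestamps_s: list[float]  # sample times in seconds (assumed ascending)
    power_w: list[float]       # power readings in watts at those times

    def __post_init__(self) -> None:
        if len(self.timestamps_s) < 2 or len(self.timestamps_s) != len(self.power_w):
            raise ValueError("need at least two paired (time, power) samples")

    @property
    def duration_s(self) -> float:
        # Processing time, used here as the QoS metric.
        return self.timestamps_s[-1] - self.timestamps_s[0]

    @property
    def energy_j(self) -> float:
        # Trapezoidal integration of power over time gives energy in joules.
        t, p = self.timestamps_s, self.power_w
        return sum((t[i + 1] - t[i]) * (p[i] + p[i + 1]) / 2 for i in range(len(t) - 1))


@dataclass
class Sla:
    max_duration_s: float   # hypothetical QoS objective: processing-time bound per run
    max_energy_j: float     # hypothetical energy objective per run
    min_conformance: float  # fraction of runs that must meet both bounds


def conformance(runs: list[Run], sla: Sla) -> float:
    """Fraction of runs meeting both the processing-time and the energy bound."""
    ok = sum(
        1 for r in runs
        if r.duration_s <= sla.max_duration_s and r.energy_j <= sla.max_energy_j
    )
    return ok / len(runs)


def sla_met(runs: list[Run], sla: Sla) -> bool:
    """The SLA tolerates variability: only a minimum fraction of runs must conform."""
    return conformance(runs, sla) >= sla.min_conformance
```

Expressing the agreement as a conformance fraction, rather than a bound every run must satisfy, is one simple way to reflect the variability over repeated runs that the abstract highlights; percentile-based thresholds would be an alternative.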

Keywords:

Cloud computing, Power consumption, Hadoop, OpenNebula, Social media analysis

1. Introduction

Various companies (ranging in size and computing maturity) are adopting Cloud computing technology to perform their business processes, mainly because it reduces the cost of deploying and managing computing infrastructure. At the same time, the environmental concerns of many large-scale computing infrastructure operators (primarily large data centres) have prompted the need to consider more energy-efficient operation of computational infrastructure. This, coupled with the need to consider new sources of energy such as solar and wind, leads to important challenges in understanding how more energy-efficient Cloud computing could be provided to end users. It is also worth noting that the business case for migrating to Cloud computing systems has often centred on the cost savings arising from reduced energy use at the client site. Currently, energy costs account for a large percentage of the operational expenditure for computational infrastructure. It is often stated that, owing to economies of scale, the ability to negotiate cheaper energy tariffs and the use of renewable energy sources, data centre operators are able to offer operational systems that are both cost- and energy-efficient.

With increasing outsourcing of computational capability comes the need to specify Service Level Agreements (SLAs) with infrastructure providers. Such SLAs may also include support for pay-per-use scalability of back-end servers, enabling a company to grow its computational usage dynamically based on demand (using an incremental charging model for the excess capacity used). Determining how such SLAs should be specified and subsequently monitored for conformance remains a challenge with many commercial Cloud providers, where repeatable performance is often difficult to guarantee (due to the use of virtualisation and a variable mapping between virtual and physical resources). Increasingly, there is also a demand to include “green” metrics in an SLA, enabling a company using a data centre to display its environmentally friendly credentials to customers. For instance, the EU Optimis project [1] identified “BREEAM Certification”, “LEED Certification” and “Energy Star Rating” as potential parameters that could enable a Cloud provider to demonstrate its credentials in energy efficiency. These parameters could also be combined with additional metrics, such as carbon footprint or energy consumption per VM, to give a finer-grained analysis of energy use. Consequently, there is increasing interest in making Cloud computing environmentally sustainable [2], thereby requiring techniques to improve power efficiency at all levels of the data centre (from resource scheduling of workloads to the operation of