Analysing Hadoop power consumption and impact on application QoS
Javier Conejero, Omer Rana, Peter Burnap, Jeffrey Morgan, Blanca Caminero, Carmen Carrión
Reference: FUTURE 2728
To appear in: Future Generation Computer Systems
Received date: 28 June 2014
Revised date: 28 December 2014
Accepted date: 9 March 2015
Please cite this article as: J. Conejero, O. Rana, P. Burnap, J. Morgan, B. Caminero, C.
Carrio´n, Analysing Hadoop power consumption and impact on application QoS, Future
Generation Computer Systems (2015), http://dx.doi.org/10.1016/j.future.2015.03.009
Highlights:
- Power consumption characterisation of Hadoop Clouds (with a social media use case).
- OpenNebula based Cloud environments.
- Experimentation on two different Cloud infrastructures (single node and multi node).
- Inclusion of the QoS related to power consumption (in terms of processing time).
- Extension of the experiments to compare the use of variable-sized VMs on one node (results included in the CLOSER paper) with fixed-sized VMs spread over more than one node.
Analysing Hadoop Power Consumption and Impact on Application QoS
Javier Conejeroa, Omer Ranab, Peter Burnapb, Jeffrey Morganc, Blanca Camineroa, Carmen Carrióna

aComputing Systems Department, University of Castilla-La Mancha, Campus Universitario s/n, 02071 Albacete, Spain
bSchool of Computing & Informatics, Cardiff University, Queens Buildings, 5 The Parade, Cardiff CF24 3AA, UK
cSchool of Social Sciences, Cardiff University, Glamorgan Building, King Edward VII Avenue, Cardiff CF10 3WT, UK
Abstract

Energy efficiency is often identified as one of the key reasons for migrating to Cloud environments. It is stated that a data centre hosting the Cloud environment is likely to achieve greater energy efficiency (at a reduced cost) compared to a local deployment. With increasing energy prices, it is also estimated that a large percentage of operational costs within a Cloud environment can be attributed to energy. In this work, we investigate and measure energy consumption of a number of virtual machines running the Hadoop system, over an OpenNebula Cloud. Our workload is based on sentiment analysis undertaken over Twitter messages. Our objective is to understand the tradeoff between energy efficiency and performance for such a workload.
From our results we generalise and speculate on how such an analysis could be used as a basis to establish a Service Level Agreement (SLA) with a Cloud provider – especially where there is likely to be a high level of variability (both in performance and energy use) over multiple runs of the same application (at different times). Among the service level objectives that might be included in an SLA, Quality of Service (QoS) related metrics (e.g., latency) are among the most challenging to support. This work provides some insight on the relationship between power consumption and QoS related metrics, describing how a combined consideration of these two metrics could be supported for a particular workload.

Email addresses: Francisco.Conejero@uclm.es (Javier Conejero), email@example.com (Omer Rana), firstname.lastname@example.org (Peter Burnap), MorganJ51@cardiff.ac.uk (Jeffrey Morgan), email@example.com (Blanca Caminero), Carmen.Carrion@uclm.es (Carmen Carrión)

Preprint submitted to Future Generation Computer Systems, December 28, 2014
Keywords: Cloud computing, Power consumption, Hadoop, OpenNebula, Social media analysis

1. Introduction
Various companies (ranging in size and computing maturity) are adopting Cloud computing technology to perform their business processes, mainly driven by the fact that it reduces the cost of computing infrastructure deployment and management. At the same time, environmental concerns of many large scale computing infrastructure operators – primarily large data centres – have prompted the need for considering more energy efficient operation of computational infrastructure. This coupled with the need to consider new sources of energy, such as solar/wind energy, leads to important challenges in understanding how more energy efficient Cloud computing could be provided to end users. It is also useful to note that the business case for migrating to
Cloud computing systems has often centred on the cost savings that would arise from reduced energy use at a client site. Currently, energy costs account for a large percentage of operational expenditure for computational infrastructure. It is often stated that, due to economies of scale, the ability to negotiate cheaper energy tariffs and the use of renewable energy sources, data centre operators are able to offer systems that are both cost and energy efficient in operation.
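To make the energy quantities behind such operational costs concrete, the following is a minimal, hypothetical sketch (not part of the paper's measurement setup; all names are illustrative) of how a VM's energy consumption might be estimated from periodically sampled power readings, using trapezoidal integration:

```python
# Hypothetical sketch: estimate energy consumed by a VM from power
# readings (in watts) sampled at a fixed interval. Names are
# illustrative assumptions, not the authors' actual tooling.

def energy_joules(power_samples, interval_s):
    """Approximate energy (J) from power samples taken every interval_s seconds."""
    if len(power_samples) < 2:
        return 0.0
    total = 0.0
    for p0, p1 in zip(power_samples, power_samples[1:]):
        # Area of one trapezoid between consecutive samples
        total += 0.5 * (p0 + p1) * interval_s
    return total

# Example: a VM drawing roughly 100 W over 60 s, sampled every 10 s
samples = [98.0, 102.0, 101.0, 99.0, 100.0, 100.0, 100.0]
e = energy_joules(samples, 10.0)          # → 6010.0 J
print(round(e / 3600.0, 3), "Wh")         # joules to watt-hours
```

Energy figures of this kind, aggregated per VM or per job, are what "energy consumption per VM" style metrics in an SLA would ultimately be derived from.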
With increasing outsourcing of computational capability comes the need to specify Service Level Agreements (SLAs) with infrastructure providers.
Such SLAs may also include support for pay-per-use scalability of backend servers, enabling a company to dynamically grow its computational usage based on demand (using an incremental charging model for the excess capacity used). Determining how such SLAs should be specified and subsequently monitored for conformance remains a challenge with many commercial Cloud providers – where repeatable performance is difficult to guarantee in many instances (due to the use of virtualisation and a variable mapping between virtual and physical resources). Increasingly, there is also the demand to include “green” metrics in an SLA, to enable a company using a data centre to display its environmentally friendly credentials to customers. For instance, the EU Optimis project identified “BREEAM Certification”, “LEED Certification” and “Energy Star Rating” as potential parameters which could enable a Cloud provider to demonstrate its credentials in energy efficiency. These parameters could also be combined with additional metrics, such as carbon footprint, energy consumption per VM, etc., to give a finer-grained analysis of energy use. Consequently, there is increasing interest in making Cloud computing environmentally sustainable – thereby requiring techniques to improve power efficiency at all levels of the data centre (from resource scheduling of workloads to the operation of