Cloud Computing - Task Scheduling Based on Genetic Algorithms
Essay by Ekrem YILDIRIM • December 9, 2017 • Research Paper • 3,926 Words (16 Pages) • 1,356 Views
Essay Preview: Cloud Computing - Task Scheduling Based on Genetic Algorithms
Cloud Computing –
Task Scheduling based on Genetic Algorithms
Eleonora Maria Mocanu, Mihai Florea, Mugurel Ionuţ Andreica, Nicolae Ţăpuş
Computer Science Department, Politehnica University of Bucharest, Bucharest, Romania nora.mocanu@gmail.com, florea.mihai@gmail.com, mugurel.andreica@cs.pub.ro, ntapus@cs.pub.ro
Abstract— Cloud Computing is a cutting edge technology for managing and delivering services over the Internet. Map-Reduce is the programming model used in cloud computing for processing large data sets in parallel over huge clusters. In order to increase efficiency, a good task scheduling is needed. Genetic algorithms are very useful and accurate in finding solutions to large scale optimization problems, such as task scheduling. They have gained immense popularity over last few years as a robust and easily adaptable search technique. Hadoop, the open source implementation of Map-Reduce, has several task schedulers available (FIFO, Fair, Capacity Schedulers), but neither one of them is focused on minimizing the global execution time. The goal of this project is to improve Hadoop’s functionality by implementing a scheduler based on a genetic algorithm, solving the stated problem.
Keywords - Cloud Computing; Genetic Alghorithms; MapReduce; Hadoop
I. INTRODUCTION
Cloud Computing is the next natural step in the evolution of on-demand information technology services. It offers a way to increase storage capacity, computation power and add capabilities on the fly without investing in new infrastructure, licensing new software or training new personnel. In order to do all that, it needs a good taskresource planning.
Task scheduling in distributed systems usually has the goal of distributing the load on processors, maximizing their utilization, while minimizing the global task execution time. Task scheduling can be defined into two kinds: static scheduling and dynamic scheduling.
Static scheduling uses so-called prescheduling technology to schedule known tasks in foregone environment, while dynamic scheduling depends on not only the foregone tasks, but also on the current system state to make scheduling plans. Dynamic scheduling is used in real life because a cloud system is dynamically changing over time. Task flow is uncertain and during the possible long time of execution, available resources with their quantity and form are changing all the way; resources capability, current load, interests and tasks’ requests are dynamic, too.
978-1-4673-0750-5/12/$31.00 ©2012 IEEE
Dynamic scheduling problem is an NP-complete problem, whose cost of precise computation will increase exponentially with the increase of the problem scale. In fact, we can just find out an approximate solution to NPcomplete problem [7]. At present, three general algorithms become popular to deal with such problems: Simulated Annealing, Tabu Search and Genetic Algorithm. Among them, GA gains wide attention because of its simplicity, robustness and accuracy in finding the global optimum.
This paper is focused on scheduling problems inside Map-Reduce framework, Google’s abstraction for processing large data sets in real time. Hadoop is the open source implementation of Map-Reduce used by large IT companies such as Facebook or Yahoo!. Hadoop comes with four scheduler choices (FIFO, Fair Scheduler, Capacity Scheduler and Dynamic Scheduler). Job scheduling in Hadoop is performed by the master node, which receives heartbeats sent by slaves at 3 seconds interval with their status and asking for new tasks.
The FIFO scheduler assigns tasks in the order of their submit time, but has also 5 priority levels. The Fair scheduler tries to offer equal shares of the cluster resources to user pools and to jobs within them. Jobs are scheduled in the order of their deficit. Capacity Scheduler offers the possibility of defining queues of jobs and is focused on using the whole cluster’s capacity. Dynamic Scheduler comes with the possibility to manage budgets and dynamically increase job weights. Neither of the schedulers is focused on minimizing global execution time. That is why implementing a scheduler based on genetic algorithms could lead to a better throughput and a more efficient task assignment.
The goal of this project is to study how a Genetic Scheduler could improve Hadoop’s task assignment and to develop it using the existing framework.
II. STATE OF THE ART
The following section of the paper will focus on explaining the concepts and technologies that were used in order to construct the best solution for our application.
With our society’s fast and continuous development comes the need to process greater quantities of data in reasonable amounts on time. That is why there is always need for more computation power and greater storage capacities. As silicon based processor technology has reached its limits in matter of computation speed, the next step was to use two or more processors instead of one. This is how parallel and distributed systems were born.
- Cloud Computing
Cloud computing provides computation, software, data access, and storage services that do not require end-user knowledge of the physical location and configuration of the platform that delivers the services. It is based on a heterogeneous system of distributed clusters that is transparent to the user.
Cloud computing is Internet-based development and use of computer technology. The cloud is a metaphor for the Internet and is an abstraction for the complex and dynamic infrastructure it conceals [16].
- Map-Reduce
Map-Reduce [2] is a programming model developed by Google, used for writing applications that process large sets of data (that can reach petabytes) on parallel and distributed systems. The processed data can be crawled documents, web request logs etc. and the processing applications: determining summaries of number of pages crawled per host, set of most frequent queries, frequency of words in a document etc.Map-Reduce, like the name says, is based on two operations derived from functional programming, Map and Reduce.
...
...