Explain how a big data Hadoop system works in simple words?

Hadoop is an open-source framework for storing and processing big data in a distributed environment across clusters of computers, using simple programming models. Hadoop’s strength lies in its ability to scale from a single server to thousands of machines, each offering local computation and storage.

Hadoop delegates tasks across these servers (called “worker nodes” or “slave nodes”), harnessing the power of each machine and running them simultaneously. This is what allows massive amounts of data to be analysed: splitting tasks across many machines lets bigger jobs complete faster. Hadoop comprises many different components that work together to create a single platform. There are two key functional components within this ecosystem: the storage of data (Hadoop Distributed File System, or HDFS) and the framework for running parallel computations on that data (MapReduce).

Hadoop Distributed File System (HDFS)

HDFS enables Hadoop to store huge files. It’s a scalable file system that distributes and stores data across all machines in a Hadoop cluster. Each HDFS cluster contains the following:

  • NameNode: Runs on a master node and tracks and directs the storage of the cluster, keeping the metadata that records where every block of every file lives.
  • DataNode: Runs on slave nodes, which make up the majority of the machines within a cluster. Under the NameNode’s direction, data files are split into blocks, each of which is replicated three times (by default) and stored on machines across the cluster. These replicas ensure the entire system won’t go down if one server fails or is taken offline, a property known as fault tolerance.
  • Client machine: Neither a NameNode nor a DataNode, client machines simply have Hadoop installed on them. They’re responsible for loading data into the cluster, submitting MapReduce jobs, and viewing the results once a job completes (a minimal sketch of loading data from a client follows this list).
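As a rough illustration of the client machine’s role, here is a minimal Java sketch that copies a local file into HDFS using Hadoop’s FileSystem API. The NameNode address and the file paths are placeholder assumptions, not details from this essay; in practice the address usually comes from the cluster’s core-site.xml.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class LoadIntoHdfs {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Placeholder NameNode address (an assumption for this sketch).
            conf.set("fs.defaultFS", "hdfs://namenode:9000");

            FileSystem fs = FileSystem.get(conf);

            // Copy a local file into the cluster. HDFS splits it into blocks,
            // and the NameNode directs DataNodes to store replicas of each.
            fs.copyFromLocalFile(new Path("/local/data/emails.txt"),
                                 new Path("/user/analyst/emails/emails.txt"));

            fs.close();
        }
    }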

MapReduce

MapReduce is the system used to efficiently process the large amounts of data Hadoop stores in HDFS. (In Hadoop 2 and later, a separate component called YARN handles resource management and job scheduling, while MapReduce focuses on the computation itself.) Hadoop MapReduce executes a sequence of jobs, where each job is typically a Java application that runs on the data. Instead of writing MapReduce code directly, querying tools such as Pig and Hive give analysts considerable power and flexibility.

Hadoop workflow

The typical Hadoop workflow for executing a job includes:

  1. Load data into the cluster/HDFS
  2. Perform the computation using MapReduce jobs
  3. Store the output results back in HDFS
  4. Retrieve the results from the cluster/HDFS (a sketch of this step follows the list)
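Step 1 was sketched above from the client machine’s point of view. As a rough illustration of step 4, the following Java sketch reads a finished job’s output back out of HDFS; the output path is a placeholder assumption. MapReduce jobs conventionally write one output file per reducer, named part-r-00000, part-r-00001, and so on.

    import java.io.BufferedReader;
    import java.io.InputStreamReader;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class FetchResults {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            // Placeholder path to the first (here, only) reducer's output file.
            Path result = new Path("/user/analyst/job-output/part-r-00000");
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(fs.open(result)))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    System.out.println(line);
                }
            }
        }
    }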

Consider an instance where we have all the promotional emails sent to our customers and we want to find how many people were sent the discount coupon “DISCOUNT25” in a particular campaign. We can load this data into HDFS and then write a MapReduce job that reads all the emails, checks whether each one contains the required word, and counts the number of customers who received such emails. Finally, the job stores the result in HDFS, from where we can retrieve it. A minimal sketch of the mapper and reducer for such a job follows.
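This is a sketch only, assuming one email per line of input; the class names and logic are illustrative, not taken from the essay. The mapper emits a (“DISCOUNT25”, 1) pair for every email containing the coupon code, and the reducer sums those 1s into the final count.

    import java.io.IOException;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    public class CouponCount {

        // Map: emit ("DISCOUNT25", 1) for every email containing the code.
        // Assumes the input format is one email per line.
        public static class CouponMapper
                extends Mapper<LongWritable, Text, Text, IntWritable> {
            private static final Text COUPON = new Text("DISCOUNT25");
            private static final IntWritable ONE = new IntWritable(1);

            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                if (value.toString().contains("DISCOUNT25")) {
                    context.write(COUPON, ONE);
                }
            }
        }

        // Reduce: sum the 1s for the coupon code into a single total.
        public static class SumReducer
                extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            protected void reduce(Text key, Iterable<IntWritable> values,
                                  Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable v : values) {
                    sum += v.get();
                }
                context.write(key, new IntWritable(sum));
            }
        }
    }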

What is MapReduce and how does it work?

Hadoop MapReduce is the heart of the Hadoop system. It provides all the capabilities we need to break big data into manageable chunks, process the data in parallel on a distributed cluster, and then make the data available for user consumption or additional processing. It does all this work in a highly resilient, fault-tolerant manner.

Hadoop MapReduce runs in several stages, each performing an important set of operations that moves us toward the answers we need from big data. The process starts with a user request to run a MapReduce program and continues until the results are written back to HDFS. HDFS and MapReduce perform their work on nodes in a cluster hosted on racks of commodity servers. A sketch of the driver code that submits such a program follows.
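To make the “user request to run a MapReduce program” concrete, here is a minimal driver sketch that configures and submits the coupon-counting job from the earlier sketch; the class names and HDFS paths are assumptions carried over from that sketch, not details from the essay.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class CouponCountDriver {
        public static void main(String[] args) throws Exception {
            // The user request: configure a job and submit it to the cluster.
            Job job = Job.getInstance(new Configuration(), "coupon count");
            job.setJarByClass(CouponCountDriver.class);
            job.setMapperClass(CouponCount.CouponMapper.class);
            // Reusing the reducer as a combiner pre-aggregates counts on each
            // mapper's node, cutting the data shuffled across the network.
            job.setCombinerClass(CouponCount.SumReducer.class);
            job.setReducerClass(CouponCount.SumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);

            // Input: the emails previously loaded into HDFS (assumed path).
            FileInputFormat.addInputPath(job, new Path("/user/analyst/emails"));
            // Output: the result is written back to HDFS when the job finishes.
            FileOutputFormat.setOutputPath(job, new Path("/user/analyst/coupon-count"));

            // Blocks until the map, shuffle, and reduce stages complete.
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }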

...
