Nt1330 Unit 3 Problem Analysis Paper


In this part, I give an overview of the MapReduce programming model; later in the report I explain its implementation and execution in more detail.
MapReduce was originally developed by Google to handle the large amounts of data it works with. It is a programming model for processing and generating large data sets. The model is inspired by the map and reduce functions commonly used in functional programming. Users write both the Map function and the Reduce function. The Map function processes a key/value pair to generate a set of intermediate key/value pairs, and the Reduce function merges all intermediate values associated with the same …
The content of a file is split into large blocks, and each block is replicated at multiple DataNodes. The NameNode maintains the namespace tree and the mapping of file blocks to DataNodes (that is, it stores the physical location of each block). To read a file, an HDFS client contacts the NameNode for the locations of the data blocks and then reads each block's contents from the DataNode closest to the client. To write a file, the client asks the NameNode to nominate a suite of three DataNodes to host the block replicas and then writes the data to those DataNodes in a pipeline fashion. In this design each cluster has a single NameNode, while it can have thousands of DataNodes and thousands of HDFS clients, since each DataNode may execute multiple application tasks concurrently.
HDFS keeps the whole namespace in RAM. The list of blocks belonging to each file is part of the metadata of the name system, called the image. A persistent copy of the image stored in the local host's native file system is called a checkpoint. The NameNode also stores the modification log of the image, which is called the journal. During a restart, the NameNode restores the namespace by reading the namespace and replaying …
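The read and write paths described above can be illustrated with a small in-memory sketch. This is not HDFS code: `ToyNameNode`, `allocate_block`, `locate`, `closest`, the round-robin placement, and the `distance` table are all hypothetical names and simplifications chosen for illustration, and real HDFS uses a rack-aware placement policy.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNameNode:
    """Hypothetical stand-in for the NameNode's in-memory block map."""
    datanodes: list[str]
    replication: int = 3
    # file name -> ordered list of block ids
    files: dict[str, list[str]] = field(default_factory=dict)
    # block id -> DataNodes that hold a replica
    block_locations: dict[str, list[str]] = field(default_factory=dict)
    _next: int = 0  # round-robin cursor (an assumption, not HDFS policy)

    def allocate_block(self, filename: str) -> tuple[str, list[str]]:
        """Nominate `replication` DataNodes to host a new block of the file."""
        blocks = self.files.setdefault(filename, [])
        block_id = f"{filename}#blk{len(blocks)}"
        n = len(self.datanodes)
        targets = [self.datanodes[(self._next + i) % n]
                   for i in range(self.replication)]
        self._next = (self._next + 1) % n
        blocks.append(block_id)
        self.block_locations[block_id] = targets
        return block_id, targets

    def locate(self, filename: str) -> list[tuple[str, list[str]]]:
        """Return each block of the file with the DataNodes holding it."""
        return [(b, self.block_locations[b]) for b in self.files[filename]]


def closest(replicas: list[str], distance: dict[str, int]) -> str:
    """Pick the replica with the smallest (assumed) network distance."""
    return min(replicas, key=lambda dn: distance[dn])


# Write path: ask the NameNode for targets, then send the block along them.
nn = ToyNameNode(datanodes=["dn1", "dn2", "dn3", "dn4"])
storage = {dn: {} for dn in nn.datanodes}  # what each DataNode stores
for chunk in (b"hello ", b"world"):
    block_id, targets = nn.allocate_block("/logs/a.txt")
    for dn in targets:  # stands in for the write pipeline
        storage[dn][block_id] = chunk

# Read path: ask the NameNode for locations, read from the closest replica.
distance = {"dn1": 4, "dn2": 2, "dn3": 0, "dn4": 2}
data = b"".join(storage[closest(replicas, distance)][blk]
                for blk, replicas in nn.locate("/logs/a.txt"))
```

The point of the sketch is the division of labour: the NameNode object holds only metadata (which blocks make up a file and where the replicas live), while the block contents are stored and served by the DataNodes.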
The user of the MapReduce library expresses the computation as two functions: Map and Reduce.
The Map function, written by the programmer, takes an input pair and produces a set of intermediate key/value pairs. The MapReduce library groups together all the intermediate values associated with the same intermediate key K1 and passes them to the Reduce function.
The Reduce function, also written by the programmer, takes the intermediate key K1 and the set of values for that key. It merges these values together to form a possibly smaller set of values associated with that key.
Figure 4 gives an overview of the MapReduce programming model. It shows that the Mapper takes the input and generates key/value pairs, which are then sorted; the Reducer takes the sorted key/value pairs as input and produces the reduced output by merging the values that share a key into a single result for that key.
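The map, sort, and reduce stages described above can be sketched as a single-process Python driver. This is only an illustration of the data flow, not the distributed implementation: `run_mapreduce`, `max_temp_map`, `max_temp_reduce`, and the sample records are hypothetical and chosen for this example.

```python
from itertools import groupby
from operator import itemgetter


def run_mapreduce(records, map_fn, reduce_fn):
    """Run map, sort-by-key, and reduce over (key, value) records in memory."""
    # Map: every input pair may emit any number of intermediate pairs.
    intermediate = [pair
                    for key, value in records
                    for pair in map_fn(key, value)]
    # Sort: bring equal intermediate keys next to each other.
    intermediate.sort(key=itemgetter(0))
    # Reduce: hand each key and the list of its values to the reducer.
    output = []
    for key, group in groupby(intermediate, key=itemgetter(0)):
        values = [value for _, value in group]
        output.extend(reduce_fn(key, values))
    return output


def max_temp_map(station_id, line):
    """Input: (station id, 'year,temperature'). Emits (year, temperature)."""
    year, temp = line.split(",")
    yield year, int(temp)


def max_temp_reduce(year, temps):
    """Merge all temperatures recorded for a year into their maximum."""
    yield year, max(temps)


records = [
    ("s1", "2014,28"),
    ("s2", "2015,31"),
    ("s1", "2015,35"),
    ("s3", "2014,30"),
]
result = run_mapreduce(records, max_temp_map, max_temp_reduce)
# result == [("2014", 30), ("2015", 35)]
```

Only the two job functions are specific to the problem; the driver stays the same for any job, which mirrors the split between user code and the library in the model.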
The following pseudo-code [12] finds the frequency of each word (word count) in the input documents: map(String key, String