1. Performance: which framework delivers real-time analytics across millions of records?
2. Accessibility: which framework makes the data accessible in a timely manner?
3. Applications: understanding what an application can do and in how many ways it can process data to meet the organisation's requirements. Some of the stream-processing frameworks we use frequently are Flume, Samza, Storm, and Spark.
3. Detailed Design Of The System:
The target of this project is to obtain timely tweets from customers through the Twitter API. We will use stream processing to collect these tweets, which will then be stored in centralised stores such as HBase and HDFS. This data then will be transferred to the
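The ingestion step above can be sketched in Python. This is a minimal, illustrative sketch, not the project's actual implementation: `tweet_stream` is a hypothetical stand-in for the Twitter streaming API, and `CentralStore` is a toy stand-in for HBase/HDFS.

```python
import json
from typing import Iterator


def tweet_stream() -> Iterator[dict]:
    """Hypothetical stand-in for the Twitter API stream; a real pipeline
    would read from the Twitter streaming endpoint instead."""
    sample = [
        {"id": 1, "user": "alice", "text": "great service"},
        {"id": 2, "user": "bob", "text": "slow delivery"},
    ]
    for tweet in sample:
        yield tweet


class CentralStore:
    """Toy stand-in for a centralised store such as HBase or HDFS:
    it simply keeps one serialized record per tweet."""

    def __init__(self) -> None:
        self.records: list[str] = []

    def put(self, record: dict) -> None:
        self.records.append(json.dumps(record))


# Consume the stream and persist each tweet to the central store.
store = CentralStore()
for tweet in tweet_stream():
    store.put(tweet)

print(len(store.records))  # number of tweets persisted
```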
3.1 HDFS:
HDFS is the distributed file system in Hadoop. It is derived from the Google File System, which was developed to run …
1. Read and write operations are performed through these data nodes as per the client's requirements.
2. Operations such as deletion, creation, and replication are performed according to the instructions of the name node.
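The name node/data node division of labour described above can be illustrated with a toy sketch. This is a simplification under stated assumptions, not real HDFS code: the `NameNode` and `DataNode` classes, the block IDs, and the round-robin placement are all invented for illustration.

```python
import itertools


class DataNode:
    """Toy data node: performs the operations the name node instructs."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.blocks: dict[str, bytes] = {}

    def store(self, block_id: str, data: bytes) -> None:
        self.blocks[block_id] = data

    def read(self, block_id: str) -> bytes:
        return self.blocks[block_id]

    def delete(self, block_id: str) -> None:
        self.blocks.pop(block_id, None)


class NameNode:
    """Toy name node: tracks which data nodes hold each block and
    directs creation, replication, and deletion (as in step 2 above)."""

    def __init__(self, data_nodes: list[DataNode], replication: int = 3) -> None:
        self.replication = replication
        self.block_map: dict[str, list[DataNode]] = {}
        self._round_robin = itertools.cycle(data_nodes)

    def create_block(self, block_id: str, data: bytes) -> None:
        # Replicate the block onto `replication` data nodes.
        targets = [next(self._round_robin) for _ in range(self.replication)]
        for node in targets:
            node.store(block_id, data)
        self.block_map[block_id] = targets

    def read_block(self, block_id: str) -> bytes:
        # The client reads directly from a data node holding the block.
        return self.block_map[block_id][0].read(block_id)

    def delete_block(self, block_id: str) -> None:
        for node in self.block_map.pop(block_id):
            node.delete(block_id)


nodes = [DataNode(f"dn{i}") for i in range(4)]
nn = NameNode(nodes, replication=3)
nn.create_block("blk_1", b"tweet data")
print(nn.read_block("blk_1"))  # the stored block contents
```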
3.2 MapReduce:
MapReduce is a programming model for distributed data processing, with implementations commonly written in Java. The model consists of two main modules: Map and Reduce. Map transforms a set of data into another set in which each element is broken into tuples of key/value pairs. Reduce takes the output from Map and combines those tuples into a smaller set of tuples.
The major advantage of MapReduce is that it scales large data processing across multiple components. The processing nodes in MapReduce are called mappers and reducers, and they can process data on hundreds or thousands of machines in a cluster with only a simple configuration change. This ease of scaling has attracted many organisations to adopt MapReduce as their programming model.
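The "scale by configuration" point can be illustrated with parallel mappers. This sketch uses Python's `concurrent.futures` thread pool as a stand-in for a cluster: the mapper and reducer code stay unchanged while `max_workers` (the analogue of the cluster size) is a one-line configuration knob. The document set is invented for the example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def mapper(document: str) -> Counter:
    """Each mapper counts words in its own document independently."""
    return Counter(document.split())


def reducer(partials) -> Counter:
    """The reducer merges the mappers' partial counts."""
    total: Counter = Counter()
    for partial in partials:
        total += partial
    return total


documents = ["tweet about data", "data data stream", "tweet stream"]

# Scaling out is a configuration change: raise max_workers (or, on a
# real cluster, the number of machines) without touching mapper/reducer.
with ThreadPoolExecutor(max_workers=2) as pool:
    partials = pool.map(mapper, documents)
    result = reducer(partials)

print(result["data"])  # 3
```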
3.3