Hadoop Development

Big Data (Hadoop) Developer Course Outline

Introduction to Big Data and Hadoop

  •  Understanding Big Data
  •  Challenges in processing Big Data
  •  3V Characteristics (Volume, Variety and Velocity)
  •  Brief history of Hadoop
  •  How Hadoop addresses Big Data
  •  HDFS and Map Reduce (MR)
  •  Hadoop ecosystem

HDFS (Hadoop Distributed File System)

  •  HDFS Overview and Architecture
  •  HDFS key terms: NameNode, DataNode, heartbeat, etc.
  •  Configuring HDFS
  •  Data Flows (Read and Write)
  •  HDFS Permissions and Security
  •  HDFS commands
  •  Rack Awareness
  •  The five daemon processes

Map Reduce

  •  Map Reduce Basics
  •  Map Reduce Data Flow
  •  Solving the word count example
  •  Algorithms for simple and complex problems
  •  Hadoop Streaming

Developing a Map Reduce Application

  •  Setting up the working environment
  •  Custom data types (Writable and custom key types)
  •  Input and output file formats
  •  Driver, Mapper and Reducer code walkthrough
  •  Configuring the Eclipse IDE
  •  Writing unit tests and running them locally
  •  Map Reduce Web UI
  •  Hands-on

How Map Reduce Works

  •  Classic Map Reduce (Map Reduce 1)
  •  YARN (Map Reduce 2)
  •  Job Scheduling
  •  Shuffle and Sort
  •  Oozie Workflows
  •  Hands-on exercises

Map Reduce Types and Formats

  •  Map Reduce Types
  •  Input formats – input splits and records, text input, binary input, multiple inputs and database input
  •  Output formats – text output, binary output, multiple outputs, lazy output and database output
  • Hands-on

Hadoop Ecosystem


  •  Overview of Pig
  •  Installing and running Pig
  •  Pig Latin
  •  Loading and storing data
  •  Hands-on


  •  Overview of Hive
  •  Installing and running Hive
  •  HiveQL
  •  Tables
  •  Hands-on


  •  Overview of HBase
  •  Clients (Avro, REST, Thrift)



  •  Overview of Sqoop
  •  Solving case studies