Hadoop Online Training

Become an expert in Hadoop by learning the core concepts of Big Data and Hadoop, such as MapReduce, Hadoop architecture, Pig and Hive, Flume, and the Apache Oozie workflow scheduler. Also get familiar with HBase, ZooKeeper and Sqoop while working on industry-based use cases and projects.

The Big Data and Hadoop Online Training course is designed to build the knowledge and skills you need to become a successful Hadoop developer. The course covers the core concepts in depth, along with their implementation in varied industry use cases.

Introduction

  • What is Hadoop?
  • The Hadoop Distributed File System
  • How Hadoop MapReduce Works
  • Anatomy of a Hadoop Cluster
  • Master Daemons
  • Name node
  • Job Tracker
  • Secondary name node
  • Slave Daemons
  • Data node
  • Task tracker

HDFS (Hadoop Distributed File System)

  • Blocks and Splits
  • Input Splits
  • HDFS Splits
  • Data Replication
  • Hadoop Rack Aware
  • Data high availability
  • Data Integrity
  • Cluster architecture and block placement
  • Accessing HDFS
  • JAVA Approach
  • CLI Approach
  • Programming Practices
  • Developing MapReduce Programs
  • Running without HDFS and MapReduce
  • Running all daemons in a single node
  • Running daemons on dedicated nodes
  • Local Mode
  • Pseudo-distributed Mode
  • Fully distributed mode

Setting up a Hadoop cluster with the Apache, Cloudera and Hortonworks distributions

  • Make a fully distributed Hadoop cluster on a single laptop/desktop
  • Name Node in Safe mode
  • Meta Data Backup
  • Integrating Kerberos security in Hadoop

Writing a MapReduce Program

  • Examining a Sample MapReduce Program with several examples
  • Basic API Concepts
  • The Driver Code
  • The Mapper
  • The Reducer
  • Hadoop's Streaming API

Performing several Hadoop jobs

  • The configure and close Methods
  • Sequence Files
  • Record Reader
  • Record Writer
  • Role of Reporter
  • Output Collector
  • Processing XML files
  • Counters
  • Directly Accessing HDFS
  • ToolRunner
  • Using The Distributed Cache

Common MapReduce Algorithms

  • Sorting and Searching
  • Indexing
  • Classification/Machine Learning
  • Term Frequency - Inverse Document Frequency
  • Word Co-Occurrence
  • Creating an Inverted Index
  • Identity Mapper
  • Identity Reducer
  • Exploring well known problems using MapReduce applications

Debugging MapReduce Programs

  • Testing with MRUnit
  • Logging
  • Other Debugging Strategies

Advanced MapReduce Programming

  • A Recap of the MapReduce Flow
  • The Secondary Sort
  • Customized Input Formats and Output Formats

Monitoring and debugging on a Production Cluster

  • Counters
  • Skipping Bad Records
  • Rerunning failed tasks with Isolation Runner

Tuning for Performance in MapReduce

  • Reducing network traffic with combiner
  • Partitioners
  • Using Compression
  • Reusing the JVM
  • Running with speculative execution
  • Refactoring code and rewriting algorithms
  • Parameters affecting Performance
  • Other Performance Aspects

HBase

  • HBase concepts
  • HBase architecture
  • Region server architecture
  • File storage architecture
  • HBase basics
  • Column access
  • Scans
  • HBase use cases
  • Install and configure HBase on a multi-node cluster
  • Create database
  • Develop and run sample applications
  • Access data stored in HBase using clients like Java, Python and Perl
  • HBase and Hive Integration
  • HBase admin tasks
  • Defining Schema and basic operations

Hive

  • Hive concepts
  • Hive architecture
  • Install and configure Hive on a cluster
  • Create a database and access it from a Java client
  • Buckets
  • Partitions
  • Joins in Hive
  • Inner joins
  • Outer Joins
  • Hive UDF
  • Hive UDAF
  • Hive UDTF
  • Develop and run sample applications in Java/Python to access Hive

PIG

  • Pig basics
  • Install and configure PIG on a cluster
  • PIG Vs MapReduce and SQL
  • Pig Vs Hive
  • Write sample Pig Latin scripts
  • Modes of running PIG
  • Running in Grunt shell
  • Programming in Eclipse
  • Running as Java program
  • PIG UDFs
  • Pig Macros

Flume, Chukwa, Avro, Scribe, Thrift

  • Flume and Chukwa concepts
  • Use cases of Thrift
  • Avro and Scribe
  • Install and configure Flume on a cluster
  • Create a sample application to capture logs from Apache using Flume

CDH4 Enhancements

  • Name Node High Availability
  • Name Node federation
  • Fencing
  • YARN

Hadoop Challenges

  • Hadoop disaster recovery
  • Hadoop suitable cases

Exercises

  • Hadoop Project - a real time project where students can practice