Hadoop Online Training - Online Training in Michigan (MI)

Become an expert in Hadoop by learning the core concepts of Big Data and Hadoop, such as MapReduce, Hadoop architecture, Pig and Hive, Flume, and the Apache Oozie workflow scheduler. Also get familiar with HBase, ZooKeeper and Sqoop while working on industry-based use cases and projects.

The Big Data and Hadoop online training course is designed to build the knowledge and skills you need to become a successful Hadoop developer. The course covers the core concepts in depth, along with their implementation in varied industry use cases.

Introduction

What is Hadoop?
The Hadoop Distributed File System
How Hadoop MapReduce Works
Anatomy of a Hadoop Cluster
Master Daemons
Name node
Job Tracker
Secondary name node
Slave Daemons
Data node
Task tracker

HDFS (Hadoop Distributed File System)

Blocks and Splits
Input Splits
HDFS Splits
Data Replication
Hadoop Rack Awareness
Data high availability
Data Integrity
Cluster architecture and block placement
Accessing HDFS
Java Approach
CLI Approach
Programming Practices
Developing MapReduce Programs
Running without HDFS and MapReduce
Running all daemons on a single node
Running daemons on dedicated nodes
Local Mode
Pseudo-distributed Mode
Fully distributed mode

Setup Hadoop cluster of Apache, Cloudera and HortonWorks

Make a fully distributed Hadoop cluster on a single laptop/desktop
Name Node in Safe mode
Meta Data Backup
Integrating Kerberos security in Hadoop

Writing a MapReduce Program

Examining a Sample MapReduce Program with several examples
Basic API Concepts
The Driver Code
The Mapper
The Reducer
Hadoop's Streaming API
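
As a taste of the mapper and reducer roles listed above, here is a minimal word-count sketch in Python. It assumes nothing about the course's own exercises: the function names (`mapper`, `reducer`, `run_job`) are illustrative, and the map, sort and reduce phases are simulated in a single process rather than run on a cluster. With Hadoop Streaming, the same logic would typically live in two separate scripts that read lines from standard input and write tab-separated key/value pairs to standard output.

```python
from itertools import groupby
from operator import itemgetter


def mapper(lines):
    """Emit a (word, 1) pair for every word in the input lines."""
    for line in lines:
        for word in line.strip().lower().split():
            yield word, 1


def reducer(pairs):
    """Sum the counts for each word; pairs must arrive sorted by key."""
    for word, group in groupby(pairs, key=itemgetter(0)):
        yield word, sum(count for _, count in group)


def run_job(lines):
    """Simulate map -> shuffle/sort -> reduce in a single process."""
    shuffled = sorted(mapper(lines), key=itemgetter(0))
    return dict(reducer(shuffled))


if __name__ == "__main__":
    sample = ["Hadoop stores data", "Hadoop processes data"]
    for word, count in sorted(run_job(sample).items()):
        print(f"{word}\t{count}")
```

The sort between the two phases stands in for the shuffle step, which is what guarantees that a reducer sees all values for one key together.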

Performing several Hadoop jobs

The configure and close Methods
Sequence Files
Record Reader
Record Writer
Role of Reporter
Output Collector
Processing XML files
Counters
Directly Accessing HDFS
ToolRunner
Using The Distributed Cache

Common MapReduce Algorithms

Sorting and Searching
Indexing
Classification/Machine Learning
Term Frequency - Inverse Document Frequency
Word Co-Occurrence
Creating an Inverted Index
Identity Mapper
Identity Reducer
Exploring well known problems using MapReduce applications
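
To illustrate one of the algorithms named above, the following is a small, hedged sketch of building an inverted index in the MapReduce style. It is plain Python that groups the mapper output in memory, so it only mimics what the framework's shuffle would do; the helper names (`map_document`, `reduce_postings`, `build_inverted_index`) and the document IDs are hypothetical.

```python
from collections import defaultdict


def map_document(doc_id, text):
    """Map phase: emit (word, doc_id) once per distinct word in a document."""
    for word in set(text.lower().split()):
        yield word, doc_id


def reduce_postings(word, doc_ids):
    """Reduce phase: collapse all doc IDs for a word into a sorted posting list."""
    return word, sorted(set(doc_ids))


def build_inverted_index(documents):
    """Group mapper output by word (the shuffle), then reduce each group."""
    grouped = defaultdict(list)
    for doc_id, text in documents.items():
        for word, emitted_id in map_document(doc_id, text):
            grouped[word].append(emitted_id)
    return dict(reduce_postings(word, ids) for word, ids in grouped.items())


if __name__ == "__main__":
    docs = {"d1": "big data", "d2": "big cluster"}
    for word, postings in sorted(build_inverted_index(docs).items()):
        print(word, postings)
```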

Debugging MapReduce Programs

Testing with MRUnit
Logging
Other Debugging Strategies

Advanced MapReduce Programming

A Recap of the MapReduce Flow
The Secondary Sort
Customized Input Formats and Output Formats

Monitoring and debugging on a Production Cluster

Counters
Skipping Bad Records
Rerunning failed tasks with Isolation Runner

Tuning for Performance in MapReduce

Reducing network traffic with combiner
Partitioners
Using Compression
Reusing the JVM
Running with speculative execution
Refactoring code and rewriting algorithms
Parameters affecting Performance
Other Performance Aspects
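
The combiner topic above can be illustrated with a rough sketch that counts how many key/value records would leave the map side with and without local aggregation. This is an in-memory simulation under simplifying assumptions (each inner list stands for one input split, and the names `map_split`, `combine` and `shuffle_sizes` are made up for illustration); it does not measure real network traffic.

```python
from collections import Counter


def map_split(lines):
    """Map phase for one input split: emit (word, 1) for every word."""
    return [(word, 1) for line in lines for word in line.lower().split()]


def combine(pairs):
    """Combiner: pre-aggregate counts locally, before anything is shuffled."""
    local = Counter()
    for word, count in pairs:
        local[word] += count
    return sorted(local.items())


def shuffle_sizes(splits):
    """Return (records shuffled without a combiner, records shuffled with one)."""
    without = sum(len(map_split(split)) for split in splits)
    with_combiner = sum(len(combine(map_split(split))) for split in splits)
    return without, with_combiner


if __name__ == "__main__":
    splits = [["a a a b"], ["a b b"]]
    print(shuffle_sizes(splits))
```

The saving grows with the number of repeated keys per split, which is why a combiner suits operations such as counting and summing, where partial results can be merged safely.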

HBase

HBase concepts
HBase architecture
Region server architecture
File storage architecture
HBase basics
Column access
Scans
HBase use cases
Install and configure HBase on a multi node cluster
Create database
Develop and run sample applications
Access data stored in HBase using clients like Java, Python and Perl
HBase and Hive Integration
HBase admin tasks
Defining Schema and basic operation

Hive

Hive concepts
Hive architecture
Install and configure Hive on a cluster
Create database
Access it from a Java client
Buckets
Partitions
Joins in hive
Inner joins
Outer Joins
Hive UDF
Hive UDAF
Hive UDTF
Develop and run sample applications in Java/Python to access Hive

Pig

Pig basics
Install and configure Pig on a cluster
Pig vs MapReduce and SQL
Pig vs Hive
Write sample Pig Latin scripts
Modes of running Pig
Running in Grunt shell
Programming in Eclipse
Running as Java program
Pig UDFs
Pig Macros

Flume, Chukwa, Avro, Scribe, Thrift

Flume and Chukwa concepts
Use cases of Thrift
Avro and Scribe
Install and configure Flume on a cluster
Create a sample application to capture logs from Apache using Flume

CDH4 Enhancements

Name Node High Availability
Name Node federation
Fencing
YARN

Hadoop Challenges

Hadoop disaster recovery
Hadoop suitable cases

Exercises

Hadoop Project - a real-time project where students can practice