Become an expert in Hadoop by learning the core concepts of Big Data and Hadoop, such as MapReduce, Hadoop architecture, Pig and Hive, Flume, and the Apache Oozie workflow scheduler. You will also get familiar with HBase, ZooKeeper, and Sqoop while working on industry-based use cases and projects.
The Big Data and Hadoop online training course is designed to give you the knowledge and skills to become a successful Hadoop developer. The course covers the core concepts in depth, along with their implementation in varied industry use cases.
Introduction
- What is Hadoop?
- The Hadoop Distributed File System
- How Hadoop MapReduce Works
- Anatomy of a Hadoop Cluster
- Master Daemons
- Name node
- Job Tracker
- Secondary name node
- Slave Daemons
- Data node
- Task tracker
HDFS (Hadoop Distributed File System)
- Blocks and Splits
- Input Splits
- HDFS Splits
- Data Replication
- Hadoop Rack Awareness
- Data high availability
- Data Integrity
- Cluster architecture and block placement
- Accessing HDFS
- Java Approach
- CLI Approach
- Programming Practices
- Developing MapReduce Programs
- Running without HDFS and MapReduce
- Running all daemons in a single node
- Running daemons on dedicated nodes
- Local Mode
- Pseudo-distributed Mode
- Fully distributed mode
Setting Up a Hadoop Cluster with Apache, Cloudera and Hortonworks
- Make a fully distributed Hadoop cluster on a single laptop/desktop
- Name Node in Safe mode
- Meta Data Backup
- Integrating Kerberos security in Hadoop
Writing a MapReduce Program
- Examining a Sample MapReduce Program with several examples
- Basic API Concepts
- The Driver Code
- The Mapper
- The Reducer
- Hadoop's Streaming API
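As a rough preview of the mapper and reducer roles listed above, the word-count sketch below imitates the MapReduce flow in plain Python. It does not use Hadoop itself: the function names `mapper`, `reducer`, and `run_local` are illustrative, and the sort inside `run_local` is only an assumed stand-in for the shuffle-and-sort step that the framework performs between the two phases.

```python
from itertools import groupby
from operator import itemgetter


def mapper(lines):
    """Emit a (word, 1) pair for every word in the input lines."""
    for line in lines:
        for word in line.strip().lower().split():
            yield word, 1


def reducer(sorted_pairs):
    """Sum the counts for each word; the input must be sorted by key."""
    for word, group in groupby(sorted_pairs, key=itemgetter(0)):
        yield word, sum(count for _, count in group)


def run_local(lines):
    """Hypothetical local driver: map, sort by key (the shuffle), reduce."""
    shuffled = sorted(mapper(lines), key=itemgetter(0))
    return list(reducer(shuffled))


if __name__ == "__main__":
    # Print tab-separated word/count pairs, the usual Streaming text layout.
    for word, total in run_local(["the quick brown fox", "the lazy dog"]):
        print(f"{word}\t{total}")
```

On a real cluster the mapper and reducer would typically be separate scripts reading from standard input, and the driver role would be played by Hadoop.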
Performing several Hadoop jobs
- The configure and close Methods
- Sequence Files
- Record Reader
- Record Writer
- Role of Reporter
- Output Collector
- Processing XML files
- Counters
- Directly Accessing HDFS
- ToolRunner
- Using The Distributed Cache
Common MapReduce Algorithms
- Sorting and Searching
- Indexing
- Classification/Machine Learning
- Term Frequency - Inverse Document Frequency
- Word Co-Occurrence
- Creating an Inverted Index
- Identity Mapper
- Identity Reducer
- Exploring well known problems using MapReduce applications
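To make one of the algorithms above concrete, here is a minimal sketch of an inverted index written in plain Python rather than against the Hadoop API. The names `map_document` and `build_inverted_index` are hypothetical, and the in-memory dictionary is an assumed substitute for the grouping that the framework would do between the map and reduce phases.

```python
from collections import defaultdict


def map_document(doc_id, text):
    """Map step: emit (word, doc_id) once per distinct word in a document."""
    for word in set(text.lower().split()):
        yield word, doc_id


def build_inverted_index(documents):
    """Group the mapped pairs by word and collect the sorted document ids."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word, source in map_document(doc_id, text):
            index[word].add(source)
    # Reduce step: turn each set of ids into a sorted posting list.
    return {word: sorted(ids) for word, ids in index.items()}


if __name__ == "__main__":
    docs = {"d1": "hadoop stores data", "d2": "hive queries data"}
    for word, postings in sorted(build_inverted_index(docs).items()):
        print(word, postings)
```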
Debugging MapReduce Programs
- Testing with MRUnit
- Logging
- Other Debugging Strategies
Advanced MapReduce Programming
- A Recap of the MapReduce Flow
- The Secondary Sort
- Customized Input Formats and Output Formats
Monitoring and debugging on a Production Cluster
- Counters
- Skipping Bad Records
- Rerunning failed tasks with IsolationRunner
Tuning for Performance in MapReduce
- Reducing network traffic with combiner
- Partitioners
- Using Compression
- Reusing the JVM
- Running with speculative execution
- Refactoring code and rewriting algorithms
- Parameters affecting performance
- Other Performance Aspects
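The combiner topic above can be illustrated with a small plain-Python sketch. It assumes two input splits and a word-count job; `map_split`, `combine`, and `reduce_all` are hypothetical helper names, not Hadoop APIs. The point it shows is that pre-aggregating each mapper's output shrinks the number of records sent to the reducers without changing the final totals, which works here because addition is associative and commutative.

```python
from collections import Counter


def map_split(lines):
    """Map phase for one input split: one (word, 1) pair per word."""
    return [(word, 1) for line in lines for word in line.lower().split()]


def combine(pairs):
    """Combiner: pre-aggregate a single mapper's output before the shuffle."""
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return sorted(totals.items())


def reduce_all(per_mapper_pairs):
    """Reduce phase: sum the counts received from every mapper."""
    totals = Counter()
    for pairs in per_mapper_pairs:
        for word, count in pairs:
            totals[word] += count
    return dict(totals)


splits = [["to be or not to be"], ["to do is to be"]]
raw = [map_split(split) for split in splits]
combined = [combine(pairs) for pairs in raw]

if __name__ == "__main__":
    print("records shuffled without combiner:", sum(len(p) for p in raw))
    print("records shuffled with combiner:", sum(len(p) for p in combined))
    print("same result:", reduce_all(raw) == reduce_all(combined))
```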
HBase
- HBase concepts
- HBase architecture
- Region server architecture
- File storage architecture
- HBase basics
- Column access
- Scans
- HBase use cases
- Install and configure HBase on a multi node cluster
- Create database
- Develop and run sample applications
- Access data stored in HBase using clients like Java, Python and Perl
- HBase and Hive Integration
- HBase admin tasks
- Defining Schema and basic operation
Hive
- Hive concepts
- Hive architecture
- Install and configure Hive on cluster
- Create a database and access it from a Java client
- Buckets
- Partitions
- Joins in Hive
- Inner joins
- Outer Joins
- Hive UDF
- Hive UDAF
- Hive UDTF
- Develop and run sample applications in Java/Python to access Hive
Pig
- Pig basics
- Install and configure Pig on a cluster
- Pig Vs MapReduce and SQL
- Pig Vs Hive
- Write sample Pig Latin scripts
- Modes of running Pig
- Running in Grunt shell
- Programming in Eclipse
- Running as Java program
- Pig UDFs
- Pig Macros
Flume, Chukwa, Avro, Scribe, Thrift
- Flume and Chukwa concepts
- Use cases of Thrift
- Avro and Scribe
- Install and configure Flume on cluster
- Create a sample application to capture logs from Apache using Flume
CDH4 Enhancements
- Name Node High Availability
- Name Node federation
- Fencing
- YARN
Hadoop Challenges
- Hadoop disaster recovery
- Hadoop suitable cases
Exercises
- Hadoop Project - a real-time project where students can practice