Hadoop Analyst Training


About SimplyAnalytics

We are the best training institute for Hadoop Analyst Training in Chennai. We have expert trainers and excellent materials to transform your skills to fit the job market.

About Hadoop Analyst Training Course

  • This training course is a comprehensive study of Big Data analysis with Hadoop. The course topics include an introduction to Hadoop and its ecosystem, MapReduce and HDFS, an introduction to Hive, relational data analysis with Hive, and Hive data management and optimization. It then covers an introduction to Pig, basic data analysis using Pig, complex data processing, multi-dataset operations, an introduction to Impala, and ETL connectivity with the Hadoop ecosystem.

Learning Objectives

After completing the Hadoop Analyst Training course, you will be able to:

  • 1. Gain a clear understanding of Hadoop and its Ecosystem.
  • 2. Get insight into MapReduce and HDFS.
  • 3. Learn ETL Connectivity with Hadoop.
  • 4. Write Hive and Pig scripts and Work with Sqoop.
  • 5. Understand YARN (MRv2), the resource-management layer introduced in Hadoop Release 2.0.
  • 6. Implement HBase, MapReduce Integration, Advanced Usage and Advanced Indexing.
  • 7. Work on a Real-time Project on Big Data Analytics and Gain hands-on Project Experience.
  • 8. Implement LinkedIn-style algorithms – identification of the shortest path to a 1st-level or 2nd-level connection using MapReduce.
  • 9. Work with datasets – a Twitter dataset for sentiment analysis and a loan dataset.
  • 10. Understand Impala for real-time queries on Hadoop.

Recommended Audience

  • 1. Java Architects, Data Warehouse Developers, SaaS Professionals, and Data Analysts.
  • 2. Business Analysts and System Analysts.
  • 3. Professionals and students aiming to learn the latest technology and build a career in Big Data using Hadoop.

Prerequisites

  • 1. Some prior experience in any programming language and good analytical skills.
  • 2. Basic knowledge of Unix and SQL scripting.
  • 3. Prior knowledge of Apache Hadoop is not required.

Module 1 : Introduction to Hadoop and its Ecosystem, MapReduce and HDFS

  • 1. Big Data and the factors constituting Big Data.
  • 2. Hadoop and the Hadoop Ecosystem.
  • 3. MapReduce – concepts of Map, Reduce, ordering, shuffle and concurrency.
  • 4. Hadoop Distributed File System (HDFS) concepts and their importance.
  • 5. Deep dive into MapReduce – execution framework, Partitioner, Combiner, data types and key-value pairs (a minimal word-count sketch follows this list).
  • 6. HDFS deep dive – architecture, data replication, NameNode, DataNode and data flow.
  • 7. Parallel copying with DistCp and Hadoop Archives.
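
To make the Map and Reduce concepts above concrete, here is a minimal word-count sketch using the Hadoop 2.x Java MapReduce API. It is an illustration only, not course material; class names and paths are placeholders supplied on the command line.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Mapper: emits a (word, 1) key-value pair for every token in the input split.
    public static class TokenizerMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer tokens = new StringTokenizer(value.toString());
            while (tokens.hasMoreTokens()) {
                word.set(tokens.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reducer: receives all counts for one word (after shuffle and sort) and sums them.
    public static class SumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(SumReducer.class);   // combiner pre-aggregates map-side to cut shuffle traffic
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));    // HDFS input path
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // HDFS output path
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Packaged into a JAR, the job would be submitted with the standard hadoop jar command; note how the combiner simply reuses the reducer class, which is valid here because summation is associative.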

Module 2 : Hands-On Exercises

  • 1. Installing Hadoop in pseudo-distributed mode; understanding the important configuration files, their properties and daemon threads.
  • 2. Accessing HDFS from the command line (an equivalent Java HDFS API sketch appears below).
  • 3. MapReduce – basic exercises.
  • 4. Understanding the Hadoop Ecosystem.

  • 1. Introduction to Sqoop, use cases and installation.
  • 2. Introduction to Hive, use cases and installation.
  • 3. Introduction to Pig, use cases and installation.
  • 4. Introduction to Oozie, use cases and installation.
  • 5. Introduction to Flume, use cases and installation.
  • 6. Introduction to YARN.
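
The hands-on exercise above works with HDFS from the command line; the same basic operations can also be done through the HDFS Java API. Below is a minimal sketch, assuming a pseudo-distributed install whose core-site.xml and hdfs-site.xml are on the classpath; all paths are placeholders, not course material.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsBasics {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();          // picks up fs.defaultFS from core-site.xml
        FileSystem fs = FileSystem.get(conf);

        Path dir = new Path("/user/training/input");
        fs.mkdirs(dir);                                    // like: hdfs dfs -mkdir -p
        fs.copyFromLocalFile(new Path("/tmp/sample.txt"),  // like: hdfs dfs -put
                             new Path(dir, "sample.txt"));

        for (FileStatus status : fs.listStatus(dir)) {     // like: hdfs dfs -ls
            System.out.println(status.getPath() + "\t" + status.getLen());
        }
        fs.close();
    }
}
```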

Mini Project – Importing MySQL data using Sqoop and querying it using Hive

Module 3 : Deep Dive in MapReduce

  • 1. How to develop a MapReduce application and write unit tests (see the test sketch after this list).
  • 2. Best practices for developing, writing and debugging MapReduce applications.
  • 3. Joining data sets in MapReduce.
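
Unit tests for mappers and reducers are usually written against a small driver harness rather than a running cluster. The sketch below is one common approach, assuming Apache MRUnit and JUnit 4 on the classpath (MRUnit is an assumption here, not necessarily the tool used in class), and it reuses the WordCount classes from the Module 1 sketch.

```java
import java.util.Arrays;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.apache.hadoop.mrunit.mapreduce.ReduceDriver;
import org.junit.Test;

public class WordCountTest {

    @Test
    public void mapperEmitsOneCountPerToken() throws Exception {
        // Feed one input record and assert the exact key-value pairs the mapper emits.
        MapDriver.newMapDriver(new WordCount.TokenizerMapper())
                 .withInput(new LongWritable(0), new Text("big data big"))
                 .withOutput(new Text("big"), new IntWritable(1))
                 .withOutput(new Text("data"), new IntWritable(1))
                 .withOutput(new Text("big"), new IntWritable(1))
                 .runTest();
    }

    @Test
    public void reducerSumsCountsForAKey() throws Exception {
        // Feed the grouped values for one key and assert the summed output.
        ReduceDriver.newReduceDriver(new WordCount.SumReducer())
                    .withInput(new Text("big"),
                               Arrays.asList(new IntWritable(1), new IntWritable(1)))
                    .withOutput(new Text("big"), new IntWritable(2))
                    .runTest();
    }
}
```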

Module 4 : Hive

1. Introduction to Hive

  • 1. What Is Hive?
  • 2. Hive Schema and Data Storage
  • 3. Comparing Hive to Traditional Databases
  • 4. Hive vs. Pig
  • 5. Hive Use Cases
  • 6. Interacting with Hive

2. Relational Data Analysis with Hive

  • 1. Hive Databases and Tables
  • 2. Basic HiveQL Syntax
  • 3. Data Types
  • 4. Joining Data Sets
  • 5. Common Built-in Functions
  • 6. Hands-On Exercise: Running Hive Queries from the Shell, Scripts, and Hue (a JDBC sketch follows this list)
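
Besides the shell, scripts, and Hue, HiveQL can also be submitted programmatically over JDBC. Below is a minimal sketch, assuming HiveServer2 is listening on its default port 10000 and that a table named customers already exists; the host, credentials, table and column names are placeholders.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveQueryExample {
    public static void main(String[] args) throws Exception {
        // Hive JDBC driver shipped with the Hive distribution.
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:hive2://localhost:10000/default", "hive", "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                     "SELECT city, COUNT(*) AS cnt FROM customers GROUP BY city")) {
            while (rs.next()) {
                System.out.println(rs.getString("city") + "\t" + rs.getLong("cnt"));
            }
        }
    }
}
```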

3. Hive Data Management

  • 1. Hive Data Formats
  • 2. Creating Databases and Hive-Managed Tables
  • 3. Loading Data into Hive
  • 4. Altering Databases and Tables
  • 5. Self-Managed Tables
  • 6. Simplifying Queries with Views
  • 7. Storing Query Results
  • 8. Controlling Access to Data
  • 9. Hands-On Exercise: Data Management with Hive

4. Hive Optimization

  • 1. Understanding Query Performance
  • 2. Partitioning
  • 3. Bucketing (a DDL sketch covering both follows this list)
  • 4. Indexing Data

5. Extending Hive

  • 1. User-Defined Functions

6. Hands-On Exercises – Working with large data sets and querying extensively.

7. User-Defined Functions, query optimization, and tips and tricks for performance tuning (a sample Java UDF follows).
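
As an illustration of the user-defined functions mentioned above, here is a minimal Hive UDF sketch in Java using the classic UDF base class (newer Hive releases favour GenericUDF, but this form is the usual teaching example). The class name is a placeholder.

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public class Uppercase extends UDF {
    // Hive calls evaluate() once per row; returning null for null input keeps the function NULL-safe.
    public Text evaluate(Text input) {
        if (input == null) {
            return null;
        }
        return new Text(input.toString().toUpperCase());
    }
}
```

Once packaged into a JAR, such a function would typically be registered in a Hive session with ADD JAR and CREATE TEMPORARY FUNCTION before being called in a query.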

Module 5 : Pig

1. Introduction to Pig

  • 1. What Is Pig?
  • 2. Pig’s Features
  • 3. Pig Use Cases
  • 4. Interacting with Pig

2. Basic Data Analysis with Pig

  • 1. Pig Latin Syntax
  • 2. Loading Data
  • 3. Simple Data Types
  • 4. Field Definitions
  • 5. Data Output
  • 6. Viewing the Schema
  • 7. Filtering and Sorting Data
  • 8. Commonly-Used Functions
  • 9. Hands-On Exercise: Using Pig for ETL Processing (a minimal embedded-Pig sketch follows this list)
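
Pig Latin statements such as the LOAD, FILTER and ORDER used in the exercise above can also be driven from Java through PigServer. Below is a minimal sketch, assuming Pig in local mode; the input file, schema and filter condition are placeholders, not course material.

```java
import java.util.Iterator;

import org.apache.pig.PigServer;
import org.apache.pig.data.Tuple;

public class PigEtlExample {
    public static void main(String[] args) throws Exception {
        PigServer pig = new PigServer("local");   // "mapreduce" would run on the cluster

        // The same statements that would go into a .pig script.
        pig.registerQuery("raw = LOAD 'sales.csv' USING PigStorage(',') "
                + "AS (region:chararray, amount:double);");
        pig.registerQuery("big = FILTER raw BY amount > 1000.0;");
        pig.registerQuery("sorted = ORDER big BY amount DESC;");

        // Iterate over the result tuples instead of STOREing them to a file.
        Iterator<Tuple> rows = pig.openIterator("sorted");
        while (rows.hasNext()) {
            System.out.println(rows.next());
        }
        pig.shutdown();
    }
}
```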

3. Processing Complex Data with Pig

  • 1. Complex/Nested Data Types
  • 2. Grouping
  • 3. Iterating Grouped Data
  • 4. Hands-On Exercise: Analyzing Data with Pig

4. Multi-Dataset Operations with Pig

  • 1. Techniques for Combining Data Sets
  • 2. Joining Data Sets in Pig
  • 3. Set Operations
  • 4. Splitting Data Sets
  • 5. Hands-On Exercise

5. Extending Pig

  • 1. Macros and Imports
  • 2. UDFs (a sample Java EvalFunc sketch follows this list)
  • 3. Using Other Languages to Process Data with Pig
  • 4. Hands-On Exercise: Extending Pig with Streaming and UDFs
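
As a small example of the UDFs mentioned above, here is a minimal Pig eval function in Java extending EvalFunc. The class name and field handling are illustrative only.

```java
import java.io.IOException;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

public class TrimField extends EvalFunc<String> {
    // Pig calls exec() once per input tuple; the first field is the chararray to trim.
    @Override
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null;
        }
        return ((String) input.get(0)).trim();
    }
}
```

From a Pig script the packaged JAR would be made available with REGISTER and the function then called like a built-in.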

6. Pig Jobs

Module 6 : Impala

1. Introduction to Impala

  • 1. What is Impala?
  • 2. How Impala Differs from Hive and Pig
  • 3. How Impala Differs from Relational Databases
  • 4. Limitations and Future Directions
  • 5. Using the Impala Shell

2. Choosing the Right Tool (Hive, Pig, or Impala)

Module 7 : Major Project – Putting It All Together and Connecting the Dots

  • 1. Putting it all together and connecting the dots.
  • 2. Working with large data sets and the steps involved in analyzing large data.

Module 8 : ETL Connectivity with Hadoop Ecosystem

  • 1. How ETL tools work in the Big Data industry.
  • 2. Connecting to HDFS from an ETL tool and moving data from the local system to HDFS.
  • 3. Moving data from a DBMS to HDFS.
  • 4. Working with Hive from an ETL tool.
  • 5. Creating a MapReduce job in an ETL tool.
  • 6. End-to-end ETL PoC showing Hadoop integration with an ETL tool.

Module 9 : Job and Certification Support

  • Major Project, Hadoop development, Cloudera certification tips and guidance, mock interview preparation, and practical development tips and techniques.

Why choose SimplyAnalytics for Hadoop Analyst Training in Chennai?

  • 1. 100% practical and placement-oriented training.
  • 2. We are a registered training organization.
  • 3. Expert trainers from the IT industry.
  • 4. Placement assistance.
  • 5. Flexible timings.
  • 6. Weekday and weekend batches.
  • 7. Affordable fees.
  • 8. Air-conditioned classrooms.
  • 9. Wi-Fi enabled training institute.
  • 10. Best lab facilities.

Are you located in any of these areas – Adambakkam, Camp Road, Chromepet, Ekkattuthangal, Guindy, Kovilambakkam, Madipakkam, Medavakkam, Nanganallur, Navalur, Nungambakkam, OMR, Pallikaranai, Perungudi, Rajakilpakkam, Saidapet, Sholinganallur, Siruseri, St. Thomas Mount, T. Nagar, Tambaram, Tambaram East, Thiruvanmiyur, Thoraipakkam, Velachery, or West Mambalam?

Our Medavakkam office is just a few kilometres away from your location. If you want the best Hadoop Analyst Training in Chennai, travelling the extra kilometres is worth it.


Course Features

  • Duration 50 hours
  • Skill level All levels
  • Language English
  • Assessments Self