Big Data Hadoop Architect Training

Free

About SimplyAnalytics

We are the best training institute for Hadoop Architect training in Chennai. We have expert trainers and excellent materials to transform your skills to fit the job market.

Module 1: Introduction

  • 1. Introduction to Big Data and Hadoop
  • 2. What is Big Data?
  • 3. Types of Data
  • 4. Need for Big Data
  • 5. Characteristics of Big Data
  • 6. Traditional IT Analytics Approach
  • 7. Big Data: Use Cases
  • 8. Handling Limitations of Big Data
  • 9. Introduction to Hadoop
  • 10. History and Milestones of Hadoop

Module 2: Hadoop Architecture

  • 1. Hadoop Cluster on commodity hardware
  • 2. Hadoop core services and components
  • 3. Regular file system vs. Hadoop
  • 4. HDFS layer
  • 5. HDFS operation principle

Module 3: MapReduce

  • 1. Introduction to MapReduce
  • 2. Hadoop MapReduce example
  • 3. Hadoop MapReduce Characteristics
  • 4. Setting up your MapReduce Environment
  • 5. Building a MapReduce Program
  • 6. MapReduce Requirements and Features
  • 7. MapReduce Java Programming in Eclipse
  • 8. Checking Hadoop Environment for MapReduce
  • 9. MapReduce

Module 4: Advanced MapReduce

  • 1. Advanced MapReduce
  • 2. Hadoop Data Types
  • 3. InputFormats in MapReduce
  • 4. OutputFormats in MapReduce
  • 5. Distributed Cache
  • 6. Joins in MapReduce

Module 5: Pig

  • 1. Introduction to Pig
  • 2. Components of Pig
  • 3. Pig Data Model
  • 4. Pig Modes
  • 5. Pig vs. SQL
  • 6. Installing Pig Engine
  • 7. Datasets for Pig Development
  • 8. Pig Latin
  • 9. Filtering and Transforming Data
  • 10. Grouping and Sorting
  • 11. Combining and Splitting
  • 12. Pig Commands

Module 6: Hive

  • 1. Why another data warehousing system
  • 2. What is Hive
  • 3. Characteristics of Hive
  • 4. System Architecture and Components of Hive
  • 5. Hive Data Models
  • 6. Serialization/Deserialization
  • 7. Hive file formats
  • 8. Hive Query Language
  • 9. Hive: Installing, running, and programming
  • 10. Hive Functions
  • 11. Difference between Hive and Pig

Module 7: ETL Connectivity with Hadoop Ecosystem

  • 1. How ETL tools work in the big data industry
  • 2. Connecting to HDFS from an ETL tool and moving data from the local system to HDFS
  • 3. Moving data from a DBMS to HDFS
  • 4. Working with Hive from an ETL tool
  • 5. Creating a MapReduce job in an ETL tool
  • 6. End-to-end ETL PoC showing Hadoop integration with an ETL tool

Module 8: Advanced Flume

  • 1. Apache Flume
  • 2. Big data ecosystem
  • 3. Physically distributed data sources
  • 4. Changing structure of data
  • 5. Use case: Log aggregation
  • 6. Adding a Flume agent
  • 7. Handling a server farm
  • 8. Data volume per agent
  • 9. Example describing a single-node Flume deployment

Module 9: HBase

  • 1. HBase introduction
  • 2. Characteristics of HBase
  • 3. HBase Architecture
  • 4. Storage Model of HBase
  • 5. When to use HBase
  • 6. HBase Data Model
  • 7. HBase Families
  • 8. HBase Components
  • 9. Row Distribution between region servers
  • 10. Data Storage
  • 11. Installation of HBase
  • 12. Configuration of HBase
  • 13. HBase Shell Commands

Module 10: Hadoop Stack Integration Testing

  • 1. Why Hadoop testing is important
  • 2. Unit testing
  • 3. Integration testing
  • 4. Performance testing
  • 5. Diagnostics
  • 6. Nightly QA test
  • 7. Benchmark and end-to-end tests
  • 8. Functional testing
  • 9. Release certification testing
  • 10. Security testing
  • 11. Scalability testing
  • 12. Reliability testing
  • 13. Release testing

Module 11: Why Spark?

  • 1. What is Spark
  • 2. Comparison with Hadoop
  • 3. Components of Spark

Module 12: Persistence in Spark

  • 1. Persistence
  • 2. Motivation
  • 3. Example
  • 4. Transformation
  • 5. Scala and Python
  • 6. Examples – K-means
  • 7. Latent Dirichlet Allocation (LDA)

Module 13: Broadcast Variables and Accumulators

  • 1. Motivation
  • 2. Broadcast Variables
  • 3. Example: Join
  • 4. Alternative if one table is small
  • 5. Better version with broadcast
  • 6. How to create a Broadcast
  • 7. Accumulators motivation
  • 8. Example: Join
  • 9. Accumulator Rules
  • 10. Custom accumulators
  • 11. Another common use
  • 12. Creating an accumulator using the SparkContext object

Module 14: Spark SQL and RDD

  • 1. Introduction
  • 2. Spark SQL main capabilities
  • 3. Spark SQL usage diagram
  • 4. Spark SQL
  • 5. Important topics in Spark SQL: DataFrames

Module 15: Kafka

  • 1. Understand Kafka and its components
  • 2. Set up an end-to-end Kafka cluster
  • 3. Integrating Kafka with real-time streaming systems
  • 4. Designing a high-throughput messaging system
  • 5. Use Kafka to produce and consume
  • 6. Understanding the internals of the Kafka API
  • 7. Work on a real-life project implementing Twitter streaming with Kafka, Hadoop, and Storm

Module 16: NoSQL Introduction

Requirement of NoSQL

  • 1. Database Type
  • 2. OLTP
  • 3. OLAP
  • 4. NoSQL
  • 5. Types of NoSQL Databases
  • 6. Challenges with RDBMS
  • 7. Why NoSQL
  • 8. ACID properties
  • 9. CAP Theorem
  • 10. BASE properties
  • 11. Introduction to JSON/BSON
  • 12. JSON Data types
  • 13. Database, collection, and document
  • 14. MongoDB use cases
  • 15. Unacknowledged
  • 16. Acknowledged
  • 17. Journaled
  • 18. Fsynced
  • 19. Replica Acknowledged

Module 17: MongoDB Security

  • 1. Security Risks to Databases
  • 2. MongoDB Security Approach
  • 3. MongoDB Security Concept
  • 4. Access Control
  • 5. Integration of MongoDB with Robomongo

Module 18: CRUD Operations

  • 1. MongoDB CRUD Tutorial
  • 2. JSON and its syntax
  • 3. CRUD Introduction
  • 4. Read and Write Operations
  • 5. Write Operation Concern Levels
  • 6. MongoDB CRUD Tutorials
  • 7. MongoDB CRUD Reference
  • 8. Hands-on with CRUD Operations

Module 19: Mini Projects

  • Project 1. List the items
  • Project 2. Sorting of Records
  • Project 3. Show a histogram of date vs users created. Optionally, use a rich visualization like
  • Project 4. Prepare a map of tags vs. number of questions in each tag and display it.

Module 20: Major Projects

  • Project 1. Movie Recommendation
  • Project 2. Twitter API Integration for Tweet Analysis
  • Project 3. Data Exploration Using Spark SQL – Wikipedia dataset

Module 21: Use Cases – I

  • 1. Sqoop Hive Use Case Example
  • 2. Custom simple eval UDFs in Pig and Hive
  • 3. Custom Generic UDF in Hive
  • 4. Sqoop Importing MySQL Data into HDFS
  • 5. Sqoop Import Command Arguments
  • 6. Sqoop Export Commands
  • 7. Flume Data Collection into HBase
  • 8. RDBMS to MapReduce
  • 9. Graph Algorithm MapReduce
  • 10. Spatial Join

Module 22: Use Cases – II

  • 1. Hive ORC MR
  • 2. Distributed Indexing MapReduce
  • 3. Avro to Parquet
  • 4. GeoEnrichment
  • 5. Pattern Matching
  • 6. JSON process using MapReduce
  • 7. Inverted Indexing
  • 8. MapReduce Design Patterns
  • 9. Pig Design Patterns
  • 10. Integration of Cascading and Apache Hive

Module 23: Use Cases – III

  • 1. XML Data Processing Using Hadoop
  • 2. Load Functions in Pig
  • 3. Built-in Load Store Functions in Pig
  • 4. Processing Logs in Pig
  • 5. HBase Integration with Hadoop Hive
  • 6. Sqoop Hive Use Case Example
  • 7. Custom simple eval UDFs in Pig and Hive
  • 8. Custom Generic UDF in Hive
  • 9. Sqoop Importing MySQL Data into HDFS
  • 10. Sqoop Import Command Arguments

Module 24: Use Cases – IV

  • 1. Sqoop Export Commands
  • 2. Flume Data Collection into HBase
  • 3. RDBMS to MapReduce
  • 4. Graph Algorithm MapReduce
  • 5. Spatial Join
  • 6. Hive ORC MR
  • 7. Distributed Indexing MapReduce
  • 8. Avro to Parquet
  • 9. GeoEnrichment
  • 10. Pattern Matching

Module 25: Use Cases – V

  • 1. JSON process using MapReduce
  • 2. Inverted Indexing
  • 3. MapReduce Design Patterns
  • 4. Pig Design Patterns
  • 5. Integration of Cascading and Apache Hive
  • 6. XML Serializer/Deserializer for Apache Hive
  • 7. A collection of user defined functions (UDFs) for Apache Hive
  • 8. Process your tweets in Hive
  • 9. Combiner using MapReduce
  • 10. Twitter Data process using Hive

Technology Stack

  • 1. Hadoop Ecosystem
  • 2. Sqoop
  • 3. Hive
  • 4. Pig
  • 5. RDBMS
  • 6. Kafka
  • 7. Spark
  • 8. MongoDB
  • 9. HBase
  • 10. Java

Course Features

  • Lectures: 1
  • Quizzes: 1
  • Duration: 50 hours
  • Skill level: All levels
  • Language: English
  • Students: 0
  • Assessments: Self