CCA 175 Spark and Hadoop Developer - Curriculum
Online
₹ 449 999
Quick facts
particular | details
---|---
Medium of instructions | English
Mode of learning | Self study
Mode of Delivery | Video and Text Based
The syllabus
Introduction
Setting up Environment using AWS Cloud9
- Getting Started with Cloud9
- Creating Cloud9 Environment
- Warming up with Cloud9 IDE
- Overview of EC2 related to Cloud9
- Opening ports for Cloud9 Instance
- Associating Elastic IPs to Cloud9 Instance
- Increase EBS Volume Size of Cloud9 Instance
- Setup Jupyter Lab on Cloud9
- [Commands] Setup Jupyter Lab on Cloud9
Setting up Environment - Overview of GCP and Provision Ubuntu VM
- Signing up for GCP
- Overview of GCP Web Console
- Overview of GCP Pricing
- Provision Ubuntu VM from GCP
- Setup Docker
- Why are we setting up Python and Jupyter Lab for a Scala-related course?
- Validating Python
- Setup Jupyter Lab
Setup Hadoop on Single Node Cluster
- Introduction to Single Node Hadoop Cluster
- Setup Prerequisites
- [Commands] - Setup Prerequisites
- Setup Passwordless Login
- [Commands] - Setup Passwordless Login
- Download and Install Hadoop
- [Commands] - Download and Install Hadoop
- Configure Hadoop HDFS
- [Commands] - Configure Hadoop HDFS
- Start and Validate HDFS
- [Commands] - Start and Validate HDFS
- Configure Hadoop YARN
- [Commands] - Configure Hadoop YARN
- Start and Validate YARN
- [Commands] - Start and Validate YARN
- Managing Single Node Hadoop
- [Commands] - Managing Single Node Hadoop
Setup Hive and Spark on Single Node Cluster
- Setup Data Sets for Practice
- [Commands] - Setup Data Sets for Practice
- Download and Install Hive
- [Commands] - Download and Install Hive
- Setup Database for Hive Metastore
- [Commands] - Setup Database for Hive Metastore
- Configure and Setup Hive Metastore
- [Commands] - Configure and Setup Hive Metastore
- Launch and Validate Hive
- [Commands] - Launch and Validate Hive
- Scripts to Manage Single Node Cluster
- [Commands] - Scripts to Manage Single Node Cluster
- Download and Install Spark 2
- [Commands] - Download and Install Spark 2
- Configure Spark 2
- [Commands] - Configure Spark 2
- Validate Spark 2 using CLIs
- [Commands] - Validate Spark 2 using CLIs
- Validate Jupyter Lab Setup
- [Commands] - Validate Jupyter Lab Setup
- Integrate Spark 2 with Jupyter Lab
- [Commands] - Integrate Spark 2 with Jupyter Lab
- Download and Install Spark 3
- [Commands] - Download and Install Spark 3
- Configure Spark 3
- [Commands] - Configure Spark 3
- Validate Spark 3 using CLIs
- [Commands] - Validate Spark 3 using CLIs
- Integrate Spark 3 with Jupyter Lab
- [Commands] - Integrate Spark 3 with Jupyter Lab
Scala Fundamentals
- Introduction and Setting up of Scala
- Setup Scala on Windows
- Basic Programming Constructs
- Functions
- Object Oriented Concepts - Classes
- Object Oriented Concepts - Objects
- Object Oriented Concepts - Case Classes
- Collections - Seq, Set and Map
- Basic Map Reduce Operations
- Setting up Data Sets for Basic I/O Operations
- Basic I/O Operations and using Scala Collections APIs
- Tuples
- Development Cycle - Create Program File
- Development Cycle - Compile source code to jar using SBT
- Development Cycle - Setup SBT on Windows
- Development Cycle - Compile changes and run jar with arguments
- Development Cycle - Setup IntelliJ with Scala
- Development Cycle - Develop Scala application using SBT in IntelliJ
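The fundamentals above (case classes, functions, collections, and basic map-reduce operations) can be sketched in a few lines of plain Scala. The `Order` type and sample data below are invented for illustration and are not part of the course material.

```scala
// A minimal pure-Scala sketch of the constructs covered above:
// case classes, functions, and basic map-reduce on collections.
// The Order type and sample rows are hypothetical.
case class Order(id: Int, status: String, amount: Double)

object ScalaBasics {
  val orders: Seq[Order] = Seq(
    Order(1, "COMPLETE", 100.0),
    Order(2, "PENDING", 250.0),
    Order(3, "COMPLETE", 75.0)
  )

  // filter, map and sum: the basic map-reduce operations
  def completedRevenue(os: Seq[Order]): Double =
    os.filter(_.status == "COMPLETE").map(_.amount).sum

  // groupBy to aggregate by key, the collections-API analogue of reduceByKey
  def countByStatus(os: Seq[Order]): Map[String, Int] =
    os.groupBy(_.status).map { case (status, group) => (status, group.size) }
}
```

The same filter/map/groupBy shapes reappear later in the course on Spark Data Frames and RDDs, which is why the Scala Fundamentals module practices them on plain collections first.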
Overview of Hadoop HDFS Commands
- Getting help or usage of HDFS Commands
- Listing HDFS Files
- Managing HDFS Directories
- Copying files from local to HDFS
- Copying files from HDFS to local
- Getting File Metadata
- Previewing Data in HDFS File
- HDFS Block Size
- HDFS Replication Factor
- Getting HDFS Storage Usage
- Using HDFS Stat Commands
- HDFS File Permissions
- Overriding Properties
Apache Spark 2 using Scala - Data Processing - Overview
- Introduction to the module
- Starting Spark Context using spark-shell
- Overview of Spark read APIs
- Previewing Schema and Data using Spark APIs
- Overview of Spark Data Frame APIs
- Overview of Functions to Manipulate Data in Spark Data Frames
- Overview of Spark Write APIs
Apache Spark 2 using Scala - Processing Column Data using Pre-defined Functions
- Introduction to Pre-defined Functions
- Creating Spark Session Object in Notebook
- Create Dummy Data Frames for Practice
- Categories of Functions on Spark Data Frame Columns
- Using Spark Special Functions - col
- Using Spark Special Functions - lit
- Manipulating String Columns using Spark Functions - Case Conversion and Length
- Manipulating String Columns using Spark Functions - substring
- Manipulating String Columns using Spark Functions - split
- Manipulating String Columns using Spark Functions - Concatenating Strings
- Manipulating String Columns using Spark Functions - Padding Strings
- Manipulating String Columns using Spark Functions - Trimming unwanted characters
- Date and Time Functions in Spark - Overview
- Date and Time Functions in Spark - Date Arithmetic
- Date and Time Functions in Spark - Using trunc and date_trunc
- Date and Time Functions in Spark - Using date_format and other functions
- Date and Time Functions in Spark - Dealing with Unix Timestamps
- Pre-defined Functions in Spark - Conclusion
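The date and time lessons above use Spark functions such as `date_add`, `trunc` and `date_format`. Since those require a running Spark session, here is a plain-Scala (`java.time`) sketch of the same date logic; the JDK calls below are analogues of, not the Spark functions themselves.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

// Plain-Scala sketch of the date logic covered by Spark's
// date_add, trunc and date_format lessons, using only java.time.
object DateBasics {
  val d: LocalDate = LocalDate.of(2019, 3, 15)

  // date arithmetic, like Spark's date_add / date_sub
  val plusTen: LocalDate = d.plusDays(10)

  // truncate to the first day of the month, like trunc(col, "MM")
  val monthStart: LocalDate = d.withDayOfMonth(1)

  // formatting, like date_format(col, "yyyyMM")
  val yyyymm: String = d.format(DateTimeFormatter.ofPattern("yyyyMM"))
}
```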
Apache Spark 2 using Scala - Basic Transformations using Data Frame
- Introduction to Basic Transformations using Data Frame APIs
- Starting Spark Context
- Overview of Filtering using Spark Data Frame APIs
- Filtering Data from Spark Data Frames - Reading Data and Understanding Schema
- Filtering Data from Spark Data Frames - Task 1 - Equal Operator
- Filtering Data from Spark Data Frames - Task 2 - Comparison Operators
- Filtering Data from Spark Data Frames - Task 3 - Boolean AND
- Filtering Data from Spark Data Frames - Task 4 - IN Operator
- Filtering Data from Spark Data Frames - Task 5 - Between and Like
- Filtering Data from Spark Data Frames - Task 6 - Using functions in Filter
- Overview of Aggregations using Spark Data Frame APIs
- Overview of Sorting using Spark Data Frame APIs
- Solution - Get Delayed Counts using Spark Data Frame APIs - Part 1
- Solution - Get Delayed Counts using Spark Data Frame APIs - Part 2
- Solution - Getting Delayed Counts By Date using Spark Data Frame APIs
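The module's closing exercise combines filtering, aggregation and sorting to get delayed flight counts. A plain-Scala sketch of that filter / group / sort shape is below; the `Flight` type and rows are made up, and the course itself does this with Spark Data Frame APIs.

```scala
// Hypothetical flight records: departure date and departure delay in minutes.
case class Flight(date: String, depDelay: Int)

object DelayedCounts {
  val flights: Seq[Flight] = Seq(
    Flight("2008-01-01", 20),
    Flight("2008-01-01", -5),
    Flight("2008-01-02", 45)
  )

  // flights delayed more than 15 minutes, counted per date, sorted by date:
  // the same filter -> groupBy -> count -> orderBy pipeline as the Spark solution
  def delayedByDate(fs: Seq[Flight]): Seq[(String, Int)] =
    fs.filter(_.depDelay > 15)
      .groupBy(_.date)
      .map { case (date, group) => (date, group.size) }
      .toSeq
      .sortBy(_._1)
}
```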
Apache Spark 2 using Scala - Joining Data Sets
- Prepare and Validate Data Sets
- Starting Spark Session or Spark Context
- Analyze Data Sets for Joins using Spark Data Frame APIs
- Eliminate Duplicate records from Data Frame using Spark Data Frame APIs
- Recap of Basic Transformations using Spark Data Frame APIs
- Joining Data Sets using Spark Data Frame APIs - Problem Statements
- Overview of Joins using Spark Data Frame APIs
- Inner Join using Spark Data Frame APIs - Get number of flights departed from US airports
- Inner Join using Spark Data Frame APIs - Get number of flights departed from US states
- Outer Join using Spark Data Frame APIs - Get Airports Never Used
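The join lessons run on airports and flights Data Frames in Spark. The inner and outer join logic can be previewed on plain Scala collections; the airport codes and flight rows below are hypothetical.

```scala
// Sketch of inner vs. outer join semantics on Scala collections.
// (airport code, city) joined against (origin airport code, flight id).
object JoinSketch {
  val airports = Seq(("ORD", "Chicago"), ("JFK", "New York"), ("XNA", "Fayetteville"))
  val flights  = Seq(("ORD", 1), ("JFK", 2), ("ORD", 3))

  // inner join: only airports that have at least one departing flight
  val inner: Seq[(String, String, Int)] =
    for {
      (code, city)   <- airports
      (origin, fid)  <- flights
      if code == origin
    } yield (code, city, fid)

  // outer-join shape: airports with no matching flight ("airports never used")
  val neverUsed: Seq[String] =
    airports.collect { case (code, _) if !flights.exists(_._1 == code) => code }
}
```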
Apache Spark 2 using SQL - Getting Started
- Getting Started with Spark SQL - Overview
- Overview of Spark Documentation
- Launching and using Spark SQL CLI
- Overview of Spark SQL Properties
- Running OS Commands using Spark SQL
- Understanding Spark Metastore Warehouse Directory
- Managing Spark Metastore Databases
- Managing Spark Metastore Tables
- Retrieve Metadata of Spark Metastore Tables
- Role of Spark Metastore or Hive Metastore
- Exercise - Getting Started with Spark SQL
Apache Spark 2 using SQL - Basic Transformations
- Basic Transformation using Spark SQL - Introduction
- Spark SQL - Overview
- Define Problem Statement for Basic Transformations using Spark SQL
- Prepare or Create Tables using Spark SQL
- Projecting or Selecting Data using Spark SQL
- Filtering Data using Spark SQL
- Joining Tables using Spark SQL - Inner
- Joining Tables using Spark SQL - Outer
- Aggregating Data using Spark SQL
- Sorting Data using Spark SQL
- Conclusion - Final Solution using Spark SQL
Apache Spark 2 using SQL - Basic DDL and DML
- Introduction to Basic DDL and DML using Spark SQL
- Create Spark Metastore Tables using Spark SQL
- Overview of Data Types for Spark Metastore Table Columns
- Adding Comments to Spark Metastore Tables using Spark SQL
- Loading Data Into Spark Metastore Tables using Spark SQL - Local
- Loading Data Into Spark Metastore Tables using Spark SQL - HDFS
- Loading Data into Spark Metastore Tables using Spark SQL - Append and Overwrite
- Creating External Tables in Spark Metastore using Spark SQL
- Managed Spark Metastore Tables vs External Spark Metastore Tables
- Overview of Spark Metastore Table File Formats
- Drop Spark Metastore Tables and Databases
- Truncating Spark Metastore Tables
- Exercise - Managed Spark Metastore Tables
Apache Spark 2 using SQL - DML and Partitioning
- Introduction to DML and Partitioning of Spark Metastore Tables using Spark SQL
- Introduction to Partitioning of Spark Metastore Tables using Spark SQL
- Creating Spark Metastore Tables using Parquet File Format
- Load vs. Insert into Spark Metastore Tables using Spark SQL
- Inserting Data using Stage Spark Metastore Table using Spark SQL
- Creating Partitioned Spark Metastore Tables using Spark SQL
- Adding Partitions to Spark Metastore Tables using Spark SQL
- Loading Data into Partitioned Spark Metastore Tables using Spark SQL
- Inserting Data into Partitions of Spark Metastore Tables using Spark SQL
- Using Dynamic Partition Mode to insert data into Spark Metastore Tables
- Exercise - Partitioned Spark Metastore Tables using Spark SQL
Apache Spark 2 using SQL - Pre-defined Functions
- Introduction - Overview of Spark SQL Functions
- Overview of Pre-defined Functions using Spark SQL
- Validating Functions using Spark SQL
- String Manipulation Functions using Spark SQL
- Date Manipulation Functions using Spark SQL
- Overview of Numeric Functions using Spark SQL
- Data Type Conversion using Spark SQL
- Dealing with Nulls using Spark SQL
- Using CASE and WHEN using Spark SQL
- Query Example - Word Count using Spark SQL
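The module closes with a word count written in Spark SQL (split a line into words, explode, then group and count). The same pipeline expressed on plain Scala collections:

```scala
object WordCount {
  // split + flatMap plays the role of SQL's split/explode;
  // groupBy(identity) plays the role of GROUP BY word with count(*)
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.toLowerCase.split("\\s+"))
      .filter(_.nonEmpty)
      .groupBy(identity)
      .map { case (word, occurrences) => (word, occurrences.size) }
}
```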
Apache Spark 2 using SQL - Pre-defined Functions - Exercises
- Prepare Users Table using Spark SQL
- Exercise 1 - Get number of users created per year
- Exercise 2 - Get the day name of each user's birthday
- Exercise 3 - Get the names and email ids of users added in the year 2019
- Exercise 4 - Get the number of users by gender
- Exercise 5 - Get last 4 digits of unique ids
- Exercise 6 - Get the count of users based on country code
Apache Spark 2 using SQL - Windowing Functions
- Introduction to Windowing Functions using Spark SQL
- Prepare HR Database in Spark Metastore using Spark SQL
- Overview of Windowing Functions using Spark SQL
- Aggregations using Windowing Functions using Spark SQL
- LEAD or LAG Functions using Spark SQL
- Getting first and last values using Spark SQL
- Ranking using Windowing Functions in Spark SQL
- Order of execution of Spark SQL Queries
- Overview of Subqueries using Spark SQL
- Filtering Window Function Results using Spark SQL
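The windowing lessons cover LAG/LEAD and ranking over partitions in Spark SQL. Those semantics can be sketched on a plain sorted Scala sequence; the salary figures below are made up, and in the course the same logic is written as window functions over the HR database.

```scala
object WindowSketch {
  // salaries within one hypothetical department, already sorted descending
  val salaries: Seq[Int] = Seq(5000, 4000, 4000, 3000)

  // LAG(salary): the previous row's value, None for the first row
  val lag: Seq[Option[Int]] =
    None +: salaries.init.map(Some(_))

  // RANK(): ties share a rank and the next rank is skipped
  // (5000 -> 1, 4000 -> 2, 4000 -> 2, 3000 -> 4)
  val rank: Seq[Int] =
    salaries.map(s => salaries.count(_ > s) + 1)
}
```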
Sample scenarios with solutions
- Introduction to Sample Scenarios and Solutions
- Problem Statements - General Guidelines
- Initializing the job - General Guidelines
- Getting crime count per type per month - Understanding Data
- Getting crime count per type per month - Implementing the logic - Core API
- Getting crime count per type per month - Implementing the logic - Data Frames
- Getting crime count per type per month - Validating Output
- Get inactive customers - using Core Spark API (leftOuterJoin)
- Get inactive customers - using Data Frames and SQL
- Get top 3 crimes in RESIDENCE - using Core Spark API
- Get top 3 crimes in RESIDENCE - using Data Frame and SQL
- Convert NYSE data from text file format to parquet file format
- Get word count - with custom control arguments, num keys and file format
Instructors
Mr Durga Viswanatha Raju Gadiraju
Technology Adviser
Freelancer