CCA 175 Spark and Hadoop Developer - Curriculum
Online
₹ 449 999
Quick facts
particular | details
---|---
Medium of instructions | English
Mode of learning | Self study
Mode of Delivery | Video and Text Based
The syllabus
Introduction
Setting up Environment using AWS Cloud9
- Getting Started with Cloud9
- Creating Cloud9 Environment
- Warming up with Cloud9 IDE
- Overview of EC2 related to Cloud9
- Opening ports for Cloud9 Instance
- Associating Elastic IPs to Cloud9 Instance
- Increase EBS Volume Size of Cloud9 Instance
- Setup Jupyter Lab on Cloud9
- [Commands] Setup Jupyter Lab on Cloud9
Setting up Environment - Overview of GCP and Provision Ubuntu VM
- Signing up for GCP
- Overview of GCP Web Console
- Overview of GCP Pricing
- Provision Ubuntu VM from GCP
- Setup Docker
- Why are we setting up Python and Jupyter Lab for a Scala-related course?
- Validating Python
- Setup Jupyter Lab
Setup Hadoop on Single Node Cluster
- Introduction to Single Node Hadoop Cluster
- Setup Prerequisites
- [Commands] - Setup Prerequisites
- Setup Passwordless Login
- [Commands] - Setup Passwordless Login
- Download and Install Hadoop
- [Commands] - Download and Install Hadoop
- Configure Hadoop HDFS
- [Commands] - Configure Hadoop HDFS
- Start and Validate HDFS
- [Commands] - Start and Validate HDFS
- Configure Hadoop YARN
- [Commands] - Configure Hadoop YARN
- Start and Validate YARN
- [Commands] - Start and Validate YARN
- Managing Single Node Hadoop
- [Commands] - Managing Single Node Hadoop
Setup Hive and Spark on Single Node Cluster
- Setup Data Sets for Practice
- [Commands] - Setup Data Sets for Practice
- Download and Install Hive
- [Commands] - Download and Install Hive
- Setup Database for Hive Metastore
- [Commands] - Setup Database for Hive Metastore
- Configure and Setup Hive Metastore
- [Commands] - Configure and Setup Hive Metastore
- Launch and Validate Hive
- [Commands] - Launch and Validate Hive
- Scripts to Manage Single Node Cluster
- [Commands] - Scripts to Manage Single Node Cluster
- Download and Install Spark 2
- [Commands] - Download and Install Spark 2
- Configure Spark 2
- [Commands] - Configure Spark 2
- Validate Spark 2 using CLIs
- [Commands] - Validate Spark 2 using CLIs
- Validate Jupyter Lab Setup
- [Commands] - Validate Jupyter Lab Setup
- Integrate Spark 2 with Jupyter Lab
- [Commands] - Integrate Spark 2 with Jupyter Lab
- Download and Install Spark 3
- [Commands] - Download and Install Spark 3
- Configure Spark 3
- [Commands] - Configure Spark 3
- Validate Spark 3 using CLIs
- [Commands] - Validate Spark 3 using CLIs
- Integrate Spark 3 with Jupyter Lab
- [Commands] - Integrate Spark 3 with Jupyter Lab
Scala Fundamentals
- Introduction and Setting up of Scala
- Setup Scala on Windows
- Basic Programming Constructs
- Functions
- Object Oriented Concepts - Classes
- Object Oriented Concepts - Objects
- Object Oriented Concepts - Case Classes
- Collections - Seq, Set and Map
- Basic Map Reduce Operations
- Setting up Data Sets for Basic I/O Operations
- Basic I/O Operations and using Scala Collections APIs
- Tuples
- Development Cycle - Create Program File
- Development Cycle - Compile source code to jar using SBT
- Development Cycle - Setup SBT on Windows
- Development Cycle - Compile changes and run jar with arguments
- Development Cycle - Setup IntelliJ with Scala
- Development Cycle - Develop Scala application using SBT in IntelliJ
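The fundamentals above (case classes, functions, collections, and basic map-reduce operations) can be sketched in a few lines of plain Scala. The `Order` type and sample data below are invented for illustration and are not part of the course material.

```scala
// A minimal pure-Scala sketch of the constructs covered above:
// case classes, functions, and basic map-reduce on collections.
// The Order type and sample rows are hypothetical.
case class Order(id: Int, status: String, amount: Double)

object ScalaBasics {
  val orders: Seq[Order] = Seq(
    Order(1, "COMPLETE", 100.0),
    Order(2, "PENDING", 250.0),
    Order(3, "COMPLETE", 75.0)
  )

  // filter, map and sum: the basic map-reduce operations
  def completedRevenue(os: Seq[Order]): Double =
    os.filter(_.status == "COMPLETE").map(_.amount).sum

  // groupBy to aggregate by key, the collections-API analogue of reduceByKey
  def countByStatus(os: Seq[Order]): Map[String, Int] =
    os.groupBy(_.status).map { case (status, group) => (status, group.size) }
}
```

The same filter/map/groupBy shapes reappear later in the course on Spark Data Frames and RDDs, which is why the Scala Fundamentals module practices them on plain collections first.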
Overview of Hadoop HDFS Commands
- Getting help or usage of HDFS Commands
- Listing HDFS Files
- Managing HDFS Directories
- Copying files from local to HDFS
- Copying files from HDFS to local
- Getting File Metadata
- Previewing Data in HDFS File
- HDFS Block Size
- HDFS Replication Factor
- Getting HDFS Storage Usage
- Using HDFS Stat Commands
- HDFS File Permissions
- Overriding Properties
Apache Spark 2 using Scala - Data Processing - Overview
- Introduction to the module
- Starting Spark Context using spark-shell
- Overview of Spark read APIs
- Previewing Schema and Data using Spark APIs
- Overview of Spark Data Frame APIs
- Overview of Functions to Manipulate Data in Spark Data Frames
- Overview of Spark Write APIs
Apache Spark 2 using Scala - Processing Column Data using Pre-defined Functions
- Introduction to Pre-defined Functions
- Creating Spark Session Object in Notebook
- Create Dummy Data Frames for Practice
- Categories of Functions on Spark Data Frame Columns
- Using Spark Special Functions - col
- Using Spark Special Functions - lit
- Manipulating String Columns using Spark Functions - Case Conversion and Length
- Manipulating String Columns using Spark Functions - substring
- Manipulating String Columns using Spark Functions - split
- Manipulating String Columns using Spark Functions - Concatenating Strings
- Manipulating String Columns using Spark Functions - Padding Strings
- Manipulating String Columns using Spark Functions - Trimming unwanted characters
- Date and Time Functions in Spark - Overview
- Date and Time Functions in Spark - Date Arithmetic
- Date and Time Functions in Spark - Using trunc and date_trunc
- Date and Time Functions in Spark - Using date_format and other functions
- Date and Time Functions in Spark - Dealing with Unix Timestamps
- Pre-defined Functions in Spark - Conclusion
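The date and time lessons above use Spark functions such as `date_add`, `trunc` and `date_format`. Since those require a running Spark session, here is a plain-Scala (`java.time`) sketch of the same date logic; the JDK calls below are analogues of, not the Spark functions themselves.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

// Plain-Scala sketch of the date logic covered by Spark's
// date_add, trunc and date_format lessons, using only java.time.
object DateBasics {
  val d: LocalDate = LocalDate.of(2019, 3, 15)

  // date arithmetic, like Spark's date_add / date_sub
  val plusTen: LocalDate = d.plusDays(10)

  // truncate to the first day of the month, like trunc(col, "MM")
  val monthStart: LocalDate = d.withDayOfMonth(1)

  // formatting, like date_format(col, "yyyyMM")
  val yyyymm: String = d.format(DateTimeFormatter.ofPattern("yyyyMM"))
}
```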
Apache Spark 2 using Scala - Basic Transformations using Data Frame
- Introduction to Basic Transformations using Data Frame APIs
- Starting Spark Context
- Overview of Filtering using Spark Data Frame APIs
- Filtering Data from Spark Data Frames - Reading Data and Understanding Schema
- Filtering Data from Spark Data Frames - Task 1 - Equal Operator
- Filtering Data from Spark Data Frames - Task 2 - Comparison Operators
- Filtering Data from Spark Data Frames - Task 3 - Boolean AND
- Filtering Data from Spark Data Frames - Task 4 - IN Operator
- Filtering Data from Spark Data Frames - Task 5 - Between and Like
- Filtering Data from Spark Data Frames - Task 6 - Using functions in Filter
- Overview of Aggregations using Spark Data Frame APIs
- Overview of Sorting using Spark Data Frame APIs
- Solution - Get Delayed Counts using Spark Data Frame APIs - Part 1
- Solution - Get Delayed Counts using Spark Data Frame APIs - Part 2
- Solution - Getting Delayed Counts By Date using Spark Data Frame APIs
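The module's closing exercise combines filtering, aggregation and sorting to get delayed flight counts. A plain-Scala sketch of that filter / group / sort shape is below; the `Flight` type and rows are made up, and the course itself does this with Spark Data Frame APIs.

```scala
// Hypothetical flight records: departure date and departure delay in minutes.
case class Flight(date: String, depDelay: Int)

object DelayedCounts {
  val flights: Seq[Flight] = Seq(
    Flight("2008-01-01", 20),
    Flight("2008-01-01", -5),
    Flight("2008-01-02", 45)
  )

  // flights delayed more than 15 minutes, counted per date, sorted by date:
  // the same filter -> groupBy -> count -> orderBy pipeline as the Spark solution
  def delayedByDate(fs: Seq[Flight]): Seq[(String, Int)] =
    fs.filter(_.depDelay > 15)
      .groupBy(_.date)
      .map { case (date, group) => (date, group.size) }
      .toSeq
      .sortBy(_._1)
}
```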
Apache Spark 2 using Scala - Joining Data Sets
- Prepare and Validate Data Sets
- Starting Spark Session or Spark Context
- Analyze Data Sets for Joins using Spark Data Frame APIs
- Eliminate Duplicate records from Data Frame using Spark Data Frame APIs
- Recap of Basic Transformations using Spark Data Frame APIs
- Joining Data Sets using Spark Data Frame APIs - Problem Statements
- Overview of Joins using Spark Data Frame APIs
- Inner Join using Spark Data Frame APIs - Get number of flights departed from US airports
- Inner Join using Spark Data Frame APIs - Get number of flights departed from US states
- Outer Join using Spark Data Frame APIs - Get Airports Never Used
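The join lessons run on airports and flights Data Frames in Spark. The inner and outer join logic can be previewed on plain Scala collections; the airport codes and flight rows below are hypothetical.

```scala
// Sketch of inner vs. outer join semantics on Scala collections.
// (airport code, city) joined against (origin airport code, flight id).
object JoinSketch {
  val airports = Seq(("ORD", "Chicago"), ("JFK", "New York"), ("XNA", "Fayetteville"))
  val flights  = Seq(("ORD", 1), ("JFK", 2), ("ORD", 3))

  // inner join: only airports that have at least one departing flight
  val inner: Seq[(String, String, Int)] =
    for {
      (code, city)   <- airports
      (origin, fid)  <- flights
      if code == origin
    } yield (code, city, fid)

  // outer-join shape: airports with no matching flight ("airports never used")
  val neverUsed: Seq[String] =
    airports.collect { case (code, _) if !flights.exists(_._1 == code) => code }
}
```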
Apache Spark 2 using SQL - Getting Started
- Getting Started with Spark SQL - Overview
- Overview of Spark Documentation
- Launching and using Spark SQL CLI
- Overview of Spark SQL Properties
- Running OS Commands using Spark SQL
- Understanding Spark Metastore Warehouse Directory
- Managing Spark Metastore Databases
- Managing Spark Metastore Tables
- Retrieve Metadata of Spark Metastore Tables
- Role of Spark Metastore or Hive Metastore
- Exercise - Getting Started with Spark SQL
Apache Spark 2 using SQL - Basic Transformations
- Basic Transformation using Spark SQL - Introduction
- Spark SQL - Overview
- Define Problem Statement for Basic Transformations using Spark SQL
- Prepare or Create Tables using Spark SQL
- Projecting or Selecting Data using Spark SQL
- Filtering Data using Spark SQL
- Joining Tables using Spark SQL - Inner
- Joining Tables using Spark SQL - Outer
- Aggregating Data using Spark SQL
- Sorting Data using Spark SQL
- Conclusion - Final Solution using Spark SQL
Apache Spark 2 using SQL - Basic DDL and DML
- Introduction to Basic DDL and DML using Spark SQL
- Create Spark Metastore Tables using Spark SQL
- Overview of Data Types for Spark Metastore Table Columns
- Adding Comments to Spark Metastore Tables using Spark SQL
- Loading Data Into Spark Metastore Tables using Spark SQL - Local
- Loading Data Into Spark Metastore Tables using Spark SQL - HDFS
- Loading Data into Spark Metastore Tables using Spark SQL - Append and Overwrite
- Creating External Tables in Spark Metastore using Spark SQL
- Managed Spark Metastore Tables vs External Spark Metastore Tables
- Overview of Spark Metastore Table File Formats
- Drop Spark Metastore Tables and Databases
- Truncating Spark Metastore Tables
- Exercise - Managed Spark Metastore Tables
Apache Spark 2 using SQL - DML and Partitioning
- Introduction to DML and Partitioning of Spark Metastore Tables using Spark SQL
- Introduction to Partitioning of Spark Metastore Tables using Spark SQL
- Creating Spark Metastore Tables using Parquet File Format
- Load vs. Insert into Spark Metastore Tables using Spark SQL
- Inserting Data using Stage Spark Metastore Table using Spark SQL
- Creating Partitioned Spark Metastore Tables using Spark SQL
- Adding Partitions to Spark Metastore Tables using Spark SQL
- Loading Data into Partitioned Spark Metastore Tables using Spark SQL
- Inserting Data into Partitions of Spark Metastore Tables using Spark SQL
- Using Dynamic Partition Mode to insert data into Spark Metastore Tables
- Exercise - Partitioned Spark Metastore Tables using Spark SQL
Apache Spark 2 using SQL - Pre-defined Functions
- Introduction - Overview of Spark SQL Functions
- Overview of Pre-defined Functions using Spark SQL
- Validating Functions using Spark SQL
- String Manipulation Functions using Spark SQL
- Date Manipulation Functions using Spark SQL
- Overview of Numeric Functions using Spark SQL
- Data Type Conversion using Spark SQL
- Dealing with Nulls using Spark SQL
- Using CASE and WHEN using Spark SQL
- Query Example - Word Count using Spark SQL
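The module closes with a word count written in Spark SQL (split a line into words, explode, then group and count). The same pipeline expressed on plain Scala collections:

```scala
object WordCount {
  // split + flatMap plays the role of SQL's split/explode;
  // groupBy(identity) plays the role of GROUP BY word with count(*)
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.toLowerCase.split("\\s+"))
      .filter(_.nonEmpty)
      .groupBy(identity)
      .map { case (word, occurrences) => (word, occurrences.size) }
}
```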
Apache Spark 2 using SQL - Pre-defined Functions - Exercises
- Prepare Users Table using Spark SQL
- Exercise 1 - Get number of users created per year
- Exercise 2 - Get the day name of each user's birthday
- Exercise 3 - Get the names and email ids of users added in the year 2019
- Exercise 4 - Get the number of users by gender
- Exercise 5 - Get last 4 digits of unique ids
- Exercise 6 - Get the count of users based on country code
Apache Spark 2 using SQL - Windowing Functions
- Introduction to Windowing Functions using Spark SQL
- Prepare HR Database in Spark Metastore using Spark SQL
- Overview of Windowing Functions using Spark SQL
- Aggregations using Windowing Functions using Spark SQL
- LEAD or LAG Functions using Spark SQL
- Getting first and last values using Spark SQL
- Ranking using Windowing Functions in Spark SQL
- Order of execution of Spark SQL Queries
- Overview of Subqueries using Spark SQL
- Filtering Window Function Results using Spark SQL
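The windowing lessons cover LAG/LEAD and ranking over partitions in Spark SQL. Those semantics can be sketched on a plain sorted Scala sequence; the salary figures below are made up, and in the course the same logic is written as window functions over the HR database.

```scala
object WindowSketch {
  // salaries within one hypothetical department, already sorted descending
  val salaries: Seq[Int] = Seq(5000, 4000, 4000, 3000)

  // LAG(salary): the previous row's value, None for the first row
  val lag: Seq[Option[Int]] =
    None +: salaries.init.map(Some(_))

  // RANK(): ties share a rank and the next rank is skipped
  // (5000 -> 1, 4000 -> 2, 4000 -> 2, 3000 -> 4)
  val rank: Seq[Int] =
    salaries.map(s => salaries.count(_ > s) + 1)
}
```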
Sample scenarios with solutions
- Introduction to Sample Scenarios and Solutions
- Problem Statements - General Guidelines
- Initializing the job - General Guidelines
- Getting crime count per type per month - Understanding Data
- Getting crime count per type per month - Implementing the logic - Core API
- Getting crime count per type per month - Implementing the logic - Data Frames
- Getting crime count per type per month - Validating Output
- Get inactive customers - using Core Spark API (leftOuterJoin)
- Get inactive customers - using Data Frames and SQL
- Get top 3 crimes in RESIDENCE - using Core Spark API
- Get top 3 crimes in RESIDENCE - using Data Frame and SQL
- Convert NYSE data from text file format to parquet file format
- Get word count - with custom control arguments, num keys and file format
Instructors
Mr Durga Viswanatha Raju Gadiraju
Technology Adviser
Freelancer