- Why Big Data
- Applications of PySpark
- Introduction to Instructor
- Introduction to Course
- Projects Overview
- Request for Your Honest Review
- Links for the Course's Materials and Codes
Quick facts
particular | details
---|---
Medium of instructions | English
Mode of learning | Self study
Mode of delivery | Video and Text Based
Course and certificate fees
Fees information: ₹449 (discounted from ₹3,499)
Certificate availability: Yes
Certificate providing authority: Udemy
The syllabus
Introduction
01-Introduction to Hadoop, Spark Ecosystems and Architectures
- Links for the Course's Materials and Codes
- Why Spark
- Hadoop Ecosystem
- Spark Architecture and Ecosystem
- Databricks Sign Up
- Create Databricks Notebook
- Download Spark and Dependencies
- Java Setup on Windows
- Python Setup on Windows
- Spark Setup on Windows
- Hadoop Setup on Windows
- Running Spark on Windows
- Java Download on Mac
- Installing JDK on Mac
- Setting Java Home on Mac
- Java Check on Mac
- Installing Python on Mac
- Setup Spark on Mac
- Which of the following statements is true?
- Which of the following is not a part of the Spark ecosystem?
Spark RDDs
- Links for the Course's Materials and Codes
- Spark RDDs
- Creating Spark RDD
- Running Spark Code Locally
- RDD stands for:
- RDD is created by using:
- RDD Map (Lambda)
- RDD Map (Simple Function)
- Quiz (Map)
- Solution 1 (Map)
- Solution 2 (Map)
- RDD FlatMap
- RDD Filter
- Quiz (Filter)
- Solution (Filter)
- RDD Distinct
- RDD GroupByKey
- RDD ReduceByKey
- Quiz (Word Count)
- Solution (Word Count)
- RDD (Count and CountByValue)
- RDD (saveAsTextFile)
- RDD (Partition)
- Finding Average-1
- Finding Average-2
- Quiz (Average)
- Solution (Average)
- Finding Min and Max
- Quiz (Min and Max)
- Solution (Min and Max)
- Project Overview
- Total Students
- Total Marks by Male and Female Student
- Total Passed and Failed Students
- Total Enrollments per Course
- Total Marks per Course
- Average marks per Course
- Finding Minimum and Maximum marks
- Average Age of Male and Female Students
Spark DFs
- Links for the Course's Materials and Codes
- Introduction to Spark DFs
- Creating Spark DFs
- DF stands for:
- DF is created by using:
- Spark Infer Schema
- Spark Provide Schema
- Create DF from RDD
- Rectifying the Error
- Select DF Columns
- Spark DF withColumn
- Spark DF withColumnRenamed and Alias
- Spark DF Filter rows
- Quiz (select, withColumn, filter)
- Solution (select, withColumn, filter)
- Spark DF (Count, Distinct, Duplicate)
- Quiz (Distinct, Duplicate)
- Solution (Distinct, Duplicate)
- Spark DF (sort, orderBy)
- Quiz (sort, orderBy)
- Solution (sort, orderBy)
- Spark DF (Group By)
- Spark DF (Group By - Multiple Columns and Aggregations)
- Spark DF (Group By - Visualization)
- Spark DF (Group By - Filtering)
- Quiz (Group By)
- Solution (Group By)
- Quiz (Word Count)
- Solution (Word Count)
- Spark DF (UDFs)
- Quiz (UDFs)
- Solution (UDFs)
- Solution (Cache and Persist)
- Spark DF (DF to RDD)
- Spark DF (Spark SQL)
- Spark DF (Write DF)
- Project Overview
- Project (Count and Select)
- Project (Group By)
- Project (Group By, Aggregations and Order By)
- Project (Filtering)
- Project (UDF and WithColumn)
- Project (Write)
Collaborative filtering
- Links for the Course's Materials and Codes
- Collaborative filtering
- Utility Matrix
- Explicit and Implicit Ratings
- Expected Results
- Dataset
- Joining Dataframes
- Train and Test Data
- ALS model
- Hyperparameter tuning and cross validation
- Best model and evaluate predictions
- Recommendations
Spark Streaming
- Links for the Course's Materials and Codes
- Introduction to Spark Streaming
- Spark Streaming with RDD
- Spark Streaming is used to:
- Spark Streaming Context
- Spark Streaming Reading Data
- Spark Streaming Cluster Restart
- Spark Streaming RDD Transformations
- Which statement is true about SparkContext and StreamingContext:
- Spark Streaming DF
- Spark Streaming Display
- Spark Streaming DF Aggregations
ETL Pipeline
- Links for the Course's Materials and Codes
- Introduction to ETL
- We can perform ETL using PySpark:
- ETL stands for:
- ETL pipeline Flow
- Data set
- Extracting Data
- Transforming Data
- Loading Data (Creating RDS-I)
- Loading Data (Creating RDS-II)
- RDS Networking
- Downloading Postgres
- Installing Postgres
- Connect to RDS through pgAdmin
- Loading Data
Project - Change Data Capture / Replication On Going
- Links for the Course's Materials and Codes
- Introduction to Project
- Project Architecture
- In this project we are going to implement:
- The cloud service DMS will be used to:
- Creating RDS MySQL Instance
- Creating S3 Bucket
- Creating DMS Source Endpoint
- Creating DMS Destination Endpoint
- Creating DMS Instance
- MySQL Workbench
- Connecting with RDS and Dumping Data
- Querying RDS
- DMS Full Load
- DMS Replication Ongoing
- Stopping Instances
- Glue Job (Full Load)
- Glue Job (Change Capture)
- Glue Job (CDC)
- Creating Lambda Function and Adding Trigger
- Checking Trigger
- Getting S3 file name in Lambda
- Creating Glue Job
- Adding Invoke for Glue Job
- Testing Invoke
- Writing Glue Shell Job
- Full Load Pipeline
- Change Data Capture Pipeline
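The trigger step in the pipeline above wires an S3 event to a Lambda function that reads the arriving file's name and starts a Glue job. A minimal sketch of such a handler; the job name `cdc-glue-job` and the `--s3_path` argument are hypothetical placeholders, not the course's actual identifiers:

```python
def s3_key_from_event(event):
    """Return (bucket, key) of the S3 object that fired the trigger.

    S3 put-event notifications carry the object location under
    Records[0].s3.bucket.name and Records[0].s3.object.key.
    """
    record = event["Records"][0]["s3"]
    return record["bucket"]["name"], record["object"]["key"]


def lambda_handler(event, context):
    bucket, key = s3_key_from_event(event)

    # boto3 is available in the Lambda runtime; the Glue job name and
    # argument key below are made-up placeholders for this sketch.
    import boto3
    glue = boto3.client("glue")
    glue.start_job_run(JobName="cdc-glue-job",
                       Arguments={"--s3_path": f"s3://{bucket}/{key}"})
    return {"file": key}
```

In the full-load case the Glue job copies the whole dump; in the ongoing-replication case DMS writes change files to S3, each one fires this trigger, and the Glue job applies the captured inserts, updates, and deletes to the target.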