- Why Big Data
- Applications of PySpark
- Introduction to Instructor
- Introduction to Course
- Projects Overview
- Request for Your Honest Review
- Links for the Course's Materials and Codes
Quick facts
particular | details
---|---
Medium of instructions | English
Mode of learning | Self study
Mode of delivery | Video and Text Based
Course and certificate fees
Fees information: ₹449 (discounted from ₹3,499)
Certificate availability: Yes
Certificate providing authority: Udemy
The syllabus
Introduction
01-Introduction to Hadoop, Spark Ecosystems and Architectures
- Links for the Course's Materials and Codes
- Why Spark
- Hadoop Ecosystem
- Spark Architecture and Ecosystem
- Databricks Sign Up
- Create Databricks Notebook
- Download Spark and Dependencies
- Java Setup on Windows
- Python Setup on Windows
- Spark Setup on Windows
- Hadoop Setup on Windows
- Running Spark on Windows
- Java Download on Mac
- Installing JDK on Mac
- Setting Java Home on Mac
- Java Check on Mac
- Installing Python on Mac
- Setup Spark on Mac
- Which of the following statements is true?
- Which of the following is not a part of the Spark ecosystem?
Spark RDDs
- Links for the Course's Materials and Codes
- Spark RDDs
- Creating Spark RDD
- Running Spark Code Locally
- RDD stands for:
- RDD is created by using:
- RDD Map (Lambda)
- RDD Map (Simple Function)
- Quiz (Map)
- Solution 1 (Map)
- Solution 2 (Map)
- RDD FlatMap
- RDD Filter
- Quiz (Filter)
- Solution (Filter)
- RDD Distinct
- RDD GroupByKey
- RDD ReduceByKey
- Quiz (Word Count)
- Solution (Word Count)
- RDD (Count and CountByValue)
- RDD (saveAsTextFile)
- RDD (Partition)
- Finding Average-1
- Finding Average-2
- Quiz (Average)
- Solution (Average)
- Finding Min and Max
- Quiz (Min and Max)
- Solution (Min and Max)
- Project Overview
- Total Students
- Total Marks by Male and Female Student
- Total Passed and Failed Students
- Total Enrollments per Course
- Total Marks per Course
- Average marks per Course
- Finding Minimum and Maximum marks
- Average Age of Male and Female Students
Spark DFs
- Links for the Course's Materials and Codes
- Introduction to Spark DFs
- Creating Spark DFs
- DF stands for:
- DF is created by using:
- Spark Infer Schema
- Spark Provide Schema
- Create DF from RDD
- Rectifying the Error
- Select DF Columns
- Spark DF withColumn
- Spark DF withColumnRenamed and Alias
- Spark DF Filter rows
- Quiz (select, withColumn, filter)
- Solution (select, withColumn, filter)
- Spark DF (Count, Distinct, Duplicate)
- Quiz (Distinct, Duplicate)
- Solution (Distinct, Duplicate)
- Spark DF (sort, orderBy)
- Quiz (sort, orderBy)
- Solution (sort, orderBy)
- Spark DF (Group By)
- Spark DF (Group By - Multiple Columns and Aggregations)
- Spark DF (Group By - Visualization)
- Spark DF (Group By - Filtering)
- Quiz (Group By)
- Solution (Group By)
- Quiz (Word Count)
- Solution (Word Count)
- Spark DF (UDFs)
- Quiz (UDFs)
- Solution (UDFs)
- Solution (Cache and Persist)
- Spark DF (DF to RDD)
- Spark DF (Spark SQL)
- Spark DF (Write DF)
- Project Overview
- Project (Count and Select)
- Project (Group By)
- Project (Group By, Aggregations and Order By)
- Project (Filtering)
- Project (UDF and WithColumn)
- Project (Write)
Collaborative filtering
- Links for the Course's Materials and Codes
- Collaborative filtering
- Utility Matrix
- Explicit and Implicit Ratings
- Expected Results
- Dataset
- Joining Dataframes
- Train and Test Data
- ALS model
- Hyperparameter tuning and cross validation
- Best model and evaluate predictions
- Recommendations
Spark Streaming
- Links for the Course's Materials and Codes
- Introduction to Spark Streaming
- Spark Streaming with RDD
- Spark Streaming is used to:
- Spark Streaming Context
- Spark Streaming Reading Data
- Spark Streaming Cluster Restart
- Spark Streaming RDD Transformations
- Which statement is true about SparkContext and StreamingContext:
- Spark Streaming DF
- Spark Streaming Display
- Spark Streaming DF Aggregations
ETL Pipeline
- Links for the Course's Materials and Codes
- Introduction to ETL
- We can perform ETL using PySpark:
- ETL stands for:
- ETL pipeline Flow
- Data set
- Extracting Data
- Transforming Data
- Loading Data (Creating RDS-I)
- Loading Data (Creating RDS-II)
- RDS Networking
- Downloading Postgres
- Installing Postgres
- Connect to RDS through pgAdmin
- Loading Data
Project - Change Data Capture / Replication On Going
- Links for the Course's Materials and Codes
- Introduction to Project
- Project Architecture
- In this project we are going to implement:
- The cloud service DMS will be used to:
- Creating RDS MySQL Instance
- Creating S3 Bucket
- Creating DMS Source Endpoint
- Creating DMS Destination Endpoint
- Creating DMS Instance
- MySQL Workbench
- Connecting with RDS and Dumping Data
- Querying RDS
- DMS Full Load
- DMS Replication Ongoing
- Stopping Instances
- Glue Job (Full Load)
- Glue Job (Change Capture)
- Glue Job (CDC)
- Creating Lambda Function and Adding Trigger
- Checking Trigger
- Getting S3 file name in Lambda
- Creating Glue Job
- Adding Invoke for Glue Job
- Testing Invoke
- Writing Glue Shell Job
- Full Load Pipeline
- Change Data Capture Pipeline
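The trigger step in the pipeline above wires an S3 event to a Lambda function that reads the arriving file's name and starts a Glue job. A minimal sketch of such a handler; the job name `cdc-glue-job` and the `--s3_path` argument are hypothetical placeholders, not the course's actual identifiers:

```python
def s3_key_from_event(event):
    """Return (bucket, key) of the S3 object that fired the trigger.

    S3 put-event notifications carry the object location under
    Records[0].s3.bucket.name and Records[0].s3.object.key.
    """
    record = event["Records"][0]["s3"]
    return record["bucket"]["name"], record["object"]["key"]


def lambda_handler(event, context):
    bucket, key = s3_key_from_event(event)

    # boto3 is available in the Lambda runtime; the Glue job name and
    # argument key below are made-up placeholders for this sketch.
    import boto3
    glue = boto3.client("glue")
    glue.start_job_run(JobName="cdc-glue-job",
                       Arguments={"--s3_path": f"s3://{bucket}/{key}"})
    return {"file": key}
```

In the full-load case the Glue job copies the whole dump; in the ongoing-replication case DMS writes change files to S3, each one fires this trigger, and the Glue job applies the captured inserts, updates, and deletes to the target.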