- Introduction
- The Spark Architecture
- The Spark Unified Stack
- Java Installation
- Hadoop Installation
- Python Installation
- PySpark Installation
- Install Microsoft Build Tools
- Mac OS - Java Installation
- Mac OS - Python Installation
- Mac OS - PySpark Installation
- Mac OS - Testing the Spark Installation
- Install Jupyter Notebooks
- The Spark Web UI
- Section Summary
Online · ₹449 (₹799)

Quick facts

| Particular | Details |
|---|---|
| Medium of instructions | English |
| Mode of learning | Self study |
| Mode of Delivery | Video and Text Based |
Course overview
The Apache Spark 3.0.0 release begins the 3.x series. Apache Spark 3.0 builds on many of the technological advances made in Spark 2.x, introducing fresh concepts while also continuing long-term development projects. The Apache Spark 3 for Data Engineering & Analytics with Python certification course was designed by David Charles Academy - Senior Big Data Engineer & Consultant at ABN AMRO - and is available on Udemy for individuals interested in learning how to use Apache Spark for data engineering and data analytics with Python.
Apache Spark 3 for Data Engineering & Analytics with Python online classes incorporate more than 8 hours of prerecorded lectures supported by 12 downloadable resources and 4 articles, aimed at giving individuals a deeper understanding of managing data across a cluster using Spark. The Apache Spark 3 for Data Engineering & Analytics with Python online course covers data analytics, Spark transformations, Spark execution, data engineering, and data visualization, as well as analytical processing strategies deployed across large data clusters.
The highlights
- Certificate of completion
- Self-paced course
- 8.5 hours of pre-recorded video content
- 4 articles
- 12 downloadable resources
Program offerings
- Online course
- Learning resources
- 30-day money-back guarantee
- Unlimited access
- Accessible on mobile devices and TV
Course and certificate fees

| Fees information | Certificate availability | Certificate providing authority |
|---|---|---|
| ₹449 (₹799) | Yes | Udemy |
What you will learn
After completing the Apache Spark 3 for Data Engineering & Analytics with Python online certification, individuals will gain insight into the principles of Apache Spark and acquire knowledge of the functionalities of Spark 3 and Python for data engineering and data analytics operations. In this Apache Spark course, individuals will explore the fundamentals of Spark SQL, Spark transformations, Spark actions, Spark execution, the Spark DataFrame API, and the Spark Web UI. They will learn about resilient distributed datasets (RDDs) and their APIs, and acquire the skills to interpret the Spark Web UI and the directed acyclic graphs (DAGs) behind Spark execution. Individuals will also learn strategies to visualize data, including dashboards and graphs, on Databricks.
The syllabus
Introduction to Spark and Installation
Spark Execution Concepts
- Section Introduction
- Spark Application and Session
- Spark Transformations and Actions Part 1
- Spark Transformations and Actions Part 2
- DAG Visualisation
RDD Crash Course
- Introduction to RDDs
- Data Preparation
- Distinct and Filter Transformations
- Map and Flat Map Transformations
- SortByKey Transformations
- RDD Actions
- Challenge - Convert Fahrenheit to Centigrade
- Challenge - XYZ Research
- XYZ Research
- Challenge - XYZ Research Part 1
- Challenge - XYZ Research Part 2
Structured API - Spark DataFrame
- Structured APIs Introduction
- Preparing the Project Folder
- PySpark DataFrame, Schema and DataTypes
- DataFrame Reader and Writer
- Challenge Part 1 - Brief
- Challenge Part 1
- Challenge Part 1 - Data Preparation
- Working with Structured Operations
- Managing Performance Errors
- Reading a JSON File
- Columns and Expressions
- Filter and Where Conditions
- Distinct Drop Duplicates Order By
- Rows and Union
- Adding, Renaming and Dropping Columns
- Working with Missing or Bad Data
- Working with User Defined Functions
- Challenge Part 2 - Brief
- Challenge Part 2
- Challenge Part 2 - Remove Null Row and Bad Records
- Challenge Part 2 - Get the City and State
- Challenge Part 2 - Rearrange the Schema
- Challenge Part 2 - Write Partitioned DataFrame to Parquet
- Aggregations
- Aggregations - Setting up Flight Summary Data
- Aggregations - Count and Count Distinct
- Aggregations - Min Max Sum SumDistinct AVG
- Aggregations with Grouping
- Challenge Part 3 - Brief
- Challenge Part 3
- Challenge Part 3 - Prepare 2019 Data
- Challenge Part 3 - Q1 Get the Best Sales Month
- Challenge Part 3 - Q2 Get the City that sold the most products
- Challenge Part 3 - Q3 When to advertise
- Challenge Part 3 - Q4 Products Bought Together
Introduction to Spark SQL and Databricks
- Introduction to DataBricks
- Spark SQL Introduction
- Register Account on Databricks
- Create a Databricks Cluster
- Creating our First 2 Databricks Notebooks
- Reading CSV Files into DataFrame
- Creating a Database and Table
- Inserting Records into a Table
- Exposing Bad Records
- Figuring Out How to Remove Bad Records
- Extract the City and State
- Inserting Records to Final Sales Table
- What was the best month in sales?
- Get the City that sold the most products
- Get the right time to advertise
- Get the most products sold together
- Create a Dashboard
- Summary