- Udemy 101: Getting the Most From This Course
- Alternate download link for the ml-100k dataset
- WARNING: DO NOT INSTALL JAVA 16 IN THE NEXT LECTURE
- Introduction, and installing the course materials, IntelliJ, and Scala
- Introduction to Apache Spark
- Spark Basics
- What's New in Spark 3?
Online
₹649 (list price ₹799)
Quick facts
| Particular | Details |
|---|---|
| Medium of instruction | English |
| Mode of learning | Self study |
| Mode of delivery | Video and text based |
Course overview
Apache Spark is a powerful, unified analytics engine for large-scale data processing. It provides high-level APIs in Java, Scala, Python, and R, along with an optimized engine that supports general execution graphs. The Apache Spark with Scala - Hands On with Big Data online certification is designed by Sundog Education, an educational platform founded by Frank Kane that teaches professional skills in big data, data science, and machine learning. The course is offered on Udemy.
The Apache Spark with Scala - Hands On with Big Data online course offers 9 hours of hands-on lectures and 3 articles. It teaches candidates to frame data analysis problems as Spark problems through more than 20 hands-on examples, and then to scale them up to run on cloud computing services. The classes cover big data analysis, machine learning, data streaming, caching, partitioning, graph structures, structured data, DataFrames, Datasets, Hadoop clusters, and more.
The highlights
- Certificate of completion
- Self-paced course
- 9 hours of pre-recorded video content
- 3 articles
- Learning resources
Program offerings
- Online course
- Learning resources
- 30-day money-back guarantee
- Unlimited access
- Accessible on mobile devices and TV
Course and certificate fees
Fees information
certificate availability
certificate providing authority
What you will learn
After completing the Apache Spark with Scala - Hands On with Big Data certification course, candidates will know how to use Apache Spark with Scala for big data operations, including big data analytics. They will explore MLlib, data streaming with Spark Streaming, caching, partitioning, resilient distributed datasets (RDDs), graph structures, and structured data, and will learn to transform structured data using Datasets, DataFrames, and SparkSQL. They will also learn to develop, deploy, and manage Spark scripts on Hadoop clusters, to traverse and analyze graph structures using GraphX, and to analyze big data sets using machine learning on Spark.
The syllabus
Getting Started
Scala Crash Course [Optional]
- [Activity] Scala Basics
- [Exercise] Flow Control in Scala
- [Exercise] Functions in Scala
- [Exercise] Data Structures in Scala
Using Resilient Distributed Datasets (RDDs)
- The Resilient Distributed Dataset
- Ratings Histogram Example
- Spark Internals
- Key / Value RDD's, and the Average Friends by Age example
- [Activity] Running the Average Friends by Age Example
- Filtering RDD's, and the Minimum Temperature by Location Example
- [Activity] Running the Minimum Temperature Example, and Modifying it for Maximum
- [Activity] Counting Word Occurrences using Flatmap()
- [Activity] Improving the Word Count Script with Regular Expressions
- [Activity] Sorting the Word Count Results
- [Exercise] Find the Total Amount Spent by Customer
- [Exercise] Check your Results, and Sort Them by Total Amount Spent
- Check Your Results and Implementation Against Mine
SparkSQL, DataFrames, and DataSets
- Introduction to SparkSQL
- [Activity] Using SparkSQL
- [Activity] Using DataSets
- [Exercise] Implement the "Friends by Age" example using DataSets
- Exercise Solution: Friends by Age, with Datasets.
- [Activity] Word Count example, using Datasets
- [Activity] Revisiting the Minimum Temperature example, with Datasets
- [Exercise] Implement the "Total Spent by Customer" problem with Datasets
- Exercise Solution: Total Spent by Customer with Datasets
Advanced Examples of Spark Programs
- [Activity] Find the Most Popular Movie
- [Activity] Use Broadcast Variables to Display Movie Names
- [Activity] Find the Most Popular Superhero in a Social Graph
- [Exercise] Find the Most Obscure Superheroes
- Exercise Solution: Find the Most Obscure Superheroes
- Superhero Degrees of Separation: Introducing Breadth-First Search
- Superhero Degrees of Separation: Accumulators, and Implementing BFS in Spark
- [Activity] Superhero Degrees of Separation: Review the code, and run it!
- Item-Based Collaborative Filtering in Spark, cache(), and persist()
- [Activity] Running the Similar Movies Script using Spark's Cluster Manager
- [Exercise] Improve the Quality of Similar Movies
Running Spark on a Cluster
- [Activity] Using spark-submit to run Spark driver scripts
- [Activity] Packaging driver scripts with SBT
- [Exercise] Package a Script with SBT and Run it Locally with spark-submit
- Exercise solution: Using SBT and spark-submit
- Introducing Amazon Elastic MapReduce
- Creating Similar Movies from One Million Ratings on EMR
- Partitioning
- Best Practices for Running on a Cluster
- Troubleshooting, and Managing Dependencies
Machine Learning with Spark ML
- Introducing MLLib
- [Activity] Using MLLib to Produce Movie Recommendations
- Linear Regression with MLLib
- [Activity] Running a Linear Regression with Spark
- [Exercise] Predict Real Estate Values with Decision Trees in Spark
- Exercise Solution: Predicting Real Estate with Decision Trees in Spark
Intro to Spark Streaming
- The DStream API for Spark Streaming
- [Activity] Real-time Monitoring of the Most Popular Hashtags on Twitter
- Structured Streaming
- [Activity] Using Structured Streaming for real-time log analysis
- [Exercise] Windowed Operations with Structured Streaming
- Exercise Solution: Top URL's in a 30-second Window
Intro to GraphX
- GraphX, Pregel, and Breadth-First-Search with Pregel.
- Using the Pregel API with Spark GraphX
- [Activity] Superhero Degrees of Separation using GraphX
You Made It! Where to Go from Here.
- Learning More, and Career Tips
- Bonus Lecture: More courses to explore!
Instructors
Mr Frank Kane
Founder
Freelancer