- Udemy 101: Getting the Most From This Course
- Tips for Using This Course
- If you have trouble downloading Hortonworks Data Platform...
- Warning for Apple M1 users
- Installing Hadoop [Step by Step]
- The Hortonworks and Cloudera Merger, and how it affects this course.
- Hadoop Overview and History
- Overview of the Hadoop Ecosystem
Online
₹649 (regular price ₹1,299)
Quick facts
| Particular | Details |
|---|---|
| Medium of instructions | English |
| Mode of learning | Self study |
| Mode of delivery | Video and text based |
Course overview
Big data is a collection of structured, semi-structured, and unstructured data gathered by businesses that can be mined for information and applied to advanced analytics tasks such as predictive modeling and machine learning. Sundog Education, an education platform focused on big data, data science, and machine learning, created The Ultimate Hands-On Hadoop: Tame your Big Data certification course together with its founder, Frank Kane. The course is available through Udemy.
The Ultimate Hands-On Hadoop: Tame your Big Data online training is a self-paced program with more than 14.5 hours of video lessons, 8 articles, and 2 downloadable resources. It is intended for students who want to understand the broader Hadoop ecosystem and use it to store, analyze, and serve big data at scale. The course covers a range of data engineering technologies, including Cassandra, MongoDB, MySQL, Kafka, Spark, Hive, Flink, Storm, Presto, HBase, ZooKeeper, Sqoop, Hue, Flume, Oozie, Mesos, Phoenix, Drill, and more.
The highlights
- Certificate of completion
- Self-paced course
- 14.5 hours of pre-recorded video content
- 8 articles
- 2 downloadable resources
Program offerings
- Online course
- Learning resources
- 30-day money-back guarantee
- Unlimited access
- Accessible on mobile devices and TV
What you will learn
After completing The Ultimate Hands-On Hadoop: Tame your Big Data online certification, students will better understand the principles and concepts of big data analytics with Hadoop. The course covers the fundamentals of analyzing data in relational and non-relational data stores, data storage, data streaming with Spark Streaming, and interactive querying, and it shows how to write Pig and Spark scripts that process data on a Hadoop cluster. Students will also learn what a range of data engineering tools do, including HBase, Cassandra, MongoDB, Drill, Phoenix, Kafka, Sqoop, Zeppelin, Tez, Hue, YARN, Presto, MySQL, Mesos, HDFS, Oozie, MapReduce, Flink, and Storm.
The syllabus
Learn all the buzzwords! And install the Hortonworks Data Platform Sandbox.
Using Hadoop's Core: HDFS and MapReduce
- HDFS: What it is, and how it works
- Alternate MovieLens download location
- Installing the MovieLens Dataset
- [Activity] Install the MovieLens dataset into HDFS using the command line
- MapReduce: What it is, and how it works
- How MapReduce distributes processing
- MapReduce example: Break down movie ratings by rating score
- Notes on MRJob installation
- [Activity] Installing Python, MRJob, and nano
- [Activity] Code up the ratings histogram MapReduce job and run it
- [Exercise] Rank movies by their popularity
- Note: Sorting will only work by partition.
- [Activity] Check your results against mine!
Programming Hadoop with Pig
- Introducing Ambari
- Introducing Pig
- Example: Find the oldest movie with a 5-star rating using Pig
- [Activity] Find old 5-star movies with Pig
- More Pig Latin
- [Exercise] Find the most-rated one-star movie
- Pig Challenge: Compare Your Results to Mine!
Programming Hadoop with Spark
- Why Spark?
- The Resilient Distributed Dataset (RDD)
- [Activity] Find the movie with the lowest average rating - with RDDs
- Datasets and Spark 2.0
- [Activity] Find the movie with the lowest average rating - with DataFrames
- [Activity] Movie recommendations with MLlib
- [Exercise] Filter the lowest-rated movies by number of ratings
- [Activity] Check your results against mine!
Using relational data stores with Hadoop
- What is Hive?
- [Activity] Use Hive to find the most popular movie
- How Hive works
- [Exercise] Use Hive to find the movie with the highest average rating
- Compare your solution to mine.
- Integrating MySQL with Hadoop
- Cheat sheet for the following lecture
- [Activity] Install MySQL and import our movie data
- [Activity] Use Sqoop to import data from MySQL to HDFS/Hive
- [Activity] Use Sqoop to export data from Hadoop to MySQL
Using non-relational data stores with Hadoop
- Why NoSQL?
- What is HBase
- [Activity] Import movie ratings into HBase
- [Activity] Use HBase with Pig to import data at scale.
- Cassandra overview
- If you have trouble installing Cassandra...
- [Activity] Installing Cassandra
- [Activity] Write Spark output into Cassandra
- MongoDB overview
- [Activity] Install MongoDB, and integrate Spark with MongoDB
- [Activity] Using the MongoDB shell
- Choosing a database technology
- [Exercise] Choose a database for a given problem
Querying your Data Interactively
- Overview of Drill
- [Activity] Setting up Drill
- [Activity] Querying across multiple databases with Drill
- Overview of Phoenix
- [Activity] Install Phoenix and query HBase with it
- [Activity] Integrate Phoenix with Pig
- Overview of Presto
- [Activity] Install Presto, and query Hive with it.
- [Activity] Query both Cassandra and Hive using Presto.
Managing your Cluster
- YARN explained
- Tez explained
- [Activity] Use Hive on Tez and measure the performance benefit
- Mesos explained
- ZooKeeper explained
- [Activity] Simulating a failing master with ZooKeeper
- Oozie explained
- [Activity] Set up a simple Oozie workflow
- Zeppelin overview
- [Activity] Use Zeppelin to analyze movie ratings, part 1
- [Activity] Use Zeppelin to analyze movie ratings, part 2
- Hue overview
- Other technologies worth mentioning
Feeding Data to your Cluster
- Kafka explained
- [Activity] Setting up Kafka, and publishing some data.
- [Activity] Publishing web logs with Kafka
- Flume explained
- [Activity] Set up Flume and publish logs with it.
- [Activity] Set up Flume to monitor a directory and store its data in HDFS
Analyzing Streams of Data
- Spark Streaming: Introduction
- [Activity] Analyze web logs published with Flume using Spark Streaming
- [Exercise] Monitor Flume-published logs for errors in real time
- Exercise solution: Aggregating HTTP access codes with Spark Streaming
- Apache Storm: Introduction
- [Activity] Count words with Storm
- Flink: An Overview
- [Activity] Counting words with Flink
Designing Real-World Systems
- The Best of the Rest
- Review: How the pieces fit together
- Understanding your requirements
- Sample application: consume webserver logs and keep track of top-sellers
- Sample application: serving movie recommendations to a website
- [Exercise] Design a system to report web sessions per day
- Exercise solution: Design a system to count daily sessions
Learning More
- Books and online resources
- Bonus Lecture: More courses to explore!
Instructors
Mr Frank Kane
Founder
Freelancer