Learn By Example: Hadoop, MapReduce for Big Data problems
Quick Facts
Particular | Details
---|---
Medium of instructions | English
Mode of learning | Self study
Mode of delivery | Video and text based
Course overview
The Learn By Example: Hadoop, MapReduce for Big Data problems certification course is designed by Loony Corn, a global e-learning provider whose team includes former Google, Stanford, and Flipkart members, and is made available through Udemy for individuals who want to build sophisticated distributed computing applications that process large amounts of data using Hadoop and MapReduce. The Learn By Example: Hadoop, MapReduce for Big Data problems online course aims to give participants a hands-on introduction to Hadoop from the very beginning.
Learn By Example: Hadoop, MapReduce for Big Data problems online classes include more than 13.5 hours of video lessons, an article, and 112 downloadable resources that cover topics such as parallel thinking, performance tuning, natural language processing, cluster management, serial and distributed computing, collaborative filtering, and k-means clustering, and that teach techniques for building Hadoop clusters on VMs and in the cloud.
The highlights
- Certificate of completion
- Self-paced course
- 13.5 hours of pre-recorded video content
- 1 article
- 112 downloadable resources
Program offerings
- Online course
- Downloadable learning resources
- 30-day money-back guarantee
- Unlimited access
- Accessible on mobile devices and TV
Course and certificate fees
Fees information

Particular | Details
---|---
Certificate availability | Yes
Certificate providing authority | Udemy
Who it is for
What you will learn
After completing the Learn By Example: Hadoop, MapReduce for Big Data problems online certification, participants will be introduced to the methodologies and techniques of MapReduce and Hadoop for big data operations. Participants will explore how YARN, MapReduce, and HDFS interact, and will learn the principles of parallel thinking. Participants will study performance tuning, collaborative filtering, natural language processing, k-means clustering, serial and distributed computing, and cluster management. Additionally, participants will learn how to express SQL Select and Group By operations in MapReduce and how to build inverted indices.
The syllabus
Introduction
- You, this course and Us
Why is Big Data a Big Deal
- The Big Data Paradigm
- Serial vs Distributed Computing
- What is Hadoop?
- HDFS or the Hadoop Distributed File System
- MapReduce Introduced
- YARN or Yet Another Resource Negotiator
Installing Hadoop in a Local Environment
- Hadoop Install Modes
- Hadoop Standalone mode Install
- Hadoop Pseudo-Distributed mode Install
The MapReduce "Hello World"
- The basic philosophy underlying MapReduce
- MapReduce - Visualized And Explained
- MapReduce - Digging a little deeper at every step
- "Hello World" in MapReduce
- The Mapper
- The Reducer
- The Job
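To make the Mapper, Reducer, and Job components above concrete, here is a minimal word-count sketch against Hadoop's standard org.apache.hadoop.mapreduce API. It mirrors the classic Hadoop "Hello World"; the course's own example code may differ in its details.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // The Mapper: emits (word, 1) for every token in a line of input.
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // The Reducer: sums the counts for each word.
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  // The Job: wires the Mapper and Reducer together and submits the work.
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // safe: summing is associative and commutative
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```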
Run a MapReduce Job
- Get comfortable with HDFS
- Run your first MapReduce Job
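As a taste of the HDFS interaction this section builds comfort with, here is a small Java sketch that copies a local file into HDFS and lists a directory programmatically. The paths are hypothetical placeholders, and the same operations are usually done first with the hdfs dfs command-line tool.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsQuickTour {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();  // picks up core-site.xml from the classpath
    FileSystem fs = FileSystem.get(conf);

    // Copy a local file into HDFS (paths here are placeholders).
    fs.copyFromLocalFile(new Path("input.txt"), new Path("/user/me/input.txt"));

    // List the directory, printing each file's size.
    for (FileStatus status : fs.listStatus(new Path("/user/me"))) {
      System.out.println(status.getPath() + "  " + status.getLen() + " bytes");
    }
    fs.close();
  }
}
```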
Juicing your MapReduce - Combiners, Shuffle and Sort and The Streaming API
- Parallelize the reduce phase - use the Combiner
- Not all Reducers are Combiners
- How many mappers and reducers does your MapReduce have?
- Parallelizing reduce using Shuffle And Sort
- MapReduce is not limited to the Java language - Introducing the Streaming API
- Python for MapReduce
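"Not all Reducers are Combiners" deserves a concrete illustration. The plain-Java sketch below shows why: a sum can be computed from partial sums, but an average of per-mapper averages is not the global average, so an "average" reducer must not be reused as a combiner.

```java
import java.util.Arrays;
import java.util.stream.DoubleStream;

public class CombinerCaveat {
  public static void main(String[] args) {
    double[] mapper1 = {1, 2};  // values for one key seen by one mapper
    double[] mapper2 = {6};     // values for the same key seen by another mapper

    // The true global average: (1 + 2 + 6) / 3 = 3.0
    double trueAvg = DoubleStream
        .concat(Arrays.stream(mapper1), Arrays.stream(mapper2))
        .average().orElse(0);

    // An "average" combiner would pre-average per mapper, and the reducer
    // would then average those: avg(1.5, 6.0) = 3.75, which is wrong.
    double avg1 = Arrays.stream(mapper1).average().orElse(0);  // 1.5
    double avg2 = Arrays.stream(mapper2).average().orElse(0);  // 6.0
    double combinedAvg = (avg1 + avg2) / 2;

    System.out.println(trueAvg + " vs " + combinedAvg);  // 3.0 vs 3.75

    // A "sum" combiner is safe: partial sums compose into the same global sum,
    // because addition is associative and commutative.
  }
}
```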
HDFS and Yarn
- HDFS - Protecting against data loss using replication
- HDFS - Name nodes and why they're critical
- HDFS - Checkpointing to backup name node information
- Yarn - Basic components
- Yarn - Submitting a job to Yarn
- Yarn - Plug in scheduling policies
- Yarn - Configure the scheduler
MapReduce Customizations For Finer Grained Control
- Setting up your MapReduce to accept command line arguments
- The Tool, ToolRunner and GenericOptionsParser
- Configuring properties of the Job object
- Customizing the Partitioner, Sort Comparator, and Group Comparator
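As a sketch of the Tool/ToolRunner pattern this section covers: ToolRunner passes the command line through GenericOptionsParser, so generic flags like -D key=value land in the Configuration before run() sees the remaining arguments. The job name and paths below are placeholders, and mapper/reducer setup is omitted, so Hadoop's identity defaults would apply.

```java
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MyJobDriver extends Configured implements Tool {

  @Override
  public int run(String[] args) throws Exception {
    // getConf() already contains any -D key=value options that
    // GenericOptionsParser extracted inside ToolRunner.
    Job job = Job.getInstance(getConf(), "my job");
    job.setJarByClass(MyJobDriver.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    return job.waitForCompletion(true) ? 0 : 1;
  }

  public static void main(String[] args) throws Exception {
    System.exit(ToolRunner.run(new MyJobDriver(), args));
  }
}
```

Invoked as, say, `hadoop jar myjob.jar MyJobDriver -D mapreduce.job.reduces=4 in out`, the -D flag tunes the job without recompiling.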
The Inverted Index, Custom Data Types for Keys, Bigram Counts and Unit Tests!
- The heart of search engines - The Inverted Index
- Generating the inverted index using MapReduce
- Custom data types for keys - The Writable Interface
- Represent a Bigram using a WritableComparable
- MapReduce to count the Bigrams in input text
- Setting up your Hadoop project
- Test your MapReduce job using MRUnit
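A sketch of what a bigram key implementing WritableComparable might look like (the class and field names are illustrative). Hadoop calls write/readFields for serialization, compareTo during the sort phase, and hashCode when the default HashPartitioner routes keys to reducers.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.WritableComparable;

public class Bigram implements WritableComparable<Bigram> {
  private String first = "";
  private String second = "";

  public Bigram() {}  // no-arg constructor required: Hadoop instantiates keys reflectively

  public Bigram(String first, String second) {
    this.first = first;
    this.second = second;
  }

  @Override
  public void write(DataOutput out) throws IOException {
    out.writeUTF(first);
    out.writeUTF(second);
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    first = in.readUTF();
    second = in.readUTF();
  }

  @Override
  public int compareTo(Bigram other) {
    int cmp = first.compareTo(other.first);
    return cmp != 0 ? cmp : second.compareTo(other.second);
  }

  @Override
  public int hashCode() { return first.hashCode() * 31 + second.hashCode(); }

  @Override
  public boolean equals(Object o) {
    if (!(o instanceof Bigram)) return false;
    Bigram b = (Bigram) o;
    return first.equals(b.first) && second.equals(b.second);
  }

  @Override
  public String toString() { return first + " " + second; }
}
```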
Input and Output Formats and Customized Partitioning
- Introducing the File Input Format
- Text And Sequence File Formats
- Data partitioning using a custom partitioner
- Make the custom partitioner real in code
- Total Order Partitioning
- Input Sampling, Distribution, Partitioning and configuring these
- Secondary Sort
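To illustrate the custom-partitioner idea, here is a hypothetical partitioner that routes keys to reducers by their first letter, so each reducer's output file covers a contiguous alphabetical range.

```java
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

public class FirstLetterPartitioner extends Partitioner<Text, IntWritable> {
  @Override
  public int getPartition(Text key, IntWritable value, int numPartitions) {
    if (key.getLength() == 0) {
      return 0;
    }
    char c = Character.toLowerCase(key.toString().charAt(0));
    int bucket = (c >= 'a' && c <= 'z') ? (c - 'a') : 25;  // lump non-letters in with 'z'
    return bucket * numPartitions / 26;                    // scale 26 buckets onto the reducers
  }
}
```

It is wired in with job.setPartitionerClass(FirstLetterPartitioner.class) alongside job.setNumReduceTasks(...).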
Recommendation Systems using Collaborative Filtering
- Introduction to Collaborative Filtering
- Friend recommendations using chained MR jobs
- Get common friends for every pair of users - the first MapReduce
- Top 10 friend recommendation for every user - the second MapReduce
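Chaining MapReduce jobs, as in the friend-recommendation pair above, is usually just a driver that runs the second job on the first job's output directory. A sketch, with hypothetical paths and the per-job mapper/reducer wiring elided:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ChainedJobsDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    // First job: compute common friends for every pair of users.
    Job commonFriends = Job.getInstance(conf, "common friends");
    commonFriends.setJarByClass(ChainedJobsDriver.class);
    FileInputFormat.addInputPath(commonFriends, new Path("friends/input"));
    FileOutputFormat.setOutputPath(commonFriends, new Path("friends/intermediate"));
    if (!commonFriends.waitForCompletion(true)) {
      System.exit(1);  // stop the chain if the first job fails
    }

    // Second job: rank and keep the top 10 recommendations per user,
    // reading the first job's output as its input.
    Job topTen = Job.getInstance(conf, "top 10 recommendations");
    topTen.setJarByClass(ChainedJobsDriver.class);
    FileInputFormat.addInputPath(topTen, new Path("friends/intermediate"));
    FileOutputFormat.setOutputPath(topTen, new Path("friends/output"));
    System.exit(topTen.waitForCompletion(true) ? 0 : 1);
  }
}
```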
Hadoop as a Database
- Structured data in Hadoop
- Running an SQL Select with MapReduce
- Running an SQL Group By with MapReduce
- A MapReduce Join - The Map Side
- A MapReduce Join - The Reduce Side
- A MapReduce Join - Sorting and Partitioning
- A MapReduce Join - Putting it all together
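As a flavor of "Hadoop as a database": a SELECT with a WHERE clause maps naturally onto a map-only job in which each mapper emits only the rows that pass the predicate. A sketch assuming hypothetical CSV rows of the form name,dept,salary:

```java
import java.io.IOException;

import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Roughly: SELECT * FROM employees WHERE salary > 50000.
// Configure the job with job.setNumReduceTasks(0) to make it map-only.
public class SelectMapper extends Mapper<Object, Text, Text, NullWritable> {
  @Override
  protected void map(Object key, Text value, Context context)
      throws IOException, InterruptedException {
    String[] fields = value.toString().split(",");  // hypothetical schema: name,dept,salary
    if (fields.length == 3 && Double.parseDouble(fields[2]) > 50000) {
      context.write(value, NullWritable.get());     // pass the matching row through unchanged
    }
  }
}
```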
K-Means Clustering
- What is K-Means Clustering?
- A MapReduce job for K-Means Clustering
- K-Means Clustering - Measuring the distance between points
- K-Means Clustering - Custom Writables for Input/Output
- K-Means Clustering - Configuring the Job
- K-Means Clustering - The Mapper and Reducer
- K-Means Clustering - The Iterative MapReduce Job
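The geometric core of the k-means mapper is finding each point's nearest centroid, so the point can be emitted keyed by that centroid's index. A plain-Java sketch (squared Euclidean distance suffices, since dropping the square root preserves the ordering):

```java
public class NearestCentroid {

  // Squared Euclidean distance between two points of equal dimension.
  static double squaredDistance(double[] p, double[] q) {
    double sum = 0;
    for (int i = 0; i < p.length; i++) {
      double d = p[i] - q[i];
      sum += d * d;
    }
    return sum;  // no sqrt needed: squared distance preserves nearest-neighbor ordering
  }

  // Index of the centroid closest to the given point.
  static int nearest(double[] point, double[][] centroids) {
    int best = 0;
    double bestDist = Double.MAX_VALUE;
    for (int i = 0; i < centroids.length; i++) {
      double d = squaredDistance(point, centroids[i]);
      if (d < bestDist) {
        bestDist = d;
        best = i;
      }
    }
    return best;
  }

  public static void main(String[] args) {
    double[][] centroids = { {0, 0}, {10, 10} };
    System.out.println(nearest(new double[] {1, 2}, centroids));  // prints 0
  }
}
```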
Setting up a Hadoop Cluster
- Manually configuring a Hadoop cluster (Linux VMs)
- Getting started with Amazon Web Services
- Start a Hadoop Cluster with Cloudera Manager on AWS
Appendix
- Setup a Virtual Linux Instance (For Windows users)
- [For Linux/Mac OS Shell Newbies] Path and other Environment Variables