Building Big Data Pipelines with PySpark + MongoDB + Bokeh
Acquire a thorough understanding of the strategies involved in building big data pipelines with PySpark, MongoDB, and Bokeh.
Online
₹ 499 (listed price ₹ 2,299)
Quick Facts
| Particular | Details |
| --- | --- |
| Medium of instructions | English |
| Mode of learning | Self study |
| Mode of delivery | Video and text based |
Course overview
Big data pipelines are data pipelines built to handle one or more of the three defining characteristics of big data: volume, velocity, and variety. The velocity of big data makes streaming pipelines especially attractive, since data can be gathered and processed in real time, allowing action to be taken as events occur. The Building Big Data Pipelines with PySpark + MongoDB + Bokeh certification course was created by EBISYS R&D - Big Data Engineering and Consulting and is available on Udemy.
Building Big Data Pipelines with PySpark + MongoDB + Bokeh is a self-paced online course aimed at students who want to master the skills and strategies for creating data pipelines using the core functionalities of PySpark, Bokeh, and MongoDB. The classes cover data preprocessing, loading, extraction, manipulation, transformation, and visualization, and explain how to create machine learning scripts, PySpark ETL scripts, and a dashboard server.
The highlights
- Certificate of completion
- Self-paced course
- 5 hours of pre-recorded video content
- 1 article
- 1 downloadable resource
Program offerings
- Online course
- Learning resources
- 30-day money-back guarantee
- Unlimited access
- Accessible on mobile devices and TV
Course and certificate fees
Fees information
- Course fee: ₹ 499
- Certificate availability: Yes
- Certificate providing authority: Udemy
Who it is for
What you will learn
After completing the Building Big Data Pipelines with PySpark + MongoDB + Bokeh online certification, students will develop an understanding of big data and machine learning and be able to build big data pipelines using PySpark, MongoDB, Bokeh, and MLlib. Students will explore the methodologies associated with data processing, analysis, loading, transformation, extraction, visualization, and manipulation, as well as the strategies and concepts involved in geospatial machine learning and geo-mapping.
The syllabus
Introduction
Setup and Installations
- Python Installation
- Installing Third Party Libraries
- Installing Apache Spark
- Installing Java (Optional)
- Testing Apache Spark Installation
- Installing MongoDB
- Installing NoSQL Booster for MongoDB
Data Processing with PySpark and MongoDB
- Integrating PySpark with Jupyter Notebook
- Data Extraction
- Data Transformation
- Loading Data into MongoDB
Machine Learning with PySpark and MLlib
- Data Pre-processing
- Building the Predictive Model
- Creating the Prediction Dataset
Data Visualization
- Loading the Data Sources from MongoDB
- Creating a Map Plot
- Creating a Bar Chart
- Creating a Magnitude Plot
- Creating a Grid Plot
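The plot types listed above can be combined in Bokeh by building individual figures and arranging them with `gridplot`. A minimal sketch with illustrative data (the categories and magnitudes are made up, not from the course dataset):

```python
from bokeh.plotting import figure
from bokeh.layouts import gridplot

categories = ["low", "moderate", "strong"]
counts = [12, 7, 2]

# Bar chart: categorical x-range with vertical bars.
bar = figure(x_range=categories, title="Events by severity", height=300)
bar.vbar(x=categories, top=counts, width=0.8)

# Magnitude plot: a simple scatter over time steps.
mag = figure(title="Magnitude over time", height=300)
mag.scatter([1, 2, 3, 4], [4.2, 5.1, 4.8, 6.0], size=8)

# Grid plot: arrange both figures side by side.
grid = gridplot([[bar, mag]])
```

Calling `bokeh.io.show(grid)` (or `bokeh.io.output_file(...)` first) renders the grid in a browser; map plots additionally need a tile provider and web-mercator coordinates.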
Creating the Data Pipeline Scripts
- Installing Visual Studio Code
- Creating the PySpark ETL Script
- Creating the Machine Learning Script
- Creating the Dashboard Server
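Once the scripts exist, the pipeline is typically driven from the command line: the ETL and ML scripts run as Spark jobs, and the dashboard is served with Bokeh's built-in server. The file names below are placeholders; the connector package coordinate assumes mongo-spark connector v10 and would need to match your Spark/Scala versions.

```shell
# Run the ETL script as a Spark job, pulling in the MongoDB connector.
spark-submit --packages org.mongodb.spark:mongo-spark-connector_2.12:10.3.0 etl_script.py

# Run the machine learning script the same way.
spark-submit --packages org.mongodb.spark:mongo-spark-connector_2.12:10.3.0 ml_script.py

# Serve the Bokeh dashboard app and open it in a browser.
bokeh serve dashboard.py --show
```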
Source Code and Notebook
- Source Code and Notebook