- Course and Tutor Introduction
- Course Overview and Objectives
- How to make the most of the course?
Online
₹649 (list price ₹799)
Quick facts
| Particular | Details |
| --- | --- |
| Medium of instructions | English |
| Mode of learning | Self study |
| Mode of delivery | Video and text based |
Course overview
The Data Engineering on Google Cloud Platform online certification was developed by Cloud Resident, an education platform that offers courses in cloud computing, data engineering, analytics, and architecture, and is offered through Udemy. It is intended for candidates looking for a thorough training program that helps them master the concepts and methods of using Google Cloud Platform (GCP) for data engineering.
Data Engineering on Google Cloud Platform online classes form a short-term program containing 10 hours of hands-on study material supported by 42 downloadable resources, which aim to offer practical answers to real-world scenarios for data engineering on the cloud. With Data Engineering on Google Cloud Platform online training, candidates will also be taught strategies for PySpark structured streaming, real-time event data streaming, event-time data processing, automation, data transformation, data ingestion, and more.
The highlights
- Certificate of completion
- Self-paced course
- 10 hours of pre-recorded video content
- 42 downloadable resources
Program offerings
- Online course
- Learning resources
- 30-day money-back guarantee
- Unlimited access
- Accessible on mobile devices and TV
Course and certificate fees
Fees information
Certificate availability
Certificate providing authority
What you will learn
After completing the Data Engineering on Google Cloud Platform certification course, candidates will be introduced to the fundamentals of Google Cloud Platform (GCP) for data engineering operations and will acquire knowledge of the concepts involved in cloud computing, ETL, and data warehousing. In this data engineering certification, candidates will explore the functionalities of Apache Airflow, HiveQL, Spark SQL, Cloud SQL, BigQuery, Hive tables, PySpark, Dataproc, and ad-hoc queries. Candidates will also learn strategies for automation, event-time data processing, real-time data streaming, PySpark structured streaming, data ingestion, and data transformation.
The syllabus
Introduction and Overview
Batch Processing and ETL using BigQuery, Spark and Airflow / Google Composer
- Introduction to BigQuery as a data warehousing tool on GCP
- Practical - Partitioned tables & loading data
- Introduction to Dataproc clusters
- Practical - Create Dataproc clusters
- Practical - Problem statement | Write a PySpark ETL job using Jupyter notebooks
- Practical - Submit a PySpark job and load data into BigQuery tables
- Introduction to Google Workflow Templates
- Practical - Write a Google workflow to submit PySpark applications
- Introduction to Apache Airflow / Google Composer
- Practical - Write an Airflow script in Python for creating a DAG and dependencies
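The Airflow material above centres on expressing a pipeline as a DAG of dependent tasks. As a stdlib-only sketch of that dependency idea (the task names below are hypothetical, and Python's `graphlib` stands in for Airflow's scheduler), the batch ETL flow could be ordered like this:

```python
from graphlib import TopologicalSorter

# Hypothetical task graph mirroring the batch ETL flow in this section:
# each key lists the tasks it depends on, the way Airflow operators
# declare upstream tasks with >> / set_upstream.
dag = {
    "submit_pyspark_job": {"create_dataproc_cluster"},
    "load_into_bigquery": {"submit_pyspark_job"},
    "delete_dataproc_cluster": {"load_into_bigquery"},
}

# A scheduler must run tasks in an order that respects every dependency;
# for a chain like this there is exactly one valid order.
order = list(TopologicalSorter(dag).static_order())
print(order)
```

Airflow adds scheduling, retries, and operators on top, but the ordering guarantee it gives you is exactly this topological sort over declared dependencies.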
Batch Data ingestion using Apache Sqoop and Apache Airflow / Google Composer
- Introduction to Apache Sqoop
- Practical - Set up Sqoop dependencies and a Cloud SQL database
- Practical - Create a Dataproc cluster | Sqoop commands / simple imports to GCS
- Practical - Sqoop incremental imports from a Cloud SQL MySQL database
- Practical - Sqoop boundary query / imports with no primary keys
- Practical - Sqoop import using Apache Airflow / Google Composer
- Practical - Sqoop incremental imports using Apache Airflow / Google Composer
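Sqoop's incremental imports work by remembering the last value seen in a check column and pulling only newer rows on the next run. A minimal sketch of that watermark logic, using stdlib `sqlite3` as a stand-in for the Cloud SQL MySQL source (the table, column, and data are made up):

```python
import sqlite3

# Stand-in for a Cloud SQL MySQL table; sqlite3 keeps this runnable.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, "a"), (2, "b"), (3, "c")])

def incremental_import(conn, last_value):
    """Mimic `sqoop import --incremental append --check-column id
    --last-value N`: fetch only rows beyond the stored watermark,
    then advance the watermark to the highest id imported."""
    rows = conn.execute(
        "SELECT id, item FROM orders WHERE id > ? ORDER BY id",
        (last_value,)).fetchall()
    new_last = rows[-1][0] if rows else last_value
    return rows, new_last

# First run imports everything; the watermark advances to 3.
batch1, last = incremental_import(conn, 0)
# New rows arrive; the next run picks up only ids > 3.
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(4, "d"), (5, "e")])
batch2, last = incremental_import(conn, last)
print(batch1, batch2, last)
```

In the course's Airflow-driven variant, Airflow simply schedules this same command and Sqoop persists the `--last-value` watermark between runs.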
Kafka Crash Course
- Kafka Introduction
- Kafka - Topics, partitions, and brokers
- Kafka - Replications
- Kafka - Role of Zookeeper
- Kafka - Practice Commands on Dataproc
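The topics-and-partitions material rests on one property: a message key deterministically maps to a partition, so all events for a given key land on the same partition and stay ordered. Kafka's default partitioner hashes the key with murmur2; the sketch below substitutes MD5 purely so it runs on the stdlib (topic size and keys are invented):

```python
import hashlib

NUM_PARTITIONS = 3  # hypothetical topic with 3 partitions

def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Toy partitioner. Kafka actually uses a murmur2 hash, but any
    stable hash mod partition-count demonstrates the property that
    matters: the same key always maps to the same partition."""
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

events = ["user-1", "user-2", "user-1", "user-3", "user-1"]
placements = [partition_for(k) for k in events]
print(placements)
```

Because every `"user-1"` event hashes to the same partition, per-key ordering is preserved even though different keys are spread across brokers for parallelism.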
Real-Time Streaming and Analytics using Spark Structured Streaming with Kafka
- Real-time streaming - Section overview
- Understanding Spark streaming APIs - DStreams and Structured Streaming
- Introduction to Spark Structured Streaming
- Practical - Create Dataproc clusters with initialization actions
- Practical - Dataproc cluster setup and prerequisites for the streaming application
- Practical - PySpark Structured Streaming - Testing streaming data and aggregates
- Practical - Problem statement | Late data handling and streaming aggregations
- Practical - Write background Cloud Functions to load transformed data into BigQuery
- Practical - Get the most visited categories in micro-batches
- Problem statement | Raw data streaming
- Practical - Raw data streaming | Hive external tables | Micro-batching using Airflow
- Practical - Write GCS-triggered Cloud Functions to load data into BigQuery
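The late-data handling covered in this section comes down to Structured Streaming's watermark rule: track the maximum event time seen so far, subtract the allowed lateness, and discard anything older while aggregating the rest into event-time windows. A plain-Python sketch of that rule (the window size, lateness, and categories are invented; Spark's `withWatermark` plus a windowed `groupBy` is the real mechanism):

```python
from collections import defaultdict

WINDOW = 60            # tumbling 60-second event-time windows
ALLOWED_LATENESS = 30  # like withWatermark("30 seconds")

counts = defaultdict(int)  # (window_start, category) -> event count
max_event_time = 0
dropped = []

def process(event_time: int, category: str):
    """Apply the watermark rule to one event: events older than
    (max event time - allowed lateness) are dropped as too late;
    everything else increments its tumbling window's count."""
    global max_event_time
    max_event_time = max(max_event_time, event_time)
    watermark = max_event_time - ALLOWED_LATENESS
    if event_time < watermark:
        dropped.append((event_time, category))
        return
    window_start = (event_time // WINDOW) * WINDOW
    counts[(window_start, category)] += 1

# Events arrive out of order. After t=100 the watermark is 70, so the
# straggler with event time 20 is discarded rather than aggregated.
for t, cat in [(95, "books"), (100, "books"), (20, "toys"), (110, "games")]:
    process(t, cat)

print(dict(counts), dropped)
```

The watermark is what lets a streaming engine finalize and emit a window's aggregate: without a bound on lateness it could never know a window was complete.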
Real-Time Streaming with Streaming Files as a Source of Data with IoT Sensor Data
- Understanding streaming files as a source of data | IoT sensor data
- Understanding the problem statement
- Practical - Data generator Python script setup
- Practical - Stateful aggregations | ForeachBatch sink | GCS-triggered Cloud Functions
- Practical - Cloud Functions & loading data into BigQuery
- Problem statement | Handling high-consumption IoT device alerts
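Stateful aggregation with a `foreachBatch` sink means folding each micro-batch into state that outlives the batch, then acting on that state — here, raising an alert the first time a device's cumulative consumption crosses a threshold. A stdlib sketch of that pattern (device IDs and the 100-unit threshold are made up; in Spark the state would live in the streaming query, not a module-level dict):

```python
ALERT_THRESHOLD = 100  # hypothetical high-consumption cutoff

state = {}   # device_id -> cumulative consumption across all batches
alerts = []  # devices that have crossed the threshold

def process_microbatch(batch):
    """Like a foreachBatch handler: fold the micro-batch's readings
    into long-lived per-device state, and emit an alert only on the
    batch where a device first crosses the threshold."""
    for device_id, reading in batch:
        before = state.get(device_id, 0)
        state[device_id] = before + reading
        if before < ALERT_THRESHOLD <= state[device_id]:
            alerts.append(device_id)

process_microbatch([("sensor-a", 60), ("sensor-b", 30)])
# sensor-a crosses 100 in this batch and triggers exactly one alert.
process_microbatch([("sensor-a", 50), ("sensor-b", 20)])
print(state, alerts)
```

The "first crossing" check is the design point: without comparing the before/after values, a chatty device would re-alert on every batch after crossing the threshold.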
Update - BigQuery / Cloud SQL - Federated Queries