- Meet Alexey Dral
- Meet Natalia Pritykovskaya
- Meet Pavel Klemenkov
- Meet Pavel Mezentsev
- What is BigData Analysis?
- Tools for BigData Analysis
- Graph Data Analysis
- Computations Optimization
Big Data Analysis: Hive, Spark SQL, DataFrames and GraphFrames
Master Spark SQL, GraphFrames, DataFrames and Hive tools with The Big Data Analysis: Hive, Spark SQL, ...
Expert
Online
6 Weeks
Free
Quick Facts
Particular | Details |
---|---|
Medium of instructions | English |
Mode of learning | Self study |
Mode of Delivery | Video and Text Based |
Course overview
Digital technologies are growing rapidly, and with them the volume of structured and unstructured data that makes up ‘big data’. Computer systems can now extract precise information and useful results from this data through analysis. Such analysis is highly relevant and informative for large organisations, businesses and professionals seeking to optimise their performance.
Treated as an information asset, big data supports effective decision making, process optimisation and cost effectiveness in large, medium and small organisations. However, to benefit from data of such high volume, velocity and variety, it must be analysed with appropriate tools and techniques. This is where knowledge of Hive, Spark SQL, DataFrames and GraphFrames comes in handy. Big data analysts are in high demand, and with these tools they can analyse big data efficiently to support important decisions and process optimisation in their organisations.
The Big Data Analysis: Hive, Spark SQL, DataFrames and GraphFrames online programme, offered by Yandex via Coursera, gives participants key skills in using big data analysis tools and helps them pursue careers in big data analysis.
The highlights
- Offered by Yandex via Coursera platform
- 100% online learning mode
- About 39 hours of course content
- Flexible learning schedule and assignment deadlines
- Shareable certificate upon course completion
- Insights from industry experts
Program offerings
- Videos
- Readings
- Practice exercises
- Quizzes
Course and certificate fees
Type of course
Free
- Coursera offers this course via Purchase Course and Audit-Only Options.
- The price of purchasing the course is Rs. 2,159.
- There are no charges for Audit Mode. However, participants will not be able to gain access to graded assignments required to earn the certificate.
Fee details for Big Data Analysis: Hive, Spark SQL, DataFrames and GraphFrames Course
Course Purchase Fees | Rs. 2,159 (includes full access to course material and graded assignments) |
Audit Only | Free access to course material except graded assignments |
Financial Aid | Available on application |
Certificate availability | Yes |
Certificate providing authority | Coursera |
Certificate fees | ₹2,152 |
Eligibility criteria
Certification Qualifying Details
To earn the certificate of completion, participants of the Big Data Analysis: Hive, Spark SQL, DataFrames and GraphFrames certificate course must complete and pass all graded items in the course. These consist of quizzes and other assignments (if applicable). Once participants pass them, Coursera issues an electronic certificate of completion, which is automatically added to the participant’s accomplishments page. The certificate can be shared online via URL and can also be printed.
What you will learn
Upon completion of the Big Data Analysis: Hive, Spark SQL, DataFrames and GraphFrames course, participants will be able to:
- Process graphs using the Spark GraphFrames
- Construct and use Spark DataFrames
- Write ad-hoc analytical jobs using Spark DataFrames
- Optimize and debug Spark applications for maximum performance
- Use Hive, Spark DataFrames and Spark SQL to efficiently warehouse your data
- Make effective use of networks and social graphs
- Write and execute queries using Hive and Spark SQL
The syllabus
Welcome to the Second Course: Big Data Analysis
Videos
Readings
- Slack Channel is the quickest way to get answers to your questions
Big Data SQL: Hive
Videos
- Hive Data Definition Language (DDL)
- Hive Data Manipulation Language (DML)
- Hive Analytics: RegexSerDe, Views
- Hive Streaming
- Hive Optimization: Data Skew
- Analytics: Business Use Cases
- Business Use Cases: Solution with Hive
- HTTP Web Service: Access Log Format
- Hive Analytics: UDF, UDAF, UDTF
- Hive Optimization: Partitioning, Bucketing and Sampling
- Hive PTF (Window Functions)
- Hive Map-Side Joins: Plain, Bucket, Sort-Merge
- Hive Optimization: Row-Columnar File Formats, Compression
- (optional) Regular Expressions, Likbez
- (optional) SQL: likbez
Big Data SQL: Hive (practice week)
Videos
- How to Install Docker on Windows 7, 8, 10
- How to submit your first Hive assignment
- How to submit your first assignment
Readings
- Docker Installation Guide
- Hive assignment. Intro and instructions
- Assignments. General requirements
- Grading System: Instructions and Common Problems
Spark SQL and Spark Dataframe
Videos
- How to process a DataFrame as SQL
- Advantages of Spark SQL
- Working with Hive
- What is Pandas DataFrame and how to create it
- RDD vs. DF vs. SQL
- Aggregates
- User Defined Functions
- Functions
- Projection and Filtering
- Reading and Writing Files
- Time Processing
- Window Functions
- Two-Dimensional Distributions
- Join
Graph Analysis from Big Data Perspective
Videos
- Counting common friends. Part I
- Graph representation
- Counting common friends. Part II
- Graph examples
- GraphFrames: Introduction
- Motif Finding: DSL
- Counting common friends. Part III
- Motif Finding: Counting Mutual Friends
- Triangles Count: Introduction
- Motif Finding: Under The Hood. Part 1
- Triangles Count: Edge Lists
- Triangles Count: GraphFrame
- Motif Finding: Under The Hood. Part 2
PageRank and Recent Advances
Videos
- Introduction
- GraphFrames
- Algorithm
- Taste Graph. Part I
- Page Rank Algorithm
- GraphFrames API
- Taste Graph. Part II
- RDD Implementation
- Taste Graph. Part III
- Random Walk
Readings
- Graph based Music Recommender
Spark Internals and Optimization
Videos
- Welcome
- Shuffle. Where to send data?
- Shuffle. How to send data?
- Spark Execution Model
- PageRank Optimization
- Optimizing Functions
- Catalyst
- Spark SQL. Motivation
- UDF Optimization
- Joins
- Optimizing Joins
- Catalyst Optimization Example
- Resource Allocation
- Memory Management
- Speculative Execution
- Persistence and Checkpointing
- Dynamic Allocation
Readings
- Deployment of the environment
Admission details
To enrol in the Big Data Analysis: Hive, Spark SQL, DataFrames and GraphFrames online certificate course, participants must follow the steps below.
Step 1: Visit the course page.
Step 2: Click on the “Enroll for Free” box.
Step 3: Log in or sign up using your Google or Email credentials.
Step 4: You will have the option of “Purchase Course” and “Audit Only”. “Purchase Course” will enable the applicants to receive full access to all course material and graded assignments. Through the “Audit Only” option, applicants will only receive access to course content, and not the graded items.
Step 5: Select your preferred option and, if you chose “Purchase Course”, proceed to make payment.
Please note that applicants who choose the “Audit Only” option will be able to access the course content but not the graded assignments, which are required to earn the course completion certificate from Coursera.
Scholarship Details
Coursera offers financial aid/sponsorship support for this course. Through this, participants will be able to access the entire course content including the graded assignments in order to earn the course completion certificate.
To apply for financial aid/scholarship, follow these steps.
Step 1: Click on the ‘Financial aid available’ button on the course homepage. Enter your log-in credentials to proceed.
Step 2: Fill in the application form with basic information and the required fields. To avoid rejection, ensure that the application is more than 150 words long.
Step 3: While your application is being reviewed, you can begin the course through audit mode. Please note that the review process can take up to 15 days.
Step 4: After the review, Coursera will notify you by email whether your application has been accepted or denied. If the application is accepted, you will be enrolled in the course directly.
Step 5: Once the application has been accepted, participants have two weeks to unenroll from the course.
Evaluation process
Participants must pass all the graded assignments of the course to earn the course completion certificate from Coursera. These assignments consist mainly of quizzes, along with any other applicable assignments set by the instructors. Participants can opt for flexible deadlines and can save their progress to pick up later. A shareable certificate from Coursera is issued only after all required graded assignments have been completed and passed.
How it helps
In this highly digitised era, technology is advancing rapidly. Digitalisation is simplifying processes and optimising performance, allowing businesses to operate more efficiently. This is made possible by useful, informative and critical analysis of ‘big data’. Participants of this online course will gain the knowledge and skills to use various tools for building dynamic, well-organised big data workflows.
Through this online certification course, participants will gain useful insights and working knowledge of Hive, Spark SQL, DataFrames and GraphFrames, which will enable them to warehouse their data efficiently, write and execute queries, and work with social graphs and networks. The course also covers topics such as Pandas DataFrame, aggregates, PageRank optimization and memory management. Participants will also receive a shareable certificate from Coursera upon completion of the course.
Globally, demand is surging for big data analysts and business analysts who can deal with very large volumes of data. By completing this course, participants will gain exposure to the tools and techniques used in analysing big data and will strengthen the skills they need to assist their organisations in better decision making, forecasting, cost reduction and process optimisation. Participants will also learn from leading instructors in the big data analysis field.
FAQs
Is there a prerequisite to enrol for The Big Data Analysis: Hive, Spark SQL, DataFrames and GraphFrames course?
This is an advanced-level course that teaches the use of big data analysis tools such as Hive, Spark SQL, DataFrames and GraphFrames. It is open to participants regardless of their current skill level.
How does the Audit Only option work?
Through the Audit Only option, participants will be able to access only the course material consisting of videos and readings, and not the graded assignments like quizzes. Participants do not have to pay any fees or charges to ‘audit’ the course.
Who can become eligible for certification in The Big Data Analysis: Hive, Spark SQL, DataFrames and GraphFrames?
Only those participants who have completed all the course work and have passed with sufficient grades will become eligible for Coursera certification.
How does the shareable course certificate work?
You will receive your certificate once you have passed all the graded assignments. From the accomplishments page, you can share the certificate online, for example on LinkedIn or in your CV, and you can also print it.
What is the Purchase Course method?
By purchasing the course, participants get access to the full contents of the course, including quizzes and other graded assignments required to earn the certificate.