
Quick Facts

Medium of Instruction: English
Mode of Learning: Self Study, Virtual Classroom
Mode of Delivery: Video and Text Based

Course Overview

The Georgia Institute of Technology, USA, offers the Reinforcement Learning online programme through Udacity. This four-month course aims to teach you important Reinforcement Learning (RL) concepts. You will study a blend of recent papers and classic work in the area.

Throughout the Reinforcement Learning online training, you will learn about automated decision-making from the point of view of Computer Science. The course offers in-depth learning content taught by industry professionals. You will learn via interactive quizzes, video lectures and practical exercises.

The Reinforcement Learning syllabus covers a wide range of RL topics, including convergence, generalisation, game theory, Bellman equations and Markov Decision Processes (MDPs). You will also explore efficient algorithms, single-agent and multi-agent planning, and more.
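
If terms like the Bellman equation or MDP are unfamiliar, the short sketch below may help. It runs value iteration on a tiny, invented two-state MDP, repeatedly applying the Bellman optimality backup (the value of a state becomes the best action's expected reward plus the discounted value of where that action leads). The states, actions, transition probabilities and rewards are made up for illustration and are not taken from the course; since the course expects Java experience, the sketch is written in Java.

    import java.util.Arrays;

    // Illustrative value iteration on a made-up 2-state, 2-action MDP.
    // This is a sketch of the general technique, not code from the course.
    public class ValueIterationDemo {

        // P[s][a][s'] = transition probability, R[s][a][s'] = reward (both invented).
        static final double[][][] P = {
            { {0.8, 0.2}, {0.1, 0.9} },   // transitions from state 0 under actions 0 and 1
            { {0.5, 0.5}, {0.0, 1.0} }    // transitions from state 1 under actions 0 and 1
        };
        static final double[][][] R = {
            { {1.0, 0.0}, {0.0, 2.0} },
            { {0.0, 0.0}, {0.0, 1.0} }
        };
        static final double GAMMA = 0.9;  // discount factor

        public static void main(String[] args) {
            double[] v = new double[2];   // value estimates, initialised to 0
            for (int iter = 0; iter < 1000; iter++) {
                double[] next = new double[2];
                double delta = 0.0;
                for (int s = 0; s < 2; s++) {
                    double best = Double.NEGATIVE_INFINITY;
                    for (int a = 0; a < 2; a++) {
                        // Bellman backup: expected reward plus discounted future value
                        double q = 0.0;
                        for (int s2 = 0; s2 < 2; s2++) {
                            q += P[s][a][s2] * (R[s][a][s2] + GAMMA * v[s2]);
                        }
                        best = Math.max(best, q);
                    }
                    next[s] = best;
                    delta = Math.max(delta, Math.abs(next[s] - v[s]));
                }
                v = next;
                if (delta < 1e-8) break;  // converged
            }
            System.out.println("Optimal state values: " + Arrays.toString(v));
        }
    }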

The Reinforcement Learning course by Udacity also covers Temporal Difference (TD) learning and related concepts. At the end of this advanced online programme, you will replicate a result from a published RL paper.
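
To give a rough idea of what TD learning does, the sketch below applies the basic TD(0) update rule, which nudges the value estimate of a state toward the observed reward plus the discounted value of the next state. Every detail here (states, rewards, learning rate) is hypothetical and only meant to illustrate the rule, not to reproduce course code.

    // Illustrative TD(0) updates on a few invented transitions; not course code.
    public class TdZeroDemo {
        public static void main(String[] args) {
            double[] v = new double[3];   // value estimates for 3 hypothetical states
            double alpha = 0.1;           // learning rate
            double gamma = 0.9;           // discount factor

            // Each row is one observed transition: fromState, reward, toState (all made up).
            int[][] transitions = { {0, 1, 1}, {1, 0, 2}, {2, 5, 0}, {0, 1, 1}, {1, 0, 2} };

            for (int[] t : transitions) {
                int s = t[0], sNext = t[2];
                double r = t[1];
                // TD(0) rule: move V(s) toward the one-step bootstrapped target r + gamma * V(s').
                v[s] += alpha * (r + gamma * v[sNext] - v[s]);
            }
            System.out.println("V = " + java.util.Arrays.toString(v));
        }
    }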

The Highlights

  • Free access
  • Self-paced learning
  • Online course
  • 4-month programme
  • Offered by Georgia Tech, USA
  • Advanced-level course

Programme Offerings

  • Online Learning Platform
  • Practical exercises
  • 4-month training
  • Free programme access
  • Exhaustive curriculum
  • Video lectures
  • Self-paced learning
  • An offering of Georgia Tech
  • Industry expert instructors

Courses and Certificate Fees

Certificate Availability: No

Reinforcement Learning fee structure

Course name: Reinforcement Learning
Fee: Free


Eligibility Criteria

There are some prerequisites for joining the Reinforcement Learning training. You must be familiar with Java programming, must have completed a graduate-level machine learning course, and need some prior exposure to RL.

What you will learn

Machine Learning, Application of ML Algorithms

By the end of the Reinforcement Learning programme, you will have an understanding of:

  • Basic RL concepts
  • The theoretical perspective of Machine Learning (ML)
  • Algorithms and procedures to learn near-optimal decisions from experience
  • RL topics like Temporal Difference (TD), generalisation, convergence, Bellman equations, etc.

Admission Details

Step 1 – Reach the Reinforcement Learning course page by clicking here: https://www.udacity.com/course/reinforcement-learning--ud600?autoenroll=true.

Step 2 – Hit the ‘Start Free Course’ button to open a registration page.

Step 3 – Fill in some basic details and click on ‘Sign Up’ to create a new Udacity account. Alternatively, you can sign in by linking an existing Facebook or Google account.

Step 4 – That’s it. You will be enrolled in the Reinforcement Learning programme by Udacity after logging in.

Application Details

It’s easy to enrol in the Reinforcement Learning course; there is no application form to fill in. Just log on to Udacity’s web portal, go to the course page and create an account to get admission. You only need to enter your full name, email address and password while creating the new account.

The Syllabus

  • Let’s do the time warp again
  • Introduction
  • Decision making and Reinforcement Learning
  • The world – 1 
  • The world – 2
  • Markov Decision Process – 1
  • Markov Decision Process – 2
  • Markov Decision Process – 3
  • Markov Decision Process – 4
  • More about rewards – 1
  • More about rewards – 2
  • More about rewards – 3
  • A sequence of rewards – 1  
  • A sequence of rewards – 2
  • A sequence of rewards – 3
  • A sequence of rewards – 4
  • Assumptions
  • Policies – 1
  • Policies – 2
  • Finding policies – 1
  • Finding policies – 2
  • Finding policies – 3
  • Finding policies – 4
  • Back to the future
  • The Bellman Equations – 1
  • The Bellman Equations – 2
  • The Bellman Equations – 3
  • Bellman Equations relations
  • The third Bellman equation
  • What have we learned?

  • Mystery game – 1
  • Mystery game – 2
  • Behaviour structures – 1
  • Behaviour structures – 2
  • Evaluating a policy
  • Evaluating a learner
  • What have we learned?

  • Temporal difference learning
  • RL context – 1
  • RL context – 2
  • TD Lambda
  • Value computation example
  • Estimating from data
  • Computing estimates incrementally
  • Properties of learning rates
  • Selecting learning rates
  • TD(1) rule
  • TD(1) example – 1
  • TD(1) example – 2
  • TD(1) example – 3
  • Why TD(1) is “Wrong”
  • TD(0) rule
  • TD(Lambda) rule
  • K-step estimators
  • K-step estimators and TD(Lambda)
  • TD(Lambda) empirical performance
  • What have we learned?

  • Convergence: TD with control
  • Bellman equations
  • Bellman equations with actions
  • Bellman operator – 1
  • Bellman operator – 2
  • Contraction mappings
  • Contraction mapping quiz
  • Contraction properties
  • The Bellman operator contracts – 1
  • The Bellman operator contracts – 2
  • Max is a non-expansion
  • Proof that Max is a non-expansion – 1
  • Proof that Max is a non-expansion – 2
  • Convergence – 1
  • Convergence – 2
  • Convergence theorem explained – 1
  • Convergence theorem explained – 2
  • Generalised MDPs
  • Generalised MDPs – Solutions – 1
  • Generalised MDPs – Solutions – 2
  • Generalised MDPs – Solutions – 3
  • What have you learned?

  • More on value iteration – 1
  • More on value iteration – 2
  • More on value iteration – 3
  • Linear programming – 1
  • Linear programming – 2
  • Linear programming – 3
  • Policy iteration
  • Domination
  • Why does policy iteration work?
  • B_2 is monotonic
  • Another property in policy iteration – 1 
  • Policy iteration proof
  • Another property in policy iteration – 2 
  • What have we learned?

  • Changing the reward function
  • Multiplying by a scalar
  • Adding a scalar
  • Reward shaping
  • Shaping in RL
  • Potential-based shaping in RL
  • State-based bonuses
  • Potential-based shaping – 1
  • Potential-based shaping – 2
  • Q-learning with potentials – 1
  • Q-learning with potentials – 2
  • What have we learned?

  • K-armed bandits – 1
  • K-armed bandits – 2
  • Confidence-based exploration – 1
  • Confidence-based exploration – 2
  • Metrics for bandits – 1 
  • Metrics for bandits – 2
  • Metrics for bandits – 3
  • Metrics for bandits – 4
  • Find best implies few mistakes
  • Few mistakes imply do well – 1
  • Few mistakes imply do well – 2
  • Do well implies find the best
  • Putting it together
  • Hoeffding
  • Combining arm info – 1
  • Combining arm info – 2
  • Combining arm info – 3
  • How many samples? – 1
  • How many samples? – 2
  • Exploring deterministic MDPs – 1 
  • MDP optimisation criteria
  • Exploring deterministic MDPs – 2
  • Exploring deterministic MDPs – 3
  • Rmax analysis – 1
  • Rmax analysis – 2
  • Rmax analysis – 3
  • Lower bound
  • General stochastic MDPs
  • General Rmax
  • Simulation lemma – 1
  • Simulation lemma – 2
  • Explore-or-exploit lemma
  • What have we learned?

  • Example: Taxi
  • Generalisation idea
  • Basic update rule
  • Linear value function approximation
  • Calculus
  • Does it work? – 1
  • Does it work? – 2
  • Does it work? – 3
  • Baird’s counterexample – 1
  • Baird’s counterexample – 2
  • Bad update sequence – 1
  • Bad update sequence – 2
  • Bad update sequence – 3
  • Bad update sequence – 4
  • Averagers – 1
  • Averagers – 2
  • Averagers – 3
  • Connection to MDPs
  • What have we learned? – 1
  • What have we learned? – 2

  • POMDPs
  • POMDPs generalise MDPs
  • POMDP example – 1
  • POMDP example – 2
  • State estimation – 1
  • State estimation – 2
  • Value iteration in POMDPs – 1
  • Value iteration in POMDPs – 2
  • Piecewise-linear and convex – 1
  • Piecewise-linear and convex – 2
  • Piecewise-linear and convex – 3
  • Piecewise-linear and convex – 4
  • Algorithmic approach
  • Domination
  • RL for POMDPs – 1
  • RL for POMDPs – 2
  • Learning a POMDP
  • Learning memoryless policies – 1
  • Learning memoryless policies – 2
  • Learning memoryless policies – 3
  • Bayesian RL – 1 
  • Bayesian RL – 2
  • Bayesian RL – 3
  • Predictive state representation
  • PSR example – 1
  • PSR example – 2
  • PSR theorem
  • What have we learned? – 1
  • What have we learned? – 2

  • Generalising generalising
  • What makes RL hard?
  • Temporal Abstraction – 1
  • Temporal Abstraction – 2
  • Temporal Abstraction – 3
  • Temporal abstraction options – 1
  • Temporal abstraction options – 2
  • Temporal abstraction option function – 1
  • Temporal abstraction option function – 2
  • Temporal abstraction option function – 3
  • Temporal abstraction option function – 4
  • Temporal abstraction option function – 5
  • Pac-man problems – 1
  • Pac-man problems – 2
  • Pac-man problems – 3
  • Pac-man problems – 4
  • How it comes together – 1
  • How it comes together – 2
  • Goal abstraction – 1
  • Goal abstraction – 2
  • Goal abstraction – 3
  • Goal abstraction – 4
  • Goal abstraction – 5
  • Monte Carlo tree search – 1
  • Monte Carlo tree search – 2
  • Monte Carlo tree search – 3
  • Monte Carlo tree search – 4
  • Monte Carlo tree search – 5
  • Monte Carlo tree properties – 1
  • Monte Carlo tree properties – 2
  • What have we learned? – 1
  • What have we learned? – 2

  • Scooby Dooby Doo!
  • Game theory
  • What is game theory?
  • A simple game – 1
  • A simple game – 2
  • A simple game – 3
  • Minimax
  • Fundamental result
  • Game tree – 1
  • Game tree – 2
  • Von Neumann
  • Mini poker 
  • Mini poker tree
  • Mixed strategy
  • Lines
  • Centre game
  • Snitch – 1
  • Snitch – 2
  • Snitch – 3
  • A beautiful equilibrium – 1
  • A beautiful equilibrium – 2
  • A beautiful equilibrium – 3
  • The two-step
  • 2Step2Furious
  • What have we learned?

  • The sequencing
  • Iterated prisoner’s dilemma
  • Uncertain end
  • Tit-for-tat – 1
  • Tit-for-tat – 2
  • Facing TFT
  • Finite-state strategy
  • The best response in IPD
  • Folk theorem
  • Repeated games – 1
  • Repeated games – 2
  • Minmax profile
  • Security level profile
  • Folksy theorem
  • Grim trigger
  • Implausible threats
  • TFT versus TFT
  • Pavlov
  • Pavlov vs Pavlov
  • Pavlov is subgame perfect
  • Computational folk theorem
  • Stochastic games and multiagent RL
  • Stochastic games
  • Models and stochastic games
  • Zero-sum stochastic games – 1
  • Zero-sum stochastic games – 2
  • General-sum games
  • Lots of ideas
  • What have we learned?

  • Solution concepts
  • General Tso chicken – 1
  • General Tso chicken – 2
  • General Tso chicken – 3
  • Correlated GTC – 1
  • Correlated GTC – 2
  • Correlated GTC – 3
  • Correlated facts
  • Solution concepts revisited
  • Coco values – 1
  • Coco values – 2
  • Coco definition
  • Coco example
  • Coco properties
  • Mechanism design
  • Peer teaching – 1
  • Peer teaching – 2 
  • Peer teaching – 3
  • Peer teaching – 4
  • Peer teaching – 5
  • King Solomon – 1
  • King Solomon – 2
  • King Solomon – 3
  • King Solomon – 4
  • King Solomon – 5
  • King Solomon – 6
  • King Solomon – 7
  • What have we learned? – 1
  • What have we learned? – 2

  • Coordination and communicating
  • DEC-POMDP
  • DEC-POMDP properties
  • DEC-POMDP example
  • Communicating and coaching
  • Inverse Reinforcement Learning – 1
  • Inverse Reinforcement Learning – 2 
  • Output of MLIRL
  • What have we learned (or have we?)
  • Curly, bean me up
  • What we will have learned
  • Not reward shaping
  • Policy shaping – 1
  • Policy shaping – 2
  • Policy shaping – 3
  • Policy shaping – 4
  • Policy shaping – 5
  • Policy shaping – 6
  • Policy shaping – 7
  • Multiple sources – 1
  • Multiple sources – 2
  • Multiple sources – 3
  • Drama management – 1
  • Drama management – 2
  • Trajectories as MDPs
  • Trajectories as TTD MDPs – 1
  • Trajectories as TTD MDPs – 2 
  • What have we learned?

  • Outroduction – part 1
  • Outroduction – part 2

Instructors

Charles Isbell, Michael Littman and Chris Pryby teach this programme.

Georgia Tech Frequently Asked Questions (FAQs)

1: Which institute offers the Reinforcement Learning programme?

The Georgia Institute of Technology, USA, offers this online course.

2: Do I have to be familiar with programming?

Yes, you need experience with Java programming to join the Reinforcement Learning course.

3: What is the duration of the Reinforcement Learning programme?

The programme will take about four months to complete.

4: Does the Reinforcement Learning course require any registration fee?

Joining the course requires no registration or course fee.

5: Who are the instructors for the Reinforcement Learning course?

Chris Pryby, Michael Littman and Charles Isbell are the expert instructors for this online programme.
