- [IMPORTANT] English captions available for sections 1-4
- Welcome
- Course Structure
- Environment setup [Important]
- Setup - Mac
Quick facts
| Particulars | Details |
| --- | --- |
| Medium of instruction | English |
| Mode of learning | Self study |
| Mode of delivery | Video and text based |
Course and certificate fees
| Particulars | Details |
| --- | --- |
| Fees | ₹449 (discounted from ₹799) |
| Certificate availability | Yes |
| Certificate providing authority | Udemy |
The syllabus
Welcome module
The Markov decision process (MDP)
- The Markov decision process (MDP)
- Types of Markov decision process
- Trajectory vs episode
- Reward vs Return
- Discount factor
- Policy
- State values v(s) and action values q(s,a)
- Bellman equations
- Solving a Markov decision process
- Setup - MDP in code
- MDP in code - Part 1
- MDP in code - Part 2
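The reward-versus-return and discount-factor lectures reduce to one recursion: the return is the reward plus the discounted return from the next step. A minimal Python sketch (the reward values are illustrative, not from the course):

```python
# Discounted return for a finished episode's reward sequence:
# G_t = r_{t+1} + gamma * r_{t+2} + gamma^2 * r_{t+3} + ...
def discounted_return(rewards, gamma):
    g = 0.0
    for r in reversed(rewards):  # accumulate backwards from the episode's end
        g = r + gamma * g
    return g

print(discounted_return([1, 1, 1], gamma=0.5))  # 1 + 0.5 + 0.25 = 1.75
```

Working backwards makes the computation linear in the episode length, one multiply-add per step.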
Dynamic Programming
- Introduction to Dynamic Programming
- Value iteration
- Setup - Value iteration
- Coding - Value iteration 1
- Coding - Value iteration 2
- Coding - Value iteration 3
- Coding - Value iteration 4
- Coding - Value iteration 5
- Policy iteration
- Setup - Policy iteration
- Coding - Policy iteration 1
- Policy evaluation
- Coding - Policy iteration 2
- Policy improvement
- Coding - Policy iteration 3
- Coding - Policy iteration 4
- Policy iteration in practice
- Generalized Policy Iteration (GPI)
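The value-iteration lectures can be previewed on a toy problem: repeatedly apply the Bellman optimality backup until the values stop changing. A Python sketch, assuming a made-up four-state deterministic chain rather than the course's own environment:

```python
# Value iteration on a toy deterministic chain MDP: states 0..3, state 3 terminal.
# Moving into the terminal state yields reward 1; all other rewards are 0.
gamma = 0.9
n_states = 4
terminal = 3

def step(s, a):  # a in {-1, +1}: move left or right, clipped to the chain
    s2 = min(max(s + a, 0), n_states - 1)
    r = 1.0 if s2 == terminal and s != terminal else 0.0
    return s2, r

V = [0.0] * n_states
for _ in range(100):                 # sweep until (effectively) converged
    for s in range(n_states - 1):    # terminal state keeps V = 0
        V[s] = max(step(s, a)[1] + gamma * V[step(s, a)[0]] for a in (-1, +1))

print([round(v, 2) for v in V])  # [0.81, 0.9, 1.0, 0.0]
```

This is the in-place variant, where freshly updated values are reused within the same sweep; value iteration converges to the same fixed point either way.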
Monte Carlo methods
- Monte Carlo methods
- Solving control tasks with Monte Carlo methods
- On-policy Monte Carlo control
- Setup - On-policy Monte Carlo control
- Coding - On-policy Monte Carlo control 1
- Coding - On-policy Monte Carlo control 2
- Coding - On-policy Monte Carlo control 3
- Setup - Constant alpha Monte Carlo
- Coding - Constant alpha Monte Carlo
- Off-policy Monte Carlo control
- Setup - Off-policy Monte Carlo control
- Coding - Off-policy Monte Carlo 1
- Coding - Off-policy Monte Carlo 2
- Coding - Off-policy Monte Carlo 3
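The constant-alpha Monte Carlo lectures rest on one update rule: after each episode, move the estimate a fixed fraction alpha toward the observed return. A toy Python sketch (the returns are made-up numbers):

```python
# Constant-alpha Monte Carlo update: V(s) <- V(s) + alpha * (G - V(s)).
def mc_update(v, g, alpha):
    return v + alpha * (g - v)

v = 0.0
for g in [10.0, 10.0, 10.0, 10.0]:  # repeated observed returns from some state
    v = mc_update(v, g, alpha=0.5)
print(v)  # 9.375 — geometric approach toward the observed return of 10
```

Unlike averaging over all visits, a constant alpha keeps weighting recent returns, which is what makes the method track non-stationary problems.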
Temporal difference methods
- Temporal difference methods
- Solving control tasks with temporal difference methods
- Monte Carlo vs temporal difference methods
- SARSA
- Setup - SARSA
- Coding - SARSA 1
- Coding - SARSA 2
- Q-Learning
- Setup - Q-Learning
- Coding - Q-Learning 1
- Coding - Q-Learning 2
- Advantages of temporal difference methods
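The Q-Learning lectures center on a single backup applied after every transition. A Python sketch of one tabular update on a hypothetical transition (state and action names are illustrative):

```python
# One tabular Q-Learning backup:
# Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
from collections import defaultdict

Q = defaultdict(float)   # unseen (state, action) pairs default to 0
alpha, gamma = 0.1, 0.9

def q_learning_update(s, a, r, s2, actions):
    target = r + gamma * max(Q[(s2, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])

q_learning_update(s=0, a="right", r=1.0, s2=1, actions=["left", "right"])
print(Q[(0, "right")])  # 0.1 * (1.0 + 0.9 * 0 - 0) = 0.1
```

SARSA differs only in the target: it bootstraps from Q(s', a') for the action actually taken next, rather than the max over actions, which is what makes it on-policy.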
N-step bootstrapping
- N-step temporal difference methods
- Where do n-step methods fit?
- Effect of changing n
- N-step SARSA
- N-step SARSA in action
- Setup - n-step SARSA
- Coding - n-step SARSA
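The "effect of changing n" comes down to how the n-step return is built: n real rewards, then a bootstrapped value estimate. A Python sketch with toy numbers:

```python
# n-step return: G = r_1 + gamma*r_2 + ... + gamma^(n-1)*r_n + gamma^n * Q(s_n, a_n)
def n_step_return(rewards, bootstrap_q, gamma, n):
    g = sum(gamma**i * r for i, r in enumerate(rewards[:n]))
    return g + gamma**n * bootstrap_q

print(n_step_return([1, 0, 2], bootstrap_q=5.0, gamma=0.5, n=3))
# 1 + 0 + 0.25*2 + 0.125*5 = 2.125
```

With n = 1 this is exactly the one-step SARSA target; as n grows it approaches the full Monte Carlo return, trading bias for variance.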
Continuous state spaces
- Setup - Classic control tasks
- Coding - Classic control tasks
- Working with continuous state spaces
- State aggregation
- Setup - Continuous state spaces
- Coding - State aggregation 1
- Coding - State aggregation 2
- Coding - State aggregation 3
- Tile coding
- Coding - Tile coding 1
- Coding - Tile coding 2
- Coding - Tile coding 3
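State aggregation, the simplest scheme in this module, just buckets a continuous state into one of a fixed number of bins. A Python sketch; the bounds below resemble a classic-control position range but are example numbers, not taken from the course:

```python
# State aggregation: map a continuous value to a discrete bucket index.
def aggregate(x, low, high, n_bins):
    x = min(max(x, low), high)                 # clip to the valid range
    i = int((x - low) / (high - low) * n_bins) # equal-width buckets over [low, high)
    return min(i, n_bins - 1)                  # put x == high in the last bucket

print(aggregate(-1.2, low=-1.2, high=0.6, n_bins=10))  # 0  (left edge)
print(aggregate(0.6, low=-1.2, high=0.6, n_bins=10))   # 9  (right edge)
```

Tile coding layers several such grids, each offset slightly, so that nearby states share some but not all active tiles — a coarse form of generalization.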
Brief introduction to neural networks
- Function approximators
- Artificial Neural Networks
- Artificial Neurons
- How to represent a Neural Network
- Stochastic Gradient Descent
- Neural Network optimization
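Stochastic gradient descent, the optimizer behind everything in the deep modules, can be shown without any neural-network library. A minimal Python sketch fitting a one-parameter model y = w * x to made-up data:

```python
# SGD on a one-parameter model y = w * x, minimizing squared error
# with one gradient step per (x, y) sample.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # generated by w = 2
w, lr = 0.0, 0.05

for _ in range(200):                  # epochs
    for x, y in data:                 # one SGD step per sample
        grad = 2 * (w * x - y) * x    # d/dw of (w*x - y)^2
        w -= lr * grad

print(round(w, 3))  # converges to about 2.0
```

A neural network is the same loop with many parameters and backpropagation supplying each gradient.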
Deep SARSA
- Deep SARSA
- Neural Network optimization (Deep Q-Network)
- Experience Replay
- Target Network
- Coding - Deep SARSA 1
- Coding - Deep SARSA 2
- Coding - Deep SARSA 3
- Coding - Deep SARSA 4
- Coding - Deep SARSA 5
- Coding - Deep SARSA 6
- Coding - Deep SARSA 7
- Coding - Deep SARSA 8
- Coding - Deep SARSA 9
- Coding - Deep SARSA 10
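Experience replay, one of the two stabilizers introduced in this module, is just a bounded store of past transitions sampled at random for each training batch. A minimal Python sketch (the transition layout is an assumption):

```python
# Minimal experience-replay buffer: store transitions, sample random minibatches.
import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity):
        self.buf = deque(maxlen=capacity)  # oldest transitions fall off the front

    def push(self, transition):            # e.g. (s, a, r, s_next, done)
        self.buf.append(transition)

    def sample(self, batch_size):
        return random.sample(self.buf, batch_size)

buffer = ReplayBuffer(capacity=3)
for t in range(5):
    buffer.push((t, 0, 0.0, t + 1, False))
print(len(buffer.buf))     # 3 — capacity caps storage
print(buffer.sample(2))    # a random minibatch of stored transitions
```

Random sampling breaks the temporal correlation between consecutive transitions, which is what makes minibatch updates to the network behave more like i.i.d. supervised learning. The target network, the other stabilizer, is simply a delayed copy of the value network used to compute targets.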
Deep Q-Learning
- Deep Q-Learning
- Setup - Deep Q-Learning
- Coding - Deep Q-Learning 1
- Coding - Deep Q-Learning 2
- Coding - Deep Q-Learning 3
REINFORCE
- Policy gradient methods
- Representing policies using neural networks
- Policy performance
- The policy gradient theorem
- REINFORCE
- Parallel learning
- Entropy regularization
- REINFORCE 2
- Coding - REINFORCE 1
- Coding - REINFORCE 2
- Coding - REINFORCE 3
- Coding - REINFORCE 4
- Coding - REINFORCE 5
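REINFORCE adjusts policy parameters in the direction of return times the log-probability gradient. The idea fits in a few lines on a hypothetical two-armed bandit with a softmax policy, with no neural network needed:

```python
# REINFORCE on a toy two-armed bandit with a softmax policy over preferences h.
# Gradient of log pi(a) w.r.t. h[b] is (1[a == b] - pi[b]).
import math, random
random.seed(0)

h = [0.0, 0.0]        # action preferences (the "policy parameters")
rewards = [0.0, 1.0]  # arm 1 is the better arm
lr = 0.1

def softmax(h):
    z = [math.exp(x) for x in h]
    s = sum(z)
    return [x / s for x in z]

for _ in range(500):
    pi = softmax(h)
    a = random.choices([0, 1], weights=pi)[0]  # sample an action from the policy
    r = rewards[a]
    for b in range(2):  # h += lr * r * grad log pi(a)
        h[b] += lr * r * ((1.0 if b == a else 0.0) - pi[b])

print(softmax(h)[1])  # the policy now strongly prefers arm 1
```

With a neural-network policy, the same update is implemented by maximizing r * log pi(a) with an optimizer; entropy regularization adds a bonus term that keeps the distribution from collapsing too early.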
Advantage Actor-Critic (A2C)
- A2C
- Setup - A2C
- Coding - A2C 1
- Coding - A2C 2
- Coding - A2C 3
- Coding - A2C 4
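A2C replaces the raw return in REINFORCE with an advantage estimate from a learned critic. The core quantity is a one-liner; a Python sketch with illustrative numbers:

```python
# Advantage estimate used by A2C: A(s,a) = r + gamma * V(s') - V(s).
# The critic supplies V; the actor is updated in the direction of A * grad log pi.
def advantage(r, v_s, v_next, gamma, done):
    target = r + (0.0 if done else gamma * v_next)  # no bootstrap past a terminal state
    return target - v_s

print(round(advantage(r=1.0, v_s=0.5, v_next=2.0, gamma=0.9, done=False), 4))
# 1 + 0.9*2 - 0.5 = 2.3
```

Subtracting the baseline V(s) leaves the gradient's expectation unchanged but cuts its variance, which is the practical reason actor-critic methods train more stably than plain REINFORCE.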
Outro
- Looking back
- Next steps