The goal of this independent study is to gain an introduction to the topic of Reinforcement Learning.
As such the majority of the semester will be following the textbook to gain an introduction to the topic, and the last part applying it to some problems.
Textbook
The majority of the content of this independent study will come from the textbook. This is meant to lessen the burden on the both us of as I already experimented with curating my own content.
The textbook also includes examples throughout the text to immediately apply what’s learned.
Richard S. Sutton and Andrew G. Barto, “Reinforcement Learning: An Introduction” http://incompleteideas.net/book/bookdraft2017nov5.pdf
Discussions and Notes
Discussions and notes will be kept track of and published on my tilda space as time and energy permits. This is for easy reference and since it’s nice to write down what you learn.
Topics to be Discussed
###The Reinforcement Learning Problem (3 Sessions)
In this section we will get ourselves familiar with the topics that are commonly discussed in Reinforcement learning problems.
In this section we will learn the different vocab terms such as:
- Evaluative Feedback
- Non-Associative Learning
- Rewards/Returns
- Value Functions
- Optimality
- Exploration/Exploitation
- Model
- Policy
- Value Function
- Multi-armed Bandit Problem
Markov Decision Processes (4 Sessions)
This is a type of reinforcement learning problem that is commonly studied and well documented. This helps form an environment for which the agent can operate within. Possible subtopics include:
- Finite Markov Decision Processes
- Goals and Rewards
- Returns and Episodes
- Optimality and Approximation
Dynamic Programming (3 Sessions)
Dynamic Programming refers to a collection of algorithms that can be used to compute optimal policies given an environment. Subtopics that we are going over is:
- Policy Evaluation
- Policy Improvement
- Policy Iteration
- Value Iteration
- Asynchronous DP
- Generalized policy Iteration
- Bellman Expectation Equations
Monte Carlo Methods (3 Sessions)
Now we move onto not having complete knowledge of the environment. This will go into estimating value functions and discovering optimal policies. Possible subtopics include:
- Monte Carlo Prediction
- Monte Carlo Control
- Importance Sampling
- Incremental Implementation
- Off-Policy Monte Carlo Control
Temporal-Difference Learning (4-5 Sessions)
Temporal-Difference learning is a combination of Monte Carlo ideas and Dynamic Programming. This can lead to methods learning directly from raw experience without knowledge of an environment. Subtopics will include:
- TD Prediction
- Sarsa: On-Policy TD Control
- Q-Learning: Off-Policy TD Control
- Function Approximation
- Eligibility Traces