~/Reinforcement Learning Syllabus

Brandon Rozek

PhD Student @ RPI studying Automated Reasoning in AI and Linux Enthusiast.

The goal of this independent study is to gain an introduction to the topic of Reinforcement Learning.

As such, most of the semester will be spent working through the textbook, and the last part will be spent applying what we learn to some problems.


The majority of the content of this independent study will come from the textbook. This is meant to lessen the burden on both of us, as I have already experimented with curating my own content.

The textbook also includes examples throughout the text to immediately apply what’s learned.

Richard S. Sutton and Andrew G. Barto, “Reinforcement Learning: An Introduction” http://incompleteideas.net/book/bookdraft2017nov5.pdf

Discussions and Notes

Discussions and notes will be tracked and published on my tilde space as time and energy permit. This makes for easy reference, and it’s nice to write down what you learn.

Topics to be Discussed

The Reinforcement Learning Problem (3 Sessions)

In this section we will familiarize ourselves with the topics that are commonly discussed in reinforcement learning problems.

We will also learn the different vocabulary terms, such as:

Markov Decision Processes (4 Sessions)

This is a type of reinforcement learning problem that is commonly studied and well documented. It formalizes the environment in which the agent operates. Possible subtopics include:

Dynamic Programming (3 Sessions)

Dynamic Programming refers to a collection of algorithms that can be used to compute optimal policies given a complete model of the environment. Subtopics that we will go over include:
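To give a concrete flavor of this section, here is a minimal sketch of value iteration, one standard dynamic programming algorithm covered in the textbook. The two-state MDP, its state and action names, and its rewards are hypothetical and chosen only for illustration; the function names `value_iteration` and `greedy_policy` are my own.

```python
# Hypothetical toy MDP, made up for illustration only.
# TRANSITIONS[state][action] = list of (probability, next_state, reward)
TRANSITIONS = {
    "A": {"stay": [(1.0, "A", 0.0)], "go": [(1.0, "B", 1.0)]},
    "B": {"stay": [(1.0, "B", 2.0)], "go": [(1.0, "A", 0.0)]},
}


def action_value(outcomes, values, gamma):
    """Expected one-step reward plus discounted value of the next state."""
    return sum(p * (r + gamma * values[s2]) for p, s2, r in outcomes)


def value_iteration(transitions, gamma=0.9, tol=1e-8):
    """Sweep over all states until the largest value change is below tol."""
    values = {s: 0.0 for s in transitions}
    while True:
        delta = 0.0
        for s, actions in transitions.items():
            # Bellman optimality backup: take the best action's value.
            best = max(action_value(o, values, gamma) for o in actions.values())
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            return values


def greedy_policy(transitions, values, gamma=0.9):
    """Pick, in each state, the action with the highest backed-up value."""
    return {
        s: max(actions, key=lambda a: action_value(actions[a], values, gamma))
        for s, actions in transitions.items()
    }


values = value_iteration(TRANSITIONS)
print(greedy_policy(TRANSITIONS, values))  # {'A': 'go', 'B': 'stay'}
```

With these made-up numbers, staying in B forever is worth 2 / (1 - 0.9) = 20, so the best move from A is to go to B, for a value of 1 + 0.9 * 20 = 19. Note that the algorithm reads the transition table directly, which is exactly the "complete model" that dynamic programming assumes.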

Monte Carlo Methods (3 Sessions)

Now we move on to settings where we do not have complete knowledge of the environment. This section covers estimating value functions and discovering optimal policies from sampled experience. Possible subtopics include:
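To give a concrete flavor of this section, here is a minimal sketch of first-visit Monte Carlo prediction, which estimates a state's value as the average return observed after the first time that state appears in each episode. The episode format, the state names, and the rewards are hypothetical, and `first_visit_mc` is my own name for the function.

```python
from collections import defaultdict


def first_visit_mc(episodes, gamma=1.0):
    """Estimate state values by averaging first-visit returns.

    Each episode is a list of (state, reward) pairs, where reward is the
    reward received after leaving that state (an assumed format).
    """
    returns_sum = defaultdict(float)
    returns_count = defaultdict(int)
    for episode in episodes:
        # Work backwards to get the discounted return from every time step.
        g = 0.0
        returns = [0.0] * len(episode)
        for t in range(len(episode) - 1, -1, -1):
            _, reward = episode[t]
            g = reward + gamma * g
            returns[t] = g
        # Record the return only for the first visit to each state.
        seen = set()
        for t, (state, _) in enumerate(episode):
            if state not in seen:
                seen.add(state)
                returns_sum[state] += returns[t]
                returns_count[state] += 1
    return {s: returns_sum[s] / returns_count[s] for s in returns_sum}


# Two hypothetical sampled episodes.
EPISODES = [
    [("A", 0.0), ("B", 1.0)],
    [("A", 1.0), ("A", 0.0), ("B", 3.0)],
]
print(first_visit_mc(EPISODES))  # {'A': 2.5, 'B': 2.0}
```

Unlike the dynamic programming setting, nothing here looks at transition probabilities: the estimate comes only from sampled episodes.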

Temporal-Difference Learning (4-5 Sessions)

Temporal-Difference learning combines ideas from Monte Carlo methods and Dynamic Programming. This leads to methods that learn directly from raw experience without a model of the environment. Subtopics will include:
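To give a concrete flavor of this section, here is a minimal sketch of TD(0) prediction, the simplest temporal-difference method in the textbook. After every step it nudges the current state's estimate toward the observed reward plus the discounted estimate of the next state. The transition format, state names, rewards, and step size are hypothetical, and `td0_prediction` is my own name for the function.

```python
def td0_prediction(episodes, alpha=0.5, gamma=1.0):
    """Estimate state values with the TD(0) update rule.

    Each episode is a list of (state, reward, next_state) transitions,
    with next_state set to None when the episode terminates (an assumed
    format). Unseen states start with a value of 0.
    """
    values = {}
    for episode in episodes:
        for state, reward, next_state in episode:
            # A terminal state has value 0 by definition.
            v_next = 0.0 if next_state is None else values.get(next_state, 0.0)
            v = values.get(state, 0.0)
            # Move the estimate a fraction alpha toward the TD target.
            values[state] = v + alpha * (reward + gamma * v_next - v)
    return values


# Two hypothetical sampled episodes.
EPISODES = [
    [("A", 1.0, None)],
    [("B", 0.0, "A"), ("A", 1.0, None)],
]
print(td0_prediction(EPISODES))  # {'A': 0.75, 'B': 0.25}
```

The update happens after each step rather than at the end of the episode, which is the Dynamic Programming idea of bootstrapping from existing estimates, while the data comes from sampled experience as in Monte Carlo methods.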