A guided tour of the website

Sequential decision problems are a fantastically rich area, spanning virtually any problem setting where you are “trying to do better.” The SDA website was created to help people navigate this rich area, starting from basic questions such as “What is a decision” all the way through creating computer models to automate these processes.

This webpage represents a guided tour of the SDA website, helping newcomers to progress from basic starting material up through designing policies suited to specific situations. As you work down through the page, there are numerous links to various pages on the website. A useful strategy is to start by just reading this page, without clicking on the links. However, you will see that we are leaving a lot of information on the webpages within the website. If you click on the link to visit the page, be sure to return to this page to continue the tour.

We are going to conduct our tour using a series of passes:

The first pass – Decisions and framing the problem

We start our first pass by visiting two topics under the menu Sequential decision problems. We begin with a brief overview of sequential decision problems, where we introduce some basic notation that we will return to from time to time:

Overview of sequential decision problems

It helps to have a problem setting in mind (more than one is even better). It is best to pick an application you are familiar with, ideally one that you have chosen or are experiencing directly. The next webpage should help with identifying some examples of sequential decision problems:

Application settings

These applications should help illustrate the diversity of sequential decision problems. Despite the breadth of applications in this list, they are only scratching the surface.

We now investigate Modeling where we start with a brief introduction to modeling sequential decision problems:

Modeling overview

The next stop is a big one. It is called Decisions, decisions, and it starts by highlighting the importance of decisions, and the surprising question of defining a decision, which appears to have been overlooked since the time of the Greek philosophers:

What is a decision

Next visit the second section which lists different types of decision settings. This section lists nine types of decision settings. The point of this list is to help with the process of identifying decisions:

Types of decision settings

We are now going to make our way to the important task of framing decision problems. Our approach to framing lays the foundation for thinking about any sequential decision problem. It uses no math, but it asks the questions that would be needed if we choose to use a math model, which is a prerequisite to using the computer. This step consists of identifying performance metrics, types of decisions, and sources of uncertainty:

Framing decision problems

Framing represents a major plateau, since it is an important (and overlooked) skill set even by itself, despite requiring no mathematics. There are countless models that suffer from improper framing, a byproduct of forcing applications into mathematical methods that are popular with academics.

Framing is an important skill for people working in an area of application. Each of the dimensions of framing can be immediately translated to a mathematical model (as we do in the second pass), but only if a model is needed. I claim that while framing is a critical first step to creating a computer model, it also helps people understand a problem even if they never develop a model.

The second pass – The universal modeling framework

At this point you have learned from the first pass:

In the second pass, we are going to use the results of our problem framing to start building a formal model (yes, this means some math) of a sequential decision problem using what we call the universal modeling framework. Here we learn that any sequential decision problem can be modeled mathematically using five elements:

The universal modeling framework

What might be the most important characteristic of the UMF is what it does not include, which is any method for making a decision, known as a policy. The UMF introduces a fancy function \(X^{\pi}(S_t \mid \theta)\) where \(\pi = (f,\theta)\) carries the information that determines the structure of the policy (in “\(f\)”) while “\(\theta\)” carries any tunable parameters. We call this approach:

Model first, then solve

which means we model the problem and then design the policy (which “solves” the model).

Before we get to the task of designing policies, there are two important topics we need to cover on the modeling side:

  1. State variables – This is a term that has been widely used in the modeling of sequential decision problems in fields with names like “optimal control” and “Markov decision processes (MDP),” but without providing a proper definition. “State variable” is simply another name for information, and we describe precisely what information is included. The section on state variables is at:

    State variables

  2. Modeling uncertainty – Decisions are one form of information (the “information we control”) that changes the state variable. The second is the exogenous information that comes from an external source which means we do not know (exactly) what will be coming.

    Modeling uncertainty

    As with framing, modeling uncertainty starts in English by describing the most important sources of uncertainty. We guide this process by starting with 12 ways that uncertainty can impact a model. Mathematical models of uncertainty can be quite sophisticated (if they are needed), but the first step is identifying the sources in English.

At this point, we have laid the foundation to evaluate a policy by simulating it, whether this is done in a computer simulator, or by watching how it works in the real world.

The third pass – Designing policies

Now that we understand how to evaluate a policy, we need to design one. The research literature is full of methods to make decisions over time as new information is arriving. The problem is that the literature is heavily biased toward sophisticated techniques (such as those based on Bellman’s equation) that are rarely used in practice.

It is important to always keep in mind: Sequential decision problems are everywhere! We are constantly making decisions, and these are almost always decisions being made over time, and we have been making decisions over the entire history of humanity using some method.

We are now going to take a tour of the four classes of policies (more accurately, meta-classes) which include any method for making decisions, including whatever you are already using to make decisions now. In fact, the case can be made that people already use all four classes, albeit in an ad hoc way, without any formal training. What is hard is replicating these policies on a computer.

The third pass will consist entirely of a tour of the menu item Policies, and we are going to focus on the first entry Overview which provides its own tour through the four classes of policies. So, start by clicking on

Policies

This section consists of six elements that provide an overview of the different classes of methods for making decisions.

People who are interested in solving real problems always prefer the simplest policies, as they should. But keep in mind:

The price of simplicity is tunable parameters… and tuning is hard!

Once you have absorbed the material on the initial Policies page, it helps to choose one (or a few) applications and then start the process of learning more about each of the four classes by clicking on the menu items for each of the four classes under Policies. I recommend touring the policies in the following order:

  1. Policy function approximations (PFAs) – These are the simplest policies, and the only class which does not involve an imbedded optimization problem.
  2. Cost function approximations (CFAs) – CFAs involve an imbedded optimization, which is typically a simple sort where different choices are evaluated according to some metric. However, a CFA can be a deterministic linear, nonlinear or integer program, as might be used to assign machines to tasks.
  3. Deterministic direct lookahead approximations (Deterministic DLAs) – Deterministic DLAs are widely used to plan into the future, where we use point estimates of any uncertain quantity (Google Maps is an easy example for planning a path to the destination).

These three policies are likely to handle the vast majority of decision problems. However, if making a decision now requires planning for uncertainty in the future, keep reading:

  1. Stochastic direct lookahead approximations (Stochastic DLAs) – These are used when we really have to acknowledge the presence of uncertainty as we plan into the future. People who are familiar with decision trees will quickly recognize this class of policy, but decision trees represent just one of many approaches that might be used. The key is to remember that we are always using some approximation of our underlying stochastic problem.

And then there is the class of policies that is based on using Bellman’s equation:

  1. Policies based on value function approximations (VFAs) – This is where we approximate the downstream value of a decision using a device known as a value function approximation. The academic literature is strongly biased toward the use of Bellman’s equation, but as I like to say: “Bellman’s equation is a powerful strategy that works for a very small number of problems.” Note that I cover PFAs, CFAs and DLAs (deterministic and stochastic) using one chapter each in Reinforcement Learning and Stochastic Optimization, but I need five chapters to cover VFAs (Chapters 14 through 18).

A major motivation for using Bellman’s equation is that it dramatically simplifies making a decision now, since it avoids needing to explicitly plan into the future using a direct lookahead model. But estimating the downstream value of being in a state is hard.

The fourth pass – The “Ask Professor Powell” chatbot

At this point, you may start to have questions that are not being properly addressed by the information on the webpage. An easier path than purchasing the books and trying to find your answers there (which is unlikely), I recommend the “Ask Professor Powell” chatbot at

Ask Professor Powell

You may pose questions (using any of the major languages) that are related to the material in the webpage. The chatbot is trained on all of my books (written 2022 onward), over 1000 pages of LinkedIn posts on sequential decision problems, and all the material in this website. This material reflects over 40 years of research and the experience of applying these methods to countless applications (see, for example, the problems shown at Motivating applications).

It is not trained on the broader literature since the research literature is too biased toward techniques (such as those based on Bellman’s equation) that are favored by academics but rarely used in practice. As of this writing, I have seen it give responses that reflect a good understanding of the principles in the thousands of pages of training material.

While most questions are fairly short, I have seen users pose fairly complex questions that have produced multipage responses. At the bottom (left) of each response is a “Copy” button that copies your question and the response in a way that can be pasted into a nicely formatted Word document, or a text file.

You may ask questions, and then ask follow-up questions, and the chatbot will be able to continue a thread.

The chatbot can write out mathematics and can even write code. However, it is very unlikely that it would be able to solve a real problem given the subtleties of modeling. Please do not use any of the outputs of the chatbot in a production setting without understanding the concepts and verifying the recommendations.

Screenshot of the Ask Professor Powell chatbot page, showing suggested starter questions and a conversation in which the assistant frames an inventory replenishment problem using the five-element model

The fifth pass – Stochastic search

Stochastic search is easily the most common form of sequential decision problem. It arises in two forms:

  1. Derivative-based stochastic search – We may have a set of continuous parameters that affect the performance of our policy, or any other continuous parameter (diameter of a wafer, price of a product, dosage of a drug, temperature of a manufacturing process). At the moment I do not have a webpage on this topic, but you can download Chapter 5, Derivative-based stochastic search, in Reinforcement Learning and Stochastic Optimization. Pay special attention to the algorithm called SPSA (see section 5.4.4) which is a powerful strategy for doing derivative-based stochastic search when the parameters form a vector, and where you do not have access to an explicit formula for the derivatives (gradient).
  2. Derivative-free stochastic search – These problems arise when there is a set of discrete choices and we want to make the choice that is “best” according to some metric, but we are not sure about the performance of the choices. The problem is illustrated below. I have covered this problem under the label of “optimal learning” but it has been studied under names such as “multiarmed bandit problem,” “Bayesian optimization,” “ranking and selection,” “design of experiments” or “intelligent trial-and-error.”

An illustration of derivative-free stochastic search: a list of candidate choices (type of drug, supplier, trading policy, product design, and others) beside a bar chart of seven choices A–G, each bar showing an estimated goal/metric value with a confidence interval

Derivative-free stochastic search is a sequential decision problem where the state variable consists purely of the beliefs about the performance of each choice. Chapter 7 of Reinforcement Learning and Stochastic Optimization deals with this topic in depth, illustrating in the process all four classes of policies, although only two are widely used. We describe both on the SDA website.

The SDA website covers this topic under the heading of optimal learning which can be accessed at:

Optimal Learning

The page has a number of headings. Below is a brief summary of each, and you can click on any of the headings and go to the corresponding webpage.