Action spaces, particularly in combinatorial optimization problems, can grow unwieldy. This article discusses five strategies to handle them.
Handling large action spaces remains a fairly open problem in Reinforcement Learning. Researchers have made great strides in handling large state spaces, with convolutional networks and transformers being recent high-profile examples. However, there are three so-called curses of dimensionality: state, outcome, and action. As of yet, the action curse remains rather understudied.
Still, there is a growing body of methods that attempt to handle large action spaces. This article presents five ways to handle them at scale, focusing in particular on the high-dimensional discrete action spaces often encountered in combinatorial optimization problems.
A quick refresher on the three curses of dimensionality is in order. Assuming we express the problem at hand as a system of Bellman equations, note that there are three sets to evaluate, in practice in the form of nested loops, each of which may be prohibitively large:
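The three nested loops can be made explicit in the Bellman equations themselves. The notation below is a standard sketch, not taken from the article: $\mathcal{S}$ is the state space, $\mathcal{A}$ the action space, $P$ the transition kernel, $r$ the reward function, and $\gamma$ the discount factor.

```latex
% One equation per state s (outer loop over the state space S):
V(s) = \max_{a \in \mathcal{A}} \Big( r(s, a)
       + \gamma \sum_{s' \in \mathcal{S}} P(s' \mid s, a)\, V(s') \Big)
% - outer loop:  every state s in S          (curse of state)
% - middle loop: every action a in A         (curse of action)
% - inner loop:  every outcome s' under P    (curse of outcome)
```

Each of the three sets — states, actions, and outcomes — must be enumerated to solve the system exactly, which is what makes each a potential curse.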
At its core, Reinforcement Learning is a Monte Carlo simulation, sampling random transitions instead of enumerating all possible outcomes. By the Law of Large Numbers, the sample average converges to the true expected value. This way, we transform the stochastic problem into a deterministic one:
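Concretely, the expectation over outcomes is replaced by a sample average. In sketch form (the notation is assumed, not the article's: $\mathcal{S}$ is the state space, $P$ the transition kernel, $V$ the value function, and $\hat{s}'_n$ the $n$-th sampled next state):

```latex
\sum_{s' \in \mathcal{S}} P(s' \mid s, a)\, V(s')
  \;\approx\; \frac{1}{N} \sum_{n=1}^{N} V(\hat{s}'_n),
\qquad \hat{s}'_n \sim P(\cdot \mid s, a)
```

As $N$ grows, the right-hand side converges to the expectation, so the inner loop over all outcomes is replaced by a fixed number of deterministic evaluations.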
The transformation allows us to handle large outcome spaces. To deal with large state spaces, we must be able to generalize to previously unseen states. Common approaches include feature extraction and state aggregation, and this is where the bulk of research attention has focused.
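As a minimal sketch of generalization via feature extraction, the snippet below approximates Q-values as a linear function of hand-crafted features, so that states never seen during training still receive sensible values. The feature map and class names are hypothetical, for illustration only:

```python
import numpy as np

def features(state, action):
    # Hypothetical feature extractor: maps a (state, action) pair to a
    # fixed-length vector. Any unseen pair still yields features, which
    # is what enables generalization across the state space.
    return np.array([state, action, state * action, 1.0], dtype=float)

class LinearQ:
    """Q-value approximation as a linear function of features."""

    def __init__(self, n_features, lr=0.01):
        self.w = np.zeros(n_features)  # learned weights
        self.lr = lr                   # learning rate

    def value(self, state, action):
        # Single dot product instead of a table lookup.
        return float(self.w @ features(state, action))

    def update(self, state, action, target):
        # Semi-gradient step toward a (bootstrapped) target value.
        phi = features(state, action)
        error = target - self.w @ phi
        self.w += self.lr * error * phi
```

Repeated updates pull the predicted value toward the target, and the learned weights transfer to nearby (state, action) pairs with similar features; a neural network plays the same role when good features are hard to design by hand.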
Because we can evaluate a single value for each state-action pair, rather than enumerating all outcomes that follow it, evaluating hundreds or even thousands of actions is often unproblematic. For many problems (e.g., chess, video games), this is sufficient, and there is no need to make further approximations w.r.t. the action…
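To make the point concrete, here is a minimal sketch of greedy action selection over a few thousand discrete actions; the Q-values are synthetic stand-ins for whatever approximator produced them:

```python
import numpy as np

def greedy_action(q_values):
    # With one value per state-action pair, evaluating thousands of
    # actions is just a vectorized pass plus an argmax.
    return int(np.argmax(q_values))

# Synthetic Q-values for a single state over 5000 discrete actions.
rng = np.random.default_rng(0)
q = rng.normal(size=5000)
best = greedy_action(q)
```

Enumeration only breaks down when the action space is combinatorial (e.g., all subsets or orderings of items), which is exactly the regime the five strategies in this article target.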