Multiscale Analysis of Markov Decision Processes
I recently started working with Sridhar Mahadevan on Markov Decision Processes. The tools of harmonic analysis on graphs, especially the eigenfunctions of the Laplacian and diffusion wavelets, find very natural applications to the study of Markov Decision Processes. First, they can efficiently encode functions on general state spaces for these processes, performing a dimensionality-reduction task that is recognized as fundamental for efficient computation. Second, there are many connections between Markov Decision Processes and the harmonic analysis of Markov chains, stemming from the relationship between the Bellman equation and the Green's function, or fundamental matrix, of a Markov chain. On the other hand, the nonlinearity of the optimization problem and the stochasticity of most ingredients of a Markov Decision Process pose new challenges. I will add more to this page as we make progress in this program.
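To make this connection concrete (a standard identity, recalled here only for the discounted policy-evaluation case): for a fixed policy $\pi$ with transition matrix $P^\pi$, reward $R^\pi$, and discount factor $\gamma \in (0,1)$,

$$ V^\pi \;=\; R^\pi + \gamma P^\pi V^\pi \qquad\Longrightarrow\qquad V^\pi \;=\; (I - \gamma P^\pi)^{-1} R^\pi \;=\; \sum_{t \ge 0} (\gamma P^\pi)^t R^\pi, $$

so computing the value function of a policy amounts to applying the (discounted) Green's function, or fundamental matrix, of the chain $P^\pi$ to the reward.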
The most recent work on multiscale/hierarchical representations for MDPs is the paper Multiscale Markov Decision Problems: Compression, Solution, and Transfer Learning, J. Bouvrie and M. Maggioni. A short version is available here: Efficient Solution of Markov Decision Problems with Multiscale Representations, J. Bouvrie and M. Maggioni, Proc. 50th Annual Allerton Conference on Communication, Control, and Computing, 2012.
In these works we introduce an automatic multiscale decomposition of MDPs, which leads to a hierarchy of MDPs ``at different scales'', each corresponding to a portion of the state space, separated by ``geometric'' and ``task-specific'' bottlenecks. The hierarchy of nested subproblems is such that each subproblem is itself a general MDP that may be solved independently of the others (thereby giving trivial parallelization), and the solutions may then be stitched together through a certain type of ``boundary conditions''. These boundary conditions are propagated top to bottom: the coarsest problem(s) are solved first, their solutions provide the boundary conditions for the finer-scale problems, and the coarse solutions are then ``filled in'' by solving the finer problems with the propagated boundary conditions; a minimal sketch of this top-down step, for the special case of policy evaluation, is given right below. Besides yielding fast, parallelizable algorithms for the solution of large MDPs amenable to this decomposition, the decomposition also allows one to transfer learned subproblems (possibly at different scales) across tasks, thereby enhancing the power of transfer learning.
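The following is only a schematic sketch of the top-down step for policy evaluation on a fixed partition of the state space; the partition, the fixed policy, and the plain dense linear solves are simplifying assumptions made for illustration, not the compression scheme used in the paper.

import numpy as np

def evaluate_policy(P, R, gamma):
    # Direct policy evaluation at the coarsest level: V = (I - gamma P)^{-1} R.
    n = P.shape[0]
    return np.linalg.solve(np.eye(n) - gamma * P, R)

def solve_block(P, R, gamma, interior, boundary, V_boundary):
    # Solve the Bellman system restricted to the 'interior' states of one block,
    # with values at the 'boundary' (bottleneck) states clamped to the values
    # propagated down from the coarser level.
    V = np.zeros(P.shape[0])
    V[boundary] = V_boundary
    A = np.eye(len(interior)) - gamma * P[np.ix_(interior, interior)]
    b = R[interior] + gamma * P[np.ix_(interior, boundary)] @ V[boundary]
    V[interior] = np.linalg.solve(A, b)
    return V

Each block can be solved independently of the others (hence in parallel); the boundary values propagated from the coarser level are what tie the sub-solutions together.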
Sridhar and I organized a workshop on the application of spectral methods to Markov Decision Processes; check out the page of our ICML '06 Workshop, to be held during ICML 2006.

Here are electronic copies of Technical Reports and papers written with my collaborator Sridhar Mahadevan at the Computer Science Dept. at the University of Massachusetts at Amherst, about the application of the eigenfunctions of the Laplacian and diffusion wavelets to the solution of Markov Decision Processes:
Proto-value Functions: A Laplacian Framework for Learning Representation and Control in Markov Decision Processes, with S. Mahadevan. Tech Report Univ. Mass. Amherst, CMPSCI 2006-35, May 2005.
This paper introduces a novel paradigm for solving Markov decision processes (MDPs), based on jointly learning representations and optimal policies. Proto-value functions are geometrically customized, task-independent basis functions forming the building blocks of all value functions on a given state space graph or manifold. In this first of two papers, proto-value functions are constructed using the eigenfunctions of the (graph or manifold) Laplacian, which can be viewed as undertaking a Fourier analysis on the state space graph. The companion paper (Maggioni and Mahadevan, 2006) investigates building proto-value functions using a multiresolution manifold analysis framework called diffusion wavelets, which is an extension of classical wavelet representations to graphs and manifolds. Proto-value functions combine insights from spectral graph theory, harmonic analysis, and Riemannian manifolds. A novel variant of approximate policy iteration, called representation policy iteration, is described, which combines learning representations and approximately optimal policies. Two strategies for scaling proto-value functions to continuous or large discrete MDPs are described. For continuous domains, the Nyström extension is used to interpolate Laplacian eigenfunctions to novel states. To handle large structured domains, a hierarchical framework is presented that compactly represents proto-value functions as tensor products of simpler proto-value functions on component subgraphs. A variety of experiments are reported, including perturbation analysis to evaluate parameter sensitivity, and detailed comparisons of proto-value functions with traditional parametric function approximators.
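As a concrete illustration of the basic construction, here is a minimal sketch, assuming a weighted adjacency matrix W for the state-space graph is already available; the choice of the normalized Laplacian and the plain least-squares fit are simplifying assumptions here, not the full representation policy iteration pipeline of the paper.

import numpy as np

def proto_value_functions(W, k):
    # Proto-value functions: the k smoothest eigenvectors of the normalized
    # graph Laplacian L = I - D^{-1/2} W D^{-1/2}.
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L = np.eye(W.shape[0]) - D_inv_sqrt @ W @ D_inv_sqrt
    _, eigvecs = np.linalg.eigh(L)      # eigenvalues returned in ascending order
    return eigvecs[:, :k]               # columns are basis functions on the graph

def fit_value_function(V_samples, Phi):
    # Least-squares projection of (sampled) values onto the proto-value basis.
    w, *_ = np.linalg.lstsq(Phi, V_samples, rcond=None)
    return Phi @ w

In representation policy iteration, basis functions of this kind replace hand-engineered features inside an approximate policy iteration loop.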
A Multiscale Framework for Markov Decision Processes using Diffusion Wavelets, with S. Mahadevan. Tech Report Univ. Mass. Amherst, CMPSCI 2006-36.
We present a novel hierarchical framework for solving Markov decision processes (MDPs) using a multiscale method called diffusion wavelets. Diffusion wavelet bases significantly differ from the Laplacian eigenfunctions studied in the companion paper (Mahadevan and Maggioni, 2006): the basis functions have compact support, are inherently multiscale both spectrally and spatially, and capture localized geometric features of the state space, and of functions on it, at different granularities in space-frequency. Classes of (value) functions that can be compactly represented in diffusion wavelets include piecewise smooth functions. Diffusion wavelets also provide a novel approach to approximate powers of transition matrices. Policy evaluation is usually the expensive step in policy iteration, requiring O(|S|^3) time to directly solve the Bellman equation (where |S| is the number of states for discrete state spaces, or the sample size in continuous spaces). Diffusion wavelets compactly represent powers of transition matrices, yielding a direct policy evaluation method requiring only O(|S|) complexity in many cases, which is remarkable because the Green's function $(I - P^\pi)^{-1}$ is usually a full matrix, requiring quadratic space just to store. A range of illustrative examples and experiments, from simple discrete MDPs to classic continuous benchmark tasks like the inverted pendulum and mountain car, are used to evaluate the proposed framework.
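The mechanism that makes such a direct solver conceivable is the dyadic-product form of the Green's function. The sketch below illustrates only the identity itself, with dense matrices (so it is still cubic-cost here); in the paper the dyadic powers are compressed in a diffusion wavelet basis, which is where the savings come from.

import numpy as np

def green_function_dyadic(P, gamma, levels=30):
    # Uses (I - T)^{-1} = prod_{k>=0} (I + T^{2^k}) with T = gamma * P, which
    # follows from sum_{j=0}^{2^L - 1} T^j = prod_{k=0}^{L-1} (I + T^{2^k});
    # the product converges for gamma < 1.
    n = P.shape[0]
    T = gamma * P
    G = np.eye(n)
    for _ in range(levels):
        G = G @ (np.eye(n) + T)   # multiply in the factor (I + T^{2^k})
        T = T @ T                 # square: T^{2^k} -> T^{2^{k+1}}
    return G                      # approximates the Green's function (I - gamma P)^{-1}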
Value Function Approximation with Diffusion Wavelets and Laplacian Eigenfunctions, with S. Mahadevan. Tech Report Univ. Mass. Amherst, CMPSCI 05-38, May 2005.
Analysis of functions on manifolds and graphs is essential in many tasks, such as learning, classification, clustering, and reinforcement learning. The construction of efficient decompositions of such functions has until now been quite problematic, and restricted to a few choices, such as the eigenfunctions of the Laplacian on a manifold or graph, which have found interesting applications. In this paper we propose a novel paradigm for analysis on manifolds and graphs, based on the recently constructed diffusion wavelets. They allow a coherent and effective multiscale analysis of the space and of functions on the space, and are a promising new tool in classification and learning tasks, in reinforcement learning, among others. We overview the main motivations behind their introduction and their properties, and sketch a series of applications, among which are multiscale analysis of document corpora, structural nonlinear denoising of data sets, and the tasks of value function approximation and policy evaluation in reinforcement learning, analyzed in two companion papers. The final form of this paper has appeared in Proc. NIPS 2005, with Sridhar Mahadevan (Tech Report Univ. Mass. Amherst, CMPSCI 05-39, May 2005).
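To give a flavor of the multiscale construction itself, here is a very rough schematic of compressing dyadic powers of a diffusion operator T level by level. It uses a truncated SVD purely for brevity; the actual diffusion wavelet construction uses localized orthonormalization so that the resulting scaling and wavelet functions stay compactly supported, which the SVD does not preserve.

import numpy as np

def compress_dyadic_powers(T, levels, eps=1e-8):
    # At each level, keep an orthonormal basis for the numerical range of the
    # current operator, then represent its square on that smaller basis.
    bases, ops = [], [T]
    for _ in range(levels):
        U, s, _ = np.linalg.svd(ops[-1])
        r = max(int(np.sum(s > eps)), 1)          # numerical rank at this scale
        Q = U[:, :r]                              # "scaling functions", in the coordinates of the previous level
        bases.append(Q)
        ops.append(Q.T @ ops[-1] @ ops[-1] @ Q)   # T^{2^{k+1}} represented on the compressed basis
    return bases, ops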
Fast Direct Policy Evaluation using Multiscale Analysis of Markov Diffusion Processes, with Sridhar Mahadevan, accepted to ICML 2006.
Policy evaluation is a critical step in the approximate solution of large Markov decision processes (MDPs), typically requiring $O(|S|^3)$ to directly solve the Bellman system of $|S|$ linear equations (where $|S|$ is the state space size). In this paper we apply a recently introduced multiscale framework for analysis on graphs to design a faster algorithm for policy evaluation. For a fixed policy $\pi$, this framework efficiently constructs a multiscale decomposition of the random walk $P^\pi$ associated with the policy $\pi$. This enables efficiently computing medium and long term state distributions, approximation of value functions, and the {\it direct} computation of the potential operator $(I-\gamma P^\pi)^{-1}$ needed to solve Bellman's equation. We show that even a preliminary non-optimized version of the solver competes with highly optimized iterative techniques, requiring in many cases a complexity of $O(|S| \log^2 |S|)$.
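For completeness, here is how the chain $P^\pi$ that the multiscale analysis is applied to can be formed from a transition model and a stochastic policy; the array shapes are assumptions made for the sake of the example.

import numpy as np

def policy_chain(P, pi):
    # P: transition model, one |S| x |S| matrix per action (list or array of shape A x S x S)
    # pi: stochastic policy, array of shape S x A
    # Returns P^pi with P^pi[s, s'] = sum_a pi[s, a] * P[a][s, s'].
    S, A = pi.shape
    P_pi = np.zeros((S, S))
    for a in range(A):
        P_pi += pi[:, [a]] * P[a]
    return P_pi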
Learning Representation and Control in Continuous Markov Decision Processes, with Sridhar Mahadevan, Kimberly Ferguson, and Sarah Osentoski, accepted to AAAI 2006. This paper presents a novel framework for simultaneously learning representation and control in continuous Markov decision processes. Our approach builds on the framework of proto-value functions, in which the underlying representation or basis functions are automatically derived from a spectral analysis of the state space manifold. The proto-value functions correspond to the eigenfunctions of the graph Laplacian. We describe an approach to extend the eigenfunctions to novel states using the Nyström extension. A least-squares policy iteration method is used to learn the control policy, where the underlying subspace for approximating the value function is spanned by the learned proto-value functions. A detailed set of experiments is presented using classic benchmark tasks, including the inverted pendulum and the mountain car, showing the sensitivity in performance to various parameters, and including comparisons with a parametric radial basis function method.
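A minimal sketch of the Nyström idea used to extend the basis to novel states follows; the Gaussian kernel, its width sigma, and the application directly to a kernel/weight matrix built on sampled states are assumptions made for illustration (extending graph-Laplacian eigenfunctions uses the analogous formula with the appropriate normalization).

import numpy as np

def gaussian_kernel(x, Y, sigma):
    # Similarities between a novel state x and the sampled states Y (one per row).
    return np.exp(-np.sum((Y - x) ** 2, axis=1) / (2.0 * sigma ** 2))

def nystrom_extend(x, X, eigvecs, eigvals, sigma):
    # If K u_j = lam_j u_j for the kernel matrix K built on the samples X, the
    # Nystrom extension of u_j to a novel state x is
    #   phi_j(x) = (1 / lam_j) * sum_i k(x, X[i]) * u_j[i].
    kx = gaussian_kernel(x, X, sigma)
    return (kx @ eigvecs) / eigvals      # one value per extended eigenfunction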
Here are some links to useful pages:
A Markov Decision Process Toolbox for Matlab, with a quick intro to MDPs and various useful links to review papers and books.
Another Markov Decision Process Toolbox for Matlab, by Iadine Chadès, Marie-Josée Cros, Frédérick Garcia, Régis Sabbadin, at Inria.
Reinforcement Learning FAQ and Reinforcement Learning Software and Stuff, by Rich Sutton.
Least-Squares Policy Iteration code by R. Parr and M.G. Lagoudakis, Duke University.
An Introduction to Markov Decision Processes, by B. Givan and R. Parr.
Online book on Markov Decision Processes, by Rich Sutton.
Links to some people in the field (this is a very partial list and under continuous construction!):
Ronald Parr and Michael G. Lagoudakis, CS dept. at Duke University.
Sridhar Mahadevan at the Computer Science Dept. at the University of Massachusetts at Amherst.
Andrew W. Moore, previously at Computer Science Department of Carnegie Mellon University, now at Google
Silvia Ferrari at the Laboratory for Intelligent Systems and Controls at Duke University.