Multiscale Analysis of Markov Decision Processes


I recently started working with Sridhar Mahadevan on Markov Decision Processes. The tools of harmonic analysis on graphs, in particular the eigenfunctions of the Laplacian and diffusion wavelets, find very natural applications in the study of Markov Decision Processes. First, they can efficiently encode functions on the general state spaces of these processes, performing a dimensionality-reduction task that is recognized as fundamental for efficient computation. Secondly, there are many connections between Markov Decision Processes and the harmonic analysis of Markov chains, stemming from the connection between the Bellman equation and the Green's function (or fundamental matrix) of a Markov chain. On the other hand, the nonlinearity of the optimization problem and the stochasticity of most ingredients of a Markov Decision Process offer new challenges. I will add more to this page as we make progress in this program.
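The Green's-function connection can be stated in one line: for a fixed policy $\pi$ with transition matrix $P^\pi$, reward $R^\pi$, and discount factor $\gamma \in (0,1)$, the Bellman equation $V^\pi = R^\pi + \gamma P^\pi V^\pi$ is solved by
$$V^\pi = (I - \gamma P^\pi)^{-1} R^\pi\,,$$
so that evaluating a policy amounts to applying the Green's function (fundamental matrix) of the discounted chain to the reward.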



Publications

The most recent work on multiscale/hierarchical representations for MDPs is this paper: Multiscale Markov Decision Problems: Compression, Solution, and Transfer Learning, J. Bouvrie, M. Maggioni. A short version is available here: Efficient Solution of Markov Decision Problems with Multiscale Representations, J. Bouvrie and M. Maggioni, Proc. 50th Annual Allerton Conference on Communication, Control, and Computing, 2012.

In these works we introduce an automatic multiscale decomposition of MDPs, which yields a hierarchy of MDPs ``at different scales'', corresponding to different portions of the state space separated by ``geometric'' and ``task-specific'' bottlenecks. The hierarchy of nested subproblems is such that each subproblem is itself a general type of MDP that may be solved independently of the others (thereby giving trivial parallelization), and the solutions may then be stitched together through a certain type of ``boundary conditions''. These boundary conditions are propagated top to bottom: the coarsest problem(s) are solved first, their solutions are propagated down (essentially as boundary conditions for the finer-scale problems), and the coarse solutions are then ``filled in'' by solving the finer problems with the propagated boundary conditions. Besides giving fast parallelizable algorithms for the solution of large MDPs amenable to this decomposition, the decomposition also allows one to transfer learned subproblems (possibly at different scales) between tasks, thereby enhancing the power and applicability of transfer learning. A small numerical illustration of the boundary-condition stitching is sketched below.
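Here is a minimal numerical illustration of the boundary-condition idea (a toy in the spirit of the papers, not their actual algorithm): policy evaluation on a chain of states split at a single bottleneck state, where, given the value at the bottleneck, the two halves decouple into independent linear systems. For simplicity the boundary value is taken from the exact solution, standing in for the coarse-scale solve that would supply it in the actual scheme.

```python
import numpy as np

# Toy illustration of boundary-condition stitching (not the papers' algorithm):
# evaluate V = R + gamma*P*V on a path graph, splitting at a bottleneck state b.
n, gamma = 9, 0.9
A = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)  # path adjacency
P = A / A.sum(axis=1, keepdims=True)       # random-walk transition matrix
R = np.random.default_rng(0).random(n)     # arbitrary reward vector

V_exact = np.linalg.solve(np.eye(n) - gamma * P, R)

b = n // 2                                 # bottleneck / boundary state
V = np.empty(n)
V[b] = V_exact[b]                          # boundary value (stand-in for a coarse solve)
for idx in (np.arange(0, b), np.arange(b + 1, n)):  # two independent subproblems
    M = np.eye(len(idx)) - gamma * P[np.ix_(idx, idx)]
    V[idx] = np.linalg.solve(M, R[idx] + gamma * P[idx, b] * V[b])

print(np.allclose(V, V_exact))             # True: stitching recovers the global solution
```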

Here are electronic copies of Technical Reports and papers written with my collaborator Sridhar Mahadevan of the Computer Science Dept. at the University of Massachusetts, Amherst, on the application of the eigenfunctions of the Laplacian and of diffusion wavelets to the solution of Markov Decision Processes:

Sridhar and I organized a workshop on applications of spectral methods to Markov Decision Processes: check out the page for our ICML '06 Workshop, to be held during ICML 2006.


Proto-value Functions: A Laplacian Framework for Learning Representation and Control in Markov Decision Processes, with S. Mahadevan. Tech Report Univ. Mass. Amherst, CMPSCI 2006-35, May 2005.
This paper introduces a novel paradigm for solving Markov decision processes (MDPs), based on jointly learning representations and optimal policies. Proto-value functions are geometrically customized task-independent basis functions forming the building blocks of all value functions on a given state space graph or manifold. In this first of two papers, proto-value functions are constructed using the eigenfunctions of the (graph or manifold) Laplacian, which can be viewed as undertaking a Fourier analysis on the state space graph. The companion paper (Maggioni and Mahadevan, 2006) investigates building proto-value functions using a multiresolution manifold analysis framework called diffusion wavelets, which is an extension of classical wavelet representations to graphs and manifolds. Proto-value functions combine insights from spectral graph theory, harmonic analysis, and Riemannian manifolds. A novel variant of approximate policy iteration, called representation policy iteration, is described, which combines learning representations and approximately optimal policies. Two strategies for scaling proto-value functions to continuous or large discrete MDPs are described. For continuous domains, the Nyström extension is used to interpolate Laplacian eigenfunctions to novel states. To handle large structured domains, a hierarchical framework is presented that compactly represents proto-value functions as tensor products of simpler proto-value functions on component subgraphs. A variety of experiments are reported, including perturbation analysis to evaluate parameter sensitivity, and detailed comparisons of proto-value functions with traditional parametric function approximators.
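A minimal sketch of the eigenfunction construction (a path graph and a hand-picked smooth target, purely for illustration; the papers use general state-space graphs and learned value functions):

```python
import numpy as np

# Sketch of proto-value functions: use the smoothest eigenvectors of the graph
# Laplacian of the state-space graph as a basis, and approximate a value
# function by least-squares projection onto them.
n = 30                                     # states on a path graph
A = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)  # adjacency
L = np.diag(A.sum(axis=1)) - A             # combinatorial graph Laplacian
evals, evecs = np.linalg.eigh(L)           # eigenvalues ascending: smoothest first

k = 5                                      # number of proto-value functions
Phi = evecs[:, :k]                         # basis of k smoothest eigenvectors

V = np.cos(np.linspace(0, np.pi, n))       # a smooth stand-in "value function"
coeffs, *_ = np.linalg.lstsq(Phi, V, rcond=None)
print(np.max(np.abs(V - Phi @ coeffs)))    # small: k basis functions capture V well
```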


A Multiscale Framework for Markov Decision Processes using Diffusion Wavelets, with S. Mahadevan. Tech Report Univ. Mass. Amherst, CMPSCI 2006-36.
We present a novel hierarchical framework for solving Markov decision processes (MDPs) using a multiscale method called diffusion wavelets. Diffusion wavelet bases significantly differ from the Laplacian eigenfunctions studied in the companion paper (Mahadevan and Maggioni, 2006): the basis functions have compact support, are inherently multiscale both spectrally and spatially, and capture localized geometric features of the state space, and of functions on it, at different granularities in space-frequency. Classes of (value) functions that can be compactly represented in diffusion wavelets include piecewise smooth functions. Diffusion wavelets also provide a novel approach to approximating powers of transition matrices. Policy evaluation is usually the expensive step in policy iteration, requiring $O(|S|^3)$ time to directly solve the Bellman equation (where $|S|$ is the number of states for discrete state spaces, or the sample size in continuous spaces). Diffusion wavelets compactly represent powers of transition matrices, yielding a direct policy evaluation method requiring only $O(|S|)$ complexity in many cases, which is remarkable because the Green's function $(I - P^\pi)^{-1}$ is usually a full matrix, requiring space quadratic in $|S|$ just to store. A range of illustrative examples and experiments, from simple discrete MDPs to classic continuous benchmark tasks like the inverted pendulum and mountain car, are used to evaluate the proposed framework.
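One way to see why powers of transition matrices compress well (a crude numerical check, not the diffusion-wavelet construction itself, which builds the compressed bases explicitly): the spectrum of a diffusion-like operator decays, so the dyadic powers $P^{2^k}$ have rapidly shrinking numerical rank and can be represented on coarser and coarser bases.

```python
import numpy as np

# Numerical check: dyadic powers of a random walk have fast-shrinking
# numerical rank (the property diffusion wavelets exploit by representing
# each power on a correspondingly coarser basis).
n, eps = 128, 1e-8
A = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
P = A / A.sum(axis=1, keepdims=True)       # random walk on a path graph

T = P.copy()
for k in range(7):
    svals = np.linalg.svd(T, compute_uv=False)
    rank = int(np.sum(svals > eps * svals[0]))
    print(f"numerical rank of P^(2^{k}) at tolerance {eps:g}: {rank}")
    T = T @ T                              # square to get the next dyadic power
```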


Value Function Approximation with Diffusion Wavelets and Laplacian Eigenfunctions, with S. Mahadevan. Tech Report Univ. Mass. Amherst, CMPSCI 05-38, May 2005.
Analysis of functions on manifolds and graphs is essential in many tasks, such as learning, classification, clustering, and reinforcement learning. The construction of efficient decompositions of functions has so far been restricted to few choices, such as the eigenfunctions of the Laplacian on a manifold or graph, which have found interesting applications. In this paper we propose a novel paradigm for analysis on manifolds and graphs, based on the recently constructed diffusion wavelets. They allow a coherent and effective multiscale analysis of the space, and of functions on the space, and are a promising new tool in classification and learning tasks, and in reinforcement learning, among others. We overview the main motivations behind their introduction and their properties, and sketch a series of applications, among which multiscale analysis of document corpora, structural nonlinear denoising of data sets, and the tasks of value function approximation and policy evaluation in reinforcement learning, analyzed in two companion papers. The final form of this paper, with Sridhar Mahadevan, appeared in Proc. NIPS 2005; see also Tech Report Univ. Mass. Amherst, CMPSCI 05-39, May 2005.


Fast Direct Policy Evaluation using Multiscale Analysis of Markov Diffusion Processes, with Sridhar Mahadevan, accepted to ICML 2006.
Policy evaluation is a critical step in the approximate solution of large Markov decision processes (MDPs), typically requiring $O(|S|^3)$ to directly solve the Bellman system of $|S|$ linear equations (where $|S|$ is the state space size). In this paper we apply a recently introduced multiscale framework for analysis on graphs to design a faster algorithm for policy evaluation. For a fixed policy $\pi$, this framework efficiently constructs a multiscale decomposition of the random walk $P^\pi$ associated with the policy $\pi$. This enables efficiently computing medium and long term state distributions, approximation of value functions, and the {\it direct} computation of the potential operator $(I-\gamma P^\pi)^{-1}$ needed to solve Bellman's equation. We show that even a preliminary non-optimized version of the solver competes with highly optimized iterative techniques, requiring in many cases a complexity of $O(|S| \log^2 |S|)$.
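The identity behind the direct solver is the dyadic product expansion of the Neumann series, $(I - \gamma P^\pi)^{-1} = \sum_{t \ge 0} (\gamma P^\pi)^t = \prod_{k \ge 0} \big(I + (\gamma P^\pi)^{2^k}\big)$. Here is a dense-matrix toy check of the identity (the actual algorithm applies each factor in a compressed diffusion-wavelet representation, which is where the speed comes from):

```python
import numpy as np

# Toy check of the dyadic product expansion used for direct policy evaluation:
#   (I - gP)^{-1} = (I + gP)(I + (gP)^2)(I + (gP)^4)...
# Truncating at K factors sums the Neumann series through order 2^K - 1.
n, gamma, K = 50, 0.9, 20
A = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
P = A / A.sum(axis=1, keepdims=True)       # random walk on a path graph

T = gamma * P
G = np.eye(n)
for _ in range(K):
    G = G @ (np.eye(n) + T)                # multiply in the next factor
    T = T @ T                              # next dyadic power of gamma*P

G_exact = np.linalg.solve(np.eye(n) - gamma * P, np.eye(n))
print(np.max(np.abs(G - G_exact)))         # tiny truncation error
```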

Learning Representation and Control in Continuous Markov Decision Processes, with Sridhar Mahadevan, Kimberly Ferguson, Sarah Osentoski, accepted to AAAI 2006. This paper presents a novel framework for simultaneously learning representation and control in continuous Markov decision processes. Our approach builds on the framework of proto-value functions, in which the underlying representation, or basis functions, are automatically derived from a spectral analysis of the state space manifold. The proto-value functions correspond to the eigenfunctions of the graph Laplacian. We describe an approach to extending the eigenfunctions to novel states using the Nyström extension. A least-squares policy iteration method is used to learn the control policy, where the underlying subspace for approximating the value function is spanned by the learned proto-value functions. A detailed set of experiments is presented using classic benchmark tasks, including the inverted pendulum and the mountain car, showing the sensitivity in performance to various parameters, and including comparisons with a parametric radial basis function method. A sketch of the Nyström extension is given below.
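For concreteness, here is a minimal sketch of the Nyström formula, written for an eigenvector of a Gaussian kernel matrix on sampled states (the paper extends eigenfunctions of the graph Laplacian, but the extension has the same shape: $\phi(x) = \frac{1}{\lambda}\sum_j k(x, x_j)\,\phi(x_j)$):

```python
import numpy as np

# Sketch of the Nystrom extension: extend an eigenvector phi (eigenvalue lam)
# of a kernel matrix on sample points X to a novel state x via
#     phi(x) = (1/lam) * sum_j k(x, X_j) * phi(X_j).
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0.0, 1.0, 200))    # sampled one-dimensional "states"
sigma = 0.05

def kernel(a, b):
    return np.exp(-((a[:, None] - b[None, :]) ** 2) / (2 * sigma**2))

K = kernel(X, X)
evals, evecs = np.linalg.eigh(K)           # eigenvalues ascending
lam, phi = evals[-2], evecs[:, -2]         # a nontrivial top eigenpair

x_new = np.array([0.37])                   # a novel state not in the sample
phi_new = kernel(x_new, X) @ phi / lam     # Nystrom-extended eigenfunction value
print(phi_new)
```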





Links

Here are some links to useful pages:



2006 NSF Workshop and Outreach Tutorials on Approximate Dynamic Programming. Check out tutorials and workshop presentations.

People

Links to some people in the field (this is a very partial list, under continuous construction!):




Thank you for visiting my home page.