In several applications there is a need to organize structures in a multi-resolution fashion, in order to process them, understand them, compress them etc...Maybe the most classical example is given by
Fourier analysis, where different resolutions correspond to different frequency bands in which a signal can be analyzed. However,
wavelets allow one to perform a multi-resolution analysis in a somehow stronger and better organized way in the spatial domain. The construction of wavelets in Euclidean spaces is by now in many ways quite well understood, even if interesting open questions remain for higher-dimensional wavelet constructions. Wavelets have found also wide applications in numerical analysis, both as mathematical foundation of
Fast Multipole Methods, and by providing bases for
Galerkin methods. Already in this latter setting, though, higher flexibility has been required than low-dimensional Euclidean wavelets, since multi-resolution are needed on rather general domains and manifolds. Both the Fast Multipole Methods and the Galerkin wavelet methods are not very well adapted to the operator, and only through some efforts are they adapted to the geometry of the domain/manifold. Finally, we want to mention the setting of
Spectral Graph Theory, where one can naturally do Fourier analysis through the Laplacian of a graph, and several multi-scale constructions, for use in a variety of applications, are still quite ad hoc.
In the paper “
Diffusion Wavelets” we propose a construction of wavelets on discrete (or discretized continuous) graphs and spaces, that are adapted to the “geometry” of a given
diffusion operator T, where the attribute “diffusion” is intended in a rather general sense. The motivation for starting with a given diffusion operator is that in many cases one is interested in studying functions on the graph/space, and hence it seems natural to start with a local operator generating local relationships between functions. Powers of the operator propagate these relationships further away till they become global.
If the spectrum of
T decays, large powers have low-rank, and hence are compressible. For example we can think of the range of high power of
T as being essentially spanned by very smooth functions with small gradient, or even band-limited functions. It is natural to take advantage of this, and compress the ranks of (dyadic) powers of the operator, thus obtaining a decreasing chain of subspaces, which can be interpreted as scaling function approximation spaces. The difference between these subspace can be called wavelet subspaces. The construction of the basis elements is non-trivial: we show one can build orthonormal bases of scaling functions and wavelets, with good localization properties in about order
n(log n)^2, even if the constants are still big and their improvement is of great practical significance and is being investigated.

The algorithm proceeds by applying
T^(2^j), expressed on the basis of orthonormal scaling functions spanning
V_j, then orthogonalizing the resulting set of vectors, discarding the ones not needed to span (numerically) the same subspace, and so on. Hence the orthonormalization step encapsulate the downsampling step.

Our construction works on a Riemannian manifold, with respect to, for example, the Laplace-Beltrami diffusion on the manifold; on a weighted graph, with respect, for example, to the natural diffusion induced by the weights on graph.
Examples
Homogeneous diffusion on a circle
The first example is a standard diffusion on the circle: our set is the circle sampled at 256 points, an initial orthonormal basis is given by the set of 256 delta-functions at each points, and the diffusion is given by the standard homogeneous diffusion on the circle. In the picture, we plot several diffusion scaling functions in various scaling function spaces. The finest scaling functions are the given delta-functions. These diffuse in 'triangle functions' (linear splines): these are orthonormalized into a (non-translation invariant) basis of 'triangle functions' and linear combinations of 'triangle functions'. Those are diffuse again twice etc.. The coarse scaling function spaces are spanned by the first few top eigenfunctions of the diffusion operator, which are simply trigonometric polynomials with small frequency. The algorithm still tries to build well-localized scaling functions out these trigonometric polynomials (see e.g. V_9, V_10, V_11) when possible.

On the left we plot the compressed matrices representing powers of the diffusion operator, on the right we plot the entries of the same matrices which are above precision. The initial powers get fuller because the spectrum of the diffusion is slowly decaying. It is enough to consider, instead of the given diffusion, a small power of it to avoid this partial fill-up.
Non-homogeneous diffusion on the circle
We consider a circle as before, but now the diffusion operator is not translation invariant or homogeneous: the conductivity is non-constant and is depicted in the figure in the top-left position. Think of the circle of different materials, which is most conductive at the top and least conductive (almost insulating material) at the bottom. In the top right picture we represent some scaling functions at level 15, so by construction/definition, these scaling functions span the range of
T^(2^16-1). Observe that the “scale” of these scaling functions is far from uniform if we measure it in a translation invariant way (for example with standard wavelets on the circle, or with the diffusion wavelets of the previous example). However this is exactly the scale at which the corresponding power of
T is operating: at the top of the circle the diffusion acted fast , the numerical rank of the operator restricted to that region is very small, and the scaling functions are trivial there; on the bottom part of the circle, this power of
T is still far from trivial, since there the diffusion is much slower, and scaling functions exhibit a more complex behavior (they still contain local high-frequency components).
In the second row we show how one can compute an eigenfunction of the compressed operator and extend it back to the whole space, with good precision.
In the third row we show the entries above precision of a power of the operator (left); on the right we show a “diffusion scaling function” embedding, which shows that points at the bottom of the circle have a large “diffusion distance” (it is hard/takes a long time to diffuse from one to the other) while points at the top have a small diffusion distance. Diffusion distance in original space is roughly Euclidean distance in this “diffusion scaling function” embedding.
Diffusion on a graph of 3 Gaussian clouds
In this example we consider the graph induced by points randomly distributed according to three Guassian random variables centered at different points. A graph is associated to these points, by putting edges between close-by points, with weights which are an exponentially decays function of the distances between the points.
The picture top-left shows the points, the picture top center shows the image of the points under the Laplacian eigenfunction map, the remaining figures show some diffusion scaling functions and wavelets, hinting at how they could be used to localized the analysis of this graph.
Diffusion a dumbbell-shaped manifold
We consider a dumbbell-shaped manifold, with diffusion given by (a discrete approximation to) the Laplace-Beltrami operator.
The picture shows different scaling functions and wavelets at different scales, on this manifold.
We also propose an algorithm for extending scaling functions and wavelets from the data set. In the figure we show the extension of some scaling functions from the set (left) to many points randomly generated around the set.
We consider two classes of functions on the sphere, and the problem of learning a good set of features such that the projection on them allows for good discrimination between the functions of the two classes.
Functions of class A are build as the superposition of three ripply functions, with equi-oriented ripples of the same frequency, centered around three points moving slightly in a noisy way, and two Guassian bumps acting as decoys, which are non-overlapping with the three ripply functions. Functions of class B are similar but one (randomly chosen) ripply function has ripples in a different direction than the other two.
Running CART on the original data has an error of 48%, running it on the top 40 LDB features leads to an error of about 12.5%, and running it onto the top discriminating eigenfunctions leads to an error of about 18%.

As a second example of local discriminant basis, we consider the following two classes of functions. We fix a direction v, and slightly perturb it with random Gaussian noise. Around that direction we create a ripply spherical cap, with one or two oscillation depending on the class. We add then 5 non-overlapping ripply functions, acting as decoy, randomly on the sphere.
CART run on the original data has an error of about 17.5%, CART on the top 20 LDB features has an error of about 3.5%, and CART on the first 300 eigenfunctions has an error of about 31%.