Overview

Diffusion wavelets generalize classical wavelets, allowing for multiscale analysis on general structures, such as manifolds, graphs and point clouds in Euclidean space. They allow to perform signal processing tasks on functions on these spaces. This has several applications. The ones we are currently focusing on arise in the study of data sets which can be modeled as graphs, and one is interested in learning functions on such graphs. For example we can consider a graph whose vertices are proteins, the edges connect interacting proteins, and the function on the graph labels a functionality of the protein. Or each vertex could be an image (e.g. a handwirtten digit), the edge connect very similar images, and the function at each vertex is the value of digit represented by that vertex. In classical, one-dimensional wavelet theory one applies dilations by powers of 2 and translations by integers to a mother wavelet, and obtains orthonormal wavelet bases. The classical construction has been of course generalized in many ways, considering wide groups of transformations, spaces different from the real line, such as higher-dimensional Euclideanspaces, Lie groups, etc...Most of these constructions are based on groups of geometrical transformations of the space, that are then applied (as a "change of variable") to functions on that space to obtain wavelets. On general graphs, point clouds and manifolds there may not be nice or rich groups of transformations. So instead of assuming the existence of these "symmetries" we directly use semigroups of operators acting on functions on the space (and not on the space itself). We typically use "diffusion operators", because of their nice properties. Let T be a diffusion operator on a graph T (e.g. the heat operator). In many cases, our graph will be a discretization of a manifold. The study of the eigenfunctions and eigenvalues of T is known as Spectral Graph Theory and can be viewed (for our purposes) a generalization of the classical theory of Fourier series on the circle. The advantages of wavelets and multi-scale techniques over classical Fourier series are well known. So it is natural to attempt to generalize the wavelet theory to the setting of diffusion operators on graphs. Following Stein, we take the view that the dyadic powers of the operator T establish a scale for performing multiresolution analysis on the graph. In the paper, "Diffusion Wavelets," we introduce a procedure for construction scaling functions and wavelets adapted to these scales.

Overview

Papers

Matlab code

Available here.

Sketch of the Construction

In several applications there is a need to organize structures in a multi-resolution fashion, in order to process them, understand them, compress them etc...Maybe the most classical example is given by Fourier analysis, where different resolutions correspond to different frequency bands in which a signal can be analyzed. However, wavelets allow one to perform a multi-resolution analysis in a somehow stronger and better organized way in the spatial domain. The construction of wavelets in Euclidean spaces is by now in many ways quite well understood, even if interesting open questions remain for higher-dimensional wavelet constructions. Wavelets have found also wide applications in numerical analysis, both as mathematical foundation of Fast Multipole Methods, and by providing bases for Galerkin methods. Already in this latter setting, though, higher flexibility has been required than low-dimensional Euclidean wavelets, since multi-resolution are needed on rather general domains and manifolds. Both the Fast Multipole Methods and the Galerkin wavelet methods are not very well adapted to the operator, and only through some efforts are they adapted to the geometry of the domain/manifold. Finally, we want to mention the setting of Spectral Graph Theory, where one can naturally do Fourier analysis through the Laplacian of a graph, and several multi-scale constructions, for use in a variety of applications, are still quite ad hoc. In the paper “Diffusion Wavelets” we propose a construction of wavelets on discrete (or discretized continuous) graphs and spaces, that are adapted to the “geometry” of a given diffusion operator T, where the attribute “diffusion” is intended in a rather general sense. The motivation for starting with a given diffusion operator is that in many cases one is interested in studying functions on the graph/space, and hence it seems natural to start with a local operator generating local relationships between functions. Powers of the operator propagate these relationships further away till they become global. If the spectrum of T decays, large powers have low-rank, and hence are compressible. For example we can think of the range of high power of T as being essentially spanned by very smooth functions with small gradient, or even band-limited functions. It is natural to take advantage of this, and compress the ranks of (dyadic) powers of the operator, thus obtaining a decreasing chain of subspaces, which can be interpreted as scaling function approximation spaces. The difference between these subspace can be called wavelet subspaces. The construction of the basis elements is non-trivial: we show one can build orthonormal bases of scaling functions and wavelets, with good localization properties in about order n(log n)^2, even if the constants are still big and their improvement is of great practical significance and is being investigated. The algorithm proceeds by applying T^(2^j), expressed on the basis of orthonormal scaling functions spanning V_j, then orthogonalizing the resulting set of vectors, discarding the ones not needed to span (numerically) the same subspace, and so on. Hence the orthonormalization step encapsulate the downsampling step. Our construction works on a Riemannian manifold, with respect to, for example, the Laplace-Beltrami diffusion on the manifold; on a weighted graph, with respect, for example, to the natural diffusion induced by the weights on graph.

Examples

Homogeneous diffusion on a circle

The first example is a standard diffusion on the circle: our set is the circle sampled at 256 points, an initial orthonormal basis is given by the set of 256 delta-functions at each points, and the diffusion is given by the standard homogeneous diffusion on the circle. In the picture, we plot several diffusion scaling functions in various scaling function spaces. The finest scaling functions are the given delta-functions. These diffuse in 'triangle functions' (linear splines): these are orthonormalized into a (non-translation invariant) basis of 'triangle functions' and linear combinations of 'triangle functions'. Those are diffuse again twice etc.. The coarse scaling function spaces are spanned by the first few top eigenfunctions of the diffusion operator, which are simply trigonometric polynomials with small frequency. The algorithm still tries to build well-localized scaling functions out these trigonometric polynomials (see e.g. V_9, V_10, V_11) when possible. On the left we plot the compressed matrices representing powers of the diffusion operator, on the right we plot the entries of the same matrices which are above precision. The initial powers get fuller because the spectrum of the diffusion is slowly decaying. It is enough to consider, instead of the given diffusion, a small power of it to avoid this partial fill-up.

Non-homogeneous diffusion on the circle

We consider a circle as before, but now the diffusion operator is not translation invariant or homogeneous: the conductivity is non-constant and is depicted in the figure in the top-left position. Think of the circle of different materials, which is most conductive at the top and least conductive (almost insulating material) at the bottom. In the top right picture we represent some scaling functions at level 15, so by construction/definition, these scaling functions span the range of T^(2^16-1). Observe that the “scale” of these scaling functions is far from uniform if we measure it in a translation invariant way (for example with standard wavelets on the circle, or with the diffusion wavelets of the previous example). However this is exactly the scale at which the corresponding power of T is operating: at the top of the circle the diffusion acted fast , the numerical rank of the operator restricted to that region is very small, and the scaling functions are trivial there; on the bottom part of the circle, this power of T is still far from trivial, since there the diffusion is much slower, and scaling functions exhibit a more complex behavior (they still contain local high-frequency components). In the second row we show how one can compute an eigenfunction of the compressed operator and extend it back to the whole space, with good precision. In the third row we show the entries above precision of a power of the operator (left); on the right we show a “diffusion scaling function” embedding, which shows that points at the bottom of the circle have a large “diffusion distance” (it is hard/takes a long time to diffuse from one to the other) while points at the top have a small diffusion distance. Diffusion distance in original space is roughly Euclidean distance in this “diffusion scaling function” embedding.

Diffusion on a graph of 3 Gaussian clouds

In this example we consider the graph induced by points randomly distributed according to three Guassian random variables centered at different points. A graph is associated to these points, by putting edges between close-by points, with weights which are an exponentially decays function of the distances between the points. The picture top-left shows the points, the picture top center shows the image of the points under the Laplacian eigenfunction map, the remaining figures show some diffusion scaling functions and wavelets, hinting at how they could be used to localized the analysis of this graph.

Diffusion a dumbbell-shaped manifold

We consider a dumbbell-shaped manifold, with diffusion given by (a discrete approximation to) the Laplace-Beltrami operator. The picture shows different scaling functions and wavelets at different scales, on this manifold.

Extension outside the set

We also propose an algorithm for extending scaling functions and wavelets from the data set. In the figure we show the extension of some scaling functions from the set (left) to many points randomly generated around the set.

Diffusion Wavelet Packets

Diffusion Wavelet Packets can be constructed by further splitting the wavelet subspace further, as in the classical case.

Local Discriminant Bases with Diffusion Wavelet Packets

We consider two classes of functions on the sphere, and the problem of learning a good set of features such that the projection on them allows for good discrimination between the functions of the two classes. Functions of class A are build as the superposition of three ripply functions, with equi-oriented ripples of the same frequency, centered around three points moving slightly in a noisy way, and two Guassian bumps acting as decoys, which are non-overlapping with the three ripply functions. Functions of class B are similar but one (randomly chosen) ripply function has ripples in a different direction than the other two. Running CART on the original data has an error of 48%, running it on the top 40 LDB features leads to an error of about 12.5%, and running it onto the top discriminating eigenfunctions leads to an error of about 18%. As a second example of local discriminant basis, we consider the following two classes of functions. We fix a direction v, and slightly perturb it with random Gaussian noise. Around that direction we create a ripply spherical cap, with one or two oscillation depending on the class. We add then 5 non-overlapping ripply functions, acting as decoy, randomly on the sphere. CART run on the original data has an error of about 17.5%, CART on the top 20 LDB features has an error of about 3.5%, and CART on the first 300 eigenfunctions has an error of about 31%.