Diffusion wavelets generalize
classical wavelets, allowing for multiscale analysis on general
structures, such as manifolds, graphs and point clouds in Euclidean
space. They allow to perform signal processing tasks on functions on
these spaces. This has several applications. The ones we are
currently focusing on arise in the study of data sets which can be
modeled as graphs, and one is interested in learning functions on
such graphs. For example we can consider a graph whose vertices are
proteins, the edges connect interacting proteins, and the function on
the graph labels a functionality of the protein. Or each vertex could
be an image (e.g. a handwirtten digit), the edge connect very similar
images, and the function at each vertex is the value of digit
represented by that vertex.
In classical, one-dimensional
wavelet theory one applies dilations by powers of 2 and translations
by integers to a mother wavelet, and obtains orthonormal wavelet
bases. The classical construction has been of course generalized in
many ways, considering wide groups of transformations, spaces
different from the real line, such as higher-dimensional Euclidean
spaces, Lie groups, etc...Most of these constructions are based on
groups of geometrical transformations of the space, that are then
applied (as a "change of variable") to functions on that
space to obtain wavelets.
On general graphs, point clouds and
manifolds there may not be nice or rich groups of transformations. So
instead of assuming the existence of these "symmetries", we
directly use semigroups of operators acting on functions on the space
(and not on the space itself). We typically use "diffusion
operators", because of their nice properties. Let T be a
diffusion operator on a graph T (e.g. the heat operator). In many
cases, our graph will be a discretization of a manifold. The study of
the eigenfunctions and eigenvalues of T is known as Spectral Graph
Theory and can be viewed (for our purposes) a generalization of the
classical theory of Fourier series on the circle. The advantages of
wavelets and multi-scale techniques over classical Fourier series are
well known. So it is natural to attempt to generalize the wavelet
theory to the setting of diffusion operators on graphs. Following
Stein, we take the view that the dyadic powers of the operator T
establish a scale for performing multiresolution analysis on the
graph. In the paper, "Diffusion Wavelets," we introduce a
procedure for construction scaling functions and wavelets adapted to
these scales.
Index
This fairly long web
page is divided in the following sections:
In several
applications there is a need to organize structures in a
multi-resolution fashion, in order to process them, understand them,
compress them etc... Maybe the most classical example is given by
Fourier analysis, where different resolutions correspond to
different frequency bands in which a signal can be analyzed. However,
wavelets allow one to perform a multi-resolution analysis in a
somehow stronger and better organized way in the spatial domain. The
construction of wavelets in Euclidean spaces is by now in many ways
quite well understood, even if interesting open questions remain for
higher-dimensional wavelet constructions. Wavelets have found also
wide applications in numerical analysis, both as mathematical
foundation of Fast Multipole Methods, and by providing bases
for Galerkin methods. Already in this latter setting, though,
higher flexibility has been required than low-dimensional Euclidean
wavelets, since multi-resolution are needed on rather general domains
and manifolds. Both the Fast Multipole Methods and the Galerkin
wavelet methods are not very well adapted to the operator, and only
through some efforts are they adapted to the geometry of the
domain/manifold. Finally, we want to mention the setting of
Spectral Graph Theory, where one can naturally do Fourier
analysis through the Laplacian of a graph, and several multi-scale
constructions, for use in a variety of applications, are still quite
ad hoc.
In the paper
“Diffusion Wavelets” we propose a construction of
wavelets on discrete (or discretized continuous) graphs and spaces,
that are adapted to the “geometry” of a given diffusion
operator T, where the attribute “diffusion” is
intended in a rather general sense. The motivation for starting with
a given diffusion operator is that in many cases one is interested in
studying functions on the graph/space, and hence it seems natural to
start with a local operator generating local relationships between
functions. Powers of the operator propagate these relationships
further away till they become global.
If the
spectrum of T decays, large powers have low-rank, and hence
are compressible. For example we can think of the range of high
power of T as being essentially spanned by very smooth
functions with small gradient, or even band-limited functions. It
is natural to take advantage of this, and compress the ranks of
(dyadic) powers of the operator, thus obtaining a decreasing chain
of subspaces, which can be interpreted as scaling function
approximation spaces. The difference between these subspace can be
called wavelet subspaces. The construction of the basis elements
is non-trivial: we show one can build orthonormal bases of scaling
functions and wavelets, with good localization properties in about
order n(log n)^2, even if the constants are still big and
their improvement is of great practical significance and is being
investigated.
The algorithm proceeds by applying T^(2^j), expressed on
the basis of orthonormal scaling functions spanning V_j,
then orthogonalizing the resulting set of vectors, discarding the
ones not needed to span (numerically) the same subspace, and so
on. Hence the orthonormalization step encapsulate the downsampling
step.
Our construction works on a Riemannian manifold, with respect to, for
example, the Laplace-Beltrami diffusion on the manifold; on a
weighted graph, with respect, for example, to the natural diffusion
induced by the weights on graph.
The first
example is a standard diffusion on the circle: our set is the
circle sampled at 256 points, an initial orthonormal basis is
given by the set of 256 delta-functions at each points, and the
diffusion is given by the standard homogeneous diffusion on the
circle. In the picture, we plot several diffusion scaling
functions in various scaling function spaces. The finest scaling
functions are the given delta-functions. These diffuse in
'triangle functions' (linear splines): these are orthonormalized
into a (non-translation invariant) basis of 'triangle functions'
and linear combinations of 'triangle functions'. Those are
diffuse again twice etc.. The coarse scaling function spaces are
spanned by the first few top eigenfunctions of the diffusion
operator, which are simply trigonometric polynomials with small
frequency. The algorithm still tries to build well-localized
scaling functions out these trigonometric polynomials (see e.g.
V_9, V_10, V_11) when possible.
On the left we plot the compressed matrices representing
powers of the diffusion operator, on the right we plot the
entries of the same matrices which are above precision. The
initial powers get fuller because the spectrum of the diffusion
is slowly decaying. It is enough to consider, instead of the
given diffusion, a small power of it to avoid this partial
fill-up.
NON-HOMOGENOUS DIFFUSION ON THE
CIRCLE
We consider a circle as before, but
now the diffusion operator is not translation invariant or
homogeneous: the conductivity is non-constant and is depicted in
the figure in the top-left position. Think of the circle of
different materials, which is most conductive at the top and
least conductive (almost insulating material) at the bottom. In
the top right picture we represent some scaling functions at
level 15, so by construction/definition, these scaling functions
span the range of T^(2^16-1). Observe that the “scale”
of these scaling functions is far from uniform if we measure it
in a translation invariant way (for example with standard
wavelets on the circle, or with the diffusion wavelets of the
previous example). However this is exactly the scale at which the
corresponding power of T is operating: at the top of the
circle the diffusion acted fast , the numerical rank of the
operator restricted to that region is very small, and the scaling
functions are trivial there; on the bottom part of the circle,
this power of T is still far from trivial, since there the
diffusion is much slower, and scaling functions exhibit a more
complex behavior (they still contain local high-frequency
components).
In the second
row we show how one can compute an eigenfunction of the
compressed operator and extend it back to the whole space, with
good precision.
In the third
row we show the entries above precision of a power of the
operator (left); on the right we show a “diffusion scaling
function” embedding, which shows that points at the bottom
of the circle have a large “diffusion distance” (it
is hard/takes a long time to diffuse from one to the other) while
points at the top have a small diffusion distance. Diffusion
distance in original space is roughly Euclidean distance in this
“diffusion scaling function” embedding.
DIFFUSION ON A GRAPH OF 3 GAUSSIAN
CLOUDS
In this example we consider the
graph induced by points randomly distributed according to three
Guassian random variables centered at different points. A graph
is associated to these points, by putting edges between close-by
points, with weights which are an exponentially decays function
of the distances between the points.
The picture top-left shows the points, the picture top center
shows the image of the points under the Laplacian eigenfunction
map, the remaining figures show some diffusion scaling functions
and wavelets, hinting at how they could be used to localized the
analysis of this graph.
DIFFUSION ON A DUMBBELL-SHAPED
MANIFOLD
We consider a dumbbell-shaped
manifold, with diffusion given by (a discrete approximation to)
the Laplace-Beltrami operator.
The picture shows different scaling functions and wavelets at
different scales, on this manifold.
EXTENSION OUTSIDE THE SET
We also propose an algorithm for extending scaling
functions and wavelets from the data set. In the figure we
show the extension of some scaling functions from the set (left)
to many points randomly generated around the set.
Diffusion
Wavelet Packets can be constructed by further splitting the
wavelet subspace further, as in the classical case.
DIFFUSION WAVELET PACKETS ON THE
SPHERE
We build diffusion wavelet packets
on the sphere with respect to a discretized version of the
Laplace-Beltrami operator.
Notice that nothing in the algorithm is specific to
the sphere: the only input to the algorithm is the diffusion
operator, no knowledge of the geometry is passed to the algorithm
(but of course the operator itself does contain quite some
information about the geometry!).
COMPRESSION:
EXAMPLE 1
We
consider the characteristic function of a wedge on the sampled
sphere, and run best basis algorithm for compressing this
function using diffusion wavelet packets. We do obtain good
compression reates when compared to the delta-function basis and
Laplacian eigenfunction basis.
We compare the reconstruction by the top 50 best basis packets
and by the top 200 eigenfunctions.
The eigenfunctions of the Laplacian used here are computed on
an oversampled sphere and then restricted to the 2000 point
sphere, since the eigenfunctions computed on the 2000 are very
far from the true eigenfunctions of the Laplace operator on the
sphere (the L^2 error in the approximation goes like 1/\sqrt(n),
where n is the number of points!). The Laplacian eigenfunctions
on the 2000 point sphere actually perform much better and have
performance more or less comparable with the one of the packets
LOCAL DISCRIMINANT BASES WITH
DIFFUSION WAVELET PACKETS
We consider two classes of
functions on the sphere, and the problem of learning a good set
of features such that the projection on them allows for good
discrimination between the functions of the two classes.
Functions of class A are build as the superposition of three
ripply functions, with equi-oriented ripples of the same
frequency, centered around three points moving slightly in a
noisy way, and two Guassian bumps acting as decoys, which are
non-overlapping with the three ripply functions. Functions of
class B are similar but one (randomly chosen) ripply function has
ripples in a different direction than the other two.
Running CART on the original data has an error of 48%, running
it on the top 40 LDB features leads to an error of about 12.5%,
and running it onto the top discriminating eigenfunctions leads
to an error of about 18%.
As a second example of local discriminant basis, we consider
the following two classes of functions. We fix a direction v, and
slightly perturb it with random Gaussian noise. Around that
direction we create a ripply spherical cap, with one or two
oscillation depending on the class. We add then 5 non-overlapping
ripply functions, acting as decoy, randomly on the sphere.
CART run on the original data has an error of about 17.5%,
CART on the top 20 LDB features has an error of about 3.5%, and
CART on the first 300 eigenfunctions has an error of about 31%.