Diffusion geometries refer to the large-scale geometry of a manifold or a graph representing a data set, which is determined by long-time heat flows on the manifold/graph/data set.
A multiscale analysis, that includes all scales and not just large scales, is possible with diffusion wavelets.
Code for computing proximity graphs, Laplacians, diffusion wavelets is available here.
This behavior can be
accurately described by the bottom eigenfunctions of the Laplacian
operator on the manifold or on the graph (or data set). Moreover,
these eigenfunctions can be used to define an embedding (sometimes
called eigenmap) of the manifold or graph into low-dimensional
Euclidean space, in such a way that Euclidean distances in the range
of the eigenmap correspond to “diffusion distances” on
the manifold or graph.
Techniques based on eigenvectors of
similarity matrices (which are related to the Laplacian have been
used successfully for some time in several problems in data analysis,
segmentation, clustering, etc...
These connections, and connection to differential-geometric structures of manifolds, and problems related to the stability, computability, and extendibility of these eigenfunctions has been explored extensively in Stephane Lafon's thesis (a link to his web-site, still in construction, soon!).
Further material, including talks and papers, are available on Stephane's web-page.
The link below points to a
talk I gave some time ago at l'Escorial, the first part of which is
about these Diffusion Geometries:
Diffusion
Geometries, Diffusion Wavelets and Harmonic Analysis of Large Data
Sets, talk given at the International Harmonic Analysis
Conference a l'Escorial, Madrid, May 2004.
Most of the pictures on this web page are courtesy of Stephane Lafon (now at Google) Check his home page for cool demos introducing the basics of diffusion geometries.
Geodesic distance ---> Diffusion distance
Diffusion distance is more stable, uses a “preponderance of evidence”

On the graph of “documents” with similarities there is a natural random walk: we get a Markov chain represented by a matrix P(x,y). If P is symmetric and positive semidefinite, we can define the diffusion distance by
![]()
![]()
![]()
![]()
Embeds the graph in Euclidean space, up to precision, via the eigenfunctions, mapping diffusion distance into Euclidean distance.
For a set of points in Euclidean space, sampled from a Riemannian manifold, one can build a discretized Laplace-Beltrami operator (associated to the canonical Brownian motion constrained on the manifold) and map the manifold with diffusion distance isometrically in Euclidean space.

From
left to right: points on a spiral in R^3 (but could be R^n for any n)
are provided in random order, and are sampled nonuniformly; the
embedding with the Laplacian as commonly used; the embedding with the
Laplacian normalized a la Beltrami; the density function on the
points. The color is used to identify the points across the pictures.
|
|
|
|---|---|
|
|
|
|
|
|

Example: document analysis
Given 1,000 documents, each of which is associated with 10,000 words, with a value indicating the relevance of each word in that document.
View this as a set of 1,000 in 10,000 dimensions, construct a graph with 1,000 vertices, each of which with few selected edges to very close-by documents.

Number of visitors since May 21st, 2004: