This site supported by NSF CAREER grant DMS-05-48153. Last update: 14 September 2008.

In print

Exponential tails in the running time of perfect simulation

Abstract.Monte Carlo algorithms typically need to generate random variates from a probability distribution described by an unnormalized density of probability mass function. Perfect simulation algorithms generate random variates exactly from these distributions, but have a running time T that is itself an unbounded random variable. This paper shows that commonly used protocols for creating perfect simulation algorithms, such as Coupling From the Past can be used in such a fashion that the running time is unlikely to be very much larger than the expected running time.

M. Huber. Perfect simulation with exponential tails, Random Structures and Algorithms, vol. 33 no. 1 (August, 2008), pp. 29--43, Wiley InterScience

Image restoration

Abstract.The coupling method has been an enormously useful tool for studying the mixing time of Markov chains and as the basis of perfect sampling algorithms such as Coupling From the Past. Several methods such as Wilson's layered multishift coupling and Breyer and Roberts' catalytic coupling have been introduced to use the coupling approach on continuous state spaces. This work builds upon these approaches by using a simple coupling for small Metropolis moves together with catalytic coupling. As an application, the analysis of the Autonormal model in the Wasserstein metric of A. Gibbs is extended to an analysis in total variation distance. Moreover, a perfect sampling algorithms is constructed that provably runs in O(N ln N) time for fixed values of the parameters of the model.

M. Huber. Pefect simulation for image restoration, Stochastic Models, vol. 23 no. 3 (August, 2007), pp. 475--487, Taylor & Francis

Correlated evolution tests

Abstract. We present the ancestral distance test, a new test to detect correlated evolution between two binary traits. It is appropriate for use with phylogenies that lack resolved subclades, branch lengths, and/or comparative data. We define the ancestral distance as the time separating a randomly sampled taxon from its most recent common ancestor (MRCA) that has one or more descendants possessing an independent trait. The sampled taxon either has (target sample) or lacks (non-target sample) a dependent trait. Modeled as a Markov process, we show that the distribution of ancestral distances for the target sample is identical to the non-target sample when characters are uncorrelated, whereas ancestral distances are smaller on average for the target sample when characters are correlated. Simulations suggest that the ancestral distance can be estimated using the time, total branch length, taxonomic rank, or number of speciation events between a sampled taxon and the MRCA. These results are shown to be robust to deviations from Markov assumptions. We also provide a Monte Carlo technique to estimate p-values when full resolved phylogenies with branch lengths are available. Software is available from Hearn. We apply this Monte Carlo approach to a published data set.

D. Hearn and M. Huber. The Ancestral Distance test: A topdown approach to detect correlated evolution in large lineages with missing character data and incomplete phylogenies Systematic Biology, vol. 55 no. 5 (2006), pp. 803--817, Taylor & Francis

Hardy-Weinberg Proportions

Abstract. The Hardy-Weinberg law is among the most important principles in the study of biological systems (Crow, 1998, Genetics 119,473--476). Given its important, many tests have been devised to determine whether a finite population follows Hardy-Weinberg proportions. Because asymptotic tests can fail, Guo and Thompson (1992, Biometrics 42, 361--372) developed an exact test; unforutnately, the Monte Carlo method they proposed to evaluate their test has a running time that grows linearly in the size of the population N. Here, we propose a new algorithm whose expected running time is linear in the size of the table produced, and completely independent of N. In practice, this new algorithm can be considerably faster than the original method.

M. Huber., Y. Chen, I. Dinwoodie, A. Dobra, and M. Nicholas, Monte Carlo Algorithms for Hardy-Weinberg proportions, Biometrics, vol. 62 no. 1 (March, 2006), pp.49--63, Blackwell Synergy

Linear extensions of a poset

Abstract. In this paper we study the problem of sampling (exactly) uniformly from the set of linear extensions of an arbitrary partial order. Previous Markov chain techniques have yielded algorithms that generate approximately uniform samples. Here we create a bounding chain for one such Markov chain, and by using a non-Markovian coupling together with a modified form of coupling from the past, we build an algorithms for perfectly generating samples. The expected running time of the procedure is O(n^3 ln n), making the technique as fast as the mixing time of the Karzanov/Khachiyan chain upon which it is based.

M. L. Huber. Fast perfect sampling from linear extensions, Discrete Mathematics, vol. 306 (2006), pp. 420--428, Elsevier

Perfect matching in dense regular graphs

Abstract. We present the first algorithm for generating random variates exactly uniformly from the set of perfect matchings of a bipartite graph with a polynomial expected running time over a nontrivial set of graphs. Previous Markov chain results obtain approximately uniform variates for arbitrary graphs in polynomial time, but their running time is &Theta(n^10 (ln n)^2). Other algorithms (such as Kasteleyn's O(n^3) algorithm for planar graphs) concentrated on restricted versions of the problem. Our algorithm employs acceptance/rejection together with a new upper limit on the permanent of a form similar to Bregman's theorem. For graphs with 2n nodes, where the degree of every node is &gamman for a constant &gamma, the expected running time is O(n^{1.5 + .5/&gamma}). Under these conditions, Jerrum and Sinclair showed that a Markov chain of Broder can generate approximately uniform variates in &Theta(n^{4.5 + .5 / &gamma} ln n) time, making our algorithm significantly faster on this class of graphs. The problem of counting the number of perfect matchings in these types of graphs is #P complete. In addition, we give a 1 + &sigma approximation algorithm for finding the permanent of 0-1 matrices with identical row and column sumns that runs in &O(n^{1.5 + .5/&gamma} (1 / &sigma^2) ln (1 / &sigma)) time, where the probability that the output is within 1 + &sigma of the permanent is at least 1 - &delta.

M. Huber. Exact sampling from perfect matchings of dense regular bipartite graphs, Algorithmica, vol. 44 (2006), pp. 183--193.

Force distributions in rigid lattices

Abstract. We study the uniformly weighted ensemble of force balanced configurations on a triangular isotropic compressive stress, we find that the probability distribution for single-contract forces decays faster than exponentially. This super-exponential decay persists in lattice diluted to the rigidity percolation threshold. On the other hand, for anisotropic imposed stresses, a broader tail emerges in the force distribution, becoming a pure exponential in the limit of infinite size and infinitely strong anisotropy.

B. P. Tighe, J. E.S. Socolar, D.G. Schaeffer, W. G. Mitchener, and M. L. Huber. Force distributions in a triagonal lattice of rigid bars, Physical Review E, vol. 72 no. 031306 (2005), APS Journals [e031306]

Contigency tables

Abstract. We study the uniformly weighted ensemble of force balanced configurations on a triangular isotropic compressive stress, we find that the probability distribution for single-contract forces decays faster than exponentially. This super-exponential decay persists in lattice diluted to the rigidity percolation threshold. On the other hand, for anisotropic imposed stresses, a broader tail emerges in the force distribution, becoming a pure exponential in the limit of infinite size and infinitely strong anisotropy.

Yuguo Chen, Ian Dinwoodie, Adrian Dobra, and Mark Huber. Lattice Point, contingency tables, and sampling, Contemporary Mathematics, vol. 374 (2005), pp. 65--78, American Mathematical Society

Yuguo Chen, Ian Dinwoodie, Adrian Dobra, and Mark Huber. Lattice Point, contingency tables, and sampling, 6th SIAM Conference on Dynamical Systems (2003).

Perfect sampling using bounding chains

Abstract. Bounding chains are a technique that offers three benefits to Markov chain practitioners: a theoretical bound on the mixing time of the chain under restricted conditions, experimental bounds on the mixing time of the chain that are provably accurate, and construction of perfect sampling algorithms when used in conjunction with protocols such as coupling from the past. Perfect sampling algorithms generate variates exactly from the target distribution without the need to know the mixing time of a Markov chain at all.

M. Huber. Perfect sampling using bounding chains, The Annals of Applied Probability, vol 14, no. 2, (2004), pp.734--753, Institute of Mathematical Statistics.

Exact sampling for the Antivoter Model

Abstract. The antivoter model is a Markov chain on regular graphs which has a unique stationary distribution, but is not reversible. This makes the stationary distribution difficult to describe. Despite the fact that in general nothing is known about the stationary distribution other than it exists and is unique, we present a method for sampling exactly from this distirubiton. The method has running time O(n^3 r/c), where n is the number of nodes in the graph, c is the size of the minimum cut in the graph, and r is the degree of each node in the graph. We also show that the original chain has O(n^3 r/c) mixing time. For the antivoter model on the complete graph we derive a closed form solution for the stationary distribution. Moreover, we bound the total variation distance between the stationary distribution for the antivoter model on multipartite graph and the stationary distribution on the complete graph.

M. Huber and G. Reinert The Stationary Distribution in the Antivoter Model: Exact Sampling and Approximations, in Stein's Method: Expository Lectures and Applications (2004), pp. 79--94.

Bounding chain for Swendsen-Wang

Abstract. The greatest drawback to Monte Carlo Markov chain methods is lack of knowledge of the mixing time of the chain. The use of bounding chains solves this difficulty for some chains by giving theoretical and experimental upper bounds on the mixing time. Moreover, when used with methodologies such as coupling from the past, bounding chains allow the user to take samples drawn exactly from the stationary distribution without knowledge of the mixing time. Here we present a bounding chain for the Swendsen-Wang process. The Swendsen-Wang bounding chain allows us to efficiently obtain exact samples from the ferromagnetic Q-state Potts model for certain classes of graphs. Also, by analyzing this bounding chain we show that Swendsen-Wnag is rapidly mixing over a larger range of parameters than was known previously.

M. L. Huber A bounding chain for the Swendsen-Wang process, Random Structures and Algorithms, vol. 22 no 1 (2002), pp. 53--59

Randomness Recycler

Abstract. For many probability distributions of interest, it is very difficult to obtain samples efficiently. Often, Markov chains are employed to obtain approximately random samples from these distributions. The primary drawback to traditional Markov chain methods is that the mixing time of the chain is usually unknown, which makes it impossible to determine how close the output samples are to having the target distributions. Here we present a new protocol, the randomness recycler (RR), that overcomes this difficulty. Unlike classical Markov chain approaches, an RR-based algorithm creates samples drawn exactly from the desired distribution. Other perfect sampling methods use existing Markov chains, but RR does not use the traditional Markov chain at all. While by no means universally useful, RR does apply to a wide variety of problems. In restricted instances of certain problems, it gives the first expected linear time algorithms for generating samples. Here we apply RR to self-organizing lists, the Ising model, random independent sets, random colorings, and the random cluster model.

J. A. Fill and M. L. Huber The Randomness Recycler: A New Technique for Perfect Sampling, Proc. of the 41th Annual IEEE Symposium on the Foundations of Computer Science (2000)

A four page version of this work was presented at the 53rd Annual Meeting of the ISI:

J. A. Fill and M. L. Huber The Randomness Recycler Approach to Perfect Sampling, 53rd annual meeting of the ISI (2001)

Bounding chains for independent sets

Abstract. The problem of finding a random independent set arises in several contexts. In statistical physics, it is known as the hard core gad model. These samples may also be used to approximate the number of independent sets in a graph or to find large independent sets. One sampling approach is to run a Markov chain "for a long time." One such chain for this problem was introduced by Dyer and Greenhill, and here we develop a bounding chain for the Dyer-Greenhill chain. This bounding chain allows us to develop experimental upper bounds on the mixing time of the chain, and to create perfect samples using this chain. An implementation of this algorithm shows that on the two dimensional lattice it is faster than a competing chain, the single-site heat bath chain.

M. Huber A faster method for sampling independent sets, Proc. of the 41th Annual IEEE Symposium on the Foundations of Computer Science (2000)

An earlier version of this work appeared as a technical report:

Mark Huber. Exact Random Sampling from Independent Sets, Technical Report 1246, School of Operations Research and Industrial Engineering, Cornell University. (May 1999). (postscript) (dvi)


Bounding chains for CFTP, colorings, antiferromagnetic Potts model

This paper introduces the idea of a bounding chain for determining when complete coupling has occurred so that Coupling From the Past may be run on the chain to obtain exact samples. A bounding chain for the antiferromagnetic Potts model is given which is shown to couple completely in polynomial time given either a large number of colors or a sufficiently high temperature. The same analytical techniques also show the single-site update model of Propp and Wilson (ferromagnetic Ising) and Haggstrom and Nelander (antiferromagnetic Ising) run in polynomial time given high enough temperature.

Mark Huber. "Exact Sampling and Approximate Counting Techniques." Proceedings of the 30th ACM Symposium on the Theory of Computing, 1998. (postscript) (dvi) (pdf)


Bounding chain for Swendsen-Wang

This preprint uses the bounding chain approach to get exact samples from the Ising model using the Swendsen-Wang chain. We prove that over a range of parameters, the Swendsen-Wang chain completely couples in polynomial time, and give an algorithm for determining when complete coupling has occurred. This not only implies that the chain is rapidly mixing over this set of parameters, but also allows coupling from the past to be used to exactly sample from the chain.

Mark Huber. "Efficient Exact Sampling From the Ising Model Using Swendsen-Wang." preprint. (postscript) (dvi) (pdf)

A 2-page version of this result appeared in the 10th Annual Symposium on Discrete Algorithms. (postscript) (dvi)

The full version of this paper appeared in Random Structures and Algorithms:

Mark Huber, "A bounding chain for Swendsen-Wang." Random Structures & Algorithms, Vol. 22, No. 1, p. 53—59, 2002

An earlier 2-page summary of this work appears in SODA:


Ph. D. dissertation

This is my Ph.D. dissertation. It contains an extensive discussion of the bounding chain technique for perfect sampling, from discrete and continuous chains. Two major perfect sampling algorithms are discussed here, and their running times compared. Also, this work contains new chains for some problems, such as the Widom-Rowlinson model and the continuous hard core gas model.

Mark Huber. "Perfect Sampling Using Bounding Chains" Ph.D. Thesis. Cornell University. 1999. (postscript) (dvi)


Early Preprints

Bounding chains for swap moves (Independent Sets-continuous and discrete, Widom-Rowlinson)

This paper contains an application of the "swap" move to creating Markov chains that are easier to analyze. Chains for the continuous hard core gas model and the Widom-Rowlinson model are given. In addition, it is shown how the idea of bounding chains can be used to get perfect sampling algorithms (using CFTP and Fill's algorithm). This technique also applies to the swapping chain of Dyer and Greenhill for the discrete hard core gas model.

Mark Huber. "The swap move: a tool for building better Markov chains" preprint, (dvi) (postscript)


Fill, Machida, Murdoch, Rosenthal's interruptible exact sampling algorithm

Fill introduced an "interruptible" exact sampling algorithm for monotonic chains. Unlike CFTP, where the user is required to wait a random amount of time for the algorithm to finish, in Fill's algorithm the user could abort the algorithm in the middle of a run. Recently details of the extension of this technique to more general chains was given in a paper of Fill, Machida, Murdoch, and Rosenthal. In this paper we analyze the running time of this technique and give several examples of its application. The running time is shown to be similar to its chief competitor, CFTP.

Mark Huber. "Perfect sampling without a lifetime commitment " preprint (postscript) (dvi)

Back to Mark's Home Page

Support for the maintenance of this site comes from NSF CAREER grant DMS-05-48153. All downloads provided for use within the restrictions of the Fair Use Act, and all copyrights remain with their owners.