Partition models
Peter McCullagh
University of Chicago
For integer , a partition of the finite set is
(i) a collection of disjoint non-empty subsets, called blocks, whose union is ;
(ii) an equivalence relation on , i.e. a symmetric Boolean function that is also reflexive and transitive;
(iii) a block factor or symmetric binary matrix of order such that if belong to the same block.
These equivalent representations are not distinguished in the notation, so is a set of subsets, a Boolean function, a subset of , or a symmetric binary matrix, as the context demands. In practice, a partition is frequently written in an abbreviated form, such as for a partition of or for a partition of three objects .. In this notation, the partitions of are and , and the five partitions of are

The blocks are unordered and unlabelled, so there is no concept of a first block or a last block, and is the same partition as and .
A partition is a sub-partition of if each block of is a subset of some block of or, equivalently, if implies . This relationship is a partial order denoted by , which can be interpreted as if each partition is regarded as a subset of . The partition lattice is the set of partitions of with this partial order. To each pair of partitions there corresponds a greatest lower bound , which is the set intersection or Hadamard component-wise matrix product. The least upper bound is the least element of that is greater than or equal to both, the transitive completion of . The least element is the partition with singleton blocks, and the greatest element is the single-block partition denoted by . As matrices, is the identity, whereas is the matrix whose components are all one.
A permutation induces an action by composition such that the transformed partition is in the form of an equivalence relation. In matrix notation, , so the action by conjugation maintains symmetry by permuting both the rows and columns of in the same way. The block sizes are preserved and are maximally invariant under conjugation. In this way, the 15 partitions of may be grouped into five orbits or equivalence classes as follows:
Thus, for example, is the representative element for one orbit, which also includes and .
The symbol applied to a set denotes the number of its elements, so is the number of blocks, and is the size of block . As a matrix, is positive semi-definite of rank . A partition distribution is defined on the finite set , and the first few values of are 1, 2, 5, 15, 52, called Bell numbers. More generally, is the th moment of the unit Poisson distribution whose exponential generating function is
In the discussion and manipulation of explicit probability models on , it is helpful to use the ascending and descending factorial symbols
for integer . Note that for positive integers . By convention . It is not a coincidence that is the ordinary generating function for the Stirling numbers of the first kind , the number of permutations having exactly cycles.
The term partition model refers to a probability distribution, or family of probability distributions, on the set of partitions of . In some cases, the probability is concentrated on the the subset of partitions having or fewer blocks. A distribution on such that for every permutation is said to be finitely exchangeable. Equivalently, is exchangeable if depends only on the block sizes of .
Historically, the most important examples are Dirichlet-multinomial partitions generated for fixed in three steps as follows.
(i) First generate the random probability vector from the Dirichlet distribution with parameter .
(ii) Given , the sequence is independent and identically distributed, each component taking values in with probability . Each sequence in which the value occurs times has probability
where .
(iii) Now forget the labels and consider only the partition generated by the sequence , i.e. if . Since is an exchangeable sequence, the partition distribution is also exchangeable, but an explicit simple formula is available only for the uniform case , which is now assumed. The number of sequences generating the same partition is , and these have equal probability in the uniform case. Consequently, the induced partition has probability
 |
(1) |
called the uniform Dirichlet-multinomial partition distribution. The factor ensures that partitions having more than blocks have zero probability.
In the limit as , the uniform Dirichlet-multinomial partition becomes
 |
(2) |
This is the celebrated Ewens distribution, or Ewens sampling formula, which arises in population genetics as the partition generated by allele type in a population evolving according to the Fisher-Wright model by random mutation with no selective advantage of allele types (Ewens, 1972). The preceding derivation, a version of which can be found in chapter 3 of Kingman (1980), goes back to Watterson (1974). The Ewens partition is the same as the partition generated by a sequence drawn according to the Blackwell-MacQueen urn scheme (Blackwell and MacQueen, 1973).
Although the derivation makes sense only if is a positive integer, the distribution (1) is well defined for negative values . For a discussion of this and the connection with GEM distributions and Poisson-Dirichlet distributions, see Pitman (2006, section 3.2).
Deletion of element from the set , or deletion of the last row and column from the matrix representation , determines a map , a projection from the larger to the smaller lattice. Equivalently, is the restriction of to the subset . These deletion maps preserve partial order and make the sets into a projective system
A family in which is a probability distribution on is said to be mutually consistent, or Kolmogorov-consistent, if each is the marginal distribution obtained from under deletion of element from the set . In other words, for . Kolmogorov consistency guarantees the existence of a random partition of the natural numbers whose finite restrictions are distributed as . The partition is infinitely exchangeable if each is finitely exchangeable. Some authors, for example Kingman (1980), refer to as a partition structure.
An exchangeable partition process may be generated from an exchangeable sequence by the transformation if and zero otherwise. The Dirichlet-multinomial and the Ewens processes are generated in this way. Kingman's (1978) paintbox construction shows that every exchangeable partition process may be generated from an exchangeable sequence in this manner. Moreover, the list of relative block sizes in decreasing order has a limit, which may be random. In the case of the Ewens process, the relative size of the largest block, , has a limit distributed as beta with parameter , i.e. with density for . Given the size of the largest block, the relative size of the next largest block as a fraction of the remaining elements has the same distribution, and so on.
Let be an infinitely exchangeable partition, , which means that the restriction of to is distributed as . Let be a fixed partition in , and suppose that the event occurs. Then lies in the lattice interval , which means that is the concatenation (union) of partitions of the blocks . For each block , the restriction is distributed as , so it is natural to ask whether, and under what conditions, the blocks of are partitioned independently given . Conditional independence implies that
![$\displaystyle p_n(B {\,\vert\,}B[n] \le B^*) = \prod_{b\in B^*} p_{\char93 b} (B[b]) ,$ $\displaystyle p_n(B {\,\vert\,}B[n] \le B^*) = \prod_{b\in B^*} p_{\char93 b} (B[b]) ,$](http://statprob.com/cache/objects/230/l2h/img169.png) |
(3) |
which is a type of non-interference or lack-of-memory property not dissimilar to that of the exponential distribution on the real line. It is straightforward to check that the condition is satisfied by (2) but not by (1). Aldous (1996) shows that conditional independence uniquely characterizes the Ewens family. Mixtures of Ewens processes do not have this property.
Although Dirichlet partition processes are the most common in applied work, it is useful to know that many alternative partition models exist. Although some of these are easy to simulate, most do not have simple expressions for the distributions, but there are exceptions of the form
 |
(4) |
for certain polynomials of degree in . One such polynomial is
which depends on only through the block sizes. The functions and are multiplicative , and is the inverse of the product of block sizes.
For each , depends on only through the block sizes, so the distribution is exchangeable. Moreover, it can be shown that the family is mutually consistent in the Kolmogorov sense. However, the conditional independence property (3) is not satisfied.
The expected number of blocks grows slowly with , approximately for the Ewens process, and for the process shown above.
A partition process is a random partition of a countably infinite set , and the restriction of to is distributed as . The conditional distribution of given is determined by the probabilities assigned to those events in that are compatible with , i.e. the events for and . For the uniform Dirichlet-multinomial model (1), these are
![$\displaystyle \mathop{\rm pr}\nolimits (u_{n+1} \mapsto b {\,\vert\,}B[n]=B) = ... ...B \\ \lambda(1-\char93 B/k)/(n+\lambda) &\quad b=\emptyset. \end{array} \right.$ $\displaystyle \mathop{\rm pr}\nolimits (u_{n+1} \mapsto b {\,\vert\,}B[n]=B) = ... ...B \\ \lambda(1-\char93 B/k)/(n+\lambda) &\quad b=\emptyset. \end{array} \right.$](http://statprob.com/cache/objects/230/l2h/img199.png) |
(5) |
In the limit as , we obtain
![$\displaystyle \mathop{\rm pr}\nolimits (u_{n+1} \mapsto b{\,\vert\,}B[n]=B) = \... ...a) &\quad b\in B \\ \lambda/(n+\lambda) &\quad b=\emptyset, \end{array} \right.$ $\displaystyle \mathop{\rm pr}\nolimits (u_{n+1} \mapsto b{\,\vert\,}B[n]=B) = \... ...a) &\quad b\in B \\ \lambda/(n+\lambda) &\quad b=\emptyset, \end{array} \right.$](http://statprob.com/cache/objects/230/l2h/img201.png) |
(6) |
which is the conditional probability for the Ewens process.
To each partition process there corresponds a sequential description called the Chinese restaurant process, in which is the arrangement of the first customers at tables. The placement of the next customer is determined by the conditional distribution (Pitman, 1996). For the Ewens process, the customer chooses a new table with probability or one of the occupied tables with probability proportional to the number of occupants. This description, which is due to Dubins and Pitman, first appears in print in section 11 of Aldous (1983). It was used initially in connection with the Ewens and Dirichlet-multinomial models, but has subsequently been applied more broadly to general partition models.
Beginning with the uniform distribution on the set of permutations of , the exponential family with canonical parameter and canonical statistic equal to the number of cycles is
The Stirling number of the first kind, , is the number of permutations of having exactly cycles, for which is the ordinary generating function. The cycles of the permutation determine a partition of whose distribution is (2), and a partition of the integer whose distribution is (7). From the cumulant function
it follows that is the sum of independent Bernoulli variables with parameter , which is evident also from the Chinese restaurant representation. For large , the number of cycles is roughly Poisson with parameter , implying that is a consistent estimate as , but practically inconsistent.
A minor modification of the Chinese restaurant process also generates a random permutation by keeping track of the cyclic arrangement of customers at tables. After customers are seated, the next customer chooses a table with probability (5) or (6), as determined by the partition process. If the table is occupied, the new arrival sits to the left of one customer selected uniformly at random from the table occupants. The random permutation thus generated is from to the left neighbour .
The cycles of a permutation determine a partition , which is a mapping from permutations to partitions. Thus, any probability distribution on partitions can be lifted to a probability distribution on permutations. Provided that the partition process is consistent and exchangeable, the lifted distributions are exchangeable and mutually consistent under the projection on permutations in which element is deleted from the cycle representation (Aldous, 1983; Pitman, 2006, section 3.1). In this way, every infinitely exchangeable random partition also determines an infinitely exchangeable random permutation of the natural numbers. Since the group acts on itself by conjugation, distributional exchangeability in this context is not to be confused with uniformity on .
A partition of the set is a set of blocks, and the block sizes determine a partition of the integer . For example, the partition of the set is associated with the integer partition , one singleton and two doubletons. An integer partition is a list of multiplicities, also written as , such that . The number of blocks, usually called the number of parts of the integer partition, is the sum of the multiplicities .
Under the natural action of permutations on set partitions, each orbit is associated with a partition of the integer . The multiplicity vector contains all the information about block sizes, but there is a subtle transfer of emphasis from block sizes to the multiplicities of the parts.
By definition, an exchangeable distribution on set partitions is a function only of the block sizes, so , where is the integer partition corresponding to . Since there are
set partitions corresponding to a given integer partition , to each exchangeable distribution on set partitions there corresponds a marginal distribution
on integer partitions. For example, the Ewens distribution on integer partitions is
 |
(7) |
where the combinatorial factor is the size of the conjugacy class , i.e. the number of permutations whose cycle structure is .
Arratia, Barbour and Tavaré, (1992) noted that this version leads naturally to an alternative description of the Ewens distribution in which the multiplicities are independent Poisson random variables with mean . Then the conditional distribution is the Ewens integer-partition distribution with parameter (Kingman 1993, section 9.5). In fact, we may consider the more general two-parameter Poisson model with means for , in which case the pair is minimal sufficient for , and the conditional distribution given is (7) independent of . For a response vector in the form of an integer partition, for example Fisher (1943) or Efron and Thisted (1976), this representation leads naturally to a simple method of estimation and testing, using Poisson log-linear models with model formula and offset .
The problem of estimating the number of unseen species was first tackled in a paper by Fisher (1943), using an approach that appears to be entirely unrelated to partition processes. Specimens from species occur as a Poisson process with rate , the rates for distinct species being independent and identically distributed gamma random variables. The number of occurrences of species in an interval of length is a negative binomial random variable
 |
(8) |
In this setting, is a monotone function of the sampling time, whereas is a fixed number independent of . Specimen counts for distinct species are independent and identically distributed random variables with parameters and .
The probability that no specimens from species occur in the sample is , the same for every species. Most species are unlikely to be observed if either is small, i.e. the time interval is short, or is small.
Let be the number of species occurring times, so that is the unknown total number of species of which are observed. The approach followed by Fisher is to estimate the parameters by conditioning on the number of species observed and regarding the observed multiplicities for as multinomial with parameter vector proportional to the negative binomial frequencies (8). For Fisher's entomological examples, this approach pointed to , consistent with the Ewens distribution (7), and indicating that the data are consistent with the number of species being infinite. Fisher's approach using a model indexed by species is less direct for ecological purposes than a process indexed by specimens. Nonetheless, subsequent analyses by Good and Toulmin (1956), Holgate (1969) and Efron and Thisted (1976) showed how Fisher's model can be used to make predictions about the likely number of new species in a subsequent temporal extension of the original sample. This amounts to a version of the Chinese restaurant process.
At this point, it is worth clarifying the connection between Fisher's negative binomial formulation and the Ewens partition formulation. The relation between them is the same as the relation between binomial and negative binomial sampling schemes for a Bernoulli process: they are not equivalent, but they are complementary. The partition formulation is an exchangeable process indexed by specimens: it gives the distribution of species numbers in a sample consisting of a fixed number of specimens. Fisher's version is also an exchangeable process, in fact an iid process, but this process is indexed by species: it gives the distribution of the sample composition for a fixed set of species observed over a finite period. In either case, the conditional distribution given a sample containing species and specimens is the distribution induced from the uniform distribution on the set of permutations having cycles. For the sorts of ecological or literary applications considered by Good and Toulmin (1956) or Efron and Thisted (1976), the partition process indexed by specimens is much more direct than one indexed by species.
Fisher's finding that the multiplicities decay as , proportional to the frequencies in the log-series distribution, is a property of many processes describing population structure, either social structure or genetic structure. It occurs in Kendall's (1975) model for family sizes as measured by surname frequencies. One explanation for universality lies in the nature of the transition rates for Kendall's process, a discussion of which can be found in section 2.4 of Kelly (1978).
A family of distributions on permutations indexed by a parameter matrix , is said to be equivariant under the induced action of the symmetric group if for all , and for each group element . By definition, the parameter space is closed under conjugation: implies . The same definition applies to partition models. Unlike exchangeability, equivariance is not a property of a distribution, but a property of the family. In this setting, the family is indexed by for some fixed . There is no implication that the family is the same as the family of marginal distributions induced by deletion from .
Exponential family models play a major role in both theoretical and applied work, so it is natural to begin with such a family of distributions on permutations of the matrix-exponential type
where and is the trace of the ordinary matrix product. The normalizing constant is the -permanent
where is the component-wise exponential matrix. This family of distributions on permutations is equivariant.
The limit of the -permanent as gives the sum of cyclic products
giving an alternative expression for the -permanent
as a sum over partitions. The induced marginal distribution (11) on partitions
is of the product-partition type recommended by Hartigan (1990), and is also equivariant. Note that the matrix and its transpose determine the same distribution on partitions, but they do not usually determine the same distribution on permutations. The -permanent has a less obvious convolution property that helps to explain why this function might be expected to occur in partition models:
![$\displaystyle \sum_{b\subset[n]} \mathop{\rm per}\nolimits _\alpha(K[b]) \matho... ...olimits _{\alpha'}(K[\bar b]) = \mathop{\rm per}\nolimits _{\alpha+\alpha'}(K).$ $\displaystyle \sum_{b\subset[n]} \mathop{\rm per}\nolimits _\alpha(K[b]) \matho... ...olimits _{\alpha'}(K[\bar b]) = \mathop{\rm per}\nolimits _{\alpha+\alpha'}(K).$](http://statprob.com/cache/objects/230/l2h/img330.png) |
(9) |
The sum extends over all subsets of , and is the complement of in . A derivation can be found in section 2.4 of McCullagh and Møller (2006). If is a partition of , the symbol denotes the Hadamard component-wise matrix product for which
is the product over the blocks of of -permanents restricted to the blocks. Thus the function is of the product-partition type.
With as parameters, we may define a family of probability distributions on , i.e. partitions of having or fewer blocks, as follows:
 |
(10) |
The fact that (10) is a probability distribution on follows from the convolution property of permanents. The limit as
![$\displaystyle p_n(B) = \alpha^{\char93 B} \prod_{b\in B} \mathop{\rm cyp}\nolimits (K[b]) / \mathop{\rm per}\nolimits _\alpha(K),$ $\displaystyle p_n(B) = \alpha^{\char93 B} \prod_{b\in B} \mathop{\rm cyp}\nolimits (K[b]) / \mathop{\rm per}\nolimits _\alpha(K),$](http://statprob.com/cache/objects/230/l2h/img350.png) |
(11) |
is a product-partition model satisfying the conditional independence property (3).
Properties of the -permanent are discussed by Vere-Jones (1997) and by McCullagh and Møller (2006) in the context of point processes. For , the matrix whose elements are all one, the -permanent is, by definition, the generating function for the Stirling numbers of the first kind. Thus, is the ascending factorial function, and for this exchangeable case, the distributions (10) and (11) coincide with (1) and (2).
Partition models are used to construct for use in classification and cluster analysis. Cluster analysis means a partitioning of the sample units into non-overlapping blocks such that the -values in (feature values) are more similar within blocks than between blocks. It is important to remember that the goal of cluster analysis is not a partition of the feature space , but a partition of the finite set of units or specimens.
Exchangeable partition models are used to construct non-trivial, processes suitable for cluster analysis. See Richardson and Green (1997), Fraley and Raftery (2002) or Booth, Casella and Hobert (2008) for a discussion of computational techniques. The simplest of these models is the marginal Gauss-Ewens process in which the sample partition is to be inferred from the finite sequence . The conditional distribution on is the posterior distribution on clusterings or partitions of , and is the array of one-dimensional marginal distributions for pairs of units, i.e. is the posterior probability that units belong to the same block. The conditional distribution contains further information about triplets and -tuples of units, from which it is possible in principle to compute the posterior distribution for the number of clusters or blocks. In estimating the number of clusters, it is important to distinguish between the sample number , which is necessarily finite, and the population number , which could be infinite (McCullagh and Yang, 2008). The latter problem is essentially the same as estimating the number of unseen species given that the blocks are so well separated that determines .
The same Gauss-Ewens model may be used for density estimation, which refers to the conditional distribution of given the sample values. Usually, this is to be done for an exchangeable process in the absence of external covariate or relational information about the units. In the computer-science literature, cluster detection is also called unsupervised learning.
Exchangeable partition models are also used to provide a Bayesian solution to the multiple comparisons problem (Gopalan and Berry 1998). In this setting is the number of distinct treatments, and the key idea is to associate with each partition of a subspace equal to the span of the columns of . Thus, consists of vectors such that if . For a treatment factor having levels with values , the Gauss-Ewens prior distribution on puts positive mass on the subspaces for each . Likewise, the posterior distribution also puts positive probability on these subspaces, which enables us to compute in a coherent way the posterior probability or the marginal posterior probability .
This is a revised version of the article Random permutations and partition models from Lovric, Miodrag (2011), International Encyclopedia of Statistical Science, Heidelberg: Springer Science +Business Media, LLC. Support for this research was provided in part by NSF Grant DMS-0906592.
- 1
- Aldous, D.J. (1983) Exchangeability and Related Topics. In École d'Éte de Probabilités de Saint-Flour XIII Springer Lecture Notes in Mathematics vol 1117, 1-198.
- 2
- Aldous, D.J. (1996) Probability distributions on cladograms. In . IMA Vol. Appl. Math 76. Springer, New York, 1-18.
- 3
- Arratia, R., Barbour, A.D. and Tavaré, S. (1992) Poisson process approximations for the Ewens sampling formula. Advances in Applied Probability 2, 519-535.
- 4
- Booth, J.G., Casella, G. and Hobert, J.P. (2008) Clustering using objective functions and stochastic search. J. Roy. Statist. Soc. B 70, 119-139.
- 5
- Blackwell, D. and MacQueen, J. (1973) Ferguson distributions via Pólya urn schemes. Ann. Statist. 1, 353-355.
- 6
- Efron, B. and Thisted, R.A. (1976) Estimating the number of unknown species: How many words did Shakespeare know? Biometrika 63, 435-447.
- 7
- Ewens, W.J. (1972) The sampling theory of selectively neutral alleles. Theoretical Population Biology 3, 87-112.
- 8
- Fraley, C. and Raftery, A.E. (2002) Model-based clustering, discriminant analysis and density estimation. J. Amer. Statist. Assoc. 97, 611-631.
- 9
- Good, I.J. and Toulmin, G.H. (1956) The number of new species, and the increase in population coverage when a sample is increased. Biometrika 43, 45-63.
- 10
- Gopalan, R. and Berry, D.A. (1998) Bayesian multiple comparisons using Dirichlet process priors. J. Amer. Statist. Assoc. 93, 1130-1139.
- 11
- Hartigan, J.A. (1990) Partition models. Communications in Statistics: Theory and Methods 19, 2745-2756.
- 12
- Holgate, P. (1969) Species frequency distributions. Biometrika 65, 651-660.
- 13
- Kelly, F.P. (1978) Reversibility and Stochastic Networks. Wiley, Chichester.
- 14
- Kingman, J.F.C. (1975) Random discrete distributions (with discussion). J. Roy. Statist. Soc. B 37, 1-22.
- 15
- Kingman, J.F.C. (1977) The population structure associated with the Ewens sampling formula. Theoretical Population Biology 11, 274-283.
- 16
- Kingman, J.F.C. (1978) The representation of partition structures. J. Lond. Math. Soc. 18, 374-380.
- 17
- Kingman, J.F.C. (1980) Mathematics of Genetic Diversity. CBMS-NSF conference series in applied math, 34 SIAM, Philadelphia.
- 18
- McCullagh, P. and Møller, J. (2006) The permanental process. Adv. Appl. Prob. 38, 873-888.
- 19
- McCullagh, P. and Yang, J. (2008). How many clusters? Bayesian Analysis 3, 1-19.
- 20
- Pitman, J. (1996) Some developments of the Blackwell-MacQueen urn scheme. In Statistics, Probability and Game Theory: Papers in Honor of David Blackwell, T.S. Ferguson et al editors. IMS Lecture Notes Monograph Series No. 30, 245-267.
- 21
- Pitman, J. (2006) Combinatorial . Springer-Verlag, Berlin.
- 22
- Richardson, S. and Green, P.J. (1997) On Bayesian analysis of mixtures with an unknown number of components (with discussion). J. Roy. Statist. Soc. B 59, 731-792.
- 23
- Vere-Jones, D. (1997) Alpha-permanents and their application to multivariate gamma, negative binomial and ordinary binomial distributions. New Zealand J. Math. 26 125-149.
- 24
- Watterson, G.A. (1974) The sampling theory of selectively neutral alleles. Adv. Appl. Prob. 6, 217-250.
|