What is the Factorial Distribution?
In statistics, the Factorial Distribution is a discrete probability distribution used to describe certain types of random processes, particularly in counting problems. The name comes from the factorial term in its probability mass function (PMF).
The factorial distribution is typically used to model the number of times an event occurs, such as the number of arrivals observed in a fixed interval.
Definition of the Factorial Distribution
Suppose $X$ follows a factorial distribution; its probability mass function (PMF) is defined as:
$$P(X = x) = \frac{\lambda^x}{x!} e^{-\lambda}, \quad x = 0, 1, 2, \dots$$
where:
- $\lambda$ is a positive parameter, typically representing the average occurrence rate or intensity of the events.
- $x$ is the value of the random variable $X$, representing the number of events that occurred.
As written, this PMF is exactly the PMF of the Poisson distribution with rate $\lambda$; the $x!$ in the denominator is the factorial term that motivates the name. In the machine learning literature, the term factorial distribution more often refers to a distribution that factorizes into a product of independent per-variable distributions, which is exactly the form used by the mean field method described next.
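To make the definition concrete, here is a minimal sketch (assuming NumPy and SciPy are available; the rate $\lambda = 3$ is an arbitrary illustrative choice) that evaluates the PMF above directly and confirms it coincides with SciPy's Poisson PMF:

```python
import numpy as np
from math import factorial
from scipy.stats import poisson

def factorial_pmf(x: int, lam: float) -> float:
    """PMF from the definition above: lambda^x / x! * exp(-lambda)."""
    return lam**x / factorial(x) * np.exp(-lam)

lam = 3.0
for x in range(6):
    p_manual = factorial_pmf(x, lam)
    p_scipy = poisson.pmf(x, lam)
    print(f"x={x}: manual={p_manual:.6f}, scipy={p_scipy:.6f}")
    assert np.isclose(p_manual, p_scipy)  # identical to the Poisson PMF
```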
What is the Mean Field Method?
The Mean Field Method is a common approximation technique used in variational inference to simplify the inference process by decomposing a complex probabilistic model into simpler, independent subproblems. The mean field method is widely used for inference in large-scale probabilistic models, especially when there are many latent variables and complex dependencies.
Basic Idea of the Mean Field Method
The core idea of the mean field method is to take a model with joint distribution $p(\mathbf{z}, \mathbf{x})$ (where $\mathbf{z}$ are latent variables and $\mathbf{x}$ are observed variables) and approximate its posterior over the latent variables by a product of independent distributions:
$$q(\mathbf{z}) = \prod_{i} q_i(z_i)$$
where $q_i(z_i)$ is the individual distribution for each latent variable $z_i$. This approximation assumes that the latent variables are independent of each other.
By doing so, the mean field method simplifies what would otherwise be a highly complex inference problem (such as calculating the posterior distribution) by enabling each $q_i(z_i)$ to be optimized independently.
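As a minimal illustration (the three binary variables and their parameters below are made up for this sketch), a fully factorized $q$ needs only one parameter per variable, and its log-probability decomposes into a sum of per-variable terms:

```python
import numpy as np

# Factorized q over three binary latent variables:
# q(z) = q1(z1) * q2(z2) * q3(z3), each factor Bernoulli(p_i).
probs = np.array([0.2, 0.7, 0.5])  # illustrative parameters

def log_q(z: np.ndarray) -> float:
    """log q(z) is a sum of independent per-variable Bernoulli terms."""
    return float(np.sum(z * np.log(probs) + (1 - z) * np.log(1 - probs)))

z = np.array([1, 0, 1])
print(log_q(z))  # equals log q1(1) + log q2(0) + log q3(1)
```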
Mean Field Method in Variational Inference
Variational inference is a method for approximating the posterior distribution by optimizing a variational distribution. The mean field method is the special case in which the variational distribution is assumed to factorize into independent distributions for each latent variable.
Given a probabilistic model, the goal of variational inference is to find an approximate posterior distribution $q(\mathbf{z})$ that is as close as possible to the true posterior distribution $p(\mathbf{z} \mid \mathbf{x})$. This is done by maximizing the evidence lower bound (ELBO), which is equivalent to minimizing the KL divergence between $q(\mathbf{z})$ and the true posterior:
$$\mathcal{L}(q) = \mathbb{E}_{q}[\log p(\mathbf{x}, \mathbf{z})] - \mathbb{E}_{q}[\log q(\mathbf{z})]$$
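As an illustration, the sketch below estimates this ELBO by Monte Carlo for a deliberately simple conjugate model, $z \sim \mathcal{N}(0, 1)$ and $x \mid z \sim \mathcal{N}(z, 1)$ (both chosen only for this example), using a Gaussian variational distribution $q(z) = \mathcal{N}(m, s^2)$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model (assumed for illustration): z ~ N(0, 1), x | z ~ N(z, 1).
x_obs = 1.5

def log_joint(z: np.ndarray) -> np.ndarray:
    """log p(x, z) = log p(x | z) + log p(z)."""
    log_prior = -0.5 * z**2 - 0.5 * np.log(2 * np.pi)
    log_lik = -0.5 * (x_obs - z) ** 2 - 0.5 * np.log(2 * np.pi)
    return log_prior + log_lik

def elbo(m: float, s: float, n_samples: int = 100_000) -> float:
    """Monte Carlo estimate of E_q[log p(x, z)] - E_q[log q(z)]."""
    z = rng.normal(m, s, size=n_samples)
    log_q = -0.5 * ((z - m) / s) ** 2 - np.log(s) - 0.5 * np.log(2 * np.pi)
    return float(np.mean(log_joint(z) - log_q))

# For this model the exact posterior is N(x/2, 1/2), and plugging it in
# makes the ELBO equal to the log evidence log p(x).
print(elbo(m=x_obs / 2, s=np.sqrt(0.5)))
print(elbo(m=0.0, s=1.0))  # a worse q yields a strictly lower ELBO
```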
In the mean field method, we assume the posterior distribution can be written as:
$$q(\mathbf{z}) = \prod_{i} q_i(z_i)$$
Then, we independently optimize each $q_i(z_i)$ to maximize the variational lower bound. Eventually, the mean field method uses iterative optimization to find the approximate posterior distribution.
Example: The Mean Field Method in Variational Inference
Suppose we have a simple probabilistic model that includes a latent variable $z$ and an observed variable $x$, with a joint distribution defined as:
$$p(x, z) = p(x \mid z)\, p(z)$$
The goal is to approximate the posterior distribution $p(z \mid x)$ through variational inference, but direct computation of the posterior is infeasible due to the complexity of the model.
Step 1: Assume a Mean Field Distribution
We assume that the posterior distribution $p(z \mid x)$ can be approximated by a mean field distribution $q(z)$. Specifically, we assume:
$$q(z) = q_1(z_1)\, q_2(z_2) \cdots q_K(z_K)$$
where $z = (z_1, z_2, \dots, z_K)$ represents all the latent variables.
Step 2: Maximize the Variational Lower Bound
Next, we maximize the variational lower bound $\mathcal{L}(q)$ to find the optimal $q(z)$. The variational lower bound is defined as:
$$\mathcal{L}(q) = \mathbb{E}_{q(z)}[\log p(x, z)] - \mathbb{E}_{q(z)}[\log q(z)]$$
Step 3: Independently Optimize Each $q_i(z_i)$
In the mean field method, the goal is to maximize $\mathcal{L}(q)$ by independently optimizing each $q_i(z_i)$. When optimizing $q_i(z_i)$, all other $q_j(z_j)$ are held fixed. Therefore, the mean field method decomposes the problem into multiple independent optimization problems.
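Holding the other factors fixed, the optimal update for each factor has a standard closed form (this is the general coordinate ascent result from variational inference, not something specific to this example):

$$\log q_i^*(z_i) = \mathbb{E}_{j \neq i}[\log p(x, z)] + \text{const}$$

where the expectation is taken with respect to all the other factors $q_j(z_j)$, $j \neq i$.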
Step 4: Iterative Updates
By iteratively updating each $q_i(z_i)$, the mean field method can progressively approximate the true posterior distribution.
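To make Steps 1-4 concrete, here is a compact sketch of coordinate ascent variational inference (CAVI) for a classic textbook model: a univariate Gaussian with unknown mean $\mu$ and precision $\tau$, approximated by the factorization $q(\mu, \tau) = q(\mu)\, q(\tau)$. The priors, data, and parameter values below are illustrative assumptions; the update equations follow the standard conjugate derivation (e.g., Bishop, Pattern Recognition and Machine Learning, Section 10.1.3):

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic data from N(true_mu, 1/true_tau); values are illustrative.
true_mu, true_tau = 2.0, 4.0
x = rng.normal(true_mu, 1 / np.sqrt(true_tau), size=200)
N, x_bar = len(x), x.mean()

# Assumed priors: mu | tau ~ N(mu0, 1/(lam0 * tau)), tau ~ Gamma(a0, b0).
mu0, lam0, a0, b0 = 0.0, 1.0, 1.0, 1.0

e_tau = 1.0  # initialize E_q[tau]
for _ in range(50):
    # Update q(mu) = N(mu_n, 1/lam_n), holding q(tau) fixed.
    mu_n = (lam0 * mu0 + N * x_bar) / (lam0 + N)
    lam_n = (lam0 + N) * e_tau

    # Update q(tau) = Gamma(a_n, b_n), holding q(mu) fixed.
    a_n = a0 + (N + 1) / 2
    e_sq = np.sum((x - mu_n) ** 2) + N / lam_n    # E_q[sum_n (x_n - mu)^2]
    e_prior = (mu_n - mu0) ** 2 + 1 / lam_n       # E_q[(mu - mu0)^2]
    b_n = b0 + 0.5 * (e_sq + lam0 * e_prior)
    e_tau = a_n / b_n                             # feeds the next pass

print(f"q(mu) mean: {mu_n:.3f} (true mu = {true_mu})")
print(f"E[tau]:     {e_tau:.3f} (true tau = {true_tau})")
```

Each pass alternates the two coordinate updates; for this conjugate model only a handful of iterations is needed before the estimates settle near the true parameters.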
Summary
- The Factorial Distribution is a discrete probability distribution used to describe certain counting problems; as noted above, the PMF given here coincides with that of the Poisson distribution.
- The Mean Field Method is an approximation technique in variational inference that decomposes complex joint distributions into independent distributions for each latent variable. This simplifies the inference process and is widely used in large-scale probabilistic models, especially in cases involving many latent variables.
By using the mean field method, we can handle high-dimensional probability spaces and progressively approach an optimum of the ELBO (in general a local one) through independent coordinate-wise optimizations. This makes the mean field method an important tool for inference in complex probabilistic models, particularly in Bayesian inference tasks in machine learning and statistics.
Postscript
Written in Shanghai at 15:22 on December 28, 2024, with the assistance of the GPT4o mini model.