Decision Flow

Decision Flow (DF) is a discrete-state, discrete-time sampling framework that transports a prior Markov process to a target terminal distribution by re-weighting its transition probabilities through the Green function of a linearly-solvable Markov decision process Chertkov, Ahn & Behjoo (2025). It is the discrete counterpart of Path Integral Diffusion Behjoo & Chertkov (2025); Kappen (2005), built on the linearly-solvable Markov decision problem of Todorov (2007), and generalises the Generative Flow Network training objective of Bengio et al. (2021). The resulting sampler is a Markov process whose transitions are a convolution of the reverse-time Green function of the prior with the target distribution Chertkov, Ahn & Behjoo (2025).

From continuous PID to discrete Decision Flow

Path Integral Diffusion (PID) poses generative sampling as a stochastic optimal control problem on a continuous state space with a quadratic control cost and a state potential; its optimal drift is the gradient of the logarithm of an explicit integral over forward and backward Green functions of two linear conjugate PDEs, obtained by Hopf–Cole linearisation of the Hamilton–Jacobi–Bellman equation Behjoo & Chertkov (2025); Kappen (2005). Decision Flow transposes this construction to a discrete state space and a discrete time index: the controlled Itô diffusion is replaced by a controlled Markov chain, the two conjugate PDEs are replaced by two conjugate linear recursions on the value function and its exponentiated form, and the Gaussian Green functions of the harmonic case Behjoo & Chertkov (2025) are replaced by the transition kernels of the prior Markov process and its time reverse Chertkov, Ahn & Behjoo (2025).

The key structural property inherited from PID is linear solvability: because the Bellman operator acts linearly on the exponentiated value function, the optimal control policy is obtained without iterative dynamic-programming fixed points and without training a neural network in the basic formulation Chertkov, Ahn & Behjoo (2025).

Linearly-solvable Markov decision processes

Decision Flow is built on the Linearly-Solvable MDP (LMDP) of Todorov (2007), in which the control cost is the Kullback–Leibler divergence between the controlled transition kernel and a passive prior kernel \(p(\cdot\mid x)\). For a state cost \(q(x)\) the Bellman equation for the cost-to-go \(v(x)\) takes the form

\[ v(x) = q(x) - \log \mathbb{E}_{x'\sim p(\cdot\mid x)}\left[e^{-v(x')}\right], \]

which, under the substitution \(z(x) = e^{-v(x)}\), becomes the linear equation

\[ z(x) = e^{-q(x)}\,\mathbb{E}_{x'\sim p(\cdot\mid x)}\left[z(x')\right]. \]

The controlled transition kernel is then the passive kernel re-weighted by the desirability \(z\):

\[ u^*(x'\mid x) \propto p(x'\mid x)\,z(x'). \]

In Decision Flow this re-weighting enforces a terminal distribution rather than an infinite-horizon cost: the desirability \(z_t(x)\) satisfies a finite linear recursion whose boundary condition is the target density at the terminal time Chertkov, Ahn & Behjoo (2025). The Hopf–Cole linearisation of the HJB equation used in continuous PID Kappen (2005) and its discrete analogue in LMDPs Todorov (2007) are manifestations of the same identity: an exponential substitution turns a log-sum-exp Bellman operator into a linear operator.
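The exponential substitution can be checked numerically on a toy chain. The sketch below uses a made-up random passive kernel, state cost, and horizon (none of it from the cited papers): it runs the nonlinear Bellman recursion on \(v\) and the linear recursion on \(z\) side by side, confirms \(z = e^{-v}\), and forms the re-weighted controlled kernel of the last display.

```python
import numpy as np

rng = np.random.default_rng(0)
n, T = 5, 8                                   # number of states, horizon

# Hypothetical passive kernel p(x'|x) (rows sum to 1) and state cost q(x).
P = rng.random((n, n))
P /= P.sum(axis=1, keepdims=True)
q = rng.random(n)

# Nonlinear Bellman recursion on the cost-to-go:
#   v_t(x) = q(x) - log E_{x'~p(.|x)}[exp(-v_{t+1}(x'))],  with v_T = 0.
v = np.zeros(n)
for _ in range(T):
    v = q - np.log(P @ np.exp(-v))

# Linear recursion on the desirability z = exp(-v):
#   z_t(x) = exp(-q(x)) * E_{x'~p(.|x)}[z_{t+1}(x')],  with z_T = 1.
zs = [np.ones(n)]
for _ in range(T):
    zs.append(np.exp(-q) * (P @ zs[-1]))
zs.reverse()                                  # zs[t] = desirability at time t

# The exponential substitution turns one recursion into the other exactly.
assert np.allclose(zs[0], np.exp(-v))

# Controlled kernel at time 0: passive kernel re-weighted by z_1, renormalised.
u0 = P * zs[1][None, :]
u0 /= u0.sum(axis=1, keepdims=True)
```

Note that no fixed-point iteration or learned parameterisation appears anywhere: the backward pass over \(z\) is a sequence of matrix-vector products, which is the linear solvability the text describes.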

Green-function convolution form

On a graph or discrete lattice with a prior transition kernel \(p_t(x'\mid x)\), Decision Flow expresses the controlled marginal at each intermediate time as a convolution of the prior's reverse-time Green function \(G^{\mathrm{prior}}_{-}(t,x;T,y)\) with the target density \(p^{\mathrm{tar}}\):

\[ p^*_t(x) \;\propto\; \sum_y \; G^{\mathrm{prior}}_{-}(t,x;T,y)\,p^{\mathrm{tar}}(y), \]

with controlled transitions derived from the corresponding log-ratios Chertkov, Ahn & Behjoo (2025). This is the discrete counterpart of the Green-function integral representation of the PID drift Behjoo & Chertkov (2025) and, by the LMDP reduction of Todorov (2007), it is the minimum-KL (Schrödinger-bridge) process between the prior and the target on the discrete state space.
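When the state space is small enough to tabulate the prior's Green function, the convolution identity can be exercised end to end. The sketch below is a hypothetical instance with a random prior kernel and target, not the paper's construction: it assumes a point-mass initial condition, in which case the terminal weight must be the target divided by the prior's terminal marginal (a Doob h-transform) for the controlled chain to land exactly on the target.

```python
import numpy as np

rng = np.random.default_rng(1)
n, T = 6, 5
x0 = 0                                        # point-mass initial condition (assumption)

# Hypothetical time-homogeneous prior kernel and target density on n states.
P = rng.random((n, n))
P /= P.sum(axis=1, keepdims=True)
p_tar = rng.random(n)
p_tar /= p_tar.sum()

# Green function of the prior from time t to the horizon T: G[t] = P^(T-t).
G = [np.linalg.matrix_power(P, T - t) for t in range(T + 1)]

# Terminal weight: for a fixed start state, w = p_tar / p_T^prior makes the
# controlled terminal marginal exactly p_tar (an assumption of this sketch).
w = p_tar / G[0][x0]

# Desirability h_t(x) = sum_y G(t, x; T, y) w(y): the convolution with the target.
h = [G[t] @ w for t in range(T + 1)]

# Controlled transitions from the ratios: u_t(x'|x) = p(x'|x) h_{t+1}(x') / h_t(x).
U = [P * h[t + 1][None, :] / h[t][:, None] for t in range(T)]

# Roll the controlled chain forward and check it hits the target at time T.
mu = np.zeros(n)
mu[x0] = 1.0
for t in range(T):
    mu = mu @ U[t]
assert np.allclose(mu, p_tar)
```

Each controlled row normalises automatically because \(h_t = P\,h_{t+1}\), so the re-weighted kernel is a valid stochastic matrix without any explicit renormalisation.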

Relation to Generative Flow Networks

Generative Flow Networks train a stochastic policy on a directed acyclic graph so that the terminal marginal over leaves is proportional to a given reward Bengio et al. (2021). The GFN training objective (flow matching or trajectory balance) enforces a conservation law that is equivalent, under the LMDP reduction, to the linear Bellman equation of Todorov (2007). Decision Flow makes this equivalence explicit and extends it: the prior Markov process need not be uniform over the DAG, and the construction applies on general discrete time-space lattices rather than on terminating DAGs alone Chertkov, Ahn & Behjoo (2025). In the basic formulation no neural network is trained — the Green function of the prior is used directly — and a neural parameterisation may be grafted on when the state space is too large for exact recursions.

Ising-model sampling

Chertkov, Ahn & Behjoo (2025) illustrate the Decision Flow framework on a finite Ising spin system, taking as prior sampler a standard Markov-chain Monte Carlo kernel (single-site Metropolis updates) and as target the Boltzmann measure of the Ising Hamiltonian \(H(\sigma) = -\sum_{\langle i,j\rangle} J_{ij}\,\sigma_i\sigma_j\) on a small lattice. The DF control then re-weights Metropolis transitions using the reverse-time Green function of the prior against the Boltzmann target, and the authors compare sample quality against vanilla Metropolis–Hastings. Closed-form desirabilities are available whenever the prior's reverse-time Green function can be tabulated, which holds for the small Ising instances considered in Chertkov, Ahn & Behjoo (2025).
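A minimal tabulated analogue of this experiment can be written in a few lines. All parameters below (chain length, coupling, the two temperatures, horizon, start state) are invented for illustration, and the construction is a Doob-style re-weighting by the prior's reverse-time Green function for a point-mass start, not the authors' exact setup: the prior is single-site Metropolis on a 4-spin periodic Ising chain, the target is the Boltzmann measure at a different temperature.

```python
import numpy as np
from itertools import product

# Invented parameters: 4-spin periodic chain, small enough to tabulate (2^4 states).
N, J, beta, beta0, T = 4, 1.0, 0.5, 0.1, 6
states = np.array(list(product([-1, 1], repeat=N)))

def energy(s):
    # H(sigma) = -J * sum_<i,j> sigma_i sigma_j on the periodic chain
    return -J * np.sum(s * np.roll(s, 1))

E = np.array([energy(s) for s in states])
p_tar = np.exp(-beta * E)
p_tar /= p_tar.sum()                          # Boltzmann target at temperature 1/beta

# Prior sampler: single-site Metropolis at temperature 1/beta0, tabulated as a
# 2^N x 2^N transition matrix (site chosen uniformly, then accept/reject).
M = len(states)
P = np.zeros((M, M))
weights = 2 ** np.arange(N)[::-1]
for i, s in enumerate(states):
    for site in range(N):
        s2 = s.copy()
        s2[site] *= -1
        j = int(((s2 + 1) // 2) @ weights)    # index of the flipped configuration
        P[i, j] = min(1.0, np.exp(-beta0 * (energy(s2) - E[i]))) / N
    P[i, i] = 1.0 - P[i].sum()                # rejected proposals stay put

# DF re-weighting: tabulate the prior's Green function, convolve with the
# Boltzmann target, and form the controlled Metropolis transitions.
x0 = 0                                        # start from the all-down configuration
G = [np.linalg.matrix_power(P, T - t) for t in range(T + 1)]
w = p_tar / G[0][x0]                          # terminal weight for a point-mass start
h = [G[t] @ w for t in range(T + 1)]
U = [P * h[t + 1][None, :] / h[t][:, None] for t in range(T)]

mu = np.zeros(M)
mu[x0] = 1.0
for t in range(T):
    mu = mu @ U[t]
assert np.allclose(mu, p_tar)                 # exact Boltzmann marginal at time T
```

The horizon \(T = 6\) exceeds the diameter of the single-flip graph, so every configuration is reachable and the terminal weight \(w\) is finite; vanilla Metropolis at \(\beta_0\) would instead need many sweeps to equilibrate toward the \(\beta\)-Boltzmann measure.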

Decision Flow sits downstream of Path Integral Diffusion — it is the discrete-time, discrete-state specialisation of the same Hopf–Cole / Green-function machinery — and in parallel with Harmonic Path Integral Diffusion, which is the exactly solvable continuous instance with Gaussian Green functions. Whereas Mean-Field PID extends the continuous PID construction to interacting ensembles through a Vlasov coupling, Decision Flow extends it in an orthogonal direction: from continuum state spaces to discrete ones, at the cost of giving up the closed-form Gaussian kernels of the harmonic case. Its integrability comes instead from the log-linearity of the LMDP Bellman equation Todorov (2007); Chertkov, Ahn & Behjoo (2025).

See also

References