Monte Carlo Importance Sampling¶
Let us introduce the concept of importance sampling by applying it to a classical many-particle system (such as the Ising model or a classical gas).
The basic idea of Monte Carlo simulation: the simulation is performed as a random walk through a very large configuration space. The probability of making a move has to be chosen such that the system reaches thermal equilibrium (and remains in thermal equilibrium) at a given temperature $T$ (usually a parameter) after many Monte Carlo steps.
The basic idea of simulated annealing: slowly decreasing the temperature of the system leads it toward its ground state. When this kind of cooling is carried out with Monte Carlo, one can search for the global minimum of a general minimization problem. This procedure is called simulated annealing.
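As an illustration, here is a minimal simulated-annealing sketch in Python; the cost function, cooling schedule, and all parameter values are arbitrary choices made for the demonstration, not prescriptions:

```python
import numpy as np

def simulated_annealing(cost, x0, T0=1.0, Tmin=1e-3, cooling=0.99,
                        steps_per_T=100, step_size=0.5, seed=0):
    """Minimize `cost` by Metropolis moves while slowly lowering T."""
    rng = np.random.default_rng(seed)
    x, E = x0, cost(x0)
    x_best, E_best = x, E
    T = T0
    while T > Tmin:
        for _ in range(steps_per_T):
            x_new = x + step_size * rng.standard_normal()  # random trial move
            E_new = cost(x_new)
            # Metropolis: accept with probability min(1, exp(-(E_new-E)/T))
            if E_new < E or rng.random() < np.exp(-(E_new - E) / T):
                x, E = x_new, E_new
                if E < E_best:
                    x_best, E_best = x, E
        T *= cooling  # slowly cool down
    return x_best, E_best

# toy cost function with several local minima (chosen only for the demo)
f = lambda x: x**2 + 4.0 * np.sin(5.0 * x)
print(simulated_annealing(f, x0=3.0))
```

At high $T$ the walk crosses barriers between local minima freely; as $T\rightarrow 0$ the Metropolis acceptance freezes the walker into (ideally) the deepest minimum it has found.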
MC Importance Sampling by a Markov Chain¶
If a configuration in phase space is denoted by $X$, the probability of a configuration according to Boltzmann is \begin{equation} \rho(X)\propto e^{-\beta E(X)}\qquad \beta=\frac{1}{T} \end{equation} (in units where $k_B=1$).
How do we sample the whole phase space of a general problem? How do we generate configurations?
Brute force: generate a truly random configuration $X$ and accept it with probability $e^{-\beta E(X)}$, where all $E>0$ (so that this is a valid probability). Successive configurations $X$ are statistically independent.
VERY INEFFICIENT
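To see just how inefficient, consider the following sketch (Python; the temperature and the choice of energy as the number of unsatisfied bonds in a random Ising-like chain are illustrative assumptions). Because the energy of a random configuration is extensive, the acceptance probability $e^{-\beta E}$ collapses exponentially with system size:

```python
import numpy as np

rng = np.random.default_rng(0)
beta = 1.0

# Energy of a random Ising-like chain = number of unsatisfied bonds, so E >= 0.
for N in (10, 100, 1000):
    spins = rng.choice([-1, 1], size=(10000, N))       # truly random configurations
    E = np.sum(spins[:, :-1] != spins[:, 1:], axis=1)  # E(X) for each configuration
    acc = np.exp(-beta * E)                            # acceptance probability e^{-beta E}
    print(f"N = {N:5d}   mean acceptance = {acc.mean():.3e}")
```

Already for a thousand spins essentially no random configuration is ever accepted.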
Markov chain: successive configurations $X_i$, $X_{i+1}$ are NOT statistically independent but are distributed according to the chosen distribution (such as the Boltzmann distribution).
Can be made VERY EFFICIENT
What is the difference between a Markov chain and an uncorrelated sequence?
- A truly random or uncorrelated sequence of configurations satisfies the identity $$P(X_1,X_2,\cdots,X_N)=P_1(X_1)\,P_1(X_2)\cdots P_1(X_N)$$
- A Markov chain satisfies the equation $$P(X_1,X_2,\cdots,X_N)=P_1(X_1)\,T(X_1\rightarrow X_2)\,T(X_2\rightarrow X_3)\cdots T(X_{N-1}\rightarrow X_N)$$
where the transition probabilities $T(X\rightarrow X')$ are normalized $$\sum_{X'} T(X \rightarrow X')=1$$
We want to generate a Markov chain in which the distribution of states is proportional to the chosen distribution, for example $$e^{-\beta E(X)},$$ and is independent of the position within the chain and of the initial configuration.
- 1) Connectedness
The necessary condition for generating such a Markov chain is that every configuration in phase space should be accessible from any other configuration within a finite number of steps (connectedness or irreducibility). (Be careful to check this condition when choosing the Monte Carlo step!)
- 2) Detailed balance
We need to find a transition probability $T(X\rightarrow X')$ which leads to a given stationary distribution $\rho(X)$ (in this case $\rho(X)\propto e^{-\beta E(X)}$).
The probability of $X$ decreases if the system goes from $X$ to any other $X'$: $\Delta \rho(X)=-\sum_{X'}\rho(X)T(X\rightarrow X')$, and increases if the configuration $X$ is visited from any other state $X'$: $\Delta \rho(X) = \sum_{X'}\rho(X')T(X'\rightarrow X)$. The one-step (time) change of the probability of $X$ is therefore \begin{equation} \rho(X,t+1)-\rho(X,t) = -\sum_{X'}\rho(X)T(X\rightarrow X')+\sum_{X'}\rho(X')T(X'\rightarrow X) \end{equation} We look for a stationary solution, i.e., $\rho(X,t+1)-\rho(X,t)=0$, and therefore \begin{equation} \sum_{X'}\rho(X)T(X\rightarrow X')=\sum_{X'}\rho(X')T(X'\rightarrow X) \end{equation} The general solution of this equation is not accessible, but a particular solution is obvious: \begin{equation} \rho(X)T(X\rightarrow X')=\rho(X')T(X'\rightarrow X) \end{equation} This solution is called the DETAILED BALANCE solution.
To construct the algorithm, we split the transition probability into a product $T(X\rightarrow X')=\omega_{X\rightarrow X'}A_{X\rightarrow X'}$:
- trial step probability $\omega_{X\rightarrow X'}$. It is often symmetric, i.e., $\omega_{X\rightarrow X'}=\omega_{X'\rightarrow X}$ (for example, for a spin flip in the Ising model, $\omega_{XX'}$ is $1/L^2$ if $X$ and $X'$ differ by a single spin flip and zero otherwise).
- acceptance probability $A_{X\rightarrow X'}$ (for example, accepting or rejecting the new configuration with probability $\min(1,\exp(-\beta(E(X')-E(X))))$).
The detailed balance condition becomes $$\frac{A_{X\rightarrow X'}}{A_{X'\rightarrow X}}=\frac{\omega_{X' \rightarrow X}\;\rho(X')} {\omega_{X\rightarrow X'}\;\rho(X)}$$
Metropolis chooses \begin{eqnarray} \begin{array}{lc} A_{X\rightarrow X'}=1 & \;\mathrm{if}\;\; \omega_{X'\rightarrow X}\, \rho(X')\geq \omega_{X\rightarrow X'}\, \rho(X)\\ A_{X\rightarrow X'}= \frac{\omega_{X'\rightarrow X}\,\rho(X')}{\omega_{X\rightarrow X'}\,\rho(X)} & \;\mathrm{if}\;\;\omega_{X'\rightarrow X}\,\rho(X')<\omega_{X\rightarrow X'}\,\rho(X). \end{array} \end{eqnarray} This acceptance probability satisfies the detailed balance condition and therefore leads to the desired Markov chain, whose stationary probability for any configuration $X$ is proportional to $\rho(X)$ at long times.
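As a quick numerical sanity check, one can verify both properties directly (a Python sketch with an arbitrary three-state system; the energies and the uniform trial probability are assumptions made for the demonstration):

```python
import numpy as np

beta = 1.0
E = np.array([0.0, 0.5, 1.3])          # arbitrary three-state energies (assumption)
rho = np.exp(-beta * E)
rho /= rho.sum()                       # target Boltzmann distribution

# Metropolis transition matrix with symmetric trial probability omega = 1/3
n = len(E)
T = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        if i != j:
            T[i, j] = (1.0 / n) * min(1.0, rho[j] / rho[i])
    T[i, i] = 1.0 - T[i].sum()         # rejected moves keep the chain at X

print(np.allclose(T.sum(axis=1), 1.0)) # rows normalized: sum_X' T(X->X') = 1
F = rho[:, None] * T                   # F[i, j] = rho(X) T(X->X')
print(np.allclose(F, F.T))             # detailed balance: F is symmetric

p = np.array([1.0, 0.0, 0.0])          # start far from equilibrium
for _ in range(200):
    p = p @ T                          # one Markov step
print(p, rho)                          # p has converged to rho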
To summarize the Metropolis algorithm:
- $T(X\rightarrow X')=\omega_{X\rightarrow X'}A_{X\rightarrow X'}$ (we have to take the trial step probability into account)
- $\sum_{X'}\omega_{X\rightarrow X'}=1$, and every $X'$ is reachable from every $X$ within a finite number of steps (ergodicity)
- $A_{X\rightarrow X'}=\min\left(1,\frac{\rho(X')\,\omega_{X'\rightarrow X}}{\rho(X)\,\omega_{X\rightarrow X'}}\right)$ (Metropolis acceptance probability)
How to accept a step with probability $A_{XX'}<1$? One can generate a random number $r\in[0,1]$ and accept the step if $r<A_{XX'}$.
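Putting the pieces together, a minimal Metropolis sweep for the 2D Ising model could look like the sketch below (Python; the lattice size, coupling $J=1$, temperature, and number of sweeps are illustrative assumptions):

```python
import numpy as np

def metropolis_sweep(spins, beta, rng):
    """One Monte Carlo sweep: L^2 attempted single-spin flips (omega_XX' = 1/L^2)."""
    L = spins.shape[0]
    for _ in range(L * L):
        i, j = rng.integers(L, size=2)       # trial step: pick a random site
        # energy cost of flipping spin (i, j); periodic boundaries, J = 1
        nn = (spins[(i + 1) % L, j] + spins[(i - 1) % L, j] +
              spins[i, (j + 1) % L] + spins[i, (j - 1) % L])
        dE = 2.0 * spins[i, j] * nn
        # accept with probability min(1, exp(-beta dE)): draw r in [0,1), accept if r < A
        if dE <= 0 or rng.random() < np.exp(-beta * dE):
            spins[i, j] *= -1

rng = np.random.default_rng(0)
L, beta = 16, 0.5
spins = rng.choice([-1, 1], size=(L, L))
for sweep in range(100):
    metropolis_sweep(spins, beta, rng)
print("magnetization per site:", spins.mean())
```

Since the trial probability $\omega_{XX'}=1/L^2$ is symmetric, the acceptance ratio reduces to $\min(1,e^{-\beta\Delta E})$; the `dE <= 0` branch implements the $A=1$ case without drawing a random number.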
Keep in mind:
- Configurations generated by a Markov chain are correlated. The theory guarantees that we arrive at the invariant distribution $\rho$ after a long time.
- Two configurations are statistically independent only if they are far apart in the Markov chain. This distance is called the *correlation time*, and it is usually quite long (a way to estimate it is sketched below).
- (Be careful: when measuring distance along the Markov chain, every step counts, not only the accepted ones.)
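A common way to estimate the correlation time is from the normalized autocorrelation function of a measured quantity. Below is a Python sketch; the AR(1) test series is an assumption standing in for real MC measurements, chosen because its correlation time is known analytically:

```python
import numpy as np

def autocorrelation(a, t_max):
    """Normalized autocorrelation C(t) of the series a, for t = 0 .. t_max-1."""
    a = np.asarray(a, dtype=float) - np.mean(a)
    var = np.mean(a * a)
    return np.array([np.mean(a[: len(a) - t] * a[t:]) / var for t in range(t_max)])

# correlated test series: AR(1) with coefficient 0.95, so C(t) = 0.95^t
# and the integrated correlation time is 0.5 + 0.95/(1-0.95) = 19.5
rng = np.random.default_rng(0)
x = np.zeros(100000)
for i in range(1, len(x)):
    x[i] = 0.95 * x[i - 1] + rng.standard_normal()

C = autocorrelation(x, 100)
tau = 0.5 + C[1:][C[1:] > 0].sum()   # crude integrated-correlation-time estimate
print("estimated correlation time:", tau)
```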
Most of the time we simulate configurations which correspond to the partition function, so that the probability of finding a given state $X$ in the Markov chain is $\rho(X) = \frac{e^{-\beta E(X)}}{\sum_{X'} e^{-\beta E(X')}}=\frac{e^{-\beta E(X)}}{Z}$.
The average of any quantity is then calculated as $$\overline{A}\equiv \sum_X \rho(X) A(X) \rightarrow \frac{1}{n-n_0}\sum_{i>n_0}^n A_i$$ where the first $n_0$ steps are used to "warm up" the chain and are discarded. This works because the configurations in the Markov chain are distributed according to $\rho(X)$.
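In code this is just the mean of the recorded measurements after dropping the warm-up part (a Python sketch; the synthetic measurement array stands in for actual MC data):

```python
import numpy as np

rng = np.random.default_rng(0)
A = 1.0 + rng.standard_normal(10000)   # hypothetical measurements A_i, one per MC step

n0 = 1000                              # warm-up steps to discard
A_bar = A[n0:].mean()                  # (1/(n-n0)) * sum_{i>n0} A_i
print(A_bar)
```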
We might also be tempted to compute $$\sum_X \rho(X)(A(X)-\overline{A})^2 \rightarrow^? \frac{1}{n-n_0}\sum_{i>n_0}^n (A_i-\overline{A})^2.$$ This is A WRONG ESTIMATION OF THE ERROR OF MC. The true error is much bigger than this estimate because the configurations $X$ are correlated! Imagine the limit of large correlations, when almost all values $A_i$ are the same (very slowly changing configurations): we would conclude that the standard deviation is zero, regardless of the actual error!
To compute the statistical error, we need to group measurements within a correlation time into bins and then estimate the variance of the bin averages: \begin{eqnarray} B_l = \frac{1}{N_0}\sum_{i=N_l}^{i<N_l+N_0} A_i\\ \sigma^2 = \frac{1}{M(M-1)}\sum_{j=0}^{M-1}(B_j-\overline{A})^2 \end{eqnarray} where $M$ is the number of bins and we used $\overline{A}=\overline{B}$. The correlation time (here denoted by $N_0$) is not easy to estimate a priori. Perhaps the best approach is to compute $\sigma^2$ for a few different bin sizes $N_0$: as long as $\sigma^2$ increases with $N_0$, the correlation time is still larger than $N_0$; when $\sigma^2$ stops changing with increasing $N_0$, we have reached the correlation time and $\sigma$ is a good estimate of the statistical error of $\overline{A}$.
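A binning analysis along these lines might look as follows (Python sketch; the correlated AR(1) test series is again an assumption in place of real MC measurements):

```python
import numpy as np

def binning_error(A, N0):
    """Error of the mean estimated from bins of length N0."""
    M = len(A) // N0                            # number of bins
    B = A[: M * N0].reshape(M, N0).mean(axis=1) # bin averages B_l
    # sigma^2 = (1/(M(M-1))) * sum_j (B_j - Abar)^2
    return np.sqrt(np.sum((B - B.mean()) ** 2) / (M * (M - 1)))

# correlated test series (AR(1), correlation time of roughly 20 steps)
rng = np.random.default_rng(0)
x = np.zeros(200000)
for i in range(1, len(x)):
    x[i] = 0.95 * x[i - 1] + rng.standard_normal()

for N0 in (1, 10, 100, 1000, 10000):
    print(f"N0 = {N0:6d}   sigma = {binning_error(x, N0):.5f}")
```

The $N_0=1$ row is exactly the naive estimate criticized above; the error grows with $N_0$ and saturates once $N_0$ exceeds the correlation time, which is the plateau one looks for.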