MaCh3 2.4.2 Reference Guide
It is good practice to run MCMC diagnostics before looking at posterior distributions, although in most cases you will only stumble on the need to diagnose after checking the posteriors. If your posterior looks like this, either you do not have enough steps or the chain is wrongly tuned.

Before discussing step-size tuning, we first need to understand how a step is proposed.
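A common proposal kernel for this kind of fit is a multivariate Gaussian random walk: the next point is drawn around the current one, with a covariance scaled by the global step scale. The sketch below is illustrative only and is not MaCh3 code; the function and variable names are made up for the example.

```python
import numpy as np

def propose_step(theta, chol_cov, step_scale, rng):
    """Gaussian random-walk proposal (illustrative sketch, not MaCh3 code).

    theta      : current parameter vector
    chol_cov   : Cholesky factor L of the proposal covariance (cov = L @ L.T)
    step_scale : global step scale multiplying the whole proposal
    """
    z = rng.standard_normal(len(theta))
    return theta + step_scale * chol_cov @ z

# Usage: 3 parameters with an identity proposal covariance for simplicity.
rng = np.random.default_rng(42)
L = np.linalg.cholesky(np.eye(3))
theta_new = propose_step(np.zeros(3), L, step_scale=0.5, rng=rng)
```

A well-known theoretical result (Roberts, Gelman, Gilks) motivates scaling the proposal covariance by 2.38^2/d for d parameters; the step-scale discussion later in this page starts from exactly that number.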
There are several plots worth studying for MCMC diagnostics; the executables which produce them can be found in the Diagnostics folder.
We can study the chain autocorrelation, which tells us how correlated individual steps are. To test it, we introduce the variable Lag(n) = corr(Xi, Xi−n), which tells us how correlated steps that are n steps apart are; the maximal considered lag here is n = 25000. Fig. shows the autocorrelations for the studied chains; we want our steps to be as random, and as little correlated, as possible so that the chain converges quickly. The rule of thumb is for the autocorrelation to drop below 0.2 by Lag(n = 10000). This isn't a strict criterion, so if the autocorrelation sometimes drops slightly more slowly than the blue line in our Figure, it's not a problem.
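As a rough sketch of how Lag(n) can be computed from a stored chain (plain NumPy, not the MaCh3 diagnostic executable; the names are illustrative):

```python
import numpy as np

def lag_autocorr(chain, n):
    """Autocorrelation Lag(n) = corr(X_i, X_{i-n}) for one parameter's chain."""
    if n == 0:
        return 1.0
    return float(np.corrcoef(chain[n:], chain[:-n])[0, 1])

# Usage: compare independent draws with a strongly autocorrelated AR(1) chain.
rng = np.random.default_rng(1)
white = rng.standard_normal(50000)   # independent samples: Lag(n) ~ 0
ar1 = np.zeros(50000)                # sticky chain: Lag(n) stays near 1
for i in range(1, 50000):
    ar1[i] = 0.999 * ar1[i - 1] + white[i]
```

A well-tuned chain behaves like the first case at large lags; a badly tuned one behaves like the second.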
Example of well-tuned step scale (colours represent chains which have different starting positions)

If your autocorrelation looks like this, though, you really should increase the step size. One exception would be a parameter which has no effect. Imagine you run an ND-only fit, but the parameter affects the FD only; then the autocorrelation is expected to look bad.

Sometimes it is difficult to gauge which configuration is better when there are of the order of hundreds of parameters. Therefore, it can be useful to look at the average autocorrelations, as in the plot below; then one can clearly see which configuration has the lowest autocorrelations:
In many cases (though not always), reducing autocorrelations can also decrease the acceptance rate, that is, how often a proposed step is accepted. While very low autocorrelations are generally desirable, achieving them at the cost of an acceptance rate of only a few percent is usually a sign of problems with the tuning. As a rule of thumb, we should aim for an acceptance rate between 10% and 30%, which provides a good balance between efficient exploration and stable sampling.
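The acceptance rate can be estimated directly from a stored chain: every rejected proposal repeats the previous value, so the fraction of transitions where the parameter value changed approximates how often steps were accepted. A minimal sketch (illustrative, not the MaCh3 diagnostic code):

```python
import numpy as np

def acceptance_rate(chain):
    """Fraction of transitions where the chain moved.

    Rejected Metropolis-Hastings proposals repeat the previous point,
    so counting value changes approximates the acceptance rate.
    """
    chain = np.asarray(chain, dtype=float)
    moved = np.abs(np.diff(chain)) > 0
    return float(moved.mean())
```

With continuous parameters a repeated value almost surely means a rejection, so this estimate is usually reliable; compare it against the 10–30% window quoted above.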
Fig. shows the trace, which is the value of a chosen parameter at each step. It can be seen that at first the chains have different traces, but after a thousand steps they start to stabilise and oscillate around a very similar value, indicating that the chain converged and a stationary state was achieved.

Fig. shows the mean value of the acceptance probability (A(θ′, θ)) in intervals of 5k steps (batched mean). This quantity is quite high at the beginning, indicating the chain hasn't converged yet. When the chain gets close to the stationary state, it starts to stabilise. The orange chain stabilised the fastest, while blue and green are slowly catching up, but the red one hasn't converged yet.
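Batched means are simple to compute: split the per-step acceptance probabilities into fixed-size batches and average each batch. A sketch assuming the per-step A(θ′, θ) values are available as an array (names illustrative):

```python
import numpy as np

def batched_means(values, batch_size=5000):
    """Mean of `values` over consecutive batches of `batch_size` steps.

    Trailing steps that do not fill a whole batch are dropped.
    """
    values = np.asarray(values, dtype=float)
    n_batches = len(values) // batch_size
    trimmed = values[: n_batches * batch_size]
    return trimmed.reshape(n_batches, batch_size).mean(axis=1)
```

Plotting the returned array against the batch index reproduces the kind of batched-mean curve described above.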

Usually, we run several chains which are later combined. There is a danger that not all chains will converge; using them would then bias the results. R-hat is meant to estimate whether the chains converged successfully or not. According to Gelman, you should calculate R-hat for at least 4 chains, and R-hat > 1.1 might indicate that the chains converged wrongly. Below you can find an example of chains which converged wrongly and ones which converged successfully.
Chains converged to different values

Successfully converged chains
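The basic Gelman–Rubin statistic compares the variance between chains with the variance within each chain; when the chains sample the same distribution the two agree and R-hat is close to 1. A sketch of the formula for one parameter (plain version, without the split-chain refinement; not the MaCh3 implementation):

```python
import numpy as np

def gelman_rubin(chains):
    """R-hat for one parameter across several chains.

    `chains` is an (m, n) array: m chains with n steps each.
    """
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    W = chains.var(axis=1, ddof=1).mean()   # within-chain variance
    B = n * chain_means.var(ddof=1)         # between-chain variance
    var_plus = (n - 1) / n * W + B / n      # pooled variance estimate
    return float(np.sqrt(var_plus / W))
```

A chain stuck around a different value inflates the between-chain variance and pushes R-hat well above the 1.1 threshold quoted above.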

The Geweke diagnostic helps to decide what the burn-in should be. In this case you should select a burn-in of around 15%, as this is where the distribution stabilises.
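The Geweke diagnostic compares the mean of an early segment of the chain with the mean of a late segment; once the z-score of the difference settles near zero as a function of the discarded fraction, the remaining chain is consistent with stationarity. A simplified sketch (it ignores the spectral-density correction for autocorrelated samples that the full diagnostic uses; names are illustrative):

```python
import numpy as np

def geweke_z(chain, first=0.1, last=0.5):
    """Z-score comparing the first `first` fraction with the last `last` fraction.

    Simplified: plain sample variances, so it is only reliable for chains
    with modest autocorrelation (or after thinning).
    """
    chain = np.asarray(chain, dtype=float)
    a = chain[: int(first * len(chain))]
    b = chain[-int(last * len(chain)):]
    return float((a.mean() - b.mean())
                 / np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b)))
```

Evaluating this after discarding increasing fractions of the chain, and picking the point where |z| becomes small and stable, gives a burn-in estimate like the ~15% quoted above.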
According to [2] (Section 3.2.1), the global step size should be equal to 2.38^2/N_params. Keep in mind this refers to the global step scale for a given covariance object, such as the xsec covariance or the detector covariance.
This procedure is very tedious and requires intuition about how a given parameter behaves. It is a bit of dark magic; however, skilled users should be able to tune it relatively fast compared with non-skilled users. The process is as follows: run the chain, run the diagnostic executable, look at the plots, adjust the step scale, then run the fit again, and repeat. Each time you should look at the autocorrelations, traces, etc. (see the discussion above). Another important trick is not to run the full fit: instead of running a 10M-step chain, you might run 200k steps. The number of steps depends on the number of parameters and the Lag(n = ?) you are interested in.
There are a few things you should be aware of when tuning:
The last point is that a data fit may require different tuning than the Asimov fit. Still, if you tune for Asimov, it should be easy to re-tune for a data fit.
[1] Kamil Skwarczynski PhD thesis
[2] https://asp-eurasipjournals.springeropen.com/track/pdf/10.1186/s13634-020-00675-6.pdf
If you have complaints, blame: Kamil Skwarczynski