We will consider the predictive distributions of the number of events $Z$. The prior predictive distribution of $Z$ is given by:
$$ p(Z) = \int_{\vec{\theta}} p(\vec{\theta}) \, p(Z \mid \vec{\theta}) \, d\vec{\theta}, $$
where $p(\vec{\theta})$ is the same prior probability as in Bayes' theorem. In practice, we obtain the prior predictive distribution by repeatedly drawing parameter values $\vec{\theta}$ from the prior, building the corresponding prediction, and sampling the number of events from $p(Z \mid \vec{\theta})$.
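As an illustration only, the following self-contained C++ toy sketches this sampling for a single normalisation parameter. It is not the MaCh3 API: the Gaussian prior, its 14% width (chosen to mirror the prior relative error quoted below), and the nominal event count are all made-up stand-ins.

```cpp
// Toy sketch of sampling the prior predictive p(Z): draw theta from the
// prior, build the prediction mu(theta), then draw Z ~ Poisson(mu(theta)).
#include <algorithm>
#include <iostream>
#include <random>
#include <vector>

int main() {
  std::mt19937 rng(42);
  // Hypothetical prior: one normalisation parameter theta ~ N(1, 0.14).
  std::normal_distribution<double> prior(1.0, 0.14);
  const double nominalEvents = 1000.0;  // made-up nominal MC event count

  std::vector<int> priorPredictive;
  priorPredictive.reserve(10000);
  for (int i = 0; i < 10000; ++i) {
    const double theta = prior(rng);                    // theta ~ p(theta)
    const double mu = std::max(nominalEvents * theta, 1e-9);  // prediction
    std::poisson_distribution<int> pZ(mu);              // p(Z | theta)
    priorPredictive.push_back(pZ(rng));                 // one draw of Z
  }
  // Histogramming priorPredictive approximates p(Z).
  std::cout << "First prior-predictive draw: " << priorPredictive.front() << "\n";
  return 0;
}
```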
After the fit, we can predict the observable $Z_{pred}$, which estimates the expected measurement of the same physical process as $Z$. This is known as the posterior predictive distribution $p(Z_{pred} \mid Z)$ and can be expressed as
$$ p(Z_{pred} \mid Z) = \int_{\vec{\theta}} p(Z_{pred} \mid \vec{\theta}) \, p(\vec{\theta} \mid Z) \, d\vec{\theta}, $$
where $p(\vec{\theta} \mid Z)$ is the posterior probability. We obtain the posterior predictive distribution in the same way as the prior predictive one, except that the parameter values are drawn from the post-burn-in MCMC chain rather than from the prior, as in the sketch below.
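A toy version of this substitution, again not MaCh3 code: the "chain" here is faked with a narrow Gaussian whose 0.7% width mirrors the postfit relative error quoted below, whereas in a real analysis it would be the stored post-burn-in MCMC steps.

```cpp
// Toy sketch of the posterior predictive: identical to the prior
// predictive loop, but theta is drawn from stored post-burn-in MCMC
// steps instead of the prior.
#include <iostream>
#include <random>
#include <vector>

int main() {
  std::mt19937 rng(7);
  // Hypothetical stand-in for the post-burn-in chain: N(1, 0.007).
  std::normal_distribution<double> fakePosterior(1.0, 0.007);
  std::vector<double> chain(50000);
  for (double& step : chain) step = fakePosterior(rng);

  const double nominalEvents = 1000.0;  // made-up nominal MC event count
  std::uniform_int_distribution<std::size_t> pick(0, chain.size() - 1);

  std::vector<int> posteriorPredictive;
  posteriorPredictive.reserve(10000);
  for (int i = 0; i < 10000; ++i) {
    const double theta = chain[pick(rng)];  // random post-burn-in step
    std::poisson_distribution<int> pZ(nominalEvents * theta);
    posteriorPredictive.push_back(pZ(rng)); // one draw of Z_pred
  }
  std::cout << "First posterior-predictive draw: " << posteriorPredictive.front() << "\n";
  return 0;
}
```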
The figure shows the prior and posterior predictive distributions of the number of events. The prior predictive distribution is very wide compared to the posterior predictive one: the relative error on the number of events for this sample is reduced from 14% to 0.7%, which demonstrates how the fit can reduce uncertainties.
The Bayesian posterior predictive $p$-value estimates how likely we would be to observe data described by our postfit model if we were to take the same amount of data again. It is therefore a much more "demanding" $p$-value test than the frequentist $p$-value, which uses the larger prior parameter phase space.
Firstly, we use the ensemble of parameter values explored by the MCMC once the stationary state has been reached. We draw parameter values from a random MCMC step after the burn-in stage and build the predictions for each sample (by reweighting the nominal MC to the drawn parameter values). Then, we statistically fluctuate the drawn prediction by applying Poissonian smearing to each bin. Afterwards, for each sample, we calculate the -2LLH between the drawn prediction and its statistical fluctuation, -2LLH(Draw Fluc, Draw), and similarly between the drawn prediction and the data distribution, -2LLH(Data, Draw). We repeat this process a few thousand times; a sketch of the calculation follows below. An example of -2LLH(Data, Draw) vs. -2LLH(Draw Fluc, Draw) is shown below.
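A self-contained toy version of this procedure, assuming a binned Poisson -2LLH of the Baker-Cousins form (the stand-in chain, the binning, and the event counts are all invented for illustration; this is not MaCh3 code):

```cpp
// Toy posterior predictive p-value machinery: for each throw, reweight
// to a drawn parameter value, Poisson-fluctuate the prediction per bin,
// and record -2LLH(Draw Fluc, Draw) and -2LLH(Data, Draw).
#include <cmath>
#include <iostream>
#include <random>
#include <vector>

// Binned Poisson -2LLH (Baker-Cousins form), summed over bins:
// 2 * sum_i [ mu_i - n_i + n_i * ln(n_i / mu_i) ].
double Minus2LLH(const std::vector<double>& observed,
                 const std::vector<double>& expected) {
  double llh = 0.0;
  for (std::size_t i = 0; i < observed.size(); ++i) {
    const double n = observed[i], mu = expected[i];
    llh += 2.0 * (mu - n);
    if (n > 0.0) llh += 2.0 * n * std::log(n / mu);
  }
  return llh;
}

int main() {
  std::mt19937 rng(1);
  const std::vector<double> nominal = {120.0, 340.0, 290.0, 150.0};  // toy MC
  const std::vector<double> data    = {118.0, 355.0, 281.0, 146.0};  // toy data
  std::normal_distribution<double> fakePosterior(1.0, 0.03);  // stand-in chain

  std::vector<double> llhDrawFluc, llhData;
  for (int t = 0; t < 5000; ++t) {
    const double theta = fakePosterior(rng);  // "random MCMC step"
    std::vector<double> draw(nominal.size()), fluc(nominal.size());
    for (std::size_t i = 0; i < nominal.size(); ++i) {
      draw[i] = nominal[i] * theta;  // reweighted prediction
      fluc[i] = std::poisson_distribution<int>(draw[i])(rng);  // smearing
    }
    llhDrawFluc.push_back(Minus2LLH(fluc, draw));  // -2LLH(Draw Fluc, Draw)
    llhData.push_back(Minus2LLH(data, draw));      // -2LLH(Data, Draw)
  }
  std::cout << "Throws stored: " << llhDrawFluc.size() << "\n";
  return 0;
}
```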
We identify two methods for calculating the $p$-value, which differ in which distribution is statistically fluctuated. The first method fluctuates the prediction from each draw, giving -2LLH(Draw Fluc, Draw); the other uses the prediction averaged over all draws and its statistical fluctuation, giving -2LLH(Pred Fluc, Draw). On average, we expect the $p$-value from the second method to be better.
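Given the two vectors of -2LLH values from the sketch above, the $p$-value is commonly taken as the fraction of throws lying above the diagonal of the 2D -2LLH scatter plot, i.e. throws in which the fluctuation fits the draw worse than the data does. A toy helper for the first method (again an illustration, not MaCh3 code):

```cpp
// p-value from stored -2LLH pairs: the fraction of throws in which
// -2LLH(Fluc, Draw) exceeds -2LLH(Data, Draw), i.e. points above the
// y = x diagonal in the 2D -2LLH scatter plot.
#include <cstddef>
#include <vector>

double PosteriorPredictivePValue(const std::vector<double>& llhFluc,
                                 const std::vector<double>& llhData) {
  std::size_t above = 0;
  for (std::size_t i = 0; i < llhFluc.size(); ++i)
    if (llhFluc[i] > llhData[i]) ++above;
  return static_cast<double>(above) / static_cast<double>(llhFluc.size());
}
```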