Introduction to Bayesian Parameter Estimation and Censored Data (Theory)

What is Bayes' Theorem?

Bayes' theorem provides a flexible framework for parameter estimation that starts from simple principles. This is appealing for people who would rather derive an estimate from first principles than memorize formulas.

Model building starts from the following formula, called Bayes' theorem, much as classical mechanics in physics starts from Newton's laws.

 p(M|D) = \frac{p(D|M) p(M)}{p(D)}

In this formula the posterior, the p(M|D) term, is the probability of our model M given the data D. The posterior is proportional to the prior belief in the model (the p(M) term) multiplied by the likelihood of the data given our model (the p(D|M) term); the p(D) term in the denominator is a normalizing constant that does not depend on the model. From this equation we can derive ordinary least squares regression, which is familiar to anyone who has analyzed a simple data set, given the following 3 assumptions: the errors in our model, otherwise known as the residuals, follow a Normal (Gaussian) distribution; the prior belief in our model is uniform; and the elements of our data are statistically independent. Typically it is also assumed that the variance in the data set is fixed across measurements, but in this Bayesian framework that assumption can be relaxed.
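
As a sketch of that derivation, assume (this notation is introduced here for illustration only) that the model predicts a value f(x_i;\theta) for each data point (x_i, y_i) from parameters \theta, and that the residuals share a fixed standard deviation \sigma. With independent Gaussian residuals and a uniform prior, the posterior is

 p(\theta|D) \propto \prod_{i=1}^N \frac{1}{\sigma\sqrt{2\pi}} \exp{\frac{-(y_i - f(x_i;\theta))^2}{2\sigma^2}}

Taking the log turns the product into a sum.

 \log{p(\theta|D)} = -\frac{1}{2\sigma^2}\sum_{i=1}^N (y_i - f(x_i;\theta))^2 + \text{const}

Maximizing the posterior over \theta is therefore the same as minimizing the sum of squared residuals, which is the ordinary least squares criterion.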

What does one mean by censored data?

In this post we explore the details of applying this Bayesian framework to the problem of estimating the mean of a distribution when we can only obtain samples that are below a given threshold. In an upcoming post we will apply the theoretical results obtained here to the following mean weight estimation problem: what is the average weight of males in the US, given that our only scale maxes out at 190 pounds? To answer this, one can model the data using the following Bayesian logic, assuming that male weight is distributed like a bell curve. Not being able to measure values above a certain threshold is known as right censoring. (Strictly speaking, when readings above the threshold are dropped from the sample altogether rather than recorded at the limit, the usual term is right truncation, and that is the case the likelihood derived below describes.)

Assuming that the mean US male weight is 187.5 pounds with a standard deviation of 25 pounds, the goal will be to estimate the mean male weight from 1000 measurements taken on our scale, which maxes out at 190 pounds. To apply Bayes' theorem, the likelihood of a single measurement must be determined.

p(weight_i | (weight_i < \lambda) , \mu , \sigma)

Here \lambda is the scale maximum, which will be 190 pounds, and weight_i is the ith measurement obtained from our scale. For the case where our scale has effectively no maximum value, this function is simply a Normal (Gaussian) function.
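
To make the setup concrete, here is a minimal Python sketch of how such measurements could be simulated. It assumes that a reading at or above the scale maximum is simply discarded and another person is weighed; the function name, the fixed seed and that discard rule are illustrative assumptions, not something specified above.

```python
import random
import statistics

MU_TRUE = 187.5  # mean stated in the text (pounds)
SIGMA = 25.0     # standard deviation stated in the text (pounds)
LAMBDA = 190.0   # scale maximum (pounds)
N = 1000         # number of usable measurements


def sample_below_limit(n, mu, sigma, limit, seed=0):
    """Draw Normal(mu, sigma) weights and keep only those below the limit."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    kept = []
    while len(kept) < n:
        w = rng.gauss(mu, sigma)
        if w < limit:  # assumption: readings at or above the limit are lost
            kept.append(w)
    return kept


weights = sample_below_limit(N, MU_TRUE, SIGMA, LAMBDA)
naive_mean = statistics.fmean(weights)
print(len(weights), round(max(weights), 1), round(naive_mean, 1))
```

The plain average of the kept readings comes out well below 187.5 pounds, because the heaviest people never make it into the sample. That bias is what the likelihood has to correct for.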

A simple discrete example for calculating conditional probabilities

To obtain the form of the likelihood we will start with a simple discrete example: rolling a loaded die with the following probabilities for each value.

  1. p(X = 1) = \frac{1}{4}
  2. p(X = 2) = \frac{1}{8}
  3. p(X = 3) = \frac{1}{6}
  4. p(X = 4) = \frac{1}{8}
  5. p(X = 5) = \frac{1}{6}
  6. p(X = 6) = \frac{1}{6}

We will calculate the probability of rolling a 1 given that the roll is below 4, written p(X=1|X<4). The total probability over all possible rolls is equal to 1.

\sum_i p(X=i) = 1

Similarly, the conditional probabilities of all the cases that satisfy the constraint X<4 must sum to 1. The rolls satisfying this constraint are the values 1, 2 and 3.

 p(X=1|X<4) +p(X=2|X<4) +p(X=3|X<4) = 1

The probability of X<4 can be easily calculated as the sum of the unconditional probabilities for 1, 2 and 3. This value of p(X<4) is \frac{13}{24}. Intuitively, the conditional probability of interest will be proportional to the unconditional probability p(X=1), since conditioning on X<4 rules out the rolls 4, 5 and 6 without changing the relative odds of the remaining rolls. This proportionality, along with the constraint that the conditional probabilities are normalized, leads to the following conditional probability for p(X=1|X<4).

 p(X=1|X<4) = \frac{p(X=1)}{p(X<4)} = \frac{p(X=1)}{p(X=1)+p(X=2)+p(X=3)}

 p(X=1|X<4) = \frac{\frac{1}{4}}{\frac{13}{24}}=\frac{6}{13}
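
The arithmetic can be checked with exact fractions. This is a small Python sketch; the helper name conditional_below is made up for this example.

```python
from fractions import Fraction

# Loaded die probabilities from the list above.
p = {
    1: Fraction(1, 4),
    2: Fraction(1, 8),
    3: Fraction(1, 6),
    4: Fraction(1, 8),
    5: Fraction(1, 6),
    6: Fraction(1, 6),
}


def conditional_below(p, value, threshold):
    """Return p(X = value | X < threshold) for a discrete distribution p."""
    if value >= threshold:
        return Fraction(0)  # excluded by the condition
    p_below = sum(prob for face, prob in p.items() if face < threshold)
    return p[value] / p_below


total = sum(p.values())
p_below_4 = sum(prob for face, prob in p.items() if face < 4)
print(total)                       # 1
print(p_below_4)                   # 13/24
print(conditional_below(p, 1, 4))  # 6/13
```

Summing conditional_below over the rolls 1, 2 and 3 gives exactly 1, which is the normalization constraint used above.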

Applying Bayes' theorem to parameter estimation: What is the mean of the underlying distribution?

Applying the form obtained in the discrete example to the continuous case leads to the following for a limiting value \lambda, which for our example of the weight-limited scale is 190 pounds. The expression holds for x<\lambda; the density is zero otherwise.

p(x|x<\lambda) = \frac{p(x)}{\int_{-\infty}^{\lambda} p(u)du}

Now we apply this formula to our weight example where p(x_i) for the ith measurement is given by a Normal (Gaussian) function.

Let us define the cumulative distribution function for a standard normal distribution (\mu=0,\sigma=1) as \Phi(x).

p(x|(x<\lambda),\mu,\sigma) = \frac{p(x|\mu,\sigma)}{\Phi(\frac{\lambda-\mu}{\sigma})}=\frac{\frac{\exp{\frac{-(x-\mu)^2}{2\sigma^2}}}{\sigma \sqrt{2\pi}}}{\int_{-\infty}^{\lambda}\frac{\exp{\frac{-(t-\mu)^2}{2\sigma^2}}}{\sigma \sqrt{2\pi}}dt}
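
As a quick sanity check on this likelihood, the Python sketch below evaluates it with the standard library's NormalDist and confirms numerically that it integrates to 1 below \lambda. The function names and the midpoint-rule integrator are illustrative choices, and the parameter values are the ones assumed in the weight example.

```python
from statistics import NormalDist


def truncated_normal_pdf(x, mu, sigma, limit):
    """Density of a Normal(mu, sigma) measurement given that it is below limit."""
    if x >= limit:
        return 0.0  # values at or above the limit are never observed
    dist = NormalDist(mu, sigma)
    # dist.cdf(limit) equals Phi((limit - mu) / sigma), the normalizing term.
    return dist.pdf(x) / dist.cdf(limit)


def midpoint_integral(f, a, b, steps=20000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / steps
    return h * sum(f(a + (k + 0.5) * h) for k in range(steps))


mu, sigma, limit = 187.5, 25.0, 190.0
area = midpoint_integral(
    lambda x: truncated_normal_pdf(x, mu, sigma, limit),
    mu - 10 * sigma,  # the probability mass below this point is negligible
    limit,
)
# Ratio of the conditional density to the unconditional one at 180 pounds.
ratio = truncated_normal_pdf(180.0, mu, sigma, limit) / NormalDist(mu, sigma).pdf(180.0)
print(area, ratio)
```

The ratio is the same for every x below the limit and equals 1/\Phi(\frac{\lambda-\mu}{\sigma}), mirroring the division by p(X<4) in the die example.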

If we apply Bayes' theorem with a uniform prior distribution, we obtain the following posterior for a complete data set of N statistically independent measurements X = (x_1, \ldots, x_N). Each measurement contributes its own normalizing factor, so the \Phi term appears raised to the power N.

p(\mu|X,\sigma) \propto \frac{p(X|\mu,\sigma)}{\Phi(\frac{\lambda-\mu}{\sigma})^N}=\frac{\prod_{i=1}^N p(x_i|\mu,\sigma)}{\Phi(\frac{\lambda-\mu}{\sigma})^N}

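We will take the log of this posterior and find the value of the mean \mu which has the maximum probability given the data. Because the prior is uniform, the log posterior equals the log likelihood up to an additive constant, so maximizing one maximizes the other.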

L = \log{p(\mu|X,\sigma)} = (-1) \cdot \sum_{i=1}^N[ \frac{(x_i-\mu)^2}{2\sigma^2} + \log{(\sigma \sqrt{2\pi})} + \log{\Phi(\frac{\lambda-\mu}{\sigma})} ] + \text{const}

L = \log{p(\mu|X,\sigma)} = (-1) \cdot( N\log{\Phi(\frac{\lambda-\mu}{\sigma})} +\sum_{i=1}^N [\frac{(x_i-\mu)^2}{2\sigma^2} + \log{(\sigma \sqrt{2\pi})}] ) + \text{const}

The most likely value of the mean, denoted \mu_0, satisfies the following equation.

\frac{dL}{d\mu}|_{\mu=\mu_0} = 0
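
As a rough sketch of how this condition can be solved (the upcoming post may well take a different route), differentiating L with respect to \mu, and writing \phi for the standard normal density and \bar{x} for the sample mean, gives

 \frac{dL}{d\mu} = \frac{N}{\sigma^2}[\bar{x} - \mu + \sigma \frac{\phi(\frac{\lambda-\mu}{\sigma})}{\Phi(\frac{\lambda-\mu}{\sigma})}]

The bracket changes sign only once as \mu increases, so a simple bisection finds \mu_0. The Python below uses only the standard library. The function names, the fixed seed and the simulation that discards readings at or above the limit are assumptions made for illustration, and \sigma is treated as known.

```python
import math
import random
from statistics import NormalDist, fmean

SIGMA = 25.0    # assumed known, as in the text
LAMBDA = 190.0  # scale maximum (pounds)
STD_NORMAL = NormalDist()  # standard normal: pdf is phi, cdf is Phi


def simulate(n, mu, sigma, limit, seed=0):
    """Draw Normal(mu, sigma) values, keeping only those below the limit."""
    rng = random.Random(seed)
    kept = []
    while len(kept) < n:
        w = rng.gauss(mu, sigma)
        if w < limit:
            kept.append(w)
    return kept


def log_posterior(mu, data, sigma, limit):
    """Log posterior of mu up to an additive constant (uniform prior)."""
    n = len(data)
    z = (limit - mu) / sigma
    squares = sum((x - mu) ** 2 for x in data) / (2 * sigma ** 2)
    return -(n * math.log(STD_NORMAL.cdf(z)) + squares
             + n * math.log(sigma * math.sqrt(2 * math.pi)))


def score(mu, xbar, sigma, limit):
    """dL/dmu multiplied by sigma**2 / N; it is zero at mu_0."""
    z = (limit - mu) / sigma
    return xbar - mu + sigma * STD_NORMAL.pdf(z) / STD_NORMAL.cdf(z)


def most_likely_mean(data, sigma, limit, tol=1e-9):
    """Solve score(mu) = 0 by bisection; the score is positive at xbar."""
    xbar = fmean(data)
    lo, hi = xbar, xbar + sigma
    while score(hi, xbar, sigma, limit) > 0:  # widen until the sign changes
        hi += sigma
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid, xbar, sigma, limit) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


weights = simulate(1000, 187.5, SIGMA, LAMBDA)
mu_hat = most_likely_mean(weights, SIGMA, LAMBDA)
print(round(fmean(weights), 1), round(mu_hat, 1))
```

The first number printed is the naive sample mean, which sits well below the simulated mean because the heavy readings are missing. The second is the estimate that satisfies the condition above.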

We now have the mathematical expression for the log likelihood along with the condition that determines the estimate of the mean \mu with maximum probability, which is therefore the most likely value. In an upcoming post we will implement this theoretical result in Python to estimate the mean male weight from 1000 measurements taken on our scale that maxes out at 190 pounds, using data simulated under the assumption that the mean US male weight is 187.5 pounds with a standard deviation of 25 pounds.

--j
