# A statistical analysis of a malfunctioning hot water heater

Our shower experiment follows a binomial distribution with a known parameter N, the total numbers of showers taken, and an unknown parameter p, the probability of the water turning cold during a single shower. We do not know the exact value of the parameter p, but we can just assume that it has a certain, arbitrary value p_{0} and see what would happen if that was the true value.

So let us take a look at the resulting probability mass functions for p_{0} = 0.1, 0.2, 0.3 and 0.4. Let us also assume that we take 100 showers, because it leads to smoother PMFs and thus makes it easier to explain the principle:

# example distributoins p0 = [0.1, 0.2, 0.3, 0.4] N0 = 100 # PMF fig, ax = plt.subplots(1,1) ax.grid() ax.set_xlabel('bad showers $N_b$') ax.set_ylabel('probability $P(N_b)$') ax.set_title('Probability mass function - $N = %d$' % (N0)) for p in p0: ax.step(range(60), binom(N0, p).pmf(range(60)), where='mid', label='$p_0 = %.1f$' % p) ax.legend(loc='best') fig.savefig('Plots/PMF.png')

These are the probabilities of getting a certain result if we were to repeat the experiment. Now we want to create an algorithm that takes an experimental result and yields an interval for p that includes the true value in 95% of all the possible results. One possible algorithm is the following:

- We calculate the PMF of every possible p
_{0}, i.e. on the interval [0,1]. - On each PMF we define a range of results, that would include 95% of all experimental outcomes, if that p
_{0}was the true parameter p. - If our experimental result lies within that range, we include that p
_{0}in our confidence interval, otherwise we exclude it.

This might seem like a kind of roundabout way to do this, but it does exactly what we want. One of the tested p_{0} *is* the true value of p. *Only that one* PMF is “real”, meaning that the experimental results follow that PMF. For the 95% of outcomes that lie within our defined range of that PMF, we include that p_{0} in our confidence interval. The remaining 5% of possible outcomes lie outside that interval, which means we would exclude that p_{0} from our confidence interval. So even if we do not know the true value of p, this algorithm includes it in the interval 95% of the times we do this experiment.

This method is pretty universal and leaves us with another choice to make: How do we choose the 95% of each PMF? There are three very common ways to do this, each with their own use:

- Cut away (up to) 5% on the right.
- Cut away (up to) 5% on the left.
- Cut away (up to) 2.5% on the right and (up to) 2.5% on the left.

All these “up to”s are necessary, because we are dealing with discrete random variables here. You can only cut away whole numbers and the sum of those probabilities usually does not add up to exactly 5%. So you cut away a bit less than that, which makes sure that the probability of producing a confidence interval that includes the true value is *at least* 95%. Applying those possible selections to our example distributions, we get the following picture:

# possible intervals colors = ['b', 'g', 'r', 'c'] fig, ax = plt.subplots(3,1, sharex=True) ax[2].set_xlabel('bad showers $N_b$') ax[1].set_ylabel('probability $P(N_b)$') ax[0].set_title('Cut right tail') ax[0].grid() for p, c in zip(p0, colors): bino = binom(N0, p) ax[0].step(range(60), bino.pmf(range(60)), where='mid', color=c, label='$p_0 = %.1f$' % p) Nmin = 1 Nmax = bino.ppf(0.95) NN = Nmax-Nmin+1 # we have to do some trickery, since matplotlib does not support filled step plots Nrange = linspace(Nmin, Nmax, NN) stepx = reshape(vstack((Nrange-0.5, Nrange+0.5)), 2*NN, order='F') stepy = reshape(vstack((bino.pmf(Nrange), bino.pmf(Nrange))), 2*NN, order='F') ax[0].fill_between(stepx, 0, stepy, color=c, alpha=0.5) ax[1].set_title('Cut left tail') ax[1].grid() for p, c in zip(p0, colors): bino = binom(N0, p) ax[1].step(range(60), bino.pmf(range(60)), where='mid', color=c, label='$p_0 = %.1f$' % p) Nmin = bino.isf(0.95) Nmax = 59 NN = Nmax-Nmin+1 # we have to do some trickery, since matplotlib does not support filled step plots Nrange = linspace(Nmin, Nmax, NN) stepx = reshape(vstack((Nrange-0.5, Nrange+0.5)), 2*NN, order='F') stepy = reshape(vstack((bino.pmf(Nrange), bino.pmf(Nrange))), 2*NN, order='F') ax[1].fill_between(stepx, 0, stepy, color=c, alpha=0.5) ax[2].set_title('Cut both tails') ax[2].grid() for p, c in zip(p0, colors): bino = binom(N0, p) ax[2].step(range(60), bino.pmf(range(60)), where='mid', color=c, label='$p_0 = %.1f$' % p) Nmin, Nmax = bino.interval(0.95) NN = Nmax-Nmin+1 # we have to do some trickery, since matplotlib does not support filled step plots Nrange = linspace(Nmin, Nmax, NN) stepx = reshape(vstack((Nrange-0.5, Nrange+0.5)), 2*NN, order='F') stepy = reshape(vstack((bino.pmf(Nrange), bino.pmf(Nrange))), 2*NN, order='F') ax[2].fill_between(stepx, 0, stepy, color=c, alpha=0.5) fig.savefig('Plots/intervals.png')

Let us assume that we measured 21 bad showers out of the 100 showers we took. Looking at the plots above, we can see that if we choose to cut the right tail of the accepted ranges, we can exclude p_{0} = 0.1, while the measured value is in the accepted range of 0.2, 0.3 and 0.4. Please note that the PMFs extend all the way to both sides, even though the probabilities might get so small that you cannot see them. If we cut away the left tail, we can exclude 0.3 and 0.4, while 0.1 and 0.2 would end up inside our confidence interval. And if we cut both tails, we exclude 0.1 and 0.4, while 0.2 and 0.3 end up in the confidence interval.

This should demonstrate the different applications of the three methods:

- Cutting the right tail, lets you find a lower limit for your parameter, since you can only exclude PMFs (and their p
_{0}) that are “left” of your measured value. - Cutting the left tail, conversely only lets you find an upper limit for the parameter, since you can only exclude PMFs that are “right” of your measured value.
- Cutting on both sides, lets you find an upper
*and*a lower limit for the parameter, but those limits are weaker than the single limits of cutting just left or just right. This is due to the fact that you are basically making two statements: The true value is above the lower limit and the true value is below the upper limit. If you want to be right with*both*statements 95% of the time, you have to be more careful about each single statement.

Next up: Putting everything we learned together to actually calculate the confidence intervals for our measurement.