Objective

This document outlines the process required to check whether an assertion made during the Grenfell Tower Inquiry is reasonable. To do this, we need a logical statement or hypothesis that can be tested. There are two approaches that can be used: one via simulation and one via calculation. I will start with the former, as I have no idea as yet how to tackle the latter!

What is the likelihood that a random sample of 7 doors from a population of 126 doors would show no defects, given that the overall defect level is 64%?

Simulation

To simulate this situation, we will create 126 doors per trial, each randomly marked as compliant or defective, with any given door having a 0.64 chance of being defective. We will then check the first seven doors of each trial, over many trials, to see in what proportion of trials none of the seven has a defect.

I have drawn heavily from data scientist David Robinson’s screencasts, which are excellent.

library(tidyverse)

set.seed(20210601)

# create the simulation dataframe - 1,000,000 trials

doors <- crossing(trials = 1:1000000,
                  door_check = 1:126) %>% 
  mutate(compliant = sample(c(TRUE, FALSE),
                            size = n(),
                            replace = TRUE,
                            prob = c(0.36, 0.64)))

doors %>% 
  filter(door_check <= 7) %>% 
  group_by(trials) %>% 
  summarise(first_seven_compliant = all(compliant)) %>%
  count(first_seven_compliant) -> sim_counts

# proportion of all trials in which the first seven doors were compliant
sim_counts$n[2]/sum(sim_counts$n) -> sim_prob

In this simulation of a million trials, the first seven doors were all compliant roughly 800 times, so there is about a 0.08% chance of this happening. The actual value for this run was about 8.1 × 10^{-4}, but it will vary due to the random sampling process.
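Because only the first seven doors in each trial are inspected, the same estimate can be sketched without building all 126 doors per trial. The version below is a minimal sketch rather than the method used above: it assumes each inspected door is compliant independently with probability 0.36, and the names n_trials, compliant_count, sim_prob_light and expected_prob are introduced here purely for illustration.

```r
set.seed(20210601)

n_trials <- 1000000

# Number of compliant doors among the seven inspected, one value per trial.
# Assumption: each door is compliant independently with probability 0.36.
compliant_count <- rbinom(n_trials, size = 7, prob = 0.36)

# Proportion of trials in which all seven inspected doors were compliant
sim_prob_light <- mean(compliant_count == 7)

# Value that a simulation with independent doors is estimating
expected_prob <- 0.36^7

sim_prob_light
expected_prob
```

Under that independence assumption the simulated proportion should settle near 0.36^7, which is about 7.84 × 10^{-4}, with run-to-run variation from the random sampling.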

Now to tackle the calculation method!

Calculation

This problem is equivalent to drawing marbles from a bag without replacement.
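Written out, the chance that all seven sampled doors are compliant is a product of shrinking fractions. The symbols N (doors in the population) and K (compliant doors in the population) are notation introduced here for illustration, not taken from the Inquiry material:

```latex
P(\text{all 7 compliant})
  = \prod_{i=0}^{6} \frac{K - i}{N - i}
  = \frac{\binom{K}{7}}{\binom{N}{7}}
```

Each factor is the share of compliant doors left in the bag after the earlier compliant draws, which is why the probability falls a little with every successive door.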

# number of compliant doors in the population, rounded up
compliant <- ceiling(0.36 * 126)

# probability that seven successive draws without replacement are all compliant
prod((compliant - 0:6) / (126 - 0:6)) -> calc_prob

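The resultant answer is 6.3394388 × 10^{-4}, which is slightly less than the value of about 8.1 × 10^{-4} obtained via simulation. Looking back at the simulation attempt, it seems like it might be an overestimate of the probability, and I think this is because the simulation doesn’t account for the diminishing probability for each successive door, i.e. the “non-replacement” element of the draw. Each simulated door is compliant with a fixed probability of 0.36 regardless of the doors already drawn, so the simulation is really estimating 0.36^7, which is about 7.84 × 10^{-4}, rather than the without-replacement figure.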
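One way to test that explanation is to simulate the draw without replacement and compare it with the calculated value. The code below is a hedged sketch, not a definitive model: it assumes a fixed population of exactly 46 compliant and 80 defective doors (0.36 × 126 rounded up, as in the calculation), and the variable names are hypothetical. It also cross-checks the result against base R's hypergeometric density, dhyper().

```r
set.seed(20210601)

n_doors     <- 126
n_compliant <- ceiling(0.36 * n_doors)   # 46, same rounding as the calculation
n_sampled   <- 7
n_trials    <- 200000

# Assumed fixed population: TRUE = compliant door, FALSE = defective door
population <- rep(c(TRUE, FALSE),
                  times = c(n_compliant, n_doors - n_compliant))

# sample() draws without replacement by default, so each trial picks
# seven distinct doors from the same bag
all_compliant <- replicate(n_trials, all(sample(population, n_sampled)))
sim_prob_no_replace <- mean(all_compliant)

# Exact probability of drawing 7 compliant doors in 7 draws
exact_prob <- dhyper(n_sampled, n_compliant, n_doors - n_compliant, n_sampled)

sim_prob_no_replace
exact_prob
```

If the reasoning holds, the simulated proportion should sit close to the exact value, with some sampling noise, rather than near the higher figure produced by the independent-door simulation.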