Is it the end of the world if $\alpha=0.005$ is the new norm?

In their paper on redefining statistical significance, Benjamin et al. (2017) proposed changing the default P-value threshold for statistical significance from 0.05 to 0.005 for claims of new discoveries. That is, the proposed threshold is one tenth of the conventional one! Suppose the world did switch to 0.005. Would we need 10 times the sample? As researchers without sufficient funding, we care about how much additional sample we would need, supposing our hypothesis is true.

First, let’s review how sample size is calculated. It is a good refresher on basic concepts in probability theory and statistics.

How Is Sample Size Determined?

First, imagine that \(N\) participants are randomly assigned to two groups: a Treatment (T) group and a Control (C) group. Assume that each group has the same number of participants (\(N_T=N_C\)). The researcher wants to test whether the group means \(\mu_T\) and \(\mu_C\) differ. More specifically, the researcher is comparing two hypotheses:

\[H_0: \mu_T-\mu_C=\mu_0=0\] \[H_A: \mu_T-\mu_C =\mu_A \neq 0\]

The researcher is interested in the impact of the treatment on variable \(x\). Let \(X_T\) denote the sample average of the treatment group, and \(X_C\) the sample average of the control group. The significance level is defined as \(\alpha=Prob(Reject \quad H_0|H_0)\). For a one-sided test it can be expressed as \(\alpha = Prob(X_T-X_C\geq v|H_0)\), where \(v\) is the critical value; for a two-sided test, \(\frac{\alpha}{2} = Prob(X_T-X_C\geq v|H_0)\). That is, to reject the null hypothesis, the difference \(X_T-X_C\) must be large enough. Some simple algebra gives the critical value, where \(\sigma_N\) denotes the standard error of \(X_T-X_C\):

\[\frac{\alpha}{2}=Prob(X_T-X_C\geq v|H_0)\]
\[=1-Prob(X_T-X_C\leq v|H_0)\]
\[=1-Prob(\frac{X_T-X_C-\mu_0}{\sigma_N}\leq \frac{v-\mu_0}{\sigma_N}|H_0)\]
\[=1-Prob(\frac{X_T-X_C}{\sigma_N}\leq \frac{v}{\sigma_N}|H_0)\] (under the null hypothesis, \(\mu_0=0\))
\[=1-\phi(\frac{v}{\sigma_N})\] (\(\phi\) is the standard normal cumulative distribution function)

Let \(z_{1-\frac{\alpha}{2}}=\frac{v}{\sigma_N}\), and then \(v=z_{1-\frac{\alpha}{2}} \cdot \sigma_N\).
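To get a feel for how far the critical value moves between the two thresholds, here is a small R sketch that evaluates \(z_{1-\frac{\alpha}{2}}\) at both. The standard error `sigma_N = 0.05` is an arbitrary illustrative value, not something taken from the derivation.

```r
# Two-sided critical values on the standard normal scale
alpha <- c(0.05, 0.005)
z <- qnorm(1 - alpha / 2)
round(z, 3)
## [1] 1.960 2.807

# Critical value v = z * sigma_N for an assumed (illustrative) standard error
sigma_N <- 0.05
v <- z * sigma_N

# Ratio of the two thresholds: the bar rises by roughly 43%, not tenfold
z[2] / z[1]
```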

Power is defined as the probability of accepting the alternative hypothesis given that the alternative is true: \(1-\beta=Prob(Accept \quad H_A|H_A)=Prob(X_T-X_C\geq v|H_A)\). Under the alternative hypothesis, \(X_T-X_C \sim N(\mu_A,\sigma_N^2)\), so power can be expressed as:

\[1-\beta=1-Prob(X_T-X_C\leq z_{1-\frac{\alpha}{2}} \cdot \sigma_N |H_A)\]
\[=1-Prob(\frac{X_T-X_C-\mu_A}{\sigma_N}\leq \frac{z_{1-\frac{\alpha}{2}} \cdot \sigma_N-\mu_A}{\sigma_N}|H_A)\]
\[=1-\phi(\frac{z_{1-\frac{\alpha}{2}} \cdot \sigma_N -\mu_A}{\sigma_N})\]
\[=\phi(\frac{\mu_A}{\sigma_N}-z_{1-\frac{\alpha}{2}})\]

As a result, \(z_{1-\beta}=\frac{\mu_A}{\sigma_N}-z_{1-\frac{\alpha}{2}}\). Now suppose the variable of interest is binomially distributed, so that \(X_T\) and \(X_C\) are the expected proportions in each group. Then \(\sigma_N=\sqrt{X_T(1-X_T)/N_T + X_C(1-X_C)/N_C}\), and substituting \(\mu_A=X_T-X_C\) gives:

\[\frac{1}{\sqrt{X_T(1-X_T)/N_T + X_C(1-X_C)/N_C}}= \frac{z_{1-\beta}+z_{1-\frac{\alpha}{2}}}{X_T-X_C}\]

Setting \(N_T=N_C\) and solving for the sample size per group:

\[N_T=N_C=(X_T(1-X_T)+X_C(1-X_C))\cdot (\frac{z_{1-\beta}+z_{1-\frac{\alpha}{2}}}{X_T-X_C})^2\]

Hence, to calculate the sample size, we only need to specify the expected rate/proportion in the treatment group, the expected rate/proportion in the control group, the power, and the significance level. The code below calculates the per-group sample size for a binary outcome, assuming \(\alpha= .05\):

## Calculate Sample Size Based on Binary Outcome
SampleSize = function(PropTreat, PropCont, Power){
  N = ((PropTreat*(1-PropTreat)+PropCont*(1-PropCont))*
    ((qnorm(1-.05/2)+qnorm(1-(1-Power)))^2))/((PropTreat-PropCont)^2)
  return(ceiling(N))
}
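As a quick check of the formula, the sketch below exposes the significance level as an `alpha` argument and plugs the resulting sample size back into the power expression derived earlier. The names `sample_size_prop` and `approx_power` and the 50% vs. 40% scenario are introduced here for illustration only.

```r
# Per-group sample size for a two-sided test on two proportions.
# `alpha` is an argument here instead of being hard-coded (illustrative helper).
sample_size_prop <- function(p_treat, p_cont, power, alpha = 0.05) {
  z_sum <- qnorm(1 - alpha / 2) + qnorm(power)
  n <- (p_treat * (1 - p_treat) + p_cont * (1 - p_cont)) *
    (z_sum / (p_treat - p_cont))^2
  ceiling(n)
}

# Approximate power for a given per-group n: phi(mu_A / sigma_N - z)
approx_power <- function(p_treat, p_cont, n, alpha = 0.05) {
  sigma_n <- sqrt((p_treat * (1 - p_treat) + p_cont * (1 - p_cont)) / n)
  pnorm(abs(p_treat - p_cont) / sigma_n - qnorm(1 - alpha / 2))
}

n05 <- sample_size_prop(0.4, 0.5, power = 0.8)
n05
## [1] 385

# Plugging n back in should land just above the 0.8 target
approx_power(0.4, 0.5, n05)
```

Because the result is rounded up with `ceiling()`, the achieved power is slightly above the requested value rather than exactly equal to it.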

Using the same logic, we can get the formula for the sample size with a normally distributed outcome:

## Calculate Sample Size Based on Continuous Outcome
## Two-sided test at the 0.05 significance level, equal group sizes
## SDTreat and SDCont are the standard deviations in the treatment and control groups
## Returns the required sample size per group
SampleSizeM = function(MeanTreat, MeanCont, SDTreat, SDCont, Power){
  NTreat =(SDTreat^2+SDCont^2)*
    ((qnorm(1-.05/2)+qnorm(1-(1-Power)))/(MeanTreat-MeanCont))^2
  return(ceiling(NTreat))
}
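One way to sanity-check the continuous-outcome formula is to compare it with base R's `power.t.test()` from the `stats` package. The sketch below does that for made-up inputs (means of 10 and 8, a common standard deviation of 4); `sample_size_mean` is a hypothetical helper that mirrors the function above with `alpha` as an argument.

```r
# Normal-approximation sample size per group (illustrative helper)
sample_size_mean <- function(mean_treat, mean_cont, sd_treat, sd_cont,
                             power, alpha = 0.05) {
  z_sum <- qnorm(1 - alpha / 2) + qnorm(power)
  ceiling((sd_treat^2 + sd_cont^2) * (z_sum / (mean_treat - mean_cont))^2)
}

n_normal <- sample_size_mean(10, 8, 4, 4, power = 0.8)
n_normal
## [1] 63

# Base R's t-distribution-based calculation for the same two-sample design
n_t <- power.t.test(delta = 2, sd = 4, power = 0.8, sig.level = 0.05)$n
ceiling(n_t)
```

The t-based answer should come out slightly larger than the normal approximation, since it accounts for estimating the standard deviation from the sample.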

Is it the end of the world if \(\alpha = 0.005\)?

The original paper gives an answer to this question: the sample size only needs to increase by about 70%. Consider the following scenarios:

  • Suppose the rate of response in the control group is 50%, 40%, …, 10%;
  • Suppose the treatment reduces the rate by 2, 4, 6, 8, or 10 percentage points.

That is, if the rate of response in the control group is 50%, we look at the required sample size if the treatment reduces the rate to 48%, 46%, 44%, 42%, and 40%, and similarly for the other base rates. As a result, we get a 5 X 5 matrix of sample sizes. We compare the case where \(\alpha = 0.05\) with the case where \(\alpha = 0.005\), holding power at 80%.

SampleSize05 = function(PropTreat, PropCont, Power){
  N = ((PropTreat*(1-PropTreat)+PropCont*(1-PropCont))*
    ((qnorm(1-.05/2)+qnorm(1-(1-Power)))^2))/((PropTreat-PropCont)^2)
  return(ceiling(N))
}
SampleSize005 = function(PropTreat, PropCont, Power){
  N = ((PropTreat*(1-PropTreat)+PropCont*(1-PropCont))*
    ((qnorm(1-.005/2)+qnorm(1-(1-Power)))^2))/((PropTreat-PropCont)^2)
  return(ceiling(N))
}

pCon = c(0.5,0.4,0.3,0.2,0.1)
Reduced = c(0.02,0.04,0.06,0.08,0.1)
mSampleSiz05 = matrix(NA,nrow=5,ncol=5)
mSampleSiz005 = matrix(NA,nrow=5,ncol=5)
for (i in 1:5){
  for(j in 1:5){
    mSampleSiz05[i,j] = SampleSize05((pCon[i]-Reduced[j]),pCon[i],0.8)
    mSampleSiz005[i,j] = SampleSize005((pCon[i]-Reduced[j]),pCon[i],0.8)

  }
}
(Increased.Sample.Size = (mSampleSiz005-mSampleSiz05)/mSampleSiz05)
##           [,1]      [,2]      [,3]      [,4]      [,5]
## [1,] 0.6960424 0.6961145 0.6952909 0.6947195 0.6961039
## [2,] 0.6960249 0.6958406 0.6959526 0.6939502 0.6949153
## [3,] 0.6960505 0.6965552 0.6962617 0.6965812 0.6941581
## [4,] 0.6961564 0.6955017 0.6944444 0.6963190 0.6903553
## [5,] 0.6957334 0.6954103 0.6964286 0.6888889 0.6901408

As can be seen, all entries are very close to 70% across different base rates and different magnitudes of effect. Moving from 0.05 to 0.005 does not require 10 times the sample size.
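The near-constant figure is not a coincidence. In the sample size formula, the variance term and the effect size are the same under both thresholds, so they cancel in the ratio and only the normal quantiles remain. A minimal sketch of that ratio follows; the function name `size_ratio` and the extra power levels are chosen here for illustration.

```r
# Ratio of required sample sizes, alpha = 0.005 vs. alpha = 0.05.
# The variance and effect-size terms cancel, leaving only the quantiles.
size_ratio <- function(power, alpha_new = 0.005, alpha_old = 0.05) {
  ((qnorm(1 - alpha_new / 2) + qnorm(power)) /
     (qnorm(1 - alpha_old / 2) + qnorm(power)))^2
}

# About 0.696 at 80% power; the matrix differs only through ceiling() rounding
size_ratio(0.8) - 1

# The relative increase depends on the target power and shrinks as power grows
sapply(c(0.5, 0.8, 0.9), size_ratio) - 1
```

So the 70% figure is specific to 80% power: it holds across base rates and effect sizes, but not across power targets.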
