• Question: When conducting interim analyses, how do you use statistical methods to adjust for potential multiple testing issues and ensure that the trial's overall type I error rate is controlled, while still maintaining sufficient power to detect meaningful treatment effects?

    Asked by owen on 28 May 2025.

      Connor Fitchett answered on 28 May 2025:


      It depends a lot on what and how we’re testing.

      Ideally, we do this via exact calculation. For example, suppose I measure height and weight at the same time, and I want to test whether the height and/or weight of the English are significantly different from the Scottish. Then I’m going to have two (approximately) normal test statistics, and it’s possible to work out the probability of a type I error exactly and increase the sample size until we get the power we want (assuming we either assume independence or know the correlation between height and weight).
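
      As a rough illustration (my own sketch, not part of the answer above; the correlation and significance level are made-up numbers), the exact calculation for two correlated, approximately normal test statistics can be done from the bivariate normal distribution, here using the mvtnorm package in R:

        # Family-wise type I error for two two-sided tests whose statistics are
        # correlated. Illustrative values only.
        library(mvtnorm)

        alpha_per_test <- 0.025                     # level of each individual test
        rho            <- 0.4                       # assumed correlation (height vs weight)
        z_crit         <- qnorm(1 - alpha_per_test / 2)

        # Probability that neither test rejects under the null: both |Z| < z_crit
        corr_mat    <- matrix(c(1, rho, rho, 1), nrow = 2)
        p_no_reject <- pmvnorm(lower = c(-z_crit, -z_crit),
                               upper = c( z_crit,  z_crit),
                               corr  = corr_mat)

        1 - as.numeric(p_no_reject)                 # family-wise type I error

      Under independence this is just 1 - (1 - 0.025)^2, and you can then tune the per-test level (and the sample size, for power) until the overall error rate is the 5% you want.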

      Things get more difficult when we have a more complex test. For example, let’s say we’re running a platform trial: a clinical trial with lots of treatments being tested at the same time, where we have the ability to add or remove treatments as the trial progresses. Here, even if we could calculate the type I error exactly for one version of the design, there are always further adaptations that make the calculation intractable.

      This is where simulation comes in. In most modern clinical trials, simulations are done as a minimum to check that the trial has an acceptable type I error. Often, such as in my work, simulation is the only way you can work with the design, so I also use it to investigate power and expected sample size. Simulation just means coding up the trial in R (or something similar) and re-running it over and over again to see how it performs. If you reject the null hypothesis 5% of the time when you assume the null is true, then you have a 5% type I error. From there, you can change the design’s parameters until you get the characteristics you want. This has the advantage that it works for any trial design, as long as the assumptions about how you generate your simulated data are correct. The downside is that the coding itself can be difficult, and for complex trials it can take a long time to run enough simulations for the answers to be reliable.
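
      To make that concrete, here is a toy version (illustrative only, not code from a real trial; the sample size and number of simulations are arbitrary) of estimating the type I error by simulation in R, for a simple two-arm trial analysed with a t-test:

        # Simulate the trial many times under the null (no treatment effect) and
        # record how often it rejects; that proportion estimates the type I error.
        set.seed(1)
        n_sims    <- 10000    # number of simulated trials
        n_per_arm <- 100      # patients per arm (made-up value)
        alpha     <- 0.05

        reject <- replicate(n_sims, {
          control   <- rnorm(n_per_arm, mean = 0, sd = 1)  # null is true
          treatment <- rnorm(n_per_arm, mean = 0, sd = 1)
          t.test(treatment, control)$p.value < alpha
        })

        mean(reject)   # should come out close to 0.05

      In a real adaptive or platform trial the part inside replicate() is far more involved (interim looks, stopping rules, adding and dropping arms), but the principle is the same: generate data under the null, run the whole trial, and count the rejections.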

      There are approaches in between: the Bonferroni correction, for example, is another way to account for multiple testing, derived by making assumptions and then adjusting the significance level of each test so that the overall type I error is controlled. But if you go into trials from a statistics point of view, I would say that simulation via coding is the most likely way you will end up controlling the type I error in complex trials.
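
      For completeness, here is what the Bonferroni correction looks like in R (a minimal sketch with made-up p-values):

        # Bonferroni: compare each test against alpha divided by the number of
        # tests, or equivalently inflate the p-values and compare them to alpha.
        p_values <- c(0.012, 0.030, 0.047)                  # illustrative p-values
        alpha    <- 0.05

        alpha / length(p_values)                            # adjusted per-test level
        p.adjust(p_values, method = "bonferroni") < alpha   # which tests still reject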

      Connor, Biostatistician
