Validation Report for adoptr package
2024-02-19
1 Introduction
This work is licensed under the CC-BY-SA 4.0 license
1.1 Preliminaries
R package validation for regulatory environments can be a tedious endeavour. The authors firmly believe that under the current regulation, there is no such thing as a ‘validated R package’: validation is by definition a process conducted by the user. This validation report merely aims at facilitating validation of adoptr as much as possible. No warranty whatsoever as to the correctness of adoptr or the completeness of the validation report is given by the authors.
We assume that the reader is familiar with the notation and theoretical background of adoptr. Otherwise, the following resources might be of help:
- adoptr online documentation at https://kkmann.github.io/adoptr/
- paper on the theoretical background of the core adoptr functionality (Pilz et al. 2019)
- a general overview of adaptive designs in (Bauer et al. 2015)
- a more extensive treatment of the subject in (Wassmer and Brannath 2016).
1.2 Scope
adoptr itself already makes extensive use of unit testing to ensure the correctness of all implemented functions. Yet, due to constraints on the build time for an R package, the range of scenarios covered in the unit tests of adoptr is rather limited. Furthermore, the current R unit-testing framework does not easily permit generating a human-readable report of the test cases to ascertain coverage and test quality.
Therefore, adoptr splits testing in two parts: technical correctness is ensured via an extensive unit-testing suite in adoptr itself (aiming to maintain 100% code coverage). The validation report, however, runs through a wide range of possible application scenarios and ensures plausibility of results as well as consistency with existing methods wherever possible. The report itself is implemented as a collection of Rmarkdown documents, allowing both the underlying code and the corresponding output to be shown in a human-readable format.
The online version of the report is dynamically re-generated on a weekly basis from the respective most current version of adoptr on CRAN. The latest result of these builds is available at https://kkmann.github.io/adoptr-validation-report/. To ensure early warning in case of any test-case failures, formal tests are implemented using the testthat package (Wickham, R Studio, and R Core Team 2018). The combination of a unit-testing framework with a continuous integration and continuous deployment service thus leads to an always up-to-date validation report (built on the current R release on Linux). Any failure of the integrated formal tests will cause the build status of the validation report to switch from ‘passing’ to ‘failed’, and the respective maintainer will be notified immediately.
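For illustration, a formal test in this report has roughly the following shape (a minimal sketch only, not the report's literal source; optimal_design and datadist are hypothetical placeholders for objects created earlier in a scenario):

library(adoptr)
library(testthat)

# sketch of a formal constraint check on a simulated type one error rate;
# 'optimal_design' and 'datadist' are placeholders
test_that("TOER constraint holds for the optimal design", {
    sim <- simulate(optimal_design, nsim = 10^6, dist = datadist,
                    theta = 0, seed = 42)
    expect_lte(mean(sim$reject), 0.025 * 1.01)  # 1% relative tolerance
})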
1.2.1 Validating a local installation of adoptr
Note that, strictly speaking, the online version of the validation report only provides evidence of correctness on the respective Travis-CI cloud virtual machine infrastructure using the respective most recent release of R and the most recent versions of the dependencies available on CRAN. In some instances it might therefore be desirable to conduct a local validation of adoptr.
To do so, one should install adoptr with the INSTALL_opts option to include the tests and invoke the test suite locally via

install.packages("adoptr", INSTALL_opts = c("--install-tests"))
tools::testInstalledPackage("adoptr", types = c("examples", "tests"))
Upon passing the test suite successfully, the validation report can be built locally. To do so, first clone the entire source directory and switch to the newly created folder
git clone https://github.com/kkmann/adoptr-validation-report.git
cd adoptr-validation-report
Make sure that all packages required for building the report are available, i.e., install all dependencies listed in the top-level DESCRIPTION file, e.g.,
install.packages(c(
    "adoptr",
    "tidyverse",
    "bookdown",
    "rpact",
    "testthat",
    "pwr"
))
The book can then be built using the terminal command
Rscript -e 'bookdown::render_book("index.Rmd", output_format = "all")'
or directly from R via
bookdown::render_book("index.Rmd", output_format = "all")
This produces a new folder _book with the html and pdf versions of the report.
1.3 Validation Scenarios
1.3.1 Scenario I: Large effect, point prior
This is the default scenario.
- Data distribution: Two-armed trial with normally distributed test statistic
- Prior: \(\delta\sim\textbf{1}_{\delta=0.4}\)
- Null hypothesis: \(\mathcal{H}_0:\delta \leq 0\)
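For orientation, this scenario's primitives could be encoded with adoptr roughly as follows (a minimal sketch; the variable names are ours):

library(adoptr)

# two-armed trial with normally distributed test statistic
datadist <- Normal(two_armed = TRUE)

# point prior concentrated on the alternative delta = 0.4
prior <- PointMassPrior(theta = .4, mass = 1)

# boundary of the null hypothesis, used for the type one error rate
null <- PointMassPrior(theta = .0, mass = 1)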
1.3.1.1 Variant I.1: Minimizing Expected Sample Size under the Alternative
- Objective: \(ESS := \boldsymbol{E}\big[n(X_1)\,|\,\delta=0.4\big]\)
- Constraints:
- \(Power := \boldsymbol{Pr}\big[c_2(X_1) < X_2\,|\,\delta=0.4\big] \geq 0.8\)
- \(TOER := \boldsymbol{Pr}\big[c_2(X_1) < X_2\,|\,\delta=0.0\big] \leq 0.025\)
- Three variants: two-stage, group-sequential, one-stage.
- Formal tests:
- The number of iterations is checked against the default maximum to ensure proper convergence.
- All three adoptr variants (two-stage, group-sequential, one-stage) comply with constraints. Internally validated by testing against simulated values of the power curve at the respective points.
- Is \(n()\) of the optimal two-stage design monotonically decreasing on the continuation area?
- \(ESS\) of the optimal two-stage design is lower than \(ESS\) of the optimal group-sequential one, which in turn is lower than that of the optimal one-stage design.
- \(ESS\) of the optimal group-sequential design is lower than \(ESS\) of an externally computed group-sequential design using the rpact package.
- Are the \(ESS\) values obtained from simulation the same as the ones obtained by numerical integration via adoptr::evaluate? (A sketch of this variant's setup is given below.)
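A minimal sketch of this variant, continuing the Scenario I objects above (the initial design values are arbitrary assumptions, not the report's exact choices):

ess   <- ExpectedSampleSize(datadist, prior)
power <- Power(datadist, prior)
toer  <- Power(datadist, null)

# heuristic starting design; the 7 pivots determine the integration order
initial <- TwoStageDesign(n1 = 25, c1f = 0, c1e = 2.5,
                          n2_pivots = rep(40, 7),
                          c2_pivots = seq(2, 0, length.out = 7))

opt <- minimize(ess, subject_to(power >= 0.8, toer <= 0.025), initial)

# ingredients of the formal tests: iteration count and numerically
# integrated ESS (to be compared against its simulated counterpart)
opt$nloptr_return$iterations
evaluate(ess, opt$design)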
1.3.1.2 Variant I.2: Minimizing Expected Sample Size under the Null Hypothesis
- Objective: \(ESS := \boldsymbol{E}\big[n(X_1)\,|\,\color{red}{\delta=0.0}\big]\)
- Constraints:
- \(Power := \boldsymbol{Pr}\big[c_2(X_1) < X_2\,|\,\delta=0.4\big] \geq 0.8\)
- \(TOER := \boldsymbol{Pr}\big[c_2(X_1) < X_2\,|\,\delta=0.0\big] \leq 0.025\)
- Formal tests:
- The number of iterations is checked against the default maximum to ensure proper convergence.
- Validate constraint compliance by testing against simulated values of the power curve at the respective points.
- \(n()\) of the optimal design is monotonically increasing on the continuation area.
- \(ESS\) of the optimal two-stage design is lower than \(ESS\) of an externally computed group-sequential design using the rpact package.
- Are the \(ESS\) values obtained from simulation the same as the ones obtained by numerical integration via adoptr::evaluate?
1.3.1.3 Variant I.3: Conditional Power Constraint
- Objective: \(ESS := \boldsymbol{E}\big[n(X_1)\,|\,\delta=0.4\big]\)
- Constraints:
- \(Power := \boldsymbol{Pr}\big[c_2(X_1) < X_2\,|\,\delta=0.4\big] \geq 0.8\)
- \(TOER := \boldsymbol{Pr}\big[c_2(X_1) < X_2\,|\,\delta=0.0\big] \leq 0.025\)
- \(CP := \color{red}{\boldsymbol{Pr}\big[c_2(X_1) < X_2\,|\,\delta=0.4, X_1 = x_1\big] \geq 0.7}\) for all \(x_1\in(c_1^f, c_1^e)\)
- Formal tests:
- The number of iterations is checked against the default maximum to ensure proper convergence.
- Check the \(Power\) and \(TOER\) constraints with simulation. Check the \(CP\) constraint on 25 different values of \(x_1\) in \([c_1^f, c_1^e]\).
- Are the \(CP\) values at the 25 test pivots obtained from simulation the same as the ones obtained by numerical integration via adoptr::evaluate?
- Is \(ESS\) of the optimal two-stage design with the \(CP\) constraint higher than \(ESS\) of the optimal two-stage design without this constraint? (The added constraint is sketched below.)
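The conditional-power constraint could be added to the Variant I.1 setup roughly as follows (a sketch reusing the hypothetical objects from above):

# conditional power given the interim outcome x1; constraints on
# conditional scores are enforced pointwise on the continuation area
cp <- ConditionalPower(datadist, prior)

opt_cp <- minimize(ess,
                   subject_to(power >= 0.8, toer <= 0.025, cp >= 0.7),
                   initial)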
1.3.2 Scenario II: Large effect, Gaussian prior
Similar scope to Scenario I, but with a continuous Gaussian prior on \(\delta\).
- Data distribution: Two-armed trial with normally distributed test statistic
- Prior: \(\delta\sim\mathcal{N}(0.4, .3)\)
- Null hypothesis: \(\mathcal{H}_0:\delta \leq 0\)
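A sketch of how such a prior might be specified, reusing datadist from the Scenario I sketch (adoptr's ContinuousPrior requires a compact support, so the density is truncated to a generous interval; the bounds here are our assumption):

# truncated Gaussian prior on the effect size
prior_ii <- ContinuousPrior(function(delta) dnorm(delta, mean = .4, sd = .3),
                            support = c(-2, 3))

# power is evaluated conditional on a relevant effect, delta > 0
power_ii <- Power(datadist, condition(prior_ii, c(0, 3)))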
1.3.2.1 Variant II.1: Minimizing Expected Sample Size
- Objective: \(ESS := \boldsymbol{E}\big[n(X_1)\big]\)
- Constraints:
- \(Power := \boldsymbol{Pr}\big[c_2(X_1) < X_2\,|\,\delta> 0.0\big] \geq 0.8\)
- \(TOER := \boldsymbol{Pr}\big[c_2(X_1) < X_2\,|\,\delta=0.0\big] \leq 0.025\)
- Three variants: two-stage, group-sequential, one-stage.
- Formal tests:
- The number of iterations is checked against the default maximum to ensure proper convergence.
- All designs comply with the type one error rate constraint (tested via simulation).
- \(ESS\) of the optimal two-stage design is lower than \(ESS\) of the optimal group-sequential one, which in turn is lower than that of the optimal one-stage design.
1.3.2.2 Variant II.2: Minimizing Expected Sample Size under the Null Hypothesis
- Objective: \(ESS := \boldsymbol{E}\big[n(X_1)\,|\,\color{red}{\delta\leq 0}\big]\)
- Constraints:
- \(Power := \boldsymbol{Pr}\big[c_2(X_1) < X_2\,|\,\delta> 0.0\big] \geq 0.8\)
- \(TOER := \boldsymbol{Pr}\big[c_2(X_1) < X_2\,|\,\delta=0.0\big] \leq 0.025\)
- Formal tests:
- The number of iterations is checked against the default maximum to ensure proper convergence.
- Does the design comply with the \(TOER\) constraint (via simulation)?
- Is \(ESS\) lower than the expected sample size under the null hypothesis for the optimal two-stage design from Variant II.1?
1.3.2.3 Variant II.3: Conditional Power Constraint
- Objective: \(ESS := \boldsymbol{E}\big[n(X_1)\big]\)
- Constraints:
- \(Power := \boldsymbol{Pr}\big[c_2(X_1) < X_2\,|\,\delta>0.0\big] \geq 0.8\)
- \(TOER := \boldsymbol{Pr}\big[c_2(X_1) < X_2\,|\,\delta=0.0\big] \leq 0.025\)
- \(CP := \color{red}{\boldsymbol{Pr}\big[c_2(X_1) < X_2\,|\,\delta> 0.0, X_1 = x_1\big] \geq 0.7}\) for all \(x_1\in(c_1^f, c_1^e)\)
- Formal tests:
- The number of iterations is checked against the default maximum to ensure proper convergence.
- Check the \(TOER\) constraint with simulation.
- Check the \(CP\) constraint on three different values of \(x_1\) in \((c_1^f, c_1^e)\).
- Is \(ESS\) of the optimal two-stage design with the \(CP\) constraint higher than \(ESS\) of the optimal two-stage design without the constraint?
1.3.3 Scenario III: Large effect, uniform prior
- Data distribution: Two-armed trial with normally distributed test statistic
- Prior: sequence of uniform distributions \(\delta\sim\operatorname{Unif}(0.4 - \Delta_i, 0.4 + \Delta_i)\) around \(0.4\) with \(\Delta_i=(3 - i)/10\) for \(i=0,\ldots,3\). I.e., for \(i=3\), \(\Delta_3=0\) and the prior reduces to a point prior on \(\delta=0.4\).
- Null hypothesis: \(\mathcal{H}_0:\delta \leq 0\)
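The prior sequence could be generated along these lines (a sketch; the degenerate case is handled as a point mass):

# uniform priors of decreasing width around delta = 0.4;
# Delta = 0 degenerates to the point prior of Scenario I
make_prior <- function(Delta) {
    if (Delta == 0) return(PointMassPrior(.4, 1))
    ContinuousPrior(function(x) dunif(x, .4 - Delta, .4 + Delta),
                    support = c(.4 - Delta, .4 + Delta))
}
priors <- lapply(c(.3, .2, .1, 0), make_prior)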
1.3.3.1 Variant III.1: Convergence under Prior Concentration
- Objective: \(ESS := \boldsymbol{E}\big[n(X_1)\big]\)
- Constraints:
- \(Power := \boldsymbol{Pr}\big[c_2(X_1) < X_2\,|\,\delta>0.0\big] \geq 0.8\)
- \(TOER := \boldsymbol{Pr}\big[c_2(X_1) < X_2\,|\,\delta=0.0\big] \leq 0.025\)
- Formal tests:
- The number of iterations is checked against the default maximum to ensure proper convergence.
- Simulated type one error rate is compared to \(TOER\) constraint for each design.
- \(ESS\) decreases with prior variance.
Additionally, the designs are compared graphically; the plot allows the convergence pattern to be inspected visually.
1.3.4 Scenario IV: Smaller effect size, larger trials
1.3.4.1 Variant IV.1: Minimizing Expected Sample Size under the Alternative
- Objective: \(ESS := \boldsymbol{E}\big[n(X_1)\,|\,\delta=0.2\big]\)
- Constraints:
- \(Power := \boldsymbol{Pr}\big[c_2(X_1) < X_2\,|\,\delta=0.2\big] \geq 0.8\)
- \(TOER := \boldsymbol{Pr}\big[c_2(X_1) < X_2\,|\,\delta=0.0\big] \leq 0.025\)
- Three variants: two-stage, group-sequential, one-stage.
- Formal tests:
- The number of iterations is checked against the default maximum to ensure proper convergence.
- All three adoptr variants (two-stage, group-sequential, one-stage) comply with constraints. Internally validated by testing against simulated values of the power curve at the respective points.
- \(ESS\) of the optimal two-stage design is lower than \(ESS\) of the optimal group-sequential one, which in turn is lower than that of the optimal one-stage design.
- \(ESS\) of the optimal group-sequential design is lower than \(ESS\) of an externally computed group-sequential design using the rpact package.
- Are the \(ESS\) values obtained from simulation the same as the ones obtained by numerical integration via adoptr::evaluate?
- Is \(n()\) of the optimal two-stage design monotonically decreasing on the continuation area?
1.3.4.2 Variant IV.2: Increasing Power
- Objective: \(ESS := \boldsymbol{E}\big[n(X_1)\,|\,\delta=0.2\big]\)
- Constraints:
- \(Power := \boldsymbol{Pr}\big[c_2(X_1) < X_2\,|\,\delta=0.2\big] \geq \color{red}{0.9}\)
- \(TOER := \boldsymbol{Pr}\big[c_2(X_1) < X_2\,|\,\delta=0.0\big] \leq 0.025\)
- Three variants: two-stage, group-sequential, one-stage.
- Formal tests:
- The number of iterations is checked against the default maximum to ensure proper convergence.
- Does the design respect all constraints (via simulation)?
- \(ESS\) of the optimal two-stage design is lower than \(ESS\) of the optimal group-sequential one, which in turn is lower than that of the optimal one-stage design.
- \(ESS\) of the optimal group-sequential design is lower than \(ESS\) of an externally computed group-sequential design using the rpact package.
- Are the \(ESS\) values obtained from simulation the same as the ones obtained by numerical integration via adoptr::evaluate?
- Is \(n()\) of the optimal two-stage design monotonically decreasing on the continuation area?
1.3.4.3 Variant IV.3: Increasing Maximal Type One Error Rate
- Objective: \(ESS := \boldsymbol{E}\big[n(X_1)\,|\,\delta=0.2\big]\)
- Constraints:
- \(Power := \boldsymbol{Pr}\big[c_2(X_1) < X_2\,|\,\delta=0.2\big] \geq 0.8\)
- \(TOER := \boldsymbol{Pr}\big[c_2(X_1) < X_2\,|\,\delta=0.0\big] \leq \color{red}{0.05}\)
- Three variants: two-stage, group-sequential, one-stage.
- Formal tests:
- The number of iterations is checked against the default maximum to ensure proper convergence.
- Does the design respect all constraints (via simulation)?
- \(ESS\) of the optimal two-stage design is lower than \(ESS\) of the optimal group-sequential one, which in turn is lower than that of the optimal one-stage design.
- \(ESS\) of the optimal group-sequential design is lower than \(ESS\) of an externally computed group-sequential design using the rpact package.
- Are the \(ESS\) values obtained from simulation the same as the ones obtained by numerical integration via adoptr::evaluate?
- Is \(n()\) of the optimal two-stage design monotonically decreasing on the continuation area?
1.3.5 Scenario V: Single-arm design, medium effect size
- Data distribution: One-armed trial with normally distributed test statistic
- Prior: \(\delta\sim\textbf{1}_{\delta=0.3}\)
- Null hypothesis: \(\mathcal{H}_0:\delta \leq 0\)
1.3.5.1 Variant V.1: Sensitivity to Integration Order
- Objective: \(ESS := \boldsymbol{E}\big[n(X_1)\,|\,\delta=0.3\big]\)
- Constraints:
- \(Power := \boldsymbol{Pr}\big[c_2(X_1) < X_2\,|\,\color{red}{\delta=0.3}\big] \geq 0.8\)
- \(TOER := \boldsymbol{Pr}\big[c_2(X_1) < X_2\,|\,\delta=0.0\big] \leq 0.025\)
- Three variants: two-stage designs with integration orders 5, 8, and 11.
- Formal tests:
- Do all designs converge within the respective iteration limit?
- Do all designs respect all constraints (via simulation)?
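In adoptr, the integration order of a two-stage design is determined by the number of pivot points of the initial design; the three variants might therefore be set up as follows (a sketch with assumed starting values):

# one initial design per integration order; the optimization inherits
# the order from the length of the pivot vectors
for (order in c(5L, 8L, 11L)) {
    initial <- TwoStageDesign(n1 = 25, c1f = 0, c1e = 2.5,
                              n2_pivots = rep(40, order),
                              c2_pivots = seq(2, 0, length.out = order))
    # ... minimize() as before, once per order
}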
1.3.5.2 Variant V.2: Utility Maximization
- Objective: \(\lambda\, Power - ESS := \lambda\, \boldsymbol{Pr}\big[c_2(X_1) < X_2\,|\,\delta=0.3\big] - \boldsymbol{E}\big[n(X_1)\,|\,\delta=0.3\big]\) for \(\lambda = 100\) and \(200\)
- Constraints:
- \(TOER := \boldsymbol{Pr}\big[c_2(X_1) < X_2\,|\,\delta=0.0\big] \leq 0.025\)
- Formal tests:
- The number of iterations is checked against the default maximum to ensure proper convergence.
- Do both designs respect the type one error rate constraint (via simulation)?
- Is the power of the design with the larger \(\lambda\) higher? (A sketch of the composite objective is given below.)
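Utility objectives of this form can be built with adoptr's composite(); since minimize() minimizes, the utility is negated (a sketch; variable names and the one-armed setup follow the scenario description):

# one-armed normal data distribution and point prior of Scenario V
datadist_v <- Normal(two_armed = FALSE)
prior_v    <- PointMassPrior(.3, 1)

pow_v <- Power(datadist_v, prior_v)
ess_v <- ExpectedSampleSize(datadist_v, prior_v)

# maximizing lambda * Power - ESS is equivalent to minimizing
# ESS - lambda * Power (here lambda = 100; 200 analogously)
objective <- composite({ ess_v - 100 * pow_v })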
1.3.5.3 Variant V.3: \(n_1\) penalty
- Objective: \(ESS := \boldsymbol{E}\big[n(X_1)\,|\,\delta=0.3\big] + \lambda \, n_1\) for \(\lambda = 0.05\) and \(0.2\).
- Constraints:
- \(TOER := \boldsymbol{Pr}\big[c_2(X_1) < X_2\,|\,\delta=0.0\big] \leq 0.025\)
- \(Power := \boldsymbol{Pr}\big[c_2(X_1) < X_2\,|\,\delta=0.3\big] \geq 0.8\)
- Formal tests:
- The number of iterations is checked against the default maximum to ensure proper convergence.
- Do both designs respect the type one error rate and power constraints (via simulation)?
- Is \(n_1\) for the optimal design smaller than for the order-5 design in V.1?
1.3.5.4 Variant V.4: \(n_2\) penalty
- Objective: \(ESS := \boldsymbol{E}\big[n(X_1)\,|\,\delta=0.3\big] + \lambda\cdot\) AverageN2 for \(\lambda = 0.01\) and \(0.1\).
- Constraints:
- \(TOER := \boldsymbol{Pr}\big[c_2(X_1) < X_2\,|\,\delta=0.0\big] \leq 0.025\)
- \(Power := \boldsymbol{Pr}\big[c_2(X_1) < X_2\,|\,\delta=0.3\big] \geq 0.8\)
- Formal tests:
- The number of iterations is checked against the default maximum to ensure proper convergence.
- Do both designs respect the type one error rate and power constraints (via simulation)?
- Is the AverageN2 for the optimal design smaller than for the order-5 design in V.1? (Both penalty objectives are sketched below.)
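Both penalties correspond to regularization scores shipped with adoptr; the penalized objectives of Variants V.3 and V.4 might read (a sketch, continuing the Scenario V objects above):

# penalize the first-stage sample size (Variant V.3) or the average
# second-stage sample size (Variant V.4) on top of ESS
n1_score   <- N1()
avn2_score <- AverageN2()

obj_v3 <- composite({ ess_v + 0.05 * n1_score })
obj_v4 <- composite({ ess_v + 0.01 * avn2_score })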
1.3.6 Scenario VI: Binomial distribution
This scenario investigates the implementation of the binomial distribution.
- Data distribution: Two-armed trial with binomially distributed outcomes. Thus, \(\delta := p_E - p_C\) refers to the rate difference here. The control rate is assumed to equal \(p_C = 0.3\).
- Prior: \(\delta\sim\textbf{1}_{\delta=0.2}\)
- Null hypothesis: \(\mathcal{H}_0:\delta \leq 0\)
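A sketch of the binomial setup (adoptr's Binomial distribution takes the assumed control rate as a parameter; variable names are ours):

# two-armed binomial endpoint; delta is the rate difference p_E - p_C
datadist_vi <- Binomial(rate_control = .3, two_armed = TRUE)
prior_vi    <- PointMassPrior(.2, 1)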
1.3.6.1 Variant VI.1: Minimizing Expected Sample Size under the Alternative
- Objective: \(ESS := \boldsymbol{E}\big[n(X_1)\,|\,\delta=0.2\big]\)
- Constraints:
- \(Power := \boldsymbol{Pr}\big[c_2(X_1) < X_2\,|\,\delta=0.2\big] \geq 0.9\)
- \(TOER := \boldsymbol{Pr}\big[c_2(X_1) < X_2\,|\,\delta=0.0\big] \leq 0.025\)
- Three variants: two-stage, group-sequential, one-stage.
- Formal tests:
- The number of iterations is checked against the default maximum to ensure proper convergence.
- All three adoptr variants (two-stage, group-sequential, one-stage) comply with constraints. Internally validated by testing against simulated values of the power curve at the respective points.
- \(ESS\) of the optimal two-stage design is lower than \(ESS\) of the optimal group-sequential one, which in turn is lower than that of the optimal one-stage design.
- \(ESS\) of the optimal group-sequential design is lower than \(ESS\) of an externally computed group-sequential design using the rpact package.
- Are the \(ESS\) values obtained from simulation the same as the ones obtained by numerical integration via adoptr::evaluate?
1.4 Technical Setup
All scenarios are run in a single, shared R session. Required packages are loaded here, the random seed is defined and set centrally, and the default number of iterations is increased to make sure that all scenarios converge properly. Additionally, R scripts with convenience functions are sourced here as well.
There are three additional functions for this report. rpact_design creates a two-stage design via the package rpact (Wassmer and Pahlke 2018) in the notation of adoptr. sim_pr_reject and sim_n allow simulating rejection probabilities and expected sample sizes, respectively, via the adoptr routine simulate (roughly as sketched below).
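For instance, the two simulation wrappers are roughly of the following form (a sketch, not the report's literal source; the actual definitions are sourced from the repository's R/ folder):

# sketch of the simulation wrappers around adoptr::simulate()
sim_pr_reject <- function(design, theta, dist, nsim = 10^6, seed = 42) {
    res <- simulate(design, nsim = nsim, dist = dist, theta = theta,
                    seed = seed)
    mean(res$reject)
}

sim_n <- function(design, theta, dist, nsim = 10^6, seed = 42) {
    res <- simulate(design, nsim = nsim, dist = dist, theta = theta,
                    seed = seed)
    mean(res$n1 + res$n2)
}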
Furthermore, global tolerances for the validation are set. For error rates, a relative deviation of \(1\%\) from the target value is accepted. Deviations in (expected) sample sizes are accepted more liberally, up to an absolute deviation of \(0.5\).
library(adoptr)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.4.4 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ✖ dplyr::n() masks adoptr::n()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(rpact)
library(pwr)
library(testthat)
##
## Attaching package: 'testthat'
##
## The following object is masked from 'package:dplyr':
##
## matches
##
## The following object is masked from 'package:purrr':
##
## is_null
##
## The following objects are masked from 'package:readr':
##
## edition_get, local_edition
##
## The following object is masked from 'package:tidyr':
##
## matches
##
## The following object is masked from 'package:adoptr':
##
## expectation
library(tinytex)
# load custom functions in subfolder 'R/'
for (nm in list.files("R", pattern = "\\.[RrSsQq]$"))
    source(file.path("R", nm))

# define seed value
seed <- 42

# define relative tolerance for error rates
tol <- 0.01

# define absolute tolerance for sample sizes
tol_n <- 0.5

# define custom tolerance and iteration limit for nloptr
opts <- list(
    algorithm = "NLOPT_LN_COBYLA",
    xtol_rel  = 1e-5,
    maxeval   = 100000
)