Confidence interval methods for output of resampling

Methods for confint() that compute confidence intervals for numeric vectors and for the numeric components of data frames.

# S3 method for numeric
confint(
  object,
  parm,
  level = 0.95,
  ...,
  method = "percentile",
  margin.of.error = "stderr" %in% method == "stderr"
)

# S3 method for do.tbl_df
confint(
  object,
  parm,
  level = 0.95,
  ...,
  method = "percentile",
  margin.of.error = "stderr" %in% method,
  df = NULL
)

# S3 method for do.data.frame
confint(
  object,
  parm,
  level = 0.95,
  ...,
  method = "percentile",
  margin.of.error = "stderr" %in% method,
  df = NULL
)

# S3 method for data.frame
confint(object, parm, level = 0.95, ...)

# S3 method for summary.lm
confint(object, parm, level = 0.95, ...)

Arguments

object: an R object
parm: a vector of parameters
level: a confidence level
...: additional arguments
method: a character vector of methods to use for creating confidence intervals. Choices are "percentile" (or "quantile"), which is the default, "stderr" (or "se"), "bootstrap-t", and "reverse" (or "basic").
margin.of.error: if true, report intervals as a center and margin of error.
df: degrees of freedom. This is required when object was produced using do() and the standard error is used to compute the confidence interval, since this information is typically not recorded in these objects. The default (Inf) uses a normal critical value rather than one derived from a t-distribution.

Value

When applied to a data frame, returns a data frame giving a confidence interval for each variable, computed using t.test or binom.test. The exception is a data frame produced using do(): each variable is then assumed to contain resampled statistics that serve as an estimated sampling distribution, and the confidence interval is computed either from a central proportion of this distribution or from the standard error, estimated as the standard deviation of that distribution. For the standard error method, the user must supply the correct degrees of freedom for the t distribution, since this information is typically not available in the output of do().

When applied to a numerical vector, returns a vector.

Details

The methods of producing confidence intervals from bootstrap distributions are currently quite naive. In particular, when using the standard error, assistance may be required with the degrees of freedom, and it may not be possible to provide a correct value in all situations. None of the methods includes explicit bias correction.

Let $q_a$ be the $a$ quantile of the bootstrap distribution, let $t_{a,df}$ be the $a$ quantile of the t distribution with $df$ degrees of freedom, let $SE_b$ be the standard deviation of the bootstrap distribution, and let $\hat{\theta}$ be the estimate computed from the original data. Then the confidence intervals with confidence level $1 - 2a$ are

quantile: $(q_a, q_{1-a})$
reverse: $( 2 \hat{\theta} - q_{1-a}, 2\hat{\theta} - q_{a} )$
stderr: $(\hat{\theta} - t_{1-a,df} SE_b, \hat{\theta} + t_{1-a,df} SE_b)$. When df is not provided, an attempt is made to determine an appropriate value, but this should be double-checked. In particular, missing data can lead to unreliable results.
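
The three formulas above can be sketched in base R without calling confint() at all. This is only an illustration of the arithmetic, not the package's implementation: the sample x, the seed, the number of resamples, and the choice df = n - 1 are all assumptions made for the example.

```r
# Base R sketch of the quantile, reverse, and stderr intervals.
# All data and settings here are made up for illustration.
set.seed(123)                                   # arbitrary seed
x <- c(4.1, 5.3, 6.0, 3.8, 5.9, 4.7, 6.4, 5.1, 4.4, 5.6)
theta_hat <- mean(x)                            # estimate from the original data

# Bootstrap distribution of the sample mean (1000 resamples, an arbitrary choice)
boot_means <- replicate(1000, mean(sample(x, replace = TRUE)))

level <- 0.95
a <- (1 - level) / 2                            # so that level = 1 - 2a
q <- unname(quantile(boot_means, probs = c(a, 1 - a)))  # q_a and q_{1-a}
se_b <- sd(boot_means)                          # SE_b
df <- length(x) - 1                             # assumed df for a one-sample mean
t_crit <- qt(1 - a, df)                         # t_{1-a,df}

ci_quantile <- c(lower = q[1], upper = q[2])
ci_reverse  <- c(lower = 2 * theta_hat - q[2], upper = 2 * theta_hat - q[1])
ci_stderr   <- c(lower = theta_hat - t_crit * se_b,
                 upper = theta_hat + t_crit * se_b)

# One row per method
rbind(quantile = ci_quantile, reverse = ci_reverse, stderr = ci_stderr)
```

The reverse interval has the same width as the quantile interval, but is reflected about the original estimate. The stderr interval is symmetric about the original estimate by construction.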

References

Tim C. Hesterberg (2015): What Teachers Should Know about the Bootstrap: Resampling in the Undergraduate Statistics Curriculum, The American Statistician, https://www.tandfonline.com/doi/full/10.1080/00031305.2015.1089789.

Examples

if (require(mosaicData)) {
  bootstrap <- do(500) * diffmean( age ~ sex, data = resample(HELPrct) )
  confint(bootstrap)
  confint(bootstrap, method = "percentile")
  confint(bootstrap, method = "boot")
  confint(bootstrap, method = "se", df = nrow(HELPrct) - 1)
  confint(bootstrap, margin.of.error = FALSE)
  confint(bootstrap, margin.of.error = TRUE, level = 0.99, 
    method = c("se", "perc") )
    
  # bootstrap t method requires both mean and sd
  bootstrap2 <- do(500) * favstats(resample(1:10)) 
  confint(bootstrap2, method = "boot")
}
#> Using parallel package.
#>   * Set seed with set.rseed().
#>   * Disable this message with options(`mosaic:parallelMessage` = FALSE)
#> Warning: confint: Unable to compute any of the desired CIs
#> Using parallel package.
#>   * Set seed with set.rseed().
#>   * Disable this message with options(`mosaic:parallelMessage` = FALSE)
#>   name    lower    upper level      method estimate
#> 1 mean 2.926496 7.933412  0.95 bootstrap-t      5.5
lm(width ~ length * sex, data = KidsFeet) |>
  summary() |>
  confint()
#>                   2.5 %    97.5 %
#> (Intercept)  0.09816469 7.6059964
#> length       0.06326171 0.3619858
#> sexG        -5.70117633 4.4534183
#> length:sexG -0.18914537 0.2207867