11.1.12 Confidence Bands for the Typical Value of y Given x
This section deals with computing a confidence band, sometimes called a prediction band, for m(x) = β₀ + β₁x, the typical value of y given x, in a manner that allows heteroscedasticity. More precisely, if the parameters β₀ and β₁ are estimated based on the random sample (x₁, y₁), …, (xₙ, yₙ), the goal is to compute a confidence interval for m(xᵢ) (i = 1, …, n) such that the simultaneous probability coverage is approximately 1 − α. There is the related goal of testing the n hypotheses

H₀: m(xᵢ) = θ₀,  (11.7)
where θ₀ is some specified constant. (For a review of methods based on the least squares estimator that assume normality and homoscedasticity, see Liu, Lin, & Piegorsch, 2008.)
The basic strategy mimics the approach used by the two-sample version of Student's t test: begin by assuming normality and homoscedasticity, determine an appropriate critical value based on the sample size and the regression estimator that is used in conjunction with an obvious test statistic, and then study the impact of non-normality and heteroscedasticity via simulations.
First consider a single value for the covariate, x. Let τ² denote the squared standard error of ŷ = b₀ + b₁x, an estimate of m(x), where b₀ and b₁ are estimates of β₀ and β₁, respectively, based on some regression estimator to be determined. A basic percentile bootstrap method is used to estimate τ² (e.g., Efron & Tibshirani, 1993). More precisely, generate a bootstrap sample by randomly sampling with replacement n pairs of points from (x₁, y₁), …, (xₙ, yₙ), yielding (x₁*, y₁*), …, (xₙ*, yₙ*). Based on this bootstrap sample, estimate the intercept and slope and label the results b₀* and b₁*, which yields ŷ* = b₀* + b₁*x. Repeat this B times, yielding ŷ₁*, …, ŷ_B*, in which case an estimate of τ² is

τ̂² = (1/(B − 1)) Σ_{b=1}^{B} (ŷ_b* − ȳ*)²
where ȳ* = Σ ŷ_b*/B. (In terms of controlling the probability of a Type I error, B = 100 appears to suffice.) Then the hypothesis given by (11.7) can be tested with

W = (ŷ − θ₀)/τ̂,
once an appropriate critical value has been determined.
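The bootstrap estimate of τ and the statistic W can be sketched as follows. This is a minimal sketch in Python assuming ordinary least squares for the slope and intercept (the section leaves the regression estimator open; a robust choice such as Theil–Sen could be substituted); the function names are illustrative, not from any package.

```python
import numpy as np

def boot_se_yhat(x, y, x0, B=100, seed=None):
    """Percentile-bootstrap estimate tau-hat of the standard error of
    yhat = b0 + b1*x0, the estimated typical value of y given x = x0."""
    rng = np.random.default_rng(seed)
    n = len(x)
    yhat_star = np.empty(B)
    for b in range(B):
        idx = rng.integers(0, n, n)             # resample n pairs with replacement
        b1, b0 = np.polyfit(x[idx], y[idx], 1)  # slope, intercept (least squares)
        yhat_star[b] = b0 + b1 * x0
    return yhat_star.std(ddof=1)                # sample SD of the B estimates

def W_stat(x, y, x0, theta0=0.0, B=100, seed=None):
    """Test statistic W = (yhat - theta0)/tau-hat for H0: m(x0) = theta0."""
    b1, b0 = np.polyfit(x, y, 1)
    return (b0 + b1 * x0 - theta0) / boot_se_yhat(x, y, x0, B=B, seed=seed)
```

Rejecting when |W| exceeds an appropriate critical value gives the test of (11.7); determining that critical value is the subject of the simulation described next.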
Momentarily assume that W has a standard normal distribution, in which case a p-value can be determined for each xᵢ, i = 1, …, n. Denote the resulting p-values by p₁, …, pₙ and let p_m = min(p₁, …, pₙ). As is evident, if p_α, the α quantile of the distribution of p_m, can be determined, the probability of one or more Type I errors can be controlled simply by rejecting the ith hypothesis if and only if pᵢ ≤ p_α. In addition, confidence intervals for each m(xᵢ) can be computed that have simultaneous probability coverage 1 − α.
The distribution of p_m is approximated in the following manner. Momentarily assume that both the error term ϵ and x have a standard normal distribution, and consider the case β₀ = β₁ = 0. Then a simulation can be performed yielding an estimate of the α quantile of the distribution of p_m. In effect, generate n pairs of observations from a bivariate normal distribution having correlation zero, yielding (x₁, y₁), …, (xₙ, yₙ). Compute p_m and repeat this process A times, yielding p_{m1}, …, p_{mA}. Put these A values in ascending order, yielding p_{m(1)} ≤ ⋯ ≤ p_{m(A)}, and let u = αA rounded to the nearest integer. Then the α quantile of p_m, p_α, is estimated with p̂_α = p_{m(u)}. Moreover, the simultaneous probability coverage among the n confidence intervals

ŷᵢ ± z τ̂ᵢ
is approximately 1 − α, where z is the 1 − p̂_α/2 quantile of a standard normal distribution, and τ̂ᵢ is the corresponding estimate of the standard error of ŷᵢ = b₀ + b₁xᵢ. Here are some estimates of p̂_α when n = 20 and when n = 100 using the Theil–Sen (TS) estimator, the modification of the Theil–Sen estimator based on the Harrell–Davis estimator (TSHD), OLS, and the quantile regression estimator (QREG):
As can be seen, the value depends on the sample size when using least squares regression, as expected. In contrast, when using the robust regression estimators, the estimated values suggest that there is little or no variation in the value of p̂_α as a function of the sample size, at least for the sample sizes considered.
Of course, a crucial issue is how well the method performs when dealing with non-normality and heteroscedasticity. Simulations indicate that it performs well when testing at the 0.05 level with n = 20 (Wilcox, 2016c). Even OLS performed tolerably well, but generally the Theil–Sen estimator or the quantile regression estimator provides better control over the Type I error probability. (When using least squares regression, Faraway & Sun, 1995, derived an alternative method that allows heteroscedasticity.)
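The simulation used to estimate p̂_α can be sketched as follows. This is a minimal sketch in Python, assuming least squares in place of the robust estimators and a normal approximation for the p-values; the function name and defaults are illustrative.

```python
import numpy as np
from statistics import NormalDist

def estimate_p_alpha(n, alpha=0.05, A=1000, B=100, seed=None):
    """Estimate p-hat_alpha, the alpha quantile of p_m = min(p_1, ..., p_n),
    by simulating standard normal x and y (so beta0 = beta1 = 0) and using
    normal-approximation p-values for W at each design point x_i."""
    rng = np.random.default_rng(seed)
    Phi = NormalDist().cdf
    pm = np.empty(A)
    for a in range(A):
        x = rng.normal(size=n)
        y = rng.normal(size=n)                   # correlation zero with x
        b1, b0 = np.polyfit(x, y, 1)             # least squares, for brevity
        yhat_star = np.empty((B, n))             # bootstrap yhat at every x_i
        for b in range(B):
            idx = rng.integers(0, n, n)
            s1, s0 = np.polyfit(x[idx], y[idx], 1)
            yhat_star[b] = s0 + s1 * x
        tau = yhat_star.std(axis=0, ddof=1)      # tau-hat_i for each x_i
        W = (b0 + b1 * x) / tau                  # theta0 = 0 under H0
        p = np.array([2 * (1 - Phi(abs(w))) for w in W])
        pm[a] = p.min()
    pm.sort()
    u = min(max(round(alpha * A), 1), A)         # alpha*A rounded to an integer
    return pm[u - 1]                             # p_m(u): the alpha quantile
```

With p̂_α in hand, z = NormalDist().inv_cdf(1 − p̂_α/2) serves as the critical value, and the ith interval is ŷᵢ ± z τ̂ᵢ. Replacing np.polyfit in both fitting steps with a robust fit would give the Theil–Sen or quantile regression versions discussed in the text.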