In regression analysis, bootstrapping is a method for statistical inference that builds a sampling distribution by resampling the originally observed data with replacement.

The term bootstrapping, proposed by Bradley Efron in his paper “Bootstrap methods: another look at the jackknife”, published in 1979, derives from the expression ‘pulling oneself up by one’s bootstraps’. In keeping with that idea, the sample data are treated as a population, and repeated samples are drawn from them to make statistical inferences. The essential bootstrap analogy states that “the population is to the sample as the sample is to the bootstrap samples”.

The bootstrap falls into two types, parametric and nonparametric. Parametric bootstrapping assumes that the original data set is drawn from a specific distribution, e.g. a normal distribution; bootstrap samples are then generated from that distribution, with its parameters estimated from the data. The samples are generally drawn at the same size as the original data set.
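As a minimal sketch of the parametric idea, the following Python snippet fits a normal distribution to a small data set and simulates new samples of the same size from the fitted distribution. The data values, the choice of the median as the statistic, and the function name `parametric_bootstrap_se` are all assumptions made for illustration.

```python
import random
import statistics

def parametric_bootstrap_se(data, n_boot=2000, seed=0):
    """Bootstrap standard error of the sample median under an assumed normal model."""
    rng = random.Random(seed)
    mu = statistics.mean(data)      # fitted normal mean
    sigma = statistics.stdev(data)  # fitted normal standard deviation
    n = len(data)
    medians = []
    for _ in range(n_boot):
        # Simulate a new sample of the original size from the fitted normal
        simulated = [rng.gauss(mu, sigma) for _ in range(n)]
        medians.append(statistics.median(simulated))
    return statistics.stdev(medians)

# Illustrative data only (not taken from the article)
data = [4.1, 5.3, 4.8, 6.0, 5.5, 4.4, 5.1, 5.9, 4.7, 5.2]
se = parametric_bootstrap_se(data)
```

Here the resamples never reuse the observed values directly; they come from the fitted model, which is what distinguishes this from the nonparametric scheme.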

Nonparametric bootstrapping is the method described at the beginning: bootstrap samples are drawn directly from the original data. Bootstrapping is quite useful in non-linear regression and generalized linear models. For small sample sizes, parametric bootstrapping is generally preferred; for large sample sizes, nonparametric bootstrapping is generally preferred.

To clarify nonparametric bootstrapping further, suppose a sample data set $A = \{x_1, x_2, \ldots, x_k\}$ is randomly drawn from a population $B = \{X_1, X_2, \ldots, X_K\}$, where $K$ is much larger than $k$. The statistic $T = t(A)$ is considered an estimate of the corresponding population parameter $P = t(B)$.

Nonparametric bootstrapping estimates the sampling distribution of a statistic empirically. No assumptions about the form of the population are necessary. A sample of size $k$ is drawn from the elements of $A$ with replacement, denoted $A^*_1 = \{x^*_{11}, x^*_{12}, \ldots, x^*_{1k}\}$. The asterisk distinguishes resampled data from original data. Sampling with replacement is essential; otherwise every resample would simply reproduce the original sample $A$. The resampling is typically repeated 1000 or 10000 times, a number that keeps growing as computing power increases. The statistic is computed for each bootstrap sample, and the mean of these bootstrap estimates is calculated to estimate the expectation of the bootstrapped statistic.

The mean of the bootstrap estimates minus $T$ is the estimate of the bias of $T$. The bootstrap variance estimate, that is, the variance of the bootstrap estimates $T^*$, estimates the sampling variance of $T$. Bootstrap confidence intervals can then be constructed using either the bootstrap percentile interval approach or the normal theory interval approach.
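The resampling, bias, and variance steps can be sketched in Python as follows. The sample values, the use of the median as the statistic $t(\cdot)$, and the helper name `bootstrap_replicates` are assumptions chosen for illustration.

```python
import random
import statistics

def bootstrap_replicates(sample, statistic, n_boot=2000, seed=0):
    """Compute the statistic on n_boot resamples drawn with replacement."""
    rng = random.Random(seed)
    k = len(sample)
    # random.choices draws with replacement, so each resample has size k
    return [statistic(rng.choices(sample, k=k)) for _ in range(n_boot)]

# Illustrative sample A (not taken from the article)
A = [2.3, 3.1, 1.8, 4.6, 2.9, 3.7, 5.2, 2.1, 3.3, 4.0, 2.7, 3.9]
T = statistics.median(A)                          # T = t(A)
T_star = bootstrap_replicates(A, statistics.median)

bias_hat = statistics.mean(T_star) - T            # bootstrap estimate of the bias of T
var_hat = statistics.variance(T_star)             # bootstrap estimate of the sampling variance of T
```

The square root of `var_hat` is the bootstrap standard error of the statistic.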

The bootstrap percentile method constructs a confidence interval from the empirical quantiles of the bootstrap estimates, written as $T^*_{(\text{lower})} < P < T^*_{(\text{upper})}$. A related form, centered on the original estimate, can be written as $T - (T^*_{(\text{upper})} - \bar{T}^*) \le P \le T + (\bar{T}^* - T^*_{(\text{lower})})$, where $\bar{T}^*$ is the mean of the bootstrap estimates. Bootstrapping is an effective way to double-check the stability of model estimation results. When normality is doubtful, bootstrap intervals are often more reliable than intervals calculated from the sample variance under a normality assumption. Simplicity is another important benefit of bootstrapping.
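A short Python sketch of the percentile and normal theory intervals is given here. The data, the 90% level, the use of the mean as the statistic, and the function names are assumptions for illustration; the quantile rule (rounding to the nearest order statistic) is one simple choice among several.

```python
import random
import statistics

def bootstrap_means(sample, n_boot=2000, seed=0):
    """Bootstrap replicates of the sample mean (resampling with replacement)."""
    rng = random.Random(seed)
    k = len(sample)
    return [statistics.mean(rng.choices(sample, k=k)) for _ in range(n_boot)]

def percentile_interval(replicates, conf=0.90):
    """Interval from the empirical quantiles of the bootstrap replicates."""
    ordered = sorted(replicates)
    alpha = (1.0 - conf) / 2.0
    last = len(ordered) - 1
    return ordered[round(alpha * last)], ordered[round((1.0 - alpha) * last)]

def normal_interval(estimate, replicates, conf=0.90):
    """Interval of the form estimate +/- z * bootstrap standard error."""
    alpha = (1.0 - conf) / 2.0
    z = statistics.NormalDist().inv_cdf(1.0 - alpha)
    se = statistics.stdev(replicates)
    return estimate - z * se, estimate + z * se

# Illustrative sample (not taken from the article)
A = [2.3, 3.1, 1.8, 4.6, 2.9, 3.7, 5.2, 2.1, 3.3, 4.0, 2.7, 3.9]
T = statistics.mean(A)
T_star = bootstrap_means(A)
pct_lo, pct_hi = percentile_interval(T_star)
norm_lo, norm_hi = normal_interval(T, T_star)
```

For a roughly symmetric bootstrap distribution the two intervals should be close; a marked difference suggests skewness that the normal theory interval ignores.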

For complicated estimators, such as correlation coefficients, percentile points, or complex parameters of a distribution, it is a simple way to generate estimates of confidence intervals and standard errors. However, simplicity can also be a disadvantage, because it makes the important assumptions behind bootstrapping easy to neglect. Bootstrapping is also often over-optimistic and does not provide finite-sample guarantees.

There are several types of bootstrapping schemes in regression problems. One typical approach is to resample residuals from the regression model. The main procedure is as follows: first, fit the model to the original data set, generate the model estimates $\hat{\beta}$, and calculate the residuals $\hat{\varepsilon}$; second, randomly and repeatedly sample the residuals with replacement (typically 1000 or 10000 times) to get K sets of residuals of size k, and add each resampled residual to the corresponding fitted value, generating bootstrapped Y*; finally, use the bootstrapped Y* to refit the model and get the bootstrap estimates $\hat{\beta}^*$.
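The residual resampling procedure can be sketched in Python for a simple straight-line model. The data, the closed-form least-squares helper `fit_line`, and the function name `residual_bootstrap` are assumptions for illustration; a real analysis would substitute its own model-fitting routine.

```python
import random

def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1

def residual_bootstrap(x, y, n_boot=1000, seed=0):
    """Resample residuals, rebuild Y*, refit, and collect the slope estimates."""
    rng = random.Random(seed)
    b0, b1 = fit_line(x, y)                          # step 1: fit the original data
    fitted = [b0 + b1 * xi for xi in x]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]
    slopes = []
    for _ in range(n_boot):
        # step 2: resample residuals with replacement and add them to the fitted values
        e_star = rng.choices(residuals, k=len(residuals))
        y_star = [fi + ei for fi, ei in zip(fitted, e_star)]
        # step 3: refit the model to the bootstrapped responses
        slopes.append(fit_line(x, y_star)[1])
    return (b0, b1), slopes

# Illustrative data only (not taken from the article)
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [3.1, 4.9, 7.2, 9.0, 10.8, 13.1, 15.2, 16.9, 19.1, 20.8]
(b0, b1), slopes = residual_bootstrap(x, y)
```

The predictor values stay fixed across replicates in this scheme, which is why it is sometimes described as fixed-x resampling.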

Another typical approach in the regression context is random-x resampling, also called case resampling. We can either apply a Monte Carlo algorithm, which repeatedly resamples the data with replacement at the same size as the original data set, or enumerate every possible resample of the data set. In our case, before fitting the regression model, the original predictor and response pairs $(x_i, y_i)$, for $i = 1, 2, \ldots, k$, are resampled to get K new data sets of k pairs each.

The regression model is then fit to each of these K new data sets, and the bootstrap estimate $\hat{\beta}^*$ is generated from the K parameter estimates.

In the next section, I’m going to review the nonparametric bootstrapping package in R with some examples from my research area, population pharmacokinetics analysis. The R package “boot” provides various facilities for bootstrapping either a single statistic or a vector. To run the boot function in the boot library, there are 3 necessary parameters: 1) data, which can be a vector, matrix, or data frame for bootstrap resampling; 2) statistic, the function that produces the statistic to be bootstrapped. This function should take the data set and an indices parameter, giving the selection of cases for each resample; 3) R, the number of bootstrap replicates.

The function boot() runs the statistic function R times. In each call, it generates a set of random indices with replacement to select a sample.

The statistics calculated for each sample are then collected in the returned object, bootobject. So the function boot() is used as bootobject <- boot(data= , statistic= , R=, ...). After checking that the plot of the results, from plot(bootobject), looks satisfactory, we use boot.ci(bootobject, conf=, type=) to get confidence intervals.
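To make the calling convention concrete, here is a Python sketch that imitates the pattern of a statistic function receiving the data and a set of indices. It is not the R boot package and does not reproduce its API; `boot_sketch` and `slope` are hypothetical names, and the data are invented. The example statistic is a regression slope, so each replicate is also an instance of case resampling of $(x_i, y_i)$ pairs.

```python
import random

def boot_sketch(data, statistic, R, seed=0):
    """Hypothetical helper: call statistic(data, indices) on R index sets drawn with replacement."""
    rng = random.Random(seed)
    n = len(data)
    t0 = statistic(data, list(range(n)))   # statistic on the original data
    t = []
    for _ in range(R):
        indices = [rng.randrange(n) for _ in range(n)]  # random indices, with replacement
        t.append(statistic(data, indices))
    return t0, t

def slope(data, indices):
    """Least-squares slope computed on the (x, y) cases selected by indices.

    Assumes the selected cases contain at least two distinct x values.
    """
    cases = [data[i] for i in indices]
    n = len(cases)
    mean_x = sum(x for x, _ in cases) / n
    mean_y = sum(y for _, y in cases) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in cases)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in cases)
    return sxy / sxx

# Illustrative (x, y) pairs (not taken from the article)
pairs = [(1, 3.1), (2, 4.9), (3, 7.2), (4, 9.0), (5, 10.8),
         (6, 13.1), (7, 15.2), (8, 16.9), (9, 19.1), (10, 20.8)]
t0, t = boot_sketch(pairs, slope, R=1000)
```

Passing indices rather than resampled data lets the same statistic function serve for the original estimate and for every replicate.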

Bootstrapping is widely used in the population analysis of clinical trials in the pharmaceutical/biotech industries. It is a useful tool for assessing and controlling the stability of model analyses. A good example is how bootstrapping validates a population pharmacokinetic (PK) model for Triptan, a vasopressor used for the acute treatment of migraine attacks. A single oral dose of 50 mg was given to 26 healthy Korean male subjects. Plasma data were obtained at pre-dose and at 0.25, 0.5, 0.75, 1, 1.5, 2, 2.5, 3, 4, 6, 8, 10, and 12 h post-dose. Population PK analysis of Triptan was performed on the plasma concentration data using NONMEM, software that builds models using differential equations.

A total of 364 plasma concentration observations were successfully described by a one-compartment model with first-order absorption with lag time, first-order elimination, and a combined transit compartment. The model scheme is shown in Figure 1 below:

Figure 1: The scheme of the final PK model of Triptan

The final model was validated through a 1000-time resampling bootstrap. The procedure was conducted with 1000 datasets resampled from the original dataset. The median and 90% confidence intervals of the parameters are shown in Table 1 for comparison with the final parameter estimates.

Table 1: NONMEM estimated parameters and bootstrap results

Results from the visual predictive check with 1000 simulations were assessed by visual comparison of the gray area of the 90% prediction interval from the simulated data with an overlay of the circled raw data. Any excess of data falling outside the gray area would indicate that the estimates were not robust.

Figure 2: Visual predictive check plot of the model from time 0 to 12 h after a single oral administration of 50 mg Triptan. Circles represent the raw data set; the gray area is the 90% confidence interval of the 1000 simulations; solid lines are the 5th, median, and 95th percentiles of the observed concentrations.

Our conclusion is that the final model and its estimated parameters were sufficiently robust and stable according to the bootstrap assessment. All estimated parameters from the final model were within the 95% bootstrap confidence intervals.