Perform the Hall, Horowitz, and Jing (1995) "HHJ" cross-validation algorithm to select the optimal block-length for a bootstrap on dependent data (block-bootstrap). Dependent data such as stationary time series are suitable for usage with the HHJ algorithm.

hhj(
series,
nb = 100L,
n_iter = 10L,
pilot_block_length = NULL,
sub_sample = NULL,
k = "two-sided",
bofb = 1L,
search_grid = NULL,
grid_step = c(1L, 1L),
cl = NULL,
verbose = TRUE,
plots = TRUE
)

## Arguments

series a numeric vector or time series giving the original data for which to find the optimal block-length for. an integer value, number of bootstrapped series to compute. an integer value, maximum number of iterations for the HHJ algorithm to compute. a numeric value, the block-length ($$l*$$ in HHJ) for which to perform initial block bootstraps. a numeric value, the length of each overlapping subsample, $$m$$ in HHJ. a character string, either "bias/variance", "one-sided", or "two-sided" depending on the desired object of estimation. If the desired bootstrap statistic is bias or variance then select "bias/variance" which sets $$k = 3$$ per HHJ. If the object of estimation is the one-sided or two-sided distribution function, then set k = "one-sided" or k = "two-sided" which sets $$k = 4$$ and $$k = 5$$, respectively. For the purpose of generating symmetric confidence intervals around an unknown parameter, k = "two-sided" (the default) should be used. a numeric value, length of the basic blocks in the block-of-blocks bootstrap, see m =  for tsbootstrap and Kunsch (1989). a numeric value, the range of solutions around $$l*$$ to evaluate within the $$MSE$$ function after the first iteration. The first iteration will search through all the possible block-lengths unless specified in grid_step = . a numeric value or vector of at most length 2, the number of steps to increment over the subsample block-lengths when evaluating the $$MSE$$ function. If grid_step = 1 then each block-length will be evaluated in the $$MSE$$ function. If grid_step > 1, the $$MSE$$ function will search over the sequence of block-lengths from 1 to m by grid_step. If grid_step is a vector of length 2, the first iteration will step by the first element of grid_step and subsequent iterations will step by the second element. a cluster object, created by package parallel, doParallel, or snow. If NULL, no parallelization will be used. a logical value, if set to FALSE then no interim messages are output to the console. Error messages will still be output. Default is TRUE. a logical value, if set to FALSE then no interim plots are output to the console. Default is TRUE.

## Value

an object of class 'hhj'

## Details

The HHJ algorithm is computationally intensive as it relies on a cross-validation process using a type of subsampling to estimate the mean squared error ($$MSE$$) incurred by the bootstrap at various block-lengths.

Under-the-hood, hhj() makes use of tsbootstrap, see Trapletti and Hornik (2020), to perform the moving block-bootstrap (or the block-of-blocks bootstrap by setting bofb > 1) according to Kunsch (1989).

Adrian Trapletti and Kurt Hornik (2020). tseries: Time Series Analysis and Computational Finance. R package version 0.10-48.

Kunsch, H. (1989) The Jackknife and the Bootstrap for General Stationary Observations. The Annals of Statistics, 17(3), 1217-1241. Retrieved February 16, 2021, from http://www.jstor.org/stable/2241719

Peter Hall, Joel L. Horowitz, Bing-Yi Jing, On blocking rules for the bootstrap with dependent data, Biometrika, Volume 82, Issue 3, September 1995, Pages 561-574, DOI: doi: 10.1093/biomet/82.3.561#'

## Examples

# \donttest{
# Generate AR(1) time series
sim <- stats::arima.sim(list(order = c(1, 0, 0), ar = 0.5),
n = 500, innov = rnorm(500))

# Calculate optimal block length for series
hhj(sim, sub_sample = 10)
#>  Pilot block length is: 3#> Registered S3 method overwritten by 'quantmod':
#>   method            from
#>   as.zoo.data.frame zoo #> Performing minimization may take some time#> Calculating MSE for each level in subsample: 10 function evaluations required.#>  Chosen block length: 22  After iteration: 1 #>  Converged at block length (l): 22 #> $Optimal Block Length #>  22 #> #>$Subsample block size (m)
#>  10
#>
#> $MSE Data #> Iteration BlockLength MSE #> 1 1 2 0.3392102 #> 2 1 4 0.3505841 #> 3 1 7 0.3549793 #> 4 1 9 0.3639882 #> 5 1 11 0.3567793 #> 6 1 13 0.3727141 #> 7 1 15 0.3616106 #> 8 1 17 0.3607994 #> 9 1 20 0.3435575 #> 10 1 22 0.3248534 #> 11 2 2 0.3442814 #> 12 2 4 0.3555621 #> 13 2 7 0.3639989 #> 14 2 9 0.3704648 #> 15 2 11 0.3637975 #> 16 2 13 0.3786977 #> 17 2 15 0.3682302 #> 18 2 17 0.3668415 #> 19 2 20 0.3475716 #> 20 2 22 0.3282302 #> #>$Iterations
#>  2
#>
#> $Series #>  "sim" #> #>$Call
#> hhj(series = sim, sub_sample = 10)
#>
#> attr(,"class")
#>  "hhj"

# Use parallel computing
library(parallel)

# Make cluster object with 2 cores
cl <- makeCluster(2)

# Calculate optimal block length for series
hhj(sim, cl = cl)
#>  Pilot block length is: 3#> Performing minimization may take some time#> Calculating MSE for each level in subsample: 12 function evaluations required.#>  Chosen block length: 25  After iteration: 1 #>  Converged at block length (l): 25 #> $Optimal Block Length #>  25 #> #>$Subsample block size (m)
#>  12
#>
#> $MSE Data #> Iteration BlockLength MSE #> 1 1 2 0.2825363 #> 2 1 4 0.2883449 #> 3 1 6 0.2982917 #> 4 1 8 0.3022531 #> 5 1 11 0.3045740 #> 6 1 13 0.3073589 #> 7 1 15 0.3190245 #> 8 1 17 0.3154190 #> 9 1 19 0.3038427 #> 10 1 21 0.2959165 #> 11 1 23 0.2826864 #> 12 1 25 0.2747927 #> 13 2 2 0.2843267 #> 14 2 4 0.2855251 #> 15 2 6 0.2951802 #> 16 2 8 0.3008596 #> 17 2 11 0.3028227 #> 18 2 13 0.3042713 #> 19 2 15 0.3198351 #> 20 2 17 0.3128201 #> 21 2 19 0.3064508 #> 22 2 21 0.2958046 #> 23 2 23 0.2830274 #> 24 2 25 0.2744703 #> #>$Iterations
#>  2
#>
#> $Series #>  "sim" #> #>$Call
#> hhj(series = sim, cl = cl)
#>
#> attr(,"class")
#>  "hhj"# }