This function performs K-fold cross-validation of a pre-specified machine learning model supported by the ReSurv package over a given grid of hyperparameters. The hyperparameters to be tested are provided as a list, hparameters_grid. The remaining parameters for the model runs are passed separately as arguments and are specific to each supported machine learning model.

ReSurvCV(
  IndividualDataPP,
  model,
  hparameters_grid,
  folds,
  random_seed,
  continuous_features_scaling_method = "minmax",
  print_every_n = 1L,
  nrounds = NULL,
  early_stopping_rounds = NULL,
  epochs = 1,
  parallel = F,
  ncores = 1,
  num_workers = 0,
  verbose = F,
  verbose.cv = F
)

Arguments

IndividualDataPP

IndividualDataPP object to use for the ReSurv fit cross-validation.

model

character, machine learning model to cross-validate.

hparameters_grid

list, grid of the hyperparameters to cross-validate.

folds

integer, number of folds (i.e. K).

random_seed

integer, random seed for making the code reproducible.

continuous_features_scaling_method

character, method for scaling continuous features.

print_every_n

integer, specific to the XGB approach, see xgboost::xgb.train documentation.

nrounds

integer, specific to the XGB approach, maximum number of boosting iterations.

early_stopping_rounds

integer, specific to the XGB approach, see xgboost::xgb.train documentation.

epochs

integer, specific to the NN approach, epochs to be checked.

parallel

logical, specific to the NN approach, whether to use parallel computing.

ncores

integer, specific to the NN approach, maximum number of cores used.

num_workers

numeric, specific to the NN approach, number of worker processes used for multi-process data loading.

verbose

logical, whether messages from the machine learning models must be printed.

verbose.cv

logical, whether messages from cross-validation must be printed.
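
To make the roles of hparameters_grid, folds and random_seed concrete, the following base R sketch enumerates the combinations implied by a grid and assigns toy observations to K folds. It is only an illustration under stated assumptions: the grid values are hypothetical, the names are taken from the XGB hyperparameter columns listed under Value, and nothing here is claimed to mirror how ReSurvCV enumerates or splits internally.

```r
# Hypothetical XGB grid. Names follow the hyperparameter columns listed
# in the Value section; the values are illustrative only.
hparameters_grid <- list(
  booster = "gbtree",
  eta = c(0.01, 0.1),
  max_depth = c(3L, 6L),
  subsample = 1,
  alpha = 0,
  lambda = 0,
  min_child_weight = 0.5
)

# Enumerate every combination: one row per candidate configuration.
combinations <- expand.grid(hparameters_grid, stringsAsFactors = FALSE)
nrow(combinations)  # 2 values of eta x 2 values of max_depth = 4 rows

# Assign n toy observations to K folds in a reproducible way.
folds <- 3L
n <- 10L
set.seed(1)  # plays the role of random_seed in this sketch
fold_id <- sample(rep(seq_len(folds), length.out = n))
table(fold_id)
```

The number of model fits grows with the product of the grid lengths times the number of folds, so it is worth checking nrow(combinations) before launching a large cross-validation.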

Value

The best ReSurv model fit found by cross-validation. The output depends on the machine learning approach being cross-validated. It is a list containing:

  • out.cv: data.frame, full output of the cross-validation (all combinations of the input hyperparameters).

  • out.cv.best.oos: data.frame, combination with the best out-of-sample likelihood.

For XGB, the columns in out.cv and out.cv.best.oos are the hyperparameters booster, eta, max_depth, subsample, alpha, lambda, and min_child_weight. They also contain the metrics train.lkh and test.lkh, and the computational time, time.

For NN, the columns in out.cv and out.cv.best.oos are the hyperparameters num_layers, optim, activation, lr, xi, eps, tie, batch_size, early_stopping, patience, and node. They also contain the metrics train.lkh and test.lkh, and the computational time, time.
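
A call for the XGB approach might look like the sketch below. It uses only argument names from the usage above. The wrapper name run_xgb_cv, the value "XGB" for model, and all numeric settings are assumptions made for illustration, and individual_data is assumed to be an IndividualDataPP object created beforehand.

```r
# Hypothetical wrapper around ReSurvCV for the XGB approach.
# `individual_data` is assumed to be an existing IndividualDataPP object.
run_xgb_cv <- function(individual_data) {
  ReSurv::ReSurvCV(
    IndividualDataPP = individual_data,
    model = "XGB",  # assumed label for the XGB approach
    hparameters_grid = list(
      booster = "gbtree",
      eta = c(0.01, 0.1),       # illustrative values
      max_depth = c(3L, 6L),    # illustrative values
      subsample = 1,
      alpha = 0,
      lambda = 0,
      min_child_weight = 0.5
    ),
    folds = 3L,
    random_seed = 1L,
    nrounds = 10L,               # illustrative value
    early_stopping_rounds = 5L,  # illustrative value
    verbose.cv = TRUE
  )
}

# Usage, once an IndividualDataPP object is available:
# cv_result <- run_xgb_cv(individual_data)
# cv_result$out.cv           # all evaluated hyperparameter combinations
# cv_result$out.cv.best.oos  # combination with the best out-of-sample likelihood
```

Defining the wrapper does not require a fitted model; ReSurv is only needed when the function is actually called.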

References

Hiabu, M., Hofman, E., & Pittarello, G. (2023). A machine learning approach based on survival analysis for IBNR frequencies in non-life reserving. arXiv preprint arXiv:2312.14549.