Trains a predictive model for the mark distribution of a spatio-temporal process.
data may be either (1) a data.frame containing columns x, y, size and time,
(2) a data.frame containing x, y, size (time will be derived via delta),
or (3) an ldmppr_fit object returned by estimate_process_parameters.
Allows the user to incorporate location-specific information and competition indices as covariates in the mark model.
Usage
train_mark_model(
  data,
  raster_list = NULL,
  scaled_rasters = FALSE,
  model_type = "xgboost",
  xy_bounds = NULL,
  delta = NULL,
  save_model = FALSE,
  save_path = NULL,
  parallel = TRUE,
  n_cores = NULL,
  include_comp_inds = FALSE,
  competition_radius = 15,
  edge_correction = "none",
  selection_metric = "rmse",
  cv_folds = 5,
  tuning_grid_size = 200,
  verbose = TRUE
)
Arguments
- data
a data.frame or an ldmppr_fit object. See Description.
- raster_list
a list of raster objects.
- scaled_rasters
TRUE or FALSE indicating whether the rasters have been scaled.
- model_type
the machine learning model type ("xgboost" or "random_forest").
- xy_bounds
a vector of domain bounds (2 for x, 2 for y). If data is an ldmppr_fit and xy_bounds is NULL, defaults to c(0, b_x, 0, b_y) derived from the fit.
- delta
(optional) numeric scalar used only when data contains (x, y, size) but not time. If data is an ldmppr_fit and time is missing, the function will infer the delta value from the fit.
- save_model
TRUE or FALSE indicating whether to save the generated model.
- save_path
path for saving the generated model.
- parallel
TRUE or FALSE indicating whether to use parallelization in model training.
- n_cores
number of cores to use in parallel model training (if parallel is TRUE).
- include_comp_inds
TRUE or FALSE indicating whether to generate and use competition indices as covariates.
- competition_radius
distance for competition radius if include_comp_inds is TRUE.
- edge_correction
type of edge correction to apply ("none", "toroidal", or "truncation").
- selection_metric
metric to use for identifying the optimal model ("rmse", "mae", or "rsq").
- cv_folds
number of cross-validation folds to use in model training. If cv_folds <= 1, tuning is skipped and the model is fit once with default hyperparameters.
- tuning_grid_size
size of the tuning grid for hyperparameter tuning.
- verbose
TRUE or FALSE indicating whether to show progress of model training.
Examples
# Load the small example data
data(small_example_data)
# Load example raster data
raster_paths <- list.files(system.file("extdata", package = "ldmppr"),
  pattern = "\\.tif$", full.names = TRUE
)
raster_paths <- raster_paths[!grepl("_med\\.tif$", raster_paths)]
rasters <- lapply(raster_paths, terra::rast)
# Scale the rasters
scaled_raster_list <- scale_rasters(rasters)
# Train the model
mark_model <- train_mark_model(
  data = small_example_data,
  raster_list = scaled_raster_list,
  scaled_rasters = TRUE,
  model_type = "xgboost",
  xy_bounds = c(0, 25, 0, 25),
  delta = 1,
  parallel = FALSE,
  include_comp_inds = FALSE,
  competition_radius = 10,
  edge_correction = "none",
  selection_metric = "rmse",
  cv_folds = 3,
  tuning_grid_size = 2,
  verbose = TRUE
)
#> Processing data...
#> Training XGBoost model...
#> Training complete!
print(mark_model)
#> <ldmppr_mark_model>
#> engine: xgboost
#> has fit_engine: TRUE
#> has xgb_raw: FALSE
#> n_features: 7
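# A further sketch, using only the arguments documented above: fit a random
# forest once with default hyperparameters. Setting cv_folds = 1 skips tuning,
# as described under cv_folds. This reuses small_example_data and
# scaled_raster_list from the example above, and assumes the package backing
# the random forest engine is installed. Arguments left out keep the defaults
# listed under Usage.
rf_mark_model <- train_mark_model(
  data = small_example_data,
  raster_list = scaled_raster_list,
  scaled_rasters = TRUE,
  model_type = "random_forest",
  xy_bounds = c(0, 25, 0, 25),
  delta = 1,
  parallel = FALSE,
  cv_folds = 1,
  verbose = FALSE
)
print(rf_mark_model)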