Trains a predictive model for the mark distribution of a spatio-temporal process.
data may be either (1) a data.frame containing columns x, y, size, and time,
(2) a data.frame containing x, y, and size (time will be derived via delta),
or (3) an ldmppr_fit object returned by estimate_process_parameters.
Allows the user to incorporate location-specific information and competition indices as covariates in the mark model.
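The two data.frame layouts described above differ only in whether a time column is present. As a rough sketch, the base-R helper below reports which layout a dataset matches, and therefore whether delta would be needed. The helper name check_mark_data and its return labels are hypothetical; they are not part of ldmppr and do not reflect how train_mark_model validates its input.

```r
# Hypothetical helper (not part of ldmppr): report which of the documented
# data.frame layouts a dataset matches.
check_mark_data <- function(data) {
  stopifnot(is.data.frame(data))
  required <- c("x", "y", "size")
  missing_cols <- setdiff(required, names(data))
  if (length(missing_cols) > 0) {
    stop("missing required column(s): ", paste(missing_cols, collapse = ", "))
  }
  # Layout (1) carries its own time column; layout (2) relies on delta.
  if ("time" %in% names(data)) "has_time" else "needs_delta"
}

# Toy data, invented purely for illustration.
with_time <- data.frame(
  x = c(1, 2), y = c(3, 4), size = c(10, 8), time = c(0.1, 0.4)
)
no_time <- with_time[, c("x", "y", "size")]

layout_with_time <- check_mark_data(with_time)
layout_no_time <- check_mark_data(no_time)
```

A dataset reported as "needs_delta" corresponds to layout (2), where a delta value has to be supplied in the call.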
Usage
train_mark_model(
data,
raster_list = NULL,
scaled_rasters = FALSE,
model_type = "xgboost",
xy_bounds = NULL,
delta = NULL,
save_model = FALSE,
save_path = NULL,
parallel = FALSE,
num_cores = NULL,
include_comp_inds = FALSE,
competition_radius = 15,
edge_correction = "none",
selection_metric = "rmse",
cv_folds = 5,
tuning_grid_size = 200,
seed = 0,
verbose = TRUE
)
Arguments
- data
a data.frame or an ldmppr_fit object. See Description.
- raster_list
list of raster objects used for mark-model training.
- scaled_rasters
TRUE or FALSE indicating whether rasters are already scaled.
- model_type
the machine learning model type ("xgboost" or "random_forest").
- xy_bounds
a vector of domain bounds (2 for x, 2 for y). If data is an ldmppr_fit and xy_bounds is NULL, defaults to c(0, b_x, 0, b_y) derived from fit.
- delta
(optional) numeric scalar used only when data contains (x,y,size) but not time. If data is an ldmppr_fit and time is missing, the function will infer the delta value from the fit.
- save_model
TRUE or FALSE indicating whether to save the generated model.
- save_path
path for saving the generated model.
- parallel
TRUE or FALSE. If TRUE, tuning is parallelized over resamples. For small datasets, parallel overhead may outweigh speed gains.
- num_cores
number of workers to use when parallel = TRUE. Ignored when parallel = FALSE.
- include_comp_inds
TRUE or FALSE indicating whether to generate and use competition indices as covariates.
- competition_radius
positive numeric distance used when include_comp_inds = TRUE.
- edge_correction
type of edge correction to apply ("none", "toroidal", or "truncation").
- selection_metric
metric to use for identifying the optimal model ("rmse", "mae", or "rsq").
- cv_folds
number of cross-validation folds to use in model training. If cv_folds <= 1, tuning is skipped and the model is fit once with default hyperparameters.
- tuning_grid_size
size of the tuning grid for hyperparameter tuning.
- seed
integer seed for reproducible resampling/tuning/model fitting.
- verbose
TRUE or FALSE indicating whether to show progress of model training.
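For readers unfamiliar with the three selection_metric names, the sketch below computes each one in base R on a toy vector. It assumes that "rsq" means the squared Pearson correlation between observed and predicted marks (the convention used by the yardstick package). Whether train_mark_model computes its metrics this way internally is an assumption, not something stated on this page. The function names and the toy vectors are illustrative only.

```r
# Base-R versions of the three metric names accepted by selection_metric.
# Assumption: rsq is the squared Pearson correlation (yardstick convention).
rmse <- function(truth, estimate) sqrt(mean((truth - estimate)^2))
mae <- function(truth, estimate) mean(abs(truth - estimate))
rsq <- function(truth, estimate) cor(truth, estimate)^2

# Toy observed and predicted marks, invented for illustration.
truth <- c(2, 4, 6, 8)
estimate <- c(3, 3, 7, 7)

metric_values <- c(
  rmse = rmse(truth, estimate),
  mae = mae(truth, estimate),
  rsq = rsq(truth, estimate)
)
print(metric_values)
```

Lower values are better for rmse and mae, while higher values are better for rsq, so the direction of "optimal" depends on the metric chosen.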
Examples
# Load the small example data
data(small_example_data)
# Load example raster data
raster_paths <- list.files(system.file("extdata", package = "ldmppr"),
  pattern = "\\.tif$", full.names = TRUE
)
raster_paths <- raster_paths[!grepl("_med\\.tif$", raster_paths)]
rasters <- lapply(raster_paths, terra::rast)
# Scale the rasters
scaled_raster_list <- scale_rasters(rasters)
# Train the model
mark_model <- train_mark_model(
  data = small_example_data,
  raster_list = scaled_raster_list,
  scaled_rasters = TRUE,
  model_type = "xgboost",
  xy_bounds = c(0, 25, 0, 25),
  delta = 1,
  parallel = FALSE,
  include_comp_inds = FALSE,
  competition_radius = 10,
  edge_correction = "none",
  selection_metric = "rmse",
  cv_folds = 3,
  tuning_grid_size = 2,
  verbose = TRUE
)
#> [ldmppr::train_mark_model]
#> Training mark model
#> Model type: xgboost
#> Selection metric: rmse
#> CV folds: 3
#> Tuning grid size: 2
#> Include competition indices: no
#> Edge correction: none
#> Step 1/6: Preparing training data...
#> Rows: 121
#> Seed: 0
#> Done in 0.0s.
#> Step 2/6: Configuring parallel backend...
#> Parallel: off
#> Model engine threads: 1
#> Done in 0.0s.
#> Step 3/6: Extracting raster covariates...
#> Using pre-scaled rasters (scaled_rasters = TRUE).
#> Extracted 4 raster feature(s).
#> Done in 0.0s.
#> Step 4/6: Building model matrix (and competition indices if requested)...
#> Final training rows: 121
#> Final feature columns (incl x,y,time): 7
#> Done in 0.0s.
#> Step 5/6: Fitting model (with optional CV tuning)...
#> foreach backend: doSEQ | workers=1
#> Done in 2.4s.
#> Step 6/6: Finalizing output object...
#> Residual bootstrap stored (source=oos, transform=sqrt, bins=6, min/bin=8).
#> Done in 0.1s.
#> Training complete. Total time: 2.6s.
print(mark_model)
#> ldmppr Mark Model
#> engine: xgboost
#> has_fit_engine: TRUE
#> has_xgb_raw: FALSE
#> n_features: 7
#> n_rasters: 4
#> raster_names: Snodgrass_DEM_1m, Snodgrass_aspect_southness_1m,
#> Snodgrass_slope_1m, Snodgrass_wetness_index_1m
#> scaled_rasters: TRUE
#> comp_indices: FALSE
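The example above trains without competition indices. To give some intuition for how a radius and a "toroidal" edge correction can interact on a rectangular domain, the sketch below counts neighbours within a radius, optionally wrapping distances around the domain edges. This is a hypothetical illustration only: the function count_neighbors, its arguments, and the choice of a simple neighbour count are assumptions, and it is not ldmppr's definition of its competition indices. It also assumes xy_bounds is ordered as c(x_min, x_max, y_min, y_max), matching the c(0, b_x, 0, b_y) default described under Arguments.

```r
# Hypothetical illustration (not ldmppr's implementation): count neighbours
# within `radius`, optionally using toroidal (wrap-around) distances.
count_neighbors <- function(x, y, radius, xy_bounds, toroidal = FALSE) {
  width <- xy_bounds[2] - xy_bounds[1]
  height <- xy_bounds[4] - xy_bounds[3]
  # Pairwise absolute coordinate differences.
  dx <- abs(outer(x, x, "-"))
  dy <- abs(outer(y, y, "-"))
  if (toroidal) {
    # On a torus, the shorter of the direct and wrapped separation applies.
    dx <- pmin(dx, width - dx)
    dy <- pmin(dy, height - dy)
  }
  within <- sqrt(dx^2 + dy^2) <= radius
  diag(within) <- FALSE # a point is not its own neighbour
  rowSums(within)
}

# Toy points on a 25 x 25 domain: two near opposite corners, one central.
x <- c(1, 24, 12)
y <- c(1, 24, 12)
bounds <- c(0, 25, 0, 25)

plain <- count_neighbors(x, y, radius = 5, xy_bounds = bounds)
wrapped <- count_neighbors(x, y, radius = 5, xy_bounds = bounds, toroidal = TRUE)
```

Without wrapping, no point has a neighbour within 5 units. With wrapping, the two corner points become neighbours of each other because they are only 2 units apart in each coordinate across the domain edge.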