Title: | Forward Selection using Concordance/C-Index |
---|---|
Description: | Performs forward model selection, using the C-index/concordance in survival analysis models. |
Authors: | John Muschelli [aut, cre], Andrew Leroux [aut] |
Maintainer: | John Muschelli <[email protected]> |
License: | GPL-3 |
Version: | 0.1.0 |
Built: | 2025-01-13 03:02:19 UTC |
Source: | https://github.com/muschellij2/cforward |
Forward Selection Based on C-Index/Concordance
cforward(
  data,
  event_time = "event_time_years",
  event_status = "mortstat",
  weight_column = "WTMEC4YR_norm",
  variables = NULL,
  included_variables = NULL,
  n_folds = 10,
  seed = 1989,
  max_model_size = 50,
  c_threshold = NULL,
  verbose = TRUE,
  cfit_args = list(),
  save_memory = FALSE,
  ...
)

cforward_one(
  data,
  event_time = "event_time_years",
  event_status = "mortstat",
  weight_column = "WTMEC4YR_norm",
  variables,
  included_variables = NULL,
  verbose = TRUE,
  cfit_args = list(),
  save_memory = FALSE,
  ...
)

make_folds(data, event_status = "mortstat", n_folds = 10, verbose = TRUE)
data |
A data set on which to perform model selection and cross-validation. |
event_time |
Character vector of length 1 with event times, passed to
|
event_status |
Character vector of length 1 with event status, passed to
|
weight_column |
Character vector of length 1 with weights for
model. If no weights are available, set to |
variables |
Character vector of candidate variables for selection.
Must be in |
included_variables |
Character vector of variables
that are forced into the model. Must be in |
n_folds |
Number of folds for cross-validation. To run on the full data, set to 1. |
seed |
Seed set before folds are created. |
max_model_size |
Maximum number of variables in the model. Selection stops once this is reached. Note that this does not correspond to the number of coefficients, because a categorical variable can contribute several coefficients. |
c_threshold |
Threshold for the improvement in concordance. If the difference between the best concordance and the current one does not reach this threshold, selection stops. |
verbose |
Print diagnostic messages. |
cfit_args |
Arguments passed to |
save_memory |
Save only a minimal amount of information and discard the fitted models. |
... |
Additional arguments to pass to |
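The fold-related arguments (`n_folds`, `seed`, and the `event_status` argument of `make_folds`) suggest that folds are created with the event indicator in mind. The following is a minimal sketch of one way to do that, assuming fold labels are assigned separately within events and non-events. The helper `assign_folds_sketch` is hypothetical and is not the package's `make_folds()`.

```r
# Hypothetical helper, not the package's make_folds(): assigns fold labels
# separately within each status group so every fold has a similar event rate.
assign_folds_sketch <- function(status, n_folds = 10, seed = 1989) {
  set.seed(seed)
  fold <- integer(length(status))
  for (s in unique(status)) {
    idx <- which(status == s)
    # Recycle 1..n_folds over the group, then shuffle the labels
    labels <- rep(seq_len(n_folds), length.out = length(idx))
    fold[idx] <- labels[sample.int(length(idx))]
  }
  fold
}

# Toy status vector: 80 censored observations, 20 events
status <- rep(c(0, 1), times = c(80, 20))
fold <- assign_folds_sketch(status, n_folds = 5)
table(fold, status)
```

With 5 folds, each fold in this toy example receives the same number of events, which keeps the per-fold concordance estimates comparable.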
A list of lists, each with the following elements:
Concordance when fit on the full data
Cox model from full data set fit, stripped of large memory elements
Cross-validated Concordance
Variables included in the model, other than those being selected upon
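To make the selection idea concrete, here is a minimal sketch of greedy forward selection on the C-index, assuming a plain Cox model without weights or cross-validation. It uses `survival::veteran` as stand-in data, and the function `forward_cindex_sketch` is hypothetical; it illustrates the general technique and is not the implementation behind `cforward()`.

```r
library(survival)

# Hypothetical sketch of greedy forward selection on the C-index.
# Stand-in data: survival::veteran. Not the cforward() implementation.
forward_cindex_sketch <- function(dat, time, status, candidates,
                                  c_threshold = 0) {
  selected <- character(0)
  best_c <- 0.5  # concordance expected from a non-informative model
  steps <- list()
  while (length(candidates) > 0) {
    # Concordance of the current model plus each remaining candidate
    conc <- sapply(candidates, function(v) {
      rhs <- paste(c(selected, v), collapse = " + ")
      form <- as.formula(paste0("Surv(", time, ", ", status, ") ~ ", rhs))
      fit <- coxph(form, data = dat)
      concordance(fit)$concordance
    })
    best_var <- names(which.max(conc))
    # Stop when the best candidate does not improve concordance enough
    if (max(conc) - best_c < c_threshold) break
    selected <- c(selected, best_var)
    best_c <- max(conc)
    steps[[best_var]] <- best_c
    candidates <- setdiff(candidates, best_var)
  }
  list(selected = selected, concordance = unlist(steps))
}

out <- forward_cindex_sketch(
  veteran, time = "time", status = "status",
  candidates = c("karno", "age", "celltype", "trt"),
  c_threshold = 0.005
)
out$selected
out$concordance
```

This sketch scores candidates on in-sample concordance only, which tends to be optimistic; scoring on held-out folds is the usual remedy.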
variables = c("gender", "age_years_interview", "education_adult")

# Forward selection with a concordance threshold
res = cforward(nhanes_example,
               event_time = "event_time_years",
               event_status = "mortstat",
               weight_column = "WTMEC4YR_norm",
               variables = variables,
               included_variables = NULL,
               n_folds = 5,
               c_threshold = 0.02,
               seed = 1989,
               max_model_size = 50,
               verbose = TRUE)
conc = sapply(res, `[[`, "best_concordance")

# Same selection without a threshold
res = cforward(nhanes_example,
               event_time = "event_time_years",
               event_status = "mortstat",
               weight_column = "WTMEC4YR_norm",
               variables = variables,
               included_variables = NULL,
               n_folds = 5,
               seed = 1989,
               max_model_size = 50,
               verbose = TRUE)
conc = sapply(res, `[[`, "best_concordance")

# Keep variables whose step improved concordance by more than the threshold
threshold = 0.01
included_variables = names(conc)[c(1, diff(conc)) > threshold]

# Second round: select among new variables, forcing in the retained ones
new_variables = c("diabetes", "stroke")
second_level = cforward(nhanes_example,
                        event_time = "event_time_years",
                        event_status = "mortstat",
                        weight_column = "WTMEC4YR_norm",
                        variables = new_variables,
                        included_variables = included_variables,
                        n_folds = 5,
                        seed = 1989,
                        max_model_size = 50,
                        verbose = TRUE)
second_conc = sapply(second_level, `[[`, "best_concordance")

# Take the best step and, within it, the model with the highest
# cross-validated concordance
result = second_level[[which.max(second_conc)]]
final_model = result$models[[which.max(result$cv_concordance)]]
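The filter `names(conc)[c(1, diff(conc)) > threshold]` used in the example can be hard to read at first. The sketch below runs the same expression on a made-up named vector; the concordance values are hypothetical and are not output from `cforward()`.

```r
# Hypothetical concordance values after each forward step (not real output)
conc <- c(age_years_interview = 0.70, gender = 0.72, education_adult = 0.725)
threshold <- 0.01

# Gain of each step over the previous one; the leading 1 guarantees that
# the first variable is always kept
gain <- c(1, diff(conc))
kept <- names(conc)[gain > threshold]
kept
# "age_years_interview" "gender"
```

Here `education_adult` is dropped because it raises concordance by only 0.005, which is below the threshold of 0.01.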
Estimate Out-of-Sample Concordance
estimate_concordance(
  train,
  test = train,
  event_time = "event_time_years",
  event_status = "mortstat",
  weight_column = "WTMEC4YR_norm",
  all_variables = NULL,
  cfit_args = list(),
  ...
)
train |
A data set used for model training. |
test |
A data set to estimate concordance, from fit model with |
event_time |
Character vector of length 1 with event times, passed to
|
event_status |
Character vector of length 1 with event status, passed to
|
weight_column |
Character vector of length 1 with weights for
model. If no weights are available, set to |
all_variables |
Character vector of variables to put in the
model. All must be in |
cfit_args |
Arguments passed to |
... |
Additional arguments to pass to |
A list containing the concordance and the model fit on the training data
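The idea of fitting on one data set and estimating concordance on another can be sketched with the `survival` package directly. This is a minimal illustration on `survival::veteran` as stand-in data, assuming an unweighted Cox model and an arbitrary split; it is not the code behind `estimate_concordance()`.

```r
library(survival)

# Hypothetical train/test split on survival::veteran (stand-in data)
set.seed(1989)
test_idx <- sample(nrow(veteran), size = 40)
train <- veteran[-test_idx, ]
test <- veteran[test_idx, ]

# Fit the Cox model on the training data only
fit <- coxph(Surv(time, status) ~ karno + age, data = train)

# In-sample concordance, and concordance of the same fit on held-out data
c_train <- concordance(fit)$concordance
c_test <- concordance(fit, newdata = test)$concordance
c(train = c_train, test = c_test)
```

When `test` is the same as `train`, as in the default `test = train`, the estimate is the in-sample concordance rather than an out-of-sample one.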
Example Data from National Health and Nutrition Examination Survey ('NHANES')
nhanes_example
A data.frame with 7 columns, which are:
ID of participant
mortality status: 1 = died, 0 = censored
time observed
weights normalized for survey
gender
age in years at interview
educational status