Package 'cforward' reference manual

Title:	Forward Selection using Concordance/C-Index
Description:	Performs forward model selection, using the C-index/concordance in survival analysis models.
Authors:	John Muschelli [aut, cre], Andrew Leroux [aut]
Maintainer:	John Muschelli <[email protected]>
License:	GPL-3
Version:	0.1.0
Built:	2025-01-13 03:02:19 UTC
Source:	https://github.com/muschellij2/cforward

Forward Selection Based on C-Index/Concordance

Description

Forward Selection Based on C-Index/Concordance
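
For orientation, the C-index (concordance) is the share of comparable subject pairs in which the subject who fails earlier also has the higher predicted risk. The following is a minimal base R sketch of that definition for right-censored data without tied event times. The function `c_index` and the toy vectors are illustrative assumptions and are not part of the package, which hands this computation to `concordancefit`.

```r
# Illustrative helper (not part of cforward): pairwise C-index.
# A pair (i, j) is usable when subject i has an observed event
# strictly before subject j's observed time.
c_index <- function(time, status, risk) {
  concordant <- 0
  usable <- 0
  n <- length(time)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      if (status[i] == 1 && time[i] < time[j]) {
        usable <- usable + 1
        if (risk[i] > risk[j]) {
          concordant <- concordant + 1      # earlier failure, higher risk
        } else if (risk[i] == risk[j]) {
          concordant <- concordant + 0.5    # tied risk scores count half
        }
      }
    }
  }
  concordant / usable
}

# Toy data: 1 = event, 0 = censored
time   <- c(2, 4, 5, 7, 9)
status <- c(1, 1, 0, 1, 0)
risk   <- c(0.9, 0.4, 0.6, 0.3, 0.1)

c_index(time, status, risk)  # 7 of 8 usable pairs are concordant: 0.875
```

A value of 0.5 corresponds to random ordering and 1 to perfect ordering, which is why a gain in concordance can serve as a selection criterion.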

Usage

cforward(
  data,
  event_time = "event_time_years",
  event_status = "mortstat",
  weight_column = "WTMEC4YR_norm",
  variables = NULL,
  included_variables = NULL,
  n_folds = 10,
  seed = 1989,
  max_model_size = 50,
  c_threshold = NULL,
  verbose = TRUE,
  cfit_args = list(),
  save_memory = FALSE,
  ...
)

cforward_one(
  data,
  event_time = "event_time_years",
  event_status = "mortstat",
  weight_column = "WTMEC4YR_norm",
  variables,
  included_variables = NULL,
  verbose = TRUE,
  cfit_args = list(),
  save_memory = FALSE,
  ...
)

make_folds(data, event_status = "mortstat", n_folds = 10, verbose = TRUE)

Arguments

`data`	A data set on which to perform model selection and cross-validation.
`event_time`	Character vector of length 1 naming the column of event times, passed to `Surv`.
`event_status`	Character vector of length 1 naming the column of event status, passed to `Surv`.
`weight_column`	Character vector of length 1 naming the column of model weights. If no weights are available, set to `NULL`.
`variables`	Character vector of variables to select from. Must be in `data`.
`included_variables`	Character vector of variables forced into the model. Must be in `data`.
`n_folds`	Number of folds for cross-validation. To run on the full data, set to 1.
`seed`	Seed set before folds are created.
`max_model_size`	Maximum number of variables in the model. Selection stops if this is reached. Note that this does not correspond to the number of coefficients, because a categorical variable can contribute more than one.
`c_threshold`	Threshold for concordance. If the difference between the best concordance and the current one does not reach this threshold, selection stops.
`verbose`	Print diagnostic messages.
`cfit_args`	Arguments passed to `concordancefit`. If `strata` is to be passed, set `strata_column` in this list.
`save_memory`	Save only a minimal amount of information and discard the fitted models.
`...`	Additional arguments to pass to `coxph`
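
To make the roles of `n_folds`, `seed`, and `event_status` concrete, here is one way cross-validation folds could be assigned so that each fold has a similar share of events. This is a base R sketch under the assumption that folds are balanced by event status. The helper `make_stratified_folds` is hypothetical and may differ from what `make_folds` actually does.

```r
# Hypothetical helper (an assumption, not the package's code):
# assign fold labels 1..n_folds separately within each event-status group.
make_stratified_folds <- function(status, n_folds = 10, seed = 1989) {
  set.seed(seed)                       # reproducible fold assignment
  folds <- integer(length(status))
  for (s in unique(status)) {
    idx <- which(status == s)
    # Shuffle the rows of this group, then deal fold labels round-robin
    idx <- idx[sample.int(length(idx))]
    folds[idx] <- rep_len(seq_len(n_folds), length(idx))
  }
  folds
}

# Toy status vector: 80 censored, 20 events
status <- rep(c(0, 1), times = c(80, 20))
folds <- make_stratified_folds(status, n_folds = 5)

# Each fold receives 16 censored subjects and 4 events
table(folds, status)
```

Fixing the seed before the folds are drawn means repeated runs evaluate candidate variables on identical splits.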

Value

A list of lists, each with the elements:

full_concordance: Concordance when fit on the full data
models: Cox model from full data set fit, stripped of large memory elements
cv_concordance: Cross-validated Concordance
included_variables: Variables included in the model, other than those being selected upon
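
The general shape of forward selection on concordance can be sketched with the `survival` package alone. The code below is a simplified illustration, not the package's implementation: it scores models by in-sample concordance rather than cross-validated concordance, ignores weights, uses the `lung` data shipped with `survival`, and assumes a stopping rule in which selection ends once the gain over the previous step falls below `c_threshold`. The helpers `cox_concordance` and `forward_select` are hypothetical names.

```r
library(survival)

# Hypothetical helper: concordance of a Cox model with predictors `vars`,
# fit and scored on the same data (no weights, no cross-validation).
cox_concordance <- function(data, time, status, vars) {
  fml <- as.formula(sprintf("Surv(%s, %s) ~ %s", time, status,
                            paste(vars, collapse = " + ")))
  fit <- coxph(fml, data = data)
  as.numeric(concordance(fit)$concordance)
}

# Hypothetical helper: greedy forward selection on concordance.
forward_select <- function(data, time, status, candidates,
                           max_model_size = 50, c_threshold = NULL) {
  selected <- character(0)
  path <- numeric(0)
  while (length(candidates) > 0 && length(selected) < max_model_size) {
    # Score each remaining candidate when added to the current model
    scores <- vapply(candidates, function(v) {
      cox_concordance(data, time, status, c(selected, v))
    }, numeric(1))
    best <- which.max(scores)
    # Assumed stopping rule: the first variable is always admitted;
    # afterwards, stop when the gain is below c_threshold
    if (!is.null(c_threshold) && length(path) > 0 &&
        scores[[best]] - path[[length(path)]] < c_threshold) {
      break
    }
    selected <- c(selected, candidates[[best]])
    path <- c(path, scores[[best]])
    candidates <- candidates[-best]
  }
  names(path) <- selected
  path  # concordance after each step, named by the variable added
}

# Complete cases only, so every candidate model uses the same rows
dat <- na.omit(lung[, c("time", "status", "age", "sex", "ph.ecog")])
path <- forward_select(dat, "time", "status",
                       candidates = c("age", "sex", "ph.ecog"),
                       c_threshold = 0.01)
print(path)
```

Scoring on held-out folds instead of the fitting data, as the cross-validated concordance above does, guards against admitting variables that only improve the in-sample fit.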

Examples

variables = c("gender",
              "age_years_interview", "education_adult")

res = cforward(nhanes_example,
               event_time = "event_time_years",
               event_status = "mortstat",
               weight_column = "WTMEC4YR_norm",
               variables = variables,
               included_variables = NULL,
               n_folds = 5,
               c_threshold = 0.02,
               seed = 1989,
               max_model_size = 50,
               verbose = TRUE)
conc = sapply(res, `[[`, "best_concordance")

# Same selection without c_threshold; a threshold is applied afterwards
res = cforward(nhanes_example,
               event_time = "event_time_years",
               event_status = "mortstat",
               weight_column = "WTMEC4YR_norm",
               variables = variables,
               included_variables = NULL,
               n_folds = 5,
               seed = 1989,
               max_model_size = 50,
               verbose = TRUE)
conc = sapply(res, `[[`, "best_concordance")
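threshold = 0.01
# Always keep the first variable, plus any variable whose step raised
# the concordance by more than the threshold
included_variables = names(conc)[c(1, diff(conc)) > threshold]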

new_variables = c("diabetes", "stroke")
second_level = cforward(nhanes_example,
               event_time = "event_time_years",
               event_status = "mortstat",
               weight_column = "WTMEC4YR_norm",
               variables = new_variables,
               included_variables = included_variables,
               n_folds = 5,
               seed = 1989,
               max_model_size = 50,
               verbose = TRUE)
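second_conc = sapply(second_level, `[[`, "best_concordance")
# Take the step with the highest concordance, then the model within that
# step with the highest cross-validated concordance
result = second_level[[which.max(second_conc)]]
final_model = result$models[[which.max(result$cv_concordance)]]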

Estimate Out-of-Sample Concordance

Description

Estimate Out-of-Sample Concordance

Usage

estimate_concordance(
  train,
  test = train,
  event_time = "event_time_years",
  event_status = "mortstat",
  weight_column = "WTMEC4YR_norm",
  all_variables = NULL,
  cfit_args = list(),
  ...
)

Arguments

`train`	A data set used for model training.
`test`	A data set on which to estimate concordance, using the model fit on `train`. Set to `train` if estimating on the same data.
`event_time`	Character vector of length 1 naming the column of event times, passed to `Surv`.
`event_status`	Character vector of length 1 naming the column of event status, passed to `Surv`.
`weight_column`	Character vector of length 1 naming the column of model weights. If no weights are available, set to `NULL`.
`all_variables`	Character vector of variables to put in the model. All must be in the data (`train` and `test`).
`cfit_args`	Arguments passed to `concordancefit`. If `strata` is to be passed, set `strata_column` in this list.
`...`	Additional arguments to pass to `coxph`

Value

A list containing the concordance and the model fit on the training data.
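
The train/test idea can be illustrated directly with `survival`: fit a Cox model on one subset, score the held-out subset with the fitted linear predictor, and compute concordance there. This is a sketch under stated assumptions, not the internals of `estimate_concordance`: it uses the `lung` data from `survival`, an arbitrary one-quarter holdout, and no weights.

```r
library(survival)

# Complete cases from the lung data shipped with survival
dat <- na.omit(lung[, c("time", "status", "age", "sex", "ph.ecog")])

# Arbitrary split for illustration: one quarter of rows held out
set.seed(1989)
test_rows <- sample(nrow(dat), size = floor(nrow(dat) / 4))
train <- dat[-test_rows, ]
test  <- dat[test_rows, ]

# Fit on the training rows only
fit <- coxph(Surv(time, status) ~ age + sex + ph.ecog, data = train)

# Risk score (linear predictor) for the held-out rows
test$risk <- predict(fit, newdata = test, type = "lp")

# reverse = TRUE because a higher risk score should go with shorter survival
cfit <- concordance(Surv(time, status) ~ risk, data = test, reverse = TRUE)
cstat <- as.numeric(cfit$concordance)
cstat
```

Setting `test = train`, the default in the usage above, gives the in-sample concordance, which tends to be optimistic compared with a held-out estimate.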

Example Data from National Health and Nutrition Examination Survey ('NHANES')

Description

Example Data from National Health and Nutrition Examination Survey ('NHANES')

Usage

nhanes_example

Format

A data.frame with 7 columns, which are:

SEQN: ID of participant
mortstat: mortality status, 1 = died, 0 = censored
event_time_years: time observed
WTMEC4YR_norm: weights normalized for survey
gender: gender
age_years_interview: age in years at interview
education_adult: educational status
