Package 'cforward'

Title: Forward Selection using Concordance/C-Index
Description: Performs forward model selection, using the C-index/concordance in survival analysis models.
Authors: John Muschelli [aut, cre], Andrew Leroux [aut]
Maintainer: John Muschelli <[email protected]>
License: GPL-3
Version: 0.1.0
Built: 2024-11-14 03:43:11 UTC
Source: https://github.com/muschellij2/cforward

Help Index


Forward Selection Based on C-Index/Concordance

Description

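Performs forward selection of variables for a Cox proportional hazards model, using the cross-validated C-index (concordance) as the selection criterion.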

Usage

cforward(
  data,
  event_time = "event_time_years",
  event_status = "mortstat",
  weight_column = "WTMEC4YR_norm",
  variables = NULL,
  included_variables = NULL,
  n_folds = 10,
  seed = 1989,
  max_model_size = 50,
  c_threshold = NULL,
  verbose = TRUE,
  cfit_args = list(),
  save_memory = FALSE,
  ...
)

cforward_one(
  data,
  event_time = "event_time_years",
  event_status = "mortstat",
  weight_column = "WTMEC4YR_norm",
  variables,
  included_variables = NULL,
  verbose = TRUE,
  cfit_args = list(),
  save_memory = FALSE,
  ...
)

make_folds(data, event_status = "mortstat", n_folds = 10, verbose = TRUE)

Arguments

data

A data set to perform model selection and cross-validation.

event_time

Character vector of length 1 naming the column of event times; passed to Surv.

event_status

Character vector of length 1 naming the column of event status; passed to Surv.

weight_column

Character vector of length 1 naming the column of model weights. If no weights are available, set to NULL.

variables

Character vector of candidate variables to select from. All must be columns in data.

included_variables

Character vector of variables forced into the model. All must be columns in data.

n_folds

Number of folds for cross-validation. Set to 1 to run on the full data.

seed

Seed set before folds are created.

max_model_size

Maximum number of variables in the model. Selection stops once this size is reached. Note that this is not the number of coefficients, because a categorical variable can contribute several coefficients.

c_threshold

Threshold for the improvement in concordance. If the best concordance at the current step exceeds that of the previous step by less than this threshold, selection stops.

verbose

Logical; if TRUE, print diagnostic messages.

cfit_args

List of arguments passed to concordancefit. To pass strata, set strata_column in this list.

save_memory

Logical; if TRUE, save only a minimal amount of information and discard the fitted models.

...

Additional arguments to pass to coxph
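
The n_folds and seed arguments, together with the event_status argument of make_folds, suggest that folds are assigned reproducibly and with attention to event status. A minimal sketch of one way to do that in base R follows; assign_folds is a hypothetical helper, and stratifying on the event indicator is an assumption rather than something this documentation confirms.

```r
# Hypothetical helper: assign each row to one of n_folds folds,
# balancing events and censored observations across folds.
# Illustrative assumption only, not the package's implementation.
assign_folds <- function(status, n_folds = 10, seed = 1989) {
  set.seed(seed)
  folds <- integer(length(status))
  for (s in unique(status)) {
    idx <- which(status == s)
    # Repeat 1..n_folds to cover the stratum, then shuffle within it
    labels <- rep(seq_len(n_folds), length.out = length(idx))
    folds[idx] <- labels[sample.int(length(idx))]
  }
  folds
}

status <- c(rep(1, 20), rep(0, 80))   # toy event indicator
folds <- assign_folds(status, n_folds = 5)
table(folds, status)                  # each fold gets 4 events and 16 censored rows
```

With n_folds = 1 the sketch places every row in a single fold, which matches the idea of running on the full data.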

Value

A list of lists, with elements of:

full_concordance

Concordance when fit on the full data

models

Cox model from full data set fit, stripped of large memory elements

cv_concordance

Cross-validated Concordance

included_variables

Variables included in the model, other than those being selected upon
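
To picture how a list with one entry per selection step could arise, here is a toy forward-selection loop in base R. It is a sketch under stated assumptions: toy_score stands in for the cross-validated concordance and returns made-up values, forward_select is a hypothetical name, and the exact stopping behavior of cforward (for example, whether the threshold is checked at the first step) is not confirmed by this documentation.

```r
# Toy scoring function standing in for cross-validated concordance.
# The gains are invented for illustration only.
toy_score <- function(vars) {
  gain <- c(age = 0.20, gender = 0.03, education = 0.005)
  0.5 + sum(gain[vars])
}

# Hypothetical greedy forward selection; not the package's code.
forward_select <- function(candidates, score, max_model_size = 50,
                           c_threshold = NULL) {
  selected <- character(0)
  steps <- list()
  best_so_far <- NULL
  while (length(candidates) > 0 && length(selected) < max_model_size) {
    # Score each remaining candidate when added to the current model
    scores <- vapply(candidates,
                     function(v) score(c(selected, v)), numeric(1))
    best <- names(scores)[which.max(scores)]
    # Stop if the improvement over the previous step is too small
    if (!is.null(c_threshold) && !is.null(best_so_far) &&
        max(scores) - best_so_far < c_threshold) break
    selected <- c(selected, best)
    candidates <- setdiff(candidates, best)
    best_so_far <- max(scores)
    steps[[best]] <- list(best_concordance = best_so_far,
                          included_variables = selected)
  }
  steps
}

steps <- forward_select(c("gender", "age", "education"), toy_score,
                        c_threshold = 0.02)
# age and gender are kept; education adds less than the threshold
sapply(steps, `[[`, "best_concordance")
```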

Examples

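# Candidate variables to select from
variables = c("gender",
              "age_years_interview", "education_adult")

# Forward selection with 5-fold cross-validation and a concordance threshold
res = cforward(nhanes_example,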
               event_time = "event_time_years",
               event_status = "mortstat",
               weight_column = "WTMEC4YR_norm",
               variables = variables,
               included_variables = NULL,
               n_folds = 5,
               c_threshold = 0.02,
               seed = 1989,
               max_model_size = 50,
               verbose = TRUE)
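# Best concordance at each selection step
conc = sapply(res, `[[`, "best_concordance")

# Repeat the selection without c_threshold, then filter the steps afterwards
res = cforward(nhanes_example,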
               event_time = "event_time_years",
               event_status = "mortstat",
               weight_column = "WTMEC4YR_norm",
               variables = variables,
               included_variables = NULL,
               n_folds = 5,
               seed = 1989,
               max_model_size = 50,
               verbose = TRUE)
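conc = sapply(res, `[[`, "best_concordance")
# Keep the variables whose step raised the best concordance by more than the threshold
threshold = 0.01
included_variables = names(conc)[c(1, diff(conc)) > threshold]

# Second round: force in the retained variables and select among new candidates
new_variables = c("diabetes", "stroke")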
second_level = cforward(nhanes_example,
                        event_time = "event_time_years",
                        event_status = "mortstat",
                        weight_column = "WTMEC4YR_norm",
                        variables = new_variables,
                        included_variables = included_variables,
                        n_folds = 5,
                        seed = 1989,
                        max_model_size = 50,
                        verbose = TRUE)
second_conc = sapply(second_level, `[[`, "best_concordance")
result = second_level[[which.max(second_conc)]]
final_model = result$models[[which.max(result$cv_concordance)]]
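
The filter names(conc)[c(1, diff(conc)) > threshold] used in the examples can be checked on a small vector. The concordance values below are invented for illustration and are not results from nhanes_example.

```r
# Toy per-step concordances, named by the variable added at each step.
# These numbers are made up for illustration.
conc <- c(age_years_interview = 0.700, gender = 0.720,
          education_adult = 0.725)
threshold <- 0.01

# The leading 1 stands in for the first step, which has no predecessor
# to compare against, so the first variable always passes the filter.
gain <- c(1, diff(conc))
kept <- names(conc)[gain > threshold]
kept   # "age_years_interview" "gender"
```

In this toy case the third step improves concordance by only 0.005, so education_adult is dropped.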

Estimate Out-of-Sample Concordance

Description

Estimate Out-of-Sample Concordance

Usage

estimate_concordance(
  train,
  test = train,
  event_time = "event_time_years",
  event_status = "mortstat",
  weight_column = "WTMEC4YR_norm",
  all_variables = NULL,
  cfit_args = list(),
  ...
)

Arguments

train

A data set to perform model training.

test

A data set on which to estimate concordance, using the model fit on train. Set to train to estimate on the same data.

event_time

Character vector of length 1 naming the column of event times; passed to Surv.

event_status

Character vector of length 1 naming the column of event status; passed to Surv.

weight_column

Character vector of length 1 naming the column of model weights. If no weights are available, set to NULL.

all_variables

Character vector of variables to put in the model. All must be columns in train and test.

cfit_args

List of arguments passed to concordancefit. To pass strata, set strata_column in this list.

...

Additional arguments to pass to coxph

Value

A list of concordance and the model fit with the training data
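
The train/test idea can be sketched with the survival package directly. This is an illustration under assumptions, not the body of estimate_concordance: it uses the lung data shipped with survival rather than nhanes_example, omits weights, calls concordance() through its formula interface instead of concordancefit, and the element names of out are hypothetical.

```r
library(survival)

# Split the survival package's lung data into train and test halves
set.seed(1989)
dat <- lung[, c("time", "status", "age", "sex")]
in_train <- sample(nrow(dat)) <= nrow(dat) / 2
train <- dat[in_train, ]
test <- dat[!in_train, ]

# Fit a Cox model on the training data only
fit <- coxph(Surv(time, status) ~ age + sex, data = train)

# Score the test data with the linear predictor from the training fit
test$lp <- predict(fit, newdata = test, type = "lp")

# Higher risk scores should go with shorter survival, hence reverse = TRUE
cfit <- concordance(Surv(time, status) ~ lp, data = test, reverse = TRUE)

# Hypothetical return shape: the concordance plus the training fit
out <- list(concordance = cfit$concordance, model = fit)
out$concordance
```

Passing the same data as both train and test would give the in-sample concordance, which is typically more optimistic than the out-of-sample estimate.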


Example Data from National Health and Nutrition Examination Survey ('NHANES')

Description

Example Data from National Health and Nutrition Examination Survey ('NHANES')

Usage

nhanes_example

Format

A data.frame with 7 columns, which are:

SEQN

ID of participant

mortstat

mortality status: 1 = died, 0 = censored

event_time_years

time observed, in years

WTMEC4YR_norm

weights normalized for survey

gender

gender

age_years_interview

age in years at interview

education_adult

educational status