Logistic regression using purposeful selection — purposeful_selection

Uses the purposeful selection algorithm to identify a logistic regression model. The algorithm has three steps:

1. Unadjusted (univariable) logistic regression of each variable to identify candidate variables with a p-value below the entry threshold (entry_criteria).
2. Multivariable regression of the candidate variables. This is an iterative process: starting from the variable with the highest p-value, a variable is retained if it is significant (p-value < retention_criteria) or acts as a confounder (i.e., the effect size changes by more than +/- confounding_criteria).
3. All variables that failed step 1 are introduced to the model and retained if their p-value is below the retention threshold (retention_criteria).

Reference: https://scfbm.biomedcentral.com/articles/10.1186/1751-0473-3-17
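The screening in step 1 can be illustrated with base R alone. This is a minimal sketch and not the package's implementation: the simulated data, the column names (age, smoker, noise, outcome), and the use of Wald p-values from summary() are all assumptions made for illustration.

```r
# Hypothetical simulated data; column names are illustrative only.
set.seed(1)
n <- 200
dataset <- data.frame(
  age    = rnorm(n, mean = 50, sd = 10),
  smoker = rbinom(n, size = 1, prob = 0.3),
  noise  = rnorm(n)
)
dataset$outcome <- rbinom(
  n, size = 1,
  prob = plogis(-3 + 0.05 * dataset$age + 0.8 * dataset$smoker)
)

variables <- c("age", "smoker", "noise")
entry_criteria <- 0.2

# Fit one unadjusted logistic regression per variable and
# extract the Wald p-value of that variable's coefficient.
p_values <- sapply(variables, function(v) {
  fit <- glm(reformulate(v, response = "outcome"),
             data = dataset, family = binomial())
  coef(summary(fit))[v, "Pr(>|z|)"]
})

# Variables below the entry threshold become candidates for step 2.
candidates <- names(p_values)[p_values < entry_criteria]
print(p_values)
print(candidates)
```

Variables not listed in candidates would be the ones reconsidered in step 3.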

Usage

purposeful_selection_algorithm(
  outcome,
  variables,
  dataset,
  entry_criteria = 0.2,
  retention_criteria = 0.1,
  confounding_criteria = 0.2
)

Arguments

outcome: Outcome of interest
variables: Exposure variables of interest. Must be numeric or one-hot encoded
dataset: Data frame that contains the outcome and exposure variables
entry_criteria: P-value threshold for entry into the model. Default = 0.2
retention_criteria: P-value threshold for retention in the model. Default = 0.1
confounding_criteria: Percent change of effect size used to test for confounding. Set to a very high value (e.g., 1000) if testing for confounding is not desired. Default = 0.2
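To make the role of confounding_criteria concrete, the sketch below compares the coefficient of one variable with and without a second variable in the model. It assumes the change is measured as the absolute relative change in the coefficient and that 0.2 corresponds to 20%; the function's actual calculation may differ. The data and the names x1, x2, and dat are hypothetical.

```r
# Hypothetical simulated data with two correlated exposures.
set.seed(2)
n  <- 500
x1 <- rnorm(n)
x2 <- 0.7 * x1 + rnorm(n)
y  <- rbinom(n, size = 1, prob = plogis(-0.5 + 0.8 * x1 + 0.8 * x2))
dat <- data.frame(y, x1, x2)

# Fit the model with and without x2.
full    <- glm(y ~ x1 + x2, data = dat, family = binomial())
reduced <- glm(y ~ x1,      data = dat, family = binomial())

# Relative change in the x1 coefficient when x2 is dropped
# (assumed definition, relative to the full-model estimate).
beta_full    <- coef(full)["x1"]
beta_reduced <- coef(reduced)["x1"]
rel_change   <- abs((beta_reduced - beta_full) / beta_full)

confounding_criteria <- 0.2
# TRUE means x2 would be kept as a confounder under this assumed rule.
is_confounder <- unname(rel_change > confounding_criteria)
print(rel_change)
print(is_confounder)
```

Under this reading, a very large confounding_criteria can never be exceeded in practice, which is why setting it high effectively disables the confounding check.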

Value

A list containing the univariable results (ps_step_1), the initial multivariable modeling (ps_step_2), the model refinement with non-candidate variables (ps_step_3), the final glm model (final_model), and the final model table (final_model_table).