Phylogenetically aware regression of genome-influenced traits
Source: R/phyloaware_regression.R
phyloaware_regression.Rd
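Performs univariable regression of a trait on each exposure variable and, optionally, multivariable regression using one of several variable-selection strategies.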
Usage
phyloaware_regression(
  trait,
  variables,
  df,
  first_present = NULL,
  patient_id = NULL,
  culture_date = NULL,
  multivariable = NULL,
  stepwise_direction = NULL,
  entry_criteria = NULL,
  retention_criteria = NULL,
  confounding_criteria = NULL
)
Arguments
- trait
Trait of interest
- variables
Exposure variables of interest. Must be numeric or one-hot encoded.
- df
Dataframe that contains the trait, exposure variables, and other requested variables.
- first_present
Logical (TRUE or FALSE) indicating whether to select each participant's first isolate with the trait.
- patient_id
Patient identifier variable stored in dataframe df. Required if first_present == TRUE.
- culture_date
Culture date used to select the participant's first isolate. Must be stored as a variable in the dataframe df. Required if first_present == TRUE.
- multivariable
Defines the multivariable selection strategy. Options include: 'purposeful', 'AIC', 'pvalue', and 'multivariable'.
- stepwise_direction
Direction of stepwise selection. Options include: 'both', 'backward', or 'forward'. For more information, see stats::step.
- entry_criteria
P-value for defining candidate variables for multivariable regression. Used in pvalue and purposeful selection. Suggestion: 0.2.
- retention_criteria
P-value for retaining candidate variables in the model. Used in pvalue and purposeful selection. Suggestion: 0.1.
- confounding_criteria
Percent change in effect size used to test for confounding. Set to a very high value (e.g., 1000) if testing for confounding is not desired. Default = 0.2.
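The first_present, patient_id, and culture_date arguments describe selecting each participant's first isolate with the trait. A minimal base R sketch of that kind of selection is shown here. It is not the package's internal code: the data frame `isolates`, the column `resistant`, and the helper `select_first_isolate` are hypothetical names used only for illustration. It also assumes the trait is coded 0/1 and that participants with no trait-positive isolate contribute their earliest isolate, which the documentation does not specify.

```r
# Hypothetical example data: one row per isolate.
isolates <- data.frame(
  patient_id   = c("p1", "p1", "p1", "p2", "p2", "p3"),
  culture_date = as.Date(c("2020-01-05", "2020-02-01", "2020-03-10",
                           "2020-01-20", "2020-04-02", "2020-02-14")),
  resistant    = c(0, 1, 1, 0, 0, 1)
)

# Hypothetical helper (not part of the package): keep one isolate per patient.
# Assumption: prefer the earliest trait-positive isolate; if a patient has
# none, fall back to that patient's earliest isolate.
select_first_isolate <- function(df, trait, patient_id, culture_date) {
  # Within each patient, sort trait-positive rows first, then by date.
  ord <- order(df[[patient_id]], -df[[trait]], df[[culture_date]])
  sorted <- df[ord, , drop = FALSE]
  # After sorting, the first row per patient is the one to keep.
  first_rows <- sorted[!duplicated(sorted[[patient_id]]), , drop = FALSE]
  rownames(first_rows) <- NULL
  first_rows
}

first_isolates <- select_first_isolate(
  isolates, "resistant", "patient_id", "culture_date"
)
print(first_isolates)
```

Sorting once and then dropping duplicated patient identifiers keeps the sketch free of package dependencies; a grouped approach with a data-manipulation package would work equally well.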
Details
Alongside univariable regression, this implementation includes the following multivariable options:
1. Multivariable: Standard multivariable regression of all variables.
2. pvalue: P-value-informed logistic regression.
3. AIC: Stepwise selection by AIC.
4. Purposeful selection: Iterative model selection accounting for p-value and confounding (Hosmer & Lemeshow 2000).
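The p-value entry screen, stepwise AIC, and the confounding check used in purposeful selection can be illustrated with base R. The sketch below shows the general shape of such a workflow under stated assumptions, not the package's implementation. It uses ordinary logistic regression via stats::glm with no phylogenetic adjustment, simulated data, and hypothetical names (`sim`, `exposures`, `univariable_p`, `rel_change`). The thresholds 0.2 for entry and 0.2 for confounding follow the values suggested in the Arguments section.

```r
set.seed(1)

# Simulated data (hypothetical): a binary trait and three binary exposures.
n <- 300
sim <- data.frame(
  gene_a = rbinom(n, 1, 0.5),
  gene_b = rbinom(n, 1, 0.3),
  gene_c = rbinom(n, 1, 0.4)
)
# Only gene_a truly affects the trait in this simulation.
sim$trait <- rbinom(n, 1, plogis(-1 + 2 * sim$gene_a))

exposures <- c("gene_a", "gene_b", "gene_c")

# Univariable logistic regression: one model per exposure, keep its p-value.
univariable_p <- vapply(exposures, function(v) {
  fit <- glm(reformulate(v, response = "trait"),
             data = sim, family = binomial())
  coef(summary(fit))[v, "Pr(>|z|)"]
}, numeric(1))

# Entry criterion: exposures with p < 0.2 become multivariable candidates.
entry_criteria <- 0.2
candidates <- names(univariable_p)[univariable_p < entry_criteria]

# Stepwise AIC starting from the model with all exposures (see stats::step).
full_fit <- glm(reformulate(exposures, response = "trait"),
                data = sim, family = binomial())
aic_fit <- step(full_fit, direction = "both", trace = 0)

# Confounding check (assumed form): relative change in the gene_a
# coefficient when gene_b is removed from the model.
with_b    <- glm(trait ~ gene_a + gene_b, data = sim, family = binomial())
without_b <- glm(trait ~ gene_a, data = sim, family = binomial())
rel_change <- abs(coef(without_b)["gene_a"] - coef(with_b)["gene_a"]) /
  abs(coef(with_b)["gene_a"])
is_confounder <- unname(rel_change > 0.2)

print(candidates)
print(formula(aic_fit))
print(is_confounder)
```

In this sketch a variable would be flagged as a confounder when removing it shifts another coefficient by more than 20 percent; setting the threshold very high, as the Arguments section suggests, effectively disables the check.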