Title: | Risk-adjusted Models, Standardized Measure Calculation, Hypothesis Testing and Visualization for Provider Profiling |
---|---|
Description: | Implements linear and generalized linear models for provider profiling, incorporating both fixed and random effects. For large-scale providers, the linear profiled-based method and the SerBIN method for binary data reduce the computational burden. Provides post-modeling features, such as indirect and direct standardization measures, hypothesis testing, confidence intervals, and post-estimation visualization. For more information, see Wu et al. (2022) <doi:10.1002/sim.9387>. |
Authors: | Xiaohan Liu [aut, cre], Lingfeng Luo [aut], Yubo Shao [aut], Xiangeng Fang [aut], Wenbo Wu [aut], Kevin He [aut] |
Maintainer: | Xiaohan Liu <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.0.1 |
Built: | 2024-12-13 05:50:47 UTC |
Source: | https://github.com/um-kevinhe/pprof |
Generate a bar plot for flagging percentage.
bar_plot( flag_df, group_num = 4, bar_colors = c("#66c2a5", "#fc8d62", "#8da0cb"), bar_width = 0.7, label_color = "black", label_size = 4 )
bar_plot( flag_df, group_num = 4, bar_colors = c("#66c2a5", "#fc8d62", "#8da0cb"), bar_width = 0.7, label_color = "black", label_size = 4 )
flag_df |
a data frame from |
group_num |
number of groups into which providers are divided based on their sample sizes. The default is 4. |
bar_colors |
a vector of colors used to fill the bars representing the categories. The default is c("#66c2a5", "#fc8d62", "#8da0cb"). |
bar_width |
width of the bars in the bar chart. The default is 0.7. |
label_color |
color of the text labels inside the bars. The default is "black". |
label_size |
size of the text labels inside the bars. The default is 4. |
This function generates a bar chart to visualize the percentage of flagging results based on provider sizes.
The input data frame test_df
must be the output from package pprof
's test
function.
Providers are grouped into a specified number of groups (group_num
) based on their sample sizes, where
the number of providers are approximately equal across groups. An additional "overall" group is
included to show the flagging results across all providers.
A ggplot object representing the bar chart of flagging results.
test.linear_fe
, test.linear_re
, test.logis_fe
data(ExampleDataLinear) outcome <- ExampleDataLinear$Y covar <- ExampleDataLinear$Z ID <- ExampleDataLinear$ID fit_linear <- linear_fe(Y = outcome, Z = covar, ID = ID) test_linear <- test(fit_linear) bar_plot(test_linear) data(ExampleDataBinary) fit_logis <- logis_fe(Y = ExampleDataBinary$Y, Z = ExampleDataBinary$Z, ID = ExampleDataBinary$ID, message = FALSE) test_logis <- test(fit_logis) bar_plot(test_logis)
data(ExampleDataLinear) outcome <- ExampleDataLinear$Y covar <- ExampleDataLinear$Z ID <- ExampleDataLinear$ID fit_linear <- linear_fe(Y = outcome, Z = covar, ID = ID) test_linear <- test(fit_linear) bar_plot(test_linear) data(ExampleDataBinary) fit_logis <- logis_fe(Y = ExampleDataBinary$Y, Z = ExampleDataBinary$Z, ID = ExampleDataBinary$ID, message = FALSE) test_logis <- test(fit_logis) bar_plot(test_logis)
Generate a caterpillar plot for standardized measures from different models using a provided CI dataframe.
caterpillar_plot( CI, point_size = 2, point_color = "#475569", refline_value = NULL, refline_color = "#64748b", refline_size = 1, refline_type = "dashed", errorbar_width = 0, errorbar_size = 0.5, errorbar_alpha = 0.5, errorbar_color = "#94a3b8", use_flag = FALSE, flag_color = c("#E69F00", "#56B4E9", "#009E73") )
caterpillar_plot( CI, point_size = 2, point_color = "#475569", refline_value = NULL, refline_color = "#64748b", refline_size = 1, refline_type = "dashed", errorbar_width = 0, errorbar_size = 0.5, errorbar_alpha = 0.5, errorbar_color = "#94a3b8", use_flag = FALSE, flag_color = c("#E69F00", "#56B4E9", "#009E73") )
CI |
a dataframe from |
point_size |
size of the points in the caterpillar plot. The default value is 2. |
point_color |
color of the points in the plot. The default value is "#475569". |
refline_value |
value of the horizontal reference line, for which the standardized measures are compared. The default value is NULL. |
refline_color |
color of the reference line. The default value is "#64748b". |
refline_size |
size of the reference line. The default value is 1. |
refline_type |
line type for the reference line. The default value is "dashed". |
errorbar_width |
the width of the error bars (horizontal ends of the CI bars). The default value is 0. |
errorbar_size |
the thickness of the error bars. The default value is 0.5. |
errorbar_alpha |
transparency level for the error bars. A value between 0 and 1, where 0 is completely transparent and 1 is fully opaque. The default value is 0.5. |
errorbar_color |
color of the error bars. The default value is "#94a3b8". |
use_flag |
logical; if |
flag_color |
vector of colors used for flagging providers when |
This function creates a caterpillar plot to visualize the standardized measures (indirect or direct).
The input CI
must be a dataframe output from package pprof
's confint
function.
Each provider's standardized measure value is represented as a point, and a reference line is shown at the value specified by refline_value
(default is NULL).
If refline_value
is not specified, for linear FE or RE models with indirect or direct standardized differences, it will be set to 0;
for logistic FE models with indirect or direct ratios, it will be set to 1;
and for logistic FE with indirect or direct rates, it will be set to the population rate, which represents the average rate across all observations.
Confidence intervals (CI) are displayed as error bars: for alternative = "two.sided"
, two-sided confidence intervals are shown;
for alternative = "greater"
, error bars extend from the lower bound to the standardized measure values;
and for alternative = "less"
, they extend from the standardized measure values to the upper bound.
For cases where one side of the confidence interval is infinite, that side only extends to the standardized measure.
For example, in a logistic fixed effect model, if a provider has all 0s or all 1s, one side of the confidence interval will be infinite.
When use_flag = TRUE
, the plot will use colors specified by flag_color
to show the flags of providers.
Each error bar will be colored to reflect the flag, making it easy to identify providers with different performance levels.
When use_flag = FALSE
, all error bars will have the same color, specified by errorbar_color
.
This provides a simpler visualization without flagging individual providers.
A ggplot object which is a caterpillar plot for the standardized measures.
confint.linear_fe
, confint.linear_re
, confint.logis_fe
data(ExampleDataLinear) outcome <- ExampleDataLinear$Y covar <- ExampleDataLinear$Z ID <- ExampleDataLinear$ID fit_linear <- linear_fe(Y = outcome, Z = covar, ID = ID) CI_linear <- confint(fit_linear) caterpillar_plot(CI_linear$CI.indirect, use_flag = TRUE, errorbar_width = 0.5, errorbar_size = 1) data(ExampleDataBinary) fit_logis <- logis_fe(Y = ExampleDataBinary$Y, Z = ExampleDataBinary$Z, ID = ExampleDataBinary$ID, message = FALSE) CI_logis <- confint(fit_logis) caterpillar_plot(CI_logis$CI.indirect_ratio, use_flag = TRUE, errorbar_width = 0.5, errorbar_size = 1)
data(ExampleDataLinear) outcome <- ExampleDataLinear$Y covar <- ExampleDataLinear$Z ID <- ExampleDataLinear$ID fit_linear <- linear_fe(Y = outcome, Z = covar, ID = ID) CI_linear <- confint(fit_linear) caterpillar_plot(CI_linear$CI.indirect, use_flag = TRUE, errorbar_width = 0.5, errorbar_size = 1) data(ExampleDataBinary) fit_logis <- logis_fe(Y = ExampleDataBinary$Y, Z = ExampleDataBinary$Z, ID = ExampleDataBinary$ID, message = FALSE) CI_logis <- confint(fit_logis) caterpillar_plot(CI_logis$CI.indirect_ratio, use_flag = TRUE, errorbar_width = 0.5, errorbar_size = 1)
linear_fe
objectProvide confidence intervals for provider effects or standardized measures from from a fixed effect linear model.
## S3 method for class 'linear_fe' confint( fit, parm, level = 0.95, option = "SM", stdz = "indirect", null = "median", alternative = "two.sided", ... )
## S3 method for class 'linear_fe' confint( fit, parm, level = 0.95, option = "SM", stdz = "indirect", null = "median", alternative = "two.sided", ... )
fit |
a model fitted from |
parm |
specify a subset of providers for which confidence intervals are given.
By default, all providers are included. The class of |
level |
the confidence level. The default value is 0.95. |
option |
a character string specifying whether the confidence intervals should be provided for provider effects or standardized measures:
|
stdz |
a character string or a vector specifying the standardization method
if |
null |
a character string or a number specifying the population norm for calculating standardized measures
if |
alternative |
a character string specifying the alternative hypothesis, must be one of
|
... |
additional arguments that can be passed to the function. |
A list of data frames containing the confidence intervals based on the values of option
and stdz
.
CI.gamma |
Confidence intervals for provider effects if |
CI.indirect |
Confidence intervals for indirect standardized differences if |
CI.direct |
Confidence intervals for direct standardized differences if |
data(ExampleDataLinear) outcome <- ExampleDataLinear$Y covar <- ExampleDataLinear$Z ID <- ExampleDataLinear$ID fit_linear <- linear_fe(Y = outcome, Z = covar, ID = ID) confint(fit_linear)
data(ExampleDataLinear) outcome <- ExampleDataLinear$Y covar <- ExampleDataLinear$Z ID <- ExampleDataLinear$ID fit_linear <- linear_fe(Y = outcome, Z = covar, ID = ID) confint(fit_linear)
linear_re
objectProvide confidence intervals for provider effects or standardized measures from a random effect linear model.
## S3 method for class 'linear_re' confint( fit, parm, level = 0.95, option = "SM", stdz = "indirect", alternative = "two.sided", ... )
## S3 method for class 'linear_re' confint( fit, parm, level = 0.95, option = "SM", stdz = "indirect", alternative = "two.sided", ... )
fit |
a model fitted from |
parm |
specify a subset of providers for which confidence intervals are given.
By default, all providers are included. The class of |
level |
the confidence level. The default value is 0.95. |
option |
a character string specifying whether the confidence intervals should be provided for provider effects or standardized measures:
|
stdz |
a character string or a vector specifying the standardization method
if |
alternative |
a character string specifying the alternative hypothesis, must be one of
|
... |
additional arguments that can be passed to the function. |
A list of data frames containing the confidence intervals based on the values of option
and stdz
.
CI.alpha |
Confidence intervals for provider effects if |
CI.indirect |
Confidence intervals for indirect standardized differences if |
CI.direct |
Confidence intervals for direct standardized differences if |
data(ExampleDataLinear) outcome <- ExampleDataLinear$Y ID <- ExampleDataLinear$ID covar <- ExampleDataLinear$Z fit_re <- linear_re(Y = outcome, Z = covar, ID = ID) confint(fit_re)
data(ExampleDataLinear) outcome <- ExampleDataLinear$Y ID <- ExampleDataLinear$ID covar <- ExampleDataLinear$Z fit_re <- linear_re(Y = outcome, Z = covar, ID = ID) confint(fit_re)
logis_fe
objectProvide confidence intervals for provider effects or standardized measures from a fixed effect logistic model.
## S3 method for class 'logis_fe' confint( fit, parm, level = 0.95, test = "exact", option = "SM", stdz = "indirect", null = "median", measure = c("rate", "ratio"), alternative = "two.sided", ... )
## S3 method for class 'logis_fe' confint( fit, parm, level = 0.95, test = "exact", option = "SM", stdz = "indirect", null = "median", measure = c("rate", "ratio"), alternative = "two.sided", ... )
fit |
a model fitted from |
parm |
specify a subset of providers for which confidence intervals are given.
By default, all providers are included. The class of |
level |
the confidence level. The default value is 0.95. |
test |
a character string specifying the type of testing method. The default is "exact".
|
option |
a character string specifying whether the confidence intervals should be provided for provider effects or standardized measures:
|
stdz |
a character string or a vector specifying the standardization method
if |
null |
a character string or a number defining the population norm if |
measure |
a character string or a vector indicating whether the output measure is "ratio" or "rate" if
|
alternative |
a character string specifying the alternative hypothesis, must be one of
|
... |
additional arguments that can be passed to the function. |
The wald test is invalid for extreme providers (i.e. when provider effect goes to infinity). We suggest using score or exact test to generate confidence intervals.
A dataframe (option = "gamma"
) or a list of data frames (option = "SM"
) containing the point estimate, and lower and upper bounds of the estimate.
data(ExampleDataBinary) outcome = ExampleDataBinary$Y covar = ExampleDataBinary$Z ID = ExampleDataBinary$ID fit_fe <- logis_fe(Y = outcome, Z = covar, ID = ID, message = FALSE) confint(fit_fe, option = "gamma") confint(fit_fe, option = "SM")
data(ExampleDataBinary) outcome = ExampleDataBinary$Y covar = ExampleDataBinary$Z ID = ExampleDataBinary$ID fit_fe <- logis_fe(Y = outcome, Z = covar, ID = ID, message = FALSE) confint(fit_fe, option = "gamma") confint(fit_fe, option = "SM")
Conduct data quality check including checking missingness, variation, correlation and VIF of variables.
data_check(Y, Z, ID)
data_check(Y, Z, ID)
Y |
a numeric vector indicating the outcome variable. |
Z |
a matrix or data frame representing covariates. |
ID |
a numeric vector representing the provider identifier. |
The function performs the following checks:
Missingness: Checks for any missing values in the dataset and provides a summary of missing data.
Variation: Identifies covariates with zero or near-zero variance which might affect model stability.
Correlation: Analyzes pairwise correlation among covariates and highlights highly correlated pairs.
VIF: Computes the Variable Inflation Factors to identify covariates with potential multicollinearity issues.
If issues arise when using the model functions logis_fe
, linear_fe
and linear_re
,
this function can be called for data quality checking purposes.
No return value, called for side effects.
data(ExampleDataBinary) outcome = ExampleDataBinary$Y covar = ExampleDataBinary$Z ID = ExampleDataBinary$ID data_check(outcome, covar, ID)
data(ExampleDataBinary) outcome = ExampleDataBinary$Y covar = ExampleDataBinary$Z ID = ExampleDataBinary$ID data_check(outcome, covar, ID)
The Early Childhood Longitudinal Study (ECLS) dataset tracks more than 18,000 children from kindergarten through fifth grade, providing comprehensive student-level information. See Tourangeau et al. (2015) for details.
data(ecls_data)
data(ecls_data)
A data frame with 9,101 observations, including:
Unique identifier for each child in the dataset.
Identifier for each school in the dataset.
Continuous variable representing the consolidated math proficiency score.
Household income, categorized into 18 ordinal levels, from $5,000 or less to $200,000 or more. Treated as a continuous variable in the analysis.
Binary variable indicating the gender of each student.
The dataset includes fifth-grade cross-sectional data, focusing on students' mathematical assessment scores as the primary outcome measure. The mathematical assessment covers 18 topics such as number sense, properties, operations, measurement, geometry, data analysis, and algebra. These items evaluate students' competencies in conceptual knowledge, procedural knowledge, and problem-solving, consolidated into a single "Math score." A higher score indicates greater proficiency. The primary predictors are household income and gender, where gender is a categorical variable. Household income is categorized into 18 ordinal levels, ranging from $5,000 or less (level 1) to $200,000 or more (level 18). In the analysis, the income variable is treated as a continuous variable. The dataset removes all records with missing values and consists of 9,101 complete observations from 2,275 schools.
Available at the following website: https://nces.ed.gov/ecls/.
Tourangeau, K., Nord, C., Lê, T., Sorongon, A. G., Hagedorn, M. C., Daly, P.,
& Najarian, M. (2015). Early Childhood Longitudinal Study, Kindergarten Class of
2010-11 (ECLS-K:2011): User’s manual for the ECLS-K:2011 kindergarten data file
and electronic codebook, public version (NCES 2015-074). National Center for
Education Statistics.
data(ecls_data) formula_FE <- as.formula("Math_Score ~ Income + id(School_ID) + Child_Sex") fit_FE <- linear_fe(formula = formula_FE, data = ecls_data) formula_RE <- as.formula("Math_Score ~ Income + (1|School_ID) + Child_Sex") fit_RE <- linear_re(formula = formula_RE, data = ecls_data)
data(ecls_data) formula_FE <- as.formula("Math_Score ~ Income + id(School_ID) + Child_Sex") fit_FE <- linear_fe(formula = formula_FE, data = ecls_data) formula_RE <- as.formula("Math_Score ~ Income + (1|School_ID) + Child_Sex") fit_RE <- linear_re(formula = formula_RE, data = ecls_data)
A simulated data set containing 7994 observations, 5 continuous covariates and 100 providers.
data(ExampleDataBinary)
data(ExampleDataBinary)
A list containing the following elements:
a vector representing binary outcomes with 0 or 1.
Generated from a Bernoulli distribution with the probability of success (), which is
determined by applying the logistic function to the linear combination of provider effects and covariates.
a vector representing identifier for each provider. The number of individuals per provider is generated from a Poisson distribution with mean 80, with a minimum value of 11.
a data frame containing 5 continuous variables.
Generated from a multivariate normal distribution, where the mean is calculated as
with
and
,
and the correlation between covariates and provider effects is 0.1.
data(ExampleDataBinary) head(ExampleDataBinary$Y) head(ExampleDataBinary$ID) head(ExampleDataBinary$Z)
data(ExampleDataBinary) head(ExampleDataBinary$Y) head(ExampleDataBinary$ID) head(ExampleDataBinary$Z)
A simulated data set containing 7901 observations, 5 continuous covariates and 100 providers.
data(ExampleDataLinear)
data(ExampleDataLinear)
A list containing the following elements:
a vector representing continuous outcomes.
Generated as a linear combination of provider-specific effects, covariates, and a random error term (),
where
follows a normal distribution with mean 0 and standard deviation 1.
a vector representing identifier for each provider. The number of individuals per provider is generated from a Poisson distribution with mean 80.
a data frame containing 5 continuous variables.
Generated from a multivariate normal distribution, where the mean is calculated as
with
and
,
and the correlation between covariates and provider effects is 0.1.
data(ExampleDataLinear) head(ExampleDataLinear$Y) head(ExampleDataLinear$ID) head(ExampleDataLinear$Z)
data(ExampleDataLinear) head(ExampleDataLinear$Y) head(ExampleDataLinear$ID) head(ExampleDataLinear$Z)
Fit a fixed effect linear model via profile likelihood or dummy encoding.
linear_fe( formula = NULL, data = NULL, Y = NULL, Z = NULL, ID = NULL, Y.char = NULL, Z.char = NULL, ID.char = NULL, method = "pl" )
linear_fe( formula = NULL, data = NULL, Y = NULL, Z = NULL, ID = NULL, Y.char = NULL, Z.char = NULL, ID.char = NULL, method = "pl" )
formula |
a two-sided formula object describing the model to be fitted,
with the response variable on the left of a ~ operator and covariates on the right,
separated by + operators. The fixed effect of the provider identifier is specified using |
data |
a data frame containing the variables named in the |
Y |
a numeric vector representing the response variable. |
Z |
a matrix or data frame representing the covariates, which can include both numeric and categorical variables. |
ID |
a numeric vector representing the provider identifier. |
Y.char |
a character string specifying the column name of the response variable in the |
Z.char |
a character vector specifying the column names of the covariates in the |
ID.char |
a character string specifying the column name of the provider identifier in the |
method |
a character string specifying the method to fit the model.
|
This function is used to fit a fixed effect linear model of the form:
where is the continuous outcome for individual
in provider
,
is the provider-specific effect,
are the covariates, and
is the vector of coefficients for the covariates.
The default method for fitting the model is profile likelihood, but dummy encoding can also be used by specifying the appropriate method.
When the number of providers is very large, we recommend using the profile likelihood method, as it is computationally efficient and requires
less memory usage.
The function accepts three different input formats:
a formula and dataset, where the formula is of the form response ~ covariates + id(provider)
, with provider
representing the provider identifier;
a dataset along with the column names of the response, covariates, and provider identifier;
or the outcome vector , the covariate matrix or data frame
, and the provider identifier vector.
If issues arise during model fitting, consider using the data_check
function to perform a data quality check,
which can help identify missing values, low variation in covariates, high-pairwise correlation, and multicollinearity.
For datasets with missing values, this function automatically removes observations (rows) with any missing values before fitting the model.
A list of objects with S3 class "linear_fe"
:
coefficient |
a list containing the estimated coefficients:
|
variance |
a list containing the variance estimates:
|
sigma |
the residual standard error. |
fitted |
the fitted values of each individual. |
observation |
the original response of each individual. |
residuals |
the residuals of each individual, that is response minus fitted values. |
linear_pred |
the linear predictor of each individual. |
data_include |
the data used to fit the model, sorted by the provider identifier. For categorical covariates, this includes the dummy variables created for all categories except the reference level. |
char_list |
a list of the character vectors representing the column names for the response variable, covariates, and provider identifier. For categorical variables, the names reflect the dummy variables created for each category. |
method |
the method used for model fitting, either |
Loglkd |
log likelihood. |
AIC |
Akaike information criterion. |
BIC |
Bayesian information criterion. |
Hsiao, C. (2022). Analysis of panel data (No. 64). Cambridge university press.
R Core Team (2023). The R Stats Package: lm.
Available at: https://stat.ethz.ch/R-manual/R-devel/library/stats/html/lm.html
data(ExampleDataLinear) outcome <- ExampleDataLinear$Y covar <- ExampleDataLinear$Z ID <- ExampleDataLinear$ID data <- data.frame(outcome, ID, covar) covar.char <- colnames(covar) outcome.char <- colnames(data)[1] ID.char <- colnames(data)[2] formula <- as.formula(paste("outcome ~", paste(covar.char, collapse = " + "), "+ id(ID)")) # Fit fixed linear effect model using three input formats fit_fe1 <- linear_fe(Y = outcome, Z = covar, ID = ID) fit_fe2 <- linear_fe(data = data, Y.char = outcome.char, Z.char = covar.char, ID.char = ID.char) fit_fe3 <- linear_fe(formula, data)
data(ExampleDataLinear) outcome <- ExampleDataLinear$Y covar <- ExampleDataLinear$Z ID <- ExampleDataLinear$ID data <- data.frame(outcome, ID, covar) covar.char <- colnames(covar) outcome.char <- colnames(data)[1] ID.char <- colnames(data)[2] formula <- as.formula(paste("outcome ~", paste(covar.char, collapse = " + "), "+ id(ID)")) # Fit fixed linear effect model using three input formats fit_fe1 <- linear_fe(Y = outcome, Z = covar, ID = ID) fit_fe2 <- linear_fe(data = data, Y.char = outcome.char, Z.char = covar.char, ID.char = ID.char) fit_fe3 <- linear_fe(formula, data)
Fit a random effect linear model via lmer
from the lme4
package.
linear_re( formula = NULL, data = NULL, Y = NULL, Z = NULL, ID = NULL, Y.char = NULL, Z.char = NULL, ID.char = NULL, ... )
linear_re( formula = NULL, data = NULL, Y = NULL, Z = NULL, ID = NULL, Y.char = NULL, Z.char = NULL, ID.char = NULL, ... )
formula |
a two-sided formula object describing the model to be fitted,
with the response variable on the left of a ~ operator and covariates on the right,
separated by + operators. The random effect of the provider identifier is specified using |
data |
a data frame containing the variables named in the |
Y |
a numeric vector representing the response variable. |
Z |
a matrix or data frame representing the covariates, which can include both numeric and categorical variables. |
ID |
a numeric vector representing the provider identifier. |
Y.char |
a character string specifying the column name of the response variable in the |
Z.char |
a character vector specifying the column names of the covariates in the |
ID.char |
a character string specifying the column name of the provider identifier in the |
... |
additional arguments passed to |
This function is used to fit a random effect linear model of the form:
where is the continuous outcome for individual
in provider
,
is the overall intercept,
is the random effect for provider
,
are the covariates, and
is the vector of coefficients for the covariates.
The model is fitted by overloading the lmer
function from the lme4
package.
Three different input formats are accepted:
a formula and dataset, where the formula is of the form response ~ covariates + (1 | provider)
, with provider
representing the provider identifier;
a dataset along with the column names of the response, covariates, and provider identifier;
or the outcome vector , the covariate matrix or data frame
, and the provider identifier vector.
In addition to these input formats, all arguments from the lmer
function can be modified via ...
,
allowing for customization of model fitting options such as controlling the optimization method or adjusting convergence criteria.
By default, the model is fitted using REML (restricted maximum likelihood).
If issues arise during model fitting, consider using the data_check
function to perform a data quality check,
which can help identify missing values, low variation in covariates, high-pairwise correlation, and multicollinearity.
For datasets with missing values, this function automatically removes observations (rows) with any missing values before fitting the model.
A list of objects with S3 class "random_re"
:
coefficient |
a list containing the estimated coefficients:
|
variance |
a list containing the variance estimates:
|
sigma |
the residual standard error. |
fitted |
the fitted values of each individual. |
observation |
the original response of each individual. |
residuals |
the residuals of each individual, that is response minus fitted values. |
linear_pred |
the linear predictor of each individual. |
data_include |
the data used to fit the model, sorted by the provider identifier. For categorical covariates, this includes the dummy variables created for all categories except the reference level. |
char_list |
a list of the character vectors representing the column names for the response variable, covariates, and provider identifier. For categorical variables, the names reflect the dummy variables created for each category. |
Loglkd |
the log-likelihood. |
AIC |
Akaike information criterion. |
BIC |
Bayesian information criterion. |
Bates D, Maechler M, Bolker B, Walker S (2015). Fitting Linear Mixed-Effects Models Using lme4.
Journal of Statistical Software, 67(1), 1-48.
data(ExampleDataLinear) outcome <- ExampleDataLinear$Y covar <- ExampleDataLinear$Z ID <- ExampleDataLinear$ID data <- data.frame(outcome, ID, covar) covar.char <- colnames(covar) outcome.char <- colnames(data)[1] ID.char <- colnames(data)[2] formula <- as.formula(paste("outcome ~", paste(covar.char, collapse = " + "), "+ (1|ID)")) # Fit random effect linear model using three input formats fit_re1 <- linear_re(Y = outcome, Z = covar, ID = ID) fit_re2 <- linear_re(data = data, Y.char = outcome.char, Z.char = covar.char, ID.char = ID.char) fit_re3 <- linear_re(formula, data)
data(ExampleDataLinear) outcome <- ExampleDataLinear$Y covar <- ExampleDataLinear$Z ID <- ExampleDataLinear$ID data <- data.frame(outcome, ID, covar) covar.char <- colnames(covar) outcome.char <- colnames(data)[1] ID.char <- colnames(data)[2] formula <- as.formula(paste("outcome ~", paste(covar.char, collapse = " + "), "+ (1|ID)")) # Fit random effect linear model using three input formats fit_re1 <- linear_re(Y = outcome, Z = covar, ID = ID) fit_re2 <- linear_re(data = data, Y.char = outcome.char, Z.char = covar.char, ID.char = ID.char) fit_re3 <- linear_re(formula, data)
Fit a fixed effect logistic model via Serial blockwise inversion Newton (SerBIN) or block ascent Newton (BAN) algorithm.
logis_fe( formula = NULL, data = NULL, Y.char = NULL, Z.char = NULL, ID.char = NULL, Y = NULL, Z = NULL, ID = NULL, method = "SerBIN", max.iter = 1000, tol = 1e-05, bound = 10, cutoff = 10, backtrack = TRUE, stop = "or", threads = 1, message = TRUE )
logis_fe( formula = NULL, data = NULL, Y.char = NULL, Z.char = NULL, ID.char = NULL, Y = NULL, Z = NULL, ID = NULL, method = "SerBIN", max.iter = 1000, tol = 1e-05, bound = 10, cutoff = 10, backtrack = TRUE, stop = "or", threads = 1, message = TRUE )
formula |
a two-sided formula object describing the model to be fitted,
with the response variable on the left of a ~ operator and covariates on the right,
separated by + operators. The fixed effect of the provider identifier is specified using |
data |
a data frame containing the variables named in the |
Y.char |
a character string specifying the column name of the response variable in the |
Z.char |
a character vector specifying the column names of the covariates in the |
ID.char |
a character string specifying the column name of the provider identifier in the |
Y |
a numeric vector representing the response variable. |
Z |
a matrix or data frame representing the covariates, which can include both numeric and categorical variables. |
ID |
a numeric vector representing the provider identifier. |
method |
a string specifying the algorithm to be used. The default value is "SerBIN".
|
max.iter |
maximum iteration number if the stopping criterion specified by |
tol |
tolerance used for stopping the algorithm. See details in |
bound |
a positive number to avoid inflation of provider effects. The default value is 10. |
cutoff |
An integer specifying the minimum number of observations required for providers.
Providers with fewer observations than the cutoff will be labeled as |
backtrack |
a Boolean indicating whether backtracking line search is implemented. The default is FALSE. |
stop |
a character string specifying the stopping rule to determine convergence.
The default value is |
threads |
a positive integer specifying the number of threads to be used. The default value is 1. |
message |
a Boolean indicating whether to print the progress of the fitting process. The default is TRUE. |
The function accepts three different input formats:
a formula and dataset, where the formula is of the form response ~ covariates + id(provider)
, with provider
representing the provider identifier;
a dataset along with the column names of the response, covariates, and provider identifier;
or the binary outcome vector , the covariate matrix or data frame
, and the provider identifier vector.
The default algorithm is based on Serial blockwise inversion Newton (SerBIN) proposed by Wu et al. (2022), but users can also choose to use the block ascent Newton (BAN) algorithm proposed by He et al. (2013) to fit the model. Both methodologies build upon the Newton-Raphson method, yet SerBIN simultaneously updates both the provider effect and covariate coefficient. This concurrent update necessitates the inversion of the whole information matrix at each iteration. In contrast, BAN adopts a two-layer updating approach, where the covariate coefficient is sequentially fixed to update the provider effect, followed by fixing the provider effect to update the covariate coefficient.
We suggest using the default "SerBIN"
option as it typically converges subsequently much faster for most datasets.
However, in rare cases where the SerBIN algorithm encounters second-order derivative irreversibility leading to an error,
users can consider using the "BAN"
option as an alternative.
For a deeper understanding, please consult the original article for detailed insights.
If issues arise during model fitting, consider using the data_check
function to perform a data quality check,
which can help identify missing values, low variation in covariates, high-pairwise correlation, and multicollinearity.
For datasets with missing values, this function automatically removes observations (rows) with any missing values before fitting the model.
A list of objects with S3 class "logis_fe"
:
coefficient |
a list containing the estimated coefficients:
|
variance |
a list containing the variance estimates:
|
linear_pred |
the linear predictor of each individual. |
prediction |
predicted probability of each individual |
observation |
the original response of each individual. |
Loglkd |
the log-likelihood. |
AIC |
Akaike info criterion. |
BIC |
Bayesian info criterion. |
AUC |
area under the ROC curve. |
char_list |
a list of the character vectors representing the column names for the response variable, covariates, and provider identifier. For categorical variables, the names reflect the dummy variables created for each category. |
data_include |
the data used to fit the model, sorted by the provider identifier.
For categorical covariates, this includes the dummy variables created for
all categories except the reference level. Additionally, it contains three extra columns:
|
He K, Kalbfleisch, J, Li, Y, and et al. (2013) Evaluating hospital readmission rates in dialysis providers; adjusting for hospital effects.
Lifetime Data Analysis, 19: 490-512.
Wu, W, Yang, Y, Kang, J, He, K. (2022) Improving large-scale estimation and inference for profiling health care providers.
Statistics in Medicine, 41(15): 2840-2853.
data(ExampleDataBinary) outcome <- ExampleDataBinary$Y covar <- ExampleDataBinary$Z ID <- ExampleDataBinary$ID data <- data.frame(outcome, ID, covar) covar.char <- colnames(covar) outcome.char <- colnames(data)[1] ID.char <- colnames(data)[2] formula <- as.formula(paste("outcome ~", paste(covar.char, collapse = " + "), "+ id(ID)")) # Fit logistic linear effect model using three input formats fit_fe1 <- logis_fe(Y = outcome, Z = covar, ID = ID) fit_fe2 <- logis_fe(data = data, Y.char = outcome.char, Z.char = covar.char, ID.char = ID.char) fit_fe3 <- logis_fe(formula, data)
data(ExampleDataBinary) outcome <- ExampleDataBinary$Y covar <- ExampleDataBinary$Z ID <- ExampleDataBinary$ID data <- data.frame(outcome, ID, covar) covar.char <- colnames(covar) outcome.char <- colnames(data)[1] ID.char <- colnames(data)[2] formula <- as.formula(paste("outcome ~", paste(covar.char, collapse = " + "), "+ id(ID)")) # Fit logistic linear effect model using three input formats fit_fe1 <- logis_fe(Y = outcome, Z = covar, ID = ID) fit_fe2 <- logis_fe(data = data, Y.char = outcome.char, Z.char = covar.char, ID.char = ID.char) fit_fe3 <- logis_fe(formula, data)
linear_fe
object for institutional comparisonsCreates a funnel plot from a linear fixed effect model to compare provider performance.
## S3 method for class 'linear_fe' plot( x, null = "median", target = 0, alpha = 0.05, labels = c("lower", "expected", "higher"), point_colors = c("#E69F00", "#56B4E9", "#009E73"), point_shapes = c(15, 17, 19), point_size = 2, point_alpha = 0.8, line_size = 0.8, target_line_type = "longdash", ... )
## S3 method for class 'linear_fe' plot( x, null = "median", target = 0, alpha = 0.05, labels = c("lower", "expected", "higher"), point_colors = c("#E69F00", "#56B4E9", "#009E73"), point_shapes = c(15, 17, 19), point_size = 2, point_alpha = 0.8, line_size = 0.8, target_line_type = "longdash", ... )
x |
a model fitted from |
null |
a character string or a number specifying null hypotheses of fixed provider effects. The default is |
target |
a numeric value representing the target outcome. The default value is 0. |
alpha |
a number or a vector of significance levels. The default is 0.05. |
labels |
a vector of labels for the plot. |
point_colors |
a vector of colors representing different provider flags. The default is |
point_shapes |
a vector of shapes representing different provider flags. The default is |
point_size |
size of the points. The default is 2. |
point_alpha |
transparency level of the points. The default is 0.8. |
line_size |
size of all lines, including control limits and the target line. The default is 0.8. |
target_line_type |
line type for the target line. The default is "longdash". |
... |
additional arguments that can be passed to the function. |
This function generates a funnel plot from a linear fixed effect model. Currently, it only supports the indirect standardized difference.
The parameter alpha
is a vector used to calculate control limits at different significance levels.
The first value in the vector is used as the significance level for flagging each provider, utilizing the test.linear_fe
function.
A ggplot object representing the funnel plot.
linear_fe
, linear_fe
, linear_fe
data(ExampleDataLinear) outcome <- ExampleDataLinear$Y covar <- ExampleDataLinear$Z ID <- ExampleDataLinear$ID fit_fe <- linear_fe(Y = outcome, Z = covar, ID = ID) plot(fit_fe)
data(ExampleDataLinear) outcome <- ExampleDataLinear$Y covar <- ExampleDataLinear$Z ID <- ExampleDataLinear$ID fit_fe <- linear_fe(Y = outcome, Z = covar, ID = ID) plot(fit_fe)
logis_fe
object for institutional comparisonsCreates a funnel plot from a logistic fixed effect model to compare provider performance.
## S3 method for class 'logis_fe' plot( x, null = "median", test = "score", target = 1, alpha = 0.05, labels = c("lower", "expected", "higher"), point_colors = c("#E69F00", "#56B4E9", "#009E73"), point_shapes = c(15, 17, 19), point_size = 2, point_alpha = 0.8, line_size = 0.8, target_line_type = "longdash", ... )
## S3 method for class 'logis_fe' plot( x, null = "median", test = "score", target = 1, alpha = 0.05, labels = c("lower", "expected", "higher"), point_colors = c("#E69F00", "#56B4E9", "#009E73"), point_shapes = c(15, 17, 19), point_size = 2, point_alpha = 0.8, line_size = 0.8, target_line_type = "longdash", ... )
x |
a model fitted from |
null |
a character string or a number specifying null hypotheses of fixed provider effects. The default is |
test |
a character string specifying the type of testing methods to be conducted. The default is "score". |
target |
a numeric value representing the target outcome. The default value is 1. |
alpha |
a number or a vector of significance levels. The default is 0.05. |
labels |
a vector of labels for the plot. |
point_colors |
a vector of colors representing different provider flags. The default is |
point_shapes |
a vector of shapes representing different provider flags. The default is |
point_size |
size of the points. The default is 2. |
point_alpha |
transparency level of the points. The default is 0.8. |
line_size |
size of all lines, including control limits and the target line. The default is 0.8. |
target_line_type |
line type for the target line. The default is "longdash". |
... |
additional arguments that can be passed to the function. |
This function generates a funnel plot from a logistic fixed-effect model. Currently, it only supports the indirect standardized ratio.
The parameter alpha
is a vector used to calculate control limits at different significance levels.
The first value in the vector is used as the significance level for flagging each provider, utilizing the test.logis_fe
function.
A ggplot object representing the funnel plot.
Wu, W., Kuriakose, J. P., Weng, W., Burney, R. E., & He, K. (2023). Test-specific funnel plots for healthcare provider profiling leveraging
individual- and summary-level information. Health Services and Outcomes Research Methodology, 23(1), 45-58.
logis_fe
, SM_output.linear_re
, test.logis_fe
data(ExampleDataBinary) outcome <- ExampleDataBinary$Y covar <- ExampleDataBinary$Z ID <- ExampleDataBinary$ID fit_fe <- logis_fe(Y = outcome, Z = covar, ID = ID) plot(fit_fe)
data(ExampleDataBinary) outcome <- ExampleDataBinary$Y covar <- ExampleDataBinary$Z ID <- ExampleDataBinary$ID fit_fe <- logis_fe(Y = outcome, Z = covar, ID = ID) plot(fit_fe)
SM_output
is an S3 generic function designed to calculate standardized
measures. It dispatches to the appropriate method based on the class of the
input (fit
), ensuring the correct method is applied for different types of models.
SM_output(fit, ...)
SM_output(fit, ...)
fit |
the input object, typically a fitted model, for which standardized measures are calculated. The method applied depends on the class of this object. |
... |
additional arguments that can be passed to specific methods. |
the return varies depending on the method implemented for the class of the input object.
linear_fe
objectProvide direct/indirect standardized differences for a fixed effect linear model.
## S3 method for class 'linear_fe' SM_output(fit, parm, stdz = "indirect", null = "median", ...)
## S3 method for class 'linear_fe' SM_output(fit, parm, stdz = "indirect", null = "median", ...)
fit |
a model fitted from |
parm |
specifies a subset of providers for which confidence intervals are to be given.
By default, all providers are included. The class of |
stdz |
a character string or a vector specifying the standardization method(s). The possible values are:
|
null |
a character string or a number defining the population norm.
The default value is
|
... |
additional arguments that can be passed to the function. |
This function computes standardized differences for a fixed effect linear model
using either direct or indirect methods, or both when specified.
For each method, the population norm is determined by the null
argument.
The population norm can be the median of the estimates, their weighted mean
(with weights corresponding to provider sizes), or a user-defined numeric value.
A list containing the standardized differences based on the method(s) specified in stdz
,
as well as the observed and expected outcomes used to calculate the standardized measures:
indirect.difference |
indirect standardized differences, if |
direct.difference |
direct standardized differences, if |
OE |
a list of data frames containing the observed and expected outcomes used for calculating standardized measures. |
data(ExampleDataLinear) outcome <- ExampleDataLinear$Y covar <- ExampleDataLinear$Z ID <- ExampleDataLinear$ID fit_linear <- linear_fe(Y = outcome, Z = covar, ID = ID) SM_output(fit_linear) SM_output(fit_linear, stdz = "direct", null = "mean")
data(ExampleDataLinear) outcome <- ExampleDataLinear$Y covar <- ExampleDataLinear$Z ID <- ExampleDataLinear$ID fit_linear <- linear_fe(Y = outcome, Z = covar, ID = ID) SM_output(fit_linear) SM_output(fit_linear, stdz = "direct", null = "mean")
linear_re
objectProvide direct/indirect standardized differences for a random effect linear model.
## S3 method for class 'linear_re' SM_output(fit, parm, stdz = "indirect", ...)
## S3 method for class 'linear_re' SM_output(fit, parm, stdz = "indirect", ...)
fit |
a model fitted from |
parm |
specifies a subset of providers for which confidence intervals are to be given.
By default, all providers are included. The class of |
stdz |
a character string or a vector specifying the standardization method(s). The possible values are:
|
... |
additional arguments that can be passed to the function. |
This function computes standardized differences for a random effect linear model using either direct or indirect methods, or both when specified. The function returns both the standardized differences and the observed and expected outcomes used for their calculation.
A list containing the standardized differences based on the method(s) specified in stdz
,
as well as the observed and expected outcomes used to calculate the standardized measures:
indirect.difference |
indirect standardized differences, if |
direct.difference |
direct standardized differences, if |
OE |
a list of data frames containing the observed and expected outcomes used for calculating standardized measures. |
data(ExampleDataLinear) outcome <- ExampleDataLinear$Y covar <- ExampleDataLinear$Z ID <- ExampleDataLinear$ID fit_linear <- linear_re(Y = outcome, Z = covar, ID = ID) SM_output(fit_linear)
data(ExampleDataLinear) outcome <- ExampleDataLinear$Y covar <- ExampleDataLinear$Z ID <- ExampleDataLinear$ID fit_linear <- linear_re(Y = outcome, Z = covar, ID = ID) SM_output(fit_linear)
logis_fe
objectProvide direct/indirect standardized ratios/rates for a fixed effect logistic model.
## S3 method for class 'logis_fe' SM_output( fit, parm, stdz = "indirect", measure = c("rate", "ratio"), null = "median", threads = 2, ... )
## S3 method for class 'logis_fe' SM_output( fit, parm, stdz = "indirect", measure = c("rate", "ratio"), null = "median", threads = 2, ... )
fit |
a model fitted from |
parm |
specifies a subset of providers for which confidence intervals are to be given.
By default, all providers are included. The class of |
stdz |
a character string or a vector specifying the standardization method(s). The possible values are:
|
measure |
a character string or a vector indicating whether the output measure is "ratio" or "rate"
|
null |
if |
threads |
an integer specifying the number of threads to use. The default value is 2. |
... |
additional arguments that can be passed to the function. |
A list contains standardized measures, as well as the observed and expected outcomes used for calculation,
depending on the user's choice of standardization method (stdz
) and measure type (measure
).
indirect.ratio |
standardization ratio using indirect method if |
direct.ratio |
standardization ratio using direct method if |
indirect.rate |
standardization rate using indirect method if |
direct.rate |
standardization rate using direct method if |
OE |
a list of data frames containing the observed and expected outcomes used for calculating standardized measures. |
He, K. (2019). Indirect and direct standardization for evaluating transplant centers. Journal of Hospital Administration, 8(1), 9-14.
data(ExampleDataBinary) outcome = ExampleDataBinary$Y covar = ExampleDataBinary$Z ID = ExampleDataBinary$ID fit_fe <- logis_fe(Y = outcome, Z = covar, ID = ID, message = FALSE) SR <- SM_output(fit_fe, stdz = "direct", measure = "rate") SR$direct.rate
data(ExampleDataBinary) outcome = ExampleDataBinary$Y covar = ExampleDataBinary$Z ID = ExampleDataBinary$ID fit_fe <- logis_fe(Y = outcome, Z = covar, ID = ID, message = FALSE) SR <- SM_output(fit_fe, stdz = "direct", measure = "rate") SR$direct.rate
linear_fe
or linear_re
objectProvide the summary statistics for the covariate estimates for a fixed/random effect linear model.
## S3 method for class 'linear_fe' summary(object, parm, level = 0.95, null = 0, alternative = "two.sided", ...) ## S3 method for class 'linear_re' summary(object, parm, level = 0.95, null = 0, alternative = "two.sided", ...)
## S3 method for class 'linear_fe' summary(object, parm, level = 0.95, null = 0, alternative = "two.sided", ...) ## S3 method for class 'linear_re' summary(object, parm, level = 0.95, null = 0, alternative = "two.sided", ...)
object |
a model fitted from |
parm |
Specifies a subset of covariates for which the result summaries should be output. By default, all covariates are included. |
level |
the confidence level during the hypothesis test, meaning a significance level of |
null |
a number defining the null hypothesis for the covariate estimates. The default value is |
alternative |
a character string specifying the alternative hypothesis, must be one of
|
... |
additional arguments that can be passed to the function. |
A data frame containing summary statistics for covariate estimates, with the following columns:
Estimate |
the estimates of covariate coefficients. |
Std.Error |
the standard error of the estimate. |
Stat |
the test statistic. |
p value |
the p-value for the hypothesis test. |
CI.upper |
the lower bound of the confidence interval. |
CI.lower |
the upper bound of the confidence interval. |
data(ExampleDataLinear) outcome <- ExampleDataLinear$Y covar <- ExampleDataLinear$Z ID <- ExampleDataLinear$ID fit_fe <- linear_fe(Y = outcome, Z = covar, ID = ID) summary(fit_fe) data(ExampleDataLinear) outcome <- ExampleDataLinear$Y covar <- ExampleDataLinear$Z ID <- ExampleDataLinear$ID fit_re <- linear_fe(Y = outcome, Z = covar, ID = ID) summary(fit_re)
data(ExampleDataLinear) outcome <- ExampleDataLinear$Y covar <- ExampleDataLinear$Z ID <- ExampleDataLinear$ID fit_fe <- linear_fe(Y = outcome, Z = covar, ID = ID) summary(fit_fe) data(ExampleDataLinear) outcome <- ExampleDataLinear$Y covar <- ExampleDataLinear$Z ID <- ExampleDataLinear$ID fit_re <- linear_fe(Y = outcome, Z = covar, ID = ID) summary(fit_re)
logis_fe
objectProvide the summary statistics for the covariate estimates for a fixed effect logistic model.
## S3 method for class 'logis_fe' summary( object, parm, level = 0.95, test = "wald", null = 0, alternative = "two.sided", ... )
## S3 method for class 'logis_fe' summary( object, parm, level = 0.95, test = "wald", null = 0, alternative = "two.sided", ... )
object |
a model fitted from |
parm |
Specifies a subset of covariates for which the result summaries should be output. By default, all covariates are included. |
level |
the confidence level during the hypothesis test, meaning a significance level of |
test |
a character string specifying the type of testing method. The default is "wald".
|
null |
a number defining the null hypothesis for the covariate estimates. The default value is |
alternative |
a character string specifying the alternative hypothesis when |
... |
additional arguments that can be passed to the function. |
A data frame containing summary statistics for covariate estimates, with the following columns:
Estimate |
the estimates of covariate coefficients. |
Std.Error |
the standard error of the estimate, included only when |
Stat |
the test statistic. |
p value |
the p-value for the hypothesis test. |
CI.upper |
the lower bound of the confidence interval, included only when |
CI.lower |
the upper bound of the confidence interval, included only when |
data(ExampleDataBinary) outcome = ExampleDataBinary$Y covar = ExampleDataBinary$Z ID = ExampleDataBinary$ID fit_fe <- logis_fe(Y = outcome, Z = covar, ID = ID, message = FALSE) summary.wald <- summary(fit_fe, level = 0.95, test = "wald") summary.wald
data(ExampleDataBinary) outcome = ExampleDataBinary$Y covar = ExampleDataBinary$Z ID = ExampleDataBinary$ID fit_fe <- logis_fe(Y = outcome, Z = covar, ID = ID, message = FALSE) summary.wald <- summary(fit_fe, level = 0.95, test = "wald") summary.wald
test
is an S3 generic function used to conduct hypothesis tests on provider effect coefficients
and detect outlying provider. The function dispatches to the appropriate method
based on the class of the input model (fit
).
test(fit, ...)
test(fit, ...)
fit |
the input object, typically a fitted model, for which provider effects are tested. The method applied depends on the class of this object. |
... |
additional arguments that can be passed to specific methods. |
the return depends on the method implemented for the class of the input object, typically including statistical outputs for provider effect coefficients and identification of outlier providers.
linear_fe
objectConduct hypothesis tests on provider effects and identify outlying providers for a fixed effect linear model.
## S3 method for class 'linear_fe' test(fit, parm, level = 0.95, null = "median", alternative = "two.sided", ...)
## S3 method for class 'linear_fe' test(fit, parm, level = 0.95, null = "median", alternative = "two.sided", ...)
fit |
a model fitted from |
parm |
specifies a subset of providers for which confidence intervals are to be given.
By default, all providers are included. The class of |
level |
the confidence level during the hypothesis test, meaning a significance level of |
null |
a character string or a number defining the null hypothesis for the provider effects.
The default value is
|
alternative |
a character string specifying the alternative hypothesis, must be one of
|
... |
additional arguments that can be passed to the function. |
The function identifies outlying providers based on hypothesis test results.
For two-sided tests, 1
indicates performance significantly higher than expected, -1
indicates lower,
For one-sided tests, 1
(right-tailed) or -1
(left-tailed) flags are used.
Providers whose performance falls within the central range are flagged as 0
.
Outlying providers are determined by the test statistic falling beyond the threshold based on the significance level .
A data frame containing the results of the hypothesis test, with the following columns:
flag |
a flagging indicator where |
p-value |
the p-value of the hypothesis test. |
stat |
the test statistic. |
Std.Error |
the standard error of the provider effect estimate. |
data(ExampleDataLinear) outcome <- ExampleDataLinear$Y covar <- ExampleDataLinear$Z ID <- ExampleDataLinear$ID fit_linear <- linear_fe(Y = outcome, Z = covar, ID = ID) test(fit_linear)
data(ExampleDataLinear) outcome <- ExampleDataLinear$Y covar <- ExampleDataLinear$Z ID <- ExampleDataLinear$ID fit_linear <- linear_fe(Y = outcome, Z = covar, ID = ID) test(fit_linear)
linear_re
objectConduct hypothesis tests on provider effects and identify outlying providers for a random effect linear model.
## S3 method for class 'linear_re' test(fit, parm, level = 0.95, null = 0, alternative = "two.sided", ...)
## S3 method for class 'linear_re' test(fit, parm, level = 0.95, null = 0, alternative = "two.sided", ...)
fit |
a model fitted from |
parm |
specifies a subset of providers for which confidence intervals are to be given.
By default, all providers are included. The class of |
level |
the confidence level during the hypothesis test, meaning a significance level of |
null |
a number defining the null hypothesis for the provider effects. The default value is 0. |
alternative |
a character string specifying the alternative hypothesis, must be one of
|
... |
additional arguments that can be passed to the function. |
The function identifies outlying providers based on hypothesis test results.
For two-sided tests, 1
indicates performance significantly higher than expected, -1
indicates lower,
For one-sided tests, 1
(right-tailed) or -1
(left-tailed) flags are used.
Providers whose performance falls within the central range are flagged as 0
.
Outlying providers are determined by the test statistic falling beyond the threshold based on the significance level .
A data frame containing the results of the hypothesis test, with the following columns:
flag |
a flagging indicator where |
p-value |
the p-value of the hypothesis test. |
stat |
the test statistic. |
Std.Error |
the standard error of the provider effect estimate. |
data(ExampleDataLinear) outcome <- ExampleDataLinear$Y ID <- ExampleDataLinear$ID covar <- ExampleDataLinear$Z fit_re <- linear_re(Y = outcome, Z = covar, ID = ID) test(fit_re)
data(ExampleDataLinear) outcome <- ExampleDataLinear$Y ID <- ExampleDataLinear$ID covar <- ExampleDataLinear$Z fit_re <- linear_re(Y = outcome, Z = covar, ID = ID) test(fit_re)
logis_fe
objectConduct hypothesis tests on provider effects and identify outlying providers for a fixed effect logistic model.
## S3 method for class 'logis_fe' test( fit, parm, level = 0.95, test = "exact.poisbinom", score_modified = TRUE, null = "median", n = 10000, threads = 1, alternative = "two.sided", ... )
## S3 method for class 'logis_fe' test( fit, parm, level = 0.95, test = "exact.poisbinom", score_modified = TRUE, null = "median", n = 10000, threads = 1, alternative = "two.sided", ... )
fit |
a model fitted from |
parm |
specifies a subset of providers for which confidence intervals are to be given.
By default, all providers are included. The class of |
level |
the confidence level during the hypothesis test, meaning a significance level of |
test |
a character string specifying the type of testing method to be conducted. The default is "exact.poisbinom".
|
score_modified |
a logical indicating whether to use the modified score test
ignoring the randomness of covariate coefficient for score teat ( |
null |
a character string or a number specifying null hypotheses of fixed provider effects. The default is |
n |
resample size for bootstrapping when ( |
threads |
an integer specifying the number of threads to use. The default value is 1. |
alternative |
a character string specifying the alternative hypothesis, must be one of
|
... |
additional arguments that can be passed to the function. |
By default, the function uses the "exact.poisbinom"
method.
The wald test is invalid for extreme providers (i.e. when provider effect goes to infinity).
For the score test, consider that when the number of tested providers is large,
refitting the models to get the restricted MLEs will take a long time.
Therefore, we use unrestricted MLEs to replace the restricted MLEs during the testing procedure by default.
However, the user can specify score_modified = FALSE
to perform a standard score test.
A data frame containing the results of the hypothesis test, with the following columns:
flag |
a flagging indicator where |
p-value |
the p-value of the hypothesis test. |
stat |
the test statistic. |
Std.Error |
The standard error of the provider effect estimate, included only when |
Wu, W, Yang, Y, Kang, J, He, K. (2022) Improving large-scale estimation and inference for profiling health care providers.
Statistics in Medicine, 41(15): 2840-2853.
data(ExampleDataBinary) outcome = ExampleDataBinary$Y covar = ExampleDataBinary$Z ID = ExampleDataBinary$ID fit_fe <- logis_fe(Y = outcome, Z = covar, ID = ID, message = FALSE) test(fit_fe, test = "score")
data(ExampleDataBinary) outcome = ExampleDataBinary$Y covar = ExampleDataBinary$Z ID = ExampleDataBinary$ID fit_fe <- logis_fe(Y = outcome, Z = covar, ID = ID, message = FALSE) test(fit_fe, test = "score")