The unreliability of two publicly reported outcome quality measures for characterizing health care quality within the Veterans Health Administration
Abstract

Objective: To estimate the reliability of two outcome quality measures in Veterans Health Administration (VHA) data using three different methods.

Study Setting and Design: We created two cohorts of VHA patients meeting criteria for two measures: (1) risk-standardized complication rates following elective primary total hip arthroplasty and/or total knee arthroplasty (THA/TKA), and (2) risk-standardized mortality rates following hospitalization for acute myocardial infarction (AMI). We fit hierarchical logistic regression models and calculated facility-level risk-standardized rates. We estimated entity-level reliability using three commonly applied methods: (1) the delta method approximation, (2) the latent scale model, and (3) the split-sample method.

Data Sources and Analytic Sample: For each measure, we extracted risk adjustment and outcome data from the VHA Corporate Data Warehouse for patients meeting eligibility criteria in fiscal years 2021 and 2022.

Principal Findings: For both measures, most facilities had rates (complications following THA/TKA; mortality following AMI hospitalization) that were not statistically different from the national average. Reliability estimates based on the delta method approximation (0.14 for THA/TKA; 0.12 for AMI) and the split-sample method (0.12 for THA/TKA; 0.19 for AMI) were very low for both measures. When we varied the sample sizes, we found that much larger samples would be needed to reliably differentiate quality of care across facilities. By contrast, reliability estimates based on the latent scale model were substantially higher than those from the other two methods (0.64 for THA/TKA; 0.41 for AMI), suggesting that there is substantially more between-facility variation in latent quality than manifests in observed outcomes.

Conclusions: Reliability estimates based on the latent scale approach are not numerically or conceptually interchangeable with estimates based on the other two approaches. Given that health outcomes are generally reported using observed outcomes, reliability estimation based on the latent scale approach should not be used without a strong rationale.
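
To illustrate why the latent scale estimate can sit well above the other two, the following is a minimal sketch under stated assumptions, not the estimation code used in the study. It assumes a random-intercept logistic model with between-facility variance `sigma2` on the logit scale, the textbook latent-scale residual variance of pi^2/3, a first-order (delta method) mapping of the variance components to the probability scale, and a Spearman-Brown correction for a half-sample correlation. The function names and the input values (`sigma2`, `p`, `n`) are hypothetical and are not taken from the paper.

```python
import math


def latent_scale_reliability(sigma2: float, n: int) -> float:
    """Signal-to-noise reliability on the logit (latent) scale.

    Assumes the within-facility variance of the latent response is
    pi^2 / 3 (the variance of the standard logistic distribution).
    """
    noise = (math.pi ** 2 / 3) / n
    return sigma2 / (sigma2 + noise)


def delta_method_reliability(sigma2: float, p: float, n: int) -> float:
    """Signal-to-noise reliability on the observed (probability) scale.

    Assumes a first-order approximation: the between-facility variance
    becomes sigma2 * (p * (1 - p))**2, and the sampling variance of an
    observed rate is the binomial variance p * (1 - p) / n.
    """
    between = sigma2 * (p * (1 - p)) ** 2
    within = p * (1 - p) / n
    return between / (between + within)


def spearman_brown(half_sample_correlation: float) -> float:
    """Step a split-half correlation up to full-sample reliability."""
    r = half_sample_correlation
    return 2 * r / (1 + r)


if __name__ == "__main__":
    # Hypothetical inputs: a rare outcome and a modest facility volume.
    sigma2, p, n = 0.10, 0.03, 100
    print("latent scale:", round(latent_scale_reliability(sigma2, n), 3))
    print("delta method:", round(delta_method_reliability(sigma2, p, n), 3))
    print("split-half r = 0.10 stepped up:", round(spearman_brown(0.10), 3))
```

Under these assumptions the delta method expression reduces to sigma2 / (sigma2 + 1 / (n p (1 - p))). Because 1 / (p (1 - p)) is at least 4, which exceeds pi^2 / 3, the observed-scale estimate is always below the latent-scale estimate for the same inputs, and the gap widens as the outcome becomes rarer.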