March 2025

Preprocessing of natural language process variables using a data-driven method improves the association with suicide risk in a large Veterans affairs population

Abstract:

Objective: Suicide risk assessment has historically relied heavily on clinical evaluations and patient self-reports. Natural language processing (NLP) of electronic health records (EHRs) provides an alternative approach for extracting risk predictors from clinical notes. Modeling NLP variables, however, is challenging because of zero inflation and skewed distributions. Therefore, we evaluated whether an adaptive-mixture-categorization (AMC) method could optimize the suicide risk predictive capacity of NLP data extracted from Veterans Affairs (VA) EHR notes.

Methods: NLP variables for 25,342 patients were analyzed using the SÉANCE Python package. The AMC method was employed to categorize NLP measures into distinct groups so as to maximize the between-category variance. Associations between suicide outcomes and AMC-categorized NLP variables were compared with those for the original and the quantile-categorized NLP variables.

Results: AMC-categorized variables showed stronger associations with suicide risk than the other approaches did, both in the full cohort analysis and in sensitivity analyses based on bootstrap subsampling. Additionally, over 90% of the NLP variables were significantly associated with suicide risk in univariate analyses, indicating the relevance of clinical notes in suicide prevention.

Conclusion: AMC-based categorization substantially enhanced the suicide predictive capacity of NLP variables extracted from clinical text. Transforming skewed NLP data with the AMC method holds promise for improving risk prediction models.
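To make the idea of categorizing a zero-inflated, skewed variable by maximizing between-category variance more concrete, the following is a minimal sketch in Python. It is not the AMC method evaluated in the article, whose details are not given in the abstract. It is a simplified stand-in that puts exact zeros in their own category and then searches for a single cut point among the positive values. The function names and the example scores are hypothetical and are not taken from the study or from the SÉANCE package.

```python
from statistics import mean


def between_category_variance(groups):
    """Weighted squared distance of each group mean from the overall mean."""
    values = [v for g in groups for v in g]
    overall = mean(values)
    return sum(len(g) * (mean(g) - overall) ** 2 for g in groups) / len(values)


def best_cut(values):
    """Return the threshold that splits `values` into two non-empty groups
    with the largest between-category variance, or None if no split exists."""
    ordered = sorted(set(values))
    best_threshold, best_score = None, -1.0
    # Candidate thresholds sit midway between consecutive distinct values.
    for lo, hi in zip(ordered, ordered[1:]):
        threshold = (lo + hi) / 2
        low = [v for v in values if v <= threshold]
        high = [v for v in values if v > threshold]
        score = between_category_variance([low, high])
        if score > best_score:
            best_threshold, best_score = threshold, score
    return best_threshold


def categorize(values):
    """Map each value to 0 (exact zero), 1 (low positive) or 2 (high positive)."""
    positives = [v for v in values if v > 0]
    threshold = best_cut(positives)
    labels = [
        0 if v == 0 else (1 if threshold is None or v <= threshold else 2)
        for v in values
    ]
    return labels, threshold


# Hypothetical zero-inflated, right-skewed scores (illustrative only).
scores = [0, 0, 0, 0, 0, 0, 0.1, 0.2, 0.2, 0.3, 0.4, 2.5, 3.0, 4.0]
labels, threshold = categorize(scores)
print(threshold)
print(labels)
```

In this toy example the cut falls in the gap between the cluster of small positive scores and the few large ones. A quantile-based split of the same data would instead place its boundaries according to rank alone, which on zero-inflated data can leave a boundary inside the mass of zeros or inside a tight cluster of similar values.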
