Using natural language processing to inform targeted rural and urban Hispanic U.S. Department of Veterans Affairs suicide prediction models

Abstract: Rural Hispanic veterans experience elevated suicide rates compared with their urban counterparts, but the differences between the two groups remain poorly understood. This study evaluates a rurality-stratified sample of Hispanic Veterans Affairs patients, leveraging unstructured electronic health record data to refine population-specific suicide risk prediction metrics. The data set comprised rural and urban Hispanic Veterans Affairs patients, including all suicide decedents from 2015 to 2018 (cases). Each case was matched with four patients who shared demographics and treatment year and remained alive (controls). After all unstructured electronic health record text was extracted and preprocessed, the corpus was analyzed using a semantic analysis package with 500+ variables. Least absolute shrinkage and selection operator (LASSO) and logistic regression were used to develop prediction models, and the area under the receiver operating characteristic curve (AUC) was used to examine the models' predictive accuracy. The final data sets included 39 rural cases and 148 controls, alongside 273 urban cases and 1,090 controls. The predictive models offered considerable accuracy (rural AUC = 0.86; urban AUC = 0.67). While rural models emphasized dislocation from community and communal resources, urban models emphasized alienation and identity challenges. This study enhances understanding of rural and urban Hispanic suicide decedents and could inform suicide prediction and preventive services.
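
As an aid to reading the accuracy figures reported above, the following is a minimal sketch of how an area under the receiver operating characteristic curve can be computed from model scores. It is not the authors' code: the function name `auc_from_scores` and the toy scores are hypothetical, and the sketch uses the standard pairwise interpretation of the statistic (the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen control, with ties counted as one half).

```python
def auc_from_scores(case_scores, control_scores):
    """Pairwise estimate of the area under the ROC curve.

    Counts, over every (case, control) pair, how often the case has the
    higher predicted risk. Ties contribute one half. Hypothetical helper
    for illustration only.
    """
    if not case_scores or not control_scores:
        raise ValueError("both groups need at least one score")

    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Toy predicted risks, invented for illustration (not study data).
toy_cases = [0.8, 0.3]
toy_controls = [0.5, 0.1]
toy_auc = auc_from_scores(toy_cases, toy_controls)
print(toy_auc)
```

Under this reading, a value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of cases from controls, which is one way to interpret the gap between the rural and urban figures.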
