Using natural language processing to extract carotid stenosis severity from clinical notes to create a nationwide Veteran cohort

Abstract:
Objective: The prevalence of moderate and severe asymptomatic carotid stenosis (i.e., atherosclerotic narrowing of the extracranial carotid arteries) is generally ∼6% and ∼2%, respectively. Most prior studies of carotid stenosis risk factors have been small. This study describes the development and validation of a natural language processing (NLP) tool that extracts carotid stenosis measurements from clinical notes, and uses the tool to assess the presence and severity of carotid stenosis and to identify significant risk factors.
Methods: We created an NLP tool to extract the ratio of peak systolic velocity of the internal carotid artery to that of the common carotid artery (ICA/CCA ratio) in Veterans receiving carotid duplex ultrasounds in the Veterans Health Administration (VHA) from 2001 to 2020. Among those who had at least one valid ICA/CCA ratio, we identified carotid stenosis severity (<50%, 50-69%, ≥70%) based on ICA/CCA ratio (<2, ≥2 to <4, ≥4) and assessed the association between presence and severity of carotid stenosis and clinical and demographic characteristics, including age, sex, self-identified race and ethnicity, smoking status, body mass index (BMI), systolic and diastolic blood pressures, indicator variables for pre-existing hypertension, coronary heart disease, and type 2 diabetes, and selected laboratory measures (i.e., hemoglobin A1c [HbA1c], low-density lipoprotein cholesterol [LDL-c], high-density lipoprotein cholesterol [HDL-c], triglycerides, and creatinine).
Results: The F1 score of the NLP tool was 0.907 for the right-side value, 0.882 for the left-side value, and 0.920 for the maximum value. Among the 290,517 Veterans in the cohort, the median age was 68.2 years. Black patients had 16% lower odds of more severe carotid stenosis (OR 0.84, 95% CI 0.81–0.87, p<0.001). All patient-level risk factors except high-density lipoprotein cholesterol were significantly associated with carotid stenosis severity.
Conclusions: The NLP tool performed well, and the analysis of our NLP-derived cohort largely confirms the risk factors identified by previous smaller studies, demonstrating the utility of big data and NLP in carotid stenosis research.
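
The severity categories in the Methods follow directly from the stated ICA/CCA cut points, and a short sketch may make the mapping concrete. The Python below is illustrative only and is not the authors' code: the function names are hypothetical, treating a "valid" ratio as a positive finite number is an assumption (the abstract does not define validity), and using the larger of the right and left ratios as the patient-level value is an assumption suggested by the "maximum value" reported in the Results.

```python
import math
from typing import Optional

# Cut points as stated in the Methods: ratio <2, >=2 to <4, >=4.
MODERATE_CUTOFF = 2.0
SEVERE_CUTOFF = 4.0


def is_valid_ratio(ratio: Optional[float]) -> bool:
    """Assumed validity rule: a positive, finite number."""
    return ratio is not None and math.isfinite(ratio) and ratio > 0


def severity_from_ratio(ratio: float) -> str:
    """Map an ICA/CCA ratio to the stenosis category given in the abstract."""
    if not is_valid_ratio(ratio):
        raise ValueError(f"invalid ICA/CCA ratio: {ratio!r}")
    if ratio < MODERATE_CUTOFF:
        return "<50%"
    if ratio < SEVERE_CUTOFF:
        return "50-69%"
    return "≥70%"


def patient_severity(right: Optional[float], left: Optional[float]) -> Optional[str]:
    """Classify a patient by the larger valid side (an assumption).

    Returns None when neither side has a valid ratio.
    """
    valid = [r for r in (right, left) if is_valid_ratio(r)]
    if not valid:
        return None
    return severity_from_ratio(max(valid))
```

For example, under these assumptions `patient_severity(1.4, 2.6)` falls in the 50-69% category because the left-side ratio lies between 2 and 4, and a patient with only one extracted side is classified on that side alone.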
