Abstract: BACKGROUND: Early prediction of Alzheimer's disease is important for timely intervention and treatment. We examine whether machine learning on longitudinal electronic health record notes can improve early prediction of Alzheimer's disease. METHODS: From Veterans Health Administration records (2000 to 2022), we studied 61,537 individuals diagnosed with Alzheimer's disease and 234,105 without, aged 45-103 years; 98.4% were male. From clinical notes, we quantified the frequency of subjective cognitive decline and Alzheimer's disease-related keywords, and applied statistical machine learning models to assess their ability to predict future diagnosis. RESULTS: Here we show that Alzheimer's-related keywords (e.g., "concentration," "speaking") occur more often in notes of individuals who later develop Alzheimer's disease than in controls. In the 15 years preceding diagnosis, cases demonstrate an exponential increase in keyword mentions (from 9.4 to 57.7 per year), whereas controls show a slower, linear increase (from 8.2 to 20.3 per year). These trends are consistent across demographic subgroups. Random forest models using these keywords for prediction achieve an area under the receiver operating characteristic curve ranging from 0.577 at ten years before diagnosis to 0.861 at one day before diagnosis, consistently outperforming models using only structured data. CONCLUSIONS: Signs and symptoms of early Alzheimer's disease are reported in clinical notes many years before a clinical diagnosis is made, and the frequency of these signs and symptoms, approximated by keywords, increases as diagnosis approaches. A simple keyword-based approach can capture these signals and can help identify individuals at high risk of future Alzheimer's disease.
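
The keyword-frequency step described in the methods can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `keyword_mentions_per_year`, the `(year, text)` note format, and the two-word keyword list (taken only from the examples quoted in the abstract) are assumptions made for illustration, and the sample notes are invented.

```python
import re
from collections import Counter

# Assumed keyword list: only the two examples quoted in the abstract.
# The study's full keyword set is not given here.
KEYWORDS = ("concentration", "speaking")
_PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, KEYWORDS)) + r")\b", re.IGNORECASE
)


def keyword_mentions_per_year(notes, diagnosis_year):
    """Count keyword mentions grouped by years before diagnosis.

    notes: iterable of (year, text) pairs (hypothetical format).
    Returns a Counter mapping years-before-diagnosis -> mention count.
    """
    counts = Counter()
    for year, text in notes:
        years_before = diagnosis_year - year
        if years_before < 0:
            continue  # ignore notes written after the diagnosis
        counts[years_before] += len(_PATTERN.findall(text))
    return counts


# Invented example notes, for illustration only.
notes = [
    (2005, "Patient reports trouble with concentration."),
    (2014, "Difficulty speaking; poor concentration noted. Concentration worse."),
    (2021, "Follow-up after diagnosis, speaking impaired."),
]
counts = keyword_mentions_per_year(notes, diagnosis_year=2020)
print(dict(counts))
```

Per-year counts of this kind could then serve as input features to a classifier such as a random forest, with one model evaluated at each prediction horizon.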