Machine Learning Alternatives for the Diagnosis of ADHD from Functional Connectivity and Phenotypic Information

Authors: Amrit Baveja

Current estimates are that 5-10% of school-age children (including the author) suffer from ADHD, costing the US healthcare system alone over $36B. However, factors such as a revenue-motivated healthcare system and researcher confirmation bias make ADHD overdiagnosis a very real issue, and there is still no reliable technique for automated ADHD diagnosis in clinical use. The objective of this work was to improve upon previous attempts to develop machine learning models for automated ADHD diagnosis. The author used the ADHD200 dataset, which was generated for an international competition in 2011, and extended the competition’s approach in four ways. First, the author combined high-coverage phenotypic features with the functional connectome features used previously. Second, rather than the previously used fixed test/train split, the author used a random test/train split with 20% of the data held out for testing, cross-validated five times, to reduce the risk of overfitting. Third, the author evaluated a broad range of newer models, such as neural networks and deep forest. Finally, the author used newer hyperparameter optimization techniques to identify the best model parameters. The best model explored was the more recent gcForest model with automated optimization: it improved the previous best ADHD F1 score from 0.32 to 0.52, a substantial improvement in binary diagnostic performance.
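
As an illustration of the evaluation protocol described above, the following is a minimal sketch of repeated random hold-out evaluation scored with the F1 of the positive (ADHD) class. It is not the author's code. It assumes that the splitting scheme is five independent random splits with 20% held out each time, which is one reading of the description. The function names (`f1_positive`, `repeated_holdout_f1`, `fit_threshold`, `predict_threshold`) are hypothetical, the single-feature threshold classifier is only a stand-in for models such as gcForest, and the data are synthetic rather than drawn from ADHD200.

```python
import random


def f1_positive(y_true, y_pred):
    """F1 score for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: F1 is defined as 0 here
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def repeated_holdout_f1(X, y, fit, predict, test_fraction=0.2, repeats=5, seed=0):
    """Average F1 over several random train/test splits (assumed protocol)."""
    rng = random.Random(seed)
    n = len(y)
    n_test = max(1, round(n * test_fraction))
    scores = []
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)  # a fresh random split on every repeat
        test_idx, train_idx = idx[:n_test], idx[n_test:]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        preds = predict(model, [X[i] for i in test_idx])
        scores.append(f1_positive([y[i] for i in test_idx], preds))
    return sum(scores) / len(scores), scores


# Placeholder classifier (hypothetical): threshold on the first feature,
# placed midway between the two class means seen in training.
def fit_threshold(X_train, y_train):
    pos = [x[0] for x, t in zip(X_train, y_train) if t == 1]
    neg = [x[0] for x, t in zip(X_train, y_train) if t == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def predict_threshold(threshold, X_test):
    return [1 if x[0] > threshold else 0 for x in X_test]


# Synthetic data: 100 subjects, one feature, two balanced classes.
data_rng = random.Random(42)
labels = [i % 2 for i in range(100)]
features = [[data_rng.gauss(label, 1.0)] for label in labels]

mean_f1, split_f1s = repeated_holdout_f1(
    features, labels, fit_threshold, predict_threshold
)
print(f"mean F1 over {len(split_f1s)} splits: {mean_f1:.2f}")
```

Averaging over several random splits makes the reported score less dependent on which subjects happen to land in the test set, which is the motivation the abstract gives for moving away from a single fixed split.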

