Prognostic Utility of Single-Nucleotide Polymorphisms in Inflammatory Arthritis
Callear A¹, Foth W¹, Strenn R², Brilliant M¹, Schrodi S¹
¹Center for Human Genetics and ²Biomedical Informatics Research Center
Research area: Genetics
Background: The inflammatory arthritis conditions ankylosing spondylitis (AS), psoriatic arthritis (PsA), and rheumatoid arthritis (RA) are difficult to diagnose, and the resulting delay in diagnosis is associated with increased disease burden. Many single-nucleotide polymorphisms (SNPs) that confer susceptibility to these conditions have been identified, sparking interest in their potential diagnostic use. The purpose of this study was to investigate the prognostic utility of a panel of these genetic markers by using machine learning models to classify confirmed inflammatory arthritis cases as AS, PsA, or RA.
Methods: Seventy-three participants of the Personalized Medicine Research Project (PMRP) with inflammatory arthritis (AS: n=26, PsA: n=7, RA: n=40) were genotyped for 37 SNPs previously associated with AS, PsA, or RA in genome-wide association studies. Using Weka, a machine learning software package, classification models were first developed on a training set of 10,000 samples generated from SNP frequencies reported in the literature. Several classification models, including neural networks, decision trees, support vector machines, and Naïve Bayes, were compared, with feature selection performed for each. The best model, Naïve Bayes applied to 29 SNPs, was then evaluated on the test set of PMRP participants. Predictive performance was assessed by accuracy and by the area under the receiver operating characteristic (ROC) curve (AUC).
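The workflow described in the Methods can be illustrated with a minimal sketch. It is not the Weka pipeline used in the study: the three SNPs, their risk-allele frequencies, the equal class sizes, the Hardy-Weinberg sampling, and all function names below are assumptions chosen only to show how a training set might be generated from allele frequencies and how a Naïve Bayes classifier could then be fit to the genotypes.

```python
import math
import random

# Hypothetical risk-allele frequencies per disease for 3 SNPs.
# Illustrative values only; not taken from the study or the literature.
ALLELE_FREQ = {
    "AS":  [0.60, 0.20, 0.30],
    "PsA": [0.25, 0.55, 0.30],
    "RA":  [0.25, 0.20, 0.65],
}

def simulate_genotype(freqs, rng):
    """Draw 0, 1, or 2 risk alleles per SNP, assuming Hardy-Weinberg equilibrium."""
    return [sum(rng.random() < p for _ in range(2)) for p in freqs]

def simulate_dataset(n_per_class, rng):
    """Build a list of (genotype, label) pairs with equal class sizes (an assumption)."""
    data = []
    for label, freqs in ALLELE_FREQ.items():
        data.extend((simulate_genotype(freqs, rng), label) for _ in range(n_per_class))
    return data

def train_naive_bayes(data, n_snps):
    """Estimate log priors and per-SNP genotype log probabilities (Laplace smoothing)."""
    labels = sorted({label for _, label in data})
    counts = {c: [[1] * 3 for _ in range(n_snps)] for c in labels}  # pseudocount of 1
    class_n = {c: 0 for c in labels}
    for genotype, label in data:
        class_n[label] += 1
        for j, g in enumerate(genotype):
            counts[label][j][g] += 1
    model = {}
    for c in labels:
        log_prior = math.log(class_n[c] / len(data))
        log_like = [[math.log(k / sum(row)) for k in row] for row in counts[c]]
        model[c] = (log_prior, log_like)
    return model

def predict(model, genotype):
    """Return the class with the highest posterior score for one genotype."""
    def score(c):
        log_prior, log_like = model[c]
        return log_prior + sum(log_like[j][g] for j, g in enumerate(genotype))
    return max(model, key=score)

rng = random.Random(0)
train = simulate_dataset(2000, rng)
test = simulate_dataset(500, rng)
model = train_naive_bayes(train, n_snps=3)
accuracy = sum(predict(model, g) == y for g, y in test) / len(test)
print(f"accuracy on simulated hold-out: {accuracy:.3f}")
```

Because the hold-out set here is drawn from the same simulated distribution as the training set, its accuracy says nothing about performance on real genotypes, which is the gap the PMRP test set was meant to measure.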
Results: On the training set, the Naïve Bayes model achieved an accuracy of 78.05% and an AUC of 0.884. On the test set of PMRP participants, accuracy was 61.64% and the average AUC was 0.635 (AS: AUC=0.608, PsA: AUC=0.665, RA: AUC=0.648); this drop in performance may reflect poor concordance between the test and training sets. The AUC was nonetheless statistically significant (p≤0.05).
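The reported test-set figures can be cross-checked with simple arithmetic. The sketch below uses only the class sizes and per-class AUCs given in the abstract; reading the "average AUC" as a class-size-weighted mean is an assumption, although it is the reading that reproduces 0.635 (the unweighted mean of the three AUCs is 0.640). The majority-class baseline is included only as context for the accuracy figure.

```python
# Class sizes and per-class test-set AUCs as reported in the abstract.
n = {"AS": 26, "PsA": 7, "RA": 40}
auc = {"AS": 0.608, "PsA": 0.665, "RA": 0.648}

total = sum(n.values())

# Unweighted (macro) mean versus class-size-weighted mean of the per-class AUCs.
macro_auc = sum(auc.values()) / len(auc)
weighted_auc = sum(n[c] * auc[c] for c in n) / total
print(f"unweighted mean AUC:          {macro_auc:.3f}")
print(f"class-size-weighted mean AUC: {weighted_auc:.3f}")

# 61.64% accuracy on 73 participants corresponds to this many correct calls.
correct = round(0.6164 * total)
print(f"correct classifications: {correct} of {total} ({correct / total:.2%})")

# Accuracy obtained by always predicting the largest class (RA), for context.
baseline = max(n.values()) / total
print(f"majority-class baseline accuracy: {baseline:.2%}")
```

Under these assumptions, 61.64% accuracy corresponds to 45 of 73 participants classified correctly, against a majority-class baseline of 54.79% (40 of 73).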
Conclusion: Despite reaching statistical significance, this SNP panel was not highly effective in classifying inflammatory arthritis types in this population. Use of a larger SNP array could enhance predictive performance.