Background
There is growing recognition that a large proportion of individuals infected with SARS-CoV-2 continue to experience a broad range of symptoms after recovering from the acute phase of COVID-19. These patients are colloquially referred to as “COVID long haulers” and the condition as “Long COVID.” There are over 100 million COVID-19 survivors in the United States, including an estimated 10 million in Texas. Long COVID is a national, and even global, public health issue. The National Institutes of Health (NIH) recently gave Long COVID a formal name, Post-Acute Sequelae of SARS-CoV-2 infection (PASC), but has not yet formally characterized the illness. This highlights a substantial gap between how widespread the condition is and how well the medical community understands it.
Approach
This project was a collaborative effort between Southwest Research Institute, University of Texas Health San Antonio (UT Health), and UT San Antonio (UTSA). The project used the UT Health COVID-19 Infectious Diseases Outpatient Clinic cohort (>12,000 patients diagnosed with acute COVID-19) and performed systematic deep phenotyping of clinical characteristics drawn from multiple data sources: the data warehouse, clinical chart review, unstructured clinical and radiology notes processed with Natural Language Processing (NLP), and patient-reported symptoms. The goal of the study was to correlate PASC symptoms with radiomic and radiopathomic information by analyzing image data with computer vision, artificial intelligence, and machine learning. The outcome of the project was a predictive tool, implemented as a turnkey Electronic Medical Record (EMR)-based app that can be scaled quickly across most academic and many non-academic clinical settings.
Accomplishments
Our results demonstrated that, in many cases, PASC can be predicted from data collected during other healthcare encounters. On the withheld test data, the model showed moderate predictive power, with an area under the Receiver Operating Characteristic (ROC) curve (AUC) of 0.69. At a false positive rate of approximately 10%, the model identified approximately 50% of PASC cases. Additionally, our feature importance analysis identified a large number of related conditions, offering useful leads for hypothesis generation and further study.
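The two evaluation figures reported above, ROC AUC and the share of cases detected at a fixed false positive rate, can be illustrated with a small, self-contained sketch. It assumes a binary classifier that assigns each patient a risk score, which this summary does not specify. The function names (`roc_points`, `roc_auc`, `tpr_at_fpr`) are hypothetical, and the labels and scores are made-up toy values, not project data.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (false positive rate, true positive rate)


def roc_points(labels: List[int], scores: List[float]) -> List[Point]:
    """Sweep the decision threshold from the highest score to the lowest
    and record (FPR, TPR) after each distinct score. Tied scores form one step."""
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative example")
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    points: List[Point] = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Count every example tied at this score before emitting a point.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def roc_auc(points: List[Point]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


def tpr_at_fpr(points: List[Point], max_fpr: float) -> float:
    """Highest true positive rate reachable without exceeding max_fpr."""
    return max(tpr for fpr, tpr in points if fpr <= max_fpr)


if __name__ == "__main__":
    # Hypothetical toy data: 1 = PASC, 0 = no PASC.
    labels = [1, 1, 0, 1, 0, 0, 1, 0]
    scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
    pts = roc_points(labels, scores)
    print("AUC:", roc_auc(pts))
    print("TPR at FPR <= 0.10:", tpr_at_fpr(pts, 0.10))
```

On this toy data the sketch prints an AUC of 0.75 and a true positive rate of 0.5 at a false positive rate of at most 10%. An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, and AUC also equals the probability that a randomly chosen positive case is scored higher than a randomly chosen negative one.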