Machine learning analysis suggests that there are four sub-phenotypes of long COVID

In a recent study published in Nature Medicine, researchers identified PASC [post-acute sequelae of coronavirus disease 2019 (COVID-19)] sub-phenotypes based on conditions identified within one to three months of acute infection by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2).

Study: Data-driven identification of post-acute SARS-CoV-2 infection subphenotypes. Image Credit: males_design/Shutterstock


Studies have examined PASC conditions individually without providing evidence of co-occurring conditions. The sub-phenotypes, or co-incidence patterns, describe the degree to which PASC conditions and symptoms co-occur or develop disproportionately among particular patients, and could help reveal PASC pathophysiology.

About the study

In the present study, researchers identified PASC sub-phenotypes using a data-driven approach based on machine learning.

The team analyzed EHR (electronic health record) data from two large CRNs (clinical research networks) of the national PCORnet (patient-centred CRN), i.e., the INSIGHT CRN and the OneFlorida+ CRN. The INSIGHT CRN includes 12 million NYC (New York City) residents, whereas the OneFlorida+ CRN includes 19 million individuals residing in Florida, Georgia, and Alabama.

The INSIGHT and OneFlorida+ CRN individuals comprised the development cohort (n=20,881) and the validation cohort (n=13,724), respectively. The study comprised SARS-CoV-2-positive individuals, for whom conditions that developed between 30 days and 180 days after the reported COVID-19 diagnosis were assessed.

COVID-19 diagnosis was based on positive SARS-CoV-2 antigen test or nucleic acid amplification test reports between March 2020 and November 2021. The incidence of 137 potential PASC conditions, grouped into CCSR (clinical classifications software refined) categories defined by ICD-10 (International Classification of Diseases, tenth revision) codes, was assessed.
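The inclusion rules described above can be illustrated with a short date check. This is a minimal sketch, not the study's code: the article gives only months for the testing period, so the exact start and end days, the inclusive bounds, and both function names are assumptions made for illustration.

```python
from datetime import date, timedelta

# Assumed bounds: the article states only "March 2020" and "November 2021",
# so the exact days below are illustrative guesses.
STUDY_START = date(2020, 3, 1)
STUDY_END = date(2021, 11, 30)

# Post-acute window from the article: 30 to 180 days after diagnosis.
# Treating both ends as inclusive is an assumption.
WINDOW_START_DAYS = 30
WINDOW_END_DAYS = 180


def is_index_date_eligible(index_date: date) -> bool:
    """Return True if the positive test falls inside the assumed study period."""
    return STUDY_START <= index_date <= STUDY_END


def in_post_acute_window(index_date: date, condition_date: date) -> bool:
    """Return True if a condition was recorded 30 to 180 days after diagnosis."""
    days_since = (condition_date - index_date).days
    return WINDOW_START_DAYS <= days_since <= WINDOW_END_DAYS
```

Under these assumptions, a condition recorded 29 days after the positive test would be excluded as part of the acute phase, while one recorded on day 30 would count as a potential PASC condition.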

The TM (topic modeling) approach was used to identify co-incidence patterns of the PASC conditions, based on which PASC sub-phenotypes were determined. After obtaining high-dimensional binary representations of PASC conditions (step 1), the algorithm learned PASC topics (T) (step 2) and inferred the patient representations in the low-dimensional PASC topic space (step 3). PASC sub-phenotypes were then determined based on patient clusters in the PASC topic space (step 4).
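Step 1 of this pipeline, the binary representation, can be sketched as follows. The function name, the category labels, and the patient records are all hypothetical and chosen for illustration; in the study each vector would have 137 entries, one per CCSR category. Steps 2 to 4 (topic learning, patient inference, and clustering) are not shown, since the article does not specify the model used.

```python
from typing import Dict, List, Set


def build_binary_matrix(
    patient_conditions: Dict[str, Set[str]],
    categories: List[str],
) -> Dict[str, List[int]]:
    """Map each patient to a 0/1 vector with one slot per condition category.

    A slot is 1 if the patient had that condition recorded, otherwise 0.
    """
    return {
        patient_id: [1 if category in conditions else 0 for category in categories]
        for patient_id, conditions in patient_conditions.items()
    }


# Hypothetical category labels and patients, for illustration only.
categories = ["kidney_failure", "sleep_disorder", "headache", "chest_pain"]
patients = {
    "p1": {"kidney_failure"},
    "p2": {"sleep_disorder", "headache"},
    "p3": set(),  # no potential PASC condition recorded
}

matrix = build_binary_matrix(patients, categories)
for patient_id, row in matrix.items():
    print(patient_id, row)
```

Each row records only whether a condition occurred, not how often, which is what makes the representation binary.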

PASC co-incidence patterns of SARS-CoV-2-positive and SARS-CoV-2-negative individuals were compared based on the generated heat maps, and the entropy of every topic vector was calculated. The robustness of the identified PASC sub-phenotypes was evaluated based on propensity score (PS) adjustments. Further, the team quantitatively compared the topics: the original set of topics learned from the 137 PASC conditions and the corresponding topics learned from the two CRN cohorts were evaluated using cosine similarity.
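The two measures named above can be illustrated in a few lines. This is a sketch under stated assumptions rather than the study's implementation: it treats a topic vector as a list of non-negative weights over conditions, normalizes it to sum to one, and uses Shannon entropy in nats. The function names are hypothetical.

```python
import math
from typing import Sequence


def topic_entropy(weights: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a topic vector normalized to sum to 1.

    Assumes non-negative weights. Lower values mean the topic is
    concentrated on a few conditions; higher values mean it is spread out.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("topic vector must have a positive sum")
    probs = [w / total for w in weights if w > 0]
    return -sum(p * math.log(p) for p in probs)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two topic vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("vectors must be non-zero")
    return dot / (norm_a * norm_b)
```

A topic that puts all its weight on one condition has entropy zero, while a topic spread evenly over four conditions has entropy ln 4. Two topics with the same relative weights have cosine similarity 1 regardless of scale, which is why the measure suits comparing topics learned from cohorts of different sizes.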


Four PASC sub-phenotypes were identified. Sub-phenotype 1 comprised 7,047 (34%) patients and was dominated by renal, circulatory, and cardiac conditions (T-3, 8, 10), such as kidney failure, circulatory and cardiac disorders, and fluid and electrolyte imbalance. The median patient age was 65 years, and 49% of the patients were male. The patients had high acute COVID-19 severity [hospitalization (61%), mechanical ventilator needs (5.0%), and critical care admissions (10%)].

The sub-phenotype had the highest proportion of SARS-CoV-2-positive patients (37%) during the initial COVID-19 wave (between March and June 2020). Individuals in this sub-phenotype had an elevated burden of comorbidities and were mostly prescribed medications for anemia, circulatory disorders, and endocrine disorders.

Sub-phenotype 2 comprised 6,838 (33%) patients and was dominated by pulmonary disorders (T-4, 7, 9), anxiety, sleep disorders, chest pain, and headaches. The median age of the patients was 51 years, 63% of them were female, and 31% were hospitalized during acute COVID-19.

The sub-phenotype had the highest fraction (65%) of patients diagnosed with COVID-19 between November 2020 and November 2021. Sub-phenotype 2 individuals were mostly prescribed anti-allergy, anti-inflammatory, and anti-asthma medications, such as inhaled steroids, montelukast, and levalbuterol.

Sub-phenotype 3 comprised 23% (n=4,879) of individuals, with disorders of the nervous and musculoskeletal systems (T-1, 5, 6), including pain of musculoskeletal origin, sleep disorders, and headaches. The median patient age was 57 years, and 61% of the patients were female. The sub-phenotype comprised the highest proportion of individuals with more than five outpatient visits before COVID-19 (78%). Individuals in this sub-phenotype were mostly prescribed analgesic medications (such as ketorolac and ibuprofen).

Sub-phenotype 4 comprised 10% (n=2,117) of individuals, with mainly respiratory and digestive disorders (T-2, 4, 8). The median patient age was 54 years, and 62% of the patients were female. The sub-phenotype had the highest rate of zero emergency department visits (57%) and the lowest rates of mechanical ventilator use (1%) and critical care unit admission (3%) during acute COVID-19. Individuals in this sub-phenotype were mostly prescribed medications for digestive system disorders.

The topics learned from SARS-CoV-2-negative individuals showed greater entropy values than those learned from SARS-CoV-2-positive patients. Cosine similarity findings showed the robustness of the PASC sub-phenotype classification, and the patterns of co-incidence observed for the two CRN cohorts were similar for SARS-CoV-2-positive individuals. In contrast, the topics for uninfected individuals were dissimilar to those learned from SARS-CoV-2-positive individuals and showed less concentrated patterns.


Overall, the study findings highlighted four reproducible, data-driven PASC sub-phenotypes identified by machine learning. The findings could help health authorities improve PASC management.
