COVID – 19
The severe acute respiratory coronavirus 2 (SARS-CoV-2) has been implicated for the global Coronavirus 2019 (COVID-19) pandemic. As of September, 2020, there are over 25 million cases with over, 800 thousand lives claimed worldwide. To the general population, infection of SARS-CoV-2 manifests mild flu-like symptoms including but not limited to fever, dry cough, and lethargy. Although these patients can recover without hospitalization, they can be highly infectious which poses a threat to vulnerable populations. Vulnerable populations, including the elderly and the immunocompromised, often have underlying health conditions which makes them more susceptible to severe symptoms such as ARDS. In fact, it has been reported that 95% of SARS-CoV-2 fatalities have an underlying condition. In severe cases, an immunological response known as “cytokine storm” is observed with excessive levels of circulating inflammatory cytokines that become unregulatable and harmful to the patient. As a result, there has been a global effort to understand COVID-19 pathology in hopes of developing treatments and strategies to mitigate its spread.
Current understanding of SARS-CoV-2 infection of host cells is cellular entry through the angiotensin converting enzyme 2 (ACE2) receptor and the transmembrane serine protease 2 (TMPRSS2) protein. Despite the emphasis on COVID-19’s respiratory symptoms, ACE2 is expressed in many tissues and SARS-CoV-2 infection has been shown to manifest into non-respiratory symptoms including cardiac damage. Interestingly, SARS-CoV-2 is detectable before symptoms appear leading to asymptomatic carriers of the disease. Therefore, there is an interest to evaluate an individual’s immune response for early signs of infection and predict disease severity. The relationship between SARS-CoV-2 infection and patient immune response has not been fully characterized due to its complex nature. Immune response to COVID-19 has been shown to vary drastically between individuals based on disease progression, genetic differences, and environmental factors. Therefore, currently published papers seeking to study an individual’s immune response may be limited due to lack of sampling diversity.
To capture a holistic immunological reaction against SARS-CoV-2, we have adapted multiple datasets from publicly available single cell RNA sequencing libraries. Peripheral blood mononuclear cells (PBMCs) from individuals in varying disease states (Healthy, Mild, Severe, Early Recover Stage, Late Recovery Stage) were previously collected and sequenced by Wen et al and Lee et al. (1,2). We processed these single cell RNA libraries with 10x Genomics Cell Ranger Software and identified immune cell populations in respective patient groups. Various computational analyses were performed such as Seurat/ Monocle3 for preprocessing and cell clustering. We are also one of the first to utilize scVelo’s RNA velocity estimation to predict future immune responses throughout COVID-19 disease progression. Our study demonstrates computational processes’ capabilities in performing viral research, which can be further implemented for other infectious diseases.
A novel and powerful technology to probe the immune system is the use of RNA velocity from single-cell RNA sequencing (scRNA-seq) data. scRNA-seq is relatively new itself, with the ability to sequence the transcriptome of individual cells. However, a simple analysis of scRNA-seq will only give us information at a single point of time – merely a snapshot of the gene expression profile of when the sample was obtained.
As an example of scRNA-seq, the image above shows the clustering of each single cell from a patient. This clustering is based on similarities in gene expression profiles (i.e. cells with a similar transcriptome are colored the same). Since we are dealing with hundreds of thousands of genes, we use a dimension reduction method (UMAP) to project the data onto two axes (UMAP-1 and UMAP-2).
With the advent of RNA velocity (link paper), we are able to retrieve dynamic information as well. But how does one obtain dynamic information from a snapshot? The answer is analogous to a snapshot of people walking:
From looking at this picture, you can gather a general sense that the man with the briefcase is walking to the right in a hurried fashion whereas the blonde woman has perhaps stopped to check her phone. Likewise, with RNA velocity, we can determine the ‘speed’ and ‘direction’ of each cell at a particular instance, that is, the instantaneous ‘velocity’. As the name suggests, this velocity is derived from the RNA information provided by scRNA-seq. The “direction” of each cell indicates the direction the cell is predicted to differentiate towards, based on its transcriptome and the transcriptome of its ‘future cell’.
For RNA velocity, this transcriptome information is specifically the ratio of unspliced and spliced RNA. When RNA is transcribed from DNA, the RNA immediately formed is unspliced with “redundant” information; the unspliced DNA is then spliced as it is subsequently sent for protein synthesis. One can determine the activity of the cell by examining the number of spliced and unspliced genes. We demonstrate this in these phase diagrams (source: La Manno G, Soldatov R, Zeisel A, et al. RNA velocity of single cells. Nature. 2018;560(7719):494-498. doi:10.1038/s41586-018-0414-6):
The phase diagram has its axes representing the proportion of spliced or unspliced RNA for an arbitrary gene. If the cell has a certain proportion of unspliced more than spliced RNA, then that would mean that the gene is becoming more active (induction), and is pushed along the red arrow onto steady state. If there is a certain proportion of spliced over the unspliced RNA, then the gene is becoming less active (repression) and is pushed down the blue arrow. This directionality can be applied to many genes and therefore a single velocity can be extrapolated as it points toward cells that fit its trajectory (source: Sten Linnarsson; SIB Bioinformatics Awards 2019):
We can then superimpose arrows representing the RNA velocity onto the clusters:
In this example, we observe cell states of cluster 0 may be transitioning into that of cluster 6 (which could be potentially driven by differentiation) …
…and we are done! Note that RNA velocity can provide further analysis which will be discussed later.
Additional Resources
Video: SIB Bioinformatics Awards 2019: RNA Velocity and the Velocyto tool; https://www.youtube.com/watch?v=EPTgF4EA2zY
scVelo dynamical modeling; https://scvelo.readthedocs.io/index.html
Computational modeling
Published, COVID-19 patient single-cell RNA sequencing data were collected and reanalyzed using RNA velocity tools. For preprocessing, the 10X Genomics CellRanger software was used to generate count matrices from raw sequencing reads (fastq files directly available or preprocessed from SRA data format using SRA toolkit). Next, the velocyto pipeline was used to generate counts for spliced/ unspliced RNA molecules from the CellRanger reads, which are stored as ‘.loom’ files.
Using R packages Seurat and SeuratWrapper the loom files were read and filtering based on quality control metrics were performed on cells detected. The data was also normalized, logarithmized, and clustered before being converted to ‘.h5ad’ file format.
Subsequently, the data were read into the scVelo python package. The scVelo package is a powerful tool that efficiently computes RNA velocity information from scRNA-seq reads. The computation and storage of the data were handled on ComputeCanada’s clusters.
Dataset classification
Datasets collected were classified into 5 groups representing varying stages of disease progression as in original publication. They are defined as non-infected/healthy, mild, severe COVID-19 patients (requires ventilation), early recovery patients (<7 days after COVID-19-negative), and late recovery patients (>14 days after COVID-19-negative). The scRNA-seq data for healthy controls, early recovery stage, and late recovery stage were obtained from studies conducted by Wen, W. et al. while mild and severe patient data were obtained from studies done by Lee, J. S. et al.
scVelo can be used to analyze velocity vectors of PBMC’s genes. For this study, we aimed to identify distinct differentiation patterns of immune cell populations across Covid-19 patients with varying disease severity. PBMCs of patients in five disease categories (healthy, mild, severe, early recovery stage (ERS), late recovery stage (LRS)) have previously been sequenced by Wen, W. et al and Lee, J. S. et al. The sequencing data in fastq file formats were accessible online, and have been adopted for use in this paper.
Figure 1. Cell Clusters of Varying Diseased Patients
Figure Caption: Seurat clustering and annotation of immune cell populations based on expression for marker genes
Figure 2. Velocity analysis results on single-cell resolution
Figure Caption: RNA velocity projects visualized on single-cell level.
Cell clusterings were processed by two step Seurat preprocessing for quality control and Moncle3/Garnett for cell clustering (Figure 2). Healthy, ERS, and LRS clusters are better defined with higher cell densities while Mild and Severe clusters are sparse with lower cell densities. These clustering distinctions are in part due to varying sequencing methods performed by the two papers these scRNA libraries were adapted from. Varying methods could result in more or less cells to be filtered and qualify for clustering.
Figure 3. Velocity Stream of Cell Clusters
Figure Caption: Annotated numerics represent clusters based on cell similarities determined by k-NN clustering via Seurat.
Figure 4. Estimated Latent Time of Cell Clusters
Figure Caption: Cell clusters are labeled with arbitrary values between 0 to 1, with 0 representing least differentiated cells or the relative amount of time a cell has experienced throughout its differentiation.
scVelo can also determine latent times across cell clusters, which estimates a cell’s stage of differentiation based on transcriptional data. More differentiation similarly implies higher cell activity. Latent time of PBMC clusters are visualized for patients across the 5 disease states (Fig 4). It is important to note that latent timing is a measure relative to cells within a cluster and cannot be compared between disease categories. For example, in Figure 1, healthy patients exhibited differentiated cellular states among most immune populations. Such activity, however, is only relative to the most inactive or undifferentiated cell types of that cluster (in this case, B cells) and not relative to cell types of symptomatic patients (mild, severe, ERS, or LRS).
As patients progressed from mild to severe, immune cell populations had increased cellular differentiation implicating heightened cellular activity (Fig 4 B&C). Mild patients initially experienced higher innate immune cell activity, including populations of monocytes, dendritic cells (DC), natural killer cells (NK), and neutrophils. This supports the classic immune reaction model where innate immune cells act as the first line of defense. Severe patients, on the other hand, exhibited high activity among both innate and adaptive populations, which additionally includes CD4+ T helper cells, CD8+ T cells, and B cells. Monocyte activity in severe patients is notably less persistent. The high cellular differentiation across most immune cell types suggests SARS-CoV-2’s dynamic interaction with the immune system, resulting in severe disease symptoms. If left unresolved, patients often suffer bystander effects of inflammation and cytokine storm. Damage of tissues and organs have previously been observed in severe patients. We illustrate the first evidence of innate and adaptive cells’ role in SARS-CoV-2 infection, where innate cell’s initial prominence is accompanied by adaptive cells upon severe disease progression.
For recovering patients, we similarly identified patterns in immune cell differentiation based on latent time clustering. During initial recovery, patients maintained relatively moderate innate immune cell activity and heightened B cell activity (Fig 4D). In later stages of recovery, B cells were the only cell types with persistently high cellular differentiation (Fig 4E). This resembles the immune homeostasis model where upon viral resolution, immune activity is dampened to relieve inflammation and prevent immune-mediated tissue or organ damage. The unsuppressed B cell activity suggests adaptive immunity’s role in preventing reinfection via antibody production. Furthermore, B cells were the only subset that maintained activity throughout the entire duration of SARS-CoV-2 infection, which suggests its multifactorial role in patients (Fig 4B-E). B cell activity in mild patients, which represents the majority of COVID-19 infected patients worldwide, are likely dominated by effector plasma cells which rapidly produce viral-specific antibodies. This would support the use of antibody-based detection for testing Covid-19 positivity as opposed to PCR-based methods which reportedly have high false negative rates. Conversely, B cells of recovering patients, especially in LRS, would likely be composed of memory subtypes as the virus is cleared and plasma cells undergo apoptosis. Memory cells persist to survey the host and maintain immunity against secondary infection.
Sample code used in our analysis is available at: https://github.com/Wong-Lab-Student-Engagement
We have established a robust workflow to determine the general single cell RNA velocity landscape of COVID-19 patients.We have produced an abundance of valuable datasets which could be further characterized using bioinformatic tools. We hope to continue the investigation of the transcriptome dynamics between cell subtypes and individual genes. We are additionally interested in evaluating and identifying differential genetic markers throughout SARS-CoV2’s disease progression in humans.
We would like to acknowledge the University of Toronto for funding our research through the U of T COVID-19 Student Engagement Award. We also thank Compute Canada for their services that were essential for running computations. Lastly, we are grateful for Dr. Amy Wong who provided us direction and guidance since project conception and provided us access to computational resources necessary for data processing.
- Wen, W. et al. Immune cell profiling of COVID-19 patients in the recovery stage by single-cell sequencing. Cell Discovery 6, (2020).
- Lee, J. S. et al. Immunophenotyping of COVID-19 and influenza highlights the role of type I interferons in development of severe COVID-19. Science Immunology 5, (2020).
- La Manno G, Soldatov R, Zeisel A, et al. RNA velocity of single cells. Nature. 560, (2018).
- Bergen V, Lange M, Peidli S, Wolf FA, Theis FJ. Generalizing RNA velocity to transient cell states through dynamical modeling. Nat Biotechnol. (2020)