What is Trajectory Inference (TI)?

Single-cell omics data, including transcriptomics, proteomics and epigenomics data, now make it possible to study dynamic cellular processes such as the cell cycle, cell differentiation and cell activation.

Such dynamic processes can be modelled computationally with trajectory inference (TI) approaches, also known as pseudotime analysis, which order cells along a trajectory based on the similarity of their expression patterns.
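As a minimal sketch of this idea (not the algorithm of any particular TI tool), the Python code below links each cell to its k most similar cells by Euclidean distance between expression profiles, then takes the shortest-path distance from a chosen root cell as that cell's pseudotime. The helper names, the value of k and the toy profiles are illustrative assumptions.

```python
import heapq
import math


def knn_graph(profiles, k):
    """Undirected k-nearest-neighbor graph: {cell: {neighbor: distance}}."""
    graph = {i: {} for i in range(len(profiles))}
    for i, p in enumerate(profiles):
        # Distances from cell i to every other cell, closest first.
        dists = sorted(
            (math.dist(p, q), j) for j, q in enumerate(profiles) if j != i
        )
        for d, j in dists[:k]:
            graph[i][j] = d
            graph[j][i] = d  # keep the graph symmetric
    return graph


def graph_pseudotime(profiles, root, k=2):
    """Hypothetical helper: pseudotime = shortest-path distance from `root` (Dijkstra)."""
    graph = knn_graph(profiles, k)
    dist = {root: 0.0}
    heap = [(0.0, root)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # outdated heap entry
        for v, w in graph[u].items():
            if d + w < dist.get(v, math.inf):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    # Cells not connected to the root get an infinite pseudotime.
    return [dist.get(i, math.inf) for i in range(len(profiles))]


# Toy data: five cells, two genes; gene 1 falls while gene 2 rises.
cells = [[5.0, 0.0], [4.0, 1.0], [3.0, 2.0], [2.0, 3.0], [1.0, 4.0]]
pt = graph_pseudotime(cells, root=0)
print([round(t, 3) for t in pt])  # → [0.0, 1.414, 2.828, 4.243, 5.657]
```

Real TI tools add dimensionality reduction, clustering and smoothing on top of this, but the core intuition is the same: cells with similar profiles receive similar pseudotime values.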

The resulting trajectories are most often linear, bifurcating or tree-shaped, but more recent methods also identify more complex trajectory topologies.

TI methods offer an unbiased and transcriptome-wide understanding of a dynamic process, thereby allowing the objective identification of new (primed) subsets of cells, delineation of a differentiation tree and inference of regulatory interactions responsible for one or more bifurcations.

Current applications of TI focus on specific subsets of cells, but ongoing efforts to construct transcriptomic catalogs of whole organisms underline the need for accurate, scalable and user-friendly TI methods.

Cell type annotation

The first step of the TI analysis is to select the subset of cells to analyse. In this study, we used the epithelial cells collected from six pancreatic ductal adenocarcinoma (PDAC) samples (Kai Chen et al., 2023).

These cells were then re-clustered, and the clusters were annotated using the marker genes listed in Table 1. The UMAP plots show the resulting clusters and annotations.
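One simple way to turn marker genes into cluster labels is to score every cluster by the mean expression of each cell type's markers and assign the best-scoring type. The sketch below illustrates only that idea: the marker sets, gene names (`GENE_A` to `GENE_D`), cell type names and expression values are hypothetical, not the contents of Table 1 or the exact annotation procedure used in this study.

```python
from statistics import mean

# Hypothetical marker sets (NOT the markers from Table 1).
MARKERS = {
    "type_1": ["GENE_A", "GENE_B"],
    "type_2": ["GENE_C", "GENE_D"],
}


def annotate_clusters(cluster_means, markers):
    """Assign each cluster the cell type whose markers have the highest mean expression.

    cluster_means: {cluster_id: {gene: mean expression in that cluster}}
    Genes missing from a cluster's profile are counted as zero expression.
    """
    labels = {}
    for cluster, profile in cluster_means.items():
        scores = {
            cell_type: mean(profile.get(g, 0.0) for g in genes)
            for cell_type, genes in markers.items()
        }
        labels[cluster] = max(scores, key=scores.get)
    return labels


# Toy per-cluster mean expression values (made up).
cluster_means = {
    0: {"GENE_A": 3.2, "GENE_B": 2.8, "GENE_C": 0.1, "GENE_D": 0.3},
    1: {"GENE_A": 0.2, "GENE_B": 0.4, "GENE_C": 2.9, "GENE_D": 3.5},
}
print(annotate_clusters(cluster_means, MARKERS))  # → {0: 'type_1', 1: 'type_2'}
```

In practice, marker-based annotation is usually checked visually on the UMAP as well, since a cluster can score similarly for two cell types.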

Table 1: List of marker genes for annotation


UMAP plots of the re-clustered and annotated epithelial cells

Trajectory Analysis Methods

Slingshot Method

Introduction

The goal of slingshot is to use clusters of cells to uncover global structure and convert this structure into smooth lineages represented by one-dimensional variables, called “pseudotime.” It provides tools for learning cluster relationships in an unsupervised or semi-supervised manner and constructing smooth curves representing each lineage, along with visualization methods for each step.

Slingshot consists of two main stages:

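    1. Inference of the global lineage structure, using a minimum spanning tree (MST) built on the cell clusters
    2. Inference of pseudotime variables for the cells along each lineage, by fitting simultaneous principal curves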
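The first stage can be sketched as follows: compute cluster centroids in a reduced-dimension space (for example UMAP coordinates), connect them with a minimum spanning tree, and read each lineage off as the path from a root cluster to a leaf cluster. This is a simplified illustration under stated assumptions: it uses plain Euclidean distances between centroids and Prim's algorithm, whereas slingshot's own cluster distance and options may differ. All names and coordinates are hypothetical.

```python
import math


def centroid(points):
    """Mean position of a cluster's cells in reduced-dimension space."""
    dims = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dims)]


def cluster_mst(centroids):
    """Prim's algorithm on Euclidean distances between cluster centroids.

    Returns an adjacency dict {cluster: set of neighboring clusters}.
    """
    names = list(centroids)
    in_tree = {names[0]}
    adj = {c: set() for c in names}
    while len(in_tree) < len(names):
        # Cheapest edge joining the current tree to a cluster outside it.
        _, u, v = min(
            (math.dist(centroids[a], centroids[b]), a, b)
            for a in in_tree
            for b in names
            if b not in in_tree
        )
        adj[u].add(v)
        adj[v].add(u)
        in_tree.add(v)
    return adj


def lineages(adj, root):
    """Each lineage is the path from the root cluster to one leaf cluster."""
    paths = []

    def walk(node, parent, path):
        children = sorted(n for n in adj[node] if n != parent)
        if not children:
            paths.append(path)
        for child in children:
            walk(child, node, path + [child])

    walk(root, None, [root])
    return paths


# Hypothetical 2-D embedding: c1 -> c2, then a split into c3 and c4.
cells = {
    "c1": [[0.0, 0.0], [0.2, 0.1]],
    "c2": [[2.0, 0.0], [2.2, 0.1]],
    "c3": [[4.0, 1.5], [4.2, 1.6]],
    "c4": [[4.0, -1.5], [4.2, -1.6]],
}
cents = {name: centroid(pts) for name, pts in cells.items()}
tree = cluster_mst(cents)
print(lineages(tree, root="c1"))  # → [['c1', 'c2', 'c3'], ['c1', 'c2', 'c4']]
```

Choosing the root cluster is what turns the undirected tree into directed lineages; slingshot allows this to be supplied by the user (semi-supervised) or inferred.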

Trajectory Inference Analysis Results

The figure below shows the results of the trajectory analysis as four plots:

  • Cell Clusters plot: shows the clusters in the data; each point is a cell, colored according to its cluster label.
  • Cell annotation plot: shows the cell types in the data; each point is a cell, colored according to its cell type label.
  • Minimum Spanning Tree plot: shows the cluster-based minimum spanning tree of the data; each point is a cell, colored according to its pseudotime value.
  • Trajectory curves plot: shows the fitted principal curve of the data; each point is a cell, colored according to its pseudotime value.
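In essence, slingshot assigns pseudotime by projecting each cell onto its lineage's principal curve and measuring the arc length from the start of the curve. The sketch below reproduces only that projection step, assuming the fitted curve is approximated by an ordered polyline. The helper `arc_length_pseudotime`, the curve and the cell coordinates are hypothetical.

```python
import math


def arc_length_pseudotime(point, curve):
    """Arc length from curve[0] to the closest point on a polyline curve.

    `curve` is an ordered list of vertices approximating a fitted principal
    curve; the arc length at the cell's projection serves as its pseudotime.
    """
    best_dist, best_arc = math.inf, 0.0
    arc_start = 0.0  # arc length at the start of the current segment
    for a, b in zip(curve, curve[1:]):
        seg = [bj - aj for aj, bj in zip(a, b)]
        seg_len = math.sqrt(sum(s * s for s in seg))
        if seg_len == 0:
            t = 0.0
        else:
            # Orthogonal projection onto the segment, clamped to its ends.
            dot = sum((pj - aj) * s for pj, aj, s in zip(point, a, seg))
            t = max(0.0, min(1.0, dot / seg_len ** 2))
        closest = [aj + t * s for aj, s in zip(a, seg)]
        d = math.dist(point, closest)
        if d < best_dist:
            best_dist, best_arc = d, arc_start + t * seg_len
        arc_start += seg_len
    return best_arc


# Hypothetical L-shaped curve in 2-D and three cells lying near it.
curve = [[0.0, 0.0], [3.0, 0.0], [3.0, 4.0]]
cells = [[1.0, 0.5], [2.5, 0.1], [3.2, 3.0]]
print([round(arc_length_pseudotime(c, curve), 2) for c in cells])  # → [1.0, 2.5, 6.0]
```

Because pseudotime is an arc length, its units are those of the embedding; only the ordering and relative spacing of cells are meaningful.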

Pseudotime by cluster

The boxplot shows the distribution of pseudotime values of the cells in each cluster, with pseudotime on the x-axis and the cluster label on the y-axis. Along the trajectory, cells progress from low to high pseudotime values.
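The numbers behind such a boxplot are per-cluster quartiles of pseudotime. The minimal sketch below groups cells by cluster, computes the first quartile, median and third quartile of their pseudotime, and orders clusters by median so that early clusters come first. The function name and toy values are illustrative assumptions, and plotting libraries may use a slightly different quartile interpolation rule.

```python
from collections import defaultdict
from statistics import median, quantiles


def pseudotime_by_cluster(pseudotime, clusters):
    """Boxplot-style summary (Q1, median, Q3) of pseudotime per cluster.

    Clusters are returned ordered by median pseudotime, i.e. from early to
    late along the trajectory. Each cluster needs at least two cells.
    """
    groups = defaultdict(list)
    for t, c in zip(pseudotime, clusters):
        groups[c].append(t)
    summary = {}
    for c, values in groups.items():
        q1, _, q3 = quantiles(values, n=4)
        summary[c] = (q1, median(values), q3)
    return dict(sorted(summary.items(), key=lambda kv: kv[1][1]))


# Toy pseudotime values for nine cells in three clusters.
pt = [0.1, 0.3, 0.2, 2.1, 2.4, 1.9, 1.0, 1.2, 0.9]
cl = ["A", "A", "A", "C", "C", "C", "B", "B", "B"]
summary = pseudotime_by_cluster(pt, cl)
print(list(summary))  # → ['A', 'B', 'C']
```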

Boxplot of pseudotime values by cluster

Genes that change their expression over the course of development

After running slingshot, we are often interested in finding genes that change their expression over the course of development. We demonstrate this type of analysis using the tradeSeq package (Van den Berge et al., 2020).

For each gene, tradeSeq fits a generalized additive model (GAM) with a negative binomial noise distribution to model the (potentially nonlinear) relationship between gene expression and pseudotime, and then tests for a significant association between expression and pseudotime.
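Fitting negative binomial GAMs is beyond a short example, so the sketch below uses a deliberately simpler stand-in that is not tradeSeq's method: it ranks genes by the absolute Spearman rank correlation between expression and pseudotime. This captures monotonic trends only, misses non-monotonic patterns that a GAM can fit, and produces no p-values. Gene names and values are hypothetical.

```python
from statistics import mean


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0


def rank_genes(expression, pseudotime, top=20):
    """Gene names ordered by |Spearman rho| with pseudotime, strongest first."""
    scores = {g: spearman(vals, pseudotime) for g, vals in expression.items()}
    return sorted(scores, key=lambda g: abs(scores[g]), reverse=True)[:top]


# Hypothetical expression of three genes in six cells ordered by pseudotime.
pt = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
expr = {
    "rising": [0, 1, 1, 3, 5, 8],
    "flat": [2, 3, 2, 3, 2, 3],
    "falling": [9, 7, 5, 4, 2, 0],
}
print(rank_genes(expr, pt, top=2))  # → ['falling', 'rising']
```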

Table 2: Expression values of the top 20 genes associated with pseudotime.


UMAP plots showing the expression of the top 4 genes associated with pseudotime

Monocle3 Method

Introduction

Monocle introduced the strategy of using RNA-Seq for single-cell trajectory analysis. Rather than purifying cells into discrete states experimentally, Monocle uses an algorithm to learn the sequence of gene expression changes each cell must go through as part of a dynamic biological process. Once it has learned the overall “trajectory” of gene expression changes, Monocle can place each cell at its proper position in the trajectory. Monocle’s differential analysis toolkit can then be used to find genes regulated over the course of the trajectory, as described in the section “Finding genes that change as a function of pseudotime” of the Monocle documentation. If the process has multiple outcomes, Monocle reconstructs a “branched” trajectory. These branches correspond to cellular “decisions”, and Monocle provides tools for identifying the genes affected by them and involved in making them, as described in the section “Analyzing branches in single-cell trajectories” of the Monocle documentation.
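In a tree-shaped trajectory graph such as the principal graph Monocle learns, branches can be located from node degrees: end points have one neighbor, and branch points have three or more. The sketch below is a minimal illustration of that idea on a hypothetical graph. The function `branch_structure` and the node names are made up, and Monocle's own branch analysis is considerably more involved.

```python
from collections import defaultdict


def branch_structure(edges):
    """Classify the nodes of a tree-shaped trajectory graph by degree.

    Leaves (degree 1) are start or end points of the trajectory;
    branch points (degree >= 3) are where the trajectory splits.
    """
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    leaves = sorted(n for n, d in degree.items() if d == 1)
    branch_points = sorted(n for n, d in degree.items() if d >= 3)
    return leaves, branch_points


# Hypothetical graph: a stem Y1-Y2-Y3 that splits at Y3 into two branches.
edges = [("Y1", "Y2"), ("Y2", "Y3"), ("Y3", "Y4"), ("Y4", "Y5"), ("Y3", "Y6")]
print(branch_structure(edges))  # → (['Y1', 'Y5', 'Y6'], ['Y3'])
```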

Analysis Results

The figure below shows the results of the trajectory analysis as four plots:

  • Cell Clusters plot: shows the clusters in the data; each point is a cell, colored according to its cluster label.
  • Cell annotation plot: shows the cell types in the data; each point is a cell, colored according to its cell type label.
  • Minimum Spanning Tree plot: shows the cluster-based minimum spanning tree of the data; each point is a cell, colored according to its pseudotime value.
  • Trajectory curves plot: shows the fitted principal curve of the data; each point is a cell, colored according to its pseudotime value.


Pseudotime by cluster

The boxplot shows the distribution of pseudotime values of the cells in each cluster, with pseudotime on the x-axis and the cluster label on the y-axis. Along the trajectory, cells progress from low to high pseudotime values.

Boxplot of pseudotime values by cluster

Genes that change their expression over the course of development

Table 3: Genes whose expression changes significantly along pseudotime (q-value < 0.05)
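We assume the genes in Table 3 were selected with Monocle3's `graph_test`, which, according to the Monocle3 documentation, scores each gene with Moran's I, a measure of spatial autocorrelation over a graph of neighboring cells, and reports q-values (assumed here to be Benjamini-Hochberg adjusted). The sketch below shows these two ingredients in simplified form: Moran's I with unit edge weights, and the Benjamini-Hochberg adjustment that turns p-values into q-values. It does not compute p-values itself; the graph, expression values and p-values are hypothetical.

```python
def morans_i(values, edges):
    """Moran's I of one gene's expression over an undirected cell graph.

    I = (N / W) * sum_ij w_ij (x_i - xbar)(x_j - xbar) / sum_i (x_i - xbar)^2,
    using unit weights, so each undirected edge contributes w_ij = w_ji = 1.
    """
    n = len(values)
    xbar = sum(values) / n
    dev = [x - xbar for x in values]
    den = sum(d * d for d in dev)
    if den == 0:
        raise ValueError("Moran's I is undefined for constant expression")
    num = 2 * sum(dev[i] * dev[j] for i, j in edges)  # count both directions
    w_total = 2 * len(edges)
    return (n / w_total) * num / den


def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    q = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, keeping q-values monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        q[i] = running_min
    return q


# Hypothetical graph: six cells connected in a chain.
chain = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
print(round(morans_i([0, 1, 2, 3, 4, 5], chain), 3))  # smooth along the graph → 0.6
print(round(morans_i([0, 5, 0, 5, 0, 5], chain), 3))  # alternating → -1.0

# Hypothetical p-values for four genes; keep genes with q < 0.05.
q = bh_qvalues([0.01, 0.04, 0.03, 0.20])
print([round(v, 4) for v in q])  # → [0.04, 0.0533, 0.0533, 0.2]
print([g for g, v in enumerate(q) if v < 0.05])  # → [0]
```

A positive Moran's I means neighboring cells have similar expression, which is what a gene varying smoothly along the trajectory looks like.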

UMAP plots showing the expression of the top 4 genes associated with pseudotime

Software catalog

Software     Version        Reference
Slingshot    2.8.0          Kelly Street et al. (2018)
tradeSeq     1.14.0         Koen Van den Berge et al. (2020)
monocle3     1.3.1          Junyue Cao et al. (2019)
Annotation   Marker genes   Protein Atlas, Palloma Porto Almeida et al. (2020), Quan Shen et al. (2018)