Pseudo-bulk Differential Expression Analysis

What is Pseudo-bulk?

In single-cell RNA sequencing (scRNA-seq), the transcriptomes of individual cells are sequenced in a single experiment. This lets us study the heterogeneity of cells within a given tissue or condition, but the high granularity of scRNA-seq data also poses a challenge when the goal is to compare expression levels across conditions or individuals. Pseudo-bulk analysis addresses this challenge: single-cell data from a group of cells from the same condition or individual are aggregated into a single expression profile, hence the name “Pseudo-bulk”. The aggregation can be done simply by summing or averaging the expression values of the cells in the group.
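
The aggregation step can be sketched in a few lines of Python. This is a minimal illustration and not the pipeline used for this report: the sample labels, gene names and counts are made up, and the helper pseudobulk_sum is hypothetical. It sums raw counts, since count-based tools such as DESeq2 expect un-normalized counts as input.

```python
from collections import defaultdict

# Toy raw counts per cell; all labels and values are invented for illustration.
cells = [
    {"sample": "ctrl_1", "counts": {"GeneA": 3, "GeneB": 0}},
    {"sample": "ctrl_1", "counts": {"GeneA": 1, "GeneB": 2}},
    {"sample": "stim_1", "counts": {"GeneA": 0, "GeneB": 5}},
    {"sample": "stim_1", "counts": {"GeneA": 2, "GeneB": 4}},
    {"sample": "stim_1", "counts": {"GeneA": 1, "GeneB": 1}},
]


def pseudobulk_sum(cells):
    """Sum raw counts over all cells that share a sample label."""
    profiles = defaultdict(lambda: defaultdict(int))
    for cell in cells:
        for gene, count in cell["counts"].items():
            profiles[cell["sample"]][gene] += count
    # Convert the nested defaultdicts to plain dicts for easier inspection.
    return {sample: dict(genes) for sample, genes in profiles.items()}


pseudobulk = pseudobulk_sum(cells)
print(pseudobulk)
```

Each key of pseudobulk is one Pseudo-bulk sample, and its value is the aggregated expression profile that would be passed on to the differential expression step.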

Pseudo-bulk Differential Expression Analysis workflow

The workflow consists of four key steps:

Data Preprocessing: quality control and preparation steps such as removing low-quality cells, normalizing the data, and annotating cell types based on gene expression patterns.
Pseudo-bulk Generation: single-cell data from a group of cells from the same condition or individual are aggregated into a single expression profile, for example by summing or averaging.
Differential Expression Analysis: differential expression methods developed for traditional bulk RNA-seq, such as edgeR or DESeq2, are applied to the Pseudo-bulk samples.
Result Interpretation: the differential expression results are interpreted in the context of the experiment. Typically, the results are lists of genes that are upregulated or downregulated.

Quality Control - sample level

PCA plot

Principal Component Analysis (PCA) is a method used to simplify complex data by finding its most important patterns. It transforms correlated variables into new, uncorrelated ones called “principal components.” These components capture the largest variations in the data. By using PCA, we can reduce the data’s dimensionality while keeping essential information, making it easier to visualize and analyze.
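
To make the idea concrete, here is a small sketch that finds the first principal component of a toy sample-by-gene matrix with power iteration. It is only an illustration under stated assumptions: the expression values are invented, the function first_principal_component is hypothetical, and real analyses would normally rely on a numerical library rather than this hand-written loop.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (direction, scores) for the first principal component.

    rows: list of samples, each a list of feature values (e.g. log expression).
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    # Sample covariance matrix of the features (p x p).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    # Power iteration converges to the eigenvector with the largest eigenvalue,
    # assuming the start vector is not orthogonal to it (true for this toy data).
    v = [1.0] * p
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Project each centered sample onto the component to get its PC1 score.
    scores = [sum(c[j] * v[j] for j in range(p)) for c in centered]
    return v, scores


# Invented values: two "control" samples followed by two "stimulated" samples.
samples = [
    [1.0, 5.0, 2.0],
    [1.2, 5.1, 2.1],
    [4.0, 1.0, 2.0],
    [4.1, 0.9, 1.9],
]
direction, scores = first_principal_component(samples)
print(scores)
```

In this toy example the two groups receive PC1 scores of opposite sign, which is the kind of separation we look for in the PCA plot.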

PCA plot between two groups.

Hierarchical clustering

Hierarchical clustering groups data points according to how similar they are. It builds a tree-like structure (dendrogram) in which similar items are joined at different levels. It helps identify clusters and relationships within the data without the need to specify the number of clusters beforehand. Like PCA, hierarchical clustering is a complementary way to identify strong patterns in a dataset and potential outliers.
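
The merging process behind a dendrogram can be sketched as follows. This is a minimal agglomerative clustering with average linkage and Euclidean distance, both of which are assumptions chosen for illustration; the report does not state which distance or linkage was used. Sample names and values are invented.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two expression vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(profiles):
    """Agglomerative clustering; returns the merges in the order they happen.

    profiles: dict mapping a sample name to its list of expression values.
    """
    clusters = {(name,): [vec] for name, vec in profiles.items()}
    merges = []
    while len(clusters) > 1:
        def linkage(pair):
            # Mean pairwise distance between the members of two clusters.
            a, b = pair
            dists = [euclidean(x, y) for x in clusters[a] for y in clusters[b]]
            return sum(dists) / len(dists)

        a, b = min(combinations(clusters, 2), key=linkage)
        merges.append((a, b, round(linkage((a, b)), 3)))
        # Replace the two closest clusters with their union.
        clusters[a + b] = clusters.pop(a) + clusters.pop(b)
    return merges


# Invented values: two "control" and two "stimulated" samples.
profiles = {
    "ctrl_1": [1.0, 5.0, 2.0],
    "ctrl_2": [1.2, 5.1, 2.1],
    "stim_1": [4.0, 1.0, 2.0],
    "stim_2": [4.1, 0.9, 1.9],
}
merges = average_linkage(profiles)
for left, right, height in merges:
    print(left, right, height)
```

Samples from the same group merge first at small heights, and the two groups are joined last at a much larger height. A sample that only joins the tree at a large height on its own would be a candidate outlier.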

Dispersion estimates

We can check the fit of the DESeq2 model to our data by looking at the plot of dispersion estimates. In this example, the dispersion plot looks encouraging, since we expect our dispersions to decrease with increasing mean and follow the line of best fit (in red).
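
To see why dispersion tends to fall as the mean rises, it helps to recall that the negative binomial model relates variance and mean as Var = mu + alpha * mu^2, where alpha is the dispersion. The sketch below solves that relation for alpha from the sample mean and variance. It is a rough method-of-moments illustration only, not how DESeq2 estimates dispersions (DESeq2 fits them by likelihood and shrinks them toward the fitted trend), and it ignores size factors. The counts and the helper moment_dispersion are made up.

```python
from statistics import mean, variance


def moment_dispersion(counts):
    """Rough dispersion from Var = mu + alpha * mu**2, floored at zero."""
    mu = mean(counts)
    if mu == 0:
        return 0.0
    return max((variance(counts) - mu) / mu ** 2, 0.0)


# Invented counts for one lowly and one highly expressed gene.
genes = {
    "low_expr": [2, 9, 1, 8],
    "high_expr": [900, 1100, 950, 1050],
}
dispersions = {gene: moment_dispersion(counts) for gene, counts in genes.items()}
for gene, alpha in dispersions.items():
    print(gene, round(alpha, 4))
```

In this toy data the lowly expressed gene has a much larger dispersion than the highly expressed one, matching the downward trend expected in the dispersion plot.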

Table of results for significant genes

The results table is filtered to keep only the significant genes, using an adjusted p-value (padj) threshold of 0.05.
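
The filtering step can be illustrated with a short sketch. It assumes the adjusted p-values come from the Benjamini-Hochberg procedure, which is the default adjustment method in DESeq2; it does not reproduce DESeq2's independent filtering, and the gene names and p-values are invented.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Invented raw p-values for four genes.
raw = {"GeneA": 0.001, "GeneB": 0.02, "GeneC": 0.04, "GeneD": 0.5}
padj = dict(zip(raw, benjamini_hochberg(list(raw.values()))))

PADJ_CUTOFF = 0.05
significant = [gene for gene, p in padj.items() if p < PADJ_CUTOFF]
print(significant)
```

Note that GeneC has a raw p-value below 0.05 but an adjusted p-value above it, so it is not reported as significant. This is why the threshold is applied to padj and not to the raw p-value.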

Significant genes visualization

Scatter Plot

This plot is a good check to make sure that we are interpreting our fold change values correctly.
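
The direction of a fold change can also be checked numerically. The sketch below computes a simple log2 ratio of group means, assuming the convention that the fold change is reported as the comparison group over the reference group. The group names, the values and the pseudocount are illustrative assumptions, and DESeq2 estimates fold changes from its fitted model, not from a raw ratio like this one.

```python
import math
from statistics import mean


def log2_fold_change(test_values, reference_values, pseudocount=1.0):
    """log2 ratio of pseudocount-padded group means (test over reference)."""
    return math.log2((mean(test_values) + pseudocount) /
                     (mean(reference_values) + pseudocount))


# Invented normalized expression values for one gene.
stim = [31, 33, 29]  # comparison group
ctrl = [7, 8, 6]     # reference group

lfc = log2_fold_change(stim, ctrl)
print(lfc)  # positive: the gene is higher in the comparison group
```

Swapping the two groups flips the sign, so a gene that looks higher in one group on the plot should have a fold change whose sign agrees with the chosen reference level.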

Heatmap of all significant genes

Volcano plot

Gene Ontology Analysis

Upregulated

Downregulated

Software Catalog

Analysis	Software	Version
Pseudo-bulk Differential Expression Analysis	DESeq2	1.34.0
Gene Ontology Analysis	gprofiler2	0.2.2
