Fast, sensitive and accurate integration of single-cell data with Harmony

Abstract

The emerging diversity of single-cell RNA-seq datasets allows for the full transcriptional characterization of cell types across a wide variety of biological and clinical conditions. However, it is challenging to analyze them together, particularly when datasets are assayed with different technologies, because biological and technical differences are interspersed. We present Harmony (https://github.com/immunogenomics/harmony), an algorithm that projects cells into a shared embedding in which cells group by cell type rather than dataset-specific conditions. Harmony simultaneously accounts for multiple experimental and biological factors. In six analyses, we demonstrate the superior performance of Harmony to previously published algorithms while requiring fewer computational resources. Harmony enables the integration of ~10^6 cells on a personal computer. We apply Harmony to peripheral blood mononuclear cells from datasets with large experimental differences, five studies of pancreatic islet cells, mouse embryogenesis datasets and the integration of scRNA-seq with spatial transcriptomics data.

Results

Figure 1

Figure 1 gives an overview of the Harmony algorithm.

“a, Harmony uses fuzzy clustering to assign each cell to multiple clusters, while a penalty term ensures that the diversity of datasets within each cluster is maximized. b, Harmony calculates a global centroid for each cluster, as well as dataset-specific centroids for each cluster. c, Within each cluster, Harmony calculates a correction factor for each dataset based on the centroids. d, Finally, Harmony corrects each cell with a cell-specific factor: a linear combination of dataset correction factors weighted by the cell’s soft cluster assignments made in step a. Harmony repeats steps a to d until convergence. The dependence between cluster assignment and dataset diminishes with each round. Datasets are represented with colors, cell types with different shapes.” (Korsunsky et al., 2019, p. 1290)
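To make steps b to d of the caption concrete, here is a minimal Python sketch of a single correction pass. It is my own simplification based only on the caption, not the authors' implementation: the names `weighted_centroid` and `correct_once` are hypothetical, the soft assignments `R` are taken as given (the diversity-penalized clustering of step a is skipped), and there is no iteration to convergence.

```python
def weighted_centroid(points, weights):
    """Weighted mean of equal-length vectors; a zero vector if all weights are 0."""
    total = sum(weights)
    dim = len(points[0])
    if total == 0:
        return [0.0] * dim
    return [sum(w * p[d] for p, w in zip(points, weights)) / total
            for d in range(dim)]


def correct_once(Z, batches, R):
    """One simplified pass of steps b to d (hypothetical helper).

    Z:       cell embeddings, one list of floats per cell
    batches: dataset label of each cell
    R:       soft assignments, R[i][k] = weight of cell i in cluster k
    """
    n_clusters = len(R[0])
    dim = len(Z[0])
    offsets = []  # offsets[k][b] = correction factor of dataset b in cluster k
    for k in range(n_clusters):
        w_k = [r[k] for r in R]
        global_c = weighted_centroid(Z, w_k)              # step b: global centroid
        per_batch = {}
        for b in set(batches):
            idx = [i for i, lab in enumerate(batches) if lab == b]
            batch_c = weighted_centroid([Z[i] for i in idx],
                                        [w_k[i] for i in idx])  # step b: dataset centroid
            # step c: how far this dataset sits from the global centroid
            per_batch[b] = [bc - gc for bc, gc in zip(batch_c, global_c)]
        offsets.append(per_batch)

    corrected = []
    for z, b, r in zip(Z, batches, R):
        # step d: cell-specific shift, weighted by the cell's soft assignments
        shift = [sum(r[k] * offsets[k][b][d] for k in range(n_clusters))
                 for d in range(dim)]
        corrected.append([zd - sd for zd, sd in zip(z, shift)])
    return corrected


# Toy data: two cell types (clusters), dataset "B" is shifted by +1 along x.
Z = [[0.0, 0.0], [0.2, 0.0], [1.0, 0.0], [1.2, 0.0],
     [10.0, 10.0], [10.2, 10.0], [11.0, 10.0], [11.2, 10.0]]
batches = ["A", "A", "B", "B", "A", "A", "B", "B"]
R = [[1.0, 0.0]] * 4 + [[0.0, 1.0]] * 4  # hard assignments for simplicity
corrected = correct_once(Z, batches, R)
```

In this toy case the two datasets end up on top of each other within each cluster, while the two clusters stay far apart, which is the behavior the figure illustrates.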

Figure 2

“the local inverse Simpson’s Index (LISI)” (Korsunsky et al., 2019, p. 1290)

Figure 2 uses a variant of Simpson’s Diversity Index, the local inverse Simpson’s Index (LISI), to measure integration quality.

See more details on Simpson’s Diversity Index: https://www.statology.org/simpsons-diversity-index/
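As a rough illustration of the idea behind LISI, the sketch below computes the inverse Simpson's index, 1 / Σ p_i², over the labels of each cell's nearest neighbours. This is a simplification under my own assumptions: the function names `inverse_simpson` and `knn_lisi` are hypothetical, and the neighbours are weighted uniformly, whereas the paper's LISI derives neighbourhood weights from local distances. A score near 1 means a neighbourhood is dominated by one dataset; a score near the number of datasets means they are well mixed.

```python
import math
from collections import Counter


def inverse_simpson(labels):
    """Inverse Simpson's index: 1 / sum of squared label proportions."""
    counts = Counter(labels)
    n = len(labels)
    return 1.0 / sum((c / n) ** 2 for c in counts.values())


def knn_lisi(points, labels, k):
    """Inverse Simpson's index over each point's k nearest neighbours.

    Simplified stand-in for LISI: neighbours are found by brute force
    and weighted uniformly.
    """
    scores = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j)
                       for j, q in enumerate(points) if j != i)
        neighbour_labels = [labels[j] for _, j in dists[:k]]
        scores.append(inverse_simpson(neighbour_labels))
    return scores


# Two datasets that occupy separate regions: every neighbourhood is pure.
separated = knn_lisi([[0.0], [1.0], [2.0], [100.0], [101.0], [102.0]],
                     ["A", "A", "A", "B", "B", "B"], k=2)

# Two datasets that overlap: neighbourhoods contain both labels.
mixed = knn_lisi([[0.0], [1.0], [2.1], [3.3]],
                 ["A", "B", "A", "B"], k=3)
```

Here every score in `separated` is 1.0, while the scores in `mixed` are well above 1, so a higher dataset-level score indicates better mixing. The same function applied to cell-type labels instead of dataset labels would measure how well cell types stay separated.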

Figure 3

Figure 3 compares the computational efficiency of the different methods.

Figure 4

Figure 4 shows fine-grained subpopulation identification in PBMCs across technologies.

Figure 5

Figure 5 shows integration performance by donor and by technology.

Figure 6

Figure 6 shows Harmony integrating spatially resolved transcriptomics (ST) with scRNA-seq data.

However, I don’t think panel b is informative. If the authors had shown the location overlap between the scRNA-seq and ST data, the result would be more convincing.

Reference