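Hi everyone, it's Monday again. Today I'd like to share a batch-correction method: LIGER. Previously I covered how Seurat integrates multiple samples and removes batch effects (see my post "A walkthrough of the FindIntegrationAnchors function in the Seurat package"), and I also shared a few batch-correction tools (see my posts "Batch effects" and "Batch correction of single-cell data with the Harmony algorithm"). Today's method, LIGER, comes from the paper Single-Cell Multi-omic Integration Compares and Contrasts Features of Brain Cell Identity, published in Cell (a top-tier journal) in June 2019. Meanwhile, another paper, A benchmark of batch-effect correction methods for single-cell RNA sequencing data, compared many batch-correction methods and rated Seurat, Harmony, and LIGER the highest. Of the three, Seurat's correction suffers from a serious over-correction problem, and Harmony is already in widespread use. So what makes LIGER special, and why did it make it into Cell? Let's work through it today. Same format as usual: the paper first, then the example code.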
SUMMARY
Defining cell types requires integrating diverse single-cell measurements from multiple experiments and biological contexts (no need to dwell on this; the days of publishing a paper from a single sample are long gone). To flexibly model single-cell datasets, we developed LIGER, an algorithm that delineates shared and dataset-specific features of cell identity. We applied it to four diverse and challenging analyses of human and mouse brain cells.
(1) First, we defined region-specific and sexually dimorphic gene expression in the mouse bed nucleus of the stria terminalis. (This analysis brings in morphological methods as supporting evidence, to check how good the integration is.)
(2) Second, we analyzed expression in the human substantia nigra, comparing cell states in specific donors and relating cell types to those in the mouse. (A test of integration across species.)
(3) Third, we integrated in situ and single-cell expression data to spatially locate fine subtypes of cells present in the mouse frontal cortex. (A joint analysis of in situ and single-cell data.)
(4) Finally, we jointly defined mouse cortical cell types using single-cell RNA-seq and DNA methylation profiles (DNA methylation is not our focus today), revealing putative mechanisms of cell-type-specific epigenomic regulation (epigenetic regulation). Integrative analyses using LIGER promise to accelerate investigations of cell-type definition, gene regulation, and disease states (let's wait and see).
INTRODUCTION
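The function of the mammalian brain is dependent upon the coordinated activity of highly specialized cell types. (The very first sentence already matters: it stresses the importance of where cells sit in space, which is also why 10X spatial transcriptomics has now been launched.) Single-cell technologies have provided an unprecedented opportunity to systematically identify these cellular specializations, across multiple regions, in the context of perturbations, and in related species. (Every time I read this, I think how perfect it would be if spatial transcriptomics also had single-cell resolution.) Furthermore, new technologies can now measure DNA methylation (methylation results are also very important and worth studying in depth; the leading expert in this area is Fuchou Tang, though I'm not sure I've got the name right), chromatin accessibility (that is, ATAC), and in situ expression (in situ hybridization), in thousands to millions of cells. (The sheer size of single-cell datasets is itself a big problem right now; the COVID-19 paper from Zemin Zhang's team reached a staggering cell count on the order of a million.) Each of these experimental contexts and measurement modalities provides a different glimpse into cellular identity.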
Integrative computational tools that can flexibly combine individual single-cell datasets into a unified, shared analysis offer many exciting biological opportunities. (This is why integrative analysis is necessary.) The major challenge of integrative analysis lies in reconciling the immense heterogeneity observed across individual datasets. (These days it is no longer just inter-individual heterogeneity in immunology; a lot of it comes down to batch effects.) However, in many kinds of analysis, both dataset similarities and differences are biologically important, such as when we seek to compare and contrast scRNA-seq data from healthy and disease-affected individuals.
To address these challenges, we developed a new computational method called LIGER (linked inference of genomic experimental relationships). We show here that LIGER enables the identification of shared cell types across individuals, species, and multiple modalities (gene expression, epigenetic, or spatial data), as well as dataset-specific features, offering a unified analysis of heterogeneous single-cell datasets. (Here we only care about removing differences between samples; the cross-species part is worth a look if you're interested.)
Result 1: Comparing and Contrasting Single-Cell Datasets with Shared and Dataset-Specific Factors
LIGER takes as input multiple single-cell datasets, which may be scRNA-seq experiments from different individuals, time points, or species, or measurements from different molecular modalities, such as single-cell epigenome data or spatial gene expression data (that is, individuals, species, and technologies).
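To make the "multiple datasets as input" idea concrete, here is a minimal sketch in plain Python. It is not LIGER's own code: the dataset names, gene names, counts, and the helper functions `shared_genes` and `align` are all made up for illustration. It also assumes that the datasets are first restricted to the genes measured in every one of them before any joint analysis.

```python
# Toy inputs: each dataset maps gene name -> expression counts, one per cell.
# All names and numbers are hypothetical and only illustrate the data layout.
datasets = {
    "donor_A": {"Gad1": [5, 0, 2], "Slc17a7": [0, 7, 1], "Xist": [3, 3, 0]},
    "donor_B": {"Gad1": [1, 4], "Slc17a7": [6, 0], "Sst": [0, 2]},
}


def shared_genes(datasets):
    """Return the sorted list of genes measured in every dataset."""
    gene_sets = [set(genes) for genes in datasets.values()]
    return sorted(set.intersection(*gene_sets))


def align(datasets):
    """Build one cells-by-genes matrix per dataset over the shared genes."""
    genes = shared_genes(datasets)
    aligned = {}
    for name, expr in datasets.items():
        # Every gene in a dataset has one count per cell, so any gene gives n_cells.
        n_cells = len(next(iter(expr.values())))
        aligned[name] = [[expr[g][c] for g in genes] for c in range(n_cells)]
    return genes, aligned


genes, aligned = align(datasets)
for name, matrix in aligned.items():
    print(name, len(matrix), "cells x", len(genes), "genes")
```

Each dataset keeps its own cells, but all matrices now share the same gene columns, which is the layout a joint method needs before it can separate what the datasets have in common from what is specific to each one.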
LIGER then employs integ