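The msigdb package provides the Molecular Signatures Database (MSigDB) as R-accessible objects. Each molecular signature (gene set) is stored as a GeneSet object from the GSEABase package, and the whole database is stored as a GeneSetCollection object. These objects are hosted on ExperimentHub. The data in this package come from the Broad Institute's MSigDB. The metadata for each gene set are stored together with its genes in the GeneSet object.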
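To make the GeneSet / GeneSetCollection structure concrete, here is a minimal sketch that builds a two-set collection by hand with GSEABase. The set names TOY_SET_A and TOY_SET_B and their genes are hypothetical, and the real objects returned by msigdb carry more metadata; the sketch assumes GSEABase is installed from Bioconductor.

```r
library(GSEABase)

# One GeneSet = gene IDs plus metadata: set name, ID type and MSigDB collection.
# The names and genes below are illustrative only.
gs_a <- GeneSet(c("TP53", "MDM2", "CDKN1A"),
                setName = "TOY_SET_A",
                geneIdType = SymbolIdentifier(),
                collectionType = BroadCollection(category = "h"))
gs_b <- GeneSet(c("MYC", "MAX"),
                setName = "TOY_SET_B",
                geneIdType = SymbolIdentifier(),
                collectionType = BroadCollection(category = "h"))

# The "database": a GeneSetCollection holding the GeneSet objects
toy_gsc <- GeneSetCollection(list(gs_a, gs_b))

names(toy_gsc)                                     # set names
geneIds(toy_gsc[["TOY_SET_A"]])                    # genes of one set
bcCategory(collectionType(toy_gsc[["TOY_SET_B"]])) # collection code stored as metadata
```

The collection code kept in collectionType is the same one shown in the first column of the table below (in lower case), and it is what the msigdb helpers use to list and subset collections.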
| Collection | Description |
|---|---|
| H | hallmark gene sets are coherently expressed signatures derived by aggregating many MSigDB gene sets to represent well-defined biological states or processes. |
| C1 | positional gene sets for each human chromosome and cytogenetic band. |
| C2 | curated gene sets from online pathway databases, publications in PubMed, and knowledge of domain experts. |
| C3 | regulatory target gene sets based on gene target predictions for microRNA seed sequences and predicted transcription factor binding sites. |
| C4 | computational gene sets defined by mining large collections of cancer-oriented microarray data. |
| C5 | ontology gene sets consist of genes annotated by the same ontology term. |
| C6 | oncogenic signature gene sets defined directly from microarray gene expression data from cancer gene perturbations. |
| C7 | immunologic signature gene sets represent cell states and perturbations within the immune system. |
| C8 | cell type signature gene sets curated from cluster markers identified in single-cell sequencing studies of human tissue. |
library(msigdb)
library(GSEABase)
# List the functions exported by msigdb
ls("package:msigdb")
# Download the molecular signatures database: org = "hs" for human, "mm" for mouse;
# id = "SYM" for gene symbols or "EZID" for Entrez gene IDs (pick one)
msigdb.hs = getMsigdb(org = 'hs', id = 'SYM')
# Download the KEGG gene sets and append them to the collection
msigdb.hs = appendKEGG(msigdb.hs)
length(msigdb.hs) # number of gene sets in the collection
# List the available collections; subsets can be selected as needed
listCollections(msigdb.hs)
#hallmarks = subsetCollection(msigdb.hs, 'h')
#c3 <- subsetCollection(msigdb.hs, 'c3')
#listSubCollections(msigdb.hs)
# Extract all gene sets as a named list
msigdb_ids <- geneIds(msigdb.hs) # GSEABase::geneIds
class(msigdb_ids) # "list": one character vector of gene IDs per gene set
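Because geneIds() returns an ordinary named list, base R is enough for common follow-up steps. The sketch below uses a hypothetical three-set stand-in called toy_ids (substitute msigdb_ids in practice) to show size filtering, selection by name prefix, conversion to a long term/gene table of the kind accepted by the TERM2GENE argument of clusterProfiler's enricher(), and a reverse gene-to-set lookup. The size bounds and the helper sets_with_gene() are illustrative assumptions, not part of msigdb.

```r
# Hypothetical stand-in with the same shape as msigdb_ids:
# a named list of character vectors, one per gene set
toy_ids <- list(
  HALLMARK_TOY_A = c("TP53", "MDM2", "CDKN1A"),
  HALLMARK_TOY_B = c("MYC", "MAX"),
  TOY_PATHWAY_C  = c("TP53", "MYC", "EGFR", "KRAS")
)

# Number of genes in each set
set_sizes <- lengths(toy_ids)

# Keep sets whose size lies within [min_size, max_size]; bounds are illustrative
min_size <- 3
max_size <- 500
kept <- toy_ids[set_sizes >= min_size & set_sizes <= max_size]

# Select sets by name prefix (MSigDB hallmark sets are named "HALLMARK_*")
hallmark_ids <- toy_ids[startsWith(names(toy_ids), "HALLMARK_")]

# Long two-column table, one row per (set, gene) pair
term2gene <- stack(toy_ids)[, c("ind", "values")]
colnames(term2gene) <- c("term", "gene")
term2gene$term <- as.character(term2gene$term)

# Hypothetical helper: names of the sets that contain a given gene
sets_with_gene <- function(gene, sets) {
  names(sets)[vapply(sets, function(g) gene %in% g, logical(1))]
}

sets_with_gene("TP53", toy_ids) # returns c("HALLMARK_TOY_A", "TOY_PATHWAY_C")
```

On the real data, subsetCollection(msigdb.hs, 'h') is the more reliable way to pick a collection, since it uses the collection metadata stored in each GeneSet rather than relying on set-name conventions.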