MSigDB Collections

The 10,295 gene sets in the Molecular Signatures Database (MSigDB) are divided into 7 major collections and several subcollections. The table below describes each one. For more information, see the MSigDB Statistics and the MSigDB Release Notes in the Documentation section.

Click the "browse gene sets" links in the table to view the gene sets in a collection, or download them by clicking the links under the "Download GMT Files" headings. For a description of the GMT file format, see Data Formats in the Documentation section. The gene sets can be downloaded as Entrez Gene Identifiers, HUGO Gene Symbols, or the identifiers specified by the original source of the gene set (e.g. microarray probe set identifiers). An XML file containing all the MSigDB gene sets is available on the Downloads page.
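As an illustration of working with a downloaded file, here is a minimal Python sketch that reads GMT text. It assumes the usual GMT layout of one gene set per line with tab-separated fields (set name, a description field, then the member identifiers); check Data Formats before relying on it. The function name parse_gmt and the demo set names are hypothetical.

```python
import io


def parse_gmt(handle):
    """Parse GMT text into {set_name: (description, [members])}.

    Assumes tab-separated lines: name, description, then identifiers.
    """
    gene_sets = {}
    for line in handle:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank lines and lines with no members
        name, description = fields[0], fields[1]
        # Drop empty fields left by trailing tabs.
        members = [m for m in fields[2:] if m]
        gene_sets[name] = (description, members)
    return gene_sets


# Hypothetical two-set example standing in for a downloaded file.
example = io.StringIO(
    "DEMO_SET_A\thttp://example.org/a\tTP53\tBRCA1\tMYC\n"
    "DEMO_SET_B\tna\tEGFR\tKRAS\n"
)
sets = parse_gmt(example)
```

To read a real download, pass an open file handle instead of the StringIO object, for example `with open(path) as fh: sets = parse_gmt(fh)`.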

C1: positional gene sets
(browse 326 gene sets)
Gene sets corresponding to each human chromosome and each cytogenetic band that has at least one gene. (Cytogenetic locations were parsed from HUGO, October 2006, and Unigene, build 197. When the two sources conflicted, the Unigene entry was used.) These gene sets help identify effects related to chromosomal deletions or amplifications, dosage compensation, epigenetic silencing, and other regional effects.
Download GMT Files
original identifiers
gene symbols
Entrez Gene IDs
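The construction described for C1 can be sketched roughly as follows. This is only an illustration under stated assumptions, not the procedure MSigDB used: the two input mappings, the placeholder gene names, the band values, and the "chr" naming of the resulting sets are all hypothetical. The one rule taken from the description is that the Unigene location wins when the sources disagree.

```python
import re
from collections import defaultdict


def build_positional_sets(hugo_bands, unigene_bands):
    """Group genes by cytogenetic band and by chromosome.

    Both arguments map gene -> band string such as "17p13".
    """
    merged = dict(hugo_bands)
    merged.update(unigene_bands)  # Unigene entry overrides on conflict

    sets = defaultdict(set)
    for gene, band in merged.items():
        sets["chr" + band].add(gene)
        # Chromosome is the leading number, X, or Y of the band string.
        match = re.match(r"(\d+|X|Y)", band)
        if match:
            sets["chr" + match.group(1)].add(gene)
    # Only bands that received a gene appear, so no set is empty.
    return {name: sorted(genes) for name, genes in sets.items()}


# Placeholder data; GENE_C has conflicting locations in the two sources.
hugo = {"GENE_A": "17p13", "GENE_B": "17q21", "GENE_C": "8q24"}
unigene = {"GENE_C": "8q23", "GENE_D": "Xq28"}
positional = build_positional_sets(hugo, unigene)
```

Because sets are created only when a gene is assigned to them, the sketch naturally satisfies the "at least one gene" condition.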
C2: curated gene sets
(browse 4722 gene sets)
Gene sets collected from various sources such as online pathway databases, publications in PubMed, and the knowledge of domain experts. The gene set page for each gene set lists its source.
Download GMT Files
original identifiers
gene symbols
Entrez Gene IDs
CGP: chemical and genetic perturbations
(browse 3402 gene sets)
Gene sets that represent expression signatures of genetic and chemical perturbations. A number of these gene sets come in pairs: an xxx_UP gene set containing genes induced by the perturbation and an xxx_DN gene set containing genes repressed by it. The gene set page for each gene set lists the PubMed citation on which it is based.
Download GMT Files
original identifiers
gene symbols
Entrez Gene IDs
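The xxx_UP / xxx_DN naming convention makes it straightforward to match the two halves of a signature. A minimal Python sketch, assuming the suffixes are exactly "_UP" and "_DN"; the function name and the demo set names are hypothetical:

```python
def pair_up_dn(set_names):
    """Return {base_name: (up_name, dn_name)} for sets that have both halves."""
    names = set(set_names)
    pairs = {}
    for name in names:
        if name.endswith("_UP"):
            base = name[: -len("_UP")]
            dn_name = base + "_DN"
            if dn_name in names:
                pairs[base] = (name, dn_name)
    return pairs


# Hypothetical names: one complete pair, one unpaired _UP set, one plain set.
names = [
    "DEMO_TREATMENT_UP",
    "DEMO_TREATMENT_DN",
    "DEMO_KNOCKDOWN_UP",
    "DEMO_PATHWAY",
]
pairs = pair_up_dn(names)
```

Sets without a partner are left out of the result, which matches the statement that only some of the gene sets come in pairs.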
CP: canonical pathways
(browse 1320 gene sets)
Gene sets from pathway databases. These gene sets are usually canonical representations of a biological process compiled by domain experts.
Download GMT Files
original identifiers
gene symbols
Entrez Gene IDs
CP:BIOCARTA: BioCarta gene sets
(browse 217 gene sets)
Gene sets derived from the BioCarta pathway database (http://www.biocarta.com/genes/index.asp).
Download GMT Files
original identifiers
gene symbols
Entrez Gene IDs
CP:KEGG: KEGG gene sets
(browse 186 gene sets)
Gene sets derived from the KEGG pathway database (http://www.genome.jp/kegg/pathway.html).
Download GMT Files
original identifiers
gene symbols
Entrez Gene IDs
CP:REACTOME: Reactome gene sets
(browse 674 gene sets)
Gene sets derived from the Reactome pathway database (http://www.reactome.org/).
Download GMT Files
original identifiers
gene symbols
Entrez Gene IDs
C3: motif gene sets
(browse 836 gene sets)
Gene sets that contain genes sharing a cis-regulatory motif that is conserved across the human, mouse, rat, and dog genomes. The motifs are catalogued in Xie et al. (2005) and represent known or likely regulatory elements in promoters and 3'-UTRs. These gene sets make it possible to link changes in a microarray experiment to a conserved, putative cis-regulatory element.
Download GMT Files
original identifiers
gene symbols
Entrez Gene IDs
MIR: microRNA targets
(browse 221 gene sets)
Gene sets that contain genes sharing a 3'-UTR microRNA binding motif.
Download GMT Files
original identifiers
gene symbols
Entrez Gene IDs
TFT: transcription factor targets
(browse 615 gene sets)
Gene sets that contain genes sharing a transcription factor binding site defined in the TRANSFAC database (version 7.4, http://www.gene-regulation.com/). Each of these gene sets is annotated by a TRANSFAC record.
Download GMT Files
original identifiers
gene symbols
Entrez Gene IDs
C4: computational gene sets
(browse 858 gene sets)
Computational gene sets defined by mining large collections of cancer-oriented microarray data.
Download GMT Files
original identifiers
gene symbols
Entrez Gene IDs
CGN: cancer gene neighborhoods
(browse 427 gene sets)
Gene sets defined by expression neighborhoods centered on 380 cancer-associated genes (Brentani, Caballero et al. 2003). This collection is identical to the one previously reported in Subramanian, Tamayo et al. (2005).
Download GMT Files
original identifiers
gene symbols
Entrez Gene IDs
CM: cancer modules
(browse 431 gene sets)
Gene sets defined by Segal et al. 2004. Briefly, the authors compiled gene sets ('modules') from a variety of resources such as KEGG, GO, and others. By mining a large compendium of cancer-related microarray data, they identified 456 such modules as significantly changed in a variety of cancer conditions.
Download GMT Files
original identifiers
gene symbols
Entrez Gene IDs
C5: GO gene sets
(browse 1454 gene sets)
Gene sets are named by GO term and contain genes annotated by that term. Note for GSEA users: gene set enrichment analysis identifies gene sets consisting of co-regulated genes, whereas GO gene sets are based on ontologies and do not necessarily comprise co-regulated genes.
Download GMT Files
original identifiers
gene symbols
Entrez Gene IDs
BP: GO biological process
(browse 825 gene sets)
Gene sets derived from the Biological Process Ontology (http://www.geneontology.org/GO.process.guidelines.shtml).
Download GMT Files
original identifiers
gene symbols
Entrez Gene IDs
CC: GO cellular component
(browse 233 gene sets)
Gene sets derived from the Cellular Component Ontology (http://www.geneontology.org/GO.component.guidelines.shtml).
Download GMT Files
original identifiers
gene symbols
Entrez Gene IDs
MF: GO molecular function
(browse 396 gene sets)
Gene sets derived from the Molecular Function Ontology (http://www.geneontology.org/GO.function.guidelines.shtml).
Download GMT Files
original identifiers
gene symbols
Entrez Gene IDs
C6: oncogenic signatures
(browse 189 gene sets)
Gene sets that represent signatures of cellular pathways that are often dysregulated in cancer. The majority of signatures were generated directly from microarray data from NCBI GEO or from internal unpublished profiling experiments that involved perturbation of known cancer genes. In addition, a small number of oncogenic signatures were curated from scientific publications.
Download GMT Files
original identifiers
gene symbols
Entrez Gene IDs
C7: immunologic signatures
(browse 1910 gene sets)
Gene sets that represent cell states and perturbations within the immune system. The signatures were generated by manual curation of published studies in human and mouse immunology. For each study, pairwise comparisons of relevant classes were made and genes were ranked by mutual information. Gene sets correspond to the top or bottom genes (FDR < 0.25 or maximum of 200 genes) for each comparison. This resource is generated as part of the Human Immunology Project Consortium (HIPC; http://www.immuneprofiling.org/).
Download GMT Files
original identifiers
gene symbols
Entrez Gene IDs
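One possible reading of the C7 selection rule can be sketched in Python. This is an interpretation, not the HIPC pipeline: it assumes each gene already carries a signed ranking score (positive for one class of the comparison, negative for the other) and an FDR value, keeps genes with FDR below 0.25, and caps each set at 200 genes. The function name, the input tuple layout, and the demo values are hypothetical.

```python
def top_and_bottom(ranked, fdr_cutoff=0.25, max_genes=200):
    """Return (top_genes, bottom_genes) from (gene, score, fdr) tuples.

    Assumed rule: keep genes with fdr < fdr_cutoff, at most max_genes per set.
    """
    passing = [r for r in ranked if r[2] < fdr_cutoff]
    # Highest positive scores first for the "top" set.
    up = sorted((r for r in passing if r[1] > 0), key=lambda r: -r[1])
    # Most negative scores first for the "bottom" set.
    down = sorted((r for r in passing if r[1] < 0), key=lambda r: r[1])
    top = [gene for gene, _, _ in up[:max_genes]]
    bottom = [gene for gene, _, _ in down[:max_genes]]
    return top, bottom


# Made-up scores and FDR values for five placeholder genes.
ranked = [
    ("G1", 2.5, 0.01),
    ("G2", 1.0, 0.30),   # fails the FDR cutoff
    ("G3", -1.8, 0.05),
    ("G4", 0.7, 0.20),
    ("G5", -0.4, 0.24),
]
top, bottom = top_and_bottom(ranked, max_genes=1)
```

Setting max_genes=1 here only demonstrates the cap on a tiny input; the default of 200 follows the description above.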