We will show here a typical pipeline for analysis of gene expression data. This tutorial is based on the use of Monocytes and Macrophage data from the following paper:

van de Laar L, Saelens W, De Prijck S, Martens L et al. Yolk Sac Macrophages, Fetal Liver, and Adult Monocytes Can Colonize an Empty Niche and Develop into Functional Tissue-Resident Macrophages. Immunity 2016 Apr 19;44(4):755-68. PMID: 26992565

This tutorial contains the following steps:

  1. Download the data from GEO
  2. Process the expression table for analyses
  3. DE Analysis by Limma
  4. PCA (Principal component analysis)
  5. Clustering by Heatmap (only top 250 genes)
  6. GO (Gene Ontology) Enrichment Analysis
  7. GSEA (Gene Set Enrichment Analysis)
  8. Combining other dataset for PCA and clustering

You can find 3 files given to you under handouts directory:

File Description
handout.pdf Step-by-step explanation of the tasks.
Imun_gen.ex.Rdump Preprocessed object from GEO for loading in the analysis.
pheno.table A table storing the phenotype data for comparison.
mouse_hallmark_genesets.rdata Hallmark mouse gene sets for GSEA

Set the working directory

Before you start to work on this handout, please set your working directory to the place where you save the directory. For example,

setwd("/home/james/Documents/D3_Gene_Expression/practice")
# Please use your path instead of the above example

Loading the libraries

rm(list=ls())
if("mogene10sttranscriptcluster.db" %in% rownames(installed.packages()) == FALSE) {
  BiocManager::install(c("mogene10sttranscriptcluster.db"))
}
if("msigdbr" %in% rownames(installed.packages()) == FALSE) {
  BiocManager::install("msigdbr")
}

library(Biobase)
library(GEOquery)
library(limma)
library(mogene10sttranscriptcluster.db)
library(RColorBrewer)
library(gProfileR)
library(genefilter)
library(openxlsx)
library(piano)
library(gplots)
library(sva)
library(msigdbr)

Biobase contains base functions for Bioconductor; GEOquery is the bridge between GEO and BioConductor; limma is a library for the analysis of gene expression microarray data, especially the use of linear models for analysing designed experiments and the assessment of differential expression; mogene10sttranscriptcluster.db contains the mouse gene annotation of mapping between different systems; RColorBrewer can make colorful palettes; gProfileR is the package to perform gene enrichment analysis; openxlsx is for creating excel table in R; piano is the package for GSEA; gplots contains many plotting tools; and sva (ComBat) is used here to remove known batch effects from microarray data.

1 Download the data from GEO

After you find a dataset, which you want to explore, you can search in GEO using their GEO number. All the details of the protocal, platform and processing steps can be found in the GEO webpage.

Please take a look of this example: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE76999

First, you need to download the data from GEO by the GEO number in R. We use the function getGEO (from package GEOquery) to download a list of all the available data.

# Load the data from GEO for GSE76999
GEO_List <- getGEO(GEO="GSE76999")
# Check the type of this variable 
class(GEO_List)
## [1] "list"
# Check the length of this list
length(GEO_List)
## [1] 1
# An element of a list always comes together with a name, right?
names(GEO_List)
## [1] "GSE76999_series_matrix.txt.gz"
# Since there is only one ExpressionSet, let's assign it into another variable
eset <- GEO_List[[1]]

ExpressionSet (eset here) is a Container for high-throughput assays and experimental metadata.

In order to understand ExpressionSet, type:

eset # Get an overview
## ExpressionSet (storageMode: lockedEnvironment)
## assayData: 35556 features, 36 samples 
##   element names: exprs 
## protocolData: none
## phenoData
##   sampleNames: GSM2042244 GSM2042245 ... GSM2042279 (36 total)
##   varLabels: title geo_accession ... tissue:ch1 (40 total)
##   varMetadata: labelDescription
## featureData
##   featureNames: 10338001 10338002 ... 10608724 (35556 total)
##   fvarLabels: ID GB_LIST ... category (12 total)
##   fvarMetadata: Column Description labelDescription
## experimentData: use 'experimentData(object)'
##   pubMedIds: 26992565 
## Annotation: GPL6246
varLabels(eset) # To see what kinds of labels inside
##  [1] "title"                   "geo_accession"          
##  [3] "status"                  "submission_date"        
##  [5] "last_update_date"        "type"                   
##  [7] "channel_count"           "source_name_ch1"        
##  [9] "organism_ch1"            "characteristics_ch1"    
## [11] "characteristics_ch1.1"   "characteristics_ch1.2"  
## [13] "treatment_protocol_ch1"  "growth_protocol_ch1"    
## [15] "molecule_ch1"            "extract_protocol_ch1"   
## [17] "label_ch1"               "label_protocol_ch1"     
## [19] "taxid_ch1"               "hyb_protocol"           
## [21] "scan_protocol"           "description"            
## [23] "data_processing"         "data_processing.1"      
## [25] "data_processing.2"       "platform_id"            
## [27] "contact_name"            "contact_email"          
## [29] "contact_department"      "contact_institute"      
## [31] "contact_address"         "contact_city"           
## [33] "contact_zip/postal_code" "contact_country"        
## [35] "supplementary_file"      "data_row_count"         
## [37] "relation"                "age:ch1"                
## [39] "strain:ch1"              "tissue:ch1"
eset$title # To see the names of the experiments
##  [1] "Monocyte extracted from adult (wk6-12) Bone Marrow, biological replicate 1"                                     
##  [2] "Monocyte extracted from adult (wk6-12) Bone Marrow, biological replicate 2"                                     
##  [3] "Monocyte extracted from adult (wk6-12) Bone Marrow, biological replicate 3"                                     
##  [4] "Monocyte extracted from adult (wk6-12) Bone Marrow, biological replicate 4"                                     
##  [5] "Monocyte extracted from E15.5 Fetal Liver, biological replicate 1"                                              
##  [6] "Monocyte extracted from E15.5 Fetal Liver, biological replicate 2"                                              
##  [7] "Monocyte extracted from E15.5 Fetal Liver, biological replicate 3"                                              
##  [8] "Monocyte extracted from E15.5 Fetal Liver, biological replicate 4"                                              
##  [9] "Macrophage extracted from E12.5 Yolk Sac, biological replicate 1"                                               
## [10] "Macrophage extracted from E12.5 Yolk Sac, biological replicate 2"                                               
## [11] "Macrophage extracted from E12.5 Yolk Sac, biological replicate 3"                                               
## [12] "Macrophage extracted from E12.5 Yolk Sac, biological replicate 4"                                               
## [13] "Alveolar Macrophage extracted from adult (wk6-12) mice via Bronchoalveolar lavage, biological replicate 1"      
## [14] "Alveolar Macrophage extracted from adult (wk6-12) mice via Bronchoalveolar lavage, biological replicate 2"      
## [15] "Alveolar Macrophage extracted from adult (wk6-12) mice via Bronchoalveolar lavage, biological replicate 3"      
## [16] "Alveolar Macrophage extracted from adult (wk6-12) mice via Bronchoalveolar lavage, biological replicate 4"      
## [17] "Alveolar Macrophage derived from transferred Bone Marrow Monocytes, 6 wks post-transfer, biological replicate 1"
## [18] "Alveolar Macrophage derived from transferred Bone Marrow Monocytes, 6 wks post-transfer, biological replicate 2"
## [19] "Alveolar Macrophage derived from transferred Bone Marrow Monocytes, 6 wks post-transfer, biological replicate 3"
## [20] "Alveolar Macrophage derived from transferred Bone Marrow Monocytes, 6 wks post-transfer, biological replicate 4"
## [21] "Alveolar Macrophage derived from transferred Fetal Liver Monocytes, 6 wks post-transfer, biological replicate 1"
## [22] "Alveolar Macrophage derived from transferred Fetal Liver Monocytes, 6 wks post-transfer, biological replicate 2"
## [23] "Alveolar Macrophage derived from transferred Fetal Liver Monocytes, 6 wks post-transfer, biological replicate 3"
## [24] "Alveolar Macrophage derived from transferred Fetal Liver Monocytes, 6 wks post-transfer, biological replicate 4"
## [25] "Alveolar Macrophage derived from transferred Yolk Sac Macrophages, 6 wks post-transfer, biological replicate 1" 
## [26] "Alveolar Macrophage derived from transferred Yolk Sac Macrophages, 6 wks post-transfer, biological replicate 2" 
## [27] "Alveolar Macrophage derived from transferred Yolk Sac Macrophages, 6 wks post-transfer, biological replicate 3" 
## [28] "Alveolar Macrophage derived from transferred Yolk Sac Macrophages, 6 wks post-transfer, biological replicate 4" 
## [29] "Alveolar Macrophage derived from transferred Alveolar macrophages, 6 wks post-transfer, biological replicate 1" 
## [30] "Alveolar Macrophage derived from transferred Alveolar macrophages, 6 wks post-transfer, biological replicate 2" 
## [31] "Alveolar Macrophage derived from transferred Alveolar macrophages, 6 wks post-transfer, biological replicate 3" 
## [32] "Alveolar Macrophage derived from transferred Alveolar macrophages, 6 wks post-transfer, biological replicate 4" 
## [33] "Alveolar Macrophage extracted from 6 wk old wild type mice via Bronchoalveolar lavage, biological replicate 1"  
## [34] "Alveolar Macrophage extracted from 6 wk old wild type mice via Bronchoalveolar lavage, biological replicate 2"  
## [35] "Alveolar Macrophage extracted from 6 wk old wild type mice via Bronchoalveolar lavage, biological replicate 3"  
## [36] "Alveolar Macrophage extracted from 6 wk old wild type mice via Bronchoalveolar lavage, biological replicate 4"
eset$organism_ch1 # Show the organism of the experiments
##  [1] "Mus musculus" "Mus musculus" "Mus musculus" "Mus musculus" "Mus musculus"
##  [6] "Mus musculus" "Mus musculus" "Mus musculus" "Mus musculus" "Mus musculus"
## [11] "Mus musculus" "Mus musculus" "Mus musculus" "Mus musculus" "Mus musculus"
## [16] "Mus musculus" "Mus musculus" "Mus musculus" "Mus musculus" "Mus musculus"
## [21] "Mus musculus" "Mus musculus" "Mus musculus" "Mus musculus" "Mus musculus"
## [26] "Mus musculus" "Mus musculus" "Mus musculus" "Mus musculus" "Mus musculus"
## [31] "Mus musculus" "Mus musculus" "Mus musculus" "Mus musculus" "Mus musculus"
## [36] "Mus musculus"

2 Process the expression table for analysis

Here we select only the first 12 samples for downstream analysis and eliminate others. These 12 smaples are BM-MOs (bone marrow monocytes), FL-MOs (fetal liver monocytes), and YS-Macs (yolk sac-derived macrophages), with 4 replicates for each.

# Select the samples we need
sel <- 1:12
eset <- eset[ ,sel]
# We also need to modify its expression matrix
exprs(eset) <- exprs(eset)[ ,sel]
# Create labels for later steps
labels <- c("BM_MOs","FL_MOs","YS_Macs")
sel_groups <- factor(rep(labels, each=4))

# Lots of colors you can choose: http://www.stat.columbia.edu/~tzheng/files/Rcolor.pdf
# But let's try the most simple ones
# palette function is used to set the customized colors
palette(c("red","blue", "green"))

Before we do any further analysis, we need to do quality control of the data. We use boxplot to check whether we need further normalization on the data.

boxplot(exprs(eset), col=sel_groups)

How is it? Are you satisfied? I am not. In order to learn all the parameters you can adjust for this boxplot, you can read the help message by typing:

?boxplot

I will shown a few parameters that can be used to improve this plot. Let’s rotate the labels on x axis!!

boxplot(exprs(eset), col=sel_groups, las=2)