Authors: Belinda Phipson, Anna Trigos, Matt Ritchie, Shian Su, Maria Doyle, Harriet Dashnow, Charity Law

Data files and Resources

Data files are available from: https://figshare.com/s/1d788fd384d33e913a2a. You should download the files listed below and place them into a folder called data in your working directory.

Data files:

  • GSE60450_Lactation-GenewiseCounts.txt
  • SampleInfo.txt
  • SampleInfo_Corrected.txt

Data files were originally obtained from:
ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE60nnn/GSE60450/suppl/GSE60450_Lactation-GenewiseCounts.txt.gz
http://bioinf.wehi.edu.au/MSigDB/v7.1/Mm.c2.all.v7.1.entrez.rds
http://bioinf.wehi.edu.au/MSigDB/v7.1/Mm.h.all.v7.1.entrez.rds

This material has been inspired by the following resources:
http://www.statsci.org/smyth/pubs/QLedgeRPreprint.pdf (Lun, Chen, and Smyth 2016)
http://monashbioinformaticsplatform.github.io/RNAseq-DE-analysis-with-R/RNAseq_DE_analysis_with_R.html

Bioconductor/R Packages

Packages used:

  • limma
  • edgeR
  • Glimma
  • org.Mm.eg.db
  • gplots
  • RColorBrewer

To install the packages you can:

  • Install the latest release of R. This version of the tutorial uses R 4.0.

  • Get the latest version of Bioconductor and packages by starting R and entering the commands:

if (!requireNamespace("BiocManager"))
    install.packages("BiocManager")
BiocManager::install(c("limma", "edgeR", "Glimma", "org.Mm.eg.db", "gplots", "RColorBrewer", "NMF", "BiasedUrn"))

Overview

  • Reading in table of counts
  • Adding annotation
  • Filtering lowly expressed genes
  • Quality control
  • Normalisation for composition bias
  • Differential expression analysis
  • Testing relative to a threshold
  • Visualisation
  • Gene set testing

Introduction and data import

Measuring gene expression on a genome-wide scale has become common practice over the last two decades or so, with microarrays predominantly used pre-2008. With the advent of next generation sequencing technology in 2008, an increasing number of scientists use this technology to measure and understand changes in gene expression in often complex systems. As sequencing costs have decreased, using RNA-Seq to simultaneously measure the expression of tens of thousands of genes for multiple samples has never been easier. The cost of these experiments has now moved from generating the data to storing and analysing it.

There are many steps involved in analysing an RNA-Seq experiment. Analysing an RNAseq experiment begins with sequencing reads. These are aligned to a reference genome, then the number of reads mapped to each gene can be counted. This results in a table of counts, which is what we perform statistical analyses on in R. While mapping and counting are important and necessary tasks, today we will be starting from the count data and getting stuck into analysis.

First, let’s load all the packages we will need to analyse the data.

library(edgeR)
library(limma)
library(Glimma)
library(org.Mm.eg.db)
library(gplots)
library(RColorBrewer)
library(NMF)

Mouse mammary gland dataset

The data for this tutorial comes from a Nature Cell Biology paper, EGF-mediated induction of Mcl-1 at the switch to lactation is essential for alveolar cell survival (Fu et al. 2015). Both the raw data (sequence reads) and processed data (counts) can be downloaded from Gene Expression Omnibus database (GEO) under accession number GSE60450.

This study examines the expression profiles of basal stem-cell enriched cells (B) and committed luminal cells (L) in the mammary gland of virgin, pregnant and lactating mice. Six groups are present, with one for each combination of cell type and mouse status. Each group contains two biological replicates. We will first use the counts file as a starting point for our analysis. This data has already been aligned to the mouse genome. The command line tool featureCounts (Liao, Smyth, and Shi 2014) was used to count reads mapped to mouse genes from Refseq annotation (see the paper for details).

Reading in the data

Set up an RStudio project specifying the directory where you have saved the /data directory. Download and read in the data.

# Read the data into R
seqdata <- read.delim("data/GSE60450_Lactation-GenewiseCounts.txt", stringsAsFactors = FALSE)
# Read the sample information into R
sampleinfo <- read.delim("data/SampleInfo.txt", stringsAsFactors = TRUE)

Let’s take a look at the data. You can use the head command to see the first 6 lines. The dim command will tell you how many rows and columns the data frame has.

head(seqdata)
  EntrezGeneID Length MCL1.DG_BC2CTUACXX_ACTTGA_L002_R1
1       497097   3634                               438
2    100503874   3259                                 1
3    100038431   1634                                 0
4        19888   9747                                 1
5        20671   3130                               106
6        27395   4203                               309
  MCL1.DH_BC2CTUACXX_CAGATC_L002_R1 MCL1.DI_BC2CTUACXX_ACAGTG_L002_R1
1                               300                                65
2                                 0                                 1
3                                 0                                 0
4                                 1                                 0
5                               182                                82
6                               234                               337
  MCL1.DJ_BC2CTUACXX_CGATGT_L002_R1 MCL1.DK_BC2CTUACXX_TTAGGC_L002_R1
1                               237                               354
2                                 1                                 0
3                                 0                                 0
4                                 0                                 0
5                               105                                43
6                               300                               290
  MCL1.DL_BC2CTUACXX_ATCACG_L002_R1 MCL1.LA_BC2CTUACXX_GATCAG_L001_R1
1                               287                                 0
2                                 4                                 0
3                                 0                                 0
4                                 0                                10
5                                82                                16
6                               270                               560
  MCL1.LB_BC2CTUACXX_TGACCA_L001_R1 MCL1.LC_BC2CTUACXX_GCCAAT_L001_R1
1                                 0                                 0
2                                 0                                 0
3                                 0                                 0
4                                 3                                10
5                                25                                18
6                               464                               489
  MCL1.LD_BC2CTUACXX_GGCTAC_L001_R1 MCL1.LE_BC2CTUACXX_TAGCTT_L001_R1
1                                 0                                 0
2                                 0                                 0
3                                 0                                 0
4                                 2                                 0
5                                 8                                 3
6                               328                               307
  MCL1.LF_BC2CTUACXX_CTTGTA_L001_R1
1                                 0
2                                 0
3                                 0
4                                 0
5                                10
6                               342
dim(seqdata)
[1] 27179    14

The seqdata object contains information about genes (one gene per row), the first column has the Entrez gene id, the second has the gene length and the remaining columns contain information about the number of reads aligning to the gene in each experimental sample. There are two replicates for each cell type and time point (detailed sample info can be found in file “GSE60450_series_matrix.txt” from the GEO website). The sampleinfo file contains basic information about the samples that we will need for the analysis today.

sampleinfo
                            FileName SampleName CellType   Status
1  MCL1.DG_BC2CTUACXX_ACTTGA_L002_R1    MCL1.DG  luminal   virgin
2  MCL1.DH_BC2CTUACXX_CAGATC_L002_R1    MCL1.DH    basal   virgin
3  MCL1.DI_BC2CTUACXX_ACAGTG_L002_R1    MCL1.DI    basal pregnant
4  MCL1.DJ_BC2CTUACXX_CGATGT_L002_R1    MCL1.DJ    basal pregnant
5  MCL1.DK_BC2CTUACXX_TTAGGC_L002_R1    MCL1.DK    basal  lactate
6  MCL1.DL_BC2CTUACXX_ATCACG_L002_R1    MCL1.DL    basal  lactate
7  MCL1.LA_BC2CTUACXX_GATCAG_L001_R1    MCL1.LA    basal   virgin
8  MCL1.LB_BC2CTUACXX_TGACCA_L001_R1    MCL1.LB  luminal   virgin
9  MCL1.LC_BC2CTUACXX_GCCAAT_L001_R1    MCL1.LC  luminal pregnant
10 MCL1.LD_BC2CTUACXX_GGCTAC_L001_R1    MCL1.LD  luminal pregnant
11 MCL1.LE_BC2CTUACXX_TAGCTT_L001_R1    MCL1.LE  luminal  lactate
12 MCL1.LF_BC2CTUACXX_CTTGTA_L001_R1    MCL1.LF  luminal  lactate

We will be manipulating and reformatting the counts matrix into a suitable format for downstream analysis. The first two columns in the seqdata dataframe contain annotation information. We need to make a new matrix containing only the counts, but we can store the gene identifiers (the EntrezGeneID column) as rownames. We will add more annotation information about each gene later on in the workshop.

Format the data

Let’s create a new data object, countdata, that contains only the counts for the 12 samples.

# Remove first two columns from seqdata
countdata <- seqdata[,-(1:2)]
# Look at the output
head(countdata)
  MCL1.DG_BC2CTUACXX_ACTTGA_L002_R1 MCL1.DH_BC2CTUACXX_CAGATC_L002_R1
1                               438                               300
2                                 1                                 0
3                                 0                                 0
4                                 1                                 1
5                               106                               182
6                               309                               234
  MCL1.DI_BC2CTUACXX_ACAGTG_L002_R1 MCL1.DJ_BC2CTUACXX_CGATGT_L002_R1
1                                65                               237
2                                 1                                 1
3                                 0                                 0
4                                 0                                 0
5                                82                               105
6                               337                               300
  MCL1.DK_BC2CTUACXX_TTAGGC_L002_R1 MCL1.DL_BC2CTUACXX_ATCACG_L002_R1
1                               354                               287
2                                 0                                 4
3                                 0                                 0
4                                 0                                 0
5                                43                                82
6                               290                               270
  MCL1.LA_BC2CTUACXX_GATCAG_L001_R1 MCL1.LB_BC2CTUACXX_TGACCA_L001_R1
1                                 0                                 0
2                                 0                                 0
3                                 0                                 0
4                                10                                 3
5                                16                                25
6                               560                               464
  MCL1.LC_BC2CTUACXX_GCCAAT_L001_R1 MCL1.LD_BC2CTUACXX_GGCTAC_L001_R1
1                                 0                                 0
2                                 0                                 0
3                                 0                                 0
4                                10                                 2
5                                18                                 8
6                               489                               328
  MCL1.LE_BC2CTUACXX_TAGCTT_L001_R1 MCL1.LF_BC2CTUACXX_CTTGTA_L001_R1
1                                 0                                 0
2                                 0                                 0
3                                 0                                 0
4                                 0                                 0
5                                 3                                10
6                               307                               342
# Store EntrezGeneID as rownames
rownames(countdata) <- seqdata[,1]

Take a look at the output

head(countdata)
          MCL1.DG_BC2CTUACXX_ACTTGA_L002_R1 MCL1.DH_BC2CTUACXX_CAGATC_L002_R1
497097                                  438                               300
100503874                                 1                                 0
100038431                                 0                                 0
19888                                     1                                 1
20671                                   106                               182
27395                                   309                               234
          MCL1.DI_BC2CTUACXX_ACAGTG_L002_R1 MCL1.DJ_BC2CTUACXX_CGATGT_L002_R1
497097                                   65                               237
100503874                                 1                                 1
100038431                                 0                                 0
19888                                     0                                 0
20671                                    82                               105
27395                                   337                               300
          MCL1.DK_BC2CTUACXX_TTAGGC_L002_R1 MCL1.DL_BC2CTUACXX_ATCACG_L002_R1
497097                                  354                               287
100503874                                 0                                 4
100038431                                 0                                 0
19888                                     0                                 0
20671                                    43                                82
27395                                   290                               270
          MCL1.LA_BC2CTUACXX_GATCAG_L001_R1 MCL1.LB_BC2CTUACXX_TGACCA_L001_R1
497097                                    0                                 0
100503874                                 0                                 0
100038431                                 0                                 0
19888                                    10                                 3
20671                                    16                                25
27395                                   560                               464
          MCL1.LC_BC2CTUACXX_GCCAAT_L001_R1 MCL1.LD_BC2CTUACXX_GGCTAC_L001_R1
497097                                    0                                 0
100503874                                 0                                 0
100038431                                 0                                 0
19888                                    10                                 2
20671                                    18                                 8
27395                                   489                               328
          MCL1.LE_BC2CTUACXX_TAGCTT_L001_R1 MCL1.LF_BC2CTUACXX_CTTGTA_L001_R1
497097                                    0                                 0
100503874                                 0                                 0
100038431                                 0                                 0
19888                                     0                                 0
20671                                     3                                10
27395                                   307                               342

Now take a look at the column names

colnames(countdata)
 [1] "MCL1.DG_BC2CTUACXX_ACTTGA_L002_R1" "MCL1.DH_BC2CTUACXX_CAGATC_L002_R1"
 [3] "MCL1.DI_BC2CTUACXX_ACAGTG_L002_R1" "MCL1.DJ_BC2CTUACXX_CGATGT_L002_R1"
 [5] "MCL1.DK_BC2CTUACXX_TTAGGC_L002_R1" "MCL1.DL_BC2CTUACXX_ATCACG_L002_R1"
 [7] "MCL1.LA_BC2CTUACXX_GATCAG_L001_R1" "MCL1.LB_BC2CTUACXX_TGACCA_L001_R1"
 [9] "MCL1.LC_BC2CTUACXX_GCCAAT_L001_R1" "MCL1.LD_BC2CTUACXX_GGCTAC_L001_R1"
[11] "MCL1.LE_BC2CTUACXX_TAGCTT_L001_R1" "MCL1.LF_BC2CTUACXX_CTTGTA_L001_R1"

These are the sample names which are pretty long so we’ll shorten these to contain only the relevant information about each sample. We will use the substr command to extract the first 7 characters and use these as the colnames.

# using substr, you extract the characters starting at position 1 and stopping at position 7 of the colnames
colnames(countdata) <- substr(colnames(countdata),start=1,stop=7)

Take a look at the output.

head(countdata)
          MCL1.DG MCL1.DH MCL1.DI MCL1.DJ MCL1.DK MCL1.DL MCL1.LA MCL1.LB
497097        438     300      65     237     354     287       0       0
100503874       1       0       1       1       0       4       0       0
100038431       0       0       0       0       0       0       0       0
19888           1       1       0       0       0       0      10       3
20671         106     182      82     105      43      82      16      25
27395         309     234     337     300     290     270     560     464
          MCL1.LC MCL1.LD MCL1.LE MCL1.LF
497097          0       0       0       0
100503874       0       0       0       0
100038431       0       0       0       0
19888          10       2       0       0
20671          18       8       3      10
27395         489     328     307     342

Note that the column names are now the same as SampleName in the sampleinfo file. This is good because it means our sample information in sampleinfo is in the same order as the columns in countdata.

table(colnames(countdata)==sampleinfo$SampleName)

TRUE 
  12 

Convert counts to DGEList object

Next we’ll create a DGEList object. This is an object used by edgeR to store count data. It has a number of slots for storing various parameters about the data.

y <- DGEList(countdata)
# have a look at y
y
An object of class "DGEList"
$counts
          MCL1.DG MCL1.DH MCL1.DI MCL1.DJ MCL1.DK MCL1.DL MCL1.LA MCL1.LB
497097        438     300      65     237     354     287       0       0
100503874       1       0       1       1       0       4       0       0
100038431       0       0       0       0       0       0       0       0
19888           1       1       0       0       0       0      10       3
20671         106     182      82     105      43      82      16      25
          MCL1.LC MCL1.LD MCL1.LE MCL1.LF
497097          0       0       0       0
100503874       0       0       0       0
100038431       0       0       0       0
19888          10       2       0       0
20671          18       8       3      10
27174 more rows ...

$samples
        group lib.size norm.factors
MCL1.DG     1 23227641            1
MCL1.DH     1 21777891            1
MCL1.DI     1 24100765            1
MCL1.DJ     1 22665371            1
MCL1.DK     1 21529331            1
7 more rows ...
# See what slots are stored in y
names(y)
[1] "counts"  "samples"
# Library size information is stored in the samples slot
y$samples
        group lib.size norm.factors
MCL1.DG     1 23227641            1
MCL1.DH     1 21777891            1
MCL1.DI     1 24100765            1
MCL1.DJ     1 22665371            1
MCL1.DK     1 21529331            1
MCL1.DL     1 20015386            1
MCL1.LA     1 20392113            1
MCL1.LB     1 21708152            1
MCL1.LC     1 22241607            1
MCL1.LD     1 21988240            1
MCL1.LE     1 24723827            1
MCL1.LF     1 24657293            1

We can also store the groups for the samples in the DGEList object.

group <- paste(sampleinfo$CellType,sampleinfo$Status,sep=".")
# Take a look
group
 [1] "luminal.virgin"   "basal.virgin"     "basal.pregnant"   "basal.pregnant"  
 [5] "basal.lactate"    "basal.lactate"    "basal.virgin"     "luminal.virgin"  
 [9] "luminal.pregnant" "luminal.pregnant" "luminal.lactate"  "luminal.lactate" 
# Convert to factor
group <- factor(group)
# Take another look.
group
 [1] luminal.virgin   basal.virgin     basal.pregnant   basal.pregnant  
 [5] basal.lactate    basal.lactate    basal.virgin     luminal.virgin  
 [9] luminal.pregnant luminal.pregnant luminal.lactate  luminal.lactate 
6 Levels: basal.lactate basal.pregnant basal.virgin ... luminal.virgin
# Add the group information into the DGEList
y$samples$group <- group
y$samples
                   group lib.size norm.factors
MCL1.DG   luminal.virgin 23227641            1
MCL1.DH     basal.virgin 21777891            1
MCL1.DI   basal.pregnant 24100765            1
MCL1.DJ   basal.pregnant 22665371            1
MCL1.DK    basal.lactate 21529331            1
MCL1.DL    basal.lactate 20015386            1
MCL1.LA     basal.virgin 20392113            1
MCL1.LB   luminal.virgin 21708152            1
MCL1.LC luminal.pregnant 22241607            1
MCL1.LD luminal.pregnant 21988240            1
MCL1.LE  luminal.lactate 24723827            1
MCL1.LF  luminal.lactate 24657293            1

Adding annotation

The only annotation we can see is the Entrez Gene ID, which is not very informative. We would like to add some annotation information. There are a number of ways to do this. We will demonstrate how to do this using the org.Mm.eg.db package

First we need to decide what information we want. In order to see what we can extract we can run the columns function on the annotation database.

columns(org.Mm.eg.db)
 [1] "ACCNUM"       "ALIAS"        "ENSEMBL"      "ENSEMBLPROT"  "ENSEMBLTRANS"
 [6] "ENTREZID"     "ENZYME"       "EVIDENCE"     "EVIDENCEALL"  "GENENAME"    
[11] "GO"           "GOALL"        "IPI"          "MGI"          "ONTOLOGY"    
[16] "ONTOLOGYALL"  "PATH"         "PFAM"         "PMID"         "PROSITE"     
[21] "REFSEQ"       "SYMBOL"       "UNIGENE"      "UNIPROT"     

We definitely want gene symbols and perhaps the full gene name. Let’s build up our annotation information in a separate data frame using the select function.

ann <- select(org.Mm.eg.db,keys=rownames(y$counts),columns=c("ENTREZID","SYMBOL","GENENAME"))
'select()' returned 1:1 mapping between keys and columns
# Have a look at the annotation
head(ann)
   ENTREZID  SYMBOL                              GENENAME
1    497097    Xkr4     X-linked Kx blood group related 4
2 100503874 Gm19938                 predicted gene, 19938
3 100038431 Gm10568                  predicted gene 10568
4     19888     Rp1        retinitis pigmentosa 1 (human)
5     20671   Sox17 SRY (sex determining region Y)-box 17
6     27395  Mrpl15   mitochondrial ribosomal protein L15

Let’s double check that the ENTREZID column matches exactly to our y$counts rownames.

table(ann$ENTREZID==rownames(y$counts))

 TRUE 
27179 

We can slot in the annotation information into the genes slot of y. (Please note that if the select function returns a 1:many mapping then you can’t just append the annotation to the y object. An alternative way to get annotation will be discussed during the analysis of the second dataset.)

y$genes <- ann

Filtering lowly expressed genes

Genes with very low counts across all libraries provide little evidence for differential expression and they interfere with some of the statistical approximations that are used later in the pipeline. They also add to the multiple testing burden when estimating false discovery rates, reducing power to detect differentially expressed genes. These genes should be filtered out prior to further analysis.

There are a few ways to filter out lowly expressed genes. When there are biological replicates in each group, in this case we have a sample size of 2 in each group, we favour filtering on a minimum counts per million threshold present in at least 2 samples. Two represents the smallest sample size for each group in our experiment. In this dataset, we choose to retain genes if they are expressed at a counts-per-million (CPM) above 0.5 in at least two samples.

We’ll use the cpm function from the edgeR library (M. D. Robinson, McCarthy, and Smyth 2010) to generate the CPM values and then filter. Note that by converting to CPMs we are normalising for the different sequencing depths for each sample.

# Obtain CPMs
myCPM <- cpm(countdata)
# Have a look at the output
head(myCPM)
              MCL1.DG     MCL1.DH     MCL1.DI     MCL1.DJ   MCL1.DK    MCL1.DL
497097    18.85684388 13.77543859  2.69700983 10.45648006 16.442685 14.3389690
100503874  0.04305215  0.00000000  0.04149246  0.04412017  0.000000  0.1998463
100038431  0.00000000  0.00000000  0.00000000  0.00000000  0.000000  0.0000000
19888      0.04305215  0.04591813  0.00000000  0.00000000  0.000000  0.0000000
20671      4.56352843  8.35709941  3.40238163  4.63261775  1.997275  4.0968483
27395     13.30311589 10.74484210 13.98295863 13.23605071 13.469996 13.4896224
             MCL1.LA    MCL1.LB    MCL1.LC     MCL1.LD    MCL1.LE    MCL1.LF
497097     0.0000000  0.0000000  0.0000000  0.00000000  0.0000000  0.0000000
100503874  0.0000000  0.0000000  0.0000000  0.00000000  0.0000000  0.0000000
100038431  0.0000000  0.0000000  0.0000000  0.00000000  0.0000000  0.0000000
19888      0.4903857  0.1381969  0.4496078  0.09095771  0.0000000  0.0000000
20671      0.7846171  1.1516411  0.8092940  0.36383085  0.1213404  0.4055595
27395     27.4615975 21.3744588 21.9858214 14.91706476 12.4171715 13.8701357
# Which values in myCPM are greater than 0.5?
thresh <- myCPM > 0.5
# This produces a logical matrix with TRUEs and FALSEs
head(thresh)
          MCL1.DG MCL1.DH MCL1.DI MCL1.DJ MCL1.DK MCL1.DL MCL1.LA MCL1.LB
497097       TRUE    TRUE    TRUE    TRUE    TRUE    TRUE   FALSE   FALSE
100503874   FALSE   FALSE   FALSE   FALSE   FALSE   FALSE   FALSE   FALSE
100038431   FALSE   FALSE   FALSE   FALSE   FALSE   FALSE   FALSE   FALSE
19888       FALSE   FALSE   FALSE   FALSE   FALSE   FALSE   FALSE   FALSE
20671        TRUE    TRUE    TRUE    TRUE    TRUE    TRUE    TRUE    TRUE
27395        TRUE    TRUE    TRUE    TRUE    TRUE    TRUE    TRUE    TRUE
          MCL1.LC MCL1.LD MCL1.LE MCL1.LF
497097      FALSE   FALSE   FALSE   FALSE
100503874   FALSE   FALSE   FALSE   FALSE
100038431   FALSE   FALSE   FALSE   FALSE
19888       FALSE   FALSE   FALSE   FALSE
20671        TRUE   FALSE   FALSE   FALSE
27395        TRUE    TRUE    TRUE    TRUE
# Summary of how many TRUEs there are in each row
# There are 11433 genes that have TRUEs in all 12 samples.
table(rowSums(thresh))

    0     1     2     3     4     5     6     7     8     9    10    11    12 
10857   518   544   307   346   307   652   323   547   343   579   423 11433 
# we would like to keep genes that have at least 2 TRUES in each row of thresh
keep <- rowSums(thresh) >= 2
summary(keep)
   Mode   FALSE    TRUE 
logical   11375   15804 

A CPM of 0.5 is used as it corresponds to a count of 10-15 for the library sizes in this data set. If the count is any smaller, it is considered to be very low, indicating that the associated gene is not expressed in that sample. A requirement for expression in two or more libraries is used as each group contains two replicates. This ensures that a gene will be retained if it is only expressed in one group. Smaller CPM thresholds are usually appropriate for larger libraries. As a general rule, a good threshold can be chosen by identifying the CPM that corresponds to a count of 10, which in this case is about 0.5. You should filter with CPMs rather than filtering on the counts directly, as the latter does not account for differences in library sizes between samples.

# Let's have a look and see whether our threshold of 0.5 does indeed correspond to a count of about 10-15
# We will look at the first sample
plot(myCPM[,1],countdata[,1])

# Let us limit the x and y-axis so we can actually look to see what is happening at the smaller counts
plot(myCPM[,1],countdata[,1],ylim=c(0,50),xlim=c(0,3))
# Add a vertical line at 0.5 CPM
abline(v=0.5)