RNAseq analysis in R

Jul 6-7, 2017

9:00 am - 5:00 pm

Instructors: Belinda Phipson, Jovana Maksimovic

Helpers: Zac Gerring

COMBINE


Sponsors:

AGTA    

General Information

In this workshop, you will learn how to analyse RNA-seq count data using R. This includes reading the data into R, quality control, differential expression analysis and gene set testing, with a focus on the limma-voom analysis workflow. You will also learn how to generate common plots for the analysis and visualisation of gene expression data, such as boxplots and heatmaps. The workshop is aimed at biologists who want to perform differential expression analysis of RNA-seq data when a reference genome is available.

Who: The course is aimed at graduate students and other researchers. Some basic R knowledge is assumed; this is not an introductory R course. If you are not familiar with the R statistical programming language, we strongly encourage you to work through an introductory R course before you attend this workshop. We recommend the Software Carpentry R for Reproducible Scientific Analysis lessons up to and including vectorisation (topic 9).

Where: Advanced Engineering Building 49, room 313A, University of Queensland, Brisbane. Get directions with OpenStreetMap or Google Maps.

Requirements: Participants must bring a laptop with a Mac, Linux, or Windows operating system (not a tablet, Chromebook, etc.) that they have administrative privileges on. They should have a few specific software packages installed (listed below).

Contact: Please mail combine@combine.org.au for more information.

Registration: register here. Registration is $45 for students, and $55 for non-students.


Schedule

Note: this is a preliminary schedule. There may be changes to the timing and content.

If you have any trouble installing the software or packages, please arrive at 9am on the first day so we can help before the workshop starts.

Day 1

09:00 Installation and set up
09:30 Introduction to RNA-seq theory
10:15 R for RNAseq
11:00 Break
11:30 Quality control and visualisation
12:30 Lunch
13:30 Quality control and visualisation continued
15:00 Break
15:30 Quality control and visualisation continued
17:00 Wrap-up

Day 2

09:00 Differential expression theory
09:45 Differential expression analysis
11:00 Break
11:30 Differential expression analysis continued
12:30 Lunch
13:30 Differential expression analysis continued
15:00 Break
15:30 Gene set testing
17:00 Wrap-up

Etherpad: http://pad.software-carpentry.org/2017-07-07-RNAseq.
We will use this Etherpad for chatting, taking notes, and sharing URLs and bits of code.


Syllabus

You can find all of the lesson notes for this workshop here.

Data

Please download these data files and save them to your laptop before the workshop:

Setup

To participate in this workshop, you will need access to the software described below. In addition, you will need an up-to-date web browser.

Even if you already have R installed, it is important that you have the latest version, because some of the packages we will be using do not work with earlier versions of R. The latest version (available through the links below) is 3.4.0. For Mac OS X 10.6-10.8 the latest version is 3.2.1 (available from here), and for that version you will also need to download this gfortran package.
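If R is already installed, the following is a minimal sketch of how you might check your version from the R console. The variable names (required, current, version_ok) are illustrative only, and the 3.4.0 threshold is taken from the paragraph above; on Mac OS X 10.6-10.8, where 3.2.1 is the latest available version, the check will report an older version even though nothing newer can be installed.

```r
# Minimum R version, taken from the setup notes above (an assumption of this sketch).
required <- "3.4.0"

# getRversion() returns the version of the running R session
# as an object that can be compared with a version string.
current <- getRversion()
version_ok <- current >= required

if (version_ok) {
  message("R ", as.character(current), " is recent enough.")
} else {
  message("R ", as.character(current), " is older than ", required,
          "; please update before the workshop.")
}
```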

Software Carpentry maintains a list of common installation issues on the Configuration Problems and Solutions wiki page. It is written as a reference for instructors, but you may find it useful too.

R

R is a programming language that is especially powerful for data exploration, visualization, and statistical analysis. To interact with R, we use RStudio.

Windows

Video Tutorial

Install R by downloading and running this .exe file from CRAN.

Also, please install the RStudio IDE.

Mac OS X

Video Tutorial

Install R by downloading and running this .pkg file from CRAN.

Also, please install the RStudio IDE.

Linux

You can download the binary files for your distribution from CRAN. Or you can use your package manager (e.g. for Debian/Ubuntu run sudo apt-get install r-base and for Fedora run sudo yum install R).

Also, please install the RStudio IDE.

R packages

You will also need to install the following R packages: limma, edgeR, gplots, org.Mm.eg.db, org.Dm.eg.db, RColorBrewer, GO.db, BiasedUrn, DESeq2, Glimma.

If you are bringing your own data to analyse on Day 2, you should also install the annotation package for your species. To find the name of the package you need, look on the Bioconductor website (search for "org." to see the organism packages).

These can all be obtained from Bioconductor.

Open RStudio and run the following commands to install packages from Bioconductor:

> source("http://bioconductor.org/biocLite.R")

> biocLite("limma")

Repeat the biocLite command for each of the remaining packages, for example:

> biocLite("edgeR")

> biocLite("Glimma")

Common problems:

  • If the source command doesn't work, try https instead of http.
  • Make sure you are directly connected to the internet and not using a proxy.
  • Make sure the package name is in quotes.

If you have any trouble, please see the Instructions for installing Bioconductor packages. You can also arrive early on the first day of the workshop for help with setup.
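To confirm that nothing was missed, something like the sketch below could be run in RStudio. It compares the packages listed in the R packages section with what is in your R library and reports any that are still missing. The variable names are illustrative, and the install lines are left commented out so that the check itself runs without an internet connection; passing several package names to biocLite in one call is offered here as a convenience, not as the instructors' prescribed method.

```r
# Packages listed in the "R packages" section above.
pkgs <- c("limma", "edgeR", "gplots", "org.Mm.eg.db", "org.Dm.eg.db",
          "RColorBrewer", "GO.db", "BiasedUrn", "DESeq2", "Glimma")

# Names of the packages already present in your R library.
installed <- rownames(installed.packages())

# Packages from the list that still need to be installed.
missing_pkgs <- setdiff(pkgs, installed)

if (length(missing_pkgs) > 0) {
  message("Still to install: ", paste(missing_pkgs, collapse = ", "))
  # To install them in one call (needs an internet connection), uncomment:
  # source("https://bioconductor.org/biocLite.R")
  # biocLite(missing_pkgs)
} else {
  message("All workshop packages are installed.")
}
```

If you are bringing your own data, add the name of your species' annotation package to pkgs before running the check.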