
Exploratory Analysis of Biological Data Using R 2015

Workshop pages for students


Laptop Setup Instructions

Instructions for setting up your laptop can be found here: Laptop Setup Instructions


Difference Between R and RStudio

RStudio doesn’t know where libraries are installed when they were not installed through the RStudio package manager. To tell RStudio the location, define the path in a startup file: create a file called .Renviron and put this line in it:

R_LIBS=<R Library Path of other installed packages>

This was the problem students ran into when they installed packages at the RStudio command line with the R command install.packages().

… or you could use the package manager to install libraries.
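
To find the path that belongs in R_LIBS, it can help to ask R itself where it looks for packages. The following is a minimal sketch using base R only; the helper name `show_library_setup` is hypothetical and not part of the workshop material.

```r
# Hypothetical helper (not from the workshop scripts): report where this
# R session looks for installed packages.
show_library_setup <- function() {
  list(
    r_libs    = Sys.getenv("R_LIBS"),  # value picked up from .Renviron; "" if unset
    lib_paths = .libPaths()            # library directories searched, in order
  )
}

setup <- show_library_setup()
print(setup$r_libs)
print(setup$lib_paths)
```

Running this once in plain R and once in RStudio shows whether the two see the same library directories. A directory listed only by plain R is a likely candidate for the R_LIBS entry.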

Syntax highlighting

… of scripts in the R editor does not seem to work under Windows. If you want highlighted syntax, use RStudio instead. Or (seriously), get a Mac.

Pre-Workshop Tutorials

1) R Preparation tutorials: You need to be familiar with the material covered in the Introduction to R tutorial, below. The tutorial should be very accessible even if you have never used R before.


Day 1


Welcome

*Faculty: Michelle Brazas*

Data Sets for Workshop Modules 1 – 5

Table_S3.csv
GvHD.txt
hiv.raw.data.24h.txt
logcho_237_4class.txt
GSE26922.dat
TsneRef.dat


Module 1: Exploratory Data Analysis

*Faculty: Boris Steipe*

Lecture:

Stats2015_Module1.pdf
Stats2015_Module1.ppt
Stats2015_Module1.mp4

Scripts:

Code Additions to Script

Resources:

Helpful links



Module 2: Regression Analysis

*Faculty: Boris Steipe*

Lecture:

Stats2015_Module2.pdf
Stats2015_Module2.ppt
Stats2015_Module2.mp4

Scripts:

Links:


Module 3: Dimension Reduction

*Faculty: Boris Steipe*

Lecture:

Stats2015_Module3.pdf
Stats2015_Module3.ppt
Stats2015_Module3.mp4

Scripts:

2015_EDA_Module_3_DimensionReduction.R


Integrated Assignment

*Faculty: Catalina Anghel and David Shih*

Part 1:

Assignment Part 1:

Stats2015_IntegratedAssignment_Part1.pdf

Questions in R - Part 1:

Stats2015_IntegratedAssignment_Part1_Questions.R

Answer Key - Part 1:

Stats2015_IntegratedAssignment_Part1_AnswerKey.R

Data Set:

R package: CCLE_0.1.1.tar.gz (RStudio users on any platform), CCLE_0.1.1.zip (non-RStudio users on Windows) (Note: Please right-click and select “Save link as…” or “Save target as…”)
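
Once the package file is saved, it can be installed from the local file rather than from a repository. This is a sketch under assumptions: the file is assumed to sit in the current working directory, and the helpers `local_pkg_type` and `install_local` are hypothetical names introduced here for illustration.

```r
# Hypothetical helper: choose the 'type' argument of install.packages()
# from the extension of a locally downloaded package file.
local_pkg_type <- function(path) {
  if (grepl("\\.tar\\.gz$", path)) {
    "source"        # source package
  } else if (grepl("\\.zip$", path)) {
    "win.binary"    # Windows binary package
  } else {
    stop("Unrecognized package file: ", path)
  }
}

# Hypothetical helper: install a package from a local file if it exists.
install_local <- function(path) {
  if (!file.exists(path)) {
    message("File not found in ", getwd(), ": ", path)
    return(invisible(FALSE))
  }
  # repos = NULL tells R to install from the file, not from a repository
  install.packages(path, repos = NULL, type = local_pkg_type(path))
  invisible(TRUE)
}

# Assumes the download was saved to the current working directory
install_local("CCLE_0.1.1.tar.gz")
```

If the file is somewhere else, either pass its full path or change the working directory with setwd() first.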

For your reference:

Preprocessing scripts: CCLE_preprocess.zip
Stats2015_IntegratedAssignment_Heatmap.R


Day 2


Module 4: Clustering Analysis

*Faculty: Boris Steipe*

Lecture:

Stats2015_Module4.pdf
Stats2015_Module4.ppt
Stats2015_Module4.mp4

Scripts:

Links:

Dataset:

If you use this file, load it with

load("gset.RData")

on the command line. (Check that ‘gset’ is actually lower case in the folder. You might need a capital letter at the start.)

load("platf.RData")
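
The upper/lower case check can also be done from within R. Below is a minimal sketch; the helper `find_file_any_case` is a hypothetical name, and the file is assumed to be in the current working directory.

```r
# Hypothetical helper: return the path(s) of files in 'dir' whose name
# matches 'name' when upper/lower case is ignored.
find_file_any_case <- function(name, dir = ".") {
  candidates <- list.files(dir)
  hits <- candidates[tolower(candidates) == tolower(name)]
  file.path(dir, hits)
}

gset_file <- find_file_any_case("gset.RData")
if (length(gset_file) == 1) {
  load(gset_file)  # restores the stored object(s) under their saved names
} else {
  message("Expected exactly one match for gset.RData, found ", length(gset_file))
}
```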

R object file: GSE26922.rds
Read with:

   gset <- readRDS("GSE26922.rds")
   # do not run the following line:
   gset <- gset[[idx]]
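
The two file types above are read differently: load() restores objects under the names they were saved with, while readRDS() returns a single object that you assign to a name of your choice. The sketch below illustrates this with a small stand-in object and temporary files; `demo` and the file names are made up for the example and are not the workshop data.

```r
# Stand-in object; the real gset comes from the workshop files
demo <- list(id = "demo", values = c(1.5, 2.5, 3.5))

rdata_file <- tempfile(fileext = ".RData")
rds_file   <- tempfile(fileext = ".rds")

save(demo, file = rdata_file)    # stores the object together with its name
saveRDS(demo, file = rds_file)   # stores only the object itself

rm(demo)

restored_names <- load(rdata_file)  # recreates 'demo'; returns the restored name(s)
my_copy <- readRDS(rds_file)        # returns the object; you choose the name

print(restored_names)
print(identical(demo, my_copy))
```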

Module 5: Hypothesis Testing for EDA

*Faculty: Boris Steipe*

Lecture:

Stats2015_Module5.pdf
Stats2015_Module5.ppt
Stats2015_Module5.mp4

Script:

2015_EDA_Module_5_HypothesisTesting_Corrected.R

Links:


Integrated Assignment - Part 2

*Faculty: Catalina Anghel and David Shih*

Assignment Part 2:

Stats2015_IntegratedAssignment_Part2.pdf

Questions in R - Part 2:

Stats2015_IntegratedAssignment_Part2_Questions.R

Answer Key - Part 2:

Stats2015_IntegratedAssignment_Part2_AnswerKey.R



Other Readings

Other (more advanced) resources:

Manuals:

A more detailed introduction to R. This is not a basic tutorial; it is for people who really want to know more about R.

http://cran.r-project.org/doc/manuals/R-intro.html

Books:

1) “Introductory Statistics with R” by Peter Dalgaard. It is not required for this workshop but if you are interested in buying a good book and/or want to know more, you might want to consider getting a copy. The UofT library has an online version.

2) “Bioinformatics and Computational Biology Solutions Using R and Bioconductor” (Statistics for Biology and Health series), edited by Robert Gentleman, Vincent Carey, Wolfgang Huber, Rafael Irizarry and Sandrine Dudoit

3) “Building Bioinformatics Solutions with Perl, R and MySQL” by Conrad Bessant, Ian Shadforth and Darren Oakley

Post-workshop Readings
