Microarray Data Analysis 2013

Workshop pages for students

Laptop Setup Instructions

Instructions for setting up your laptop can be found here: Laptop Setup Instructions

Pre-Workshop Tutorials

1) R Preparation tutorials: You are expected to have completed the following tutorials in R beforehand. The tutorials should be very accessible even if you have never used R before.

2) UNIX Preparation tutorials:

R teaching material for Sunday June 23

Pre-Workshop Readings

The MicroArray Quality Control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements
Microarray data analysis: from disarray to consolidation and consensus

R Review Session

R review Module
R review Scripts

Data Sets:

  • .zip CEL files containing:

    • GSM429557_ST486_Fox1.CEL
    • GSM429558_ST486_Fox2.CEL
    • GSM429559_ST486_Fox3.CEL
    • GSM429560_ST486_MYB1.CEL
    • GSM429561_ST486_MYB2.CEL
    • GSM429562_ST486_MYB3.CEL
    • GSM429563_ST486_NT1.CEL
    • GSM429564_ST486_NT2.CEL
    • GSM429565_ST486_NT3.CEL
  • Phenodata.txt

Day 1


*Faculty: Michelle Brazas*

Module 1: Introduction to Microarrays and R

*Faculty: Paul Boutros*


Module 1 pdf
Module 1 ppt
Module 1 mp4

Lab Practical:

Modules 1-3 Lab questions

Module 2: Quality Control of Microarrays

*Faculty: Paul Boutros*


Module 2 pdf
Module 2 ppt
Module 2 mp4

Lab Practical:

Modules 1-3 Lab questions
Day 1 analysis script

Integrated Assignment

*Faculty: Nicholas Harding*

Note: You will have to create your own phenotype data .txt file, using the sample annotations in the links.

phenotypedata.txt

Many people had issues with creating the phenotype data file. The phenotype data file must:

  • Be TAB delimited.
  • Contain a header with one fewer column than the other rows. The header also begins with a tab. This is because the first column, i.e. the file names, is read in as row names. For the difference between the row names of a data frame and a column, check the data frame documentation.
  • Contain no stray spaces. As the file is tab delimited, any leading or trailing spaces are incorporated into the cells. Be careful: ‘Control ’ is not the same as ‘Control’. Hint: some text editors have an option that displays whitespace characters.
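The layout rules above can be checked before the file is loaded into R. The following is a minimal Python sketch, not part of the workshop material. It assumes that the leading tab makes the header split into the same number of tab-separated fields as each sample row, with an empty first field. The function name `check_phenodata`, the column name `Treatment`, and the labels `Fox` and `Control` are hypothetical and used only for illustration.

```python
def check_phenodata(text):
    """Return a list of layout problems found in tab-delimited phenotype text."""
    lines = text.splitlines()
    if len(lines) < 2:
        return ["need a header line plus at least one sample row"]

    problems = []
    header = lines[0].split("\t")
    # A header that begins with a tab splits into an empty first field.
    if header[0] != "":
        problems.append("line 1: header does not start with a tab")

    for line_no, line in enumerate(lines, start=1):
        fields = line.split("\t")
        # Every line should split into the same number of fields as the header.
        if len(fields) != len(header):
            problems.append(
                f"line {line_no}: {len(fields)} fields, expected {len(header)}"
            )
        # Leading or trailing spaces become part of the cell value.
        for cell in fields:
            if cell != cell.strip():
                problems.append(f"line {line_no}: stray whitespace in {cell!r}")
    return problems


# Hypothetical example content; the sample labels are assumptions.
good = (
    "\tTreatment\n"
    "GSM429557_ST486_Fox1.CEL\tFox\n"
    "GSM429563_ST486_NT1.CEL\tControl\n"
)
bad = good.replace("Control\n", "Control \n")  # adds a trailing space

print(check_phenodata(good))
print(check_phenodata(bad))
```

To check a real file, read its contents with `open(path, encoding="utf-8").read()` and pass the result to `check_phenodata`. An empty list means none of the three problems were detected; it does not guarantee that R will accept the file.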

Remember that you can check that your file has been read in correctly using the:


function, which returns your phenotype annotation as a dataframe.

For further help see


This points you to another function that loads the dataframe, and tells you exactly what it is expecting.


You can troubleshoot any problems with your phenotype data using this function directly.

Integrated Assignment Data

For annotations:

Note: The 11 samples above are the same as in the link below, only this time they are part of a larger set. Use the link below to prepare the PhenoData file:

For the CDF file: download the alternative-CDF package from: http://brainarray.mbni.med.umich.edu/Brainarray/Database/CustomCDF/17.1.0/entrezg.asp


  • .zip file of rat CEL files containing:
    • GSM273072.CEL
    • GSM273073.CEL
    • GSM273074.CEL
    • GSM273075.CEL
    • GSM273076.CEL
    • GSM273077.CEL
    • GSM273078.CEL
    • GSM273079.CEL


  • .zip file of mouse CEL files containing:

    • GSM254871.CEL
    • GSM254872.CEL
    • GSM254873.CEL
    • GSM254877.CEL
    • GSM254878.CEL
    • GSM254879.CEL
    • GSM254880.CEL
    • GSM254881.CEL
    • GSM254882.CEL
    • GSM254883.CEL
    • GSM254885.CEL

Day 2

Module 3: Statistical Analysis

*Faculty: Paul Boutros*


Module 3 pdf
Module 3 ppt
Module 3 mp4

Clustering Slides

Lab Practical: Modules 1-3 Lab questions
Status of R script at 11:55am
Status of R script at 12:33pm
Status of R script at 4:24pm
R script with MAS5

Module 4: Beyond the Microarray Experiment

*Faculty: Paul Boutros*


Module 4 pdf
Module 4 ppt
Module 4 mp4

Other (more advanced) resources


A more detailed introduction to R. This is not a basic tutorial; it is for people who really want to know more about R.



1) “Introductory Statistics with R” by Peter Dalgaard. It is not required for this workshop but if you are interested in buying a good book and/or want to know more, you might want to consider getting a copy.

Sections 1-5 give a very good (perhaps very detailed) idea of what I plan to discuss during the workshop.

2) “Bioinformatics and Computational Biology Solutions Using R and Bioconductor” (Statistics for Biology and Health series), edited by Robert Gentleman, Vincent Carey, Wolfgang Huber, Rafael Irizarry and Sandrine Dudoit

3) Building Bioinformatics Solutions with Perl, R and MySQL by Conrad Bessant, Ian Shadforth and Darren Oakley
