Informatics on High-Throughput Sequencing Data 2016

Workshop pages for students

Course Schedule

Schedule for June 9 to June 10, 2016

Workshop Q/A Forum

Post your workshop questions here!

Workshop Survey

We value your feedback. Please fill out our survey to help us make our workshops better.

Class Photo

Class photo

Laptop Setup Instructions

Instructions for setting up your laptop can be found here.

Pre-workshop Tutorials

1) R Preparation tutorials: You are expected to have completed the following tutorials in R beforehand. The tutorials should be very accessible even if you have never used R before.

2) UNIX Preparation tutorials: Please complete tutorials #1-3 on UNIX at http://www.ee.surrey.ac.uk/Teaching/Unix/

3) IGV Tutorial: Review how to use IGV Genome Browser if you have not used this tool before.

Pre-workshop Readings

Before coming to the workshop, read these.

Logging into the Amazon Cloud

Instructions can be found here.

  • We have set up 30 instances on the Amazon cloud, one for each student. To log in to your instance, you will need a security certificate. If you plan on using Linux or Mac OS X, please download this certificate. If you plan on using Windows (with PuTTY and WinSCP), please download this certificate instead.

YouTube Playlist for Recorded Lectures

Recorded Lectures’ Playlist

Day 1


Ann Meyer

Module 1: Introduction to HT-sequencing and Cloud Computing

Zhibin Lu


Recorded Lecture

Module 2: Genome Alignment

Mathieu Bourgey


Recorded Lecture

Lab practical


Additional Resources:

Module 3: Genome Visualization

Florence Cavalli


Recorded Lecture

Lab practical

Module 4: De Novo Assembly

Jared Simpson


Recorded Lecture

Integrated Assignment

Florence Cavalli

We will perform the same analysis as in Module 2, but using the mother and father samples, i.e. samples NA12891 and NA12892.

Files are in the following directory of the cloud instance: ~/CourseData/HT_data/Module2/

 * raw_reads/NA12891_CBW_chr1_R1.fastq.gz
 * raw_reads/NA12891_CBW_chr1_R2.fastq.gz
 * raw_reads/NA12892_CBW_chr1_R1.fastq.gz
 * raw_reads/NA12892_CBW_chr1_R2.fastq.gz

# Set up environment variables for the working directory, tools and reference
export ROOT_DIR=~/workspace/Integrated_assignment
export TRIMMOMATIC_JAR=$ROOT_DIR/tools/Trimmomatic-0.36/trimmomatic-0.36.jar
export PICARD_JAR=$ROOT_DIR/tools/picard-tools-1.141/picard.jar
export GATK_JAR=$ROOT_DIR/tools/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar
export BVATOOLS_JAR=$ROOT_DIR/tools/bvatools-1.6/bvatools-1.6-full.jar
export REF=$ROOT_DIR/reference/

# Create a directory to work in (workspace/Integrated_assignment)
# this is where we'll place all of our output files
mkdir -p "$ROOT_DIR"

# Move into the working directory BEFORE deleting anything,
# so that rm only touches files inside it
cd "$ROOT_DIR"

# Erase any files that might already be there
rm -f ./*

# Create symbolic links for all of the files contained in the Module2 directory
# this includes the hg19 genome and the FASTQ files
ln -s ~/CourseData/HT_data/Module2/* .
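
Before starting on the tasks, it can help to confirm that both read files are visible for each parent sample. The following is a minimal sketch, not part of the official assignment script: the function name `check_inputs` is hypothetical, and it assumes you run it from the directory that contains the `raw_reads` links, with the file names exactly as listed above.

```shell
# Report whether each expected paired-end FASTQ file is present.
# Returns 0 if all four files are found, 1 otherwise.
check_inputs() {
  local sample read path status=0
  for sample in NA12891 NA12892; do
    for read in R1 R2; do
      path="raw_reads/${sample}_CBW_chr1_${read}.fastq.gz"
      if [ -e "$path" ]; then
        echo "found   $path"
      else
        echo "missing $path"
        status=1
      fi
    done
  done
  return $status
}

# Run the check; print a hint instead of aborting if something is missing
check_inputs || echo "Some input files are missing: check the symbolic links."
```

The same two nested loops (over sample, then over R1/R2) are a convenient pattern for the rest of the assignment, since every step has to be repeated for both parents.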

Task list:

  1. Check read QC

  2. Trim unreliable bases from the read ends

  3. Align the reads to the reference

  4. Sort the alignments by chromosome position

  5. Realign short indels

  6. Fix mate issues

  7. Recalibrate the Base Quality

  8. Generate alignment metrics


  1. Explain the purpose of each step

  2. Which software tool can be used for each step?

Integrated Assignment script

Day 2

Module 5: Genome Variation

Guillaume Bourque


Recorded Lecture

Lab practical

VCF format

Pro-tip: A great resource for putting together a GATK-based variant calling pipeline is the GATK Best Practices page. This page will guide you in your quest to produce the best variant calls possible using GATK.

Pro-tip 2: Another useful program for generating summary statistics on VCF files, filtering VCF files, and comparing multiple VCF files is vcftools.
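
To get a feel for the kind of summary statistic such a tool reports, the sketch below counts SNP and non-SNP records in an uncompressed VCF using only awk. It illustrates the VCF column layout and does not replace vcftools. The function name `count_variants` is hypothetical, and the classification is deliberately simplified: a record counts as a SNP only if REF and every ALT allele are single bases.

```shell
# Count SNP versus non-SNP records in an uncompressed VCF file.
# Column 4 is REF and column 5 is ALT (comma-separated if multi-allelic).
count_variants() {
  awk -F'\t' '
    /^#/ { next }                          # skip meta-information and header lines
    {
      n = split($5, alts, ",")
      is_snp = ($4 ~ /^[ACGTNacgtn]$/)     # REF must be a single base
      for (i = 1; i <= n; i++)
        if (alts[i] !~ /^[ACGTNacgtn]$/) is_snp = 0
      if (is_snp) snps++; else others++
    }
    END { printf "SNPs\t%d\nnon-SNPs\t%d\n", snps, others }
  ' "$1"
}
```

Call it as `count_variants my_calls.vcf`, where `my_calls.vcf` stands in for whichever VCF you produced in the lab.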


Module 6: Genome Structural Variation

Guillaume Bourque


Recorded Lecture

Lab practical


Module 7: Bringing it Together with Galaxy

David Morais


Recorded Lecture

Lab practical

Data set:





Note: ## is your student number.

Galaxy workflow part 1 (cloud):

Galaxy workflow part 2 (main instance):

What you need for the lab:

Galaxy Resources:

Data for the Workshop

Tool Installation

Instructions for installing the tools used in the workshops can be found here.

Data Sets

Results from Instructor’s Instance on Amazon
