
Informatics on High-Throughput Sequencing Data 2016

Workshop pages for students



Course Schedule

Schedule for June 9 to June 10, 2016

Workshop Q/A Forum

Post your workshop questions here!

Workshop Survey

We value your feedback. Please fill out our survey to help us make our workshops better.

Class Photo

Class photo

Laptop Setup Instructions

Instructions to set up your laptop can be found here.

Pre-workshop Tutorials

1) R Preparation tutorials: You are expected to have completed the following tutorials in R beforehand. The tutorials should be very accessible even if you have never used R before.

2) UNIX Preparation tutorials: Please complete tutorials #1-3 on UNIX at http://www.ee.surrey.ac.uk/Teaching/Unix/

3) IGV Tutorial: Review how to use IGV Genome Browser if you have not used this tool before.

Pre-workshop Readings

Before coming to the workshop, read these.

Logging into the Amazon Cloud

Instructions can be found here.

  • We have set up 30 instances on the Amazon cloud, one for each student. To log in to your instance, you will need a security certificate. If you plan on using Linux or Mac OS X, please download this certificate. If you plan on using Windows (with PuTTY and WinSCP), please download this certificate.

YouTube Playlist for Recorded Lectures

Recorded Lectures’ Playlist


Day 1

Welcome

Ann Meyer


Module 1: Introduction to HT-sequencing and Cloud Computing

Zhibin Lu

Lecture

Recorded Lecture


Module 2: Genome Alignment

Mathieu Bourgey

Lecture

Recorded Lecture

Lab practical

Programs:

Additional Resources:


Module 3: Genome Visualization

Florence Cavalli

Lecture

Recorded Lecture

Lab practical


Module 4: De Novo Assembly

Jared Simpson

Lecture

Recorded Lecture


Integrated Assignment

Florence Cavalli

We will perform the same analysis as in Module 2, but using the father and mother samples, i.e. samples NA12891 and NA12892.

Files are in the following directory of the cloud instance: ~/CourseData/HT_data/Module2/

 * raw_reads/NA12891_CBW_chr1_R1.fastq.gz
 * raw_reads/NA12891_CBW_chr1_R2.fastq.gz
 * raw_reads/NA12892_CBW_chr1_R1.fastq.gz
 * raw_reads/NA12892_CBW_chr1_R2.fastq.gz

# Set up
export ROOT_DIR=~/workspace/Integrated_assignment
export TRIMMOMATIC_JAR=$ROOT_DIR/tools/Trimmomatic-0.36/trimmomatic-0.36.jar
export PICARD_JAR=$ROOT_DIR/tools/picard-tools-1.141/picard.jar
export GATK_JAR=$ROOT_DIR/tools/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar
export BVATOOLS_JAR=$ROOT_DIR/tools/bvatools-1.6/bvatools-1.6-full.jar
export REF=$ROOT_DIR/reference/

# Create a directory to work in (workspace/Integrated_assignment)
# this is where we'll place all of our output files

mkdir -p $ROOT_DIR
cd $ROOT_DIR

# Erase any files that might already be there
rm *

# Create symbolic links for all of the files contained in the Module2 directory
# this includes the hg19 genome and the FASTQ files
ln -s ~/CourseData/HT_data/Module2/* .
ls
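
Before starting the task list, it can help to confirm that all four FASTQ files are visible from the working directory. The following is a minimal sketch, assuming the symbolic links were created as shown above so that raw_reads/ sits directly under the working directory; the function name check_inputs is hypothetical and not part of the workshop material.

```shell
#!/usr/bin/env bash
# check_inputs DIR: report whether each expected FASTQ file exists under DIR.
# Returns 0 if all four files are present, 1 otherwise.
# (check_inputs is an illustrative helper, not a workshop tool.)
check_inputs() {
    local dir="${1:-.}"
    local missing=0
    local sample read fq
    for sample in NA12891 NA12892; do
        for read in R1 R2; do
            fq="$dir/raw_reads/${sample}_CBW_chr1_${read}.fastq.gz"
            if [ -e "$fq" ]; then
                echo "found:   $fq"
            else
                echo "MISSING: $fq"
                missing=1
            fi
        done
    done
    return "$missing"
}
```

Calling `check_inputs "$ROOT_DIR"` lists each file and returns a non-zero status if any of them is missing.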

Task list:

  1. Check read QC

  2. Trim unreliable bases from the read ends

  3. Align the reads to the reference

  4. Sort the alignments by chromosome position

  5. Realign short indels

  6. Fix mate issues

  7. Recalibrate the Base Quality

  8. Generate alignment metrics
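
Every task has to be repeated for both parents, so one way to organize the work is a loop over the sample names. The sketch below covers only step 2 (trimming) and prints the Trimmomatic paired-end command instead of running it, so it can be inspected first. It assumes TRIMMOMATIC_JAR is exported as in the setup above. The thresholds (TRAILING:20, MINLEN:32), the reads/ output directory, the output file names and the function name trim_cmd are illustrative assumptions, not values prescribed by the assignment.

```shell
#!/usr/bin/env bash
# Dry-run sketch of step 2 (trimming) for both parents.
# Assumptions: TRIMMOMATIC_JAR is exported as in the setup; the thresholds
# and the output names under reads/ are illustrative only.

# trim_cmd SAMPLE: print the Trimmomatic paired-end command for one sample.
# Argument order after PE: two inputs, then paired/unpaired outputs for
# read 1, then paired/unpaired outputs for read 2, then the trimming steps.
trim_cmd() {
    local sample="$1"
    local src="raw_reads/${sample}_CBW_chr1"
    local dst="reads/${sample}_CBW_chr1"
    echo java -jar "${TRIMMOMATIC_JAR:-trimmomatic.jar}" PE -phred33 \
        "${src}_R1.fastq.gz" "${src}_R2.fastq.gz" \
        "${dst}_R1.trimmed.fastq.gz" "${dst}_R1.unpaired.fastq.gz" \
        "${dst}_R2.trimmed.fastq.gz" "${dst}_R2.unpaired.fastq.gz" \
        TRAILING:20 MINLEN:32
}

# Print one command per sample.
for sample in NA12891 NA12892; do
    trim_cmd "$sample"
done
```

To execute a printed command rather than just display it, create the output directory first (`mkdir -p reads`) and remove the leading `echo` inside trim_cmd. The remaining steps can follow the same per-sample pattern.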

Discussion/Questions:

  1. Explain the purpose of each step

  2. Which software tool can be used for each step?

Integrated Assignment script


Day 2

Module 5: Genome Variation

Guillaume Bourque

Lecture

Recorded Lecture

Lab practical

VCF format

Pro-tip: A great resource for putting together a GATK-based variant calling pipeline is the GATK Best Practices page. This page will guide you in your quest to produce the best variant calls possible using GATK.

Pro-tip 2: Another useful program for generating summary statistics on VCF files, filtering VCF files, and comparing multiple VCF files is vcftools.

Programs:


Module 6: Genome Structural Variation

Guillaume Bourque

Lecture

Recorded Lecture

Lab practical

Programs:


Module 7: Bringing it Together with Galaxy

David Morais

Lecture

Recorded Lecture

Lab practical

Data set:

NA12878_CBW_chr1_R1.fastq.gz
http://cbw##.dyndns.info/HTSeq_module2/raw_reads/NA12878/NA12878_CBW_chr1_R1.fastq.gz

NA12878_CBW_chr1_R2.fastq.gz
http://cbw##.dyndns.info/HTSeq_module2/raw_reads/NA12878/NA12878_CBW_chr1_R2.fastq.gz

hg19_chr1.fa
http://cbw##.dyndns.info/Module7/hg19_chr1.fa

dbSNP_135_chr1.vcf.gz
http://cbw##.dyndns.info/HTSeq_module2/reference/dbSNP_135_chr1.vcf.gz

Note: ## is your student number.
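
The substitution can also be scripted. The following is a small sketch that prints the four data set URLs for a given student number; the function name dataset_urls is hypothetical, and the number is inserted verbatim, so pass it exactly as it was assigned to you (whether it needs a leading zero is not stated here).

```shell
#!/usr/bin/env bash
# dataset_urls N: print the four Module 7 data set URLs for student number N.
# N replaces the "##" placeholder verbatim; no zero-padding is applied.
# (dataset_urls is an illustrative helper, not a workshop tool.)
dataset_urls() {
    local host="http://cbw${1}.dyndns.info"
    echo "$host/HTSeq_module2/raw_reads/NA12878/NA12878_CBW_chr1_R1.fastq.gz"
    echo "$host/HTSeq_module2/raw_reads/NA12878/NA12878_CBW_chr1_R2.fastq.gz"
    echo "$host/Module7/hg19_chr1.fa"
    echo "$host/HTSeq_module2/reference/dbSNP_135_chr1.vcf.gz"
}
```

For example, `dataset_urls 07` (07 being a made-up student number) prints one URL per line, ready to paste into an upload form or to pipe into `xargs -n 1 wget`.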

Galaxy workflow part 1 (cloud):

Galaxy workflow part 2 (main instance):

What you need for the lab:

Galaxy Resources:

  • Galaxy home page
  • Galaxy public server
  • Source for installing local Galaxy
  • Galaxy in the Cloud


Data for the Workshop

Tool Installation

Instructions for installing the tools used in the workshops can be found here.

Data Sets

Results from Instructor’s Instance on Amazon

