Course Schedule
Schedule for June 9 to June 10, 2016
Workshop Q/A Forum
Post your workshop questions here!
Workshop Survey
We value your feedback. Please fill out our survey to help us make our workshops better.
Class Photo
Laptop Setup Instructions
Instructions to set up your laptop can be found here.
Pre-workshop Tutorials
1) R Preparation tutorials: You are expected to have completed the following tutorials in R beforehand. The tutorials should be very accessible even if you have never used R before.
2) UNIX Preparation tutorials: Please complete tutorials #1-3 on UNIX at http://www.ee.surrey.ac.uk/Teaching/Unix/
3) IGV Tutorial: Review how to use the IGV genome browser if you have not used this tool before.
Pre-workshop Readings
Please read the following papers before coming to the workshop.
- Using cloud computing infrastructure with CloudBioLinux, CloudMan, and Galaxy
- Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration
- A survey of sequence alignment algorithms for next-generation sequencing
- Genotype and SNP calling from next-generation sequencing data
Logging into the Amazon Cloud
Instructions can be found here.
- We have set up 30 instances on the Amazon cloud, one for each student. To log in to your instance, you will need a security certificate. If you plan on using Linux or Mac OS X, please download this certificate. If you plan on using Windows (with PuTTY and WinSCP), please download this certificate.
YouTube Playlist for Recorded Lectures
Day 1
Welcome
Ann Meyer
Module 1: Introduction to HT-sequencing and Cloud Computing
Zhibin Lu
Module 2: Genome Alignment
Mathieu Bourgey
Programs:
Additional Resources:
- SEQanswers bioinformatics forum
- SAM/BAM file format specification
- Base qualities vs mapping qualities
- The decoy genome
- FastQC Good/Bad Examples
Module 3: Genome Visualization
Florence Cavalli
Module 4: De Novo Assembly
Jared Simpson
Integrated Assignment
Florence Cavalli
We will perform the same analysis as in Module 2, but using the two parent samples (mother and father), NA12891 and NA12892.
Files are in the following directory of the cloud instance: ~/CourseData/HT_data/Module2/
* raw_reads/NA12891_CBW_chr1_R1.fastq.gz
* raw_reads/NA12891_CBW_chr1_R2.fastq.gz
* raw_reads/NA12892_CBW_chr1_R1.fastq.gz
* raw_reads/NA12892_CBW_chr1_R2.fastq.gz
#set up
export ROOT_DIR=~/workspace/Integrated_assignment
export TRIMMOMATIC_JAR=$ROOT_DIR/tools/Trimmomatic-0.36/trimmomatic-0.36.jar
export PICARD_JAR=$ROOT_DIR/tools/picard-tools-1.141/picard.jar
export GATK_JAR=$ROOT_DIR/tools/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar
export BVATOOLS_JAR=$ROOT_DIR/tools/bvatools-1.6/bvatools-1.6-full.jar
export REF=$ROOT_DIR/reference/
# Create a directory to work in (workspace/Integrated_assignment)
# this is where we'll place all of our output files
mkdir -p $ROOT_DIR
cd $ROOT_DIR
# Erase any files that might already be there
rm *
# Create symbolic links for all of the files contained in the Module2 directory
# this includes the hg19 genome and the FASTQ files
ln -s ~/CourseData/HT_data/Module2/* .
ls
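Once the links are in place, it can help to confirm that the four parent FASTQ files are reachable before starting the analysis. The following is a minimal sketch, not part of the course material: `check_inputs` is a hypothetical helper, and it assumes the files sit at the relative paths listed above.

```shell
# Hypothetical helper (an assumption, not a course script): report which of
# the expected parent FASTQ files exist under the given directory.
# Returns the number of missing files (0 means everything was found).
check_inputs() {
    dir="$1"
    missing=0
    for sample in NA12891 NA12892; do
        for read in R1 R2; do
            path="$dir/raw_reads/${sample}_CBW_chr1_${read}.fastq.gz"
            if [ -e "$path" ]; then
                echo "ok      $path"
            else
                echo "missing $path"
                missing=$((missing + 1))
            fi
        done
    done
    return "$missing"
}
```

Running `check_inputs "$ROOT_DIR"` after the `ln -s` step prints one line per file and returns a non-zero status if any file is absent.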
Task list:
- Check read QC
- Trim unreliable bases from the read ends
- Align the reads to the reference
- Sort the alignments by chromosome position
- Realign short indels
- Fix mate issues
- Recalibrate the base qualities
- Generate alignment metrics
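Because every task has to be repeated for both parent samples, one way to organize the work is a small driver that calls a step once per sample. This is only a sketch under stated assumptions: `for_each_sample` and `show_inputs` are hypothetical names, the FASTQ paths follow the file list given earlier, and the placeholder step prints its inputs instead of running any tool.

```shell
# Hypothetical driver (an assumption, not a course script): call the given
# step function once per parent sample, passing the sample name and the
# paths of its two paired-end FASTQ files.
for_each_sample() {
    step="$1"
    for sample in NA12891 NA12892; do
        "$step" "$sample" \
            "raw_reads/${sample}_CBW_chr1_R1.fastq.gz" \
            "raw_reads/${sample}_CBW_chr1_R2.fastq.gz" || return 1
    done
}

# Placeholder step: only prints what it would process.
# Replace the body with the command for the task you are working on.
show_inputs() {
    echo "$1: $2 $3"
}
```

Calling `for_each_sample show_inputs` prints one line per sample. Each task in the list can then be written as its own step function and run the same way, so both samples are always processed with identical commands.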
Discussion/Questions:
- Explain the purpose of each step
- Which software tool can be used for each step?
Day 2
Module 5: Genome Variation
Guillaume Bourque
Pro-tip: A great resource for putting together a GATK-based variant calling pipeline is the GATK Best Practices page. This page will guide you in your quest to produce the best variant calls possible using GATK.
Pro-tip 2: Another useful program for generating summary statistics on VCF files, filtering VCF files, and comparing multiple VCF files is vcftools.
Programs:
Module 6: Genome Structural Variation
Guillaume Bourque
Programs:
Module 7: Bringing it Together with Galaxy
David Morais
Data set:
NA12878_CBW_chr1_R1.fastq.gz
http://cbw##.dyndns.info/HTSeq_module2/raw_reads/NA12878/NA12878_CBW_chr1_R1.fastq.gz
NA12878_CBW_chr1_R2.fastq.gz
http://cbw##.dyndns.info/HTSeq_module2/raw_reads/NA12878/NA12878_CBW_chr1_R2.fastq.gz
hg19_chr1.fa
http://cbw##.dyndns.info/Module7/hg19_chr1.fa
dbSNP_135_chr1.vcf.gz
http://cbw##.dyndns.info/HTSeq_module2/reference/dbSNP_135_chr1.vcf.gz
Note: ## is your student number.
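To avoid editing each address by hand, the substitution can be scripted. The sketch below is illustrative only: `dataset_urls` is a hypothetical helper, and the two-digit number `07` in the usage note is an assumed example of a student number.

```shell
# Hypothetical helper (an assumption, not a course script): print the four
# data set URLs with "##" replaced by the student number given as $1.
dataset_urls() {
    student="$1"
    base="http://cbw${student}.dyndns.info"
    printf '%s\n' \
        "$base/HTSeq_module2/raw_reads/NA12878/NA12878_CBW_chr1_R1.fastq.gz" \
        "$base/HTSeq_module2/raw_reads/NA12878/NA12878_CBW_chr1_R2.fastq.gz" \
        "$base/Module7/hg19_chr1.fa" \
        "$base/HTSeq_module2/reference/dbSNP_135_chr1.vcf.gz"
}
```

For example, `dataset_urls 07` prints the four addresses on the `cbw07.dyndns.info` host, one per line, ready to copy wherever the lab asks for a file URL.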
Galaxy workflow part 1 (cloud):
Galaxy workflow part 2 (main instance):
What you need for the lab:
- Galaxy public server
- An account on Galaxy to run tools in their environment.
Galaxy Resources:
- Galaxy home page
- Galaxy public server
- Source for installing local Galaxy
- Example of Galaxy pipeline
- Galaxy 101 worked example
- Galaxy servers throughout the world
- Published pages
Data for the Workshop
Tool Installation
Instructions for installing the tools used in the workshops can be found here.