High-Throughput Sequencing 2015 Student Page

Laptop Setup Instructions

Instructions for setting up your laptop can be found here: Laptop Setup Instructions

Pre-Workshop Tutorials

1) R Preparation tutorials: You are expected to have completed the following R tutorials beforehand. They should be accessible even if you have never used R before.

The R Tutorial up to and including 5. Basic Plots
The R command cheat sheet

2) UNIX Preparation tutorials:

UNIX Bootcamp
Tutorials 1-3 of the UNIX Tutorial for Beginners
Unix Cheat sheet

3) IGV Tutorial: If you have not used the IGV genome browser before, review how to use it.

The IGV Tutorial

Pre-Workshop Readings

Using cloud computing infrastructure with CloudBioLinux, CloudMan, and Galaxy

Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration

Genome structural variation discovery and genotyping

A survey of sequence alignment algorithms for next-generation sequencing

Genotype and SNP calling from next-generation sequencing data

Logging into the Amazon cloud

Instructions can be found here.

These instructions are relevant ONLY in class, as the cloud will not be accessible from home before the workshop.

Day 1

Welcome

*Faculty: Michelle Brazas*

Module 1: Overview of HT-sequencing & Cloud Computing

*Faculty: Zhibin Lu*

Lecture:

HT-seq2015_Module1.pdf
HT-seq2015_Module1.ppt
HT-seq2015_Module1.mp4

Module 2: Reference-guided Genome Alignment

*Faculty: Matei David*

Lecture:

HT-seq2015_Module2.pdf
HT-seq2015_Module2.mp4

Lab Practical:

Reference Guided Genome Alignment Lab practical

Discussion questions

Data set:

After the workshop: You can download the data set from here. If you want to do the lab practical on your own machine and do not already have a reference genome, you will also need to download one.
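A downloaded reference genome usually has to be indexed before it can be used for alignment, viewing and GATK steps. The following is a minimal sketch, assuming bwa, samtools and Picard are installed, that the reference is a FASTA file, and that Picard is available as `picard.jar` (override with `PICARD_JAR`). The helper name `prepare_reference` and the default `hg19.fa` are hypothetical. Setting `DRY_RUN=1` prints the commands instead of running them.

```shell
#!/usr/bin/env bash
set -eo pipefail

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: build the index files that BWA, samtools/IGV and
# GATK expect to find next to a reference FASTA.
prepare_reference() {
  local ref="${1:-hg19.fa}"
  local picard_jar="${PICARD_JAR:-picard.jar}"
  # BWA index (.amb .ann .bwt .pac .sa); bwtsw suits large genomes like hg19
  run bwa index -a bwtsw "$ref"
  # FASTA index (.fai)
  run samtools faidx "$ref"
  # Sequence dictionary (.dict), named after the FASTA minus its extension
  run java -jar "$picard_jar" CreateSequenceDictionary \
    R="$ref" O="${ref%.*}.dict"
}
```

For example, `DRY_RUN=1 prepare_reference ~/ref/hg19.fa` lists the three commands; drop `DRY_RUN=1` to actually build the indexes. Indexing a whole human genome with BWA can take an hour or more.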

Programs used:

Links to Additional Resources:

Module 3: Data Visualization

*Faculty: Sorana Morrissy*

Lecture:

HT-seq2015_Module3.pdf
HT-seq2015_Module3.ppt
HT-seq2015_Module3.mp4

Lab Practical:

Using the IGV to visualize HTS datasets

Programs used:

Module 4: De Novo Assembly

*Faculty: Jared Simpson*

Lecture:

HT-seq2015_Module4.pdf
HT-seq2015_Module4.mp4

Integrated Assignment for Day 1

*Faculty: Sorana Morrissy*

Review the techniques learned in Modules 1-3. An additional dataset (fastq file) has been provided here for this purpose.

# Create a directory to work in:
# this is where we'll place all of our output files
mkdir -p ~/workspace/Integrated_assignment
cd ~/workspace/Integrated_assignment
# Erase any files that might already be there
# (-f avoids an error message if the directory is empty)
rm -f ./*
# Create symbolic links for all of the files contained in the Module2 directory
# this includes the hg19 genome, the FASTQ files, and dbSNP annotation
ln -s ~/CourseData/HT_data/Module2/* .
ls

Task list:

Align the raw data to the human reference genome.
Sort the reads and perform duplicate removal.
Index the sorted bam file.
Perform indel cleaning.
Visualize the alignments.

Discussion/Questions:

Explain the purpose of each step.
Which software tool can be used for each step?

Integrated Assignment: IA_Question_Answers_2015.txt

Day 2

Module 5: Small variant calling & annotation

*Faculty: Guillaume Bourque*

Lecture:

HT-seq2015_Module5.pdf
HT-seq2015_Module5.ppt
HT-seq2015_Module5.mp4

Lab Practical:

Lab directions

VCF format
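VCF is a tab-separated text format: meta-information lines start with `##`, a single header line starts with `#CHROM`, and each data line has the fixed columns CHROM, POS, ID, REF, ALT, QUAL, FILTER and INFO, optionally followed by FORMAT and one column per sample. To illustrate that layout, here is a small awk sketch that counts SNPs and indels among records passing a QUAL cutoff. The helper names `vcf_counts` and `example_vcf`, the made-up records and the cutoff are all hypothetical, and the classification is deliberately simplified: equal-length single-base alleles count as SNPs, length changes as indels, and symbolic alleles such as `<DEL>` are skipped.

```shell
#!/usr/bin/env bash
set -eo pipefail

# Count SNPs and indels with QUAL >= min_qual in a VCF read from stdin.
# Multi-allelic ALT fields are split on "," and each allele is counted.
vcf_counts() {
  local min_qual="${1:-30}"
  awk -F'\t' -v minq="$min_qual" '
    /^#/ { next }                             # skip ## meta lines and the #CHROM header
    $6 == "." || $6 + 0 < minq + 0 { next }   # QUAL is column 6; "." means missing
    {
      n = split($5, alts, ",")                # ALT is column 5, REF is column 4
      for (i = 1; i <= n; i++) {
        if (alts[i] ~ /^</) continue          # symbolic allele, not classified
        if (length($4) == 1 && length(alts[i]) == 1) snps++
        else if (length($4) != length(alts[i])) indels++
        else other++                          # e.g. multi-nucleotide substitutions
      }
    }
    END { printf "SNPs\t%d\nindels\t%d\nother\t%d\n", snps, indels, other }
  '
}

# A tiny, invented VCF used only to exercise vcf_counts.
example_vcf() {
  printf '##fileformat=VCFv4.1\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf 'chr1\t100\t.\tA\tG\t50\tPASS\t.\n'    # SNP
  printf 'chr1\t200\t.\tAT\tA\t45\tPASS\t.\n'   # deletion
  printf 'chr1\t300\t.\tC\tCTT\t12\tPASS\t.\n'  # insertion with low QUAL
  printf 'chr1\t400\t.\tG\tA,T\t60\tPASS\t.\n'  # multi-allelic SNP
}
```

With a cutoff of 30, `example_vcf | vcf_counts 30` reports 3 SNPs and 1 indel, because the low-quality insertion is dropped. On real data you would run something like `vcf_counts 30 < calls.vcf`, where the file name is a placeholder.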

**Pro-tip:**

A great resource for putting together a GATK-based variant calling pipeline is the GATK Best Practices page, which walks through the recommended steps for producing high-quality variant calls with GATK.

**Pro-tip 2:**

vcftools is another useful program for generating summary statistics on VCF files, filtering VCF files, and comparing multiple VCF files.
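Each of those three uses corresponds to a vcftools option. The following is a minimal sketch, assuming vcftools 0.1.12 or later is on the PATH. The function name `vcftools_examples`, the input file names and the QUAL threshold of 30 are placeholders, and setting `DRY_RUN=1` prints the commands instead of running them.

```shell
#!/usr/bin/env bash
set -eo pipefail

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical helper showing one vcftools call per use case.
# $1 = VCF to summarise and filter, $2 = second VCF to compare against.
vcftools_examples() {
  local vcf="$1" other="$2"
  # Summary statistics: per-site allele frequencies, written to stats.frq
  run vcftools --vcf "$vcf" --freq --out stats
  # Filtering: keep sites with QUAL above 30, written to filtered.recode.vcf
  run vcftools --vcf "$vcf" --minQ 30 --recode --recode-INFO-all --out filtered
  # Comparison: sites shared by or unique to each file,
  # written to compare.diff.sites_in_files
  run vcftools --vcf "$vcf" --diff "$other" --diff-site --out compare
}
```

`--recode-INFO-all` keeps the original INFO fields in the filtered output. Without it, vcftools drops them when recoding.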

Data set:

After the workshop: You can download the data set from here to your local machine and work from there.

Programs used: