Post your workshop questions here!
We value your feedback. Please fill out our survey to help us make our workshops better.
Instructions to set up your laptop can be found here.
1) R Preparation tutorials: You are expected to have completed the following tutorials in R beforehand. The tutorials should be very accessible even if you have never used R before.
2) UNIX Preparation tutorials: Please complete tutorials #1-3 on UNIX at http://www.ee.surrey.ac.uk/Teaching/Unix/
3) IGV Tutorial: Review how to use IGV Genome Browser if you have not used this tool before.
Before coming to the workshop, read these.
Instructions can be found here.
- We have set up 30 instances on the Amazon cloud, one for each student. To log in to your instance, you will need a security certificate. If you plan on using Linux or Mac OS X, please download this certificate. If you plan on using Windows (with PuTTY and WinSCP), please download this certificate.
YouTube Playlist for Recorded Lectures
- SEQanswers bioinformatics forum
- SAM/BAM file format specification
- Base qualities vs mapping qualities
- The decoy genome
- FastQC Good/Bad Examples
We will perform the same analysis as in Module 2, but using the mother and father samples, i.e., samples NA12891 and NA12892.
Files are in the following directory of the cloud instance: ~/CourseData/HT_data/Module2/
* raw_reads/NA12891_CBW_chr1_R1.fastq.gz
* raw_reads/NA12891_CBW_chr1_R2.fastq.gz
* raw_reads/NA12892_CBW_chr1_R1.fastq.gz
* raw_reads/NA12892_CBW_chr1_R2.fastq.gz
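Before starting, it can help to confirm that all four FASTQ files are visible on your instance. The loop below is a minimal sketch of such a check: the helper name `list_parent_fastqs` is made up for illustration, and the directory is the one given above.

```shell
# list_parent_fastqs DATA_DIR
# Print the expected FASTQ path for each parent sample and read pair.
# Hypothetical helper name; it is not part of the course material.
list_parent_fastqs() {
    data_dir=$1
    for sample in NA12891 NA12892; do
        for read in R1 R2; do
            printf '%s/raw_reads/%s_CBW_chr1_%s.fastq.gz\n' "$data_dir" "$sample" "$read"
        done
    done
}

# Report which of the expected files are present on this machine.
list_parent_fastqs ~/CourseData/HT_data/Module2 | while read -r fq; do
    if [ -e "$fq" ]; then
        echo "found:   $fq"
    else
        echo "missing: $fq"
    fi
done
```

Any line reported as missing usually means the course data directory is not where the sketch assumes it is.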
#set up
export ROOT_DIR=~/workspace/Integrated_assignment
export TRIMMOMATIC_JAR=$ROOT_DIR/tools/Trimmomatic-0.36/trimmomatic-0.36.jar
export PICARD_JAR=$ROOT_DIR/tools/picard-tools-1.141/picard.jar
export GATK_JAR=$ROOT_DIR/tools/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar
export BVATOOLS_JAR=$ROOT_DIR/tools/bvatools-1.6/bvatools-1.6-full.jar
export REF=$ROOT_DIR/reference/

# Create a directory to work in (workspace/Integrated_assignment)
# this is where we'll place all of our output files
mkdir -p $ROOT_DIR
cd $ROOT_DIR

# Erase any files that might already be there
rm *

# Create symbolic links for all of the files contained in the Module2 directory
# this includes the hg19 genome and the FASTQ files
ln -s ~/CourseData/HT_data/Module2/* .
ls
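Every later command depends on the exported variables, so a quick check that each one points at an existing file can save debugging time. This is a minimal sketch, assuming the exports above have been run in the current shell; `check_jar` is a hypothetical helper, not a course tool.

```shell
# check_jar NAME PATH
# Report whether the jar file that variable NAME points to exists.
# Hypothetical helper for illustration; it only reads the file system.
check_jar() {
    if [ -n "$2" ] && [ -f "$2" ]; then
        echo "ok:      $1 -> $2"
    else
        echo "missing: $1 -> ${2:-<unset>}"
    fi
}

check_jar TRIMMOMATIC_JAR "$TRIMMOMATIC_JAR"
check_jar PICARD_JAR "$PICARD_JAR"
check_jar GATK_JAR "$GATK_JAR"
check_jar BVATOOLS_JAR "$BVATOOLS_JAR"
```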
Check read QC
Trim unreliable bases from the read ends
Align the reads to the reference
Sort the alignments by chromosome position
Realign short indels
Fix mate issues
Recalibrate the Base Quality
Generate alignment metrics
Explain the purpose of each step
Which software tool can be used for each step?
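One possible mapping from steps to tools is sketched below as a dry run: the function only prints candidate command lines, it does not execute them. Treat it as one reasonable answer under stated assumptions, not the official solution. The tool choices follow the jars exported in the setup, plus FastQC and BWA, which are assumed to be installed. The names `print_pipeline`, `$GENOME_FA` (reference FASTA), `$KNOWN_SITES` (known-variant VCF), the trimming thresholds, and all output file names are placeholders chosen for illustration.

```shell
# print_pipeline SAMPLE
# Print (do not run) one candidate command line per step for SAMPLE.
# Hypothetical names: print_pipeline, $GENOME_FA, $KNOWN_SITES, and every
# output file name. Trimming thresholds are example values only.
print_pipeline() {
    s=$1
    cat <<EOF
# 1. Check read QC (FastQC)
fastqc raw_reads/${s}_CBW_chr1_R1.fastq.gz raw_reads/${s}_CBW_chr1_R2.fastq.gz
# 2. Trim unreliable bases from the read ends (Trimmomatic)
java -jar \$TRIMMOMATIC_JAR PE -phred33 raw_reads/${s}_CBW_chr1_R1.fastq.gz raw_reads/${s}_CBW_chr1_R2.fastq.gz ${s}.t1.fastq.gz ${s}.s1.fastq.gz ${s}.t2.fastq.gz ${s}.s2.fastq.gz TRAILING:20 MINLEN:32
# 3. Align the reads to the reference (BWA-MEM)
bwa mem -M -R '@RG\tID:${s}\tSM:${s}\tPL:ILLUMINA' \$GENOME_FA ${s}.t1.fastq.gz ${s}.t2.fastq.gz > ${s}.sam
# 4. Sort the alignments by chromosome position (Picard SortSam)
java -jar \$PICARD_JAR SortSam INPUT=${s}.sam OUTPUT=${s}.sorted.bam SORT_ORDER=coordinate CREATE_INDEX=true
# 5. Realign short indels (GATK 3 RealignerTargetCreator + IndelRealigner)
java -jar \$GATK_JAR -T RealignerTargetCreator -R \$GENOME_FA -I ${s}.sorted.bam -o ${s}.intervals
java -jar \$GATK_JAR -T IndelRealigner -R \$GENOME_FA -I ${s}.sorted.bam -targetIntervals ${s}.intervals -o ${s}.realigned.bam
# 6. Fix mate issues (Picard FixMateInformation)
java -jar \$PICARD_JAR FixMateInformation INPUT=${s}.realigned.bam OUTPUT=${s}.matefixed.bam SORT_ORDER=coordinate CREATE_INDEX=true
# 7. Recalibrate the base quality (GATK 3 BaseRecalibrator + PrintReads)
java -jar \$GATK_JAR -T BaseRecalibrator -R \$GENOME_FA -knownSites \$KNOWN_SITES -I ${s}.matefixed.bam -o ${s}.recal.table
java -jar \$GATK_JAR -T PrintReads -R \$GENOME_FA -BQSR ${s}.recal.table -I ${s}.matefixed.bam -o ${s}.recal.bam
# 8. Generate alignment metrics (Picard CollectAlignmentSummaryMetrics)
java -jar \$PICARD_JAR CollectAlignmentSummaryMetrics REFERENCE_SEQUENCE=\$GENOME_FA INPUT=${s}.recal.bam OUTPUT=${s}.alignment_metrics.txt
EOF
}

print_pipeline NA12891
```

Printing the commands first makes it easy to review each step, and to swap in the options used in Module 2, before anything is run on the instance. Calling `print_pipeline NA12892` gives the same plan for the second sample.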
Pro-tip: A great resource for putting together a GATK-based variant calling pipeline is the GATK Best Practices page. It walks through the recommended steps for producing the best variant calls possible with GATK.
Pro-tip 2: Another useful program is vcftools, which can generate summary statistics on VCF files, filter VCF files, and compare multiple VCF files.
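As a rough illustration of those three uses, the sketch below prints one vcftools command for each. The function name `print_vcftools_examples`, both input file names, the quality threshold, and the `--out` prefixes are placeholders; nothing here is taken from the workshop data.

```shell
# print_vcftools_examples FIRST.vcf.gz SECOND.vcf.gz
# Print (do not run) one vcftools command for each use named above.
# Input names, the threshold of 30, and --out prefixes are placeholders.
print_vcftools_examples() {
    cat <<EOF
# Summary statistics: transition/transversion summary
vcftools --gzvcf $1 --TsTv-summary --out summary
# Filtering: drop sites with QUAL below 30 and write a new VCF
vcftools --gzvcf $1 --minQ 30 --recode --recode-INFO-all --out filtered
# Comparison: report sites shared by, or unique to, the two files
vcftools --gzvcf $1 --gzdiff $2 --diff-site --out compared
EOF
}

print_vcftools_examples first.vcf.gz second.vcf.gz
```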
NA12878_CBW_chr1_R1.fastq.gz    http://cbw##.dyndns.info/HTSeq_module2/raw_reads/NA12878/NA12878_CBW_chr1_R1.fastq.gz
NA12878_CBW_chr1_R2.fastq.gz    http://cbw##.dyndns.info/HTSeq_module2/raw_reads/NA12878/NA12878_CBW_chr1_R2.fastq.gz
hg19_chr1.fa                    http://cbw##.dyndns.info/Module7/hg19_chr1.fa
dbSNP_135_chr1.vcf.gz           http://cbw##.dyndns.info/HTSeq_module2/reference/dbSNP_135_chr1.vcf.gz
Note: ## is your student number.
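The substitution can be scripted so the same address template works for any student. This is a minimal sketch: `student_url` is a hypothetical helper, and the value `07` is only an example, since the exact format of the student number (for instance, whether it is zero-padded) is an assumption.

```shell
# student_url STUDENT_NUMBER URL_TEMPLATE
# Replace the "##" placeholder in a course URL with a student number.
# Hypothetical helper; "07" below is an example value, not a real assignment.
student_url() {
    printf '%s\n' "$2" | sed "s/##/$1/"
}

student_url 07 'http://cbw##.dyndns.info/Module7/hg19_chr1.fa'
```

The printed address can then be pasted into Galaxy's upload form or passed to a download tool such as `wget`.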
Galaxy workflow part 1 (cloud):
Galaxy workflow part 2 (main instance):
What you need for the lab:
- Galaxy public server
- An account on Galaxy to run tools in their environment.
- Example of a Galaxy pipeline
- Galaxy 101 worked example
- Galaxy servers throughout the world
- Published pages
Data for the Workshop
Instructions for installing the tools used in the workshops can be found here.