Video from 1st Bioinformatics Workshop
https://www.youtube.com/watch?v=yzL1yJ8znz0
Stephanie Finds Bioinformatics
Stephanie Butland was in my group in 1999, and one of her first jobs there was to be a TA at the first two-week CBW in Calgary, AB.
Laptop Setup Instructions
Instructions for setting up your laptop can be found here: Laptop Setup Instructions_HT-Biology
Pre-Workshop Tutorials
1) R Preparation tutorials: You are expected to have completed the following tutorials in R beforehand. The tutorials should be very accessible even if you have never used R before.
2) Cytoscape 3.x Preparation tutorials: Complete the introductory tutorial to Cytoscape 3.x: http://opentutorials.cgl.ucsf.edu/index.php/Portal:Cytoscape3
- Introduction to Cytoscape3 - User Interface
- Introduction to Cytoscape3 - Welcome Screen
- Introduction to Cytoscape 3.1 - Networks, Data, Styles, Layouts and App Manager
3) UNIX Preparation tutorials: Please complete tutorials #1-3 on UNIX at http://www.ee.surrey.ac.uk/Teaching/Unix/
Pre-Workshop Readings
Using cloud computing infrastructure with CloudBioLinux, CloudMan, and Galaxy
Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration
Savant Genome Browser 2: visualization and analysis for population-scale genomics
Genome structural variation discovery and genotyping
A survey of sequence alignment algorithms for next-generation sequencing
Genotype and SNP calling from next-generation sequencing data
Methods to study splicing from high-throughput RNA sequencing data
Differential analysis of gene regulation at transcript resolution with RNA-seq
Recurrent chimeric RNAs enriched in human prostate cancer identified by deep sequencing
GeneMANIA Prediction Server 2013 Update
How to visually interpret biological data using networks
g:Profiler–a web-based toolset for functional profiling of gene lists from large-scale experiments
g:Profiler–a web server for functional interpretation of gene lists (2011 update)
Expression data analysis with reactome
Logging into the Amazon cloud
Instructions can be found here.
- These instructions will ONLY be relevant in class, as the Cloud will not be accessible from home in advance of the class.
Day 1
Module 1: Overview of HT-sequencing & Cloud Computing
*Faculty: Francis Ouellette*
Lecture:
HTBSN15_Day1_Module1.pdf
HTBSN15_Day1_Module1.ppt
HTBSN15_Day1_Module1.mp4
From http://goo.gl/ruy9ib
Module 2: Reference Genome Alignment
*Faculty: Matei David*
Lecture:
HT-Biology2015_Module2.pdf
HT-Biology2015_Module2.mp4
Lab Practical:
Data set:
You can download the data set from here after the workshop. You may also need to download the reference genome if you do not already have one and want to do the lab practical on your own machine.
Programs used:
Links to Additional Resources:
- SEQanswers bioinformatics forum
- SAM/BAM file format specification
- Paired end vs mate pair reads
- Base qualities vs mapping qualities
- The decoy genome
Module 3: Data Visualization
*Faculty: Marc Fiume*
Lecture:
HT-Biology2015_Module3.pdf
HT-Biology2015_Module3.ppt
Lab Practical:
- Visualization Scavenger Hunt
- Part I: Using the IGV to visualize HTS datasets
- Part II: Using the Savant Genome Browser to visualize HTS datasets (start at inspecting structural variants)
Data set:
You can download the data set from here after the workshop to your local machine and work from there.
Programs used:
IGV Tips and Tricks:
Module 4: De Novo Assembly
*Faculty: Jared Simpson*
Lecture:
HT-Biology2015_Module4.pdf
HT-Biology2015_Module4.mp4
Paper cited by Jared in lecture: A comprehensive evaluation of assembly scaffolding tools
Integrated Assignment for Day 1
*Faculty: Richard de Borja*
Review the techniques learned in Modules 1-3.
Task list:
- Align the raw data to the human reference genome.
- Sort the reads and perform duplicate removal.
- Index the sorted bam file.
- Perform indel cleaning.
- Visualize the alignments.
Discussion/Questions:
- Explain the purpose of each step.
- Which software tool can be used for each step?
Integrated Assignment: IA_Questions-Answers_2015.txt
Data set:
The data set for this integrated assignment is included in the Module 2 data set.
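The first steps of the task list above (align, sort, index) can be sketched as a small command planner. This is only an illustration under stated assumptions: the assignment does not name its tools, so `bwa mem` and `samtools` are choices made here, the input file names are borrowed from the Galaxy lab listing and may differ from the Module 2 data set, and the reference is assumed to have been indexed with `bwa index` already. Duplicate removal and indel cleaning are left out because their tools are not named on this page. The `Step` type and `plan_alignment` function are hypothetical helpers.

```python
from typing import List, NamedTuple, Optional


class Step(NamedTuple):
    """One pipeline step: the command to run and, optionally, a file that captures stdout."""
    argv: List[str]
    stdout_path: Optional[str] = None


def plan_alignment(reference: str, reads_1: str, reads_2: str, prefix: str) -> List[Step]:
    """Return the align, sort and index steps as commands; nothing is executed here."""
    sam = f"{prefix}.sam"
    sorted_bam = f"{prefix}.sorted.bam"
    return [
        # bwa mem writes SAM to stdout, so the caller must redirect it to a file.
        Step(["bwa", "mem", reference, reads_1, reads_2], stdout_path=sam),
        # Coordinate-sort the alignments and write them as BAM.
        Step(["samtools", "sort", "-o", sorted_bam, sam]),
        # Build the index for the sorted BAM file.
        Step(["samtools", "index", sorted_bam]),
    ]


if __name__ == "__main__":
    # Assumed file names, taken from the Galaxy lab listing on this page.
    steps = plan_alignment(
        "hg19_chr1.fa",
        "NA12878_CBW_chr1_R1.fastq.gz",
        "NA12878_CBW_chr1_R2.fastq.gz",
        "NA12878",
    )
    for step in steps:
        redirect = f" > {step.stdout_path}" if step.stdout_path else ""
        print(" ".join(step.argv) + redirect)
```

Printing the commands instead of running them keeps the sketch safe to try on a machine where the tools are not installed.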
Day 2
Module 5: Small variant calling & annotation
*Faculty: Guillaume Bourque*
Lecture:
HT-Biology2015_Module5.pdf
HT-Biology2015_Module5.ppt
HT-Biology2015_Module5.mp4
Lab Practical:
**Pro-tip:** A great resource for putting together a GATK-based variant calling pipeline is the GATK Best Practices page. This page will guide you in your quest to produce the best variant calls possible using GATK.
**Pro-tip 2:** Another useful program for generating summary statistics on vcf files, filtering vcf files, and comparing multiple vcf files is vcftools.
Data set:
You can download the data set from here after the workshop to your local machine and work from there.
Programs used:
Module 6: Structural variation calling
*Faculty: Guillaume Bourque*
Lecture:
HT-Biology2015_Module6.pdf
HT-Biology2015_Module6.ppt
HT-Biology2015_Module6.mp4
Data set:
You can download the data set from here after the workshop to your local machine and work from there.
Programs used:
Module 7: Bringing it all Together: Galaxy
*Faculty: Francis Ouellette*
Lecture:
Module_07_NYGC_CBW_Ouellette_ver04.pdf
Module_07_NYGC_CBW_Ouellette_ver04.ppt
Module_07_NYGC_CBW_Ouellette_ver04.mp4
Lab Practical:
Dataset for the Galaxy lab:
In Galaxy, go to Get Data > Upload File and paste the following addresses into the URL box:
NA12878_CBW_chr1_R1.fastq.gz
http://cbwxx.dyndns.info/module2/NA12878_CBW_chr1_R1.fastq.gz
NA12878_CBW_chr1_R2.fastq.gz
http://cbwxx.dyndns.info/module2/NA12878_CBW_chr1_R2.fastq.gz
hg19_chr1.fa
http://cbwxx.dyndns.info/module7/hg19_chr1.fa
dbSNP_135_chr1.vcf.gz
http://cbwxx.dyndns.info/module2/dbSNP_135_chr1.vcf.gz
Note: xx is your student number.
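As a convenience, the URLs in the list above can be generated for a given student number. This is a minimal sketch: the `upload_urls` helper is hypothetical, and it assumes that `xx` is the student number written with two digits (for example `07`), which this page does not state explicitly.

```python
from typing import List

# (file name, module directory) pairs copied from the list above.
FILES = [
    ("NA12878_CBW_chr1_R1.fastq.gz", "module2"),
    ("NA12878_CBW_chr1_R2.fastq.gz", "module2"),
    ("hg19_chr1.fa", "module7"),
    ("dbSNP_135_chr1.vcf.gz", "module2"),
]


def upload_urls(student_number: int) -> List[str]:
    """Build the four data URLs for one student instance."""
    # Assumption: the host name uses a zero-padded, two-digit student number.
    host = f"cbw{student_number:02d}.dyndns.info"
    return [f"http://{host}/{module}/{name}" for name, module in FILES]


if __name__ == "__main__":
    # Print one URL per line, ready to paste into the Galaxy URL box.
    print("\n".join(upload_urls(7)))
```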
Galaxy workflow part 1 (cloud): Galaxy-Workflow-CBW Galaxy lab part1 Alignment Variant calling.ga
Galaxy workflow part 2 (main instance): Galaxy-Workflow-CBW Galaxy lab part2 VariantFiltration Annotation.ga
What you need for the lab:
- Galaxy public server: https://usegalaxy.org/
You will need to register for an account on Galaxy so that you can run tools in their environment.
Galaxy Resources:
- galaxyproject.org: Galaxy home page
- usegalaxy.org: main Galaxy public server
- getgalaxy.org: source for installing local Galaxy
- usegalaxy.org/cloud: use galaxy in the cloud
- Example of a Galaxy pipeline (we used it for an RNA-seq lab last year). Save file as: Galaxy-Workflow-Module_5_workflow_from_Emilie_Chautard_and_Francis.ga
- Galaxy 101 worked example
- Galaxy servers throughout the world
- Published (read: Public) pages
Day 3
Module 8: Introduction to RNA sequencing and analysis
*Faculty: Malachi Griffith*
Lecture slides:
HT-Biology2015_Module8.pdf
HT-Biology2015_Module8.ppt
HT-Biology2015_Module8.mp4
Lab practical:
Lab introduction slides: HT-Biology2015_Module8_LabSlides.pdf
Tutorial scripts:
http://www.rnaseq.wiki - Module 1 Tutorial
Module 9: RNA-seq alignment and visualization
*Faculty: Obi Griffith*
Lecture slides:
HT-Biology2015_Module9.pdf
HT-Biology2015_Module9.ppt
HT-Biology2015_Module9.mp4
Lab practical:
Lab introduction slides: HT-Biology2015_Module9_LabSlides.pdf
Tutorial scripts:
http://www.rnaseq.wiki - Module 2 Tutorial
Integrated Assignment - Day 3
*Faculty: Fouad Yousif*
Paper for Integrated Assignment Day 3: Recurrent chimeric RNAs enriched in human prostate cancer identified by deep sequencing (PMC3107329)
Assignment Text:
Assignment Questions:
Answer Key:
Day 4
Module 10: Expression and Differential Expression
*Faculty: Obi Griffith*
Lecture slides:
HT-Biology2015_Module10.pdf
HT-Biology2015_Module10.ppt
HT-Biology2015_Module10.mp4
Lab practical:
Lab introduction slides: HT-Biology2015_Module10_LabSlides.pdf
Tutorial Scripts:
http://www.rnaseq.wiki - Module 3 Tutorial
Module 11: Isoform discovery and alternative expression
*Faculty: Malachi Griffith*
Lecture slides:
HT-Biology2015_Module11.pdf
HT-Biology2015_Module11.ppt
HT-Biology2015_Module11.mp4
Lab practical:
Lab introduction slides: HT-Biology2015_Module11_LabSlides.pdf
Tutorial scripts:
http://www.rnaseq.wiki - Module 4 Tutorial
Integrated Assignment - Day 4
*Faculty: Fouad Yousif*
Assignment Answer Key: Integrated Assignment Answers
Keeping Up-to-date with RNA-seq Analysis Developments
- The RNA-seq blog - recent developments in RNA-seq technology and analysis
- Biostar - A forum for the bioinformatics community
- SEQanswers - A forum for the next generation sequencing community
- HTS Mappers - A list of read mappers
- The periodic table of bioinformatics - A list of popular bioinformatics resources and tools
- The Bioinformatics Links Directory
Day 5
Module 12: Introduction to Pathway and Network Analysis
*Faculty: Jüri Reimand*
Lecture:
HT-Biology2015_Module12.pdf
HT-Biology2015_Module12.ppt
HT-Biology2015_Module12.mp4
Links:
* The Synergizer - identifier mapping
* [Ensembl BioMart](http://www.ensembl.org/index.html) - eukaryotic gene query system (in the menu bar, select the BioMart tab)
* ID Conversion Tool: gConvert - identifier mapping
* Gene Ontology - gene annotation
* Cytoscape - network visualization and analysis
Module 13: Finding over-represented pathways in gene lists
*Faculty: Quaid Morris*
Lecture:
HT-Biology2015_Module13.pdf
HT-Biology2015_Module13.ppt
HT-Biology2015_Module13.mp4
Lab Practical:
*Faculty: Jüri Reimand*
HT-Biology2015_Module13_Lab.pdf
g:Profiler Files and data for lab:
- Materials_for_ORA.zip
- MCF7_12hr_topgenes.txt - g:Profiler input with significantly expressed genes in MCF7 cells at 12h
- MCF7_24hr_topgenes.txt - g:Profiler input with significantly expressed genes in MCF7 cells at 24h
- MCF7_12hr_24hr_topgenes_for_gCocoa.txt - g:Cocoa input with both gene lists
- Yeast_TFs_in_cell_cycle.txt - cell cycle transcription factor list, to test with background set
- Yeast_TF_background_list.txt - all transcription factors in yeast, to test with background set
Link to g:Profiler:
http://biit.cs.ut.ee/gprofiler/index.cgi
Online info and tutorials:
BaderLab tutorial for g:Profiler + Enrichment Map: http://www.baderlab.org/Software/EnrichmentMap/GProfilerTutorial
Additional Links:
- Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources
- Comparison of enrichment tools
- ConceptGen - Enrichment Analysis for simple gene lists (Fisher’s Exact Test)
- GSEA - Enrichment Analysis for ranked gene lists
- Other Enrichment Analysis tools for simple gene lists: Funspec, GoMiner
- List of 68 Enrichment Tools available as of 2008
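Several of the tools listed above score a simple gene list with Fisher's exact test. As a rough illustration of the idea, the one-sided (over-representation) version of that test is the upper tail of a hypergeometric distribution. The sketch below is not the implementation used by g:Profiler or any other listed tool, it applies no multiple-testing correction, and the function name and the numbers in the usage example are hypothetical.

```python
from math import comb


def enrichment_p_value(background: int, pathway: int, selected: int, overlap: int) -> float:
    """One-sided Fisher's exact test for over-representation.

    background: number of genes in the background set
    pathway:    number of background genes annotated to the pathway
    selected:   number of genes in the query gene list
    overlap:    number of query genes that are in the pathway

    Returns P(X >= overlap) for X ~ Hypergeometric(background, pathway, selected).
    """
    total = comb(background, selected)
    upper = min(pathway, selected)
    # Sum the probability of every outcome at least as extreme as the observed overlap.
    tail = sum(
        comb(pathway, k) * comb(background - pathway, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical numbers: 20000 background genes, a 100-gene pathway,
    # a 200-gene query list, 8 of which fall in the pathway.
    print(enrichment_p_value(20000, 100, 200, 8))
```

The choice of background matters: testing the yeast transcription factor list against all yeast transcription factors, as in the lab files above, will generally give different p-values than testing against the whole genome.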
Module 14: Cytoscape Intro, Demo and Enrichment Maps
*Faculty: Jüri Reimand*
Lab Practical:
Use the enrichment results from g:Profiler in module 13 (back up files) to create Enrichment Maps
HT-Biology2015_Module14_LabSlides.pdf
Cytoscape files and data for lab:
- Materials_for_Cytoscape_and_EnrichmentMap.zip
- Cytoscape_demo_session.cys - Cytoscape session to test network visualisation, filtering, and analysis
- Cytoscape_example_network.txt - Example network for Cytoscape demo. Load with File > Import > Network.
- Cytoscape_example_node_attributes.txt - Node attributes for Cytoscape demo. Load with File > Import > Table.
- EnrichmentMap_24h_Cytoscape_session.cys - Cytoscape session with Enrichment Maps of MCF7 cells at 24h.
- cancer_genes.gmt - GMT file with list of cancer genes used for Post-Analysis.
- enrichmentmap-2.0.1.jar - Java file of Enrichment Map app, install with Apps > App Manager > Install from File.
- enrichments_12h_gem1029976022995.txt - Table of pathway enrichments from g:Profiler, MCF7 cells at 12h.
- enrichments_24h_gem1047153205012.txt - Table of pathway enrichments from g:Profiler, MCF7 cells at 24h.
- hsapiens.NAME.gmt - GMT file with pathways and corresponding gene sets from g:Profiler.
Lab Practical optional:
Use your own data set.
Programs Used:
* Open Tutorials for Cytoscape: http://opentutorials.cgl.ucsf.edu/index.php/Portal:Cytoscape
* EnrichmentMap - http://apps.cytoscape.org/apps/enrichmentmap
The Enrichment Map app can also be installed from within Cytoscape: Apps > App Manager > Search > EnrichmentMap > Install.
Enrichment Map info and tutorials:
- Enrichment Map Software: http://baderlab.org/Software/EnrichmentMap
- BaderLab tutorial for g:Profiler + Enrichment Map: http://www.baderlab.org/Software/EnrichmentMap/GProfilerTutorial
- Enrichment Map Post-Analysis Tutorial: http://www.baderlab.org/Software/EnrichmentMap/PostAnalysisTutorial
Other useful Cytoscape apps:
* Agilent Literature Search - extracts interactions from PubMed abstracts
* clusterMaker2 - provides multiple ways to cluster gene expression and networks
* BiNGO - provides over-representation analysis using Gene Ontology in Cytoscape - you can select genes in your network or provide a list of genes and see the enrichment results visually mapped to the Gene Ontology
* jActiveModules - requires gene expression data over multiple samples (>3). Finds regions of a network where genes are active (e.g. differentially expressed) across multiple samples.
* Many more at http://apps.cytoscape.org/apps/enrichmentmap
Integrated Assignment - Day 5
*Faculty: Irina Kalatskaya*
Lab Practical:
HT-Biology2015_Day5_IntegratedAssignment.pdf
HT-Biology2015_Day5_IntegratedAssignmentAnswers.pdf
Input Data sets:
- Expression.txt
- gem1033458993259_BE.txt
- gem1047581616441_EAC.txt
- GeneSet1_BE.txt
- GeneList2_EAC.txt
- hsapiens.NAME.gmt
Day 6
Module 15: Depth on Pathway and Network Analysis
*Faculty: Robin Haw*
Lecture:
HT-Biology2015_Module15.pdf
HT-Biology2015_Module15.ppt
HT-Biology2015_Module15.mp4
Lab Practical:
HT-Biology2015_Module15_LabSlides.pdf
HT-Biology2015_Module15_LabExercise.pdf
HT-Biology2015_Module15_LabAnswers.pdf
Data Sets:
Programs Used:
Papers:
Integrated genomic analyses of ovarian carcinoma
Clustering Algorithms: Newman Clustering and Hotnet
Reactome Website: NAR paper; Website guide
Nature Methods and Perspectives Paper
Links:
Pathway and Interaction databases
- GO
- KEGG
- Biocarta
- Reactome Curated human pathways
- NCI/PID
- Pathway Commons Aggregates pathways from multiple sources
- iRefWeb/iRefIndex Protein interactions
- >300 more
Module 16: Gene Function Prediction
*Faculty: Quaid Morris*
Lecture:
HT-Biology2015_Module16.pdf
HT-Biology2015_Module16.ppt
HT-Biology2015_Module16.mp4
Lab Practical:
HT-Biology2015_Module16_LabSlides.pdf
HT-Biology2015_Module16_LabExercise.pdf
Data Sets for GeneMANIA exercises:
30_prostate_cancer_genes.txt
mixed_gene_list.txt
CYB11B_pearson_correlation_prostate.txt
Links:
Tools for gene function prediction systems (using functional associations)
- GeneMANIA (or beta version)
- STRING
- FunCoup – similar to STRING and GeneMANIA
- bioPIXIE – an early gene recommender system for yeast
- mouseNET – gene recommender for mouse
- FunctionalNet – composite functional networks for worm, yeast, mouse and A. thaliana
- FuncBase – a compiled database of gene functional predictions using supervised learning on Gene Ontology categories
Integrated Assignment - Day 6
*Faculty: Irina Kalatskaya*
Lab Practical:
- The first step is to switch your Reactome FI app to an earlier, fully functional version: reactomeFI-4.0.1
HT-Biology2015_Day5_IntegratedAssignment.pdf
HT-Biology2015_Day5_IntegratedAssignmentAnswers.pdf
Input Data sets:
- STAD_MutSig.txt (named GastricCancer_mutsig.txt in the instructions)
Day 7
Module 17: Gene Regulation Network Analysis
*Faculty: Michael Hoffman*
Lecture:
HT-Biology2015_Module17.pdf
HT-Biology2015_Module17.ppt
HT-Biology2015_Module17.mp4
Lab Practical:
HT-Biology2015_Module17_Lab.pdf
HT-Biology2015_Module17_Lab_Addenda.pdf
Links:
Precomputed results:
- A549 c-Myc
The results provided during the workshop do not work outside the workshop. Archived results are in AppMEMECHIP_4.10.114306204728401779362043.tar.gz.
Tips, tricks, and resources
Data Sets from Entire Workshops
- Reference Genome for HT-seq
- Module 2/7/HT-seq Integrated Data
- Module 3 Data
- Module 5 Data
- Module 6 Data
- Reference for RNA-seq Modules 8-11
- Module 8-11 Data
- RNA-seq Integrated Data
- RNA-seq Integrated Reference
Results from Instructor’s Instance on Amazon
- Module 2 (HT-seq) result
- Module 5 (HT-seq) result
- Module 6 (HT-seq) result
- Module 8-11 (RNA-seq) result
Tools with installation instructions on our Amazon server
Instructions for installing the tools used in the workshops can be found here.
Launching CBW AMI
Steps to launch CBW public AMI
AMI ID: ami-b9a253d2
AMI Name: CBW workshops 2015