
High-Throughput Biology - From Sequence to Networks 2015

Workshop pages for students


Video from 1st Bioinformatics Workshop

https://www.youtube.com/watch?v=yzL1yJ8znz0 - Stephanie Finds Bioinformatics

Stephanie Butland was in my group in 1999, and one of her first jobs in the group was to be a TA at the first two-week CBW, held in Calgary, AB.

Laptop Setup Instructions

Instructions for setting up your laptop can be found here: Laptop Setup Instructions_HT-Biology

Pre-Workshop Tutorials

1) R Preparation tutorials: You are expected to complete the following R tutorials before the workshop. They should be accessible even if you have never used R before.

2) Cytoscape 3.x Preparation tutorials: Complete the following introductory Cytoscape 3.x tutorials at http://opentutorials.cgl.ucsf.edu/index.php/Portal:Cytoscape3

  • Introduction to Cytoscape3 - User Interface
  • Introduction to Cytoscape3 - Welcome Screen
  • Introduction to Cytoscape 3.1 - Networks, Data, Styles, Layouts and App Manager

3) UNIX Preparation tutorials: Please complete tutorials #1-3 on UNIX at http://www.ee.surrey.ac.uk/Teaching/Unix/


Pre-Workshop Readings

Using cloud computing infrastructure with CloudBioLinux, CloudMan, and Galaxy

Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration

Savant Genome Browser 2: visualization and analysis for population-scale genomics

Genome structural variation discovery and genotyping

A survey of sequence alignment algorithms for next-generation sequencing

Genotype and SNP calling from next-generation sequencing data

Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks

ENCODE RNA-Seq Standards

Methods to study splicing from high-throughput RNA sequencing data

Differential analysis of gene regulation at transcript resolution with RNA-seq

A comprehensive assessment of RNA-seq accuracy, reproducibility and information content by the Sequencing Quality Control Consortium

Recurrent chimeric RNAs enriched in human prostate cancer identified by deep sequencing

The GeneMANIA prediction server: biological network integration for gene prioritization and predicting gene function

GeneMANIA Prediction Server 2013 Update

How to visually interpret biological data using networks

g:Profiler–a web-based toolset for functional profiling of gene lists from large-scale experiments

g:Profiler–a web server for functional interpretation of gene lists (2011 update)

Expression data analysis with reactome


Logging into the Amazon cloud

Instructions can be found here.

  • These instructions are relevant ONLY in class: the cloud will not be accessible from home before the workshop.

Day 1

Module 1: Overview of HT-sequencing & Cloud Computing

*Faculty: Francis Ouellette*

Lecture:

HTBSN15_Day1_Module1.pdf
HTBSN15_Day1_Module1.ppt
HTBSN15_Day1_Module1.mp4

Cloud-computing-cartoon-new-yorker1.jpg From http://goo.gl/ruy9ib


Module 2: Reference Genome Alignment

*Faculty: Matei David*

Lecture:

HT-Biology2015_Module2.pdf
HT-Biology2015_Module2.mp4

Lab Practical:

Lab practical

Discussion questions

Data set:

You can download the data set from here after the workshop. You may also need to download the reference genome if you do not have one, so that you can do the lab practical on your own machine.

Programs used:

Links to Additional Resources:


Module 3: Data Visualization

*Faculty: Marc Fiume*

Lecture:

HT-Biology2015_Module3.pdf
HT-Biology2015_Module3.ppt

Lab Practical:

Data set:

You can download the data set from here after the workshop to your local machine and work from there.

Programs used:

IGV Tips and Tricks:

IGV Tutorial practice


Module 4: De Novo Assembly

*Faculty: Jared Simpson*

Lecture:

HT-Biology2015_Module4.pdf
HT-Biology2015_Module4.mp4

Paper cited by Jared in lecture: A comprehensive evaluation of assembly scaffolding tools


Integrated Assignment for Day 1

*Faculty: Richard de Borja*

Review the techniques learned in Modules 1-3.

Task list:

  1. Align the raw data to the human reference genome.
  2. Sort the reads and perform duplicate removal.
  3. Index the sorted bam file.
  4. Perform indel cleaning.
  5. Visualize the alignments.
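As a rough illustration only, the first four tasks could map onto command-line tools as sketched below (task 5 is done interactively in IGV). The tool choices (BWA-MEM, samtools 1.x, Picard MarkDuplicates, GATK 3 indel realignment) are assumptions and just one possible answer to question 2, not necessarily what the lab uses; picard.jar and GenomeAnalysisTK.jar are placeholder paths. The function only builds command strings and runs nothing.

```python
"""Hypothetical command plan for the Day 1 integrated assignment (nothing is executed)."""
from typing import List, Tuple


def day1_commands(ref: str, r1: str, r2: str, sample: str) -> List[Tuple[str, str]]:
    """Return (task, shell command) pairs for tasks 1-4; task 5 is done in IGV.

    Assumes the reference FASTA is already indexed for BWA (bwa index),
    samtools (samtools faidx) and GATK (a Picard sequence dictionary).
    picard.jar and GenomeAnalysisTK.jar are placeholder paths.
    """
    sam = f"{sample}.sam"
    sorted_bam = f"{sample}.sorted.bam"
    dedup_bam = f"{sample}.dedup.bam"
    targets = f"{sample}.realign.intervals"
    realigned_bam = f"{sample}.realigned.bam"
    return [
        # 1. Align paired-end reads; the read group (-R) is required later by GATK.
        ("align", rf"bwa mem -t 4 -R '@RG\tID:{sample}\tSM:{sample}' {ref} {r1} {r2} > {sam}"),
        # 2a. Coordinate-sort (samtools 1.x reads SAM input directly).
        ("sort", f"samtools sort -o {sorted_bam} {sam}"),
        # 2b. Remove duplicates rather than only flagging them.
        ("dedup", f"java -jar picard.jar MarkDuplicates I={sorted_bam} O={dedup_bam} "
                  f"M={sample}.dup_metrics.txt REMOVE_DUPLICATES=true"),
        # 3. Index the sorted, deduplicated BAM; GATK and IGV both need a .bai index.
        ("index", f"samtools index {dedup_bam}"),
        # 4a. Find intervals around indels that may need local realignment (GATK 3).
        ("targets", f"java -jar GenomeAnalysisTK.jar -T RealignerTargetCreator "
                    f"-R {ref} -I {dedup_bam} -o {targets}"),
        # 4b. Realign reads in those intervals ("indel cleaning").
        ("realign", f"java -jar GenomeAnalysisTK.jar -T IndelRealigner -R {ref} "
                    f"-I {dedup_bam} -targetIntervals {targets} -o {realigned_bam}"),
    ]


if __name__ == "__main__":
    for task, cmd in day1_commands("hg19_chr1.fa",
                                   "NA12878_CBW_chr1_R1.fastq.gz",
                                   "NA12878_CBW_chr1_R2.fastq.gz",
                                   "NA12878"):
        print(f"# {task}\n{cmd}")
```

Note that index comes before indel cleaning because the GATK realignment tools read an indexed BAM; the realigned BAM is what you would then load into IGV.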

Discussion/Questions:

  1. Explain the purpose of each step.
  2. Which software tool can be used for each step?

Integrated Assignment: IA_Questions-Answers_2015.txt

Data set:

The data set for this integrated assignment is included in the Module 2 data set.


Day 2


Module 5: Small variant calling & annotation

*Faculty: Guillaume Bourque*

Lecture:

HT-Biology2015_Module5.pdf
HT-Biology2015_Module5.ppt
HT-Biology2015_Module5.mp4

Lab Practical:

Lab directions

VCF format

**Pro-tip:**

A great resource for putting together a GATK-based variant calling pipeline is the GATK Best Practices page, which guides you through producing the best variant calls possible with GATK.

**Pro-tip 2:**

Another useful program for generating summary statistics on VCF files, filtering VCF files, and comparing multiple VCF files is vcftools.
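VCF data lines have eight fixed tab-separated columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO), optionally followed by FORMAT and per-sample genotype columns. To show the kind of summary vcftools produces, here is a minimal pure-Python sketch that reads uncompressed VCF lines and counts records, PASS records and SNVs. It ignores genotype columns, and the two records in the example are toy data, so treat it as an illustration of the format rather than a replacement for vcftools.

```python
"""Minimal VCF summary sketch: counts records, PASS records and SNVs."""
from typing import Dict, Iterable, Iterator, List, Union

FIXED = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def parse_info(info: str) -> Dict[str, Union[str, bool]]:
    """INFO holds ';'-separated KEY=VALUE pairs or bare flags; '.' means empty."""
    if info == ".":
        return {}
    parsed: Dict[str, Union[str, bool]] = {}
    for field in info.split(";"):
        key, sep, value = field.partition("=")
        parsed[key] = value if sep else True
    return parsed


def records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the eight fixed columns of each data line; genotype columns are ignored."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, ## meta-information and the #CHROM header
        cols = line.rstrip("\n").split("\t")
        rec = dict(zip(FIXED, cols[:8]))
        rec["POS"] = int(rec["POS"])
        rec["ALT"] = rec["ALT"].split(",")  # multiple alternate alleles are comma-separated
        rec["INFO"] = parse_info(rec["INFO"])
        yield rec


def summarize(lines: Iterable[str]) -> Dict[str, int]:
    counts = {"total": 0, "pass": 0, "snv": 0, "other": 0}
    for rec in records(lines):
        counts["total"] += 1
        if rec["FILTER"] == "PASS":
            counts["pass"] += 1
        # SNV: single-base REF and every ALT a single base ('.' ALT means no variant).
        alts: List[str] = rec["ALT"]
        if len(rec["REF"]) == 1 and all(len(a) == 1 and a != "." for a in alts):
            counts["snv"] += 1
        else:
            counts["other"] += 1  # indels, MNPs, symbolic alleles, reference-only sites
    return counts


example = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t1000\t.\tG\tA\t45.0\tPASS\tDP=20;DB",
    "1\t2000\t.\tC\tCT\t30.0\tLowQual\tDP=8",
]
print(summarize(example))  # {'total': 2, 'pass': 1, 'snv': 1, 'other': 1}
```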

Data set:

You can download the data set from here after the workshop to your local machine and work from there.

Programs used:


Module 6: Structural variation calling

*Faculty: Guillaume Bourque*

Lecture:

HT-Biology2015_Module6.pdf
HT-Biology2015_Module6.ppt
HT-Biology2015_Module6.mp4

Lab directions

Data set:

You can download the data set from here after the workshop to your local machine and work from there.

Programs used:


Module 7: Bringing it all Together: Galaxy

*Faculty: Francis Ouellette*

Lecture:

Module_07_NYGC_CBW_Ouellette_ver04.pdf
Module_07_NYGC_CBW_Ouellette_ver04.ppt
Module_07_NYGC_CBW_Ouellette_ver04.mp4

Lab Practical:

HT-seq_2015_Module7_Lab.pdf

Dataset for the Galaxy lab:

In Galaxy, go to Get Data > Upload File and paste these URLs into the URL box:

NA12878_CBW_chr1_R1.fastq.gz http://cbwxx.dyndns.info/module2/NA12878_CBW_chr1_R1.fastq.gz
NA12878_CBW_chr1_R2.fastq.gz http://cbwxx.dyndns.info/module2/NA12878_CBW_chr1_R2.fastq.gz
hg19_chr1.fa http://cbwxx.dyndns.info/module7/hg19_chr1.fa
dbSNP_135_chr1.vcf.gz http://cbwxx.dyndns.info/module2/dbSNP_135_chr1.vcf.gz

Note: xx is your student number.
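If you would rather generate the four URLs for your own instance, a small sketch follows. It assumes, without confirmation from this page, that xx is your student number zero-padded to two digits (for example cbw07); it prints one URL per line, which is the usual way to give Galaxy's upload URL box several URLs at once.

```python
"""Build the Galaxy upload URLs for one student's cloud instance."""
from typing import List

# (module directory, file name) pairs, as listed on this page.
LAB_FILES = [
    ("module2", "NA12878_CBW_chr1_R1.fastq.gz"),
    ("module2", "NA12878_CBW_chr1_R2.fastq.gz"),
    ("module7", "hg19_chr1.fa"),
    ("module2", "dbSNP_135_chr1.vcf.gz"),
]


def galaxy_urls(student: int) -> List[str]:
    # Assumption: "xx" is the student number zero-padded to two digits (7 -> "07").
    host = f"http://cbw{student:02d}.dyndns.info"
    return [f"{host}/{module}/{name}" for module, name in LAB_FILES]


if __name__ == "__main__":
    print("\n".join(galaxy_urls(7)))
```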

Galaxy workflow part 1 (cloud): Galaxy-Workflow-CBW Galaxy lab part1 Alignment Variant calling.ga

Galaxy workflow part 2 (main instance): Galaxy-Workflow-CBW Galaxy lab part2 VariantFiltration Annotation.ga

What you need for the lab:

You will need to register for an account on Galaxy so that you can run tools in their environment.

Galaxy Resources:

Day 3


Module 8: Introduction to RNA sequencing and analysis

*Faculty: Malachi Griffith*

Lecture slides:

HT-Biology2015_Module8.pdf
HT-Biology2015_Module8.ppt
HT-Biology2015_Module8.mp4

Lab practical:

Lab introduction slides: HT-Biology2015_Module8_LabSlides.pdf

Tutorial scripts:

http://www.rnaseq.wiki - Module 1 Tutorial


Module 9: RNA-seq alignment and visualization

*Faculty: Obi Griffith*

Lecture slides:

HT-Biology2015_Module9.pdf
HT-Biology2015_Module9.ppt
HT-Biology2015_Module9.mp4

Lab practical:

Lab introduction slides: HT-Biology2015_Module9_LabSlides.pdf

Tutorial scripts:

http://www.rnaseq.wiki - Module 2 Tutorial


Integrated Assignment - Day 3

*Faculty: Fouad Yousif*

Integrated Assignment

Paper for Integrated Assignment Day 3 - Recurrent chimeric RNAs enriched in human prostate cancer identified by deep sequencing PMC3107329

Assignment Text:

Assignment Questions:

Integrated Assignment

Answer Key:

Integrated Assignment Answers


Day 4


Module 10: Expression and Differential Expression

*Faculty: Obi Griffith*

Lecture slides:

HT-Biology2015_Module10.pdf
HT-Biology2015_Module10.ppt
HT-Biology2015_Module10.mp4

Lab practical:

Lab introduction slides: HT-Biology2015_Module10_LabSlides.pdf

Tutorial Scripts:

http://www.rnaseq.wiki - Module 3 Tutorial


Module 11: Isoform discovery and alternative expression

*Faculty: Malachi Griffith*

Lecture slides:

HT-Biology2015_Module11.pdf
HT-Biology2015_Module11.ppt
HT-Biology2015_Module11.mp4

Lab practical:

Lab introduction slides: HT-Biology2015_Module11_LabSlides.pdf

Tutorial scripts:

http://www.rnaseq.wiki - Module 4 Tutorial


Integrated Assignment - Day 4

*Faculty: Fouad Yousif*

Integrated Assignment

Assignment Answer Key: Integrated Assignment Answers


Keeping Up-to-date with RNA-seq Analysis Developments

Day 5


Module 12: Introduction to Pathway and Network Analysis

*Faculty: Jüri Reimand*

Lecture:

HT-Biology2015_Module12.pdf
HT-Biology2015_Module12.ppt
HT-Biology2015_Module12.mp4

Links:

The Synergizer - identifier mapping
Ensembl BioMart (http://www.ensembl.org/index.html) - eukaryotic gene query system (select the BioMart tab in the menu bar)
ID Conversion Tool: gConvert - identifier mapping
Gene Ontology - gene annotation
Cytoscape - network visualization and analysis


Module 13: Finding over-represented pathways in gene lists

*Faculty: Quaid Morris*

Lecture:

HT-Biology2015_Module13.pdf
HT-Biology2015_Module13.ppt
HT-Biology2015_Module13.mp4

Lab Practical:

*Faculty: Jüri Reimand*

HT-Biology2015_Module13_Lab.pdf

g:Profiler Files and data for lab:

  • Materials_for_ORA.zip
  • MCF7_12hr_topgenes.txt - g:Profiler input with significantly expressed genes in MCF7 cells at 12h
  • MCF7_24hr_topgenes.txt - g:Profiler input with significantly expressed genes in MCF7 cells at 24h
  • MCF7_12hr_24hr_topgenes_for_gCocoa.txt - g:Cocoa input with both gene lists
  • Yeast_TFs_in_cell_cycle.txt - cell cycle transcription factor list, to test with background set
  • Yeast_TF_background_list.txt - all transcription factors in yeast, to test with background set
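Over-representation analysis asks whether a gene list shares more genes with a pathway's gene set than expected by chance, given a background of genes that could have been picked; that is why the yeast files come with a background list. The sketch below shows the one-sided hypergeometric test commonly used for this kind of analysis. It is pure Python, applies no multiple-testing correction, and is not expected to reproduce g:Profiler's reported p-values; the gene names in the example are hypothetical.

```python
"""One-sided hypergeometric over-representation test (no multiple-testing correction)."""
from math import comb
from typing import Iterable, Tuple


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N background genes, K in the set, n in the list)."""
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("need 0 <= K <= N and 0 <= n <= N")
    # math.comb returns 0 when the lower index exceeds the upper, so impossible terms vanish.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(max(k, 0), min(K, n) + 1))
    return hits / comb(N, n)


def over_representation(genes: Iterable[str], background: Iterable[str],
                        gene_set: Iterable[str]) -> Tuple[int, float]:
    """Return (overlap size, p-value), restricting the list and the set to the background."""
    bg = set(background)
    query = set(genes) & bg
    members = set(gene_set) & bg
    overlap = len(query & members)
    return overlap, hypergeom_sf(overlap, len(bg), len(members), len(query))


if __name__ == "__main__":
    background = [f"TF{i}" for i in range(100)]      # hypothetical background genes
    cell_cycle_set = [f"TF{i}" for i in range(10)]   # hypothetical pathway
    my_list = ["TF0", "TF1", "TF2", "TF3", "TF50"]   # hypothetical gene list
    print(over_representation(my_list, background, cell_cycle_set))
```

Changing the background changes N and K, so the same gene list can look more or less enriched depending on which background you supply.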

Link to g:Profiler:

http://biit.cs.ut.ee/gprofiler/index.cgi

Online info and tutorials:

BaderLab tutorial for g:Profiler + Enrichment Map: http://www.baderlab.org/Software/EnrichmentMap/GProfilerTutorial

Additional Links:


Module 14: Cytoscape Intro, Demo and Enrichment Maps

*Faculty: Jüri Reimand*

Lab Practical:

Use the enrichment results from g:Profiler in Module 13 (or the backup files listed below) to create Enrichment Maps.

HT-Biology2015_Module14_LabSlides.pdf

Cytoscape files and data for lab:

  • Materials_for_Cytoscape_and_EnrichmentMap.zip
  • Cytoscape_demo_session.cys - Cytoscape session to test network visualisation, filtering, and analysis
  • Cytoscape_example_network.txt - Example network for Cytoscape demo. Load with File > Import > Network.
  • Cytoscape_example_node_attributes.txt - Node attributes for Cytoscape demo. Load with File > Import > Table.
  • EnrichmentMap_24h_Cytoscape_session.cys - Cytoscape session with Enrichment Maps of MCF7 cells at 24h.
  • cancer_genes.gmt - GMT file with list of cancer genes used for Post-Analysis.
  • enrichmentmap-2.0.1.jar - Java file of Enrichment Map app, install with Apps > App Manager > Install from File. 
  • enrichments_12h_gem1029976022995.txt - Table of pathway enrichments from g:Profiler, MCF7 cells at 12h.
  • enrichments_24h_gem1047153205012.txt - Table of pathway enrichments from g:Profiler, MCF7 cells at 24h.
  • hsapiens.NAME.gmt - GMT file with pathways and corresponding gene sets from g:Profiler.
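Both .gmt files above use the GMT format: one gene set per line, tab-separated, with the set name, a description, and then the member genes. Enrichment Map connects two enriched gene sets when their members overlap enough. The sketch below reads GMT lines and links gene sets whose Jaccard or overlap coefficient reaches a cutoff; the cutoff value and the toy sets in the example are assumptions, and the app's own defaults and combined coefficient are not reproduced here.

```python
"""Read GMT gene sets and link overlapping sets, Enrichment Map style (sketch)."""
from itertools import combinations
from typing import Callable, Dict, Iterable, List, Set, Tuple


def read_gmt(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """GMT: one set per line, tab-separated: name, description, then genes."""
    sets: Dict[str, Set[str]] = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, _description, *genes = fields
        sets[name] = {g for g in genes if g}
    return sets


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlap_coefficient(a: Set[str], b: Set[str]) -> float:
    smaller = min(len(a), len(b))
    return len(a & b) / smaller if smaller else 0.0


def similarity_edges(sets: Dict[str, Set[str]], cutoff: float,
                     metric: Callable[[Set[str], Set[str]], float] = overlap_coefficient,
                     ) -> List[Tuple[str, str, float]]:
    """Edges between every pair of gene sets whose similarity reaches the cutoff."""
    edges = []
    for x, y in combinations(sorted(sets), 2):
        score = metric(sets[x], sets[y])
        if score >= cutoff:
            edges.append((x, y, score))
    return edges


if __name__ == "__main__":
    toy = ["P1\tfirst\tA\tB\tC", "P2\tsecond\tB\tC\tD\tE", "P3\tthird\tX"]
    print(similarity_edges(read_gmt(toy), cutoff=0.5))
```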

Lab Practical optional:

Use your own data set.

Programs Used:

* Open Tutorials for Cytoscape: http://opentutorials.cgl.ucsf.edu/index.php/Portal:Cytoscape
* EnrichmentMap: http://apps.cytoscape.org/apps/enrichmentmap. The Enrichment Map app can also be installed from within Cytoscape via Apps > App Manager: search for EnrichmentMap, then click Install.

Enrichment Map info and tutorials:

BaderLab tutorial for g:Profiler + Enrichment Map: http://www.baderlab.org/Software/EnrichmentMap/GProfilerTutorial

Enrichment Map Post-Analysis Tutorial: http://www.baderlab.org/Software/EnrichmentMap/PostAnalysisTutorial

Other useful Cytoscape apps:

* Agilent Literature Search - extracts interactions from PubMed abstracts
* clusterMaker2 - provides multiple ways to cluster gene expression and networks
* BiNGO - provides over-representation analysis using Gene Ontology in Cytoscape - you can select genes in your network or provide a list of genes and see the enrichment results visually mapped to the Gene Ontology
* jActiveModules - requires gene expression data over multiple samples (>3). Finds regions of a network where genes are active (e.g. differentially expressed) across multiple samples.
* Many more at http://apps.cytoscape.org


Integrated Assignment - Day 5

*Faculty: Irina Kalatskaya*

Lab Practical:

HT-Biology2015_Day5_IntegratedAssignment.pdf
HT-Biology2015_Day5_IntegratedAssignmentAnswers.pdf

Input Data sets:


Day 6


Module 15: Depth on Pathway and Network Analysis

*Faculty: Robin Haw*

Lecture:

HT-Biology2015_Module15.pdf
HT-Biology2015_Module15.ppt
HT-Biology2015_Module15.mp4

Lab Practical:

HT-Biology2015_Module15_LabSlides.pdf
HT-Biology2015_Module15_LabExercise.pdf
HT-Biology2015_Module15_LabAnswers.pdf

Data Sets:

Module 15 Data Set

Programs Used:

Papers:

Integrated genomic analyses of ovarian carcinoma

Clustering Algorithms: Newman Clustering and HotNet

Reactome Website: NAR paper; Website guide

Nature Methods and Perspectives Paper

Supplementary Materials

Links:

Pathway and Interaction databases


Module 16: Gene Function Prediction

*Faculty: Quaid Morris*

Lecture:

HT-Biology2015_Module16.pdf
HT-Biology2015_Module16.ppt
HT-Biology2015_Module16.mp4

Lab Practical:

HT-Biology2015_Module16_LabSlides.pdf
HT-Biology2015_Module16_LabExercise.pdf

Data Sets for GeneMANIA exercises:

30_prostate_cancer_genes.txt
mixed_gene_list.txt
CYB11B_pearson_correlation_prostate.txt
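Judging by its name, the third file holds Pearson correlations of expression with CYB11B in prostate data; co-expression networks such as those GeneMANIA draws on are commonly built from this kind of correlation. Below is a minimal pure-Python sketch of the coefficient and of ranking genes by correlation with a query gene. The profiles in the example are hypothetical, and this is not a description of how the file itself was produced.

```python
"""Pearson correlation and ranking genes by co-expression with a query gene."""
from math import sqrt
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two profiles of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sxy / sqrt(sxx * syy)


def rank_by_correlation(query: str,
                        profiles: Dict[str, Sequence[float]]) -> List[Tuple[str, float]]:
    """Other genes sorted from most positively to most negatively correlated."""
    q = profiles[query]
    scores = [(gene, pearson(q, values)) for gene, values in profiles.items() if gene != query]
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    toy = {"QUERY": [1.0, 2.0, 3.0, 4.0], "G1": [2.0, 4.0, 5.0, 9.0], "G2": [4.0, 3.0, 2.0, 1.0]}
    print(rank_by_correlation("QUERY", toy))
```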

Links:

Tools for gene function prediction systems (using functional associations)

  • GeneMANIA (or beta version)
  • STRING
  • FunCoup – similar to STRING and GeneMANIA
  • bioPIXIE – an early gene recommender system for yeast
  • mouseNET – gene recommender for mouse
  • FunctionalNet – composite functional networks for worm, yeast, mouse and A. thaliana
  • FuncBase – a compiled database of gene functional predictions using supervised learning on Gene Ontology categories

Integrated Assignment - Day 6

*Faculty: Irina Kalatskaya*

Lab Practical:

HT-Biology2015_Day5_IntegratedAssignment.pdf
HT-Biology2015_Day5_IntegratedAssignmentAnswers.pdf

Input Data sets:

  • STAD_MutSig.txt (named GastricCancer_mutsig.txt in the instructions)


Day 7


Module 17: Gene Regulation Network Analysis

*Faculty: Michael Hoffman*

Lecture:

HT-Biology2015_Module17.pdf
HT-Biology2015_Module17.ppt
HT-Biology2015_Module17.mp4

Lab Practical:

HT-Biology2015_Module17_Lab.pdf
HT-Biology2015_Module17_Lab_Addenda.pdf

Links:

Precomputed results:

  • A549 c-Myc

The precomputed result links above work only during the workshop. Archived results are in AppMEMECHIP_4.10.114306204728401779362043.tar.gz.

Tips, tricks, and resources

Data Sets from Entire Workshops

Results from Instructor’s Instance on Amazon

Tools with installation instructions on our Amazon server

Instructions for installing the tools used in the workshops can be found here.

Launching CBW AMI

Steps to launch CBW public AMI

AMI ID: ami-b9a253d2
AMI Name: CBW workshops 2015
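For those who script AWS rather than use the web console, here is a hedged sketch of starting one instance from this AMI with boto3's EC2 `run_instances` call. Several things are assumptions, not taken from this page: the region (AMI IDs are region-specific, and us-east-1 is only a guess), the instance type, and the key pair and security group names you pass in. The AMI may also no longer be available, so follow the linked launch steps if they differ.

```python
"""Sketch: launch the CBW 2015 public AMI with boto3 (region and instance type are guesses)."""
from typing import Any, Dict, Optional, Sequence

CBW_AMI_ID = "ami-b9a253d2"  # "CBW workshops 2015", from this page


def launch_params(key_name: str, instance_type: str = "m3.large",
                  security_group_ids: Optional[Sequence[str]] = None) -> Dict[str, Any]:
    """Keyword arguments for EC2 RunInstances; the instance type is an assumption."""
    params: Dict[str, Any] = {
        "ImageId": CBW_AMI_ID,
        "InstanceType": instance_type,
        "MinCount": 1,
        "MaxCount": 1,
        "KeyName": key_name,  # an existing EC2 key pair, used for SSH login
    }
    if security_group_ids:
        params["SecurityGroupIds"] = list(security_group_ids)
    return params


def launch(key_name: str, region: str = "us-east-1", **kwargs: Any) -> str:
    """Start one instance and return its ID. AMI IDs are region-specific; us-east-1 is a guess."""
    import boto3  # imported here so launch_params() works without boto3 installed
    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.run_instances(**launch_params(key_name, **kwargs))
    return response["Instances"][0]["InstanceId"]
```

Keeping the parameter building separate from the API call lets you inspect exactly what will be requested before any instance (and any AWS charge) is created.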

Bioinformatics discussion Q&A forums

Genomics programming interfaces

Toolkits
