
Bioinformatics for Cancer Genomics 2015

Workshop pages for students


Laptop Setup Instructions

Instructions for setting up your laptop can be found here: Laptop Setup Instructions

Difference Between R and RStudio

RStudio does not know where libraries are installed when they were not installed through the RStudio package manager. To tell RStudio the location, define the path in a startup file: create a file called .Renviron (for example, in your home directory) containing the line:

R_LIBS=<R Library Path of other installed packages>

This was the problem students ran into when they installed packages at the R command line with install.packages().

Alternatively, you can use the RStudio package manager to install libraries.
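If you prefer to create the .Renviron file from a terminal, here is a minimal shell sketch. The helper name add_r_libs and the example library path are hypothetical; substitute the directory where your other packages were actually installed. The helper appends an R_LIBS line only if the file does not already define one, so running it twice does not create duplicates.

```shell
#!/bin/sh
# Sketch: record an extra R library directory in a .Renviron file so that
# R and RStudio pick it up at startup through the R_LIBS variable.
# add_r_libs is a hypothetical helper name used for illustration only.
add_r_libs() {
    env_file="$1"   # e.g. "$HOME/.Renviron"
    lib_dir="$2"    # directory where your other packages were installed

    mkdir -p "$lib_dir"
    touch "$env_file"

    # Skip if R_LIBS is already defined; edit the file by hand in that case.
    if grep -q '^R_LIBS=' "$env_file"; then
        echo "R_LIBS already set in $env_file; edit it by hand if needed" >&2
    else
        # .Renviron lines take the form name=value (no space after '=').
        printf 'R_LIBS=%s\n' "$lib_dir" >> "$env_file"
    fi
}
```

For example, `add_r_libs "$HOME/.Renviron" "$HOME/R/library"` (the library path is a placeholder). Restart RStudio afterwards and run `.libPaths()` in the R console; the directory should now appear in the list.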

Syntax highlighting

Syntax highlighting of scripts in the R editor does not seem to work under Windows. If you want syntax highlighting, use RStudio instead.


Pre-Workshop Tutorials

1) R Preparation tutorials: You are expected to have completed the following tutorials in R beforehand. The tutorials should be very accessible even if you have never used R before.

2) Cytoscape 3.x Preparation tutorials: Complete the introductory tutorials for Cytoscape 3.x: http://opentutorials.cgl.ucsf.edu/index.php/Portal:Cytoscape3

  • Introduction to Cytoscape3 - User Interface
  • Introduction to Cytoscape3 - Welcome Screen
  • Introduction to Cytoscape 3.1 - Networks, Data, Styles, Layouts and App Manager

3) UNIX Preparation tutorials: Please complete tutorials #1-3 on UNIX at http://www.ee.surrey.ac.uk/Teaching/Unix/


Pre-Workshop Readings

Database resources of the National Center for Biotechnology Information

COSMIC: mining complete cancer genomes in the Catalogue of Somatic Mutations in Cancer

Integrative genomic profiling of human prostate cancer

Predicting the functional impact of protein mutations: application to cancer genomics

Cancer genome sequencing study design

Using cloud computing infrastructure with CloudBioLinux, CloudMan, and Galaxy

The UCSC Genome Browser database: extensions and updates 2013

Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration

Feature-based classifiers for somatic mutation detection in tumour–normal paired sequencing data

Expression Data Analysis with Reactome


Logging into the Amazon cloud

Instructions can be found here.

  • These instructions are ONLY relevant in class, as the cloud will not be accessible from home before the class.

Day 1

Module 1: Introduction to cancer genomics

*Faculty: John McPherson*

Lecture: BiCG_2015_Module1.pdf


Module 2: Databases and Visualization Tools

*Faculty: Francis Ouellette*

Lecture:

BiCG_2015_Module2.pdf
BiCG_2015_Module2.ppt
BiCG_2015_Module2.mp4

Toy Data Sets:

Chromosome 21: 19,000,000-20,000,000

HCC1143.normal.21.19M-20M.bam

HCC1143.normal.21.19M-20M.bam.bai

Other Resources on IGV:

IGV Tutorial practice

BroadE IGV 2014-5.pdf

Links:

ICGC
DCC portal on ICGC
Docs for ICGC
Integrated Genomics Viewer
UCSC Genome Browser
Cancer Genome Workbench
cBioPortal for Cancer Genomics
Savant Genome Browser


R Review Session

*Faculty: Sorana and Fouad*

Lecture:

BiCG_2015_RReview.pdf

Lab Practical:

BiCG_2015_R code file

Links:

R Studio


Day 2

Module 3: Alignment and Genome rearrangements

*Faculty: Jared Simpson*

Lecture:

BiCG_2015_Module3.pdf
BiCG_2015_Module3.ppt
BiCG_2015_Module3.mp4

Lab Practical:

Installation Instructions Module 3
BiCG_2015_Module3_Lab1.txt
BiCG_2015_Module3_Lab2.txt

Bonus: You can view your results (BAM and BAM.BAI files) in the IGV browser by using the URL of those files on your cloud instance. A web server runs on the Amazon cloud for each instance. In a browser such as Firefox, type in your server name (cbw#.dyndns.info) and all files under your workspace will be listed. Find your BAM and BAM.BAI files, right-click the BAM file and choose 'Copy Link Location'. Start IGV, choose 'Load from URL' from the File menu, paste the location you just copied, and the BAM file you just generated will appear in IGV. Narrow the view down to chromosome 15 or 17, where the breakpoints were identified.
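As an alternative to browsing, the URLs can be listed from a terminal on your instance. This is a minimal shell sketch that assumes the web server's root maps directly to your workspace directory, as the directory listing described above suggests; the helper name bam_urls and the workspace path are hypothetical. It prints one URL per BAM file and warns on stderr when the matching .bam.bai index is missing, since IGV needs the index to display the alignments.

```shell
#!/bin/sh
# Sketch: print the URL of every BAM file in a workspace, as served by the
# per-instance web server, ready to paste into IGV's File > Load from URL.
# ASSUMPTIONS: the web server root maps to the workspace directory, and
# bam_urls is a hypothetical helper name.
bam_urls() {
    workspace="$1"   # local workspace directory on the instance
    server="$2"      # your server name, e.g. cbw#.dyndns.info (# = your number)

    find "$workspace" -type f -name '*.bam' | LC_ALL=C sort |
    while IFS= read -r bam; do
        rel="${bam#"$workspace"/}"    # path relative to the workspace root
        echo "http://$server/$rel"
        # IGV needs the matching index file next to the BAM file.
        if [ ! -f "$bam.bai" ]; then
            echo "warning: missing index $bam.bai" >&2
        fi
    done
}
```

For example, `bam_urls "$HOME/workspace" cbw#.dyndns.info`, with # replaced by your instance number and the workspace path adjusted to your setup.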

Links:

Tools for Mapping High-throughput Sequencing Data Paper
SAM/BAM file specifications
samtools
Picard
bwa
GASV
BreakDancer

Extras:


example sam header
sam flags explained

Module 4: Gene Fusion Discovery

*Faculty: Andrew McPherson*

Lecture:

BiCG_2015_Module4.pdf
BiCG_2015_Module4.ppt
BiCG_2015_Module4.mp4

Lab Practical:

BiCG_2015_Module4_Lab.pdf

Lab 1:

Module 4 Prediction Lab Run
(For reference, the installation instructions for all data and tools can be found here.)

Lab 2:

Module 4 Exploration Lab

Lab 3:

Module 4 Visualization Lab

Papers and Background Material:

Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks

ENCODE RNA-seq Standards

Links:

BioStar
SeqAnswers
Integrative Genomics Viewer (IGV)
FASTQ format
SAM/BAM format
Illumina iGenomes
SamTools
Picard
FastQC
SAMStat
Bowtie
Bowtie2
TopHat/TopHat2
Cufflinks/Cuffdiff
CummeRbund


Day 3

Module 5: Copy Number Alterations

*Faculty: Sohrab Shah*

Lecture:

BiCG_2015_Module5.pdf
BiCG_2015_Module5.ppt
BiCG_2015_Module5.mp4

*Faculty: Fong Chun Chan*

Lab Practical:

Lab Slides

Data for Lab:

* METABRIC Seg File

Plots for Lab

Links:

* PennCNV-Affy: In-depth guide into pre-processing of Affymetrix 6.0 microarrays for OncoSNP

OncoSNP
Titan
SnpEff/SnpSift


Module 6: Somatic Mutations

*Faculty: Sohrab Shah*

Lecture:

BiCG_2015_Module6.pdf
BiCG_2015_Module6.ppt
BiCG_2015_Module6.mp4

*Faculty: Fong Chun Chan*

Lab Practical:

Lab Slides

Links:

Strelka
MutationSeq


Day 4

Module 7: Gene Expression Profiling

*Faculty: Paul Boutros*

Lecture:

BiCG_2015_Module7.pdf
BiCG_2015_Module7.ppt
BiCG_2015_Module7.mp4

Lab Practical:

* BiCG_2015_Module7_Lab

Data Sets:

* CEL Files.zip

Links:


Module 8: Variants to Pathways

Part I: Annotation of somatic coding variants and Part II: From Gene Lists to Pathways

*Faculty: Daniele Merico*

Lecture:

CBW_BiCG_2015_Module8_Part1_and_PartII.pdf
CBW_BiCG_2015_Module8_Part1_and_PartII.ppt
CBW_BiCG_2015_Module8_Part1_and_PartII.mp4

Part I Lab Practical: script (Annovar version March 2015)

Lab practical

Data Set: input (VCF):

Data Set Input - VCF

Data Set: output (Annovar text table)

Data Set Output - Annovar text table

Lab Practical: extra info

Lab Practical extra info

Part II Lab Practical: protocol

Lab practical protocol

Data Sets: Gene Lists

Data Set Genelist GBM

Data Set Genelist KIRC

Data Sets: Enrichment Results (g:Profiler) from Gene Lists

gProfiler Results GBM

gprofiler Results KIRC

gProfiler hsapiens

Data Sets: Enrichment Map (Cytoscape) from Enrichment Results

EM cys

Enrichment Map


Day 5

Part III: Network Analysis using Reactome FI

*Faculty: Lincoln Stein and Robin Haw*

Lecture:

BiCG_2015_Module8_Part3.pdf
BiCG_2015_Module8_Part3.ppt

Lab Practical:

BiCG_2015_Module8_Part3_LabSlides.pdf
BiCG_2015_Module8_Part3_LabExercise.pdf
BiCG_2015_Module8_Part3_LabAnswers.pdf

Reactome User Guide
ReactomeFI User Guide

Data Sets:

Data Set Genelist KIRC

OVCA_TCGA_Clinical.txt

OVCA_TCGA_GeneList.txt

OVCA_TCGA_MAF.txt

Papers:

Integrated genomic analyses of ovarian carcinoma

Clustering Algorithms: Newman Clustering and Hotnet

Reactome Website: NAR paper; Website guide

Nature Methods and Perspectives Paper

Supplementary Materials

Links:

Pathway and Interaction databases

Tools for finding/converting gene identifiers and gene attributes

Cytoscape

Useful plugins:

  • VistaClara - makes it easy to visualize gene expression data on networks
  • Agilent Literature Search - extracts interactions from PubMed abstracts
  • clusterMaker - provides multiple ways to cluster gene expression and networks
  • BiNGO - performs Gene Ontology over-representation analysis in Cytoscape; you can select genes in your network or provide a list of genes and see the enrichment results visually mapped onto the Gene Ontology
  • commandTool, coreCommands - used to control Cytoscape with a series of commands, e.g. to automate the process: open network, lay out network, save network as PDF. These plugins require Cytoscape 2.7
  • jActiveModules - requires gene expression data over multiple samples (>3). Finds regions of a network where genes are active (e.g. differentially expressed) across multiple samples.
  • EnrichmentMap
  • ReactomeFI
  • Many more

Special Guest Speaker: Dr. John Bartlett, Director of Transformative Pathology, Ontario Institute for Cancer Research

Guest Lecturer Biography - Dr. John Bartlett, Director Transformative Pathology, OICR

Guest Lecture Slides - These will be posted after the slides containing private data have been removed.


Module 9: Integration of Clinical Data

*Faculty: Anna Lapuk*

Lecture:

BiCG_2015_Module9.pdf

Lab Practical:

BiCG_2015_Module9_Lab.R

Taylor et al. Paper - Integrative genomic profiling of human prostate cancer PMC3198787

Data Sets:

Module 9 Data Files.zip

Papers:

Cox Regression Survival Paper.pdf

PMID17157792.pdf
PMID17157792 Supplementary Data


Tools with installation instructions on our Amazon server

Tools Used in Our Workshops

Data Sets from the Entire Workshop

Module3 data
data set for Module4,5,6

Results from Instructor’s Instance on Amazon

Module3 result
Module4 result
Module5 result
Module6 result
Module8 part I result

Launching CBW AMI

Steps to launch CBW public AMI

AMI ID: ami-b9a253d2
AMI Name: CBW workshops 2015
