Course Schedule
Schedule for May 30 to June 3, 2016
Workshop Q/A Forum
Post your workshop questions here!
Workshop Survey
We appreciate your feedback on your experience at the workshop. Please complete our survey at the end of the workshop.
Laptop Setup Instructions
Instructions to set up your laptop can be found here.
Difference Between R and RStudio
RStudio does not know where libraries are installed when they were not installed through the RStudio package manager. To tell RStudio the location, define the path in a startup file: create a file called .Renviron and put this line in it:
R_LIBS=<R Library Path of other installed packages>
This was the problem students ran into when they installed packages at the command line in RStudio using the R command install.packages().
… or you could use the package manager to install libraries.
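The .Renviron step described above can be sketched as a small shell function. This is only an illustration: the function name set_r_libs and the paths in the usage line are assumptions, not part of the workshop material, and the library path must be replaced with the location where your packages are actually installed.

```shell
#!/bin/sh
# Minimal sketch: add an R_LIBS entry to an .Renviron file.
# set_r_libs is a hypothetical helper name, not a workshop tool.
#   $1 = path to the .Renviron file
#   $2 = R library path of the other installed packages
set_r_libs() {
    file=$1
    lib=$2
    if [ -f "$file" ] && grep -q '^R_LIBS=' "$file"; then
        # Leave an existing setting alone rather than adding a second one.
        echo "R_LIBS already set in $file" >&2
    else
        printf 'R_LIBS=%s\n' "$lib" >> "$file"
    fi
}
```

A typical call might be set_r_libs "$HOME/.Renviron" "$HOME/R/library", where the second argument is a placeholder path. R reads .Renviron at startup, so restart R or RStudio afterwards.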
Syntax highlighting
… of scripts in the R editor does not seem to work under Windows. If you want highlighted syntax, use RStudio instead.
Pre-Workshop Tutorials
1) R Preparation tutorials: You are expected to have completed the following tutorials in R beforehand. The tutorials should be very accessible even if you have never used R before.
2) Cytoscape 3.x Preparation tutorials: Complete the introductory tutorial to Cytoscape 3.x:
- Introduction to Cytoscape3 - User Interface
- Introduction to Cytoscape3 - Welcome Screen
- Introduction to Cytoscape 3.1 - Networks, Data, Styles, Layouts and App Manager
3) UNIX Preparation tutorials: Please complete tutorials #1-3 on UNIX Tutorial for Beginners
Pre-workshop Readings
Before coming to the workshop, read these.
Database resources of the National Center for Biotechnology Information
COSMIC: mining complete cancer genomes in the Catalogue of Somatic Mutations in Cancer
Integrative genomic profiling of human prostate cancer
Predicting the functional impact of protein mutations: application to cancer genomics
Cancer genome sequencing study design
Using cloud computing infrastructure with CloudBioLinux, CloudMan, and Galaxy
The UCSC Genome Browser database: extensions and updates 2013
Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration
Feature-based classifiers for somatic mutation detection in tumour–normal paired sequencing data
Expression Data Analysis with Reactome
Logging into the Amazon Cloud
Instructions can be found here.
- We have set up 30 instances on the Amazon cloud, one for each student. To log in to your instance, you will need a security certificate. If you plan on using Linux or Mac OS X, please download this certificate. If you plan on using Windows (with PuTTY and WinSCP), please download this certificate.
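On Linux or Mac OS X, logging in with the downloaded certificate might look roughly like the sketch below. Everything specific here is an assumption: the certificate file name, the user name, and the host name are placeholders, and the real values are in the login instructions linked above. The script only prints the ssh command so it can be checked before it is run.

```shell
#!/bin/sh
# Hypothetical placeholders: replace with the certificate you downloaded
# and the user name and address given in the workshop login instructions.
KEY_FILE="${KEY_FILE:-workshop_cert.pem}"
REMOTE_USER="${REMOTE_USER:-ubuntu}"
REMOTE_HOST="${REMOTE_HOST:-instance.example.org}"

# ssh rejects private key files that other users can read,
# so restrict the file to its owner first.
prepare_key() {
    chmod 600 "$1"
}

# Print the login command: ssh -i <key file> <user>@<host>
login_command() {
    printf 'ssh -i %s %s@%s\n' "$1" "$2" "$3"
}

if [ -f "$KEY_FILE" ]; then
    prepare_key "$KEY_FILE"
    login_command "$KEY_FILE" "$REMOTE_USER" "$REMOTE_HOST"
else
    echo "Certificate not found: $KEY_FILE" >&2
fi
```

Copy the printed line into the terminal to connect. Windows users would load their certificate into PuTTY and WinSCP instead.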
Class Photo
YouTube Playlist for Recorded Lectures
Day 1
Welcome
Ann Meyer
Module 1: Introduction to Cancer Genomics
Trevor Pugh
Module 2.1: Databases and Visualization Tools
Michelle Brazas and Florence Cavalli
Links:
- ICGC
- DCC portal on ICGC
- Docs for ICGC
- Integrative Genomics Viewer
- UCSC Genome Browser
- Cancer Genome Workbench
- cBioPortal for Cancer Genomics
- Savant Genome Browser
Module 2.2: Logging into the Cloud
Francis Ouellette
Optional R Review Session
Florence Cavalli
Links:
Day 2
Module 3: Mapping and Genome Rearrangement
Jared Simpson
Lab practicals: Part 1 - Mapping and Part 2 - Rearrangements
Links:
- What does my SAM flag mean?
- Tools for Mapping High-throughput Sequencing Data Paper (2012)
- SAM/BAM file specifications
- samtools
- bwa
- lumpy-sv
Module 4: Gene Fusion Discovery
Andrew McPherson
Lab practical:
Papers and Background Material:
- A survey of best practices for RNA-seq data analysis
- The impact of translocations and gene fusions on cancer causation
- The emerging complexity of gene fusions in cancer
- The landscape and therapeutic relevance of cancer-associated transcript fusions
- Fusion genes and their discovery using high throughput sequencing
Links:
- BioStar
- SeqAnswers
- Integrative Genomics Viewer (IGV)
- FASTQ format
- SAM/BAM format
- Illumina iGenomes
- SamTools
- Picard
- FastQC
- SAMStat
- Bowtie
- Bowtie2
- TopHat/TopHat2
- Cufflinks/Cuffdiff
- CummeRbund
Day 3
Module 5: Copy Number Alterations
Sohrab Shah and Fong Chun Chan
- Lab Module
- These are the instructions for the lab practical.
- Data Analysis Package
- Contains the various files and the R Markdown file that will be used for further exploration and analysis of copy number alterations.
- This package is already on the server. You can also download it to your own computer and perform the analyses locally.
- Software Installation
- This page contains information on how to install the software used in the lab practical.
- Data Preparation
- This page contains information on how the data was prepared for the lab practical.
Data for Lab Practical
- METABRIC Seg File
- Seg file from the METABRIC project to be visualized in IGV.
Plots for Lab Practical
These plots are provided for convenience. They can be generated by following the lab practical.
- Oncosnp
- Titan
Links:
- PennCNV-Affy: In-depth guide into pre-processing of Affymetrix 6.0 microarrays for OncoSNP
- OncoSNP
- Titan
- SnpEff/SnpSift
Module 6: Somatic Mutations
Sohrab Shah and Fong Chun Chan
- Lab Module
- These are the instructions for the lab practical.
- Data Analysis Package
- Contains the various files and the R Markdown file that will be used for further exploration and analysis of somatic mutation data.
- This package is already on the server. You can also download it to your own computer and perform the analyses locally.
- Data Preparation
- This page contains information on how the data was prepared for the lab practical.
- Pre-processing Bams
- This page contains information on how to pre-process BAM files (e.g. filtering) for downstream analyses.
Links:
Day 4
Module 7: Gene Expression Profiling
Fouad Yousif
Links:
Module 8: Variants to Networks
Part 1: How to annotate variants and prioritize potentially relevant ones
Robin Haw
Data Set Output - Annovar text table
Links
Part 2: From genes to pathways
Juri Reimand
Data Sets Gene Lists:
Data Sets Enrichment Results (g:Profiler) from Gene Lists:
Data Sets Enrichment Map (Cytoscape) from Enrichment Results:
Enrichmentmap
Day 5
Part 3: Network Analysis using Reactome
Robin Haw
Lab practical and Answers
Data Sets:
Papers:
Integrated genomic analyses of ovarian carcinoma
Clustering Algorithms: Newman Clustering and Hotnet
Reactome Website: NAR paper; Website guide
Nature Methods and Perspectives Paper
Links:
- GO
- KEGG
- Biocarta
- Reactome Curated human pathways
- NCI/PID
- Pathway Commons Aggregates pathways from multiple sources
- iRefWeb/iRefIndex Protein interactions
- >300 more
Tools for finding/converting gene identifiers and gene attributes
Cytoscape
Useful plugins:
- VistaClara - makes it easy to visualize gene expression data on networks
- Agilent Literature Search - extracts interactions from PubMed abstracts
- clusterMaker - provides multiple ways to cluster gene expression and networks
- BiNGO - provides over-representation analysis using Gene Ontology in Cytoscape - you can select genes in your network or provide a list of genes and see the enrichment results visually mapped to the Gene Ontology
- commandTool, coreCommands - used to control Cytoscape by a series of commands. E.g. automate the process: open network, layout network, save network as PDF. These plugins require Cytoscape 2.7
- jActiveModules - requires gene expression data over multiple samples (>3). Finds regions of a network where genes are active (e.g. differentially expressed) across multiple samples.
- EnrichmentMap
- ReactomeFI
- Many more
Module 9: Integration of Clinical Data
Anna Goldenberg and Lauren Erdman
Tools:
Papers:
Similarity network fusion for aggregating data types on a genomic scale
Data for the Workshop
Tool Installation
Instructions for installing the tools used in the workshops can be found here.
Data Sets
- HCC1395 data: CEL, exome, rnaseq
- Module 3 data
- Module 4 data: bams, cbw_tutorial, refdata, sampledata
- Module 5 data: data, ref_data
- Module 6 data
- Module 7 data