logo

BiCG Module 8 Annovar Lab

Workshop pages for students


Setup

First login into the server.

ssh -i CBWCG.pem ubuntu@cbw##.dyndns.info

Now enter the ~/workspace directory

cd ~/workspace

Create a directory for this module and enter this directory:

mkdir Module8
cd Module8

Download the Data Set Input to your instance.

wget https://github.com/bioinformatics-ca/bioinformatics-ca.github.io/raw/master/2016_workshops/cancer/BICG_2016_Module8-Part1_Reimand/Passed.somatic.snvs.vcf

Convert the VCF file to an annovar input file.

convert2annovar.pl --includeinfo -format vcf4old Passed.somatic.snvs.vcf > Passed.somatic.snvs.vcf.annovar.in.txt

The output you will see is:

NOTICE: Read 121 lines and wrote 0 different variants at 8 genomic positions (8 SNPs and 0 indels)
NOTICE: Among 8 different variants at 8 positions, 0 are heterozygotes, 0 are homozygotes
NOTICE: Among 8 SNPs, 1 are transitions, 7 are transversions (ratio=0.14)

Run table_annovar.pl to annotate the variants in the annovar input file you have created.

table_annovar.pl --buildver hg19 Passed.somatic.snvs.vcf.annovar.in.txt /media/cbwdata/software/annovar/annovar/humandb/ --protocol refGene,ljb26_all,1000g2014oct_all,caddgt10,cg69,clinvar_20150330,cosmic70,esp6500siv2_all,exac02,snp138,genomicSuperDups,phastConsElements46way --operation g,f,f,f,f,f,f,f,f,f,r,r --nastring NA --outfile passed.somatic.snvs.vcf.annovar.out.txt

Arguments in the command

–buildver hg19

  • human genome reference build

Passed.somatic.snvs.vcf.annovar.in.txt

  • input file

/media/cbwdata/software/annovar/annovar/humandb/

  • Annovar db path

–protocol refGene,ljb26_all,1000g2014oct_all,caddgt10,cg69,clinvar_20150330,cosmic70,esp6500siv2_all,exac02,snp138,genomicSuperDups,phastConsElements46way

  • list of Annovar annotation modules to be executed, corresponding to specific databases

–operation g,f,f,f,f,f,f,f,f,f,r,r

  • type of operation to be executed by Annovar annotation modules
    • g = gene (only for the gene database), f = filter (exact match by coordinates, ref, alt), r = regional (coordinate overlap)
  • –protocol and –operation need the same number of comma-separated items

–nastring NA

  • encoding of NA values, “NA” is good for R post-processing, use “.” for VCF output

–outfile passed.somatic.snvs.vcf.annovar.out.txt

  • output file name prefix

Output File

The output file created (Passed.somatic.snvs.vcf.annovar.out.txt.hg19_multianno.txt) is a tab-delimited file, where each row represents one variant, and each column represents one annotation task. Table_annovar allows you to specify exactly which columns or annotation tasks are required, and allows you to select multiple versions of the same analysis (such as multiple gene-definition systems or multiple dbSNP databases).

The output you will see is:

 NOTICE: Processing operation=g protocol=refGene

 NOTICE: Running with system command <annotate_variation.pl -geneanno -buildver hg19 -dbtype refGene -outfile passed.somatic.snvs.vcf.annovar.out.txt.refGene -exonsort passed.somatic.snvs.vcf.annovar.in.txt /usr/local/annovar/humandb/>
 NOTICE: Reading gene annotation from /usr/local/annovar/humandb/hg19_refGene.txt ... Done with 51039 transcripts (including 11569 without coding sequence annotation) for 26311 unique genes
 NOTICE: Reading FASTA sequences from /usr/local/annovar/humandb/hg19_refGeneMrna.fa ... Done with 21 sequences
 WARNING: A total of 345 sequences will be ignored due to lack of correct ORF annotation
 NOTICE: Finished gene-based annotation on 8 genetic variants in passed.somatic.snvs.vcf.annovar.in.txt
 NOTICE: Output files were written to passed.somatic.snvs.vcf.annovar.out.txt.refGene.variant_function, passed.somatic.snvs.vcf.annovar.out.txt.refGene.exonic_variant_function
 -----------------------------------------------------------------
 NOTICE: Processing operation=f protocol=ljb26_all
 NOTICE: Finished reading 25 column headers for '-dbtype ljb26_all'

 NOTICE: Running system command <annotate_variation.pl -filter -dbtype ljb26_all -buildver hg19 -outfile passed.somatic.snvs.vcf.annovar.out.txt passed.somatic.snvs.vcf.annovar.in.txt /usr/local/annovar/humandb/ -otherinfo>
 NOTICE: the --dbtype ljb26_all is assumed to be in generic ANNOVAR database format
 NOTICE: Variants matching filtering criteria are written to passed.somatic.snvs.vcf.annovar.out.txt.hg19_ljb26_all_dropped, other variants are written to passed.somatic.snvs.vcf.annovar.out.txt.hg19_ljb26_all_filtered
 NOTICE: Processing next batch with 8 unique variants in 8 input lines
 NOTICE: Database index loaded. Total number of bins is 557362 and the number of bins to be scanned is 7
 NOTICE: Scanning filter database /usr/local/annovar/humandb/hg19_ljb26_all.txt...Done
 -----------------------------------------------------------------
 NOTICE: Processing operation=f protocol=1000g2014oct_all

 NOTICE: Running system command <annotate_variation.pl -filter -dbtype 1000g2014oct_all -buildver hg19 -outfile passed.somatic.snvs.vcf.annovar.out.txt passed.somatic.snvs.vcf.annovar.in.txt /usr/local/annovar/humandb/>
 NOTICE: Variants matching filtering criteria are written to passed.somatic.snvs.vcf.annovar.out.txt.hg19_ALL.sites.2014_10_dropped, other variants are written to passed.somatic.snvs.vcf.annovar.out.txt.hg19_ALL.sites.2014_10_filtered
 NOTICE: Processing next batch with 8 unique variants in 8 input lines
 NOTICE: Database index loaded. Total number of bins is 2824642 and the number of bins to be scanned is 6
 NOTICE: Scanning filter database /usr/local/annovar/humandb/hg19_ALL.sites.2014_10.txt...Done
 -----------------------------------------------------------------
 NOTICE: Processing operation=f protocol=caddgt10

 NOTICE: Running system command <annotate_variation.pl -filter -dbtype caddgt10 -buildver hg19 -outfile passed.somatic.snvs.vcf.annovar.out.txt passed.somatic.snvs.vcf.annovar.in.txt /usr/local/annovar/humandb/>
 NOTICE: the --dbtype caddgt10 is assumed to be in generic ANNOVAR database format
 NOTICE: Variants matching filtering criteria are written to passed.somatic.snvs.vcf.annovar.out.txt.hg19_caddgt10_dropped, other variants are written to passed.somatic.snvs.vcf.annovar.out.txt.hg19_caddgt10_filtered
 NOTICE: Processing next batch with 8 unique variants in 8 input lines
 NOTICE: Database index loaded. Total number of bins is 2625942 and the number of bins to be scanned is 6
 NOTICE: Scanning filter database /usr/local/annovar/humandb/hg19_caddgt10.txt...Done
 -----------------------------------------------------------------
 NOTICE: Processing operation=f protocol=cg69

 NOTICE: Running system command <annotate_variation.pl -filter -dbtype cg69 -buildver hg19 -outfile passed.somatic.snvs.vcf.annovar.out.txt passed.somatic.snvs.vcf.annovar.in.txt /usr/local/annovar/humandb/>
 NOTICE: the --dbtype cg69 is assumed to be in generic ANNOVAR database format
 NOTICE: Variants matching filtering criteria are written to passed.somatic.snvs.vcf.annovar.out.txt.hg19_cg69_dropped, other variants are written to passed.somatic.snvs.vcf.annovar.out.txt.hg19_cg69_filtered
 NOTICE: Processing next batch with 8 unique variants in 8 input lines
 NOTICE: Database index loaded. Total number of bins is 2789339 and the number of bins to be scanned is 6
 NOTICE: Scanning filter database /usr/local/annovar/humandb/hg19_cg69.txt...Done
 -----------------------------------------------------------------
 NOTICE: Processing operation=f protocol=clinvar_20140929

 NOTICE: Running system command <annotate_variation.pl -filter -dbtype clinvar_20140929 -buildver hg19 -outfile passed.somatic.snvs.vcf.annovar.out.txt passed.somatic.snvs.vcf.annovar.in.txt /usr/local/annovar/humandb/>
 NOTICE: the --dbtype clinvar_20140929 is assumed to be in generic ANNOVAR database format
 NOTICE: Variants matching filtering criteria are written to passed.somatic.snvs.vcf.annovar.out.txt.hg19_clinvar_20140929_dropped, other variants are written to passed.somatic.snvs.vcf.annovar.out.txt.hg19_clinvar_20140929_filtered
 NOTICE: Processing next batch with 8 unique variants in 8 input lines
 NOTICE: Database index loaded. Total number of bins is 44738 and the number of bins to be scanned is 1
 NOTICE: Scanning filter database /usr/local/annovar/humandb/hg19_clinvar_20140929.txt...Done
 -----------------------------------------------------------------
 NOTICE: Processing operation=f protocol=cosmic70

 NOTICE: Running system command <annotate_variation.pl -filter -dbtype cosmic70 -buildver hg19 -outfile passed.somatic.snvs.vcf.annovar.out.txt passed.somatic.snvs.vcf.annovar.in.txt /usr/local/annovar/humandb/>
 NOTICE: the --dbtype cosmic70 is assumed to be in generic ANNOVAR database format
 NOTICE: Variants matching filtering criteria are written to passed.somatic.snvs.vcf.annovar.out.txt.hg19_cosmic70_dropped, other variants are written to passed.somatic.snvs.vcf.annovar.out.txt.hg19_cosmic70_filtered
 NOTICE: Processing next batch with 8 unique variants in 8 input lines
 NOTICE: Database index loaded. Total number of bins is 232279 and the number of bins to be scanned is 5
 NOTICE: Scanning filter database /usr/local/annovar/humandb/hg19_cosmic70.txt...Done
 -----------------------------------------------------------------
 NOTICE: Processing operation=f protocol=esp6500siv2_all

 NOTICE: Running system command <annotate_variation.pl -filter -dbtype esp6500siv2_all -buildver hg19 -outfile passed.somatic.snvs.vcf.annovar.out.txt passed.somatic.snvs.vcf.annovar.in.txt /usr/local/annovar/humandb/>
 NOTICE: the --dbtype esp6500siv2_all is assumed to be in generic ANNOVAR database format
 NOTICE: Variants matching filtering criteria are written to passed.somatic.snvs.vcf.annovar.out.txt.hg19_esp6500siv2_all_dropped, other variants are written to passed.somatic.snvs.vcf.annovar.out.txt.hg19_esp6500siv2_all_filtered
 NOTICE: Processing next batch with 8 unique variants in 8 input lines
 NOTICE: Database index loaded. Total number of bins is 594771 and the number of bins to be scanned is 7
 NOTICE: Scanning filter database /usr/local/annovar/humandb/hg19_esp6500siv2_all.txt...Done
 -----------------------------------------------------------------
 NOTICE: Processing operation=f protocol=exac02

 NOTICE: Running system command <annotate_variation.pl -filter -dbtype exac02 -buildver hg19 -outfile passed.somatic.snvs.vcf.annovar.out.txt passed.somatic.snvs.vcf.annovar.in.txt /usr/local/annovar/humandb/>
 NOTICE: the --dbtype exac02 is assumed to be in generic ANNOVAR database format
 NOTICE: Variants matching filtering criteria are written to passed.somatic.snvs.vcf.annovar.out.txt.hg19_exac02_dropped, other variants are written to passed.somatic.snvs.vcf.annovar.out.txt.hg19_exac02_filtered
 NOTICE: Processing next batch with 8 unique variants in 8 input lines
 NOTICE: Database index loaded. Total number of bins is 750585 and the number of bins to be scanned is 7
 NOTICE: Scanning filter database /usr/local/annovar/humandb/hg19_exac02.txt...Done
 -----------------------------------------------------------------
 NOTICE: Processing operation=f protocol=snp138

 NOTICE: Running system command <annotate_variation.pl -filter -dbtype snp138 -buildver hg19 -outfile passed.somatic.snvs.vcf.annovar.out.txt passed.somatic.snvs.vcf.annovar.in.txt /usr/local/annovar/humandb/>
 NOTICE: Variants matching filtering criteria are written to passed.somatic.snvs.vcf.annovar.out.txt.hg19_snp138_dropped, other variants are written to passed.somatic.snvs.vcf.annovar.out.txt.hg19_snp138_filtered
 NOTICE: Processing next batch with 8 unique variants in 8 input lines
 NOTICE: Database index loaded. Total number of bins is 2894320 and the number of bins to be scanned is 6
 NOTICE: Scanning filter database /usr/local/annovar/humandb/hg19_snp138.txt...Done
 -----------------------------------------------------------------
 NOTICE: Processing operation=r protocol=genomicSuperDups

 NOTICE: Running with system command <annotate_variation.pl -regionanno -dbtype genomicSuperDups -buildver hg19 -outfile passed.somatic.snvs.vcf.annovar.out.txt passed.somatic.snvs.vcf.annovar.in.txt /usr/local/annovar/humandb/>
 NOTICE: Reading annotation database /usr/local/annovar/humandb/hg19_genomicSuperDups.txt ... Done with 51599 regions
 NOTICE: Finished region-based annotation on 8 genetic variants in passed.somatic.snvs.vcf.annovar.in.txt
 NOTICE: Output file is written to passed.somatic.snvs.vcf.annovar.out.txt.hg19_genomicSuperDups
 -----------------------------------------------------------------
 NOTICE: Processing operation=r protocol=phastConsElements46wayPlacental

 NOTICE: Running with system command <annotate_variation.pl -regionanno -dbtype phastConsElements46wayPlacental -buildver hg19 -outfile passed.somatic.snvs.vcf.annovar.out.txt passed.somatic.snvs.vcf.annovar.in.txt /usr/local/annovar/humandb/>
 NOTICE: Reading annotation database /usr/local/annovar/humandb/hg19_phastConsElements46wayPlacental.txt ... Done with 3743478 regions
 NOTICE: Finished region-based annotation on 8 genetic variants in passed.somatic.snvs.vcf.annovar.in.txt
 NOTICE: Output file is written to passed.somatic.snvs.vcf.annovar.out.txt.hg19_phastConsElements46wayPlacental
 -----------------------------------------------------------------
 NOTICE: Multianno output file is written to passed.somatic.snvs.vcf.annovar.out.txt.hg19_multianno.txt

From a separate local machine terminal instance, copy the output file back to your local machine

scp -i CBWCG.pem ubuntu@cbw##.dyndns.info://home/ubuntu/workspace/Module8/passed.somatic.snvs.vcf.annovar.out.txt.hg19_multianno.txt ./

You can open the file in Excel (select “tab-delimited” when opening the file). Click the “DATA” tab at the menu bar, then click the big “Filter” button. Then click any one of the headings to filter out variants, essentially by clicking the check boxes.

View on GitHub