Advanced Genome Analysis Using VAAST and Phevor

Charlene Son-Rigby

VAAST Algorithm Overview

VAAST Family Analysis Workflows

VAAST Workflow Configurations

VAAST Individual Analysis Workflow

VAAST Individual and Family Reports

VAAST Cohort and Case-Control Workflows

Phevor

 

Opal provides VAAST and Flex, advanced interpretation algorithms, in preconfigured workflows:

  • VAAST provides a statistical ranking of variants and genes based on their likelihood for causing disease. VAAST can be used to analyze individuals, families and cohorts.
  • Flex enables analysis of family data based on mode of inheritance, and cohorts based on shared variants or genes.

This section covers VAAST workflows. Flex workflows are described in a separate section.

You can access VAAST either from your project by selecting the “Launch App” dropdown menu, or from “App Store” on the Home Page.

 

VAAST Algorithm Overview

VAAST is a probabilistic search tool for identifying damaged genes and their disease-causing variants in personal genome sequences (Yandell et al. 2011, Hu et al. 2013). VAAST builds upon existing approaches to variant prioritization (amino acid substitution impact scores, aggregative variant frequency, and evolutionary conservation), combining these elements into a single unified likelihood framework that allows users to identify damaged genes and deleterious variants with greater accuracy than any other currently available algorithm. VAAST can identify rare variants causing rare genetic diseases, and it can also use both rare and common variants to identify genes responsible for common diseases.

Opal implements the VAAST algorithm within pre-configured analysis workflows for analyzing individuals, single parent families, family trios, family quads, and cohort studies. VAAST scores genes that contain single variants, or compound heterozygotes that conform to a recessive inheritance model, by comparing them to a background of allele frequencies from “healthy” reference genomes.

There are currently two versions of the VAAST algorithm available, VAAST 2 and VAAST 3. VAAST 3 includes the following enhancements:

  • Runs in under 10 minutes
  • Analyzes intronic variants, including splice regions
  • A single analysis report for indels and other variants
  • Ensembl 83 annotations for background allele frequency database

VAAST 2 will be supported through Q1 2016 to allow users to complete projects or studies that have already been initiated.

VAAST Family Analysis Workflows – Duo, Trio or Quad

To start a VAAST Family Analysis, select the desired VAAST analysis within a Project from the Launch dropdown menu, or from within the Opal App Store.

You will be presented with a pedigree diagram. The icons representing the members of the pedigree are drawn as follows:

• Color: green (unaffected) or red (affected)

• Shape: rounded corners (female) or sharp corners (male)

• Sharp corners with a duller color: no sex has been specified

VAAST Trio Analysis dialog

 

Clicking on one of the icons will open a dialog box:

Pedigree member specification dialog

This window allows you to associate a genome with the selected member of the pedigree. First, select a relevant project from the dropdown menu. Once a project is selected, the box below the project select list will show all of the upgraded genomes in that project. To associate a genome with this member of the pedigree, choose one from the list.

When choosing the genome for the child of the VAAST pedigree (or either child in the case of a quad), you must also specify sex. You do so by choosing one of the options in the sex selection panel, directly below the list of genomes. If you have previously specified a sex for the given genome during upload, then selecting the genome will automatically check the proper sex option for you. In a quad, you also specify whether the sibling is affected or not. Choose a genome for each member of the pedigree. When you have made all of your selections, click the ‘Select’ button.

 

Next, set the background genomes to be used for the analysis. The default is the 1000 Genomes Project.

The current 1000 Genomes background includes the 1000 Genomes Phase 1 release 3. For variants not already represented in this dataset, the genotypes from the ExAC project have also been added, at the frequencies observed in ExAC. VAAST 3 uses Ensembl 83 coding gene annotations.

For VAAST 2 runs, you need to specify:

  • Whether you want VAAST to include indels in the analysis. Due to the type of change insertions and deletions usually introduce, they are typically assigned high scores by VAAST. So, you may want to run two analyses with VAAST, one scoring indels and the other not scoring indels.
  • Whether you want to run both recessive and de novo inheritance modes. Both are selected by default.

 

Finally, click on “Run” to submit the job for execution by the VAAST pipeline. A new “Submitted” report will be added to the project listing. A typical VAAST 2 run will complete in about five hours, while a VAAST 3 run will complete in under 10 minutes. 

 

VAAST Workflow Configurations

The VAAST workflows in Opal use the following VAAST configurations:

  • Set up for completely penetrant conditions
  • In VAAST 2, 5,000 permutations per run for Solo and Family, and 10,000 permutations per run for Cohort and Case-Control
  • In VAAST 3, 5,000 permutations per run for Solo and Family
  • The causative genotype (homozygous or compound heterozygous) is expected to be completely absent from the background and the causative variant(s) are expected to be present in the background file at no more than 5%
  • In VAAST 2, scoring is limited to CDS regions (including splice sites but not splice regions) of the genes present in the genome(s)
  • In VAAST 3, Ensembl 83 gene annotations are used. In VAAST 2, Ensembl 73 gene annotations are used. For both VAAST versions, annotations are filtered to remove transcripts that contain internal stop codons
  • For family workflows, with parental no-call data:
    • De Novo: VAAST collapses no-calls in the parents’ genomes to the reference allele before filtering for de novo variants in the child. In Opal, the parent zygosities will be displayed, and no-calls can be filtered using the Filter False Positives filter
    • Recessive: If a parent has a hemizygous no-call for a variant that is present in the proband, VAAST will score the variant for the Recessive mode of inheritance
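
The De Novo no-call handling described above can be illustrated with a short Python sketch. This is not VAAST's implementation: the genotype labels (`REF`, `HET`, `HOM`, `NO_CALL`) and both function names are hypothetical and chosen only for illustration. It assumes a no-call is represented as `None` and that a de novo candidate is a variant the child carries while both parents are reference after collapsing.

```python
# Hypothetical genotype labels; the real VAAST pipeline uses its own encoding.
REF = "ref"        # homozygous reference
HET = "het"        # heterozygous
HOM = "hom"        # homozygous for the variant
NO_CALL = None     # assumed marker for a parental no-call


def collapse_no_call(genotype):
    """Treat a parental no-call as the reference allele."""
    return REF if genotype is NO_CALL else genotype


def is_de_novo_candidate(child, father, mother):
    """Return True if the child carries the variant and both parents,
    after collapsing no-calls to reference, do not."""
    father = collapse_no_call(father)
    mother = collapse_no_call(mother)
    return child in (HET, HOM) and father == REF and mother == REF
```

Because a collapsed no-call looks like a reference genotype, a candidate that passes this check may still be inherited. That is why the guide suggests using the Filter False Positives filter on no-calls.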

VAAST Individual Analysis Workflow

To start a VAAST analysis for an individual, select the VAAST Solo Analysis workflow within a Project from the Launch dropdown menu, or from within the Opal App Store.

As with the VAAST family analyses, you can select the background, whether to include indels, and whether you want to run both recessive and dominant modes of inheritance.

Finally, click on “Run” to submit the job for execution by the VAAST pipeline.

VAAST Individual and Family Reports

When your VAAST report is ready, the status column will change to “Complete” in the project view. Click on the VAAST report link to review the report; the VAAST report window will appear.

The VAAST report is ordered by VAAST rank, which sorts first by the VAAST p-value (a measure of the confidence in genome-wide significance) and then by the VAAST Gene Score (G-Score). Rows with high G-Scores but high p-values are likely caused by sequencing errors and are therefore false positive hits. For that reason they are ranked lower in the list.
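
As a rough illustration of the two-level ordering, the Python sketch below sorts rows by p-value and then by G-Score. The field names (`p_value`, `g_score`, `gene`) and the gene symbols in the usage example are hypothetical. Breaking ties in favor of the higher G-Score is an assumption, since the guide does not state the tie-break direction.

```python
def rank_rows(rows):
    """Order rows by ascending p-value, then by descending G-Score
    (assumed tie-break), and attach a 1-based rank."""
    ordered = sorted(rows, key=lambda r: (r["p_value"], -r["g_score"]))
    return [dict(row, rank=i) for i, row in enumerate(ordered, start=1)]


# Usage with made-up values:
example = [
    {"gene": "GENE_A", "p_value": 0.05, "g_score": 30.0},
    {"gene": "GENE_B", "p_value": 0.001, "g_score": 12.0},
    {"gene": "GENE_C", "p_value": 0.001, "g_score": 20.0},
]
ranked = rank_rows(example)
```

In this example GENE_A has the largest G-Score but is ranked last, because its p-value is the least significant.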

 

VAAST Interpretation Report Headers:

 

Review Priority: A visual prioritization of variants based on three data elements: ClinVar, Allele Frequency and Effect. See Appendix 3 for more details. Reports generated from data processed with Opal Pipeline 4.3 or earlier use the Variant Classification (previously Predicted Class) field. See Appendix 7 for more details.

Gene: HGNC symbol of the gene.
Position: Chromosome and base pair position.
dbSNP: dbSNP identifier if one exists (and an embedded URL link to dbSNP).
Change: Reference position and alleles reported in the sample genome. In addition, the HGVS notation for the nucleotide and protein change (if any) for a representative transcript.

Effect: Lists the impact of the variant on the gene and transcripts, i.e. synonymous, non-synonymous, stop gain/loss, indel/frameshift, and splice variants.

Note that VAAST scores all transcripts. If the transcript that VAAST scores highest is not the canonical transcript and its effect differs from the canonical transcript's effect, an asterisk will be displayed in this column. When you click on the Effect hyperlink to display the Variant Consequences modal for the variant, the scored transcript will be highlighted in green, while the canonical transcript will be highlighted in yellow.

Zygosity: Genotype of the variant in the proband (homozygous or heterozygous).
Sibling Zygosity: Genotype of the variant in the affected or unaffected sibling (optional).
Father Zygosity: Genotype of the variant in the father.
Mother Zygosity: Genotype of the variant in the mother.
1KG AF, EVS AF, ExAC AF: Allele frequencies from the 1000 Genomes Project, Exome Variant Server and ExAC. Click on the hyperlinks to access the ethnic subpopulation frequencies.
Omicia score: A proprietary impact assessment score that provides a rational aggregation of the PolyPhen, MutationTaster, SIFT, and PhyloP scores. Values range from 0 to 1; the greater the value, the more likely that a variant is deleterious or is located in a highly conserved region. Values >0.75 typically indicate that the variant is deleterious. (See Appendix 2)
VVP Score: The VAAST Variant Prioritization score applies the VAAST algorithm at the variant level. VAAST takes predicted protein impact, conservation and allele frequency into consideration in its deleteriousness assessment.
CADD Score: The CADD score combines information from 63 different annotations including PhastCons, GERP, PhyloP, SIFT and PolyPhen, using a support vector machine classifier (Kircher et al. 2013). It measures deleteriousness by using observed variant frequency as the basis for its calculation. The C score ranges from 1 to 99, with a higher score indicating greater deleteriousness. Values >= 10 are predicted to be the 10% most deleterious substitutions, and values >= 20 the 1% most deleterious.
Evidence: Literature evidence gathered from ClinVar, OMIM, Locus Specific Databases, GWAS and COSMIC.
VAAST Rank: Numerical rank VAAST assigns based on the VAAST Gene Score and p-value.

VAAST V-Score: VAAST uses a ‘burden test’ to score genes. This means that it scores each individual variant in a gene based on both:
  • The nature of the amino-acid change induced by the variant (if any)
  • The relative enrichment of the variant's frequency in the cases versus the controls

For example, a variant producing a non-conservative substitution that is found in every case, but never or only rarely in the controls, will receive a high score. High-scoring variants are thus both damaging and overrepresented in the cases versus controls.

VAAST G-Score: The VAAST Gene Score is the sum of the variant scores for every variant in the gene. Again, the larger the VAAST Gene Score, the more likely that gene is to be damaged in the case genomes.

The VAAST p-value, displayed underneath the VAAST Gene Score, tells you the probability of observing that gene score by random chance given your case-control dataset. Some genes naturally contain many rare, weakly damaging alleles; thus a high VAAST Gene Score is not necessarily unexpected. When prioritizing candidate genes, focus on those with a high VAAST Gene Score and a low VAAST p-value.

The number of genomes in the target/background CDR constrains the lowest achievable significance level. If VAAST did not observe a single permutation with a CLRT score higher than the actual CLRT score, it will report the lowest possible p-value bound by the number of target/background genomes.
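
The relationship between variant scores, the gene score and a permutation-based p-value can be sketched in Python as follows. This is a generic permutation-test illustration, not VAAST's CLRT code. The function names are hypothetical, and `p_floor` stands in for the lowest achievable p-value, which the guide says depends on the number of target and background genomes without giving a formula. Counting ties as exceedances is also an assumption.

```python
def gene_score(variant_scores):
    """Gene score as the sum of the scores of the variants in the gene."""
    return sum(variant_scores)


def permutation_p_value(observed, permuted_scores, p_floor):
    """Fraction of permuted scores at least as large as the observed score.

    If no permutation reaches the observed score, report the floor
    (the smallest p-value the dataset can support) instead of zero.
    """
    exceed = sum(1 for score in permuted_scores if score >= observed)
    if exceed == 0:
        return p_floor
    return max(exceed / len(permuted_scores), p_floor)
```

With 5,000 permutations, for example, a single permuted score above the observed one corresponds to 1/5000 = 0.0002, provided that value is not below the floor.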

Filtering Variants by Inheritance Model

At the top of the report, the mode of inheritance buttons allow you to filter your results by inheritance model:

• “Recessive” filters data to show only recessive variants

• “De Novo” filters data to show only de novo variants (the Solo Report provides a “Dominant” filter)

• “X-Linked” filters data to show only X chromosome variants (family reports with male probands only)

Only modes of inheritance with results will be shown. For instance, if there are no X-linked variants in the VAAST results, the “X-Linked” button will not be shown.

Note: if a parent has a hemizygous no-call for a variant that is present in the proband, it will still be scored for the Recessive mode of inheritance. In this case, the VAAST report will show that the parent does not have the variant, but the proband will either be homozygous for the variant or compound heterozygous for the gene.
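
The button visibility rules in this section can be summarized in a small Python sketch. It only restates the rules given above and is not Opal's code: the function name, the `report_type` values (`"solo"`, `"family"`) and the `proband_sex` values are hypothetical.

```python
FAMILY_MODES = ["Recessive", "De Novo", "X-Linked"]
SOLO_MODES = ["Recessive", "Dominant"]  # the Solo Report offers Dominant, not De Novo


def visible_mode_buttons(result_modes, report_type, proband_sex=None):
    """Return the inheritance buttons to show, in display order.

    result_modes: set of modes that have at least one result.
    """
    candidates = SOLO_MODES if report_type == "solo" else FAMILY_MODES
    buttons = []
    for mode in candidates:
        # X-Linked applies only to family reports with male probands.
        if mode == "X-Linked" and proband_sex != "male":
            continue
        # A mode without results is not shown.
        if mode in result_modes:
            buttons.append(mode)
    return buttons
```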

Filtering VAAST Reports

VAAST's algorithmic ranking is complemented with the same rich filtering capabilities available in Variant Miner. See Filtering in Variant Miner for more information. 

Export

Once you are satisfied with the data displayed in the VAAST report, it can be exported as a CSV file by clicking the “Export Report” button.
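
An exported CSV can be post-processed outside Opal. The Python sketch below applies the two thresholds quoted in the header descriptions (Omicia score >0.75 and CADD score >= 20). The column names `Gene`, `Omicia Score` and `CADD Score` are assumptions, so check them against the header row of your own export. The helper `cadd_top_fraction` uses the PHRED-like scaling of CADD scores, which matches the 10% and 1% figures quoted for scores of 10 and 20.

```python
import csv
import io


def cadd_top_fraction(c_score):
    """Fraction of most-deleterious substitutions implied by a scaled
    CADD score (PHRED-like: 10 -> 0.1, 20 -> 0.01)."""
    return 10 ** (-c_score / 10)


def filter_export(csv_text, omicia_min=0.75, cadd_min=20.0):
    """Return gene symbols of rows passing both score thresholds.

    Column names are assumed; rows with missing or non-numeric
    scores are skipped.
    """
    kept = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            omicia = float(row["Omicia Score"])
            cadd = float(row["CADD Score"])
        except (KeyError, TypeError, ValueError):
            continue
        if omicia > omicia_min and cadd >= cadd_min:
            kept.append(row.get("Gene", ""))
    return kept


# Usage with made-up rows:
sample = (
    "Gene,Omicia Score,CADD Score\n"
    "GENE_A,0.91,24.3\n"
    "GENE_B,0.40,31.0\n"
    "GENE_C,0.80,\n"
)
passing = filter_export(sample)
```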

VAAST Viewer

In addition, you can display VAAST Viewer for the current report by clicking the VAAST Viewer button at the top right of the table. VAAST Viewer shows the distribution of VAAST p-values across the genome in a plot similar to the “Manhattan” plots commonly used to depict p-values in GWAS studies. In this familiar view, each chromosome is clearly separated from the adjacent ones by a shaded pattern. The gene symbols of the top 10 genes are shown. Clicking on the dot representing a gene will display the gene symbol and associated p-value.

If the Viewer is launched while Polymorphic filters are active, the Viewer will filter out all Polymorphic genes.

The Viewer also provides the ability to zoom into a particular region of the plot by clicking and dragging. The inset graph at the bottom right continues to show the complete plot for context. To change the zoom or return to the full view, click and drag in the small inset graph.

VAAST Viewer window
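
To reproduce a similar view from exported data, the Python sketch below prepares the points for a Manhattan-style plot. It assumes the conventional -log10(p) vertical axis used in GWAS plots; the guide does not state which transform VAAST Viewer applies. The function name and the input field names (`gene`, `chrom`, `p_value`) are hypothetical.

```python
import math


def manhattan_points(genes, top_n=10):
    """Return (points, labels): one point per gene with y = -log10(p),
    plus the symbols of the top_n most significant genes."""
    points = [
        {"gene": g["gene"], "chrom": g["chrom"], "y": -math.log10(g["p_value"])}
        for g in genes
        if g["p_value"] > 0  # log10 is undefined at zero
    ]
    labeled = sorted(points, key=lambda p: p["y"], reverse=True)[:top_n]
    return points, [p["gene"] for p in labeled]
```

The returned points can be passed to any plotting library, with the chromosome used to group and shade the horizontal axis.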

VAAST Cohort and Case-Control Workflows

Opal also provides preconfigured workflows for Cohort and Case-Control analysis.

To start an analysis:

  1. Select the VAAST Cohort or Case-Control Analysis workflow within a Project from the Launch dropdown menu, or from within the Opal App Store.
  2. Select the genomes to include in your cohort or case group.
  3. For the Case-Control, select the genomes you want to include in your Control group in the lower window.
  4. Select the background. For the Case-Control, you can choose to use only your control group, or to add your control group to the default 1000 Genomes background.
  5. Select whether you want to include indels, and which modes of inheritance you want to run.
  6. Click on “Run” to submit the job for execution by the VAAST pipeline.

The VAAST Cohort report is similar to the family reports, with two additional columns: the number of genomes in which the variant was observed as a heterozygous genotype, and the number in which it was observed as a homozygous genotype. In these columns, you can click on the blue link for a list of the genomes with a specific variant, to enable further analysis.

For the Case-Control, the data in the genotype columns is noted as “XX/YY”, where XX is the number of genomes in the case group, and YY is the number of genomes in the control group.
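
If you work with these genotype columns outside Opal, a cell in the “XX/YY” form can be split into its two counts. The helper below is a minimal Python sketch with a hypothetical function name. It assumes the cell contains exactly two integers separated by a slash.

```python
def parse_case_control(cell):
    """Split an 'XX/YY' cell into (case_count, control_count)."""
    case_text, control_text = cell.strip().split("/")
    return int(case_text), int(control_text)
```

For example, a cell reading 12/3 means the genotype was seen in 12 case genomes and 3 control genomes.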

Phevor

Phevor enables you to use phenotype information to rank disease gene candidates. Phevor re-ranks the disease-causing genes from a VAAST analysis using phenotype information. A VAAST analysis needs to have been run prior to using Phevor.

To run Phevor:

  1. Within a Project, select Phevor from the Launch App dropdown menu.
  2. Select a completed VAAST report.
  3. In the Launch Phevor Analysis page, enter one or more phenotype terms. You can search on HPO¹ terms, synonyms and definitions. You will be prompted with HPO matches as you enter terms.
    • Use specific terms (e.g. dilated cardiomyopathy instead of cardiomyopathy) and try to limit your search to five or fewer terms. Larger numbers of terms or general terms can reduce Phevor’s differential scoring of genes
    • HPO terms from the “Phenotypic Abnormality” subontology are used
  4. As you enter terms, details on the most recently entered HPO term will be provided below the search term box. If you click on a more specific sub-node term, it will replace the higher level HPO term.
  5. Run Phevor. Phevor will perform the analysis in real time.

Once the Phevor analysis has completed, you will have a report with five columns:

  • Phevor Rank
  • Gene Symbol
  • Phevor Score: Combined score of VAAST and Phevor Phenotype-Gene association. The Phevor Score is logarithmic, so the separation between scores is important. For instance, 3.0 is ten times larger than 2.0, and 100 times larger than 1.0. Further, Phevor scores less than 1 represent very small values with scores below 0 being insignificant. Several Phevor Scores may group together in a “plateau”. It is often useful to consider all of the candidate genes before the first plateau of scores, then go to the next plateau, etc.
  • Phenotype-Gene Association: score assigned to a gene by Phevor based on strength of association of the phenotype to the gene. The closer to one, the higher the association.
  • VAAST p-Value
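
Because the Phevor Score is logarithmic, differences between scores translate into fold differences. The Python sketch below shows that arithmetic, assuming base 10 as implied by the "3.0 is ten times larger than 2.0" example. It also shows one simple way to group scores into plateaus. The function names and the `tolerance` value are hypothetical, and Phevor itself does not define a plateau threshold.

```python
def fold_difference(score_a, score_b):
    """How many times larger score_a is than score_b on the linear scale."""
    return 10 ** (score_a - score_b)


def group_plateaus(scores, tolerance=0.1):
    """Group scores, highest first, so that consecutive scores within
    `tolerance` of each other fall into the same plateau."""
    groups = []
    for score in sorted(scores, reverse=True):
        if groups and groups[-1][-1] - score <= tolerance:
            groups[-1].append(score)
        else:
            groups.append([score])
    return groups
```

Reviewing the candidate genes group by group, starting with the highest-scoring group, follows the plateau-by-plateau approach suggested above.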

 

You can review different modes of inheritance using the buttons to the top right of the table.

You can save the Phevor report, or Re-launch it using the buttons at the top right of the page. Re-launching the report will pre-populate the phenotype terms used for the previous report.

====

¹ Sebastian Köhler, Sandra C. Doelken, Christopher J. Mungall, Sebastian Bauer, Helen V. Firth, et al. The Human Phenotype Ontology project: linking molecular biology and disease through phenotype data. Nucl. Acids Res. (1 January 2014) 42 (D1): D966-D974. doi:10.1093/nar/gkt1026
