Appendix 2. Omicia Variant Impact Score

Charlene Son-Rigby

The Omicia Score is a proprietary score that assesses whether a variant is likely to be deleterious. It is a meta-classifier that combines scores from the following variant scoring algorithms:

  • SIFT (Kumar et al. 2009)
  • PolyPhen (Adzhubei et al. 2010)
  • MutationTaster (Schwarz et al. 2010)
  • PhyloP (Pollard et al. 2010)

By combining these scores, the Omicia Score condenses the information from these disparate algorithms into a single score and produces more accurate predictions on a wider range of variants than any of the algorithms achieves in isolation (Coonrod et al. 2013). The underlying method is a random forest (Breiman 2001): a "bagging" approach that combines the outputs of many weaker "random tree" classifiers.
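
To make the bagging idea concrete, here is a minimal sketch in plain Python. It is not Omicia's implementation: it uses one-split trees ("stumps") rather than full random trees, the names fit_stump, fit_bagged_stumps and combined_score are hypothetical, the training rows are made-up toy values, and it assumes every input score has been oriented so that a higher value means "more damaging" (which is not true of every real predictor's raw output).

```python
import random

def fit_stump(rows, labels, feature):
    """Fit a one-split tree: pick the threshold on `feature` with the fewest errors."""
    best_errors, best_threshold = None, None
    for threshold in sorted({row[feature] for row in rows}):
        # The stump calls a variant damaging when its value reaches the threshold.
        errors = sum((row[feature] >= threshold) != (label == 1)
                     for row, label in zip(rows, labels))
        if best_errors is None or errors < best_errors:
            best_errors, best_threshold = errors, threshold
    return feature, best_threshold

def fit_bagged_stumps(rows, labels, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample and one randomly chosen feature."""
    rng = random.Random(seed)
    n_rows, n_features = len(rows), len(rows[0])
    ensemble = []
    for _ in range(n_trees):
        sample = [rng.randrange(n_rows) for _ in range(n_rows)]  # draw with replacement
        feature = rng.randrange(n_features)
        ensemble.append(fit_stump([rows[i] for i in sample],
                                  [labels[i] for i in sample], feature))
    return ensemble

def combined_score(ensemble, row):
    """Fraction of stumps voting 'damaging'; always between 0 and 1."""
    votes = sum(row[feature] >= threshold for feature, threshold in ensemble)
    return votes / len(ensemble)

# Toy data (assumption): four pre-normalised predictor outputs per variant,
# label 1 = damaging, label 0 = benign.
TRAIN_ROWS = [
    [0.10, 0.20, 0.15, 0.05],
    [0.25, 0.10, 0.30, 0.20],
    [0.05, 0.30, 0.10, 0.15],
    [0.80, 0.90, 0.75, 0.95],
    [0.95, 0.70, 0.85, 0.80],
    [0.70, 0.85, 0.90, 0.75],
]
TRAIN_LABELS = [0, 0, 0, 1, 1, 1]

ensemble = fit_bagged_stumps(TRAIN_ROWS, TRAIN_LABELS)
print(combined_score(ensemble, [1.0, 1.0, 1.0, 1.0]))  # 1.0: every stump votes damaging
print(combined_score(ensemble, [0.0, 0.0, 0.0, 0.0]))  # 0.0: no stump votes damaging
```

Averaging the votes of many weak classifiers is what gives the combined score its 0-to-1 range.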

The Omicia Score ranges from 0 to 1. A score below 0.5 indicates that a variant is likely benign, with higher confidence as the value approaches 0. A score above 0.5 indicates that a variant is likely damaging or deleterious, with higher confidence as the value approaches 1.
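
The interpretation above can be expressed as a small helper. This is only an illustrative sketch: the function name interpret_omicia_score is hypothetical, and because the text does not say how a score of exactly 0.5 should be read, labelling it "indeterminate" is an assumption.

```python
def interpret_omicia_score(score: float) -> str:
    """Map a score in [0, 1] to the qualitative reading described in the text."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie between 0 and 1")
    if score < 0.5:
        return "likely benign"
    if score > 0.5:
        return "likely damaging"
    # Assumption: the source does not define the meaning of exactly 0.5.
    return "indeterminate"

print(interpret_omicia_score(0.12))  # likely benign
print(interpret_omicia_score(0.91))  # likely damaging
```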

Notes:

Sometimes none of the underlying scores has a value and the boxes are grey, yet an Omicia Score is still displayed. For variants that are not SNVs, the score is assigned by rule:

  • An Omicia Score of 0.8 is assigned if the variant is 25 base pairs or longer.
  • An Omicia Score of 0.8 is assigned based on effect; that is, if the consequence of the variant is on the severe list, which includes the following:
    • Stop-gained
    • Stop-lost
    • Frame-shift deletion/insertion/substitution
    • Splice-site [AG/GT]
    • Transcript ablation
  • If the variant is shorter than 25 base pairs and its effect is not on the severe list (for example, a non-protein-changing effect), it is assigned an Omicia Score of 0.5.
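
The rules for non-SNV variants amount to a short decision function. The sketch below is illustrative only: the function name, the argument names, and the consequence strings are hypothetical labels chosen for this example, not identifiers from the Opal software, and the severe list contains only the effects named above.

```python
# Hypothetical labels for the severe effects listed above (assumption:
# the real software may use different consequence terms).
SEVERE_CONSEQUENCES = {
    "stop_gained",
    "stop_lost",
    "frameshift",
    "splice_site",
    "transcript_ablation",
}

def default_non_snv_score(length_bp: int, consequence: str) -> float:
    """Return the rule-based score for a variant that is not an SNV."""
    if length_bp >= 25:
        return 0.8  # long variants are scored 0.8 regardless of effect
    if consequence in SEVERE_CONSEQUENCES:
        return 0.8  # short variants with a severe effect
    return 0.5      # short variants with a non-severe effect

print(default_non_snv_score(30, "intron"))      # 0.8
print(default_non_snv_score(2, "frameshift"))   # 0.8
print(default_non_snv_score(3, "intron"))       # 0.5
```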

Underlying scores may not be calculated for reasons that include the following:

  • Most of the scores were designed to be run on coding SNPs and cannot be run on indels or intronic regions.
  • Each individual score has different parameters.  When a box is grey, it usually means that the type of variant does not meet the scoring parameters for that particular impact score.

The Omicia Score was retrained as part of the Opal Annotation Engine v4 release. We fit a random forest classifier using 12,000 disease-causing mutation (DM) SNVs from HGMD as positives (the damaging set) and 11,968 SNVs with a frequency of 5% or greater in the 1000 Genomes Project as negatives (the benign set). A further 6,000 SNVs from each source were set aside for validation. The ROC curve below compares the performance of the Omicia Score with that of the individual impact score algorithms.

ROC curve of the performance of different variant impact assessment algorithms in 12,000 HGMD disease-causing mutations.
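
For readers unfamiliar with ROC curves, the following sketch shows how the points of such a curve, and the area under it, can be computed from scores and known labels. The scores and labels here are made-up toy values, not the HGMD or 1000 Genomes data, and the function names are hypothetical.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) pairs, one per distinct score threshold, starting at (0, 0)."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        # A variant is called damaging when its score reaches the threshold.
        tp = sum(s >= threshold and y == 1 for s, y in zip(scores, labels))
        fp = sum(s >= threshold and y == 0 for s, y in zip(scores, labels))
        points.append((fp / negatives, tp / positives))
    return points

def area_under_curve(points):
    """Trapezoidal area under a list of (FPR, TPR) points sorted by FPR."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Toy example: label 1 = known damaging, label 0 = known benign.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 0, 1, 0, 1, 0, 0]

points = roc_points(toy_scores, toy_labels)
print(points[-1])                # (1.0, 1.0)
print(area_under_curve(points))  # 0.8125
```

A classifier whose curve sits closer to the top-left corner, and so has a larger area, separates damaging from benign variants better.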

The following table provides the false positive rates (FPR) at certain Omicia Score cutoff values:

 

Omicia Score Cutoff    FPR
0.5                    16%
0.7                    7.9%
0.79                   5%
0.85                   2.8%
0.93                   1%

 

These rates were measured on the held-out validation data set described above.
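
As an illustration of how the cutoff table might be used, the sketch below looks up the lowest tabulated cutoff that meets a target FPR, and shows how an FPR is computed from the scores of known-benign variants. The function names are hypothetical, and treating a score at or above the cutoff as a "damaging" call is an assumption; the text does not say whether the cutoff itself is included.

```python
# FPR values copied from the table above, keyed by Omicia Score cutoff.
REPORTED_FPR = {0.5: 0.16, 0.7: 0.079, 0.79: 0.05, 0.85: 0.028, 0.93: 0.01}

def lowest_cutoff_for_fpr(target_fpr, table=REPORTED_FPR):
    """Return the lowest tabulated cutoff whose reported FPR is at or below the target."""
    for cutoff in sorted(table):
        if table[cutoff] <= target_fpr:
            return cutoff
    return None  # no tabulated cutoff is strict enough

def false_positive_rate(benign_scores, cutoff):
    """Fraction of known-benign variants that score at or above the cutoff."""
    return sum(score >= cutoff for score in benign_scores) / len(benign_scores)

print(lowest_cutoff_for_fpr(0.05))   # 0.79
print(lowest_cutoff_for_fpr(0.005))  # None
print(false_positive_rate([0.1, 0.2, 0.6, 0.9], 0.5))  # 0.5 (toy scores)
```

Raising the cutoff lowers the false positive rate at the cost of missing more truly damaging variants.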

 

References

Adzhubei, I.A., Schmidt, S., Peshkin, L., Ramensky, V.E., Gerasimova, A., Bork, P., Kondrashov, A.S., and Sunyaev, S.R. (2010). A method and server for predicting damaging missense mutations. Nature Methods 7, 248–249.

Breiman, L. (2001). Random forests. Machine Learning 45(1), 5–32.

Coonrod, E., Margraf, R., Russell, A., Voelkerding, K., and Reese, M.G. (2013). Clinical analysis of next-generation sequencing data using the Omicia platform. Expert Review of Molecular Diagnostics 13(6), 529–540.

Kumar, P., Henikoff, S., and Ng, P.C. (2009). Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm. Nature Protocols 4(7), 1073–1081.

Pollard, K.S., Hubisz, M.J., Rosenbloom, K.R., and Siepel, A. (2010). Detection of nonneutral substitution rates on mammalian phylogenies. Genome Research 20(1), 110–121.

Schwarz, J.M., Rödelsperger, C., Schuelke, M., and Seelow, D. (2010). MutationTaster evaluates disease-causing potential of sequence alterations. Nature Methods 7, 575–576.
