Wenyi Wang, PhD

Present Title & Affiliation

Primary Appointment

Associate Professor, Department of Bioinformatics and Computational Biology, Division of Quantitative Sciences, The University of Texas MD Anderson Cancer Center, Houston, TX
Assistant Professor, Department of Bioinformatics and Computational Biology, Division of Quantitative Sciences, The University of Texas MD Anderson Cancer Center, Houston, TX
Biostatistics, Bioinformatics and Systems Biology Program Co-director, The University of Texas Graduate School of Biomedical Sciences at Houston, Houston, TX

Dual/Joint/Adjunct Appointment

Adjunct Faculty, Statistics, Texas A&M, College Station, TX

Bio Statement

Dr. Wang has formal training in both statistical bioinformatics and basic science research. She received her Ph.D. training in the Department of Biostatistics at Johns Hopkins University. Her PhD thesis (advisor, Dr. Giovanni Parmigiani) addresses statistical methods for cancer risk assessment and copy number estimation (with Dr. Rafael Irizarry). As a postdoctoral fellow at both the Stanford Genome Technology Center (advisor, Dr. Ron Davis) and the UC Berkeley Department of Statistics (advisor, Dr. Terry Speed), she completed three years of research on statistical methods for analyzing high-throughput sequencing data, during which she developed an improved analysis tool for rare variant calling with resequencing arrays.

Research Interests

Dr. Wang's research is motivated by large-scale complex data sets in recent genomic and familial studies and by important biological questions that emerge from the analysis of these data. Her current interests can be divided into two parts: 1) Development of methods and software for the accurate measurement of high-throughput genomic data; 2) Development and validation of statistical approaches and software for personalized cancer risk prediction.

It is non-trivial to extract genomic information of interest from the raw signals that come directly from chemical or physical reactions. Current high-throughput technologies all inevitably incorporate multi-level confounders that affect the observed signals. The large amount of data they produce also makes it difficult to calibrate these technologies against "gold standards", which are usually generated by experiments that are more accurate but low-throughput and expensive. My work in this part is focused on the accurate interpretation of raw high-throughput signals using statistical modeling. To date, I have worked with high-throughput data measuring copy number, single nucleotide variants and alternative splicing.

Cancer results from the accumulation of multiple genetic mutations. A germline mutation in a cancer gene predisposes the carrier to the development of cancer, known as "inherited susceptibility". This inheritance results in familial clustering of cancers, known as "familial cancer syndromes". Clinical researchers utilize model-based prediction algorithms to identify cancer patients at earlier and more treatable stages and/or to identify healthy individuals at high risk of developing cancer in the future. Among these algorithms, Mendelian carrier probability models are based on Bayesian methods using detailed family history as input, and have performed better than empirical models using regression or classification trees alone. My work in this part is focused on a) applying Mendelian models to cancers of interest for personalized risk assessment and b) developing methodologies for the evaluation of risk assessment models using family and correlated data.
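
As an illustration of the Bayesian calculation that underlies carrier probability models of this kind, the following is a minimal sketch and not the implementation of any specific model or software package. The function name `carrier_posterior` and all numerical values are hypothetical. In a real Mendelian model, the two likelihoods would be computed from the full pedigree using Mendelian inheritance and gene-specific penetrance; here they are simply assumed as inputs.

```python
def carrier_posterior(prior, lik_carrier, lik_noncarrier):
    """Posterior probability of carrying a mutation, by Bayes' rule.

    prior          -- assumed population carrier prevalence
    lik_carrier    -- assumed P(observed family history | carrier)
    lik_noncarrier -- assumed P(observed family history | non-carrier)
    """
    numerator = prior * lik_carrier
    denominator = numerator + (1.0 - prior) * lik_noncarrier
    return numerator / denominator


# Hypothetical inputs, chosen only for illustration.
posterior = carrier_posterior(prior=0.01, lik_carrier=0.30, lik_noncarrier=0.02)
print(round(posterior, 3))  # 0.132
```

With these assumed numbers, a family history that is 15 times more likely under the carrier hypothesis raises the carrier probability from 1% to about 13%.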

Education & Training

Degree-Granting Education

2007 Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, PHD, Biostatistics
2003 Columbia University College of Physicians and Surgeons, New York City, NY, MA, Human Nutrition
2001 Fudan University, Shanghai, China, BS, Honor Science Program, Biology

Honors and Awards

2014 Outstanding service to graduate education, The University of Texas Graduate School of Biomedical Sciences at Houston
2011 The Stellar Abstract Award, The 5th Annual Program in Quantitative Genomics, Harvard School of Public Health
2008 Delta Omega Alpha Inducted Member, Johns Hopkins Bloomberg School of Public Health
2008 Phi Beta Kappa Inducted Member, Johns Hopkins University Chapter of Phi Beta Kappa
2008 The Jane and Steve Dykacz Award, Johns Hopkins University, Baltimore, MD
2007 Travel Award, The 11th International Conference on Research in Computational and Molecular Biology
2006 Travel Award, The International Genetic Epidemiology Society 15th Annual Meeting
2005 The June B. Culley Award, Johns Hopkins University, Baltimore, MD
1997-2001 People's Scholarship, Fudan University, Shanghai, China
1994-2001 Honor Science Program, Fudan University, Shanghai, China

Selected Publications

Peer-Reviewed Original Research Articles

1. Lefterova MI*, Shen P*, Odegaard JI*, Fung E, Chiang T, Peng G, Davis RW, Wang W, Schrijver I, Scharfe C. Next-generation molecular testing of newborn dried blood spots for cystic fibrosis. Journal of Molecular Diagnostics. In Press.
2. Nikooienejad A, Wang W*, Johnson VE*. Bayesian variable selection for binary outcomes in high dimensional genomic studies using non-local priors. Bioinformatics, doi: 10.1093/bioinformatics/btv764. e-Pub 1/2016.
3. Palculict TB, Ruteshouser EC, Fan Y, Wang W, Strong L, Huff V. Identification of germline DICER1 mutations and loss of heterozygosity in familial Wilms tumor using whole genome sequencing. Journal of Medical Genetics. e-Pub 11/2015.
4. Fang LT, Afshar PT, Chhibber A, Mohiyuddin M, Fan Y, Mu J, Gibeling G, Barr S, Asadi NB, Gerstein M, Koboldt D, Wang W, Wong WH, Lam H. An ensemble approach to accurately detect somatic mutations using SomaticSeq. Genome Biology. e-Pub 9/2015.
5. Ewing AD, Houlahan KE, Hu Y, Ellrott K, Caloian C, Yamaguchi TN, Bare JC, P’ng C, Waggott D, Sabelnykova VY; ICGC-TCGA DREAM Somatic Mutation Calling Challenge participants, Kellen MR, Norman TC, Haussler D, Friend SH, Stolovitzky G, Margolin AA, Stuart JM, Boutros PC. Combining tumor genome simulation with crowdsourcing to benchmark somatic single-nucleotide-variant detection. Nature Methods. e-Pub 7/2015.
6. Peng G, Fan Y, Wang W. FamSeq: a variant calling program for family-based sequencing data using graphics processing units. PLOS Comput Biol. e-Pub 10/2014. PMID: 25357123.
7. Davis CF et al., The Cancer Genome Atlas Research Network. The Somatic Genomic Landscape of Chromophobe Renal Cell Carcinoma. Cancer Cell. e-Pub 8/2014.
8. Ahn J, Liu S, Wang W*, Yuan Y*. Bayesian latent-class mixed-effect hybrid models for dyadic longitudinal data with non-ignorable dropouts. Biometrics 69(4):914-24, 12/2013. e-Pub 11/2013. NIHMSID: NIHMS56830.
9. Cancer Genome Atlas Research Network, Weinstein JN, Collisson EA, Mills GB, Shaw KR, Ozenberger BA, Ellrott K, Shmulevich I, Sander C, Stuart JM. The Cancer Genome Atlas Pan-Cancer analysis project. Nat Genet 45(10):1113-20, 10/2013. PMCID: PMC3919969.
10. Ahn J, Yuan Y, Parmigiani G, Suraokar MB, Diao L, Wistuba II, Wang W. DeMix: deconvolution for mixed cancer transcriptomes using raw measured data. Bioinformatics 29(15):1865-71, 8/2013. e-Pub 5/2013. PMCID: PMC3841439.
11. Peng G, Fan Y, Palculict TB, Shen P, Ruteshouser EC, Chi AK, Davis RW, Huff V, Scharfe C, Wang W. Rare variant detection using family-based sequencing analysis. Proc Natl Acad Sci U S A 110(10):3985-90, 3/2013. e-Pub 2/2013. PMCID: PMC3593912.
12. Srivastava S, Wang W, Manyam G, Ordonez C, Baladandayuthapani V. Integrating Multi-Platform Genomic Data Using Hierarchical Bayesian Relevance Vector Machines. EURASIP J Bioinform Syst Biol 2013(1):9, doi:10.1186/1687-4153-2013-9, 2013. e-Pub 6/2013. PMCID: PMC3726335.
13. Shen P*, Wang W*, Chi AK, Fan Y, Davis RW, Scharfe C. Multiplex target capture with long padlock probes. Genome Medicine(5). e-Pub 2013. PMCID: PMC370973.
14. Hua Y, Gorshkov K, Yang Y, Wang W, Zhang N, Hughes DP. Slow down to stay alive: HER4 protects against cellular stress and confers chemoresistance in neuroblastoma. Cancer 118(20):5140-54, 10/2012. e-Pub 3/2012. PMCID: PMC3414637.
15. Zhang N, Xu Y, O'Hely M, Speed TP, Scharfe C, Wang W. SRMA: an R package for resequencing array data analysis. Bioinformatics 28(14):1928-30, 7/2012. e-Pub 5/2012. PMCID: PMC3389772.
16. Wilkins EJ, Rubio JP, Kotschet KE, Cowie TF, Boon WC, O'Hely M, Burfoot R, Wang W, Sue CM, Speed TP, Stankovitch J, Horne MK. A DNA Resequencing Array for Genes Involved in Parkinson's Disease. Parkinsonism Relat Disord 18(4):386-90, 5/2012. e-Pub 1/2012. PMID: 22243833.
17. Shen P*, Wang W*, Krishnakumar S, Palm C, Chi AK, Enns GM, Davis RW, Speed TP, Mindrinos MN, Scharfe C. High quality DNA sequence capture of 524 disease candidate genes. Proc Natl Acad Sci U S A 108(16):6549-54, 4/2011. PMCID: PMC3080966.
18. Wang W, Shen P, Thiyagarajan S, Lin S, Palm C, Horvath R, Klopstock T, Cutler D, Pique L, Schrijver I, Davis RW, Mindrinos M, Speed TP, Scharfe C. Identification of Rare DNA Variants in Mitochondrial Disorders with Improved Array-based Sequencing. Nucleic Acids Res 39(1):doi: 10.1093/nar/gkq750, 1/2011. PMCID: PMC3017602.
19. Wang W, Niendorf KB, Patel D, Blackford A, Marroni F, Sober AJ, Parmigiani G, Tsao H. Estimating CDKN2A carrier probability and personalizing cancer risk assessments in hereditary melanoma using MelaPRO. Cancer Res 70(2):552-9, 1/2010. e-Pub 1/2010. PMCID: PMC2947347.
20. Lin S*, Wang W*, Palm C, Davis RW, Juneau K. A Molecular Inversion Probe Assay for Detecting Alternative Splicing. BMC Genomics 11:712, 2010. e-Pub 12/2010. PMCID: PMC3022918.
21. Wang W, Carvalho B, Miller ND, Pevsner J, Chakravarti A, Irizarry RA. Estimating genome-wide copy number using allele-specific mixture models. J Comput Biol 15(7):857-66, 9/2008. PMCID: PMC2612042.
22. Wang W, Chen S, Brune KA, Hruban RH, Parmigiani G, Klein AP. PancPRO: risk assessment for individuals with a family history of pancreatic cancer. J Clin Oncol 25(11):1417-22, 4/2007. PMCID: PMC2267288.
23. Nicodemus KK, Wang W, Shugart YY. Stability of variable importance scores and rankings using statistical learning tools on single-nucleotide polymorphisms and risk factors involved in gene x gene and gene x environment interactions. BMC Proc 1 Suppl 1:S58, 2007. e-Pub 12/2007. PMCID: PMC2367584.
24. González JR, Wang W, Ballana E, Estivill X. A recessive Mendelian model to predict carrier probabilities of DFNB1 for nonsyndromic deafness. Hum Mutat 27(11):1135-42, 11/2006. PMCID: PMC2268028.
25. Chen S, Wang W, Lee S, Nafa K, Lee J, Romans K, Watson P, Gruber SB, Euhus D, Kinzler KW, Jass J, Gallinger S, Lindor NM, Casey G, Ellis N, Giardiello FM, Offit K, Parmigiani G, Colon Cancer Family Registry. Prediction of germline mutations and cancer risk in the Lynch syndrome. JAMA 296(12):1479-87, 9/2006. PMCID: PMC2538673.
26. Xu Z, Sproul A, Wang W, Kukekov N, Greene LA. Siah1 interacts with the scaffold protein POSH to promote JNK activation and apoptosis. J Biol Chem 281(1):303-12, 1/2006. e-Pub 10/2005. PMID: 16230351.
27. Chen S, Wang W, Broman KW, Katki HA, Parmigiani G. BayesMendel: an R environment for Mendelian risk prediction. Stat Appl Genet Mol Biol 3:Article21, 2004. e-Pub 9/2004. PMCID: PMC2274007.

Grant & Contract Support

Title: Cancer risk in Li-Fraumeni syndrome (LFS) kindreds in regions of high founder mutation prevalence and regions of low prevalence in absence of founder as determined by LFSpro
Funding Source: MD Anderson Cancer Center The Sister Institution Network Fund
Role: Co-Principal Investigator
Principal Investigator: Louise Strong
Duration: 1/1/2015 - 12/31/2016
Title: Statistical methods for genomic analysis of heterogeneous tumors
Funding Source: NIH/NCI
Role: Principal Investigator
Duration: 9/24/2014 - 8/31/2019
Title: Developing New Rational, Personalized Medicine for Lung Cancer Based on Understanding of Lung Cancer Molecular and Cellular Biology
Funding Source: NIH/NCI (Subcontract from University of Texas Southwestern Medical Center)
Role: Co-Investigator
Principal Investigator: John Minna
Duration: 9/1/2014 - 8/31/2019
Title: GCC/Keck Center's Computational Cancer Biology Training Program
Funding Source: Cancer Prevention Institute of Research (subcontract from University of Houston)
Role: Principal Investigator-MDACC
Principal Investigator: Rathindra Bose
Duration: 5/1/2014 - 4/30/2016
Title: Core C UT Spore in Lung Cancer
Funding Source: NIH/NCI (subcontract from University of Texas Southwestern Medical Center)
Role: Co-Investigator
Principal Investigator: John Minna
Duration: 9/12/2013 - 8/31/2014
Title: An Integrative Pipeline for Analysis & Translational Application of TCGA Data (GDAC)
Funding Source: NIH/NCI
Role: Co-Investigator
Principal Investigator: John N. Weinstein
Duration: 8/1/2013 - 7/31/2014
Title: Personalized risk assessment for families with Li-Fraumeni Syndrome
Funding Source: Cancer Prevention & Research Institute of Texas (CPRIT)
Role: Principal Investigator
Duration: 6/1/2013 - 5/31/2016
Title: Bioinformatics tools for genomic analysis of tumor and stromal pathways in cancer
Funding Source: NIH/NCI (Subcontract from Dana Farber Cancer Institute)
Role: Principal Investigator-MDACC
Principal Investigator: Giovanni Parmigiani
Duration: 2/1/2013 - 1/31/2018
Title: Integrative Pipeline for Analysis & Translational Application of TCGA Data (GDAC) Supplement
Funding Source: NIH/NCI
Role: Co-Investigator
Principal Investigator: John N. Weinstein
Duration: 9/1/2010 - 4/30/2012
Title: Next-Generation Genomic Sequence Identification of the 19q Familial Wilms Tumor Predisposition Gene
Funding Source: Cancer Prevention & Research Institute of Texas (CPRIT)
Role: Co-Investigator
Principal Investigator: Vicky Huff
Duration: 5/1/2010 - 4/30/2013
Title: An Integrative Pipeline for Analysis & Translational Application of TCGA Data (GDAC)
Funding Source: NIH/NCI
Role: Co-Investigator
Principal Investigator: John N. Weinstein
Duration: 9/29/2009 - 7/31/2012

Last updated: 1/5/2016