
Michael Kane
Department of Lymphoma - Myeloma, Division of Cancer Medicine
Present Title & Affiliation
Primary Appointment
Associate Professor, Department of Lymphoma and Myeloma, Division of Cancer Medicine, The University of Texas MD Anderson Cancer Center, Houston, Texas
Adjunct Associate Professor, Department of Biostatistics, Yale University, New Haven, CT
Education & Training
Degree-Granting Education
2010 | Yale University, New Haven, Connecticut, US, Statistics, PhD |
2007 | Yale University, New Haven, Connecticut, US, Statistics, MA |
2003 | Rochester Institute of Technology, Rochester, New York, US, Electrical Engineering with Emphasis in Signal and Image Processing, MS |
2000 | Rochester Institute of Technology, Rochester, New York, US, Computer Engineering, BS |
Experience & Service
Administrative Appointments/Responsibilities
Director, Department of Biostatistics Data Science Pathway, Yale School of Public Health, New Haven, Connecticut, 2021 - 2024
Other Professional Positions
Affiliate, Yale University, New Haven, CT, 2022 - Present
Scientific Advisor and Co-Founder, Telperian Inc, Austin, Texas, 2018 - Present
Assistant Professor, Yale School of Public Health, New Haven, Connecticut, 2016 - 2023
Associate Research Scientist, Yale University, New Haven, Connecticut, 2010 - 2016
PhD Intern, Revolution Computing, New Haven, Connecticut, 2009 - 2009
PhD Intern, Barclays Capital, New York, New York, 2008 - 2008
Research Scientist, Kodak Commercial and Government Systems, Rochester, New York, 2001 - 2003
Software Development Engineer, Lenel Systems, Pittsford, New York, 2000 - 2001
Editorial Activities
Editorial Advisory Board Member, The R Journal, 2021 - Present
Associate Editor, The R Journal, 2021 - Present
Editor-in-Chief, The R Journal, 2020 - 2021
Associate Editor of Reproducibility, The Journal of the American Statistical Association, 2019 - Present
Associate Editor, The Journal of Statistical Software, 2014 - Present
Honors & Awards
2010 | American Statistical Association’s John M. Chambers Statistical Software Award |
2009 | Yale University Dissertation Fellowship |
2007 | Yale University Teaching Fellowship |
2005 | Yale University Fellowship |
1997 | Fred Emerson Foundation Scholarship |
Professional Memberships
Selected Presentations & Talks
Regional Presentations
- 2024. Using LLM ICD-10 Embeddings in the Analysis of Health Data. Invited. Seattle, WA, US.
- 2022. “A programmatic approach to pre-revenue pharmacofinance and clinical trial planning.” Invited. New York R Meetup. New York, New York, US.
- 2018. “Generalized function composition and pipe construction with the fc package.” Invited. Cleveland R Meetup. Cleveland, Ohio, US.
- 2018. “Exposed in Connecticut.” Invited. Noreast’R Conference. Providence, Rhode Island, US.
- 2017. “A First Look at Using Human Mobility Data to Assess Community Resilience.” Invited. The New York R Conference. New York, New York, US.
- 2017. “Tools and Techniques for Applying Human Mobility Data to Challenges in Public Health.” Invited. The New England Statistical Symposium. Storrs, Connecticut, US.
- 2016. “Big Data in R.” The New England Statistical Symposium. Storrs, Connecticut, US.
- 2016. “Writing Regression Routines in R.” Invited. The Boston R User Group. Boston, Massachusetts, US.
- 2015. Practical Principles for Scalable Statistical Analysis. Invited. The New York R Conference. New York, New York, US.
National Presentations
- 2019. “Applied Distribution Learning with Applications in Biomedicine.” Invited. North Carolina State University Seminar. Raleigh, North Carolina, US.
- 2019. “Graphical Model Approaches to Integrating Cell Tower and Spatial Data.” Invited. The New England Statistical Symposium. Hartford, Connecticut, US.
- 2019. “Considerations for Basket Trial Design under Multisource Exchangeability Assumptions.” Invited. International Chinese Statistical Association Conference. Raleigh, North Carolina, US.
- 2019. “Bootstrap Learning Python with R.” Invited. Joint Statistical Meetings Day-Long Course. Denver, Colorado, US.
- 2019. “A Deliberate Approach to Augmenting Radiomics Pipelines for Oncology.” Invited. Amgen Invited Talk. Thousand Oaks, California, US.
- 2018. “Drug development of non-cytotoxics: Subtype Identification and Strategies for Trial Design.” Invited. Amgen Invited Short Course. Thousand Oaks, California, US.
- 2018. “Estimating Environmental Exposure using Cell Tower Data.” Invited. AT&T Labs Research Technical Seminar. New York, New York, US.
- 2018. “Applied Distribution Learning with Sets of Data.” Invited. The University of Massachusetts Amherst Mathematics and Statistics Seminar. Amherst, Massachusetts, US.
- 2018. “Applications of Distribution Learning Applications in the Medical Sciences.” Invited. The Cleveland Clinic Department of Quantitative Health Sciences Seminar. Cleveland, Ohio, US.
- 2016. “Moving Toward a Concurrent Computing Grammar.” Invited. Directions in Statistical Computing. Palo Alto, California, US.
- 2015. “Bigmemory and the Upcoming Storage Singularity.” Invited. Directions in Statistical Computing. Palo Alto, California, US.
- 2014. “Productizing Your Statistical Analysis.” Invited. Conference on Statistical Practice. New Orleans, Louisiana, US.
- 2014. “The Big Data Analytics Landscape with Applications in Augmented Knowledge Discovery.” Invited. Bill and Melinda Gates Foundation Seminar. Seattle, Washington, US.
- 2014. “An Exploratory Analysis of Text Trends in the G77 (and the rest of the U.N.).” Invited. A United Nations Tech Event. New York, New York, US.
Grant & Contract Support
Date: | 2023 - 2024 |
Title: | External Controls for Clinical Trials |
Funding Source: | Takeda |
Role: | Co-PI |
ID: | Research Collaborative Agreement |
Date: | 2022 - 2024 |
Title: | Machine Learning of Biomarkers from Wearable Devices |
Funding Source: | Takeda |
Role: | PI |
ID: | Research Collaborative Agreement |
Date: | 2022 - Present |
Title: | Enhancing the Efficiency of Pragmatic Clinical Trials Using Administrative Data: Analysis of the STRIDE Study |
Funding Source: | NIA |
Role: | Co-I |
ID: | 1R01AG071528-01A1 |
Date: | 2020 - Present |
Title: | Collaborative Research: HNDS-I: Data Resources and Analytic Tools to Understand Population Scale Human Mobility for Applications in Social, Behavioral, and Economic Sciences |
Funding Source: | NSF |
Role: | PI |
ID: | 2024335 |
Date: | 2020 - 2022 |
Title: | Methods for Understanding Clinical Trial Patient Enrollment Heterogeneity |
Funding Source: | Amgen |
Role: | PI |
ID: | Research Collaborative Agreement |
Date: | 2020 - 2022 |
Title: | Yale Connecticut DPH Data Integration and Analytics Collaboration DPH 29582 |
Funding Source: | Connecticut Department of Public Health |
Role: | Co-I |
ID: | DPJ2021-0071 |
Date: | 2019 - 2023 |
Title: | Understanding the role of EGFR Exon19 Expression in Lung Cancer |
Funding Source: | Boehringer Ingelheim |
Role: | Co-I |
ID: | Research Collaborative Agreement |
Date: | 2018 - 2019 |
Title: | Yale Comprehensive Cancer Center Award |
Funding Source: | NIH |
Role: | Faculty |
ID: | P30 CA 016359-33S3 |
Date: | 2016 - Present |
Title: | CTSA |
Funding Source: | NCATS |
Role: | Faculty |
ID: | 5UL1TR001863 |
Date: | 2016 - 2018 |
Title: | Expansion of Methods for Two-Stage Trial Designs for Testing Treatment, Self-Selection and Treatment Preference Effects |
Funding Source: | PCORI |
Role: | Co-I |
ID: | HSRP20164106 |
Date: | 2013 - 2015 |
Title: | Semantic clustering of human-animal medical corpuses |
Funding Source: | The Bill and Melinda Gates Foundation |
Role: | PI |
ID: | Grand Challenges Round 11 |
Date: | 2012 - 2016 |
Title: | ECON: Elastic Computational Numerics |
Funding Source: | DARPA |
Role: | Co-PI |
ID: | FA8750-12-C-0324 |
Date: | 2012 - 2013 |
Title: | Animal and Human Sentinels of Environmental Health Hazards from Natural Gas Extraction (Hydraulic Fracturing) |
Funding Source: | Peter Rabinowitz, M.D |
Role: | Biostatistician |
ID: | Unknown |
Date: | 2011 - 2012 |
Title: | Zoonotic Risk Prediction Tool for the Extractive Industries |
Funding Source: | USAID/UC Davis/WCS |
Role: | Biostatistician |
ID: | PREDICT |
Date: | 2011 - 2012 |
Title: | Daily Exposure Monitoring Intervention to Prevent Hearing Loss |
Funding Source: | CDC/NIOSH |
Role: | Biostatistician |
ID: | R01OH008641-03 |
Date: | 2011 - 2015 |
Title: | Control of Influenza Transmission in Swine Facilities |
Funding Source: | The National Pork Board |
Role: | Biostatistician |
ID: | Unknown |
Date: | 2011 - 2015 |
Title: | Modeling H5N1 Outbreaks in Egypt |
Funding Source: | USAID/UC Davis/WCS |
Role: | Biostatistician |
ID: | PREDICT |
Date: | 2011 - 2015 |
Title: | Assessing Hearing Conservation Effectiveness |
Funding Source: | CDC/NIOSH |
Role: | Biostatistician |
ID: | R01 OH 010132-4 |
Selected Publications
Peer-Reviewed Articles
- Ganz DA, Greene EJ, Latham NK, Kane M, Min LC, Gill TM, Reuben DB, Peduzzi P, Esserman D. Endpoint assessment via routinely collected data generates estimates comparable to randomized controlled trial data: analysis of a cluster-randomized trial on fall injury prevention. J Clin Epidemiol 181:111718, 2025. e-Pub 2025. PMID: 39938700.
- Kane, MJ, Gilani, O, Khusainova, E, Urbanek, S. Identifying loci in mobility networks with applications in New Zealand work commutes. Applied Network Science 9(1), 2024. e-Pub 2024.
- Kane M, Gilani O, Khusainova E, Urbanek S. Identifying loci in mobility networks with applications in New Zealand work commutes: a statistical test for identifying extreme stationary distribution values in Markov transition matrices. Applied Network Science 9(57), 2024. e-Pub 2024.
- Yu D, Kane M, Koay E, Wistuba I, Hobbs B. Machine learning identifies prognostic subtypes of the tumor microenvironment of NSCLC. Scientific Reports 14(1):15004, 2024. e-Pub 2024.
- Ganz, DA, Esserman, DA, Latham, NK, Kane, MJ, Min, L, Gill, TM, Reuben, DB, Peduzzi, P, Greene, EJ. Validation of a Rule-Based ICD-10-CM Algorithm to Detect Fall Injuries in Medicare Data. Journals of Gerontology - Series A Biological Sciences and Medical Sciences 79(7), 2024. e-Pub 2024. PMID: 38566617.
- Esserman, DA, Greene, EJ, Latham, NK, Kane, MJ, Lu, C, Peduzzi, P, Gill, TM, Ganz, DA. Assessing readiness to use electronic health record data for outcome ascertainment in clinical trials – A case study. Contemporary Clinical Trials 142, 2024. e-Pub 2024. PMID: 38740298.
- Wei, W, Blaha, O, Esserman, DA, Zelterman, D, Kane, MJ, Liu, R, Lin, J. A Bayesian platform trial design with hybrid control based on multisource exchangeability modelling. Statistics in Medicine 43(12):2439-2451, 2024. e-Pub 2024. PMID: 38594809.
- Kane, MJ, King, C, Esserman, DA, Latham, NK, Greene, EJ, Ganz, DA. A compressed large language model embedding dataset of ICD 10 CM descriptions. BMC Bioinformatics 24(1), 2023. e-Pub 2023. PMID: 38105180.
- Athreya, A, Lubberts, Z, Priebe, CE, Park, Y, Tang, M, Lyzinski, V, Kane, MJ, Lewis, B. Numerical Tolerance for Spectral Decompositions of Random Matrices and Applications to Network Inference. Journal of Computational and Graphical Statistics 32(1):145-156, 2023. e-Pub 2023.
- Yu, C, Blaha, O, Kane, MJ, Wei, W, Esserman, DA, Zelterman, D. Regression methods for the appearances of extremes in climate data. Environmetrics 33(7), 2022. e-Pub 2022.
- Liu, Y, Kane, MJ, Esserman, DA, Blaha, O, Zelterman, D, Wei, W. Bayesian local exchangeability design for phase II basket trials. Statistics in Medicine 41(22):4367-4384, 2022. e-Pub 2022. PMID: 35777367.
- Zabor, EC, Kane, MJ, Roychoudhury, S, Nie, L, Hobbs, BP. Bayesian basket trial design with false-discovery rate control. Clinical Trials 19(3):297-306, 2022. e-Pub 2022. PMID: 35128970.
- Kane, MJ, Jiang, X, Urbanek, S. On the Programmatic Generation of Reproducible Documents. Journal of Statistical Software 103(8), 2022. e-Pub 2022.
- Zabor, EC, Hobbs, BP, Kane, MJ. ppseq. R Journal 14(4):280-290, 2022. e-Pub 2022.
- Wei, W, Esserman, DA, Kane, MJ, Zelterman, D. Unified exact design with early stopping rules for single arm clinical trials with multiple endpoints. Statistical Methods in Medical Research 30(7):1575-1588, 2021. e-Pub 2021. PMID: 34159859.
- Atkinson, EJ, Higgins, PD, Esserman, DA, Kane, MJ, Schwager, SJ, Rickert, JB, Mark, D, Alexeev, M, Kadauke, S. R Medicine 2020. R Journal 13(1):642-647, 2021. e-Pub 2021.
- Kane, MJ, Gilani, O. The need to incorporate communities in compartmental models. Statistics and its Interface 14(1):29-32, 2021. e-Pub 2021.
- Kane, MJ, Jiang, X, Urbanek, S. Automating Reproducible, Collaborative Clinical Trial Document Generation with the listdown Package. R Journal 13(1):556-562, 2021. e-Pub 2021.
- Kane, MJ. Towards a Grammar for Processing Clinical Trial Data. R Journal 13(1):563-569, 2021. e-Pub 2021.
- Li X, Kane M, Zhang Y, Sun W, Song Y, Dong S, Lin Q, Zhu Q, Jiang F, Zhao H. Circadian Rhythm Analysis Using Wearable Device Data: A Novel Penalized Machine Learning Approach. Journal of Medical Internet Research 23(10), 2020. e-Pub 2020. PMID: 34647895.
Other Articles
- Wrobel, J, Hector, EC, Crawford, L, McGowan, LD, da Silva, N, Goldsmith, J, Hicks, S, Kane, MJ, Lee, Y, Mayrink, V, Paciorek, CJ, Usher, T, Wolfson, J. Partnering With Authors to Enhance Reproducibility at JASA. Journal of the American Statistical Association 119(546):795-797, 2024.
- Kane M, Chen N, Kaizer AM, Jiang X, Xia HA, Hobbs BP. Analyzing Basket Trials under Multisource Exchangeability Assumptions. The R Journal 12(2):342-358, 2021.
- Shi Y, Cameron B, Gu X, Kane M, Peduzzi P, Esserman DA. Two‐stage randomized trial design for testing treatment, preference, and self‐selection effects for count outcomes. Statistics in Medicine 39(25):3653-3683, 2020. PMID: 32875582.
- Cameron B, Kane M, Esserman D. Preference: An R Package for Two-Stage Clinical Trial Design Accounting for Patient Preference. Journal of Statistical Software 94(1):1-16, 2020.
- DeVeaux M, Kane M, Wei W, Zelterman D. A two‐stage phase II clinical trial design with nested criteria for early stopping and efficacy. Pharmaceutical Statistics, 2019. PMID: 31507079.
- Kaizer AM, Koopmeiners JS, Kane M, Roychoudhury S, Hong DS, Hobbs BP. Basket Designs: Statistical Considerations for Oncology Trials. JCO Precision Oncology:1-9, 2019. PMID: 35100726.
- Gilani O, Urbanek S, Kane M. Distributions of Human Exposure to Ozone During Commuting Hours in Connecticut Using the Cellular Device Network. Journal of Agricultural, Biological and Environmental Statistics:1-20, 2019.
- O’Connor JM, Sedghi T, Dhodapkar MJ, Kane M, Gross CP. Factors Associated With Cancer Disparities Among Low-, Medium-, and High-Income US Counties. JAMA Network Open, 2018. PMID: 30646225.
- Kane M, Deveaux M, Zelterman D. Clinical trial design using a stopped negative binomial distribution. Statistics and Its Interface 11(4):699-707, 2018. PMID: 30655933.
- Hobbs BP, Kane M, Hong DS, Landin R. Statistical challenges posed by uncontrolled master protocols: sensitivity analysis of the Vemurafenib study. The Annals of Oncology:2296-2301, 2018. PMID: 30335125.
- Ali I, Hart G, Gunabushanam G, Liang Y, Muhammad W, Nartowt B, Kane M, Ma X, Deng J. Lung Nodule Detection via Deep Reinforcement Learning. Frontiers in Oncology, 2018. PMID: 29713615.
- Arnold T, Kane M, Urbanek S. iotools: High-Performance I/O Tools for R. The R Journal 9(1):6-13, 2017.
- Kane M. bittrex: An R client for the Bittrex Crypto-Currency Exchange. The Journal of Open Source Software, 2017.
- Paccha B, Jones RM, Gibbs S, Kane M, Torremorell M, Niera-Ramirez V, Rabinowitz PM. Modeling risk of occupational zoonotic influenza infection in swine workers. Journal of Occupational and Environmental Hygiene:577-587, 2016.
- Kane M, Lewis B, Tatikonda S, Urbanek S. Scatter matrix concordance as a diagnostic for regressions on subsets of data. Statistical Analysis and Data Mining: The ASA Data Science Journal:249-259, 2016.
- Costa F, Hagan JE, Calcagno J, Kane M, Torgerson P, Martinez-Silveira MS, Stein C, Abela-Ridder B, Ko AI. Global morbidity and mortality of leptospirosis: a systematic review. PLoS Neglected Tropical Diseases, 2015. PMID: 26379143.
- Torgerson PR, Hagan JE, Costa F, Calcagno J, Kane M, Martinez-Silveira MS, Goris MG, Stein C, Ko AI, Abela-Ridder B. Global burden of leptospirosis: estimated in terms of disability adjusted life years. PLoS Neglected Tropical Diseases, 2015. PMID: 26431366.
- Wrzesniewski A, Schwartz B, Cong X, Kane M, Omar A, Kolditz T. Multiple types of motives don't multiply the motivation of West Point cadets. Proceedings of the National Academy of Sciences, 2014. PMID: 24982165.
- Kane M, Price N, Scotch M, Rabinowitz P. Comparison of ARIMA and Random Forest time series models for prediction of avian influenza H5N1 outbreaks. BMC Bioinformatics:1-9, 2014. PMID: 25123979.
- Rabinowitz PM, Slizovskiy IB, Lamers V, Trufan SJ, Holford TR, Dziura JD, Peduzzi PN, Kane M, Reif JS, Weiss TR, Stowe MH. Proximity to natural gas wells and reported health status: results of a household survey in Washington County, Pennsylvania. Environmental Health Perspectives, 2014. PMID: 25204871.
- Baglama J, Kane M, Lewis B, Reichel L. IRLBA, a fast partial SVD. Proceedings of the 12th annual Scientific Computing with Python Conference:23-24, 2013.
- Kane M, Lewis B. A Generative Communication Approach to Scalable, Distributed Learning. Proceedings of the NIPS Big Learning Workshop:1-8, 2013.
- Kane M, Emerson JW, Weston S. Scalable strategies for computing with massive data. Journal of Statistical Software:1-19, 2013.
- Scotch M, Mei C, Makonnen YJ, Pinto J, Ali A, Vegso S, Kane M, Sarkar IN, Rabinowitz P. Phylogeography of influenza A H5N1 clade 2.2.1.1 in Egypt. BMC Genomics, 2013. PMID: 24325606.
- Kazmi SA, Kane M, Krauthammer M. Benchmarking Technology Infrastructures for Embarrassingly and Non-embarrassingly Parallel Problems in the Biomedical Domain. Biomedical Sciences and Engineering Conference (BSEC), 2013.
- Rabinowitz PM, Galusha D, Vegso S, Michalove J, Rinne S, Scotch M, Kane M. Comparison of human and animal surveillance data for H5N1 influenza A in Egypt 2006–2011. PLoS One, 2012. PMID: 23028474.
- Evans ME, Ferriere R, Kane MJ, Venable DL. Bet hedging via seed banking in desert evening primroses (Oenothera, Onagraceae): demographic evidence from natural populations. The American Naturalist 169(2):184-194, 2006. PMID: 17211803.
- Kane M, Savakis A. Bayesian network structure learning and inference in indoor vs. outdoor image classification. 2:479-482, 2004.
- Kane MJ, Sahin F, Savakis A. A two phase approach to Bayesian network model selection and comparison between the MDL and DGM scoring heuristics. IEEE International Conference on Systems, Man and Cybernetics 5:4601-4606, 2003.
CV information above last modified October 01, 2025