Runzhe Wan

Senior Applied Scientist

Core AI, Amazon Inc.

Biography

I am currently a Senior Applied Scientist at Core AI, Amazon. I obtained my Ph.D. in Statistics at North Carolina State University in Feb. 2022, advised by Dr. Rui Song. Previously, I received my B.S. in Mathematics from Fudan University, China in May 2017.

My current research interests center around optimal decision-making under uncertainty. Such decisions may involve counterfactual effects, have long-term impacts, need to be personalized, and be evaluated or learned either during online interactions or from offline data. I am passionate about developing powerful and robust frameworks to evaluate and optimize our decisions and policies, with reliable statistical guarantees and efficient numerical algorithms. Accordingly, I have broad interests in Reinforcement Learning (inc. Bandits), GenAI/LLM, and Causal Inference (inc. Causal ML).

At Amazon, I work in Core AI, Amazon Stores’ central science org that reports directly to the CEO. I work across organizational boundaries to drive high-stakes initiatives that power Amazon Stores’ most critical decision systems. My work has not only delivered over $1B in annual business impact but also shaped the strategic adoption of causal ML and RL methodologies across Amazon’s core platforms, including inventory management, markdown algorithms, product selection, delivery speed optimization, and experimentation platforms.

Education
  • Ph.D. in Statistics

    North Carolina State University, 2022

  • B.S. in Mathematics

    Fudan University, 2017

Experience
  • Core AI, Amazon Inc.

    Senior Applied Scientist

    April.2024 -- Present

  • Core AI, Amazon Inc.

    Applied Scientist

    April.2022 -- March.2024

  • Core AI, Amazon Inc.

    Applied/Research Scientist Intern

    May.2020 -- Feb.2022 (part-time within semesters)

  • Bell Labs

    Research Intern

    Jun.2019 -- Aug.2019

Interests
  • Reinforcement Learning (inc. Bandits)
  • GenAI/LLM
  • Causal Inference (esp. Causal ML)

Research

Publications

* : Co-First Author

  1. A Review of Causal Decision Making RL Causal
    Ge, L.*, Cai, H.*, Wan, R.*, Xu, Y.* and Song, R. (2025). Journal of Artificial Intelligence Research
  2. Zero-Inflated Bandits RL
    Wei, H.*, Wan, R.*, Shi, L. and Song, R. (2025). ICML 2025
  3. Know When to Fold: Futility-Aware Early Termination in Online Experiments Statistics/ML
    Liu, Y.*, Wan, R.*, Huang, Y., McQueen, J., Hains, D., Gu, J., and Song, R. (2025). WWW 2025
  4. Contextual Deep Reinforcement Learning with Adaptive Value-Based Clustering RL
    Gao, Y., Wan, R., Giannakakis, I., Gu, J., and Song, R. (2025). International Conference on Machine Learning Technologies 2025
  5. Robust Offline Policy Evaluation and Optimization with Heavy-Tailed Rewards RL Causal
    Zhu, J., Wan, R., Qi, Z., Luo, S. and Shi, C. (2024). AISTATS 2024
  6. Effect Size Estimation for Duration Recommendation in Online Experiments: Leveraging Hierarchical Models and Objective Utility Approaches Statistics/ML
    Liu, Y.*, Wan, R.*, McQueen, J., Hains, D., Gu, J., and Song, R. (2024). AAAI 2024 (Oral Presentation)
  7. Experimentation Platforms Meet Reinforcement Learning: Bayesian Sequential Decision-Making for Continuous Monitoring RL Statistics
    Wan, R.*, Liu, Y.*, McQueen, J., Hains, D. and Song, R. (2023). KDD 2023
  8. Multiplier Bootstrap-based Exploration RL
    Wan, R.*, Wei, H.*, Kveton, B. and Song, R. (2023). ICML 2023
  9. A Review of Reinforcement Learning in Financial Applications RL
    Bai, Y.*, Gao, Y.*, Wan, R.*, Zhang, S.*, and Song, R. (2024). Annual Review of Statistics and Its Application
  10. Towards Scalable and Robust Structured Bandits: A Meta-Learning Framework RL
    Wan, R.*, Ge, L.* and Song, R. (2023). AISTATS 2023
  11. Batch Policy Learning in Average Reward Markov Decision Processes RL Causal
    Liao, P.*, Qi, Z.*, Wan, R., Klasnja, P. and Murphy, S. (2022). Annals of Statistics (AoS)
  12. A Multi-Agent Reinforcement Learning Framework for Treatment Effects Evaluation in Two-Sided Markets RL Causal
    Shi, C., Wan, R., Song, G., Luo, S., Song, R. and Zhu, H. (2022). Annals of Applied Statistics (AOAS)
  13. Mining the Factor Zoo: Estimation of Latent Factor Models with Sufficient Proxies Statistics
    Wan, R., Li, Y., Lu, W. and Song, R. (2022). Journal of Econometrics (JOE)
    (Won the Best Student Paper Award, Business & Econ Section, American Statistical Association. Declined following the one-award-per-year policy)
  14. Safe Exploration for Efficient Policy Evaluation and Comparison RL Causal
    Wan, R., Kveton, B. and Song, R. (2022). ICML 2022
  15. Metadata-based Multi-Task Bandits with Bayesian Hierarchical Models RL Causal
    Wan, R., Ge, L. and Song, R. (2021). NeurIPS 2021
  16. Deeply-Debiased Off-Policy Interval Estimation Causal RL
    Shi, C.*, Wan, R.*, Chernozhukov, V. and Song, R. (2021). ICML 2021
    (Long Oral, acceptance rate 3%)
  17. Multi-Objective Model-based Reinforcement Learning for Infectious Disease Control RL Causal
    Wan, R., Zhang, X. and Song, R. (2021). KDD 2021
    (Norman Breslow Young Investigator Award, American Statistical Association)
  18. Pattern Transfer Learning for Reinforcement Learning in Order Dispatching RL
    Wan, R.*, Zhang, S.*, Shi, C., Luo, S. and Song, R. (2021). IJCAI 2021, RLITS Workshop
    (Best Paper Award)
  19. Does the Markov Decision Process Fit the Data: Testing for the Markov Property in Sequential Decision Making RL Causal
    Shi, C., Wan, R., Song, R., Lu, W. and Leng, L. (2020). ICML 2020

Under Review / Revision

  1. STEEL: Singularity-aware Reinforcement Learning RL
    Chen, X., Qi, Z. and Wan, R.
  2. Heterogeneous Synthetic Learner for Panel Data Causal
    Shen, Y., Wan, R., Cai, H. and Song, R.

Internal Publications (Amazon)

  1. Inventory Long-Term Value for Inventory Health: An Offline Causal Reinforcement Learning Approach RL Causal
    Wan, R., Yao, J., Zeng, Y., Liu, C. and Gu, J. (2025). AMLC 2025
  2. Know When to Fold: Futility-aware Early Termination in Online Experiments Statistics/ML
    Wan, R.*, Liu, Y.*, Huang, Y., McQueen, J., Hains, D., Gu, J. and Song, R. (2023). CSS 2024 (Best Paper Award)
  3. Experimentation Platforms Meet Reinforcement Learning: Bayesian Sequential Decision-Making for Continuous Monitoring RL Statistics/ML
    Wan, R.*, Liu, Y.*, McQueen, J., Hains, D. and Song, R. (2023). CSS 2023 (Best Paper Award) and AMLC 2023
  4. Data-driven substitution aware promise tuning model and promise extension experiment Statistics/ML
    Giannakakis, I., Svoboda, R., Wan, R.*, Gu, J. and Yao, J. (2023). AMLC 2023
  5. Effect Size Estimation for Duration Recommendation in Online Experiments: Leveraging Hierarchical Models and Objective Utility Approaches Statistics/ML
    Liu, Y.*, Wan, R.*, McQueen, J., Hains, D., Gu, J., and Song, R. (2023). Econ Summit 2023
  6. Continuous Monitoring of A/B Tests: A Meta Analysis Statistics/ML
    Liu, Y.*, Wan, R.*, McQueen, J., Song, R., Hains, D. and Richardson, T. (2023). AMLC 2023
  7. Deep Inventory Control Policy for Perishables RL
    Wan, R., Giannakakis, I., Sisikoglu, E., Jiang, T., Goyal, V., Song, R. and Gu, J. (2022). CSS 2022 (Long talk) and AMLC 2022
  8. Reinforcement Learning for Replaceability Index Estimation and Assortment Optimization RL
    Wan, R., Giannakakis, I., Gu, J. and Song, R. (2021). CSS 2021 (Spotlight talk, rate 4.8%) and AMLC 2021

Other Contributions

Awards / Honors

Professional Services

Reviewer:

  • Conferences: NeurIPS, ICML, ICLR, UAI, AISTATS, KDD, IJCAI, AAAI, SDM
  • Journals: Annals of Statistics, Journal of the American Statistical Association (T&M), Journal of the American Statistical Association (book review), Journal of the Royal Statistical Society (Series B), EJS, STPA, Transactions on Machine Learning Research (TMLR)

PC member: RLITS Workshop at IJCAI 2021, Online Marketplaces Workshop at KDD 2022/2023