Education
- M.S. Computational Data Science @ Carnegie Mellon University SCS (May 2025 - December 2026)
- B.S. Statistics and Machine Learning @ Carnegie Mellon University (August 2021 - May 2025)
Technical Skills
- Languages: Python, SQL, R, C++
- Tools: Git, SageMaker, Jupyter Notebook, Tableau, Snowflake, AWS, Google Cloud Platform
- Interests: Time-Series Forecasting, Predictive Modeling, Deep Learning, A/B Testing
Work Experience
Data Science Intern @ Reddit (May 2025 - August 2025)
- Ads Marketplace Team
- Developed and optimized a Prophet model using Python, SQL, and Vertex AI to forecast ad revenue.
Data Science Intern @ Salesforce (May 2024 - August 2024)
- Used Python and SQL to revamp a revenue-forecasting tool that automated the sales quota allocation process for over 4,000 sales strategy employees.
- Improved the XGBoost model’s performance against business logic by 14% through simplified pre-processing, targeted feature selection, and regularization for zero-inflated data.
- Led cross-functional collaboration with the product manager and stakeholders to align model capabilities with business needs and integrate the final product into planning dashboards such as Anaplan.
Machine Learning Researcher @ Carnegie Mellon University (November 2023 - May 2024)
- Developed a deep learning system simulating Bayer’s supply chain that identified weak nodes through stress-testing and improved supply chain resilience.
- Trained a Graph Neural Network (GNN) using PyTorch and TensorFlow on over 250,000 high-impact disruption scenarios, predicting the monetary impact with an accuracy of 84%.
- Analyzed the Time to Recovery (TTR) metric to pinpoint nodes whose failure would have the largest impact and to suggest alternative paths, reducing potential financial loss by up to 9.2% per node.
Data Analyst Intern @ Carnegie Mellon University Cognitive Development Lab (May 2023 - August 2023)
- Modeled how children acquire and organize new semantic knowledge through a comprehensive analysis of three language corpora, each containing 8-13 million words, with Python, R, and NLTK.
- Conducted hypothesis tests and built visualizations to compare the performance of two probability-based word association measures, presenting results to a lab of over 15 researchers and informing future research efforts.
Projects
Racial Identity Profiling in Contra Costa County (R)
Poster, Repository
Earned 3rd Place Prize in Undergraduate Statistics Research at CMU’s Meeting of the Minds
- Uncovered patterns of racial and ethnic disproportionality in police stops with linear/logistic regression, hypothesis tests, and decision trees in R to make policy recommendations to Contra Costa’s Office of Justice.
- Performed sensitivity analyses to ensure robustness of findings despite variation in demographic data sources.
Behind Schedule: An Investigation into Pittsburgh Regional Transit Performance (Python)
Repository
- Sourced, cleaned, and built visualizations of key performance metrics with pandas and altair to evaluate the quality of Pittsburgh Regional Transit’s bus and rail services.
- Tracked On-Time Performance (OTP) and identified disparities, such as the poor OTP of CMU-centric bus routes compared to system-wide averages, and evaluated temporal, seasonal, and day-type influences on OTP.
Teaching Assistantships
- Advanced Data Analysis (36-402): January 2025 - May 2025
- Statistical Computing (36-350): August 2023 - January 2025
Selected Coursework
- Intro to Deep Learning (11-785)
- Natural Language Processing (11-611)
- Algorithms and Data Structures (15-122)
- Statistical Machine Learning (36-462)
- Probability and Statistical Inference (36-236)
- Discrete Math (21-122)
- Linear Algebra (21-241)