Shun Xie
Education
Imperial College London
MSci (Bachelor and Master) in Mathematics
London, UK
2018
Undergraduate GPA: 3.7/4.0, Graduate GPA: 3.83/4.0, First-Class Honours Degree (UK)
Thesis: Correlation between unemployment and earnings using Distance Correlation
Columbia University
M.S. in Biostatistics
New York, US
2022
First Year GPA: 4.0/4.3
Professional Experience
Data Analyst, Intern
Yum China
Shanghai, China
May 2023 - Aug 2023
- Identified holiday-driven anomalies in the data by applying a CNN to time-series data
- Rebuilt the weekly report in HiveSQL after consulting the operations team; improved SOP skills by helping the operations team maintain the weekly report dashboard
- Analyzed an A/B test of the new AI recommendation strategy on cltv1 (low-frequency users) and found an improvement in ARPU but a decrease in transaction frequency; proposed splitting cltv1 into new and sleeping member groups to address the problem
- Improved customer value by building a user churn model with LightGBM/MLP (AUC: 0.86) and defined customer return rate as the KPI for evaluating the churn model
- Proposed a data-driven solution using Shapley values and WOE at the weekly meeting; presented it to the operations team with Altair/Seaborn visualizations
- Summarized the key attributes of churned users through K-means clustering in PySpark
Data Analyst, Intern
Caitong Security Asset Management Co. Ltd.
Shanghai, China
Jun 2020 - Aug 2020
- Improved accuracy by imputing missing values using a self-developed formula
- Analyzed data extracted from Wind and produced an analysis report on the features of public equity funds
- Investigated the idiosyncrasies of the mutual funds with more than 5 billion subscriptions and produced a research report
Data Analyst, Intern
Hycon Research Co. Ltd.
Shanghai, China
Jul 2019 - Sep 2019
- Contributed to a market research project to optimize the design of a new towelette product and boost sales
- Conducted conjoint analysis to identify the relative weights of the product properties that attract consumers’ attention
- Generated 8 random profiles comprising the four factors and gathered 60 responses to the profiles
- Implemented the analysis in R and determined that the production location was the major concern
Research Experience
Study of Life Expectancy
Group Leader, Columbia University
New York
Nov 2022 - Dec 2022
- Built the main page of the website using R and HTML and published the website on GitHub Pages
- Imputed missing data using k-means imputation with 4 groups based on countries' income level and verified the result using k-means clustering
- Based on Pearson and distance correlation, confirmed that linear regression is sufficient for life expectancy analysis and achieved an adjusted R-squared of 0.78
- Chosen as the exemplar among the projects and displayed on the lecture’s webpage
Correlation between Unemployment and Wages
Master's Thesis, Imperial College London
London, UK
Oct 2021 - Jun 2022
- Applied distance correlation in a new field and captured an additional 10% of non-linear correlations
- Implemented spatial regression with the approximate profile-likelihood estimator (APLE) to address nonzero spatial correlation
- Iterated over confounders and concluded that the correlation arises from confounders at different time lags
Depression Status Predicted by COVID-19 Associated Behavior Change
Group Leader, Yale University
Remote
Jun 2021 - Sep 2021
- Increased accuracy by an average of 6% after replacing selected models (Logistic Regression, Mixed Logistic Regression, Random Forest, k-NN) with a linear SVM
- Improved sensitivity by 0.6 in k-NN and LR using under-sampling and lower-sampling to tackle imbalanced data
- Identified that Random Forest achieved the best F-score (0.787) and sensitivity (0.750) under 4-fold cross-validation
Evaluation of Supervised Classification Models for Image Recognition
Academic, Imperial College London
London, UK
Jan 2021 - Mar 2021
- Built and optimized a multilayer perceptron (MLP) model to prevent overtraining, based on its performance on the CIFAR-10 dataset
- Compared the MLP and CNN models; concluded that the CNN achieved 20% higher accuracy at epoch 40 and saved 7 megabytes of data
Comparison of Different Word Embedding Models based on IMDb Reviews
Group Leader, Imperial College London
London, UK
Jun 2020
- Compared LSA, GloVe, and Word2vec models through theoretical analysis and intrinsic and extrinsic evaluation
- Found no global difference among the models after applying sentiment analysis to IMDb movie reviews
Activities
Buddhism
In charge of organizing Zoom meetings and activities.
Shanghai, China
Sep 2023 - Present
QunYao Consulting
Taught algebra and differential equations to A-level students.
Shanghai, China
Jun 2018 - Jul 2018
Selected Publications
Comparison of Models’ Performance for Predicting Depression under COVID-19.
Accepted by the 2021 International Conference on Statistics, Applied Mathematics and Computing Science (CSAMCS 2021)
Shanghai, China
2021