The tentative project title:

World Life Expectancy—Live to thrive

The motivation for this project:

It’s a survival instinct in blood that keeps us asking ourselves, how do we live, and how do we live longer. People in different regions have different life expectancies. Even in New York, life expectancy varies over regions due to different economic statuses, different infrastructures, etc. It is natural to wonder about the factors that could potentially have the most influential impact on life expectancy: social-economic factors, public health factors, or both.

The intended final products:

  • A website with homepage for overview of project and our screencast, a page for some plots for exploratory data, a page for analysis of some factors and approaches we used, a page for interactive dashboard by R-Shiny, a page for report and link to github.
  • A report showing how we completed the project, and any changes during completion.
  • A short screencast describing our project.

The planned analyses / visualizations / coding challenges

Analyses:

  • Hypothesis test : difference in mean life expectancy among regions, e.g. developed v.s. developing countries
  • Linear model : Fit a linear model with Y = life expectancy and X=economic/environmental / health determinants

Visualizations:

  • EDA Plots: variable distribution / scatter plots with best line fit / correlation plot of variable
  • Time Series Plot : life expectancy over time
  • Interactive World Map : life expectancy in different countries and years

Challenges :

  • Missing data:Many variables in our major dataset have missing values Select reasonable variables: We need to find more variables for our dataset, in order to cover aspects of society, economics, environment, health, and so on.
  • Website Design : We want to build a fancier website with advanced settings
  • R-Shiny interactive dashboard : We plan to use R-Shiny to build an interactive platform.
  • Model fitting : Our dataset contains many predictors of life expectancy. Variable selection and interpretation could be a challenge

The planned timeline

  • 11.12 Proposal
  • 11.12~11.15 Find more related datasets to include more determinants for life expectancy. Determine the variables/predictors. Explore the data and decide what exact plots that we should draw.
  • 11.15 Meeting with TA.
  • 11.15~11.20 Assign individual tasks and schedule weekly meetings. Finish analysis test part and begin EDA Section.
  • 11.21~11.27 Finish EDA Section, Time Series Plot and analysis linear model part, beginning Interactive World Map.
  • 11.28~12.4 Finish the World Map, begin Website design and begin to write the report.
  • 12.5~12.10 Finish the report and video.
  • 12.11~ Enjoy a life without DS project. (So sad TAT)