Motivation

Two of our group members celebrated their birthdays in November, and they immediately started wondering how many birthdays they have left and how long they might live.

   

Initial Questions

What factors have a significant impact on life expectancy?

Do different countries have significantly different life expectancies?

What factors account for the differences in life expectancy in different countries and regions?

   

Introduction

People in different regions have different life expectancies. Even within New York, life expectancy varies across areas because of differences in economic status, infrastructure, and so on. Other factors, such as the outbreaks of measles and polio in the previous century, also had a great impact on life expectancy. Our epidemiology coursework and the emergence of epidemics such as Covid-19 and SARS have shown us how devastating viruses can be and how vulnerable people are when facing them. In response, vaccines are developed so that people can become immune without first being infected. Inspired by this coursework, it is natural to wonder about the effect of vaccinations on life expectancy, and to ask which factors have the greatest influence on it: socioeconomic factors, public health factors, or both.

Longevity is something people have pursued for centuries. Articles such as The Value of Health and Longevity by Murphy et al. and Exercise and longevity by Gremeaux et al. both describe the desire for longevity and offer suggestions for maintaining good health, in both quantitative and theoretical terms. Public health, as a field, also aims to increase life expectancy by preventing, intervening in, monitoring, and evaluating diseases and environmental hazards, and by promoting healthy behaviors. Indeed, with the vast improvements in health and quality of life that followed the rise of public health, the global average life expectancy has more than doubled since 1900 (Roser 2013). Given the importance of a long life expectancy, our project aims to identify the factors that contribute to it, both positively and negatively, so that individuals can make informed decisions about personal behavior. More importantly, from a macro perspective, governments could use results like ours to inform legislation, such as mandatory vaccination, aimed at increasing average life expectancy.

Several papers have focused on predicting life expectancy. Clarke et al. stratified predictions by occupation, namely doctors, nurses, and medical students, and found a low prediction accuracy (10%) when predictions were based only on disease factors. Therefore, in our project, we decided to consider more non-medical variables. On the other hand, the architectural model designed by Sormin et al. reaches an accuracy of 97% in predicting life expectancy at a global level. However, that model is overly complicated (it requires 1,000 training epochs), and it cannot be used to identify which factors are potentially significant. Taking all of this into consideration, we use a linear model as our main regression method and choose economic factors and health-related factors as predictors (discussed later).
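To illustrate why a linear model makes individual factors easy to interpret, here is a minimal sketch of a one-predictor least-squares fit in plain Python. It is not the project's actual analysis: the function name `fit_simple_ols` and the toy numbers are assumptions made purely for illustration, and a real analysis would use several predictors at once.

```python
def fit_simple_ols(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squared deviations of x, and cross-deviations of x and y.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Toy values for illustration only; not taken from the WHO dataset.
predictor = [1.0, 2.0, 3.0, 4.0]
outcome = [3.0, 5.0, 7.0, 9.0]

intercept, slope = fit_simple_ols(predictor, outcome)
print(f"intercept={intercept:.2f}, slope={slope:.2f}")
```

The slope is read directly as the expected change in the outcome per unit change in the predictor, which is the kind of interpretability that a long-trained neural network does not offer as readily.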

   

Sources

Our main dataset is the WHO Life Expectancy dataset, which covers the years 2000-2016 for 183 countries and includes 32 variables, such as health-related factors and GNI per capita. It was downloaded from the following website: https://www.kaggle.com/datasets/mmattson/who-national-life-expectancy?resource=download

   

Variable Selection

Based on the existing research and the data we had, we decided to study the effects of country-level variables on life expectancy. We removed the variables related to mortality: life expectancy is calculated from these data, so using them to predict life expectancy would be circular. The other variables can be considered relevant and are kept for the following study, for example, the GDP and development status of the country, the local disease situation, education level, and living habits of people in the country, such as alcohol consumption.
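The mortality filter described above could be sketched as follows. This is a minimal illustration in plain Python, not the project's actual code: the column names (`adult_mortality`, `infant_mort`, `alcohol`, and so on), the substring rule, and the sample values are all assumptions made for the example, and the real dataset's headers may differ.

```python
# Assumed rule: any column whose name contains "mort" is mortality-related.
MORTALITY_KEYWORDS = ("mort",)


def drop_mortality_columns(rows, keywords=MORTALITY_KEYWORDS):
    """Return copies of the rows without mortality-related columns."""
    return [
        {
            name: value
            for name, value in row.items()
            if not any(key in name.lower() for key in keywords)
        }
        for row in rows
    ]


# Toy rows with hypothetical column names and made-up values.
sample_rows = [
    {"country": "A", "year": 2015, "life_expect": 71.2,
     "adult_mortality": 180.0, "infant_mort": 0.03, "alcohol": 4.1},
    {"country": "B", "year": 2015, "life_expect": 80.5,
     "adult_mortality": 75.0, "infant_mort": 0.004, "alcohol": 9.8},
]

cleaned = drop_mortality_columns(sample_rows)
print(sorted(cleaned[0]))
```

A name-based rule like this is only a starting point; each dropped column should still be checked by hand against the dataset's documentation.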

Based on these ideas, we looked through several papers to identify additional variables that we thought would relate to life expectancy.

Punching above their weight: a network to understand broader determinants of increasing life expectancy. Fran Baum, et al. 2018. Why do some countries do better or worse in life expectancy relative to income? An analysis of Brazil, Ethiopia, and the United States of America. Toby Freeman, et al. 2020. Both papers introduce the idea that some countries achieve higher or lower life expectancy than their per capita income would predict. One of the reasons, explained in the second paper, is women’s access to education and politics. Therefore, we decided to add a variable related to women’s power in the country: the percentage of women in the national parliament around 2015-2018.

Determinants of inequalities in life expectancy: an international comparative study of eight risk factors. Mackenbach, Johan P., et al. 2019. This article from the Lancet studies eight different factors affecting human life expectancy, including income level, smoking status, alcohol consumption, education level, BMI, physical activity intensity, and diet type. It inspired us to include income group in our study as a categorical variable.

Ambient PM2.5 reduces global and regional life expectancy. Apte, Joshua S., et al. 2018. This article inspired us to take environmental factors into consideration. Thus, we added PM2.5 relative concentration as a numeric variable.
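Adding a country-level variable such as the ones above amounts to joining an external table onto the main data by country. The sketch below shows one way to do that in plain Python. The helper `left_join_by_country`, the column names, and all values are hypothetical and are not taken from the project or from any dataset.

```python
def left_join_by_country(base_rows, extra_by_country, new_column):
    """Attach one country-level value to every base row (None if missing)."""
    return [
        {**row, new_column: extra_by_country.get(row["country"])}
        for row in base_rows
    ]


# Toy values for illustration only.
base_rows = [
    {"country": "A", "year": 2015, "life_expect": 70.0},
    {"country": "B", "year": 2015, "life_expect": 80.0},
]
women_in_parliament_pct = {"A": 25.0}  # country "B" is deliberately missing

merged = left_join_by_country(
    base_rows, women_in_parliament_pct, "women_parliament_pct"
)
print(merged)
```

Keeping unmatched countries with a missing value, rather than dropping them, makes it visible how many rows lack the new variable before any modeling decision is made.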