Mini-project 1: Convergence
S470/670
Upload your initial submission through the Assignments tab on Canvas by 11:59 pm, Tuesday 29th September.
You may do this project alone, in a pair, or in a group of 3. If you are working in a pair or group, register your group using the “Mini-project 1 groups” tab on Canvas (or ask me or Haoran to do it). If you don’t have a partner and want me to pseudo-randomly match you with somebody else, email me to let me know by Thursday 17th September.
A researcher for a think tank is interested in the hypothesis of economic “convergence.” According to this theory, poorer countries’ GDP per capita will tend to “catch up” by growing faster than richer countries’ GDP per capita. You may wish to (but don’t have to) read the Wikipedia article on convergence:
https://en.wikipedia.org/wiki/Convergence_(economics)
The researcher notices the Gapminder website (www.gapminder.org/data) has data on GDP per capita (adjusted for inflation), among many other indicators. He has taken an introductory statistics course using R, but that was a long time ago, so he is outsourcing the exploratory data analysis to YOU.
Data
Go to the Gapminder website and download data for the following variables:
• Income per person (GDP/capita, PPP$ inflation-adjusted)
• Population, total
You may also download other variables if you think they will be useful.
For questions about “continents,” use the continent definition used by Gapminder, which divides countries into Asia, Africa, Europe, Oceania, and the Americas. You might want to merge your data with the data in the gapminder R package if you don’t want to define this manually; it’s okay if you lose some countries in the merge. Also note that there are few countries with usable data in Oceania; you may wish to omit Oceania when you have to split the data by continent.
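A minimal sketch of the continent merge, using base R and made-up numbers. It assumes the downloaded table has a `country` column; `income` and `continent_lookup` are hypothetical names, and the three countries and their values are invented for illustration. In practice the lookup could come from the gapminder package, for example `unique(gapminder::gapminder[, c("country", "continent")])`.

```r
# Toy stand-in for the downloaded income data (all values are made up).
income <- data.frame(
  country  = c("Country A", "Country B", "Country C"),
  gdp_1960 = c(1000, 2500, 4000),
  gdp_2018 = c(3000, 9000, 8000)
)

# Toy stand-in for a country-to-continent lookup table.
# "Country C" is deliberately missing to show what a merge can lose.
continent_lookup <- data.frame(
  country   = c("Country A", "Country B"),
  continent = c("Africa", "Asia")
)

# merge() keeps only countries present in both tables (an inner join).
merged <- merge(income, continent_lookup, by = "country")

# List the countries lost in the merge so the loss is visible, not silent.
dropped <- setdiff(income$country, merged$country)

print(merged)
print(dropped)
```

Checking `dropped` is worthwhile because a country can fall out of the merge simply because its name is spelled differently in the two sources.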
Questions
The researcher’s major research question is: When and where has there been convergence in GDP per capita since 1960? However, he recognizes this question may be difficult to answer,
at least straight away. So he has brainstormed a series of questions he would like you to address, which can be divided into three groups:
1. Which countries have grown the most and least since 1960? Calculate the annual growth rate of per capita GDP for each country. For example, world GDP per capita was $4,933 in 1960 and $15,941 in 2018, corresponding to an annual growth rate of 2.04%. Draw a visualization of the ten fastest-growing and the ten slowest-growing countries and their growth rates. Do the fastest-growing countries have anything in common? The slowest-growing countries?
2. In general, has there been convergence since 1960? Visualize the relationship between per capita GDP in 1960 and annual growth rate since 1960. Looking at the data as a whole, is there evidence for convergence? Also group the data by continent (Asia, Africa, Europe, and the Americas). Has there been convergence between richer and poorer continents? Has there been convergence between richer and poorer countries within each continent?
3. Has the pattern been different in different time periods? Now divide the time since 1960 into three time periods: 1960 to 1980, 1980 to 2000, and 2000 to present. Repeat your analysis from Part 2 for each time period. What are the differences in convergence across these three time periods?
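The worked example in question 1 is consistent with a compound annual growth rate, (end / start)^(1 / years) - 1. The sketch below checks that arithmetic in R. Treating the rate as compound is an assumption inferred from the 2.04% figure, and `annual_growth_rate` is a hypothetical helper name.

```r
# Compound annual growth rate between two GDP per capita values.
# Assumption: growth compounds once per year over n_years.
annual_growth_rate <- function(gdp_start, gdp_end, n_years) {
  (gdp_end / gdp_start)^(1 / n_years) - 1
}

# World GDP per capita figures quoted in question 1.
world_rate <- annual_growth_rate(4933, 15941, 2018 - 1960)
print(round(100 * world_rate, 2))  # 2.04
```

The function is vectorized, so it can be applied to whole columns of start and end values, and to any pair of years, such as the three periods in question 3.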
Write a report of no more than eight pages, including graphs, for the researcher.
The third set of questions is the deepest and will probably require the most attention. Note that some of these questions may not have definitive answers; the researcher recognizes this.
Some constraints:
• The researcher is familiar with elementary methods like linear models, but not with nonparametric methods such as loess and gam. That means that if you want to use those fancier models, you need to briefly describe what those techniques are doing in words that a non-statistician can understand.
• He is comfortable with transformations, but they would have to be interpretable.
• He took his statistics course from a fairly skeptical lecturer, so he knows all models are wrong. However, he is willing to accept some wrongness in exchange for a simple description of the data.
• He doesn’t need to see the R code, but wants to be able to reproduce your work if required.
• The researcher has noticed that student reports on complex real-world phenomena occasionally (accidentally, one hopes) say offensive things, and would prefer if you didn’t do that.
What to submit
Your initial submission should consist of:
• A report (PDF preferred) of no more than eight pages, excluding appendices.
• A .Rmd or other file containing your code.
• Any other supplementary files required to reproduce your work.
We will give you feedback, then you will make a final submission by a date to be announced. (You may not get a long turnaround, so it would be prudent to make your initial submission fairly polished.) The grade for your final submission will be the one that counts.
Notes
• There is no one objectively right answer to any part of the project (but there are infinitely many subjectively bad answers).
• You should put a short executive summary at the beginning giving a brief answer to the major research question.
• Make sure you justify your answers to the questions (don’t just state answers).
• There aren’t many countries in Oceania, so it may not be possible to fit complex models for that continent. You may drop that continent from your analyses should you find that necessary (but only where necessary).
• You do not necessarily need one overall model that describes all the data.
• Because there’s no correct model, you’re free to use multiple models for the same data and question, if you feel that’s a good use of your time and page count.
• All the data in Gapminder is estimated. It is certainly possible that some countries fudge their official statistics for their own benefit.
• A large fraction of the points are for communication, so maintain a decent level of professionalism.
• Additional technical graphs such as residual plots can be included in an appendix, which will not count toward the eight-page limit and which we might not bother to read. Submit your code as a separate file. Also upload any additional sources required to reproduce your work.
Grading
• First set of questions: 5 points
• Second set of questions: 5 points
• Third set of questions: 10 points
• Communication: 10 points. Full credit for communication requires a readable, informative, comprehensive, clearly labeled set of graphs, and a comprehensible write-up with few glaring spelling and grammatical errors that makes the main points of the analysis clear.