Applied Regression Analysis
Final Project
ECN 410 – Fall 2016
For the final project, you will need to produce an original statistical analysis or forecast using the regression techniques covered in this course. Students are free to select any topic they would like so long as it meets the requirements described below, and may also work in groups of up to two people.
Requirements
All projects should include the following components:
» An introduction outlining the purpose of the project and the primary question it seeks to answer.
» A section explaining the data to be used.
» A section explaining the regression methods to be used and the important considerations taken
to ensure that the analysis is correct.
» A section presenting the results of the regressions according to best practices as well as the results
of tests and checks done to reinforce the analysis.
» A section which interprets and discusses the results of the regressions and tests.
» A section which summarizes the analysis and states the answer it gives to the primary question
of the project.
Timeline
October 3rd, 2016
Groups should be formed (optional) and topics selected. The names of the members of the group, a short description of the topic selected, and a description of how the data will be obtained should be emailed to me at Brian.Goegan@asu.edu.
October 24th, 2016
Each group/student should email me a brief description of the data collected, along with the data itself. A serious effort should be made to collect all data by this date so that issues can be identified with plenty of time to remedy them.
November 14th, 2016
Each group/student should email me their “methods section”, which describes the methods being used in the analysis, along with the preliminary results of the main regression being run. These results should be presented professionally in a chart or table.
December 2nd, 2016
The final report is due by email in PDF format. The report should include the names of all group members. There are no formatting or page requirements for this report.
Choosing a topic for this paper is difficult, but there are a few key things to keep in mind to ensure that the topic you select will satisfy the requirements of the paper and still be enjoyable to write.
1. You need to choose a dependent variable. That means you need an outcome which can be measured. A regression can only accommodate one dependent variable, so if you want to look at multiple outcomes (encouraged), you will be doing several regressions (not hard).
2. You need to think about the causes of your outcome, and then find or collect data on the causes which can be measured. These will be your independent variables.
3. Most of the time, you are interested in just one or two independent variables; the rest are included as controls. Regression is a tool for singling out the impact of each independent variable while holding the others constant.
4. Don’t choose a topic without exploring your data options. There are a lot of plausible topics, but only a few possible ones. Data is a much bigger limitation than your imagination.
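As a rough illustration of points 1 through 3, the sketch below simulates a made-up film dataset and compares a regression that omits a cause of the outcome with one that includes it. Everything here is an assumption for illustration only: the variable names (`budget`, `cast`, `gross`), the "true" coefficients, and the hand-rolled `ols` helper are hypothetical, and in practice you would use the software from the course rather than this function.

```python
import random

def ols(rows, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    X = [[1.0] + list(r) for r in rows]          # add a constant term
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(x[a] * x[b] for x in X) for b in range(k)]
         + [sum(x[a] * yi for x, yi in zip(X, y))]
         for a in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [u - f * v for u, v in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Simulated (not real) data: bigger budgets tend to buy more popular casts,
# and both budget and cast popularity raise the box office gross.
random.seed(0)
n = 2000
budget = [random.uniform(1, 200) for _ in range(n)]
cast = [0.05 * b + random.gauss(0, 2) for b in budget]
gross = [5 + 1.5 * b + 4.0 * c + random.gauss(0, 20)
         for b, c in zip(budget, cast)]

# One dependent variable (gross), regressed two ways.
naive = ols([[c] for c in cast], gross)                      # omits budget
full = ols([[b, c] for b, c in zip(budget, cast)], gross)    # controls for budget

# The naive cast coefficient lands well above the true value of 4.0 because it
# also picks up the effect of budget; the full regression separates the two.
print("cast effect, budget omitted: ", round(naive[1], 2))
print("cast effect, budget included:", round(full[2], 2))
print("budget effect:               ", round(full[1], 2))
```

The design choice to notice is that leaving out a measurable cause which is correlated with your variable of interest distorts the estimate for that variable, which is why point 2 asks you to think through all the causes before collecting data.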
To help you think of something, you might try filling out the following questionnaire:
What are you interested in?
_____________________________________________________________________
What measurable outcome in that field would you like to better understand/predict?
_____________________________________________________________________
What do you suspect are some of the measurable determinants of that outcome?
_____________________________________________________________________
Where can you acquire the data on all of those things?
_____________________________________________________________________
Example:
What are you interested in?
Movies.
What measurable outcome in that field would you like to better understand/predict?
Box Office Success
What do you suspect are some of the measurable determinants of that outcome?
Popularity of Cast, Genre, Budget, MPAA Rating, Reviews, Awards, Season
Where can you acquire the data on all of those things?
IMDb.com and BoxOfficeMojo.com
Fill that out several times. Try to think of at least 10. Try to make all 10 as different as possible. Then think about the actual process of getting the data from the sources you need. Good regressions need a lot of data that varies widely. To do the project I outlined above, I would need data on thousands of films. These films would need to span the spectrum of every variable I outlined. I would need high grossing and low grossing films, films with many popular actors, some popular actors, and no popular actors. I would need many films from every genre and rating. Films with big budgets and small. Films with good reviews and bad. Films that won many awards and films that were not even nominated. I would also need films released at every point in the year. In short, I would need a $#!%load of data, which is a technical term.
Collecting the data for thousands of films from IMDb.com and BoxOfficeMojo.com by hand would take years. Make sure you are considering this. There are ways to automate data collection, though, so talk to me if you know the data is there but need a way to automate collecting it. Ideally, though, you want to find the data already in spreadsheet format. There are a lot of datasets out there. Talk to me if you want help finding some.
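The claim that good regressions need a lot of data that varies widely can be made concrete with the textbook formula for the standard error of the slope in a one-variable regression, which is the error standard deviation divided by the square root of the sum of squared deviations of x from its mean. The sketch below applies that formula to three hypothetical sets of film budgets; the budget values and the noise level `sigma` are invented for illustration and are not real data.

```python
import math

def slope_standard_error(x, sigma):
    """Standard error of the OLS slope in a one-variable regression,
    given the x values and the standard deviation of the error term."""
    mean_x = sum(x) / len(x)
    sxx = sum((xi - mean_x) ** 2 for xi in x)   # total variation in x
    return sigma / math.sqrt(sxx)

sigma = 10.0  # hypothetical noise level, the same in every scenario

# 30 films whose budgets barely differ (48 to 52)
narrow = [48 + (i % 5) for i in range(30)]
# 30 films whose budgets span a wide range (10 to 90)
wide = [10 + 20 * (i % 5) for i in range(30)]
# 3000 films spanning the same wide range
wide_many = [10 + 20 * (i % 5) for i in range(3000)]

se_narrow = slope_standard_error(narrow, sigma)
se_wide = slope_standard_error(wide, sigma)
se_many = slope_standard_error(wide_many, sigma)

# A smaller standard error means a more precise estimate of the slope.
print("few films, similar budgets:", round(se_narrow, 4))
print("few films, varied budgets: ", round(se_wide, 4))
print("many films, varied budgets:", round(se_many, 4))
```

Both more observations and more spread in each independent variable shrink the standard error, so a dataset of similar films tells you little no matter how carefully the regression is run.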
These are a few ideas I have had over the years which I think make for interesting topics. You should not select them unless you think they are compelling, though. Because I have listed them here, I have already thought about them and will be more critical of your approach.
» Determine whether or not the minimum wage contributes to unemployment.
» Determine whether or not the minimum wage contributes to GDP.
» Determine whether or not grades, or GPA, matter for outcomes like income.
» Find the most important determinants of people’s discount rate.
» Predict the outcome of a sporting event. This can be anything from predicting the records of each team in a league to predicting the score of each game. You could even focus on predicting specific player statistics.
» Estimate a person’s resting metabolic rate, life expectancy, or some other biometric.
» Predict the outcome of an awards show. The Emmys, Grammys, Oscars, Tonys, or whatever else
you would like to know the winners of in advance.
» Determine whether or not income inequality is an important factor for income growth by
evaluating its use as a predictor of GDP growth.
» Use alternative methods to predict something that is commonly forecast (e.g. GDP, quarterly
revenue, unemployment, the weather, or commodity futures). Compare your results to what has been done previously.