FIN330: Advanced Methods
Final Project – Fall 2018
due at 12pm Tuesday December 11, 2018
Please submit your project report to the Blackboard course site by 12pm Tuesday December 11, 2018.
********************************************
One of the goals of the movie industry is to produce successful movies, and success is often measured by the revenue a movie generates. Can information available soon after a movie's release be used to predict its world revenue? To investigate this, let's consider a simple random sample (SRS) of 28 movies, all released far enough in the past that they are no longer in theaters. The response variable is a movie's world revenue (WorldRevenue). Among the explanatory variables are the movie's budget (Budget), opening-weekend revenue (Opening), the number of theaters the movie was shown in during the opening weekend (Theaters), and variables related to ratings (Rating, Rating1). All dollar amounts are measured in millions of U.S. dollars. Throughout your analysis, you should ignore the variables "Profit", "USRevenue", and "IntRevenue".
You need to build a multiple linear regression model to explain/predict WorldRevenue, and to write a report documenting why and how certain variables play a significant role in explaining WorldRevenue.
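As a starting point only, the sketch below shows the general shape of fitting a multiple linear model in R with lm(). It runs on simulated stand-in data, not the course dataset: the numbers, the data frame name "movies", and the choice of predictors are assumptions made purely for illustration. Only the column names follow this handout.

```r
# Simulated stand-in data (NOT the course dataset); values are invented.
set.seed(1)
n <- 28
movies <- data.frame(
  Budget   = runif(n, 10, 200),          # millions of USD (made up)
  Opening  = runif(n, 1, 100),           # millions of USD (made up)
  Theaters = round(runif(n, 500, 4000))  # theater counts (made up)
)
movies$WorldRevenue <- 20 + 0.8 * movies$Budget + 3 * movies$Opening +
  rnorm(n, sd = 25)

# Fit a multiple linear regression; the predictor set is illustrative only.
fit <- lm(WorldRevenue ~ Budget + Opening + Theaters, data = movies)

summary(fit)   # coefficient estimates, t-tests, R-squared, overall F-test
confint(fit)   # 95% confidence intervals for the coefficients
```

With your own data you would replace the simulated data frame with the dataset provided for the project and decide on the predictors yourself.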
General instructions: Your report should be organized in the following order: introduction (describe the problem under investigation), methodology (describe your data and all your models), discussion (discuss your findings based on the statistical analysis, and comment on your final model and its interpretation), and finally a one-paragraph summary. Please put all tables and/or graphs in the main body of the report, and all R code in the appendix. Please note that your report should be typed and no more than 5 pages (excluding R code, tables, and plots).
Some comments:
- You do not have to use all the variables in the dataset. A final model with 2-4 variables would be sufficient.
- Variable "LOpening" is the logarithm of variable "Opening". Should you use both in your regression model? If so, why? If not, which is better for your data analysis?
- Should you use both "Rating" and "Rating1" in your model? Why or why not?
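
One possible way to explore the last two comments is sketched below, again on simulated stand-in data rather than the course dataset. Several things here are assumptions, not facts about the real data: that LOpening is a natural logarithm, that Rating is a category label, and that Rating1 is a 0/1 recoding of it. Check how the variables are actually defined in your data before drawing conclusions.

```r
# Simulated stand-in data (NOT the course dataset); values are invented.
set.seed(2)
n <- 28
movies <- data.frame(Opening = exp(rnorm(n, mean = 3, sd = 1)))
movies$LOpening <- log(movies$Opening)   # assumption: natural log
movies$WorldRevenue <- 3 * movies$Opening + rnorm(n, sd = 20)

# Hypothetical rating coding, invented for illustration only.
movies$Rating  <- sample(c("PG", "PG-13", "R"), n, replace = TRUE)
movies$Rating1 <- as.integer(movies$Rating == "R")

# How strongly do the raw and logged versions move together?
cor(movies$Opening, movies$LOpening)

# Compare single-predictor fits using each version of the variable.
fit_raw <- lm(WorldRevenue ~ Opening,  data = movies)
fit_log <- lm(WorldRevenue ~ LOpening, data = movies)
c(raw = summary(fit_raw)$adj.r.squared,
  log = summary(fit_log)$adj.r.squared)

# Cross-tabulate the two rating variables to see whether one
# is simply a recoding of the other.
table(movies$Rating, movies$Rating1)
```

Scatterplots of WorldRevenue against each candidate predictor, and residual plots for each fit, are a natural complement to these numerical checks.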