THE AUSTRALIAN NATIONAL UNIVERSITY
RESEARCH SCHOOL OF FINANCE, ACTUARIAL STUDIES AND STATISTICS
STAT7030 Generalised Linear Modelling
Final Project: Due at 5pm, Friday, 11/03/2022
1. This final project is worth 60% of your final grade and is compulsory.
2. Maximum marks: 60.
3. Project reports can only be submitted via Turnitin on Wattle.
4. Please sign the declaration form on Wattle and include it as the cover page of your submission. Please check the quality of the file when preparing your submission so that it is legible.
5. File size limit for Turnitin submission: 50MB.
6. You are strongly advised to make several trial submissions before the due date. If you encounter any problem in your trials, please report it by email before the due date. Late submissions will not be accepted and your final project will be marked 0. Please be ready to submit your report at least 30 minutes before the deadline.
7. While you may use course material, computer software, the internet, or other resources, you must complete the final project individually. Identical text across submissions, even a single sentence, is treated as cheating.
8. You can use any result, formula or statement from the course material without proof. In fact, doing this will help your project.
DATA
This individual final project is designed to apply materials in this course to analyse any one or two real-world datasets chosen by yourself. It is worth noting some broad types of real data that are available without charge. For example:
• COVID-19 data for Australia (https://www.covid19data.com.au/data-notes);
• Australian government data (https://data.gov.au);
• New South Wales government data (https://data.nsw.gov.au);
• Victoria government data (https://www.data.vic.gov.au);
• ACT government data (https://www.data.act.gov.au/browse);
• Australian government statistics and datasets (e.g., https://www.abs.gov.au, https://www.abs.gov.au/statistics/microdata-tablebuilder);
• Australian central bank datasets (http://www.rba.gov.au);
• US Census data (https://www.census.gov/data.html);
• Federal Reserve Economic Data (FRED) (https://fred.stlouisfed.org/);
• Country level data sets provided by the National Bureau of Economic Research (http://www.nber.org/data/);
• World Bank datasets (https://www.doingbusiness.org/en/data);
• United Nations (international) demographics data (http://data.un.org/); and
• ANU library (https://anulib.anu.edu.au/find-access/e-resources-databases), e.g., DatAnalysis Premium, Factiva (global news database), Connect4.
You may also consider some well-established datasets, for instance datasets from academics (via their university websites), e.g., http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html, and datasets from https://archive.ics.uci.edu/ml/datasets.php. However, since these datasets usually come from refereed journal articles or academic books, if you choose to use one or two of them, please clearly indicate the difference, novelty or improvement in your analysis and report compared to the extant literature. You may also consider data from your industrial experience (in that case you need to submit only the report, not the data). Note that the one or two datasets that you choose cannot be the same as any dataset used in the lectures, tutorials, and assignments on Wattle.
Based on the one or two datasets that you choose, you need to consider at least two types of models to fit your data from the list below (each bullet point can only count as one type):
• Linear Mixed Effects Model;
• Binary Regression/Binomial Logistic Regression;
• Poisson Log-Linear Regression/Log-linear Regression with Extra-Poisson Variation;
• Multicategory Logistic Regression; and
• Gamma/Exponential Generalised Linear Model.
When fitting the two types of models that you choose, you may use a single dataset and select two different response variables, one for each type of model; or you may use two datasets, one for each type of model. Note that the choice of response variables needs to be meaningful and useful in real practice. If you include the fitting results for only one type of model in your report, 30 marks will be deducted from this final project.
REPORT (60 marks)
Report Format – PDF or Word Upload
Written reports for this project (10 pages maximum for the main manuscript and 20 pages maximum for the appendix based on the format below, and all the R code should be relegated to the appendix) are expected to be submitted via Turnitin. Turnitin similarity check will be conducted for all the submitted reports. Please use Australian English spelling. All pages (uploaded in PDF or Word form) must be as follows:
• Black type, or occasional coloured type for highlighting purposes;
• Single column;
• White A4 size paper with at least 0.5 cm margin on each side, top and bottom;
• Text must be size 12 point Times New Roman or an equivalent size before converting to PDF format and must be legible to assessors;
• Only references and appendices can be in 10 point Times New Roman or equivalent.
Report Guideline
In this project, you need to submit a report that analyses one or two datasets and considers at least two types of models to fit your data; see the details of the “DATA” section above. Your final project report should be structured as follows.
Main Manuscript (10 pages maximum)
Please make your main manuscript precise and concise because of the 10-page limit!
1. INTRODUCTION
State the objectives of the project and provide adequate background on the data. This section may also include: variable descriptions, where your data come from, what scientific question(s) you can answer using the analysis in the following sections, what contributions your real data analyses in the following sections may make (to the literature or in real practice), etc.
2. DATA CHARACTERISTICS
This section may include: exploratory data analysis (EDA), descriptive statistics, etc. However, your analysis in this section should have a clear connection to, or provide a motivation for, the following model fitting section. Otherwise, you may consider reporting only the important results and shortening this section because of the 10-page limit for the main manuscript. Results should be clear and concise.
This is an official report. Please do not paste any R code or R output (from the R console) in the main manuscript, as doing so is unprofessional. If you want to report a table, please use a Word table rather than pasted R output (from the R console)! If R code or R output (from the R console) appears in the main manuscript, 10 marks will be deducted from this final project. Similar rules apply to the following sections of the main manuscript.
You may have many tables/figures to report. However, because of the 10-page limit, please report only the most important ones in the main manuscript. Other useful tables/figures mentioned in the main manuscript may be relegated to the appendix. You may also need to adjust the table/figure size properly in order to satisfy the page limit. If you think some tables/figures are not useful, please do not include them in the report at all (either in the main manuscript or in the appendix). Similar rules apply to the following sections of the main manuscript.
You may summarise your results in tables and figures. However, please analyse your results in your own words rather than simply stacking results in the main manuscript, since it is an official report. Similar rules apply to the following sections of the main manuscript.
3. MODEL FITTING AND INTERPRETATION
Based on the one or two datasets that you choose, you need to consider at least two types of models to fit your data from the list below (each bullet point can only count as one type):
• Linear Mixed Effects Model;
• Binary Regression/Binomial Logistic Regression;
• Poisson Log-Linear Regression/Log-linear Regression with Extra-Poisson Variation;
• Multicategory Logistic Regression; and
• Gamma/Exponential Generalised Linear Model.
When fitting the two types of models that you choose, you may use a single dataset and select two different response variables, one for each type of model; or you may use two datasets, one for each type of model. Note that the choice of response variables needs to be meaningful and useful in real practice. If you include the fitting results for only one type of model in your report, 30 marks will be deducted from this final project.
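As an illustration only, the single-dataset option might look like the following R sketch. The data are simulated and every variable name (age, income, claim, visits) is hypothetical; your own project must use real data and justify its own choice of responses. The quasi-Poisson refit is one possible way, not the only one, to gauge extra-Poisson variation.

```r
# Hypothetical simulated data: one dataset, two response variables.
set.seed(1)
n <- 200
dat <- data.frame(
  age    = runif(n, 20, 70),
  income = rlnorm(n, meanlog = 10, sdlog = 0.5)
)
# A binary response and a count response, both invented for illustration.
dat$claim  <- rbinom(n, 1, plogis(-3 + 0.05 * dat$age))
dat$visits <- rpois(n, exp(0.2 + 0.01 * dat$age))

# Model type 1: binomial logistic regression for the binary response.
fit_logit <- glm(claim ~ age + log(income), family = binomial, data = dat)

# Model type 2: Poisson log-linear regression for the count response.
fit_pois <- glm(visits ~ age + log(income), family = poisson, data = dat)

# A quasi-Poisson refit gives a dispersion estimate; a value well above 1
# would point towards extra-Poisson variation.
fit_quasi  <- glm(visits ~ age + log(income), family = quasipoisson, data = dat)
dispersion <- summary(fit_quasi)$dispersion
```

Code of this kind belongs in the appendix; only the summarised results, in properly formatted tables, belong in the main manuscript.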
For each type of model, you can use all the other available variables in the dataset (other than the response variable) as explanatory variables, or you can use a subset of these variables, interaction terms of these variables, or transformations of these variables to accomplish your model fitting, but you need to clearly indicate which variables/interaction terms you use in the model. Then please explain why you choose all variables/a subset of variables/variables + some interaction terms/transformations of variables.
For each type of model, please select at least one fitted model to report. Please explain how you obtain this model. In addition, please explain why you select this model to report; for instance, the reasons can be any one or several of the following:
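One possible way to keep track of exactly which terms enter a model is sketched below in R. The data are simulated and the names x1, x2 and g are hypothetical; the Gamma model with a log link is chosen purely as an example of one model type from the list.

```r
# Hypothetical simulated data with a positive, right-skewed response.
set.seed(2)
n <- 150
dat <- data.frame(
  x1 = rnorm(n),
  x2 = runif(n, 1, 10),
  g  = factor(sample(c("a", "b"), n, replace = TRUE))
)
mu    <- exp(0.5 + 0.3 * dat$x1 + 0.1 * log(dat$x2))
dat$y <- rgamma(n, shape = 2, rate = 2 / mu)

# x1 * g expands to x1 + g + x1:g (main effects plus interaction);
# log(x2) is a transformation of an explanatory variable.
fit <- glm(y ~ x1 * g + log(x2), family = Gamma(link = "log"), data = dat)

# Character vector of the terms actually in the model, which can be
# listed in the report text or in a table.
model_terms <- attr(terms(fit), "term.labels")
```

Listing the terms this way only documents what was fitted; the reasons for including an interaction or a transformation still need to be argued from the data background.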
(i) The reported model passes the model diagnostics but the other models may not;
(ii) The AIC/BIC of the reported model is the smallest;
(iii) Some hypothesis testing shows that the reported model can be better than the other models;
(iv) Some variables included in the reported model are important in the interpretation of real practice, and hence cannot be eliminated;
(v) The interpretations of the interaction/square/cubic/polynomial terms in the reported model may have some special meanings in real practice;
(vi) and more …
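Reasons (ii) and (iii) could, for example, be supported by a comparison along the following lines. This is a minimal R sketch on simulated data with hypothetical variable names; the likelihood ratio test shown applies to nested models only, and other tests may suit your models better.

```r
# Hypothetical simulated count data; x2 has no true effect here.
set.seed(3)
n <- 300
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- rpois(n, exp(0.5 + 0.4 * dat$x1))

# Two nested candidate models.
fit_small <- glm(y ~ x1,      family = poisson, data = dat)
fit_large <- glm(y ~ x1 + x2, family = poisson, data = dat)

# Information criteria collected in one table (smaller is better).
ic <- data.frame(
  model = c("small", "large"),
  AIC   = c(AIC(fit_small), AIC(fit_large)),
  BIC   = c(BIC(fit_small), BIC(fit_large))
)

# Likelihood ratio (analysis of deviance) test of the extra term.
lrt     <- anova(fit_small, fit_large, test = "Chisq")
p_value <- lrt[["Pr(>Chi)"]][2]
```

A table built from `ic` and the test's p-value could then be reported in the main manuscript as a Word table, with the code itself kept in the appendix.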
Please also interpret and discuss the model fitting outcome for the fitted models that you select to report. One example of interpretation is to report which variables are statistically significant and to discuss whether this significance can be explained by the background of the data. You can also consider other discussions and interpretations as long as they are practically useful based on the data background.
In addition to the compulsory parts above, you may also report additional model fitting and interpretations as long as they are useful in real practice and are within the page limit.
4. LIMITATION
Please clearly discuss the possible limitations of the fitted models that you select to report, e.g., model diagnostics, problems in real practice, etc. If you cannot identify any, please give a reason why the fitted models that you select to report are perfect.
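Diagnostics that might feed into such a discussion can be computed as in the R sketch below. The data are simulated, the variable names are hypothetical, and the 4/n cut-off for Cook's distance is a common rule of thumb assumed here for illustration rather than a course requirement.

```r
# Hypothetical simulated binary data.
set.seed(4)
n <- 200
dat <- data.frame(x = rnorm(n))
dat$y <- rbinom(n, 1, plogis(-0.5 + dat$x))

fit <- glm(y ~ x, family = binomial, data = dat)

# Deviance residuals; their squares sum to the residual deviance.
dev_res <- residuals(fit, type = "deviance")

# Cook's distances flag observations with a large influence on the fit.
cooks       <- cooks.distance(fit)
influential <- which(cooks > 4 / n)  # assumed rule-of-thumb threshold

# Diagnostic plots would normally go in the appendix, for example:
# plot(fitted(fit), dev_res, xlab = "Fitted values", ylab = "Deviance residuals")
```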
5. CONCLUSION
Please give a short paragraph to summarise your findings of this project.
6. APPENDIX
Appendix (20 pages maximum)
This section should include: R code (NOT R output!). This section may include: additional important tables/figures which are mentioned in the main manuscript, etc.
7. REFERENCE
Please ensure that every reference (if you have any) cited in the text is also present in the reference list (and vice versa).
Report Rubric
Your score for the final project report will be calculated using this rubric. You will be evaluated on a scale of marks on the criteria below. The following table indicates the meaning of each of these scores:
1-2 marks: Minimal or no effort.
3-4 marks: Needs improvement.
5-6 marks: OK, but with many problems.
7-8 marks: OK, but several major problems.
9-10 marks: Only minor problems.
11-12 marks (Excellent): No problems.
1. The project is comprehensive and complete.
e.g. Have the required steps in the instruction been followed? Has every aspect of the project been thought through and explained? Is the workflow complete with a clear logic?
1-2 marks 3-4 marks 5-6 marks 7-8 marks 9-10 marks 11-12 marks
2. The report is well-written.
e.g. Is the analysis cogent? Is the report concise and precise? Is the summary of results accurate and neat? Do grammatical mistakes affect the understanding of this report? Is the report well organised? Are the transitions in the report smooth enough to show the logic of the analysis?
1-2 marks 3-4 marks 5-6 marks 7-8 marks 9-10 marks 11-12 marks
3. The analysis is correct.
e.g. Is the content statistically correct? Are technical terms used properly? Is the analysis consistent with the principles discussed in class? Is the methodology proper for the data? Is the method relevant to this course?
1-2 marks 3-4 marks 5-6 marks 7-8 marks 9-10 marks 11-12 marks
4. The interpretation is insightful.
e.g. Do you bring insight to the conclusions reached? Is the analysis accurately grounded in the background of the data? Does the report show a good interpretation of output?
1-2 marks 3-4 marks 5-6 marks 7-8 marks 9-10 marks 11-12 marks
5. Overall impression.
e.g. Does the report have contributions in real practice? Is the interpretation of statistical analysis useful in reality? Does the analysis address some scientific questions of interest? Does the report show effort?
1-2 marks 3-4 marks 5-6 marks 7-8 marks 9-10 marks 11-12 marks