Third homework

The datasets for the third homework can be found in xlsx (Excel) format on Moodle under Week 11, together with a short summary; the file is named Homework3_data. Everyone should carry out the analysis on the dataset (each dataset is on a different sheet of the Excel file) that is listed next to their Neptun code in the table below.

You can get at most 11 points for the third homework. Upload to Moodle a briefly commented, transparent R script and a textual analysis of at least 3000 characters (Times New Roman, 12 pt, 1.5 line spacing) in Word or PDF format. In the analysis, summarize the applied statistical methods, interpret the results and draw logical conclusions. The Word/PDF file should have a title page with your name, and it must also state the name of the dataset.

Homework submissions that are very similar to each other will receive 0 points.

Deadline: 2021.12.13. Monday 11:55 pm, Moodle

Tasks:

1. Import the data into an R data frame. Determine the type of each variable and transform the nominal and logical variables. Handle possible errors in the data (for example, by excluding strange values). Pay attention to factor variables.

2. Build a binary logistic regression model with all possible explanatory variables (the dependent variable is highlighted in yellow in the Excel file). Interpret the significant beta coefficients.

3. Carry out model selection based on significance levels and information criteria, if it is reasonable.

4. Analyze the classification matrix of the best model with a 50% cut-off. Interpret all 5 related measures of explanatory power (accuracy, and the recall and precision of each category).

5. Select the positive category of the dependent variable, create a ROC curve, and evaluate the model based on the area under the ROC curve (AUC).

6. Change the cut-off value and justify the newly chosen cut-off using one of the methods learnt in the course. Present how the classification matrix and the related measures of explanatory power change as a result of the new cut-off.

7. Evaluate the best model based on McFadden’s pseudo R^2 and the global Chi^2 test (independence test).
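The measures in tasks 4 to 7 are all computed from the actual 0/1 outcomes and the predicted probabilities of the fitted model. The homework itself must be done in R; the following is only a sketch in plain Python of the arithmetic behind these measures, assuming the chosen positive category is coded as 1 and the predicted probabilities are already available. The function names are hypothetical, an observation is assumed to be classified as positive when its probability is at least the cut-off, and Youden's J is used as one example of a cut-off selection method (an assumption, not necessarily the method taught in the course).

```python
import math


def classification_measures(y_true, p_hat, cutoff=0.5):
    """Classification matrix and the 5 measures for a given cut-off.

    y_true: actual outcomes coded 0/1 (1 = chosen positive category)
    p_hat:  predicted probabilities P(y = 1) from the fitted model
    """
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, p_hat):
        predicted = 1 if p >= cutoff else 0  # assumption: ">=" at the cut-off
        if predicted == 1:
            if y == 1:
                tp += 1
            else:
                fp += 1
        elif y == 0:
            tn += 1
        else:
            fn += 1

    def ratio(a, b):
        # Undefined measures (empty row or column) are reported as NaN
        return a / b if b else float("nan")

    return {
        "matrix": {"TP": tp, "FP": fp, "TN": tn, "FN": fn},
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "recall_1": ratio(tp, tp + fn),     # sensitivity
        "recall_0": ratio(tn, tn + fp),     # specificity
        "precision_1": ratio(tp, tp + fp),
        "precision_0": ratio(tn, tn + fn),
    }


def auc(y_true, p_hat):
    """Area under the ROC curve: the probability that a randomly chosen
    positive case gets a higher predicted probability than a randomly
    chosen negative case (ties count as one half)."""
    pos = [p for y, p in zip(y_true, p_hat) if y == 1]
    neg = [p for y, p in zip(y_true, p_hat) if y == 0]
    wins = 0.0
    for a in pos:
        for b in neg:
            wins += 1.0 if a > b else 0.5 if a == b else 0.0
    return wins / (len(pos) * len(neg))


def youden_cutoff(y_true, p_hat):
    """Cut-off maximizing Youden's J = recall_1 + recall_0 - 1."""
    best_cut, best_j = 0.5, -1.0
    for cut in sorted(set(p_hat)):
        m = classification_measures(y_true, p_hat, cut)
        j = m["recall_1"] + m["recall_0"] - 1
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut, best_j


def chi2_sf(stat, df):
    """Upper-tail p-value of the chi-squared distribution
    (series expansion of the regularized lower incomplete gamma)."""
    s, x = df / 2.0, stat / 2.0
    if x <= 0:
        return 1.0
    term = total = 1.0 / s
    n = 1
    while term > total * 1e-15:
        term *= x / (s + n)
        total += term
        n += 1
    lower = math.exp(s * math.log(x) - x - math.lgamma(s)) * total
    return max(0.0, 1.0 - lower)


def mcfadden_and_lr_test(y_true, p_hat, n_slopes):
    """McFadden's pseudo R^2 and the global (likelihood-ratio) Chi^2 test.

    n_slopes: number of estimated coefficients besides the intercept
    (each dummy of a factor counts separately); this is the test's df.
    """
    def log_lik(probs):
        return sum(math.log(p) if y == 1 else math.log(1 - p)
                   for y, p in zip(y_true, probs))

    p_null = sum(y_true) / len(y_true)  # intercept-only (null) model
    ll_null = log_lik([p_null] * len(y_true))
    ll_model = log_lik(p_hat)
    r2 = 1 - ll_model / ll_null
    lr_stat = 2 * (ll_model - ll_null)
    return r2, lr_stat, chi2_sf(lr_stat, n_slopes)
```

The AUC is computed here through its pairwise (Mann-Whitney) interpretation rather than by integrating the plotted ROC curve; both give the same value. The global Chi^2 test compares the model with the intercept-only model, so a small p-value means the explanatory variables jointly improve the fit.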

An adequate R script (code) is worth 5 points, while the analysis is worth at most 6 points. You have to interpret the parameters and statistical measures, but the analysis should go further than that: you have to draw conclusions from the results. For instance: Why should some explanatory variables be excluded from the model? Is there anything surprising in the signs of the coefficients? Evaluate the classification matrix properly! What is the reasoning behind the chosen cut-off? Etc.
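Questions about the signs of the coefficients and about excluding variables rest on two simple calculations: turning a logit coefficient into an odds ratio, and comparing candidate models with information criteria (task 3). The sketch below shows only that arithmetic in plain Python, assuming the log-likelihoods have already been obtained from the fitted models (in R, for example, with `logLik()`, or directly with `AIC()` and `BIC()`). The function names and all numbers are hypothetical.

```python
import math


def odds_ratio(beta):
    """exp(beta): the factor by which the odds of y = 1 are multiplied
    when the explanatory variable grows by one unit (others held fixed)."""
    return math.exp(beta)


def information_criteria(log_lik, n_params, n_obs):
    """AIC and BIC of a fitted model; lower values indicate a better
    trade-off between fit and complexity.

    n_params counts every estimated coefficient, intercept included.
    """
    aic = -2 * log_lik + 2 * n_params
    bic = -2 * log_lik + n_params * math.log(n_obs)
    return aic, bic


# Hypothetical numbers, for illustration only: a full model with 6
# coefficients vs. a reduced model with 4, both fitted on 200 observations.
full = information_criteria(-120.4, 6, 200)
reduced = information_criteria(-121.1, 4, 200)
print("full:    AIC=%.1f BIC=%.1f" % full)
print("reduced: AIC=%.1f BIC=%.1f" % reduced)
print("odds ratio for beta = -0.35: %.3f" % odds_ratio(-0.35))
```

With these made-up values both criteria favour the reduced model: dropping two coefficients lowers the log-likelihood by only 0.7, which is less than the complexity penalty saved. A negative coefficient such as -0.35 gives an odds ratio of about 0.70, i.e. the odds of y = 1 fall by roughly 30% per unit increase of that variable.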

If you have any questions, do not hesitate to e-mail me (zsombor.szadoczki@uni-corvinus.hu) or message me on Teams. Good luck! 🙂

Neptun code   Dataset
bhxc4h        Women
c49ja2        MBA
d341y8        MBA
dp7pwa        Default
dszh0c        MBA
eeghro        Employees
fiiz02        Women
foyx5z        MBA
gyoyqp        Employees
he1mfo        MBA
hgv9nd        Women
i7xfpl        Employees
icw7i0        Default
iinq0p        MBA
jfvcsl        MBA
jhxyhn        Employees
jmv2fp        Women
kb9ovg        Women
lukyuy        Default
ndkj74        Women
ospyt6        Employees
pewimw        Employees
rkmzti        Default
sen7m8        Women
um8chw        Default
urghwh        MBA
ve3tdh        Women
vg8b92        Women
w9g4w6        MBA
wu6brx        Employees
xfa90n        Women
