程序代写代做代考 Excel Andrew Falcone

Andrew Falcone

EM-623 A

Homework #1

I pledge my honor that I have abided by the Stevens Honor System.

Andrew Falcone 10/3/2016

Homework #1: Analysis of Churn.txt

1.) Explore whether there are missing values for any of the variables.

 Excel:

o In order to check for missing values in the document, the following measures

were taken:

1) Utilizing an “IF” function to check for blank cells

 A copy of each column of data was created prior to employing the

“IF” function


 The “IF” function states that if the original cell is blank, it will

return a value of “No Value” in the new cell, otherwise it will

return the original value

 A search of the file was then completed in order to find any cells

which contain “No Value”


 From the search completed above, one can conclude that there is

no missing values in the document

 Rattle

o There are two methods that can be utilized in Rattle in order to determine if there

are any missing values, both of which will be illustrated.

o First, the dataset “churn.txt” must be imported into Rattle—this is completed by

going to the “Data” tab and browsing within the “Filename” section until the data

is found

1) Note that the separator for this data set is a comma, so a comma is inserted

into the Separator box

o Next, the Explore tab is chosen, and a report can be ran: in this case, the options

“Summary”, “Describe”, and “Show Missing” are utilized

By doing so, both the “Description” and “Show Missing” options will show how many

values are missing per column of data

From the “Description” method, it can be seen that there are 0 missing numbers

o
Similarly, the “Show Missing” function outputs a value of 0 missing values (in [2,] row)

for each variable

2.) Compare the area code and state fields. Discuss any apparent abnormalities.

 Excel

o In order to properly compare the two fields, new columns of data were first

created that isolated them

o Next, the State column was sorted A through Z in order to observe the data on a

state-to-state basis

o After analyzing the data, it seems as if every state within the file contains a

random combination of the same three Area Codes, namely 408, 415, and 510

(snippets of Alaska and Delaware can be seen above). This seems like a clear

discrepancy, as every state usually boasts their own unique set of area codes, and

not instead share area codes throughout the entire country.

o After doing further research on actual Area Codes by state (source:

http://www.worldatlas.com/na/us/area-codes.html) it was discovered that the three

apparent area codes (408,415, and 510) are all Area Codes specific to one state—

California

o Therefore, the abnormality is confirmed, and the Area Codes are incorrect for

every state besides California

 Rattle

o Another “Describe” exploration was executed, and the State and Area Code

variables were focused on:

o By looking at the results for the Area Code variable, it can be seen that there are

only 3 unique values, when in reality each state should have at least one unique

Area Code. Thus, the unique value should be greater than or equal to 51, which is

the unique value for the State variable—this is the abnormality

o As discovered before, all three unique values within the Area Code variable (408,

415, 510) are all Area Codes for California; thus, the abnormality is that the other

states have incorrect inputs for Area Code

http://www.worldatlas.com/na/us/area-codes.html

3.) Use a graph to determine visually whether there are any outliers among the number of calls to

customer service.

 Excel

o The first step to building a histogram is to determine the range of the Customer

Service calls data set; this was completed by calculating the Min and Max values

of the data set by utilizing the MIN() and MAX() functions

o Next, the “Histogram” function within the Data Analysis plug-in was utilized in

order to construct the graph

In the above snippet, the Input Range is the entire data set for Customer Service

calls, and the Bin Range is the range of the data set (0-9). After selecting the

“Chart Output” option, the graph was generated, as well as the associated chart

o From the graph above, it can be assumed that the sub-datasets represented by 7, 8,

and 9 Calls to Customer Service are definitely outliers

 Rattle

o Under the Describe exploration, the following information was obtained for the

Customer Service Calls variable:

o From this information, it seems like there are a few values which can be

considered outliers; however a graphical representation is better

o To generate graphs for this information, the first step was to transform the

Customer Service Calls variable from numerical to categorical—this is completed

by going to the “Transform” tab, then selecting “Recode” and “As Categoric”,

and finally choose the variable to complete the transformation

200

400

600

800

1000

1200

1400

F
re

q
u

e
n

Number of Calls Made to Customer Servcie

Calls to Customer Service

Frequency

o Next, the graph is generated by returning to the “Explore” tab and selecting

“Distributions”, then going down to the Categoric section and selecting “Bar Plot”

next to the transformed Customer Service Call variable

o
o From the above histogram, it can be concluded that 7, 8, and 9 Customer Service

Calls are almost certainly outliers

4.) Normalize the night minutes attribute using either Z-score or Min-Max transformation

 Excel

o For this scenario, Min-Max will be utilized in order to normalize the data;

meaning that the following formula will be employed:

o First, the Min and Max of the dataset was generated using the MIN() and MAX()

functions as seen below:

o Next, the Range was calculated by subtracting the Max value by the Min value

o Finally, the dataset was normalized by using the Min-Max formula as seen above:

In this case, “BG3” is the original data point, “BJ2” is the Min value, and “BJ4” is

the range. The dollar signs ($) are utilized for the Min and Range values in order

to keep these values constant while carrying out the normalization for the entire

dataset. A snippet of the normalized data can be seen below:

Note: the Min Value (33) will have a value of 0, and Max (175) will be 1.

 Rattle

o Normalizing data is relatively simple in Rattle, as it can be done by selecting

“Rescale” under the Transform tab, and then selecting “Scale [0-1]” before

clicking on the Night Minutes variable and clicking Execute

o Once executed, Rattle will generate a new variable, which is the normalized Night

Minutes variable

o Below are the original values, followed by the normalized dataset

Related Posts