Descriptive statistics for continuous features
Fundamentals of Machine Learning for Predictive Data Analytics,
Appendix A

Central tendency
Central tendency refers to the value that is typical of the sample.
Arithmetic mean (or sample mean, or mean): the sum of the values divided by the number of values

Median and mode
Median: the middle value when you order the values from lowest to highest
Mode: most commonly occurring value
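The three measures of central tendency can be sketched with Python's standard statistics module. The heights list is a hypothetical sample made up for illustration; it is not the data behind the figures.

```python
import statistics

# Hypothetical sample (an assumption, not the data from the figures).
heights = [150, 152, 152, 155, 160, 163, 174]

mean = statistics.mean(heights)      # sum of the values / number of values
median = statistics.median(heights)  # middle value after ordering lowest to highest
mode = statistics.mode(heights)      # most commonly occurring value

print(mean, median, mode)
```

With an even number of values, statistics.median returns the average of the two middle values.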

Variation
Range = max – min
• Very sensitive to outliers
Range(sample_Fig1) = 163 - 140 = 23
Range(sample_Fig3) = 192 - 102 = 90
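A minimal sketch of why the range is so sensitive to outliers: adding a single extreme value changes the result dramatically. The helper value_range and both samples are hypothetical, chosen only for illustration.

```python
def value_range(values):
    """Range = max - min."""
    return max(values) - min(values)

# Hypothetical samples (assumptions for illustration only).
sample = [10, 12, 13, 15, 16]
sample_with_outlier = sample + [60]   # one extreme value added

range_plain = value_range(sample)
range_outlier = value_range(sample_with_outlier)

print(range_plain, range_outlier)
```

Because only the two extreme values enter the calculation, the other values have no influence on the result.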

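Variance
Variance: the average squared difference between each value in the sample and the sample mean
(worked examples for the sample in Fig 1 and for the sample in Fig 3)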

Standard deviation
Standard deviation: the square root of the variance, expressed in the same units as the values
sd(sample_Fig1) = 8.08
sd(sample_Fig3) = 31.94
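The following sketch computes variance and standard deviation by hand and compares them with the standard library. It assumes the sample form of the variance (dividing by n - 1), which is what statistics.variance and statistics.stdev use; the values list is hypothetical.

```python
import math
import statistics

# Hypothetical sample (an assumption, not the data from the figures).
values = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(values)
mean = sum(values) / n

# Sum of squared differences from the mean, divided by n - 1
# (sample variance; this denominator is an assumption of the sketch).
sample_variance = sum((v - mean) ** 2 for v in values) / (n - 1)

# Standard deviation is the square root of the variance.
sample_sd = math.sqrt(sample_variance)

print(sample_variance, statistics.variance(values))
print(sample_sd, statistics.stdev(values))
```

If the denominator n is wanted instead, statistics.pvariance and statistics.pstdev provide that form.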

Percentiles
i-th percentile: a proportion of i/100 of the values in a sample are equal to or lower than the i-th percentile

1st and 3rd quartile
• Lower quartile (or 1st quartile): the median of the lower half of the data
• Upper quartile (or 3rd quartile): the median of the upper half of the data
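Percentiles and quartiles can be sketched as below. The percentile function uses the nearest-rank convention, and quartiles leaves the middle value out of both halves when the sample size is odd. Both are assumptions, since several conventions exist and libraries may interpolate instead. The function names and the data are hypothetical.

```python
import math
import statistics

def percentile(values, i):
    """Nearest-rank i-th percentile: the smallest value such that
    at least i/100 of the values are equal to or lower than it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(i / 100 * len(ordered)))
    return ordered[rank - 1]

def quartiles(values):
    """Lower and upper quartile as medians of the lower and upper halves.
    Needs at least two values; the middle value is excluded when n is odd."""
    ordered = sorted(values)
    half = len(ordered) // 2
    lower_half = ordered[:half]
    upper_half = ordered[len(ordered) - half:]
    return statistics.median(lower_half), statistics.median(upper_half)

# Hypothetical sample (an assumption for illustration).
data = [1, 2, 3, 4, 5, 6, 7, 8]

q1, q3 = quartiles(data)
p50 = percentile(data, 50)

print(q1, q3, p50)
```

Other tools may return slightly different quartiles for the same data because they interpolate between neighbouring values.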

Descriptive statistics for categorical features

Frequency count and proportion

Mode
Mode – most frequent level
Second mode – 2nd most frequent level
• Example
Mode → “guard”
2nd mode → “forward”
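For a categorical feature, frequency counts, proportions, the mode and the second mode can be sketched with collections.Counter. The positions list is hypothetical data, invented so that the result matches the example levels above.

```python
from collections import Counter

# Hypothetical categorical feature (an assumption, not the original table).
positions = ["guard", "forward", "guard", "center", "forward",
             "guard", "guard", "forward", "center", "guard"]

counts = Counter(positions)   # frequency count per level
total = len(positions)

# Proportion = count of a level / total number of values.
proportions = {level: count / total for level, count in counts.items()}

# most_common() orders levels from most to least frequent.
ranked = counts.most_common()
mode = ranked[0][0]
second_mode = ranked[1][0]

print(counts)
print(proportions)
print(mode, second_mode)
```

When two levels have the same count, most_common() keeps them in the order first encountered, so ties need separate handling if they matter.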