
4. Let X be a real-valued random variable with mean µ and standard deviation σ < ∞. The median of X is the real number m ∈ R satisfying p(X ≥ m) = p(X ≤ m) = 1/2. If g is some function on R, then arg min_x g(x) is the value x* that minimises g(x): that is, g(x*) ≤ g(x) for all x, and min_x g(x) = g(x*). The argmin may not be unique, and so we consider it to be set-valued and thus write x* ∈ arg min_x g(x).

(a) Prove that

    m ∈ arg min_{c ∈ R} E(|X − c|).

(Hint: break the expectation into two conditional expectations.) [5 points]

(b) Prove that |µ − m| ≤ σ. (Hint: the function x ↦ |x| is convex.) [5 points]

(c) The α-quantile of X is the real number q_α which satisfies¹ p(X ≤ q_α) = α. For τ ∈ (0, 1), define the pinball loss ℓ_τ : R → R via

    ℓ_τ(x) = τx        if x ≥ 0,
             (τ − 1)x  if x < 0.

Show that for any α ∈ (0, 1),

    q_α ∈ arg min_{c ∈ R} E(ℓ_α(X − c)).    (1)

[5 points]

(d) One can show that

    µ ∈ arg min_{c ∈ R} E((X − c)²)    (2)

and given (2), by substitution we have

    σ² = min_{c ∈ R} E((X − c)²).

In light of this, and of part (c), for α ∈ (0, 1), give an interpretation of

    Q_α = min_{c ∈ R} E(ℓ_α(X − c)).

Argue that, like σ², for α ∈ (0, 1), Q_α measures the deviation or variability of X. Explain why Q_α(X) = 0 only when X is constant. What advantages might Q_α have over σ² as a measure of variability? [5 points]

¹ The quantile is not necessarily unique. Observe that q_{1/2} = m.

(e) (Don't panic!) A property T of a distribution P is a real number that summarises some aspect of the distribution. One often needs to simplify from grappling with an entire distribution to a single-number summary of it. Means, variances, and quantiles are all properties, as is the entropy, since we can just as well consider the entropy of a random variable X as a property of its distribution P (think of the definition) and thus write H(P). Expressions such as (1) and (2) are examples of eliciting a property of a distribution.
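Before attempting the proofs, the claims in parts (a), (c) and (d) above can be sanity-checked numerically. The sketch below is an illustration only, not a proof; it assumes NumPy, and the helper name `pinball`, the Exp(1) sample, and the grid of candidate values c are our own choices, not part of the question. It minimises the empirical versions of E(|X − c|) and E(ℓ_α(X − c)) over the grid and compares the minimisers with the sample median and sample α-quantile.

```python
import numpy as np

rng = np.random.default_rng(0)
# A skewed sample, so the mean, median and upper quantiles all differ.
x = rng.exponential(scale=1.0, size=100_000)

def pinball(u, tau):
    """Pinball loss ell_tau: tau*u for u >= 0, (tau - 1)*u for u < 0."""
    return np.where(u >= 0, tau * u, (tau - 1) * u)

cs = np.linspace(0.0, 3.0, 601)  # grid of candidate values c

# Part (a): c -> E|X - c| is minimised (empirically) near the median.
abs_risk = np.array([np.mean(np.abs(x - c)) for c in cs])
c_abs = cs[abs_risk.argmin()]

# Part (c): c -> E[ell_alpha(X - c)] is minimised near the alpha-quantile.
alpha = 0.8
pin_risk = np.array([np.mean(pinball(x - c, alpha)) for c in cs])
c_pin = cs[pin_risk.argmin()]

# Part (d): the minimum value Q_alpha plays the role sigma^2 plays in (2),
# and it vanishes for a constant random variable (X = 2 identically here).
Q_alpha = pin_risk.min()
q_const = np.mean(pinball(np.full(10, 2.0) - 2.0, alpha))

print(c_abs, np.median(x))        # both near ln 2 for Exp(1)
print(c_pin, np.quantile(x, alpha))
print(Q_alpha, q_const)
```

For the Exp(1) population the true median is ln 2 ≈ 0.693 and the true 0.8-quantile is ln 5 ≈ 1.609, so both empirical minimisers should land close to those values on this grid.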
In general² one might have a function S (called a scoring function) such that for some desired property T of a distribution P one has

    T(P) ∈ arg min_{c ∈ R} E_{Y∼P} S(c, Y),    (3)

where Y ∼ P means that Y is a random variable with distribution P. It turns out³ that not all properties T can be elicited; that is, for some T there is no S such that T(P) can be written in the form (3). For example, the variance can not be elicited. A necessary (and sufficient) condition for a property T to be elicitable is that for arbitrary P0, P1 such that T(P0) = T(P1), we have, for all α ∈ (0, 1),

    T((1 − α)P0 + αP1) = T(P0).

The distribution (1 − α)P0 + αP1 is called a mixture of P0 and P1.

Construct an example P0, P1 such that H(P0) = H(P1) but, for some α ∈ (0, 1), the entropy of the mixture distribution satisfies

    H((1 − α)P0 + αP1) ≠ H(P0),

to prove (using the above fact) that one can not elicit entropy; that is, there can be no equation of the form (3) which yields the entropy H(P). (Hint: easier than it might look. Start simple!) [5 points]

² In (1) we have S(c, x) = |x − c| and in (2) we have S(c, x) = (x − c)².
³ This is a non-trivial result, which requires several additional technicalities to state precisely. It is proved in [Ingo Steinwart, Chloé Pasin, Robert C. Williamson, and Siyu Zhang, "Elicitation and identification of properties." In JMLR: Workshop and Conference Proceedings, vol. 35 (Conference on Learning Theory), pp. 1–45, 2014].
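A candidate pair for part (e) can be probed numerically before writing the argument down. The sketch below (a hypothetical illustration, assuming NumPy; the particular Bernoulli pair is one possible choice, not the required answer) evaluates the Shannon entropy of two distributions with equal entropy and of their equal-weight mixture, to see whether the mixture condition above holds.

```python
import numpy as np

def H(p):
    """Shannon entropy (in nats) of a discrete distribution p."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]          # convention: 0 * log 0 = 0
    return -np.sum(p * np.log(p))

# Two candidate two-point distributions with the same entropy
# (same probability values, just permuted).
P0 = np.array([0.1, 0.9])
P1 = np.array([0.9, 0.1])

alpha = 0.5
mix = (1 - alpha) * P0 + alpha * P1   # the uniform distribution (0.5, 0.5)

print(H(P0), H(P1), H(mix))
```

Entropy is invariant under permuting the outcome probabilities, so H(P0) = H(P1) here, while the mixture is uniform and has entropy ln 2, strictly larger; such a pair violates the mixture condition.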