Maths Group Project :
Central Limit Theorem
In this report we investigate the behaviour of the Central Limit Theorem. We first describe the theorem in general, with some of its history. We then state the conditions under which the theorem holds and give some examples of where it can be used in real life. Finally, we test the theorem on several distributions using Python, including an extreme case where it should fail.
1. Descriptive Task
Description
The Central Limit Theorem (CLT) is a statistical result which states that the distribution of the sample mean of a random variable will be approximately normal if the sample size is large enough. More simply, as the size of the sample increases, the sampling distribution of the mean approaches a normal distribution, regardless of the shape of the original population distribution (provided that distribution has a finite variance).
The CLT underpins much of statistical inference, because it shows how estimates of a population behave when samples are drawn repeatedly. One of its most important consequences is that the mean of the sample means is the same as the mean of the population, while the standard deviation of the sample means equals the population standard deviation divided by the square root of the sample size. So, as the sample size gets larger, the distribution of the means from repeated samples approaches a normal distribution centred on the population mean, with a spread that shrinks as the sample size grows. (1)
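As a quick numerical check of how sample means behave, the sketch below draws repeated samples from a uniform population and compares the mean and standard deviation of the sample means with the population mean and with sigma/sqrt(n). The helper name `sample_means`, the choice of a uniform population and the seed are our own assumptions for illustration, not part of the original study.

```python
import random
import statistics

def sample_means(draw, size, num, seed=0):
    """Return `num` sample means, each computed from `size` draws of `draw(rng)`."""
    rng = random.Random(seed)
    return [statistics.fmean(draw(rng) for _ in range(size)) for _ in range(num)]

# Assumed population: uniform on [0, 1], so mu = 0.5 and sigma = sqrt(1/12).
mu, sigma = 0.5, (1 / 12) ** 0.5
size = 50
means = sample_means(lambda rng: rng.random(), size=size, num=2000)

print("mean of sample means:", round(statistics.fmean(means), 4), "vs mu =", mu)
print("sd of sample means:  ", round(statistics.stdev(means), 4),
      "vs sigma/sqrt(n) =", round(sigma / size ** 0.5, 4))
```

The spread of the sample means is compared with sigma/sqrt(n) rather than with sigma itself, since averaging reduces the variability.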
Various conditions must be met for the Central Limit Theorem to hold. Most of them are statistical restrictions, yet the first one is a purely mathematical constraint: for the theorem to work, the distribution from which the samples are drawn must have a finite mean and variance (2).
This implies that the Central Limit Theorem does not apply to a Cauchy distribution, which has the peculiarity that its mean and variance are undefined: they simply do not exist.
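The failure for the Cauchy distribution can be illustrated with a short simulation. The mean of n independent standard Cauchy variables is itself standard Cauchy, so the spread of the sample means should not shrink as n grows. The sketch below is a minimal illustration under our own assumptions (function names, seed, and the use of the interquartile range as a measure of spread, since the standard deviation does not exist).

```python
import math
import random
import statistics

def cauchy_sample_means(size, num, seed=0):
    """Means of `size` standard Cauchy draws, repeated `num` times."""
    rng = random.Random(seed)
    # Inverse-CDF method: tan(pi * (U - 0.5)) is standard Cauchy for U ~ Uniform(0, 1).
    draw = lambda: math.tan(math.pi * (rng.random() - 0.5))
    return [statistics.fmean(draw() for _ in range(size)) for _ in range(num)]

def iqr(values):
    """Interquartile range, a spread measure that exists even without a variance."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

for size in (1, 10, 100):
    print("size", size, "IQR of sample means:",
          round(iqr(cauchy_sample_means(size, num=2000)), 2))
```

The interquartile range of a standard Cauchy distribution is 2, and the printed values should stay near 2 for every sample size instead of decreasing.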
Furthermore, apart from this purely theoretical constraint, the Central Limit Theorem has some practical restrictions. The first is that the samples must be taken randomly. Indeed, data that are hand-picked or fabricated can easily produce results that appear to contradict the theorem.
Moreover, each sample must be independent of the others, otherwise bias could interfere. Also, the samples should be large enough. As a rule of thumb, a sample size between 30 and 50 is usually enough for the normal approximation to be good. However, for very skewed distributions it may not be enough. In many real-life examples, though, the distributions are not skewed enough to need more than 50 values per sample.
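To see how skewness affects the sample size needed, the following sketch measures the skewness of the sample means for an exponential population, which is strongly skewed. For independent draws, the skewness of the sample mean is the population skewness divided by sqrt(n), which is 2/sqrt(n) for the exponential distribution. The function names, the choice of population and the sample sizes are assumptions made for this illustration.

```python
import random
import statistics

def skewness(values):
    """Sample skewness: third central moment divided by the cubed standard deviation."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean((v - m) ** 3 for v in values) / sd ** 3

def exponential_sample_means(size, num, seed=0):
    """Means of `size` Exponential(1) draws, repeated `num` times."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(size))
            for _ in range(num)]

for size in (5, 30, 50, 200):
    means = exponential_sample_means(size, num=3000)
    print("size", size, "skewness:", round(skewness(means), 3),
          "theory:", round(2 / size ** 0.5, 3))
```

A normal distribution has skewness 0, so the slower this value approaches 0, the larger the sample size needed before the normal approximation is reasonable.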
Finally, if the samples are drawn without replacement, which is generally the case in real studies, the sample size should be no larger than 10% of the total population (3).
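One way to see why a small sampling fraction matters is the finite population correction. For simple random sampling without replacement, the standard deviation of the sample mean is sigma/sqrt(n) multiplied by sqrt((N - n)/(N - 1)), where N is the population size. This is a standard sampling result rather than something taken from source (3), and the function name below is our own.

```python
import math

def finite_population_correction(population_size, sample_size):
    """Factor multiplying sigma / sqrt(n) when sampling without replacement."""
    return math.sqrt((population_size - sample_size) / (population_size - 1))

# Assumed population of 1000 units, sampled at several fractions.
for fraction in (0.01, 0.05, 0.10, 0.50):
    n = int(1000 * fraction)
    factor = finite_population_correction(1000, n)
    print(f"{fraction:.0%} of 1000: factor = {factor:.3f}")
```

At a 10% sampling fraction the factor is still about 0.95, so treating the draws as independent changes the standard deviation of the mean by only around 5%.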
For the purpose of our study the last condition is irrelevant, since the samples are generated by a program rather than drawn from a finite population.
Examples
The Central Limit Theorem (CLT) is used in a variety of ways in accounting and finance, and also in the sciences. For example, in finance the CLT is useful when examining the returns of an individual stock or of broader indices. This is because the necessary financial data are easy to obtain, which keeps the analysis simple. It is quite common for investors to use the Central Limit Theorem to analyse stock returns, construct portfolios and manage risk.
For example, suppose an investor wishes to analyse the overall return of a stock index that comprises 1000 equities. In this case, the investor can study a random sample of stocks to estimate the return of the total index. To be on the safe side, the investor should use at least 30-50 randomly selected stocks across various sectors and apply the Central Limit Theorem to the sample. If the sampling is repeated, previously selected stocks should be swapped out for different names to reduce bias.
(https://www.investopedia.com/terms/c/central_limit_theorem.asp#:~:text=The%20CLT%20is%20useful%20when,construct%20portfolios%2C%20and%20manage%20risk.)
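The investor example can be sketched as follows. The returns here are synthetic numbers generated purely for illustration, the index is assumed to be equally weighted, and the function name `estimate_mean_return` is hypothetical. The interval uses the usual CLT-based approximation of the sample mean plus or minus 1.96 standard errors.

```python
import random
import statistics

rng = random.Random(42)

# SYNTHETIC data: 1000 made-up annual returns (in %), deliberately skewed.
index_returns = [rng.lognormvariate(2.0, 0.6) - 5.0 for _ in range(1000)]

def estimate_mean_return(returns, sample_size, rng):
    """Estimate the mean of `returns` from a random sample, with a 95% CLT interval."""
    sample = rng.sample(returns, sample_size)  # drawn without replacement
    mean = statistics.fmean(sample)
    std_error = statistics.stdev(sample) / sample_size ** 0.5
    return mean, (mean - 1.96 * std_error, mean + 1.96 * std_error)

mean, (low, high) = estimate_mean_return(index_returns, 40, rng)
print(f"sample estimate: {mean:.2f}% (95% interval {low:.2f}% to {high:.2f}%)")
print(f"true index mean: {statistics.fmean(index_returns):.2f}%")
```

With 40 stocks out of 1000 the sampling fraction is 4%, so the 10% condition for sampling without replacement is satisfied.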
Furthermore, in the sciences, data scientists rely on the CLT to make statistical inferences about data. The theorem gives them the ability to quantify the likelihood that a sample will deviate from the population without needing to take other samples for comparison. Many hypothesis tests and confidence intervals are based on the CLT. If the data are approximately normally distributed, about 68% of the observations will lie within one standard deviation of the mean, about 95% within two standard deviations, and so on. According to the source, four kinds of inference can be made from the CLT: about a valid sample, about the population, about a population and a valid sample, and about two different valid samples. Data scientists should therefore understand this theorem well, be able to explain it and know why it is important.
(https://www.kdnuggets.com/2016/08/central-limit-theorem-data-science.html#:~:text=The%20Central%20Limit%20Theorem%20is,sample%20to%20compare%20it%20with.)
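The 68% and 95% figures quoted above can be checked directly from the normal distribution. The short sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `coverage` is our own.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

def coverage(k):
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {coverage(k):.4f}")
```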
When the CLT fails to hold true:
As mentioned above, the main conditions required for the CLT to apply are that the distribution has finite variance and that the observations are independent and identically distributed. However, these are the conditions of the classical CLT only, also known as the Lindeberg–Lévy CLT. There exist heavy-tailed distributions, such as the Cauchy distribution and Pareto distributions with a small enough shape parameter, that do not have finite variance and sometimes have no finite mean either, which means that the classical CLT does not apply to them. However, a generalised form of the CLT states that the sum of independent, identically distributed random variables with Paretian (power-law) tails, and hence infinite variance, tends after suitable rescaling to a stable distribution, rather than a normal one, as the number of summands grows. For dependent data such as a time series, the classical CLT also fails, but an extension called the martingale central limit theorem addresses this to an extent by generalising the result to martingales, which are stochastic processes whose expected change from time t to t+1, given the past, is zero.
* Voit, Johannes (2003). "Section 5.4.3". The Statistical Mechanics of Financial Markets. Texts and Monographs in Physics. Springer-Verlag. ISBN 3-540-00978-7.
* Hall, Peter; Heyde, C. C. (1980). Martingale Limit Theory and Its Application. Academic Press. ISBN 0-12-319350-8.
2. Coding Task
(Expected results and prediction)
i) Binomial distribution
For the binomial_CLT function:
N is the number of trials.
p is the probability of a success.
size is the number of random numbers generated in each sample, of which we take the mean, e.g. the mean of size=50 random numbers.
num is the number of sample means we want to compute, e.g. the mean of 50 random numbers, repeated num=100 times.
opt_size is an optional argument which, if passed, adds a second curve to the graph corresponding to a different value of size, e.g. the mean of opt_size=500 random numbers, again repeated num times.
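The report's own implementation of binomial_CLT is not reproduced here. As a minimal sketch of what the parameters above describe, the hypothetical function `binomial_CLT_sketch` below returns the sample means instead of plotting them, and uses only the standard library. The `seed` argument and the helper `binomial_draw` are our own additions for illustration.

```python
import random
import statistics

def binomial_draw(rng, N, p):
    """One Binomial(N, p) value: the number of successes in N Bernoulli trials."""
    return sum(rng.random() < p for _ in range(N))

def binomial_CLT_sketch(N, p, size, num, opt_size=None, seed=0):
    """Return {sample size: list of `num` sample means}; plotting is left out."""
    rng = random.Random(seed)
    sizes = (size,) if opt_size is None else (size, opt_size)
    results = {}
    for s in sizes:
        results[s] = [
            statistics.fmean(binomial_draw(rng, N, p) for _ in range(s))
            for _ in range(num)
        ]
    return results

results = binomial_CLT_sketch(N=10, p=0.3, size=50, num=100, opt_size=500)
for s, means in results.items():
    print("size", s, "mean:", round(statistics.fmean(means), 3),
          "sd:", round(statistics.stdev(means), 3))
```

Each list of means could then be passed to a histogram or density plot to draw one curve per sample size. The means should cluster around N*p, with a narrower spread for the larger sample size.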
iv) Cauchy distribution
Conclusion