Stats
Statistics Assignment 1 HET551 – Design and Development Project 1 Michael Allwright Haddon O’Neill Tuesday, 24 May 2011
1 Normal Approximation to the Binomial Distribution
This section of the assignment shows how a normal curve can be used to approximate the binomial distribution. This section of the assignment was completed using a MATLAB function (shown in Listings 1) which would generate and save plots of the various Binomial Distributions after normalisation, and then calculate the errors between the standard normal curve and the binomial distribution. The plots in Figures 1 and 2 show the binomial distribution for various n trials with probability p = 1
3 and p = 1
2 respectively. These binomial plots have been normalised so that they can be compared with the standard normal distribution. From these plots it can be seen that once the binomial distribution has been normalised, the normal approximation is a good approach to estimating the binomial distribution. To determine its accuracy, the data in Table 1 shows the evaluation of qn = P(bn ≥ µn + 2σn) for both the normal curve and binomial distribution.
qn = P(bn ≥ µn + 2σn) Calculation Error
n N(0, 1) B(n, 1 2 ) B(n, 1
3 ) B(n, 1
2 ) B(n, 1
3 )
1 0.0228 0.0000 0.0000 -0.02278 -0.02278 2 0.0228 0.0000 0.0000 -0.02278 -0.02278 3 0.0228 0.0000 0.0370 -0.02278 0.01426 4 0.0228 0.0000 0.0123 -0.02278 -0.01043 5 0.0228 0.0313 0.0453 0.00847 0.02249 10 0.0228 0.0107 0.0197 -0.01203 -0.00312 20 0.0228 0.0207 0.0376 -0.00208 0.01486 30 0.0228 0.0214 0.0188 -0.00139 -0.00398 40 0.0228 0.0192 0.0214 -0.00354 -0.00134 50 0.0228 0.0164 0.0222 -0.00636 -0.00059
100 0.0228 0.0176 0.0276 -0.00518 0.00479
Table 1: Calculating the error of the normal approximation to the binomial for various n and p
2 Analytical investigation of the Exponential Distribution
For this part of the assignment the density function shown in Equation 1 was given.
f(x) = λe−λx for x ≥ 0 and λ ≥ 0 (1)
Before any calculations were attempted, the area under graph was checked to show that ´∞ −∞f(x) dx = 1. That is
that the total probability of all possible values was 1.
2.1 Derivation of CDF
To find the CDF of the given function, the function was integrated with 0 and x being the lower and upper bound respectively. This derivation is shown in Equations 2 to 4.
CDF =
ˆ x o f(x) dx =
ˆ x o λe−λx dx (2)
2
−5 −4 −3 −2 −1 0 1 2 3 4 5 0
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
Number of Successes (shifted left by u = 0.33)
P ro
b a
b ili
ty
(a) n = 1
−5 −4 −3 −2 −1 0 1 2 3 4 5 0
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
Number of Successes (shifted left by u = 0.67)
P ro
b a
b ili
ty
(b) n = 2
−5 −4 −3 −2 −1 0 1 2 3 4 5 0
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
Number of Successes (shifted left by u = 1.00)
P ro
b a
b ili
ty
(c) n = 3
−5 −4 −3 −2 −1 0 1 2 3 4 5 0
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
Number of Successes (shifted left by u = 1.33)
P ro
b a
b ili
ty
(d) n = 4
−5 −4 −3 −2 −1 0 1 2 3 4 5 0
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
Number of Successes (shifted left by u = 1.67)
P ro
b a
b ili
ty
(e) n = 5
−5 −4 −3 −2 −1 0 1 2 3 4 5 0
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
Number of Successes (shifted left by u = 3.33)
P ro
b a
b ili
ty
(f) n = 10
−5 −4 −3 −2 −1 0 1 2 3 4 5 0
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
Number of Successes (shifted left by u = 6.67)
P ro
b a
b ili
ty
(g) n = 20
−5 −4 −3 −2 −1 0 1 2 3 4 5 0
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
Number of Successes (shifted left by u = 10.00)
P ro
b a
b ili
ty
(h) n = 30
−5 −4 −3 −2 −1 0 1 2 3 4 5 0
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
Number of Successes (shifted left by u = 13.33)
P ro
b a
b ili
ty
(i) n = 40
−5 −4 −3 −2 −1 0 1 2 3 4 5 0
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
Number of Successes (shifted left by u = 16.67)
P ro
b a
b ili
ty
(j) n = 50
−5 −4 −3 −2 −1 0 1 2 3 4 5 0
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
Number of Successes (shifted left by u = 33.33)
P ro
b a
b ili
ty
(k) n = 100
Figure 1: Binomial Distribution with p = 1 3
3
−5 −4 −3 −2 −1 0 1 2 3 4 5 0
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
Number of Successes (shifted left by u = 0.50)
P ro
b a
b ili
ty
(a) n = 1
−5 −4 −3 −2 −1 0 1 2 3 4 5 0
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
Number of Successes (shifted left by u = 1.00)
P ro
b a
b ili
ty
(b) n = 2
−5 −4 −3 −2 −1 0 1 2 3 4 5 0
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
Number of Successes (shifted left by u = 1.50)
P ro
b a
b ili
ty
(c) n = 3
−5 −4 −3 −2 −1 0 1 2 3 4 5 0
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
Number of Successes (shifted left by u = 2.00)
P ro
b a
b ili
ty
(d) n = 4
−5 −4 −3 −2 −1 0 1 2 3 4 5 0
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
Number of Successes (shifted left by u = 2.50)
P ro
b a
b ili
ty
(e) n = 5
−5 −4 −3 −2 −1 0 1 2 3 4 5 0
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
Number of Successes (shifted left by u = 5.00)
P ro
b a
b ili
ty
(f) n = 10
−5 −4 −3 −2 −1 0 1 2 3 4 5 0
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
Number of Successes (shifted left by u = 10.00)
P ro
b a
b ili
ty
(g) n = 20
−5 −4 −3 −2 −1 0 1 2 3 4 5 0
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
Number of Successes (shifted left by u = 15.00)
P ro
b a
b ili
ty
(h) n = 30
−5 −4 −3 −2 −1 0 1 2 3 4 5 0
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
Number of Successes (shifted left by u = 20.00)
P ro
b a
b ili
ty
(i) n = 40
−5 −4 −3 −2 −1 0 1 2 3 4 5 0
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
Number of Successes (shifted left by u = 25.00)
P ro
b a
b ili
ty
(j) n = 50
−5 −4 −3 −2 −1 0 1 2 3 4 5 0
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
Number of Successes (shifted left by u = 50.00)
P ro
b a
b ili
ty
(k) n = 100
Figure 2: Binomial Distribution with p = 1 2
4
ˆ x o λe−λx dx =
[ −e−λxdx
]x 0
(3)
ˆ x o λe−λx dx = 1 −e−λx (4)
2.2 Mean Calculation
E[X] =
ˆ ∞ −∞
xf(x) dx
E[X] = λ
ˆ ∞ 0
xe−λx dx
To evaluate the integral on the right hand side, integration by parts must be applied as follows
ˆ f(x)g′(x)dx = f(x)g(x) −
ˆ f ′(x)g(x)dx
f(x) = x f ′(x) = 1
g′(x) = e−λx g(x) = −1 λ e−λx
E[X] = λ
ˆ ∞ 0
xe−λx dx =
[ −x λ e−λx
]∞ 0
+ 1
λ
ˆ ∞ 0
e−λxdx
E[X] =
[ −x λ e−λx
]∞ 0
+
[ −1 λ2 e−λx
]∞ 0
E[X] = λ
[( − x
λ −
1
λ2
) e−λx
]∞ 0
(5)
E[X] = λ
( lim x→∞
[( − x
λ −
1
λ2
) e−λx
] − lim x→0
[( − x
λ −
1
λ2
) e−λx
]) as x → ∞ the e−λx of first part decays faster than x
λ grows. This allows the first term of the expression to be
considered zero.
E[X] = (−λ)(− 1
λ2 ) =
1
λ (6)
5
2.3 Variance Calculation
Var(X) =
ˆ (x−E[x])2 f(x) dx (7)
Substituting Equation 1 and 6 into the expression in Equation 7 yields and can be expanded to the following.
Var(X) =
ˆ ∞ 0
(x2 − 2x
λ +
1
λ2 ) λe−λx dx
(Note: to simplify working, the upper and lower bounds have been temporarily omitted)
Var(X) = λ
ˆ x2e−λx dx− 2
ˆ xe−λx dx +
1
λ
ˆ e−λx dx (8)
To solve this expression, integration by parts was utilised to simplify the first term ˆ x2e−λx =
ˆ f(x)g′(x)dx = f(x)g(x) −
ˆ f ′(x)g(x)dx
where:
f(x) = x2 f ′(x) = 2x
g′(x) = e−λx g(x) = −1 λ e−λx
ˆ x2e−λx =
−x2
λ e−λx +
2
λ
ˆ xe−λxdx + C1 (9)
Substituting Equation 9 into the expression in Equation 8.
Var(X) = λ
( −x2
λ e−λx +
2
λ
ˆ xe−λxdx
) − 2 ˆ xe−λx dx +
1
λ
ˆ e−λx dx
Var(X) = −x2e−λx + 2 ˆ xe−λxdx− 2
ˆ xe−λx dx +
1
λ
ˆ e−λx dx
Var(X) = −x2e−λx + 1
λ
ˆ e−λx dx
Var(X) = −x2e−λx − 1
λ2 e−λx + C2
Removing the constant of integration and applying the upper and lower limits of the integral allows this expression to be factorised.
Var(X) =
[ −e−λx
( x2 +
1
λ2
)]∞ 0
Finally evaluating the integral yields:
Var(X) = lim x→∞
[ −e−λx
( x2 +
1
λ2
)] − lim x→0
[ −e−λx
( x2 +
1
λ2
)] Var(X) =
1
λ2
6
2.4 Kth Moment Expression
Starting with the expression Mk = E[Xk] = ´∞
0 xk e−λx dx, integration by parts is used to develop an expression in
terms of Mk−1, where Mk−1 = ´∞
0 xk−1 e−λx dx.
Mk =
ˆ ∞ 0
xk f(x) dx
ˆ f(x)g′(x)dx = f(x)g(x) −
ˆ f ′(x)g(x)dx
f(x) = xk f ′(x) = kxk−1
g′(x) = e−λx g(x) = −1 λ e−λx
Mk =
[ − xk
λ e−λx
]∞ 0
+ k
λ
ˆ ∞ 0
xk−1e−λxdx
Evaluating the limits of the first term and substituting second term for Mk−1 then yields:
Mk = lim x→∞
[ − xk
λ e−λx
] − lim x→0
[ − xk
λ e−λx
] + k
λ Mk−1
Mk = k
λ Mk−1
Hence the recursive expression for the Kth moment of the exponential distribution.
3 Properties of Variance Estimators
3.1 σ2 = Var(Xi) Calculation
For a uniform random variable in the range [0,1], the distribution is represented by:
F(x) =
0
x
1
x < 0
0 < x < 1
1 < x
From this, f(x) can be found given that f(x) = d dx F(x):
f(x) =
0
1
0
x < 0
0 < x < 1
1 < x
or, for simplification
f(x) = 1, 0 < x < 1 otherwise 0
7
0 1 2 3 4 5 6 7 8 9 10
x 10 4
0
0.02
0.04
0.06
0.08
0.1
0.12
0.14
0.16
Sample Size
S a
m p
le V
a ri a
n ce
Figure 3: Variance across 100,000 samples
The expectation of X is defined as:
E[X] =
ˆ xf(x)dx
Substituting in the derived density function:
E[X] =
1ˆ
0
xdx =
[ 1
2 x2 ]1
0
= 1
2
Using this result, the Variance can now be defined such that:
V ar(Xi) =
ˆ 1 0
(x− 1
2 )2dx =
ˆ 1 0 x2 −x−
1
4 dx =
[ 1
3 x3 −
1
2 x2 +
1
4 x
]1 0
= 1
3 −
1
2 +
1
4
V ar(Xi) = 1
12 = 0.083̄3
3.2 Estimation of σ2 using Sample Variance
As stated before, the sample variance is defined as:
V ar(X) =
ˆ (x−E[X])2f(x)dx
To simplify the simulation, the MATLAB function Var(X) to calculate the sample variance. The MATLAB code estimating σ2 by generating a plot of variance of a independent uniform random variables against the number of samples is in Listing 2 MATLAB calculated the mean as µ = 0.0833. Comparing this to what the generated plot shows from the 105 samples that were generated, it is fair assumption for it to be approximately around this figure.
8
0 0.05 0.1 0.15 0.2 0.25 0
500
1000
1500
2000
2500
Sample Variance
N u
m b
e r
o f
S a
m p
le s
Figure 4: Distribution of Sample Variance over 100000 sequences where n = 5
3.3 Deriving the Mean of the distribution of the Sample Variance from simulation.
We have been asked to plot the distribution of the sample variance of a number of sequences of n = 5. To do this, an array of sequences of length 5 were generated and the variance of each sequence was calculated. The MATLAB code to first create the data sets and then plot the distribution is in Listing 3. The result of this simulation is shown in Figure 4. The mean of SamVarSet was calculated out:
σ2 = 0.0833 ≈ 1
12
σ2 ≈ V ar(Xi)
Using the generated Histogram of the distribution, the mean is observed to be approximately the calculated value.
3.4 Estimator Bias Considerations
For this section we are asked to prove that S̃2 = ∑n i=1(Xi−
1 2
)2
n is an unbiased estimator of the variance, which is
defined asV ar(X) = ´
(x−E[X])2f(x)dx. To prove that it is unbiased, we must compare it against both the calculated variance and the sample variance and ensure that the results match.
To do this, we used the result from Part 3.3 using S̃2 = ∑n i=1(Xi−
1 2
)2
n instead of V ar(x) in the MATLAB code in
Listing 3. The same number of samples were generated. This MATLAB code to generate the sample sequences and plot the distribution is shown in Listing 4. The mean of SamVarSetEstimator was calculated out:
S̃2 = 0.0836 ≈ 1
12
S̃2 ≈ V ar(Xi) ≈ σ2
From Figure 5 we can say that S̃2is an unbiased estimator of the variance, as proven by it’s close approximation to to calculated Variance found in Part 3.1 and the sample variance generated by MATLAB, Parts 3.2 and 3.3.
9
0 0.05 0.1 0.15 0.2 0.25 0
500
1000
1500
2000
2500
Sample Variance
N u
m b
e r
o f
S a
m p
le s
Figure 5: Distribution of the Estimated Variance over 100000 sequences of length 5
4 The Gamma Distribution
4.1 Determining the Normalising constant
It is required to find the normalizing constant C contained withing the following function:
f(x) = Cxα−1e−λx, x ≥ 0, C > 0
From research, the standard notation of the probability density function for gamma random variables is as follows:
f(x) = λα
Γ(α) xα−1e−λx, x > 0
Following Γ(α) =
∞̂
0
xa−1e−xdx
∴ C = λα
Γ(α)
4.2 Visualisation of the Gamma distribution with respect to α and λ
To generate plots of gamma densities for differing values of α and λ, the MATLAB code in Listing 5 was generated to show how varying these parameters affects the distribution. These variations are shown in Figure 6. Figure 6 shows the Gamma distributions generated by varying α with respect to x and the probability density at x, for 6 distinct values of λ.
4.3 Relationship between the Gamma Distribution and the Exponential Distribution
The Exponential Distribution is a special case of the Gamma Distribution that occurs when α = 1. This is shown by the following:
f(x) = λα
Γ(α) xα−1e−λx, x > 0
10
0
1
2
3
4
5
6
0 1
2 3
4 5
6
0
0.02
0.04
0.06
0.08
0.1
0.12
0.14
0.16
0.18
x
Gamma density for Lambda = 0.100
alpha
(a) h = 0.100
0
1
2
3
4
5
6
0 1
2 3
4 5
6
0
0.02
0.04
0.06
0.08
0.1
0.12
0.14
0.16
0.18
0.2
x
Gamma density for Lambda = 0.167
alpha
(b) h = 0.167
0
1
2
3
4
5
6
0 1
2 3
4 5
6
0
0.05
0.1
0.15
0.2
0.25
x
Gamma density for Lambda = 0.233
alpha
(c) h = 0.233
0
1
2
3
4
5
6
0 1
2 3
4 5
6
0
0.05
0.1
0.15
0.2
0.25
x
Gamma density for Lambda = 0.300
alpha
(d) h = 0.300
0
1
2
3
4
5
6
0 1
2 3
4 5
6
0
0.05
0.1
0.15
0.2
0.25
0.3
0.35
x
Gamma density for Lambda = 0.367
alpha
(e) h = 0.367
0
1
2
3
4
5
6
0 1
2 3
4 5
6
0
0.05
0.1
0.15
0.2
0.25
0.3
0.35
x
Gamma density for Lambda = 0.433
alpha
(f) h = 0.433
Figure 6: Gamma Distributions with respect to parameters α and λ
11
where Γ(α) =
∞̂
0
xa−1e−xdx
when α = 1
f(x) = λ1
Γ(1) x1−1e−λx, x > 0
Γ(1) =
∞̂
0
x1−1e−xdx =
∞̂
0
e−xdx = [−xe−x]∞0 = 1
f(x) = λe−λx, x > 0
Hence when α = 1, the expression for the Gamma Distribution simplifies to f(x) = λe−λx, x > 0 which is the Exponential Distribution.
4.4 Derivation of the Mean and Variance
4.4.1 Mean
The following shows the calculation of E[X] for the Gamma Distribution in Equation 10 is derived:
f(x) = λα
Γ(α) xα−1e−λx (10)
E[X] =
ˆ ∞ 0
x λα
Γ(α) xα−1e−λxdx
E[X] = λα
Γ(α)
ˆ ∞ 0
xα−1e−λxdx
E[X] = λα
Γ(α)
ˆ ∞ 0
λα+1
Γ(α + 1) xαe−λxdx
Γ(α + 1)
λα+1
E[X] = λα
Γ(α)
ˆ ∞ 0
λα+1
Γ(α + 1) xαe−λxdx
Γ(α + 1)
λα+1
E[X] = λα
Γ(α)
Γ(α + 1)
λα+1
ˆ ∞ 0
λα+1
Γ(α + 1) xαe−λxdx
Where ´
λα+1
Γ(α+1) xαe−λxdx =
´ λα+1
Γ(α+1) x(α+1)−1e−λxdx is the definition of the PDF:
Γ(α + 1,λ) = 1
E[X] = λα
Γ(α)
Γ(α + 1)
λα+1
12
E[X] = Γ(α + 1)
λΓ(α)
Since Γ(α + 1) = αΓ(α)
E[X] = αΓ(α)
λΓ(α)
E[X] = α
λ
4.4.2 Variance
V ar(X) =
ˆ ∞ 0
x2f(x)dx− (E[X])2
V ar(X) =
ˆ ∞ 0
x2 λα
Γ(α) xα−1e−λxdx− (E[X])2
V ar(X) = λα
Γ(α)
ˆ ∞ 0
xα−1e−λxdx− (E[X])2
V ar(X) = λα
Γ(α)
ˆ ∞ 0
λα+2
Γ(α + 2) x((α+2)−1)e−λxdxλα+2 − (E[X])2
Where ´
λα+1
Γ(α+1) xαe−λxdx =
´ λα+1
Γ(α+1) x(α+1)−1e−λxdx is the definition of the PDF:
Γ(α + 1,λ) = 1
V ar(X) = λα
Γ(α)
Γ(α + 2)
λα+2 − (E[X])2
V ar(X) = λα
Γ(α)
Γ(α + 2)
λα+2 − α2
λ2 (11)
Γ(α) = (α− 1)Γ(α− 1) (12)
Γ(α + 2) = ((α + 2) − 1)Γ((α + 2) − 1)
Γ(α + 2) = (α + 1)Γ(α + 1) (13)
Γ(α + 1) = αΓ(α) (14)
Substituting Equation 14 into Equation 13
Γ(α + 2) = (α + 1)αΓ(α)
13
Γ(α + 2) = (α2 + α)Γ(α) (15)
Substituting Equation 15 into Equation 11
V ar(X) = (α2 + α)Γ(α)
λ2Γ(α) − α2
λ2
V ar(X) = α2 + α−α2
λ2
V ar(X) = α
λ2
4.5 Derivation of the Sample α and λ in terms of the random sample.
From the Sections 4.4.1 and 4.4.2, the actual values of m1 and m2 are defined in as follows:
α
λ = m1
α
λ2 = m2
Rearranging these Equations to give an expression in terms of α and λ.
λ = m1 m2
α = m21 m2
Expressing the moments α and λ in terms m̂1, m̂2 and the random sample
λ̂ = ( ∑n
i=1 xi)(∑n i=1 x
2 i
) α̂ =
( ∑n
i=1 xi) 2
n (∑n
i=1 x 2 i
) 4.6 Validation of Estimator Bias though Simulation
To show that the estimators from Part 4.5 are unbiased, a set of Gamma distributively random variables for three sets of α and λ are generated. The MATLAB code in Listing 6 generates the random variables following the Gamma distribution, compares the Estimators and plots a distribution of the sample mean and sample variance for each of the set of generated data as shown in Figures 7, 8 and 9. A comparison of the distributions to both the Estimated mean and Sample mean is shown in Table 2.
14
0 0.5 1 1.5 2 2.5 3
x 10 −3
0
100
200
300
400
500
600
700
Mean Variance
N u
m b
e r
o f
S a
m p
le s
(a) Sample Mean
0 1 2 3 4 5 6 7 8
x 10 −3
0
200
400
600
800
1000
1200
1400
Sample Variance
N u
m b
e r
o f
S a
m p
le s
(b) Sample Variance
Figure 7: Distributions of Sample Mean and Variance for α = 0.001 and λ = 1
0.115 0.12 0.125 0.13 0.135 0.14 0
100
200
300
400
500
600
Mean Variance
N u
m b
e r
o f
S a
m p
le s
(a) Sample Mean
0.038 0.04 0.042 0.044 0.046 0.048 0.05 0.052 0.054 0.056 0
100
200
300
400
500
600
700
Sample Variance
N u
m b
e r
o f
S a
m p
le s
(b) Sample Variance
Figure 8: Distributions of Sample Mean and Variance for α = 0.334 and λ = 2.667
0.146 0.148 0.15 0.152 0.154 0.156 0.158 0.16 0.162 0
100
200
300
400
500
600
700
Mean Variance
N u
m b
e r
o f
S a
m p
le s
(a) Sample Mean
0.03 0.032 0.034 0.036 0.038 0.04 0.042 0
100
200
300
400
500
600
700
Sample Variance
N u
m b
e r
o f
S a
m p
le s
(b) Sample Variance
Figure 9: Distributions of Sample Mean and Variance for α = 0.667 and λ = 4.333
15
Set 1 (α = 0.001,λ = 1) Set 2 (α = 0.334,λ = 2.667) Set 3 (α = 0.667,λ = 4.333) Estimated Mean 0.0010 0.1253 0.1539 Simulated Mean 0.0012 0.1248 0.1537
Estimated Variance 0.0010 0.0470 0.0355 Simulated Variance 0.0013 0.0463 0.0348
Table 2: Comparing the simulated and estimated means for 3 sets of different α and λ values
16
function [err_tbl] = napprox ()
% Values for ’number of trials ’ to be investigated n_arr = [1 2 3 4 5 10 20 30 40 50 100]; p_arr = [1/2 1/3];
% Function to calculate the standard normal distribution g = inline(’exp(-(x.^2 ./ 2))/ sqrt (2*pi)’); g_range = -5:0.001:5; g_range_b = 2:0.001:10;
err_tbl = zeros(size(n_arr ,2) ,6); err_tbl (1: size(n_arr ,2)) = n_arr;
% calculate q_n for normal err_tbl (1: size(n_arr ,2) ,2) = sum(g(g_range_b) * 0.001);
for j = 1:1: size(p_arr ,2) for i = 1:1: size(n_arr ,2)
% generate raw PDF x = 0:1: n_arr(i); dat = binopdf(x,n_arr(i),p_arr(j));
% normalise [m v] = binostat(n_arr(i),p_arr(j)); xn = (x - m) / sqrt(v); datn = dat * sqrt(v);
% plot and convert to PDF image figure (1), clf(1), hold on, axis([-5 5 0 0.5]); bar(xn, datn); plot(g_range ,g(g_range),’r’,’LineWidth ’ ,2); xlabel(’Number of Successes ’); ylabel(’Probability ’); saveas(gcf ,sprintf(’figures \\ binpdf(p=%.2f,n=%d).pdf’, p_arr(j), n_arr(i)));
% calculate q_n for binomial q_n = 0; for k = 1:1:( n_arr(i) + 1)
if(xn(k) > 2) q_n = q_n + (datn(k) / sqrt(v));
end end err_tbl(i,j+2) = q_n;
end end
% calculate errors err_tbl (1: size(n_arr ,2) ,5) = err_tbl (1: size(n_arr ,2),3) - err_tbl (1: size(n_arr ,2) ,2); err_tbl (1: size(n_arr ,2) ,6) = err_tbl (1: size(n_arr ,2),4) - err_tbl (1: size(n_arr ,2) ,2);
xlswrite(’err_tbl.xls’,err_tbl );
Listing 1: MATLAB Source Code to Generate the PDF’s of the Binomial Variables
17
%Number of random samples n=100000;
%For comparison , Matlab ’s Var(X) is calculated using sample variance X=rand ([1 n]); xVar=var(X) %using sample variance , plot variance against number of samples xVarn = []; for i=1:1:n
X=rand ([1 i]); %new set of random variables generated for each pass , ensures no bias xVarn = [xVarn Var(X)];
end x:1:1:100000; figure (1), clf(1), plot(xVarn ); ylabel(’Sample Size’); xlabel(’Sample Variance ’); saveas(gcf ,’part b - variance x sample no.pdf’);
Listing 2: MATLAB Source Code to generate a plot of the Variance of a data set contain n samples
%for code simplifacation ’s sake , we will use Var(X) n=5; sampleSetSize =100000; SamVarSet = []; for i=1:1: sampleSetSize a=0; X=rand ([1 n]); SamVarSet = [SamVarSet Var(X)]; end figure , clf , hist(SamVarSet ,100); axis ([0 0.25 0 2500]); xlabel(’Sample Variance ’); ylabel(’Number of Samples ’); saveas(gcf ,sprintf(’part c - variance v samples.pdf’)); mean(SamVarSet)
Listing 3: MATLAB Source Code to generate a plot of the Variance Distribution over 100000 samples
18
n=5; sampleSetSize =100000; SamVarSetEstimator = []; for i=1:1: sampleSetSize a=0; X=rand ([1 n]); for i=1:1:n
a=a+(X(i) -1/2)^2; end
a=a/n; SamVarSetEstimator = [SamVarSetEstimator a]; end figure , clf , hist(SamVarSetEstimator ,100); axis ([0 0.25 0 2500]); xlabel(’Sample Variance ’); ylabel(’Number of Samples ’); saveas(gcf ,sprintf(’part d - variance estimator v samples.pdf’)); mean(SamVarSetEstimator)
Listing 4: MATLAB Source Code to generate a plot of the estimated Variance Distribution over 100000 samples
%f = inline (’((h^a)/( factorial(a -1)))*(x^(a -1))* exp(-h*x)’,’x’,’a’,’h’); %Part B % Using x from -100 to 100 and a =3, h=5
f = inline(’((h^a)/( gamma(a)))*(x^(a-1))* exp(-h*x)’,’x’,’a’,’h’); min = 0; steps = 100; max = 6; step =((max -min)/ steps); hmin =0.1; hsteps =6; hmax =0.5; hstep =((hmax -hmin)/ hsteps ); x=min:step:max -step; a=min:step:max -step; h=hmin:hstep:hmax -hstep; aArray = zeros(steps ,steps ); for i=1:1: numel(h)
for j=1:1: steps for k=1:1: steps
aArray(k,j)=f(k,a(j),h(i)); end
end surfc(x,a,aArray ); view ([1 ,0.5 ,0.5]) ylabel(’alpha’) xlabel(’x’) title(sprintf(’Gamma density for Lambda = %.3f’,h(i))) saveas(gcf ,sprintf(’Q4b3D h=%.3f.pdf’,h(i))); end
Listing 5: MATLAB Source Code to generate a plot of the estimated Variance Distribution over 100000 samples
19
n=10000; %Number of random samples per set of variables sets =3; %number of sets of variables
amin =0.001; amax =1; hmin =1; hmax =6;
astep =((amax -amin)/sets); hstep =((hmax -hmin)/sets);
a=amin:astep:amax -astep; h=hmin:hstep:hmax -hstep;
grand=zeros(3,n);
for i=1:1: steps grand(i,:)= gamrnd(a(i) ,(1/(h(i))) ,[1 n]);
end %Creating the set of mean and var from calculation meanCalc =[]; varCalc =[]; for i=1:1: steps
meanCalc(i)=a(i)/h(i); varCalc(i)=a(i)/(h(i)^2);
end
%Creating a set of mean and var from simulation meanSim =[]; varSim =[]; for i=1:1: steps
meanSim(i)=mean(grand(i ,:)); varSim(i)=var(grand(i ,:));
end
%Code for generating a distribution plot of sample mean and var GrandMean = zeros(3,n); GrandVar = zeros(3,n); for j=1:1:n
for i=1:1: sets grand(i,:)= gamrnd(a(i) ,(1/(h(i))) ,[1 n]); GrandMean(i,j)=mean(grand(i ,:)); GrandVar(i,j)=var(grand(i,:));
end end
%Plotting the distribution of for i=1:1: sets
hist(GrandMean(i,:) ,50) xlabel(’Mean Variance ’); ylabel(’Number of Samples ’); saveas(gcf ,sprintf(’Q4F - Mean a=%.3f h=%.3f.pdf’,a(i),h(i))) hist(GrandVar(i,:) ,50) xlabel(’Sample Variance ’); ylabel(’Number of Samples ’); saveas(gcf ,sprintf(’Q4F - Var a=%.3f h=%.3f.pdf’,a(i),h(i)))
end
%outputs Calculated and Simulated mean and var for comparison meanCalc meanSim varCalc varSim
Listing 6: MATLAB Source Code to Generate Gamma Random Variables and Compare the Estimators to the Sample Mean and Variance
20
- Normal Approximation to the Binomial Distribution
- Analytical investigation of the Exponential Distribution
- Derivation of CDF
- Mean Calculation
- Variance Calculation
- Kth Moment Expression
- Properties of Variance Estimators
- 2=Var(Xi) Calculation
- Estimation of 2 using Sample Variance
- Deriving the Mean of the distribution of the Sample Variance from simulation.
- Estimator Bias Considerations
- The Gamma Distribution
- Determining the Normalising constant
- Visualisation of the Gamma distribution with respect to and
- Relationship between the Gamma Distribution and the Exponential Distribution
- Derivation of the Mean and Variance
- Mean
- Variance
- Derivation of the Sample and in terms of the random sample.
- Validation of Estimator Bias though Simulation