Stats

assignment.pdf

Home >Business & Finance homework help >Accounting homework help >Stats

Statistics Assignment 1 HET551 – Design and Development Project 1 Michael Allwright Haddon O’Neill Tuesday, 24 May 2011

1 Normal Approximation to the Binomial Distribution

This section of the assignment shows how a normal curve can be used to approximate the binomial distribution. This section of the assignment was completed using a MATLAB function (shown in Listings 1) which would generate and save plots of the various Binomial Distributions after normalisation, and then calculate the errors between the standard normal curve and the binomial distribution. The plots in Figures 1 and 2 show the binomial distribution for various n trials with probability p = 1

3 and p = 1

2 respectively. These binomial plots have been normalised so that they can be compared with the standard normal distribution. From these plots it can be seen that once the binomial distribution has been normalised, the normal approximation is a good approach to estimating the binomial distribution. To determine its accuracy, the data in Table 1 shows the evaluation of qn = P(bn ≥ µn + 2σn) for both the normal curve and binomial distribution.

qn = P(bn ≥ µn + 2σn) Calculation Error

n N(0, 1) B(n, 1 2 ) B(n, 1

3 ) B(n, 1

2 ) B(n, 1

3 )

1 0.0228 0.0000 0.0000 -0.02278 -0.02278 2 0.0228 0.0000 0.0000 -0.02278 -0.02278 3 0.0228 0.0000 0.0370 -0.02278 0.01426 4 0.0228 0.0000 0.0123 -0.02278 -0.01043 5 0.0228 0.0313 0.0453 0.00847 0.02249 10 0.0228 0.0107 0.0197 -0.01203 -0.00312 20 0.0228 0.0207 0.0376 -0.00208 0.01486 30 0.0228 0.0214 0.0188 -0.00139 -0.00398 40 0.0228 0.0192 0.0214 -0.00354 -0.00134 50 0.0228 0.0164 0.0222 -0.00636 -0.00059

100 0.0228 0.0176 0.0276 -0.00518 0.00479

Table 1: Calculating the error of the normal approximation to the binomial for various n and p

2 Analytical investigation of the Exponential Distribution

For this part of the assignment the density function shown in Equation 1 was given.

f(x) = λe−λx for x ≥ 0 and λ ≥ 0 (1)

Before any calculations were attempted, the area under graph was checked to show that ´∞ −∞f(x) dx = 1. That is

that the total probability of all possible values was 1.

2.1 Derivation of CDF

To find the CDF of the given function, the function was integrated with 0 and x being the lower and upper bound respectively. This derivation is shown in Equations 2 to 4.

CDF =

ˆ x o f(x) dx =

ˆ x o λe−λx dx (2)

−5 −4 −3 −2 −1 0 1 2 3 4 5 0

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

0.45

0.5

Number of Successes (shifted left by u = 0.33)

P ro

b a

b ili

(a) n = 1

−5 −4 −3 −2 −1 0 1 2 3 4 5 0

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

0.45

0.5

Number of Successes (shifted left by u = 0.67)

P ro

b a

b ili

(b) n = 2

−5 −4 −3 −2 −1 0 1 2 3 4 5 0

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

0.45

0.5

Number of Successes (shifted left by u = 1.00)

P ro

b a

b ili

−5 −4 −3 −2 −1 0 1 2 3 4 5 0

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

0.45

0.5

Number of Successes (shifted left by u = 1.33)

P ro

b a

b ili

(d) n = 4

−5 −4 −3 −2 −1 0 1 2 3 4 5 0

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

0.45

0.5

Number of Successes (shifted left by u = 1.67)

P ro

b a

b ili

(e) n = 5

−5 −4 −3 −2 −1 0 1 2 3 4 5 0

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

0.45

0.5

Number of Successes (shifted left by u = 3.33)

P ro

b a

b ili

(f) n = 10

−5 −4 −3 −2 −1 0 1 2 3 4 5 0

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

0.45

0.5

Number of Successes (shifted left by u = 6.67)

P ro

b a

b ili

(g) n = 20

−5 −4 −3 −2 −1 0 1 2 3 4 5 0

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

0.45

0.5

Number of Successes (shifted left by u = 10.00)

P ro

b a

b ili

(h) n = 30

−5 −4 −3 −2 −1 0 1 2 3 4 5 0

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

0.45

0.5

Number of Successes (shifted left by u = 13.33)

P ro

b a

b ili

(i) n = 40

−5 −4 −3 −2 −1 0 1 2 3 4 5 0

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

0.45

0.5

Number of Successes (shifted left by u = 16.67)

P ro

b a

b ili

(j) n = 50

−5 −4 −3 −2 −1 0 1 2 3 4 5 0

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

0.45

0.5

Number of Successes (shifted left by u = 33.33)

P ro

b a

b ili

(k) n = 100

Figure 1: Binomial Distribution with p = 1 3

−5 −4 −3 −2 −1 0 1 2 3 4 5 0

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

0.45

0.5

Number of Successes (shifted left by u = 0.50)

P ro

b a

b ili

(a) n = 1

−5 −4 −3 −2 −1 0 1 2 3 4 5 0

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

0.45

0.5

Number of Successes (shifted left by u = 1.00)

P ro

b a

b ili

(b) n = 2

−5 −4 −3 −2 −1 0 1 2 3 4 5 0

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

0.45

0.5

Number of Successes (shifted left by u = 1.50)

P ro

b a

b ili

−5 −4 −3 −2 −1 0 1 2 3 4 5 0

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

0.45

0.5

Number of Successes (shifted left by u = 2.00)

P ro

b a

b ili

(d) n = 4

−5 −4 −3 −2 −1 0 1 2 3 4 5 0

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

0.45

0.5

Number of Successes (shifted left by u = 2.50)

P ro

b a

b ili

(e) n = 5

−5 −4 −3 −2 −1 0 1 2 3 4 5 0

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

0.45

0.5

Number of Successes (shifted left by u = 5.00)

P ro

b a

b ili

(f) n = 10

−5 −4 −3 −2 −1 0 1 2 3 4 5 0

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

0.45

0.5

Number of Successes (shifted left by u = 10.00)

P ro

b a

b ili

(g) n = 20

−5 −4 −3 −2 −1 0 1 2 3 4 5 0

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

0.45

0.5

Number of Successes (shifted left by u = 15.00)

P ro

b a

b ili

(h) n = 30

−5 −4 −3 −2 −1 0 1 2 3 4 5 0

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

0.45

0.5

Number of Successes (shifted left by u = 20.00)

P ro

b a

b ili

(i) n = 40

−5 −4 −3 −2 −1 0 1 2 3 4 5 0

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

0.45

0.5

Number of Successes (shifted left by u = 25.00)

P ro

b a

b ili

(j) n = 50

−5 −4 −3 −2 −1 0 1 2 3 4 5 0

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

0.45

0.5

Number of Successes (shifted left by u = 50.00)

P ro

b a

b ili

(k) n = 100

Figure 2: Binomial Distribution with p = 1 2

ˆ x o λe−λx dx =

[ −e−λxdx

]x 0

(3)

ˆ x o λe−λx dx = 1 −e−λx (4)

2.2 Mean Calculation

E[X] =

ˆ ∞ −∞

xf(x) dx

E[X] = λ

ˆ ∞ 0

xe−λx dx

To evaluate the integral on the right hand side, integration by parts must be applied as follows

ˆ f(x)g′(x)dx = f(x)g(x) −

ˆ f ′(x)g(x)dx

f(x) = x f ′(x) = 1

g′(x) = e−λx g(x) = −1 λ e−λx

E[X] = λ

ˆ ∞ 0

xe−λx dx =

[ −x λ e−λx

]∞ 0

+ 1

ˆ ∞ 0

e−λxdx

E[X] =

[ −x λ e−λx

]∞ 0

[ −1 λ2 e−λx

]∞ 0

E[X] = λ

[( − x

λ −

λ2

) e−λx

]∞ 0

(5)

E[X] = λ

( lim x→∞

[( − x

λ −

λ2

) e−λx

] − lim x→0

[( − x

λ −

λ2

) e−λx

]) as x → ∞ the e−λx of first part decays faster than x

λ grows. This allows the first term of the expression to be

considered zero.

E[X] = (−λ)(− 1

λ2 ) =

λ (6)

2.3 Variance Calculation

Var(X) =

ˆ (x−E[x])2 f(x) dx (7)

Substituting Equation 1 and 6 into the expression in Equation 7 yields and can be expanded to the following.

Var(X) =

ˆ ∞ 0

(x2 − 2x

λ +

λ2 ) λe−λx dx

(Note: to simplify working, the upper and lower bounds have been temporarily omitted)

Var(X) = λ

ˆ x2e−λx dx− 2

ˆ xe−λx dx +

ˆ e−λx dx (8)

To solve this expression, integration by parts was utilised to simplify the first term ˆ x2e−λx =

ˆ f(x)g′(x)dx = f(x)g(x) −

ˆ f ′(x)g(x)dx

where:

f(x) = x2 f ′(x) = 2x

g′(x) = e−λx g(x) = −1 λ e−λx

ˆ x2e−λx =

−x2

λ e−λx +

ˆ xe−λxdx + C1 (9)

Substituting Equation 9 into the expression in Equation 8.

Var(X) = λ

( −x2

λ e−λx +

ˆ xe−λxdx

) − 2 ˆ xe−λx dx +

ˆ e−λx dx

Var(X) = −x2e−λx + 2 ˆ xe−λxdx− 2

ˆ xe−λx dx +

ˆ e−λx dx

Var(X) = −x2e−λx + 1

ˆ e−λx dx

Var(X) = −x2e−λx − 1

λ2 e−λx + C2

Removing the constant of integration and applying the upper and lower limits of the integral allows this expression to be factorised.

Var(X) =

[ −e−λx

( x2 +

λ2

)]∞ 0

Finally evaluating the integral yields:

Var(X) = lim x→∞

[ −e−λx

( x2 +

λ2

)] − lim x→0

[ −e−λx

( x2 +

λ2

)] Var(X) =

λ2

2.4 Kth Moment Expression

Starting with the expression Mk = E[Xk] = ´∞

0 xk e−λx dx, integration by parts is used to develop an expression in

terms of Mk−1, where Mk−1 = ´∞

0 xk−1 e−λx dx.

Mk =

ˆ ∞ 0

xk f(x) dx

ˆ f(x)g′(x)dx = f(x)g(x) −

ˆ f ′(x)g(x)dx

f(x) = xk f ′(x) = kxk−1

g′(x) = e−λx g(x) = −1 λ e−λx

Mk =

[ − xk

λ e−λx

]∞ 0

+ k

ˆ ∞ 0

xk−1e−λxdx

Evaluating the limits of the first term and substituting second term for Mk−1 then yields:

Mk = lim x→∞

[ − xk

λ e−λx

] − lim x→0

[ − xk

λ e−λx

] + k

λ Mk−1

Mk = k

λ Mk−1

Hence the recursive expression for the Kth moment of the exponential distribution.

3 Properties of Variance Estimators

3.1 σ2 = Var(Xi) Calculation

For a uniform random variable in the range [0,1], the distribution is represented by:

F(x) =

 

x < 0

0 < x < 1

1 < x

From this, f(x) can be found given that f(x) = d dx F(x):

f(x) =

 

x < 0

0 < x < 1

1 < x

or, for simplification

f(x) = 1, 0 < x < 1 otherwise 0

0 1 2 3 4 5 6 7 8 9 10

x 10 4

0.02

0.04

0.06

0.08

0.1

0.12

0.14

0.16

Sample Size

S a

m p

le V

a ri a

n ce

Figure 3: Variance across 100,000 samples

The expectation of X is defined as:

E[X] =

ˆ xf(x)dx

Substituting in the derived density function:

E[X] =

1ˆ

xdx =

[ 1

2 x2 ]1

= 1

Using this result, the Variance can now be defined such that:

V ar(Xi) =

ˆ 1 0

(x− 1

2 )2dx =

ˆ 1 0 x2 −x−

4 dx =

[ 1

3 x3 −

2 x2 +

4 x

]1 0

= 1

3 −

2 +

V ar(Xi) = 1

12 = 0.083̄3

3.2 Estimation of σ2 using Sample Variance

As stated before, the sample variance is defined as:

V ar(X) =

ˆ (x−E[X])2f(x)dx

To simplify the simulation, the MATLAB function Var(X) to calculate the sample variance. The MATLAB code estimating σ2 by generating a plot of variance of a independent uniform random variables against the number of samples is in Listing 2 MATLAB calculated the mean as µ = 0.0833. Comparing this to what the generated plot shows from the 105 samples that were generated, it is fair assumption for it to be approximately around this figure.

0 0.05 0.1 0.15 0.2 0.25 0

500

1000

1500

2000

2500

Sample Variance

N u

m b

e r

o f

S a

m p

le s

Figure 4: Distribution of Sample Variance over 100000 sequences where n = 5

3.3 Deriving the Mean of the distribution of the Sample Variance from simulation.

We have been asked to plot the distribution of the sample variance of a number of sequences of n = 5. To do this, an array of sequences of length 5 were generated and the variance of each sequence was calculated. The MATLAB code to first create the data sets and then plot the distribution is in Listing 3. The result of this simulation is shown in Figure 4. The mean of SamVarSet was calculated out:

σ2 = 0.0833 ≈ 1

σ2 ≈ V ar(Xi)

Using the generated Histogram of the distribution, the mean is observed to be approximately the calculated value.

3.4 Estimator Bias Considerations

For this section we are asked to prove that S̃2 = ∑n i=1(Xi−

1 2

n is an unbiased estimator of the variance, which is

defined asV ar(X) = ´

(x−E[X])2f(x)dx. To prove that it is unbiased, we must compare it against both the calculated variance and the sample variance and ensure that the results match.

To do this, we used the result from Part 3.3 using S̃2 = ∑n i=1(Xi−

1 2

n instead of V ar(x) in the MATLAB code in

Listing 3. The same number of samples were generated. This MATLAB code to generate the sample sequences and plot the distribution is shown in Listing 4. The mean of SamVarSetEstimator was calculated out:

S̃2 = 0.0836 ≈ 1

S̃2 ≈ V ar(Xi) ≈ σ2

From Figure 5 we can say that S̃2is an unbiased estimator of the variance, as proven by it’s close approximation to to calculated Variance found in Part 3.1 and the sample variance generated by MATLAB, Parts 3.2 and 3.3.

0 0.05 0.1 0.15 0.2 0.25 0

500

1000

1500

2000

2500

Sample Variance

N u

m b

e r

o f

S a

m p

le s

Figure 5: Distribution of the Estimated Variance over 100000 sequences of length 5

4 The Gamma Distribution

4.1 Determining the Normalising constant

It is required to find the normalizing constant C contained withing the following function:

f(x) = Cxα−1e−λx, x ≥ 0, C > 0

From research, the standard notation of the probability density function for gamma random variables is as follows:

f(x) = λα

Γ(α) xα−1e−λx, x > 0

Following Γ(α) =

∞̂

xa−1e−xdx

∴ C = λα

Γ(α)

4.2 Visualisation of the Gamma distribution with respect to α and λ

To generate plots of gamma densities for differing values of α and λ, the MATLAB code in Listing 5 was generated to show how varying these parameters affects the distribution. These variations are shown in Figure 6. Figure 6 shows the Gamma distributions generated by varying α with respect to x and the probability density at x, for 6 distinct values of λ.

4.3 Relationship between the Gamma Distribution and the Exponential Distribution

The Exponential Distribution is a special case of the Gamma Distribution that occurs when α = 1. This is shown by the following:

f(x) = λα

Γ(α) xα−1e−λx, x > 0

0 1

2 3

4 5

0.02

0.04

0.06

0.08

0.1

0.12

0.14

0.16

0.18

Gamma density for Lambda = 0.100

alpha

(a) h = 0.100

0 1

2 3

4 5

0.02

0.04

0.06

0.08

0.1

0.12

0.14

0.16

0.18

0.2

Gamma density for Lambda = 0.167

alpha

(b) h = 0.167

0 1

2 3

4 5

0.05

0.1

0.15

0.2

0.25

Gamma density for Lambda = 0.233

alpha

0 1

2 3

4 5

0.05

0.1

0.15

0.2

0.25

Gamma density for Lambda = 0.300

alpha

(d) h = 0.300

0 1

2 3

4 5

0.05

0.1

0.15

0.2

0.25

0.3

0.35

Gamma density for Lambda = 0.367

alpha

(e) h = 0.367

0 1

2 3

4 5

0.05

0.1

0.15

0.2

0.25

0.3

0.35

Gamma density for Lambda = 0.433

alpha

(f) h = 0.433

Figure 6: Gamma Distributions with respect to parameters α and λ

where Γ(α) =

∞̂

xa−1e−xdx

when α = 1

f(x) = λ1

Γ(1) x1−1e−λx, x > 0

Γ(1) =

∞̂

x1−1e−xdx =

∞̂

e−xdx = [−xe−x]∞0 = 1

f(x) = λe−λx, x > 0

Hence when α = 1, the expression for the Gamma Distribution simplifies to f(x) = λe−λx, x > 0 which is the Exponential Distribution.

4.4 Derivation of the Mean and Variance

4.4.1 Mean

The following shows the calculation of E[X] for the Gamma Distribution in Equation 10 is derived:

f(x) = λα

Γ(α) xα−1e−λx (10)

E[X] =

ˆ ∞ 0

x λα

Γ(α) xα−1e−λxdx

E[X] = λα

Γ(α)

ˆ ∞ 0

xα−1e−λxdx

E[X] = λα

Γ(α)

ˆ ∞ 0

λα+1

Γ(α + 1) xαe−λxdx

Γ(α + 1)

λα+1

E[X] = λα

Γ(α)

ˆ ∞ 0

λα+1

Γ(α + 1) xαe−λxdx

Γ(α + 1)

λα+1

E[X] = λα

Γ(α)

Γ(α + 1)

λα+1

ˆ ∞ 0

λα+1

Γ(α + 1) xαe−λxdx

Where ´

λα+1

Γ(α+1) xαe−λxdx =

´ λα+1

Γ(α+1) x(α+1)−1e−λxdx is the definition of the PDF:

Γ(α + 1,λ) = 1

E[X] = λα

Γ(α)

Γ(α + 1)

λα+1

E[X] = Γ(α + 1)

λΓ(α)

Since Γ(α + 1) = αΓ(α)

E[X] = αΓ(α)

λΓ(α)

E[X] = α

4.4.2 Variance

V ar(X) =

ˆ ∞ 0

x2f(x)dx− (E[X])2

V ar(X) =

ˆ ∞ 0

x2 λα

Γ(α) xα−1e−λxdx− (E[X])2

V ar(X) = λα

Γ(α)

ˆ ∞ 0

xα−1e−λxdx− (E[X])2

V ar(X) = λα

Γ(α)

ˆ ∞ 0

λα+2

Γ(α + 2) x((α+2)−1)e−λxdxλα+2 − (E[X])2

Where ´

λα+1

Γ(α+1) xαe−λxdx =

´ λα+1

Γ(α+1) x(α+1)−1e−λxdx is the definition of the PDF:

Γ(α + 1,λ) = 1

V ar(X) = λα

Γ(α)

Γ(α + 2)

λα+2 − (E[X])2

V ar(X) = λα

Γ(α)

Γ(α + 2)

λα+2 − α2

λ2 (11)

Γ(α) = (α− 1)Γ(α− 1) (12)

Γ(α + 2) = ((α + 2) − 1)Γ((α + 2) − 1)

Γ(α + 2) = (α + 1)Γ(α + 1) (13)

Γ(α + 1) = αΓ(α) (14)

Substituting Equation 14 into Equation 13

Γ(α + 2) = (α + 1)αΓ(α)

Γ(α + 2) = (α2 + α)Γ(α) (15)

Substituting Equation 15 into Equation 11

V ar(X) = (α2 + α)Γ(α)

λ2Γ(α) − α2

λ2

V ar(X) = α2 + α−α2

λ2

V ar(X) = α

λ2

4.5 Derivation of the Sample α and λ in terms of the random sample.

From the Sections 4.4.1 and 4.4.2, the actual values of m1 and m2 are defined in as follows:

λ = m1

λ2 = m2

Rearranging these Equations to give an expression in terms of α and λ.

λ = m1 m2

α = m21 m2

Expressing the moments α and λ in terms m̂1, m̂2 and the random sample

λ̂ = ( ∑n

i=1 xi)(∑n i=1 x

2 i

) α̂ =

( ∑n

i=1 xi) 2

n (∑n

i=1 x 2 i

) 4.6 Validation of Estimator Bias though Simulation

To show that the estimators from Part 4.5 are unbiased, a set of Gamma distributively random variables for three sets of α and λ are generated. The MATLAB code in Listing 6 generates the random variables following the Gamma distribution, compares the Estimators and plots a distribution of the sample mean and sample variance for each of the set of generated data as shown in Figures 7, 8 and 9. A comparison of the distributions to both the Estimated mean and Sample mean is shown in Table 2.

0 0.5 1 1.5 2 2.5 3

x 10 −3

100

200

300

400

500

600

700

Mean Variance

N u

m b

e r

o f

S a

m p

le s

(a) Sample Mean

0 1 2 3 4 5 6 7 8

x 10 −3

200

400

600

800

1000

1200

1400

Sample Variance

N u

m b

e r

o f

S a

m p

le s

(b) Sample Variance

Figure 7: Distributions of Sample Mean and Variance for α = 0.001 and λ = 1

0.115 0.12 0.125 0.13 0.135 0.14 0

100

200

300

400

500

600

Mean Variance

N u

m b

e r

o f

S a

m p

le s

(a) Sample Mean

0.038 0.04 0.042 0.044 0.046 0.048 0.05 0.052 0.054 0.056 0

100

200

300

400

500

600

700

Sample Variance

N u

m b

e r

o f

S a

m p

le s

(b) Sample Variance

Figure 8: Distributions of Sample Mean and Variance for α = 0.334 and λ = 2.667

0.146 0.148 0.15 0.152 0.154 0.156 0.158 0.16 0.162 0

100

200

300

400

500

600

700

Mean Variance

N u

m b

e r

o f

S a

m p

le s

(a) Sample Mean

0.03 0.032 0.034 0.036 0.038 0.04 0.042 0

100

200

300

400

500

600

700

Sample Variance

N u

m b

e r

o f

S a

m p

le s

(b) Sample Variance

Figure 9: Distributions of Sample Mean and Variance for α = 0.667 and λ = 4.333

Set 1 (α = 0.001,λ = 1) Set 2 (α = 0.334,λ = 2.667) Set 3 (α = 0.667,λ = 4.333) Estimated Mean 0.0010 0.1253 0.1539 Simulated Mean 0.0012 0.1248 0.1537

Estimated Variance 0.0010 0.0470 0.0355 Simulated Variance 0.0013 0.0463 0.0348

Table 2: Comparing the simulated and estimated means for 3 sets of different α and λ values

function [err_tbl] = napprox ()

% Values for ’number of trials ’ to be investigated n_arr = [1 2 3 4 5 10 20 30 40 50 100]; p_arr = [1/2 1/3];

% Function to calculate the standard normal distribution g = inline(’exp(-(x.^2 ./ 2))/ sqrt (2*pi)’); g_range = -5:0.001:5; g_range_b = 2:0.001:10;

err_tbl = zeros(size(n_arr ,2) ,6); err_tbl (1: size(n_arr ,2)) = n_arr;

% calculate q_n for normal err_tbl (1: size(n_arr ,2) ,2) = sum(g(g_range_b) * 0.001);

for j = 1:1: size(p_arr ,2) for i = 1:1: size(n_arr ,2)

% generate raw PDF x = 0:1: n_arr(i); dat = binopdf(x,n_arr(i),p_arr(j));

% normalise [m v] = binostat(n_arr(i),p_arr(j)); xn = (x - m) / sqrt(v); datn = dat * sqrt(v);

% plot and convert to PDF image figure (1), clf(1), hold on, axis([-5 5 0 0.5]); bar(xn, datn); plot(g_range ,g(g_range),’r’,’LineWidth ’ ,2); xlabel(’Number of Successes ’); ylabel(’Probability ’); saveas(gcf ,sprintf(’figures \\ binpdf(p=%.2f,n=%d).pdf’, p_arr(j), n_arr(i)));

% calculate q_n for binomial q_n = 0; for k = 1:1:( n_arr(i) + 1)

if(xn(k) > 2) q_n = q_n + (datn(k) / sqrt(v));

end end err_tbl(i,j+2) = q_n;

end end

% calculate errors err_tbl (1: size(n_arr ,2) ,5) = err_tbl (1: size(n_arr ,2),3) - err_tbl (1: size(n_arr ,2) ,2); err_tbl (1: size(n_arr ,2) ,6) = err_tbl (1: size(n_arr ,2),4) - err_tbl (1: size(n_arr ,2) ,2);

xlswrite(’err_tbl.xls’,err_tbl );

Listing 1: MATLAB Source Code to Generate the PDF’s of the Binomial Variables

%Number of random samples n=100000;

%For comparison , Matlab ’s Var(X) is calculated using sample variance X=rand ([1 n]); xVar=var(X) %using sample variance , plot variance against number of samples xVarn = []; for i=1:1:n

X=rand ([1 i]); %new set of random variables generated for each pass , ensures no bias xVarn = [xVarn Var(X)];

end x:1:1:100000; figure (1), clf(1), plot(xVarn ); ylabel(’Sample Size’); xlabel(’Sample Variance ’); saveas(gcf ,’part b - variance x sample no.pdf’);

Listing 2: MATLAB Source Code to generate a plot of the Variance of a data set contain n samples

%for code simplifacation ’s sake , we will use Var(X) n=5; sampleSetSize =100000; SamVarSet = []; for i=1:1: sampleSetSize a=0; X=rand ([1 n]); SamVarSet = [SamVarSet Var(X)]; end figure , clf , hist(SamVarSet ,100); axis ([0 0.25 0 2500]); xlabel(’Sample Variance ’); ylabel(’Number of Samples ’); saveas(gcf ,sprintf(’part c - variance v samples.pdf’)); mean(SamVarSet)

Listing 3: MATLAB Source Code to generate a plot of the Variance Distribution over 100000 samples

n=5; sampleSetSize =100000; SamVarSetEstimator = []; for i=1:1: sampleSetSize a=0; X=rand ([1 n]); for i=1:1:n

a=a+(X(i) -1/2)^2; end

a=a/n; SamVarSetEstimator = [SamVarSetEstimator a]; end figure , clf , hist(SamVarSetEstimator ,100); axis ([0 0.25 0 2500]); xlabel(’Sample Variance ’); ylabel(’Number of Samples ’); saveas(gcf ,sprintf(’part d - variance estimator v samples.pdf’)); mean(SamVarSetEstimator)

Listing 4: MATLAB Source Code to generate a plot of the estimated Variance Distribution over 100000 samples

%f = inline (’((h^a)/( factorial(a -1)))*(x^(a -1))* exp(-h*x)’,’x’,’a’,’h’); %Part B % Using x from -100 to 100 and a =3, h=5

f = inline(’((h^a)/( gamma(a)))*(x^(a-1))* exp(-h*x)’,’x’,’a’,’h’); min = 0; steps = 100; max = 6; step =((max -min)/ steps); hmin =0.1; hsteps =6; hmax =0.5; hstep =((hmax -hmin)/ hsteps ); x=min:step:max -step; a=min:step:max -step; h=hmin:hstep:hmax -hstep; aArray = zeros(steps ,steps ); for i=1:1: numel(h)

for j=1:1: steps for k=1:1: steps

aArray(k,j)=f(k,a(j),h(i)); end

end surfc(x,a,aArray ); view ([1 ,0.5 ,0.5]) ylabel(’alpha’) xlabel(’x’) title(sprintf(’Gamma density for Lambda = %.3f’,h(i))) saveas(gcf ,sprintf(’Q4b3D h=%.3f.pdf’,h(i))); end

Listing 5: MATLAB Source Code to generate a plot of the estimated Variance Distribution over 100000 samples

n=10000; %Number of random samples per set of variables sets =3; %number of sets of variables

amin =0.001; amax =1; hmin =1; hmax =6;

astep =((amax -amin)/sets); hstep =((hmax -hmin)/sets);

a=amin:astep:amax -astep; h=hmin:hstep:hmax -hstep;

grand=zeros(3,n);

for i=1:1: steps grand(i,:)= gamrnd(a(i) ,(1/(h(i))) ,[1 n]);

end %Creating the set of mean and var from calculation meanCalc =[]; varCalc =[]; for i=1:1: steps

meanCalc(i)=a(i)/h(i); varCalc(i)=a(i)/(h(i)^2);

end

%Creating a set of mean and var from simulation meanSim =[]; varSim =[]; for i=1:1: steps

meanSim(i)=mean(grand(i ,:)); varSim(i)=var(grand(i ,:));

end

%Code for generating a distribution plot of sample mean and var GrandMean = zeros(3,n); GrandVar = zeros(3,n); for j=1:1:n

for i=1:1: sets grand(i,:)= gamrnd(a(i) ,(1/(h(i))) ,[1 n]); GrandMean(i,j)=mean(grand(i ,:)); GrandVar(i,j)=var(grand(i,:));

end end

%Plotting the distribution of for i=1:1: sets

hist(GrandMean(i,:) ,50) xlabel(’Mean Variance ’); ylabel(’Number of Samples ’); saveas(gcf ,sprintf(’Q4F - Mean a=%.3f h=%.3f.pdf’,a(i),h(i))) hist(GrandVar(i,:) ,50) xlabel(’Sample Variance ’); ylabel(’Number of Samples ’); saveas(gcf ,sprintf(’Q4F - Var a=%.3f h=%.3f.pdf’,a(i),h(i)))

end

%outputs Calculated and Simulated mean and var for comparison meanCalc meanSim varCalc varSim

Listing 6: MATLAB Source Code to Generate Gamma Random Variables and Compare the Estimators to the Sample Mean and Variance

Normal Approximation to the Binomial Distribution
Analytical investigation of the Exponential Distribution

Derivation of CDF
Mean Calculation
Variance Calculation
Kth Moment Expression

Properties of Variance Estimators

2=Var(Xi) Calculation
Estimation of 2 using Sample Variance
Deriving the Mean of the distribution of the Sample Variance from simulation.
Estimator Bias Considerations

The Gamma Distribution

Determining the Normalising constant
Visualisation of the Gamma distribution with respect to and
Relationship between the Gamma Distribution and the Exponential Distribution
Derivation of the Mean and Variance

Mean
Variance

Derivation of the Sample and in terms of the random sample.
Validation of Estimator Bias though Simulation