Statistics and Data Analysis
Evaluation Exercise for Parameter Estimation
Let us consider the following pdf:
$$f(x) = 4x^2 \left[ (3 - 3x) + \frac{2}{3}\,\rho\,(4x - 3) \right]$$
with x limited to the range [0, 1].
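As a numerical cross-check to accompany the analytical work, the pdf can be coded and integrated over [0, 1]. The following is only a minimal sketch: the names `pdf` and `simpson` are illustrative choices, not part of the exercise, and the composite Simpson rule stands in for whatever integrator is preferred.

```python
def pdf(x, rho):
    """f(x) = 4 x^2 [ (3 - 3x) + (2/3) rho (4x - 3) ] on [0, 1], zero elsewhere."""
    if x < 0.0 or x > 1.0:
        return 0.0
    return 4.0 * x**2 * ((3.0 - 3.0 * x) + (2.0 / 3.0) * rho * (4.0 * x - 3.0))


def simpson(func, a, b, n=1000):
    """Composite Simpson rule with n (even) sub-intervals (illustrative helper)."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        # Odd interior nodes get weight 4, even interior nodes weight 2.
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3.0


# The integral should come out as 1 for every value of rho.
for rho in (0.0, 0.25, 0.5, 0.75, 1.0):
    norm = simpson(lambda x: pdf(x, rho), 0.0, 1.0)
    print(f"rho = {rho:4.2f}  integral of f = {norm:.6f}")
```

The same helper can be pointed at x f(x) and x^2 f(x) to cross-check the analytical mean and variance once they have been derived.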
1. Check analytically that f(x) is a properly normalized pdf. Find analytical expressions for the distribution (or population) mean and variance, as a function of ρ.
2. Produce a Monte Carlo sample following the pdf f(x), assuming ρ = 0.75, with N = 1000 entries.
3. Compute the sample mean and variance of the Monte Carlo sample from exercise 2 and compare them to the values obtained in exercise 1. (Remember that the sample mean and variance are estimators of the distribution mean and variance, respectively.) Also compute the variance of the mean estimator.
4. The expression for the distribution mean obtained in exercise 1 can be written as 〈x〉 = a + bρ. Generate 5 Monte Carlo samples, for ρ = 0, 0.25, 0.5, 0.75, 1.0 respectively, each with N entries, and use the estimated means (and associated uncertainties) to estimate the values of a and b, and their covariance matrix, using the least-squares method. Do it analytically (taking into account that 〈x〉 is linear in a and b) and by numerical minimization of the χ² expression. Compare the two results with each other and with the values obtained in exercise 1.
5. Since the sample mean, x̄, is an estimator of the population mean, we can use the relation between the population mean and ρ from exercise 1 to define an estimator for ρ, ρ̂1, through x̄ = a + bρ̂1.
(a) Find analytical expressions for the variance and bias of ρ̂1.
(b) For ρ = 0.75, compute numerically the variance of ρ̂1, using n = 100 independent Monte Carlo simulations of N entries each.
(c) Compare the values obtained in parts 5a and 5b.
6. Let ρ̂2 be the maximum likelihood estimator of ρ for $L(\rho|x) = \prod_{i=1}^{N} f(x_i|\rho)$. Assuming one of the Monte Carlo samples generated in exercise 5 is the result of the real experiment, compute:
(a) The estimated value ρ̂2.
(b) The variance of the estimator ρ̂2 using the logLmin + 1/2 rule, and compare it with the values obtained in exercise 5.
(c) The 95% confidence-level interval for the estimated value ρ̂2.
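One possible way to approach the sampling, moment and likelihood steps in exercises 2, 3 and 6 is sketched below, using only the Python standard library. Everything named here is an assumption of the sketch rather than something prescribed by the exercises: the accept-reject method with a flat envelope `F_MAX = 8/3` (an upper bound of f on [0, 1] that holds for 0 ≤ ρ ≤ 1), the helper names, the seed, and the coarse grid scan of the negative log-likelihood, which stands in for a proper numerical minimizer.

```python
import math
import random

F_MAX = 8.0 / 3.0  # assumed flat envelope: upper bound of f on [0, 1] for 0 <= rho <= 1


def pdf(x, rho):
    """f(x) = 4 x^2 [ (3 - 3x) + (2/3) rho (4x - 3) ] for x in [0, 1]."""
    return 4.0 * x**2 * ((3.0 - 3.0 * x) + (2.0 / 3.0) * rho * (4.0 * x - 3.0))


def sample_pdf(rho, n_entries, rng):
    """Accept-reject sampling with a flat envelope of height F_MAX."""
    sample = []
    while len(sample) < n_entries:
        x = rng.random()              # candidate, uniform in [0, 1)
        u = rng.random() * F_MAX      # uniform height under the envelope
        if u < pdf(x, rho):           # keep the points that fall below the curve
            sample.append(x)
    return sample


def mean_and_variance(sample):
    """Sample mean and unbiased (1 / (N - 1)) sample variance."""
    n = len(sample)
    mean = sum(sample) / n
    var = sum((x - mean) ** 2 for x in sample) / (n - 1)
    return mean, var


def neg_log_likelihood(rho, sample):
    """-ln L(rho | x) = -sum_i ln f(x_i | rho)."""
    return -sum(math.log(pdf(x, rho)) for x in sample)


rng = random.Random(12345)            # arbitrary seed, for reproducibility only
data = sample_pdf(rho=0.75, n_entries=1000, rng=rng)

mean, var = mean_and_variance(data)
var_of_mean = var / len(data)         # variance of the mean estimator
print(f"sample mean     = {mean:.4f} +- {math.sqrt(var_of_mean):.4f}")
print(f"sample variance = {var:.4f}")

# Coarse scan of -ln L over rho in [0, 1]; a real minimizer would be finer.
grid = [i / 1000.0 for i in range(1001)]
nll = [neg_log_likelihood(r, data) for r in grid]
nll_min = min(nll)
rho_hat = grid[nll.index(nll_min)]

# Grid points with -ln L <= min + 1/2 delimit the approximate one-sigma interval.
inside = [r for r, v in zip(grid, nll) if v <= nll_min + 0.5]
print(f"rho_hat = {rho_hat:.3f}, one-sigma interval ~ [{inside[0]:.3f}, {inside[-1]:.3f}]")
```

The flat envelope is simple but not the most efficient choice, and the grid resolution of 0.001 limits the precision of both the estimate and the interval. Replacing the scan with a numerical minimizer, or the threshold 1/2 with the value appropriate for a 95% interval, follows the same pattern.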