Network Science (Probability Modeling)

profilekomra123
7210-sp21-PS12.pdf

CS 7210: Network Science Spring 2021

Problem Set 1: Probability Modeling

We will use this problem set to present additional concepts from probability theory that will prove useful later in the course. Scan or otherwise ‘back-up’ any pen and paper submission.

1. Practice with Expectation. Define Cov(X,Y ) as the expected value of the random variable (X −E[X])(Y −E[Y ]). This is called the covariance of the random variables X and Y . Show that Cov(X,Y ) = E[XY ]− E[X]E[Y ]. Then explain why Cov(X,Y ) = 0 if and only if X and Y are independent.

2. Conditional Expectation. Consider the following random number generator. First, a fair coin is flipped. On heads, the generator chooses a random number with uniform probability between 2 and 6. On tails, the generator chooses a random number with uniform probability between 8 and 12. Let X be the outcome of the coin flip and Y the number generated.

(a) Find an analytical form for the CDF of Y . Use it to derive E[Y ]. (b) Introduce E[Y |X] as the conditional expectation of Y . It is a random variable whose

support is the set of Y ’s expectations for each value of X. Present the pdf of E[Y |X].

3. Total Expectation. The law of total probability says

P(A) =

∫ B P(A|B)P(B)

where ∫ B

defines the integral (sum if B is discrete) over the full support of B (e.g. ∫ B

= ∫ ∞ −∞

if B ∈ R). Understand this law as the definition of the marginal of A: P(A) = ∫ B P(A,B)

where P(A,B) = P(A|B)P(B) by the definition of conditional probability. Further note how we used the discrete version of the law when defining the normalizing constant of Bayes Rule.

The law of total expectation says something analogously: E[B] = E[E[B|A]] (Note that E[B|A] is a random variable since it is a function of A). Derive E[Y ] from Problem 1 again using the law of total expectation.

4. Parameterized Density Functions. Consider a random variable X with pdf

fX(x ; λ,θ) = κe −λ|x−θ|

where x ∈ R and has fixed parameters λ > 0 and θ ∈ R (In the network science and ML literature we use ; to note fixed parameters. Fixed parameters are also called user-defined or hyper-parameters in the sense that their value is assumed to be provided and fixed.

(a) Show why the constant κ must be equal to λ/2.

1-1

1-2 Homework 1

(b) Derive FX(x). Hint: Note that fX(x) is symmetric around θ.

(c) Show that E[X] = θ.

5. Joint Distributions and Entropy. Next is a primer on some basics of information theory that will come in handy later, embedded with some problems involving expectations and joint distributions.

Consider a random variable X having support {x1,x2, ...} and pmf fX(x) = p(x). For some realization xi of X, Shannon’s theory of information defines the information conveyed by observing xi as I(X = xi) = − log p(xi). Think as if X is a source of information and its support as the set of different types of information it can send. If this source tells you the same thing often, or tells you information that it would send with high probability, then the source is not really conveying significant information. But if the source tells you something surprising (informaation xi with low p(xi)), that is a much richer signal. For example, an alarm system X sending the symbol ‘no intruder’ is a very high probability event, so high (we hope) that in practice it may as well not send you anything. If X sends ‘intruder present’, a low probability message, surely you agree that this is a far more informative message. The form of I(xi) is chosen to satisfy some intuitive properties of information.

The entropy H of a random variable X is the expected information conveyed when you observe a realization of X:

H(X) = E[I(X)] = −E[log p(x)] = − ∑ xi

p(xi) log p(xi)

(a) Let X be the result of a fair 6-sided die roll and Y the result of a fair coin flip where Y = 1 if the coin flips heads and 0 if it flips tails. Define A = X + Y and B = X −Y . Show that A and B have identical entropy.

(b) The joint entropy of two random variables H(X,Y ) is defined naturally:

H(X,Y ) = −E[log p(x,y)] = − ∑ xi,yj

p(xi,yj) log p(xi,yj)

Find the joint entropy of A and B.

(c) For two random variables X,Y the entropy of X conditioned on realizing Y = yj is

H(X|yj) = −EX[log p(x|yj)] = − ∑ xi

p(xi|yj) log p(xi|yj)

Then the conditional entropy of X given Y is the expected entropy of X over all possible realizations of Y :

H(X|Y ) = EY [H(X|yi)]

Show that H(X|Y ) = ∑

x,y p(x,y) log p(y) p(x,y)

and that H(X,Y ) = H(X|Y ) + H(Y ). (The latter is the chain rule of entropy which says, intuitively, that the expected infor- mation content of observations from the joint distribution of X,Y equals the expected information content of observations from Y alone plus the expected additional informa- tion from observing X given an observation of Y ).

Homework 1 1-3

(d) Return to the scenario in part (a). Compute H(A|B) and H(B|A) and verify that H(A,B) = H(A|B) + H(B) = H(B|A) + H(A).

6. Simulation and Visualization. Consider the following scenario: we have n people in a room who do not know each other. A coin is flipped exactly once for each pair of people, and if the coin flips heads, the pair exchange business cards with each other. The coin flips heads with probability p.

(a) Code a simulation of the above scenario. Present a plot of the average number of business cards each person receives in your simulation for p = 0.1, 0.3, 0.5, and 0.8 as n varies. Choose n = {10, 20, 50, 100, 200, 500, 1000}. Include bars around each point that illustrates the range of business card exchanges that happen 95% of the time. Show curves (not just points) for different values of p on the same figure. Be sure to run the simulation for a fixed n and p a large number of times so that you have enough data to compute a meaningful average.

(b) For p = 0.3 and n = 20, visualize three simulation outcomes as a network where an edge denotes a business card exchange between two individuals.

Very Important: All plots and visualizations should be attractive, meaning the plots and networks should be presentable in a text book or academic research paper and be aesthetically pleasing. Take the time to learn to create an attractive plot. You will lose points for submitting “lazy looking” plots.

7. A modeling problem. There are n different types of prizes you can win at a festival game. Suppose that every time you win it is equally likely that you get any type of these prizes. Let X be the random variable corresponding to the number of wins you need before obtaining one of every type of prize.Let Xj be the random variable corresponding to the number of wins needed after j different types of prizes have been won until a new type of prize is obtained for j = 0, 1, ...n− 1. What is E[X]? To answer this:

(a) First show that X = ∑n−1

j=0 Xj.

(b) Now, after j types of prizes have been won, on your next win, what is the probability of getting a new type of prize?

(c) Let p be the answer from above. Use p to present the probability mass function of Xj.

(d) Combine (a) and (c) to derive E[X].