Stats Discussion


Elizabeth Boissy 

Week 3: Probability Distributions


1. A discrete probability distribution describes a discrete variable: one that can take only a countable set of separate values, such as whole numbers. A simple example with real-world implications is class size, the number of students per class. The number of students is a discrete variable; you cannot have 1.5 or 1.75 students in a classroom. Consider a hypothetical data set gathered from numerous elementary schools county-wide over a single school year. The Schools and Staffing Survey (SASS) gathers this type of information and reports that the average class size for a self-contained class, the kind you might see in a Massachusetts public elementary school, is 20 students. Because public schools are structured around state mandates, it seems reasonable to assume that class sizes cluster symmetrically around this average and approximately follow a normal distribution (strictly, the normal distribution is continuous, so here it serves as an approximation to whole-number counts). Extreme outliers or strong skew in the data could point to violations of state education guidelines. In this scenario, a plausible standard deviation might be 3 students, meaning a typical class differs from the mean of 20 by roughly 3 students in either direction. Class size is often debated as a factor in student success, so understanding its distribution could help individual elementary schools, school districts, or the county as a whole identify stressors and successes for students and schools. Smaller classes allow for more one-on-one attention between teacher and student, which can translate into greater engagement and better retention of learning. As COVID-19 remains a prevalent threat across the U.S., class size matters even more for student and teacher safety as schools reopen.
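Because the normal model in this example is continuous while class sizes are whole numbers, one common way to turn it into a discrete distribution is to give each whole number k the normal probability between k - 0.5 and k + 0.5 (a continuity correction). The minimal Python sketch below does this, assuming the hypothetical mean of 20 and standard deviation of 3 from the post; the function names are illustrative only.

```python
import math

def normal_cdf(x: float, mean: float, sd: float) -> float:
    """Cumulative probability P(Y <= x) for a normal variable Y."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def class_size_pmf(k: int, mean: float = 20.0, sd: float = 3.0) -> float:
    """Approximate P(class size == k) as the normal probability between
    k - 0.5 and k + 0.5 (continuity correction)."""
    return normal_cdf(k + 0.5, mean, sd) - normal_cdf(k - 0.5, mean, sd)

# Hypothetical values from the post: mean 20 students, standard deviation 3.
pmf = {k: class_size_pmf(k) for k in range(0, 41)}

total = sum(pmf.values())                          # very close to 1 over 0..40
p_17_to_23 = sum(pmf[k] for k in range(17, 24))    # within one SD, in whole students
p_25_or_more = sum(pmf[k] for k in range(25, 41))  # unusually large classes

print(f"P(X = 20)        = {pmf[20]:.3f}")
print(f"P(17 <= X <= 23) = {p_17_to_23:.3f}")
print(f"P(X >= 25)       = {p_25_or_more:.3f}")
```

Under these assumptions roughly three quarters of classes would have between 17 and 23 students, and only a few percent would have 25 or more.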

2. Staying with the theme of public education, an example of a continuous probability distribution could be the physical space available within school buildings. Room dimensions measured in feet and inches are continuous variables: length and width are not restricted to whole numbers and can take any value greater than 0. New safety standards being quickly adopted in public spaces during COVID-19 include staying at least 6 feet apart from one another. To maintain the average class size of 20 students, you would need to know how much physical space each classroom has for social distancing. A hypothetical data set of classroom floor areas would also plausibly follow a normal distribution, perhaps with a standard deviation of 15 square feet. While school buildings vary in size and style, classrooms are traditionally rectangular and fit 20 to 30 desks comfortably. In this hypothetical data set, suppose the average classroom area across the county is 900 square feet. The distribution can then estimate the proportion of classrooms expected to fall below or above the minimum size required by the new social distancing measures. Using the 6-foot separation as a guide and a goal of 20 students plus at least one instructor per classroom, schools are estimating that classrooms will need at least 1,029 square feet. The continuous probability distribution will help school administrators estimate how many classrooms can be used safely while still practicing preventive measures.
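Plugging in the hypothetical numbers from this paragraph: a classroom meets the 1,029 square foot threshold only if its area is (1029 - 900) / 15 = 8.6 standard deviations above the mean, so under these particular values almost no classrooms would qualify. A minimal Python sketch of the calculation, using only the standard library and assuming the normal model and figures stated in the paragraph:

```python
import math

def prob_at_least(threshold: float, mean: float, sd: float) -> float:
    """P(Y >= threshold) for a normal variable Y, via the complementary error function."""
    z = (threshold - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# Hypothetical values from the post, in square feet.
MEAN_AREA = 900.0
SD_AREA = 15.0
REQUIRED_AREA = 1029.0

z_score = (REQUIRED_AREA - MEAN_AREA) / SD_AREA
p_big_enough = prob_at_least(REQUIRED_AREA, MEAN_AREA, SD_AREA)

print(f"z-score: {z_score:.1f}")
print(f"Share of classrooms with at least {REQUIRED_AREA:.0f} sq ft: {p_big_enough:.2e}")
```

Multiplying `p_big_enough` by the number of classrooms in the county would give the expected number of usable rooms. `math.erfc` is used instead of `1 - cdf` so that very small upper-tail probabilities do not round to zero.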


Shannon Song 

Hypergeometric & Exponential Distributions


1. Discrete Probability Distribution

The hypergeometric distribution is a discrete probability distribution that gives the probability of obtaining a given number of successes when drawing without replacement from a finite population. A brief example shows how this distribution is used and why it differs from others: say we have 5 balls in a container, 3 of them glaucous (bluish-green) and the remaining 2 eburnean (ivory). If we wanted to know the probability of picking glaucous balls in the first three picks without replacing them, we would use the hypergeometric distribution. Some might ask why the binomial distribution is not used instead, since it is simpler to calculate and seems like it might work for this situation. The main reason is that the binomial distribution requires the probability of success to stay the same on every trial, and without replacement each pick changes the makeup of the container. We could use the binomial distribution if we replaced each ball after drawing it.

A real-life example of the hypergeometric distribution: suppose a manager at a company wants to survey a random sample of employees about job satisfaction. The company is relatively small, with only 13 male employees and 18 female employees. The manager decides to sample 8 employees at random and wants to know how likely it is that exactly 4 of them are male. The formula is P(X = k) = C(K, k) × C(N - K, n - k) / C(N, n), where C(a, b) is the number of ways to choose b items from a, N = 31 is the number of employees, K = 13 is the number of male employees, n = 8 is the sample size, and k = 4 is the number of males we want in the sample. The probability that exactly 4 of the 8 picks are male comes out to about 0.277. The expected value is nK/N = 8 × 13/31 ≈ 3.35 males per 8 picks. No single sample of 8 can contain 3.35 males, but this is the expected value over time: if we kept taking samples of 8, the average number of males per sample would approach 3.35. The variance is n × (K/N) × ((N - K)/N) × ((N - n)/(N - 1)) ≈ 1.493, and taking the square root gives a standard deviation of about 1.22. Why might this be helpful for the manager to know?
Perhaps the manager suspects differences between the two genders, or wants to know how males and females are likely to be split across the 8 picks. If the split is likely to be too lopsided, the manager might create two drawing pools or choose an alternate sampling method. For example, with an expected value of 3.35 males and a standard deviation of 1.22, a typical sample of 8 would contain more females than males, so the manager might decide the results would lean too heavily toward female employees and opt for two separate drawing pools, one for each gender.
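The figures in this post can be checked with a short Python sketch built on `math.comb` from the standard library (Python 3.8+). The helper names are illustrative, not from any particular package. For the ball example, the sketch reads "picking glaucous balls in the first three picks" as all three picks being glaucous (an assumption), and compares drawing without replacement (hypergeometric) against drawing with replacement (binomial).

```python
from math import comb, sqrt

def hypergeom_pmf(k: int, N: int, K: int, n: int) -> float:
    """P(exactly k successes) when drawing n items without replacement
    from a population of N items, K of which are successes."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def hypergeom_mean_var(N: int, K: int, n: int) -> tuple[float, float]:
    """Mean n*K/N and variance n*(K/N)*((N-K)/N)*((N-n)/(N-1))."""
    p = K / N
    mean = n * p
    var = n * p * (1 - p) * (N - n) / (N - 1)
    return mean, var

def binom_pmf(k: int, n: int, p: float) -> float:
    """Binomial P(exactly k successes): the with-replacement counterpart."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Ball example: 5 balls, 3 glaucous; all of the first 3 picks glaucous.
p_no_replace = hypergeom_pmf(3, N=5, K=3, n=3)   # 1/10
p_replace = binom_pmf(3, n=3, p=3 / 5)           # (3/5)**3 = 0.216

# Survey example: 31 employees, 13 male, sample of 8, exactly 4 male.
p_four_male = hypergeom_pmf(4, N=31, K=13, n=8)
mean, var = hypergeom_mean_var(N=31, K=13, n=8)

print(f"Balls, without replacement: {p_no_replace:.3f}")
print(f"Balls, with replacement:    {p_replace:.3f}")
print(f"P(4 of 8 are male): {p_four_male:.3f}")
print(f"Mean {mean:.3f}, variance {var:.3f}, SD {sqrt(var):.3f}")
```

The factor (N - n)/(N - 1) in the variance is the finite population correction; it is what makes the hypergeometric spread smaller than the binomial spread for the same sample. If SciPy is available, `scipy.stats.hypergeom(M=31, n=13, N=8).pmf(4)` should give the same probability; note that SciPy names the population size `M` and the number of successes `n`.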

Reference:

Stat Trek. (2020). Retrieved July 25, 2020, from https://stattrek.com/probability-distributions/hypergeometric.aspx 

 

2. Continuous Probability Distribution 

 

The exponential distribution is a continuous distribution used to model the waiting time between events. It is closely related to the Poisson distribution (a discrete distribution), which instead models the number of events during a time period: if the number of events per unit of time follows a Poisson distribution with rate λ, the waiting times between events follow an exponential distribution with the same rate. One example I see very often is how much time will pass between earthquakes. The distribution has a single parameter, the rate λ (lambda), which is the average number of events per unit of time. Calculating the expected waiting time is very straightforward: it is 1/λ. Finding the standard deviation is just as simple. The variance has the same form as the expected value, but squared (1/λ²), and taking its square root gives a standard deviation of 1/λ, the same as the mean. Returning to our earthquake example, let's say that in some region on earth the average waiting time between earthquakes is 6.3 years. That average waiting time is 1/λ, so the rate is λ = 1/6.3 ≈ 0.159 earthquakes per year, and both the expected waiting time and its standard deviation are 6.3 years. In other words, we would expect about 0.159 of an earthquake per year. Of course, as seen earlier, a fraction of an earthquake doesn't make sense for any single year; however, over a long period of time, the average per year would come out to about 0.159. We can use the exponential formula, P(T ≤ t) = 1 - e^(-λt), to calculate the probability of an earthquake occurring within a given time frame, or how long we are likely to wait between earthquakes. Why is this useful? One industry that might find this immensely useful is the insurance industry. Being able to estimate how often natural disasters (or any disasters, for that matter) occur helps a company ensure it has enough money on hand for payouts when they are needed. Knowing when payouts are likely and in what amounts helps insurance companies price their products and decide whether to offer a particular line of insurance. Let's say there is a region that expects one earthquake per year; insurance companies likely won't offer coverage in this area.
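This same region would also have a standard deviation of 1 year, since in an exponential model the standard deviation always equals the mean; the waiting times between earthquakes would vary widely, with some years passing without any earthquake. Insurance companies may then be more likely to offer insurance in this region, at a premium.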
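For an exponential distribution with rate λ, the mean waiting time and the standard deviation are both 1/λ, and the chance that the next event happens within time t is 1 - e^(-λt). A minimal Python sketch using the hypothetical 6.3-year average wait from this post computes the chance of an earthquake within one year, the chance of waiting more than ten years, and, for the region averaging one earthquake per year, the chance of a year with none. A seeded simulation with the standard library's `random.expovariate` checks that simulated waits average about 1/λ. The variable names are illustrative.

```python
import math
import random

MEAN_WAIT_YEARS = 6.3              # hypothetical average wait from the post
rate = 1 / MEAN_WAIT_YEARS         # lambda: earthquakes per year

def prob_within(t: float, lam: float) -> float:
    """P(T <= t): chance the next event happens within t time units."""
    return 1 - math.exp(-lam * t)

mean_wait = 1 / rate               # expected waiting time, 1/lambda
sd_wait = math.sqrt(1 / rate**2)   # square root of the variance 1/lambda^2

p_within_1yr = prob_within(1, rate)
p_wait_over_10yr = 1 - prob_within(10, rate)    # equals exp(-10 * rate)

# Region averaging one earthquake per year (lambda = 1):
p_quiet_year = 1 - prob_within(1, 1.0)          # exp(-1), about 0.368

# Simulation check: the average of many exponential waiting times approaches 1/lambda.
random.seed(0)
samples = [random.expovariate(rate) for _ in range(100_000)]
sim_mean = sum(samples) / len(samples)

print(f"rate = {rate:.3f}/yr, mean wait = {mean_wait:.1f} yr, SD = {sd_wait:.1f} yr")
print(f"P(earthquake within 1 year) = {p_within_1yr:.3f}")
print(f"P(wait longer than 10 years) = {p_wait_over_10yr:.3f}")
print(f"P(no earthquake in a year, rate 1/yr) = {p_quiet_year:.3f}")
print(f"simulated mean wait = {sim_mean:.2f} yr")
```

The quantity e^(-λt) used for a "quiet" stretch is also the Poisson probability of zero events in time t, which is one way to see the link between the two distributions.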

Reference:

Taboga, M. (2010). Exponential distribution. Retrieved July 25, 2020, from https://www.statlect.com/probability-distributions/exponential-distribution 
