statistics assignment lab 2

week5labassignmentdescriptionvideotranscript.docx

Hello. Today I want to give you a brief overview of what you're going to be doing for the week five lab. To get to your week five lab, go to Modules in your course, come down to week five, and you'll see the week five lab assignment there. When we open the week five lab assignment, one thing you'll notice is that you are going to be using your week three and week five Excel spreadsheets. To get the week three spreadsheet, in case you haven't used it, go to Modules, and under week three go to the lesson; the lessons are where all the spreadsheets are located. So we have the week three lesson, and here's the spreadsheet. And of course, for week five, go to the week five lesson, and here's the week five spreadsheet. You're going to need both for the week five lab.
Not only do I want to give you an overview of the lab, but first I want to talk about the content that the lab is assessing. So make sure that you download the week five lecture notes. I'll give you a moment to download them, and then follow along with the lecture. Feel free to mark these up. These are your notes, so anything that you want to write down to help you remember the content, feel free to do so. And of course, you can use this as a resource when you actually complete your lab. As I said before, you'll need the two Excel spreadsheets: the week three spreadsheet and the week five spreadsheet.
What I want to do now is talk about data. For example, let's say that I'm the principal of a school, and I want to know how my seniors have done on their math midterm exam. I want a good representation of how all of my seniors did on the exam. So what I did is I picked a class and said, hey, there are ten people in this class, and these are their grades.
So I have these ten students in one classroom, and I pull their grades. Here they are: 50, 68, 74, 77, 80, 86, 86, 80, 78, and 90. And I'm going to base how all of my seniors did on this class. Okay? So the first thing we're going to do, since we're gathering this data, is create a description spreadsheet to describe the students surveyed. So what could we do? I'm going to open up a new Excel spreadsheet. I'm going to make it bigger because my eyes are so bad. We could put a name, we could put gender, we could put age. What else? Where they live, the city in which they live. I know that a school here in my town consists of students who live in three different cities. They're very close to each other. They aren't huge cities, just three different towns that make up the school, so we could put city, and then of course grade. There are other things that we can look at, right? But these are just a few. So I'm going to make this Excel worksheet to describe the people in my sample, okay?
And so, how did you choose the participants for your study? What was the sampling method? Well, in order to determine the sampling method, you have to know the different types, so I'm going to go over a few here.
One of them is cluster. A cluster sample takes place (let me draw, here we go) when a population is divided into groups, one group is chosen, and every person in that chosen group is part of the sample. Okay? So if I were to draw a picture to illustrate this, it's like you have, let's say, an entire area. I take this area and I divide it into different groups: this group here, this group here. Then I pick a group, and I sample every single person in that group. That's a cluster. Something you can think about when you think about clusters is zip codes. That's one.
Everybody who lives in this town is going to have this zip code. Everybody who lives in this other town is going to have another zip code. Those are clusters. If your town has more than one high school, for example, and you're trying to determine which school your kid is going to go to, the maps are probably broken up into clusters. Everyone who lives here is going to go to this high school, and everyone who lives here is going to get bussed to this high school. That's how they figure out who goes where: by clusters.
Stratified: the population is divided into groups. A lot of students get cluster and stratified confused, so it's good that we're talking about it now. The population is divided into groups, so, so far the same, but a sample is chosen from each group. So if you go back to our picture, you have this population and the population is broken up. Not just one group is chosen; every single group is chosen. I'm going to take a few from here, a few from here, a few from here, and a few from here. That's stratified. And the groups usually have some type of label, and those are called strata.
Let me give you an example. Let's say that I needed to have my blood drawn. I'm not a nurse or a doctor, okay, and I'm making this up, but let's say that you have to draw my blood in the morning, in the afternoon, and in the evening, to check glucose levels or something. I'm splitting my day up into morning, afternoon, and evening. Then you're going to pick some times in the morning to draw my blood, some times in the afternoon, and some times in the evening. You divided my day, right, the hours in my day. So that's you dividing into groups and then picking times within each group to test my blood. That would be stratified.
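Since cluster and stratified are the two methods that get confused most, here is a minimal Python sketch of the difference. It is not part of the lab. The population (four classrooms of five students each), the function names, and the fixed random seed are all made up for illustration.

```python
import random

# Hypothetical population: 4 math classrooms (groups) of 5 students each.
population = {
    f"class_{c}": [f"class_{c}_student_{s}" for s in range(1, 6)]
    for c in "ABCD"
}

def cluster_sample(groups, rng):
    """Cluster: pick ONE group at random and take every member of it."""
    chosen = rng.choice(sorted(groups))
    return list(groups[chosen])

def stratified_sample(groups, per_group, rng):
    """Stratified: take a few members at random from EVERY group (stratum)."""
    sample = []
    for name in sorted(groups):
        sample.extend(rng.sample(groups[name], per_group))
    return sample

rng = random.Random(0)  # fixed seed (an assumption) so the sketch is repeatable
cluster = cluster_sample(population, rng)
stratified = stratified_sample(population, 2, rng)

print("cluster:   ", cluster)
print("stratified:", stratified)
```

In the cluster result every student comes from the same classroom, while the stratified result has two students from each of the four classrooms.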
Systematic is if you choose every kth person to add to your sample. So if I was looking at every third person that walked in the door and I asked them to be a part of my sample, that would be systematic.
Convenience is honestly the easiest one to do, but it is not usually the best representation of your population. For example, if you had an assignment where you have to go and talk to ten adults, which ten adults would you choose? You would probably choose your significant other or your mom, or you would talk to your siblings, something like that: some adults that you know, and you'd say, hey, I have this assignment, help me out real quick and answer these questions. Of course they're going to do it, because they're trying to help you, right? And so sometimes it's just not a great representation of the overall population that you're trying to make an inference about. That's why convenience, even though it's the easiest one to do, is usually not the best.
Simple random is when everyone has the same chance of being selected. It's like when you put names in a hat: you're taking everybody and putting their name in, shaking it up, and drawing a name out. So everyone pretty much has the same chance of being selected. That would be simple random.
Ok, so I want you to think about the scenario that I just came up with. I'm a principal at a high school. All of my students are in class, and I'm going down the math wing, because everybody's in math class, and I want to see how my seniors are doing on their midterm exam. So I choose one classroom, I take every single person in that classroom, and I pull their midterm grades. So what sampling method would that be? If you said cluster, you are correct.
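Systematic and simple random sampling can be sketched the same way. This is only an illustration, not part of the lab: the list of 30 people, the function names, and the seed are hypothetical.

```python
import random

# Hypothetical list of 30 people, in the order they walked through the door.
people = [f"person_{i}" for i in range(1, 31)]

def systematic_sample(items, k):
    """Systematic: take every k-th item (the k-th, 2k-th, 3k-th, ...)."""
    return items[k - 1::k]

def simple_random_sample(items, n, rng):
    """Simple random: names in a hat, every item equally likely to be drawn."""
    return rng.sample(items, n)

every_third = systematic_sample(people, 3)
hat_draw = simple_random_sample(people, 10, random.Random(0))

print("systematic:   ", every_third)
print("simple random:", hat_draw)
```

With k = 3 the systematic sample starts at the third person and ends at the thirtieth, while the hat draw picks ten different people with no pattern.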
It's cluster because, if you think about it, I have this whole hallway that's divided into classes, math classes. Then I chose one math class, and I pulled the grades of every single person in that math class to help me with this study.
What part of the country did your study take place in? I'm in Texas. What are the age ranges of your participants? 16 to 18 years old. How many of each gender did you have in your study? Okay, so then I can go back to my description, and I'm pulling the answers to these questions from the description sheet that we're coming up with. That's why we want to come up with this kind of description sheet, so we can tell a little bit about our sample. And then, what are other interesting factors about your group? So you want to list some interesting facts.
Take a screenshot of your description spreadsheet and provide it below. It's very important that you learn how to take a screenshot. There are plenty of resources you can Google, and I'm pretty sure you can reach out to your instructor and they'll let you know how to take a screenshot. It's different for a Mac versus a PC. Here I have students A, B, C, D, and then I'll fill in their gender and their ages, 18. Feel free to do that as well; feel free to open up a sheet. Hopefully you get the point. So we have this description of your data. Then I'm going to take a screenshot. I have a Mac, so I'm just using the Grab application to come over here and copy, and then I'm going to come over here and provide my screenshot below. So hopefully you'll do a description Excel spreadsheet and then bring it to your lab and paste it in. Okay, so I pasted the little bit that I had in my sheet, but hopefully you can do more, because I want you to come up with ten students.
So you come up with ten different names, their genders, ages, and the city they live in, and then I want you to post these grades next to them. Yours is going to look a little bit different than mine, but I'm giving you an idea of what you should be trying to do. Then take a screenshot and provide it.
Now we're going to take a screenshot of our preliminary calculations, and we're going to do that together, okay? You want to get your week three spreadsheet, and this is what our week three spreadsheet looks like. I'm going to get rid of this one. When you have your week three spreadsheet, you're going to enter the data points. So let's enter our data into the week three spreadsheet. We're going to start with 50, and I'm going to go across: 50, 68, 74, 77, 80, 86, 86, 80, 78, and 90. Make sure that you have your spreadsheet open and you're entering them in as well. And of course, I just want to take a little look to make sure that I put them in correctly. Okay, here we go. So here are the ten values. And as you can see, as I was entering those values, the yellow values were changing, right? Because it's an automatic calculation. So we have our mean, median, and mode, okay? And then we also have sample variance and population variance, which we're going to talk about here in just a second.
So if you come here, I want to take a screenshot of my preliminary calculations. I come over here and grab my preliminary calculations. Copy. Paste. There we go. Okay, and then let's go ahead and pull values from the table. So what is our mean? Our mean is 76.9. Our sample standard deviation is 11.396 for the score on the midterm. Now I want to talk to you about something: sample standard deviation versus population standard deviation. It's not both; you either have a sample or you have a population. Ok.
You don't want to add both of these values into your report. You have to know: is what you pulled a sample, or is what you pulled a population? So whenever you're doing a study, you want to ask, wait, did I just talk to a sample, or did I talk to everybody that could possibly be talked to for this representation? This is a sample. We didn't pull the grade of every single senior at school, every single senior who took the midterm. We did not do that. We only pulled ten of the seniors. So this would be a sample. So at no time would I use the population variance and population standard deviation. This is a sample; therefore, I disregard the population values. If it was a population, I would disregard the sample values. You do not use both. I know it calculates both, but remember, technology only takes you so far. It's going to calculate both, but you have to know the difference between whether you have a sample or whether you have a population.
Okay, so let's pretend you also took the midterm and you're a senior. This is not me; I'm the principal. So you took the midterm, you're a senior, and what did you make? Let's say that we all scored 85. So everyone put 85; pretend that you're a senior who took the midterm and you scored an 85. So let's compare our score. How does your score compare to the mean? We did better, or higher, than the mean. I'm writing in capitals, as you can see. So we're above the mean. Is your score more or less the same? I would say we're above the mean score; the mean was 76.9 and we scored an 85. So you want to write a little sentence saying that you were above the mean, the mean being 76.9 and you scoring an 85.
Okay, so now what we're going to do is look at the empirical rule.
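As a quick check on the preliminary calculations, here is a minimal Python sketch (not part of the lab, which uses the Excel spreadsheets). It assumes the ten grades are 50, 68, 74, 77, 80, 86, 86, 80, 78, and 90, a list reconstructed from the lecture that is consistent with the mean of 76.9 and the sample standard deviation of about 11.396 read off the week three spreadsheet. The variable names are made up for illustration.

```python
import statistics

# Assumed grades: reconstructed from the lecture, consistent with the
# mean (76.9) and sample standard deviation (about 11.396) quoted there.
grades = [50, 68, 74, 77, 80, 86, 86, 80, 78, 90]

mean = statistics.mean(grades)
sample_sd = statistics.stdev(grades)       # sample formula: divides by n - 1
population_sd = statistics.pstdev(grades)  # population formula: divides by n

my_score = 85  # the score we pretended to make on the midterm

print(f"mean          = {mean:.1f}")
print(f"sample SD     = {sample_sd:.4f}")
print(f"population SD = {population_sd:.4f}")
print("above the mean" if my_score > mean else "at or below the mean")
```

Just like the spreadsheet, the code will happily compute both standard deviations. Because the ten students are a sample of all the seniors, only the sample value belongs in the report.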
With the empirical rule, we are going to determine where 68%, 95%, and 99.7% of the values of the data lie. So go back to your week three spreadsheet. No, just kidding, week five. So now I'm going to open the week five spreadsheet. Here's the week five spreadsheet, and if you look at the bottom you have a tab for the empirical rule. So I'm going to click on the empirical rule. I'm going to make it a little bit larger just because my eyes are bad. Alright, here we go. Now, again, in this spreadsheet, you do not touch anything in yellow; you only touch the blue. If you notice, our blue is asking for the mean. Let me go back to our lab here. Our mean was 76.9. So 76.9; check how the yellow changes. There we go. And our standard deviation is 11.3964. And there we go. So there's the empirical rule. Let's take a screenshot. We know how to do that now. So there's the screenshot of my data.
Now, what do these values tell you? Why do we even do the empirical rule? What is this telling us? A lot of students make the mistake of saying, oh, well, that means that 68% of my sample falls between here and here. But remember, we're taking this class, these ten students in this class, and we're going to make an inference about our entire population. So what we're saying is, and I'm going to write this out: I believe that 68% of the midterm test scores will fall between 65.5 and 88.3. So 68% of the scores will fall between those two values. 95%, that's almost the majority of the test scores, right? Not almost; it is the majority of the test scores. 95% of the midterm test scores will fall between 54.1 and 99.7. Okay? 95% of the scores are going to fall between there.
And then last but not least, 99.7% of the midterm test scores will fall between 42.7 and 111.1, which is way above the hundred percent mark, okay? So we think that almost all of your data is going to fall between here. Is there a possibility that scores fall outside of that? Sure, and that's why it's only 99.7. But we're thinking that in between here, here, and here is the majority. Look at that, that's a huge chunk of our curve. Almost all of the scores are going to fall between 42.7 and 111.1. That's what those values tell you. So I'm pretty sure, as I get all of these scores in, that 95% of the scores are going to fall between 54.1 and 99.7. So that's kind of how my seniors are doing.
Alright, then what we can do is look at a normal distribution. I already took a screenshot of my empirical rule, so now take the normal distribution sheet. Where's the normal distribution sheet located? In the same place. So let's go back to our week five lab, the week five spreadsheet, and if you come down here, you'll see a tab for normal probability. So here's normal probability. This is the sheet. Now over here, for the z-score, we're going to put in our score. Our score was an 85. So whenever you're asked for x, they're talking about your score. What was the mean? I keep forgetting. The mean was 76.9. So 76.9. And our standard deviation was 11.3964. Again, you only touch the data in blue, not the data in yellow. So what it did is it changed our score, which was an 85, into a z-score. Our z-score is 0.71, meaning that we scored above average, but we're still within one standard deviation of the mean. So we're pretty normal. We're right in here in the 68% of the data, because here's the mean, and we're at 0.7, so we're right up here above the mean.
So we're still within the 68% of the data, within one standard deviation, which is completely normal. We scored normal. It's not a shocking score. And then we can take a screenshot of our normal distribution. So let's do that real quick. And we want to know, based on our study, what percent of students scored lower than us and what percent scored higher. So before I take a screenshot, I'm going to put in those values. This right here, as you can see, is the normal distribution, the bell curve, and it's shaded red. The "less than" section tells us the percentage that's below the x value. Now remember, our x value is 85, right? Our mean was 76.9. Our standard deviation was 11.3964. So as you can see, 76.14% of people scored lower. Or I should say, the probability that they will score lower is 76.1%. What about more than? Our score was an 85, the mean was 76.9, and our standard deviation was 11.3964. Make sure that you're putting these in as I'm doing it, and make sure that you're getting these answers over here on the right. We're not going to use "in between." You can use it, but all it means is the probability of scoring between two values. We're only going to use the "less than" and "more than" for this lab.
So now I'm going to take a screenshot of my answers, and then we're going to talk about them. Back to the lab. So based on my study, what percent of study participants scored lower than me? It's 76.14. So 76.14% of participants scored lower than my score of 85. What about higher? Here's the higher. So 23.86% of participants scored higher than my score. And that's what that tells us. And that's it. Those are all of the components of the lab.
So let's just kind of review. Everyone should have the same answers if you've been following along and filling in your sheets. So let's go back to the top.
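For anyone who wants to double-check the week five spreadsheet by hand, here is a minimal Python sketch of the empirical rule, the z-score, and the "less than" and "more than" percentages. It is not part of the lab. It assumes the mean of 76.9, the sample standard deviation of 11.3964, and the score of 85 from the lecture, and it assumes the scores follow a normal distribution; the variable names are made up for illustration.

```python
from statistics import NormalDist

mean, sd = 76.9, 11.3964  # values taken from the lecture's spreadsheet
my_score = 85             # the x value used in the lecture

# Empirical rule: mean plus or minus 1, 2, and 3 standard deviations.
intervals = {}
for k, pct in [(1, "68%"), (2, "95%"), (3, "99.7%")]:
    intervals[pct] = (mean - k * sd, mean + k * sd)
    low, high = intervals[pct]
    print(f"{pct} of scores between {low:.1f} and {high:.1f}")

# z-score: how many standard deviations the score sits above the mean.
z = (my_score - mean) / sd

# Normal model (an assumption) for the share scoring lower and higher.
dist = NormalDist(mu=mean, sigma=sd)
p_lower = dist.cdf(my_score)
p_higher = 1 - p_lower

print(f"z-score      = {z:.2f}")
print(f"scored lower  = {p_lower:.2%}")
print(f"scored higher = {p_higher:.2%}")
```

If the inputs match the lecture, the printed values should line up with the yellow cells on the empirical rule and normal probability tabs, up to rounding.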
Hopefully you understood that we have this data. We made a quick description sheet. Hopefully yours is a little bit more detailed, but you can put names, gender (you can make it up), age, the city that they live in, and then their grade. You should have ten, and you can list these ten values out as their grades. We talked about different sampling methods. Then we actually dove into the calculations. We took the preliminary calculations for the exam, which gave us the mean and the sample standard deviation, which we wrote down. We picked a score, an 85, and we compared our score to the mean. Then we completed the empirical rule by putting the mean and the standard deviation in, which told us where that data is going to fall. So 68% of the scores fall between here and here, 95% of scores fall between here and here, and so on. Then we did a normal distribution, where we used our score to compare. So we had an 85, and I put in the mean and the standard deviation in blue. And so the percentage that's going to score less than I did is 76.14%, and the percentage that's going to score higher than I did is 23.86%.
So now what I'm going to do is give you a really brief recap of what you're going to do in the week five lab, okay, so you can keep these for your notes and use them when you complete the lab. So let's look at the lab. Your instructor will provide you with ten values to use for this lab: ten heights of people. So your instructor will provide you with ten to give you a good start, and you are going to gather ten more of your own to add to your professor's ten. So you will have 20 values of people's heights. Your professor will provide you with ten people's heights, and you need to gather ten more. So you need to go gather data, go measure people, and find out what their heights are to bring to the lab. Then you're going to post a screenshot of your spreadsheet.
That is, a screenshot of your preliminary calculations. So for example, if your professor gave you these numbers, you're going to add ten more, so your spreadsheet should have 20 values. And then remember, you're going to find the mean and the sample standard deviation. Give us some background on the people that you chose; maybe come up with an Excel spreadsheet. You don't have to give names or personal information, but you could do gender, age, location. And then of course, let us know their heights. How did you choose your participants? Was it a systematic sampling method? Was it convenience? Was it cluster or stratified? And let me clarify one thing. I know that I said in my lecture that convenience is not always the best, but it's still used very often. So don't feel that you're going to get points taken off just because you used a convenience sample. But I want you to know what a convenience sample is, and make sure that you're choosing the sampling method that describes what you did to gather data. So you're going to answer these questions.
Then you're going to use the empirical rule. You're very comfortable with this now, since we just did it. You're going to put in the mean and the sample standard deviation, tell me what these values tell you about the heights of the people in your study, and post a screenshot of your work. Then you're going to use the normal distribution, which we've used, and you're going to post a screenshot comparing your height. So you want to make sure you know how tall you are. You're going to put how tall you are in for x, and the mean and standard deviation based on the 20 values. And then tell me what this means, and that's it. Then you're going to save your report and upload your lab. So hopefully going through the example with the midterm exams gives you an idea of what you're supposed to do with 20 people's heights. If you have any questions, feel free to reach out.