Discussion 4: How Fast is Google?
STA 3111 Activity 4: How fast is Google?
According to an article from Google’s official blog, “Google is fast — a typical search returns results in less than 0.2 seconds.” (Source: http://googleblog.blogspot.com/2009/01/powering-google-search.html) But based on experience, some people suspect that the average search time is actually more than 0.2 seconds. In this activity, we will estimate the average Google search time and test that hypothesis.
Part 1. Collect data by asking Google questions.
We would like to estimate the average search time over all possible questions, but that is impossible. Instead, we randomly choose some questions to ask Google. For example, if you search “How do I heal a broken heart?” on Google, it returns “About 28,200,000 results (0.47 seconds)”, which is shown above the search results. Here, “0.47 seconds” is the search time for this question. Fill in the following table with 10 questions and record the search time for each. If you ask more than 10 questions, simply add more rows to the table. If you find it hard to compose questions, here is an interesting article on the world’s most popular questions that may give you some ideas (http://mkweb.bcgsc.ca/topquestions/). Note that the complexity of what you ask Google affects the search time, so please phrase each question as a complete sentence ending with a question mark rather than as a single word or phrase.
Based on what we have learned, a large sample size has important advantages. First, a large sample size is a condition for statistical procedures that rely on the Central Limit Theorem. Without a large sample size, the decision-making procedure may not be valid. Second, the larger the sample size, the more precise the inference. For a confidence interval at a given confidence level, a larger sample size leads to a narrower interval. For a hypothesis test at a given significance level, a larger sample size leads to a smaller chance of a Type II error. A large sample size is therefore encouraged in any statistical study, including this one. You are encouraged to work with other students to collect as large a sample as you can. If you work with other students, include their names here ______________________________.
| Question Number | Questions for Google | Search Time (Second) |
| 1 | | |
| 2 | | |
| 3 | | |
| 4 | | |
| 5 | | |
| 6 | | |
| 7 | | |
| 8 | | |
| 9 | | |
| 10 | | |
*If you asked more than 10 questions, append the additional questions and their search times to the table.
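To see why a larger sample gives a narrower confidence interval, here is a minimal Python sketch of the margin of error of a Z-based interval, z × σ / √n. The standard deviation of 0.15 seconds is a hypothetical value chosen only for illustration, and the function name `margin_of_error` is our own.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(sigma, n, level=0.95):
    """Half-width of a Z-based confidence interval for a mean."""
    # Two-sided critical value, e.g. about 1.96 for a 95% level.
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return z * sigma / sqrt(n)

sigma = 0.15  # hypothetical standard deviation of search times, in seconds

# Each time the sample size is multiplied by 4, the margin of error is halved.
for n in (10, 40, 160):
    print(n, round(margin_of_error(sigma, n), 3))
```

Because n sits under a square root, going from 10 to 40 questions halves the width of the interval; pooling data with classmates pays off, but with diminishing returns.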
Part 2. Use StatCrunch to find a confidence interval. Answer the following questions.
1. Choose a confidence level ___________. The commonly used levels include 90%, 95%, and 99%. Based on the sample size, which confidence interval procedure will be used, Z-statistic or T-statistic? ____________________.
2. Let StatCrunch do the calculation.
STEP1: Load the data into StatCrunch by the following steps:
a. Copy the “Search Time (Second)” column from the table above, including the header.
b. Launch StatCrunch from the MyLab website.
c. Go to “My Data -> Paste data into a form”. Paste the data in the empty box, check “Use first line as variable names”, and choose “Delimiter:” Tab. Click “Load data”. You will see one column in the workspace with 10 rows if you asked 10 questions.
STEP2: Find the confidence interval of the search time.
a. From the worksheet, go to “Stat -> Z Stats (or T Stats)-> One Sample -> With data”. Choose Z stats or T stats according to your answer in question 1.
b. Select column “Search Time (Second)” by clicking it. Under “Perform” select “Confidence interval”. Input the level you chose in question 1. Click “Compute!”.
c. Fill in the following table with the output. Round numbers to two decimal places.
95% confidence interval results: μ : mean of search time
| Variable | n | Sample Mean | Std. Err. | L. Limit | U. Limit |
| Search Time | | | | | |
3. Write a complete interpretation of this confidence interval:
4. Based on the confidence interval, can you conclude that the average search time is more than 0.2 seconds? Explain.
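If you would like to cross-check the StatCrunch output from question 2, here is a minimal Python sketch of a T confidence interval for the mean. The search times are illustrative placeholder values, not real measurements, and the critical values are standard t-table entries for 9 degrees of freedom, so the sketch as written only applies to a sample of exactly 10 questions. The function name `t_interval` is our own.

```python
from math import sqrt
from statistics import mean, stdev

# Illustrative search times in seconds (placeholder values, not real data).
times = [0.47, 0.52, 0.38, 0.61, 0.44, 0.55, 0.49, 0.35, 0.58, 0.41]

# Two-sided t critical values for df = 9, taken from a standard t table.
T_CRIT_DF9 = {0.90: 1.833, 0.95: 2.262, 0.99: 3.250}

def t_interval(data, level=0.95):
    """Return n, sample mean, standard error, lower limit, upper limit."""
    n = len(data)
    if n != 10:
        raise ValueError("the critical values above are for n = 10 (df = 9) only")
    xbar = mean(data)
    std_err = stdev(data) / sqrt(n)       # sample standard deviation / sqrt(n)
    margin = T_CRIT_DF9[level] * std_err  # critical value times standard error
    return n, xbar, std_err, xbar - margin, xbar + margin

n, xbar, std_err, lower, upper = t_interval(times, level=0.95)
print(f"n={n}  mean={xbar:.2f}  std.err={std_err:.2f}  "
      f"L.limit={lower:.2f}  U.limit={upper:.2f}")
```

The printed columns match the columns of the StatCrunch table above. Replace `times` with your own data to compare the two results.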
Part 3. Use StatCrunch to perform a hypothesis test.
5. Choose a significance level α=_______. The commonly used levels include 0.01, 0.05, and 0.10. Based on the sample size, which hypothesis test should you use, Z test or T test? _______________.
6. We want to test the claim that the average search time is actually more than 0.2 seconds. What are the hypotheses? Denote the mean of search time by μ.
Ho:
Ha:
7. Perform the hypothesis test in StatCrunch.
a. Back in the worksheet, go to “Stat -> Z Stats (or T Stats)-> One Sample -> With data”. Choose Z stats or T stats according to your answer in question 5.
b. Select column “Search Time (Second)” by clicking it.
c. Under “Perform”, select “Hypothesis Test”. Input the null mean and select the sign for the alternative according to the Ho and Ha in question 6. Click “Compute!”.
d. Fill in the following table with the output. Round numbers to two decimal places.
Hypothesis test results: μ : mean of search time
| Variable | n | Sample Mean | Std. Err. | Z-Stat/T-Stat | P-value |
| Search Time | | | | | |
8. Based on the p-value, at the significance level that you chose in question 5, what is your decision, to reject Ho or not to reject Ho? Explain.
9. Write a complete conclusion of the hypothesis test.
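As a cross-check on the StatCrunch output from question 7, here is a minimal Python sketch of a right-tailed one-sample T test. It assumes the T procedure was chosen in question 5 and that the alternative is "greater than" the null mean. The search times are illustrative placeholder values, not real measurements. The p-value is found by numerically integrating the t density with Simpson's rule, which is a stand-in for the statistical tables or software you would normally use; the function names are our own.

```python
from math import gamma, pi, sqrt
from statistics import mean, stdev

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x): Simpson's rule on [0, |x|] plus symmetry about 0."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def right_tailed_t_test(data, null_mean):
    """Return the standard error, T statistic, and right-tail p-value."""
    n = len(data)
    std_err = stdev(data) / sqrt(n)
    t_stat = (mean(data) - null_mean) / std_err
    p_value = 1 - t_cdf(t_stat, n - 1)  # area to the right of t_stat
    return std_err, t_stat, p_value

# Illustrative search times in seconds (placeholder values, not real data).
times = [0.47, 0.52, 0.38, 0.61, 0.44, 0.55, 0.49, 0.35, 0.58, 0.41]
std_err, t_stat, p_value = right_tailed_t_test(times, null_mean=0.2)
print(f"std.err={std_err:.2f}  T-stat={t_stat:.2f}  p-value={p_value:.4f}")
```

Compare the p-value with the significance level α you chose in question 5: reject Ho when the p-value is smaller than α.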
Part 4. Use StatCrunch to check conditions of inferential procedures.
10. Create a dotplot of search times in StatCrunch by the following steps:
a. Go to “Graph->Dotplot”.
b. Select column “Search Time (Second)” by clicking it. Click “Compute!”.
Briefly describe the shape of the dotplot. Is the shape close to being Normal?
11. Considering the sample size and the data distribution shown in the dotplot, do you think it is appropriate to use the previous confidence interval and hypothesis test procedure?
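If StatCrunch is not at hand, a rough text version of the dotplot from question 10 can be sketched in Python. This is only an approximation of what StatCrunch draws: values are rounded to one decimal place and stacked as rows of asterisks rather than columns of dots. The search times are illustrative placeholder values, not real measurements, and the function name `text_dotplot` is our own.

```python
from collections import Counter

def text_dotplot(data, decimals=1):
    """Return one text line per rounded value, with one '*' per observation."""
    counts = Counter(round(x, decimals) for x in data)
    lo, hi = min(counts), max(counts)
    step = 10 ** -decimals
    n_steps = round((hi - lo) / step)
    lines = []
    for i in range(n_steps + 1):
        value = round(lo + i * step, decimals)
        # Values with no observations still get a row, so gaps stay visible.
        lines.append(f"{value:.{decimals}f} | " + "*" * counts.get(value, 0))
    return lines

# Illustrative search times in seconds (placeholder values, not real data).
times = [0.47, 0.52, 0.38, 0.61, 0.44, 0.55, 0.49, 0.35, 0.58, 0.41]
for line in text_dotplot(times):
    print(line)
```

A roughly symmetric, single-peaked pile of asterisks with no extreme outliers suggests that a T procedure is reasonable even with a small sample.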