TAMS #2 Assignment Fall 2020: Using Software for Graphical Analysis, Numerical
Analysis, Correlation and Regression
Purpose: In this assignment you will explore how to use statistical software to create graphical and numerical summaries of data. Once you have created these summaries, you will use them to draw conclusions and answer questions about the data.
The mastery standards covered by this assignment are:
S3 – Graphical Summaries of Data: Students will be able to interpret graphical summaries of sample data.
S4 – Numerical Summaries of Data: Students will be able to interpret numerical summaries and boxplots of sample data.
S5 – Regression and Correlation: Students will be able to use correlation and regression to describe relationships between two variables in a set of data.
MP1 – Quantitative Reasoning: Students will be able to apply basic mathematical skills to interpret real world quantitative information in context.
MP3 – Correct Use of Statistical Tools: Students will be able to utilize tools such as empirical rules, statistical software, calculators and z-score tables with precision and accuracy.
Instructions: To get started, open a copy of the dataset WW GDP Growth Rates.xls¹ in Google Sheets. The link to the dataset is in Canvas. The dataset provides economic data about GDP growth rates for countries worldwide and some additional country data. We recommend that you SAVE a copy of the dataset in your Google account.
Note that this is an INDIVIDUAL assignment, and you are responsible for doing your own write-up. You may have started to work on this TAMS in class as a pair or group, but you are to work on it on your own at home. Although you are allowed to discuss this assignment with other class members, you may NOT work together on an electronic document. “Working together” in this context is ONLY by talking, not by typing. You MUST do your own work in the software, save your own copies of the graphics, and produce your OWN written document. Documents that begin as a shared document and are modified will NOT be accepted and will result in a mastery grade of 0 for everybody involved.
You are to submit your answers using the template posted with the assignment. Your template needs to be submitted online through Canvas.
¹ Data from Feenstra, Robert C., Robert Inklaar and Marcel P. Timmer (2015), "The Next Generation of the Penn World Table," American Economic Review, 105(10), 3150-3182, available for download at www.ggdc.net/pwt
Part I – Exploring Data with Histograms
Much of our understanding of the world comes from the economic headlines provided on the nightly news or online newspaper websites. Some of what we think we understand might be out of date, misinformation, or misunderstood. In this project we are going to examine how world economic data demonstrates different kinds of information about the world. Begin by opening the WW GDP Growth Rates data file in Google Sheets. This data set contains the GDP (Gross Domestic Product) growth rate for the years 1950, 2000, and 2003 – 2013 for the countries of the world, together with additional information for each country.
1) In order to begin exploring data, we need to start by understanding what we are looking at when we look at a dataset.
a) What are the “individuals” or “subjects” in this dataset?
b) In the dataset, find the 138th row. What individual does this row contain data about?
c) Describe the data that is contained in the columns labeled “1950” through “2013”. Is this data quantitative or categorical data? What information does it provide about the individuals in the dataset?
d) List the name of a column that contains categorical data and explain why this data is categorical.
e) What was the population of the country for the 138th row in 1950? In 2011?
2) In the dataset find the column labeled “World Bank regions”.
a) What data is shown in this column?
b) What kinds of graphs would you use to display the data in this column?
Watch the video on “How to get started with Geogebra” (posted on Canvas in Week 5).
3) We are going to create a bar chart of the column “World Bank, four income groups 2017”.
a) What are the values in the “World Bank, four income groups 2017” column?
b) What does the value in the “World Bank, four income groups 2017” column tell you about the country in row 54? (Please include the name of the country in your answer.)
c) Use the “COUNTIF” function in Google Sheets to make a table of how many countries there are in each of the four income groups described in the dataset. Put a screenshot of the table in your answer template.
4) Following the directions in the video that we provided about how to make a bar chart in Geogebra, make a bar chart in Google Sheets representing the data that you calculated in part 3c. Label the bars on your chart. Please also add a title for your bar chart, as well as labels for your x- and y-axes.
a) Insert a screenshot of the bar chart into your answer template.
b) What does each “bar” on your bar chart represent?
c) How many of the countries in the data are in the “Lower Middle Income” group? What percentage of all the countries are in this income group? Explain how you did your percentage calculations.
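The counting and percentage steps in questions 3c and 4c can be cross-checked outside the spreadsheet. The following is only a minimal Python sketch of the idea, not the required method. The list `income_groups` is made-up illustration data, and the label spellings are assumptions rather than values taken from the WW GDP Growth Rates file.

```python
from collections import Counter

# Hypothetical income-group labels for ten countries (NOT the real dataset).
income_groups = [
    "Low income", "Lower middle income", "High income", "Upper middle income",
    "Lower middle income", "High income", "Lower middle income",
    "Upper middle income", "High income", "Low income",
]

# Counter plays the role of one COUNTIF per category.
counts = Counter(income_groups)

# Percentage = (count in group / total number of countries) * 100.
total = len(income_groups)
percentages = {group: 100 * n / total for group, n in counts.items()}

for group, n in counts.items():
    print(f"{group}: {n} countries, {percentages[group]:.1f}%")
```

In Google Sheets, the same count comes from a formula of the form `=COUNTIF(range, criterion)`, and the percentage is that count divided by the total number of countries, multiplied by 100.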
5) Next, copy one of the columns labeled 2005 through 2012 (your choice) from the Google Sheet into Geogebra. Create a histogram of that column using the “One Variable Analysis” tool in Geogebra and use it to answer the following questions:
a) What information is represented on the horizontal axis? What are the units? (Hint: Be sure to look at the “About This Data” tab in the spreadsheet for help in understanding the data.)
b) Insert this histogram into your answer template.
Adjust your histogram and use it to answer the question “What percentage of countries have a yearly growth rate in year XXXX of 12% or less?”, where XXXX is the year you chose.
c) Insert the modified histogram into your answer template.
d) Explain what you did to modify the histogram.
e) Explain why your choice of bin/starting value allows you to answer the question.
f) Answer the question “What percentage of countries have a yearly growth rate in this year of 12% or less?”
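The role of the bin/starting value in parts e) and f) can be illustrated with a short calculation. A histogram answers an "at most 12%" question exactly only when one of its class boundaries falls at 12, so that no bar straddles that value. The Python sketch below is a minimal illustration under stated assumptions: the growth rates are made up, the bin start of -8 and width of 4 are arbitrary choices, and the classes are assumed to be right-closed (a value equal to a boundary is counted in the bar to its left). Your software may use a different class rule.

```python
import math

# Hypothetical yearly GDP growth rates in percent (NOT the real dataset).
growth = [-6.5, -1.2, 0.8, 2.3, 3.1, 4.0, 4.7, 5.5, 6.9,
          8.4, 11.0, 12.0, 13.6, 15.2, 19.8]

# Assumed bin start and width; the edges are -8, -4, 0, 4, 8, 12, 16, 20.
start, width = -8.0, 4.0

# Count values per bin using right-closed classes (edge - width < x <= edge).
bin_counts = {}
for x in growth:
    k = math.ceil((x - start) / width)   # bin whose right edge is start + k * width
    right_edge = start + k * width
    bin_counts[right_edge] = bin_counts.get(right_edge, 0) + 1

# Because 12 is a bin edge, adding up whole bars to the left of it answers the question.
at_most_12 = sum(n for edge, n in bin_counts.items() if edge <= 12)
percent_at_most_12 = 100 * at_most_12 / len(growth)
print(f"{percent_at_most_12:.1f}% of the sample values are 12 or less")
```

If the start and width were chosen so that 12 fell inside a bar, the bar heights alone could not separate the values at or below 12 from those above it.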
Part II – Exploring Data with Summary Statistics
Sometimes the important features of a histogram can be summarized by a few statistics such as the mean (or average), the median, the standard deviation, or the five-number summary. These quantities allow us to make comparisons of groups and help us to understand the story behind the data.
6) Copy the GDP growth rate columns for 1950, 2000, and 2011 into Geogebra. Create the Summary Statistics (also known as Descriptive Statistics) for the three years 1950, 2000, and 2011.
a) Use the “Multiple Variable Analysis” selection to create a Stacked BoxPlot of the three years showing outliers and insert it into your answer template.
b) Create the summary statistics (also known as descriptive statistics) using the Sum symbol on the right and insert a screenshot of them into the answer template.
c) For which year do the top 25% of countries (as measured by GDP growth) have growth rates in excess of 7%? What piece of data did you use from the data summaries to answer this question and why does this value answer it?
d) In what year did a country have the smallest growth rate in GDP? How much was that growth rate?
7) Now we want to dig a little deeper into the GDP growth rate and focus on just the year 2011. Grouping the data by income level, we can explore questions such as: Did countries in the “Upper Middle Income” group have higher levels of GDP growth in the year 2011 compared with the other income groups?
Processing the data for Geogebra:
a) Use the Google sheet to group the variables using the “World Bank, 4 income groups 2017” column. (Note that 2017 refers to the classification year, not the year in which the data was collected.)
i) Create a new tab in the Google sheet.
ii) Copy the “World Bank, 4 income groups 2017” and the “year 2011” columns into the new tab.
iii) Use the method shown in the video to split the data labeled “year 2011” into separate columns, one for each of the 4 income groups. You should end up with a column of year 2011 data for each of the 4 groups.
Insert the following into your answer template:
b) Copy the four columns of growth rate data into Geogebra and create a stacked boxplot of all four columns. Insert a screenshot of the stacked boxplot into your answer template.
c) Create the summary statistics for the GDP growth rates in low-, lower middle-, upper middle- and high-income countries and insert a screenshot of them into your answer template.
d) Based on your summary statistics, did “Upper Middle Income” countries have higher GDP growth rates than the other income groups in 2011? Give reasons for your answer. Make sure to not just compare max and min values, but rather use all the values of the 5-number summary. You might also argue based on the shape of the distribution and discuss the impact of outliers in your analysis.
e) Based on the boxplot, which income group has the largest variability? Explain what feature(s) of the boxplots support(s) your answer.
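The two processing steps in question 7 (splitting one column by group, then summarizing each group) can be sketched in a few lines of Python. This is an illustration under assumptions, not the method from the video. The `rows` values are made up, and the quartile rule used here (medians of the lower and upper halves of the sorted data) is only one common convention, so the quartiles your software reports may differ slightly.

```python
from statistics import median

# Hypothetical (income group, 2011 growth rate in %) pairs (NOT the real dataset).
rows = [
    ("Low income", 5.1), ("Low income", 7.4), ("Low income", 3.0), ("Low income", 9.2),
    ("Lower middle income", 6.3), ("Lower middle income", 4.4),
    ("Lower middle income", 7.7), ("Lower middle income", 5.2),
    ("Upper middle income", 4.2), ("Upper middle income", 6.0),
    ("Upper middle income", 2.5), ("Upper middle income", 8.1), ("Upper middle income", 5.0),
    ("High income", 1.8), ("High income", 2.9), ("High income", 0.4), ("High income", 3.5),
]

# Step 1: split the single growth-rate column into one list per income group.
by_group = {}
for group, rate in rows:
    by_group.setdefault(group, []).append(rate)

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max). Assumes at least two values.

    Quartiles are the medians of the lower and upper halves of the sorted data.
    """
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]        # excludes the middle value when the count is odd
    upper = data[-half:]
    return data[0], median(lower), median(data), median(upper), data[-1]

# Step 2: summarize each group; the IQR (Q3 - Q1) and the range both describe spread.
summaries = {group: five_number_summary(vals) for group, vals in by_group.items()}
for group, (lo, q1, med, q3, hi) in summaries.items():
    print(f"{group}: min={lo}, Q1={q1:.2f}, median={med:.2f}, "
          f"Q3={q3:.2f}, max={hi}, IQR={q3 - q1:.2f}")
```

Laying the summaries side by side like this mirrors what a stacked boxplot shows: each box spans Q1 to Q3, the line inside it marks the median, and the whiskers and outlier points extend toward the minimum and maximum.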
Part III – Regression and Correlation
8) Many people believe that wealthier countries (measured by using gross domestic product (GDP) per capita) will continue to be wealthy and that poorer countries will continue to be poor over the years. We would like to investigate that claim by doing a regression analysis for the relationship between GDP per capita in 1960 and GDP per capita in 2017.
Processing the data for Geogebra:
a) Make a new tab in the Google Sheet and copy the columns for GDP per capita in 1960 and GDP per capita in 2017 into the new tab.
b) Delete the rows for which there is no data for 1960 (rows with a “0” in the “GDP per capita in 1960” column).
c) Copy the remaining data into Geogebra. Create a scatter plot for GDP per capita in 1960 (on the x-axis) vs GDP per capita in 2017 (on the y-axis) that also shows the regression line.
Insert the following into your answer template and answer the questions below.
d) Insert the scatterplot with the regression line into your answer template.
e) Copy the formula for the regression equation into the answer template. Use the regression equation to predict the GDP per capita in 2017 for a country that had a GDP per capita of $9,897 in 1960.
f) Using the R² value, explain how well the GDP per capita in 1960 predicts the GDP per capita in 2017 for a country.
g) Are GDP per capita in 1960 and GDP per capita in 2017 highly correlated? Why or why not?
h) Should you use the regression equation from part e) to predict the 2017 per capita income for a country that had a 1960 per capita income of $42,128? If so, what do you predict the 2017 per capita income to be? If not, why is this something you should not do?
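The quantities used in parts e) through h) (the regression line, a prediction from it, the correlation r, and R²) can be computed by hand from a list of (x, y) pairs. The Python sketch below is a minimal illustration using made-up numbers, not the dataset values. The helper names `predict` and `within_observed_range` are hypothetical and chosen only for this example.

```python
# Hypothetical (GDP per capita 1960, GDP per capita 2017) pairs in dollars
# (NOT the real dataset).
x = [500.0, 1200.0, 3000.0, 7000.0, 12000.0]
y = [2000.0, 6500.0, 9000.0, 30000.0, 41000.0]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Sums of squares and cross-products about the means.
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

# Least-squares line y = intercept + slope * x.
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Correlation coefficient r; for a one-predictor linear regression, R^2 = r^2.
r = sxy / (sxx * syy) ** 0.5
r_squared = r ** 2

def predict(x_new):
    """Predicted y from the fitted line."""
    return intercept + slope * x_new

def within_observed_range(x_new):
    """True if x_new lies inside the x-values used to fit the line."""
    return min(x) <= x_new <= max(x)

print(f"y = {intercept:.2f} + {slope:.4f} x, r = {r:.3f}, R^2 = {r_squared:.3f}")
print(predict(5000.0), within_observed_range(5000.0))
print(within_observed_range(20000.0))
```

The range check is included because a fitted line only describes the x-values it was fitted on. Whether a particular input falls inside that range depends on the actual data, which this sketch does not use.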
9) Sometimes, looking at the overall data does not give you the entire picture. Instead, you need to drill down a little further. Below is a scatterplot grouped by the variable “four regions”.
a) Compute the correlations for each region. Show your work, explain how you computed these, and list the correlation coefficients for each region.
b) How do the correlation coefficients of the individual regions compare with the correlation coefficient of the dataset as a whole?
c) For the region whose correlation coefficient is most different from the correlation coefficient for the whole world, which regression equation would you use to predict a value for a country in that region? Would you use the equation for the whole world or the equation for just that region? Give an argument for your choice of equation and state which region has the most difference in correlation compared to the whole world.
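Computing one correlation per group and comparing each to the overall correlation can be sketched as follows. This is a hedged Python illustration only: the region names and all numbers are invented, and `pearson_r` is a helper written for this example. In a spreadsheet, the same per-group value can be obtained by applying a correlation function to each region's rows separately.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical (region, x, y) rows; region names and values are made up.
rows = [
    ("Region A", 1.0, 2.0), ("Region A", 2.0, 4.1),
    ("Region A", 3.0, 5.9), ("Region A", 4.0, 8.2),
    ("Region B", 6.0, 9.0), ("Region B", 7.0, 7.5),
    ("Region B", 8.0, 6.8), ("Region B", 9.0, 5.0),
]

# Group the (x, y) pairs by region.
groups = {}
for region, xv, yv in rows:
    xs, ys = groups.setdefault(region, ([], []))
    xs.append(xv)
    ys.append(yv)

# One correlation per region, plus one for all rows together.
r_by_region = {region: pearson_r(xs, ys) for region, (xs, ys) in groups.items()}
r_overall = pearson_r([row[1] for row in rows], [row[2] for row in rows])

# The region whose r is farthest from the overall r.
most_different = max(r_by_region, key=lambda g: abs(r_by_region[g] - r_overall))

for region, r in r_by_region.items():
    print(f"{region}: r = {r:.3f}")
print(f"All rows: r = {r_overall:.3f}; most different: {most_different}")
```

In this invented example the two groups trend in opposite directions, so the overall correlation describes neither group well. Whether anything similar happens in the real data has to be checked with the actual values.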