Statistics


Guidelines for the Final Exam:

You may make more than one submission. Each submission should include your code and its output as a PDF or Word document.

Warning: If two exam submissions are highly similar, the instructor will take appropriate action.

Problem #1 (Total 8 points)

Use the age group data at https://www.health.state.mn.us/diseases/coronavirus/situation.html#ageg1 to answer the following questions:

(a) (3 points) Use the "rvest" package to read the data into a data frame named "mn".

(b) (2 points) Add a new column "deathRate" to the data frame "mn".

(c) (3 points) Make a bar graph similar to the one on the webpage, except that the height of each bar represents the death rate.

Problem #2 (Total 18 points)

Use the presidential election data at https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/42MVDX to answer the following questions:

(a) (3 points) Read the data into R as a data frame named "election".

(b) (3 points) Create a data frame that contains only the Democratic and Republican candidates. Name the data frame "candidates".

(c) (3 points) Determine the states that have consistently supported a particular party.

(d) (3 points) Determine the states that supported a particular party in between 40% and 60% of the elections. We call these swing states. Make a bar graph of these states showing the number of times each one supported each party (D or R).

(e) (3 points) Create a map showing which party won each of those states in 2016.

(f) (3 points) Create a scatterplot of the percent of votes for each party versus election year. The plot should contain two curves, one for each party.
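One possible approach to parts (c) and (d), assuming "support" means winning the state's popular vote in a given year: find the winning party for every state and year, then compute the share of elections each party won. The sketch below is hypothetical. It assumes `candidates` has columns named `year`, `state`, `party` (with values "DEMOCRAT" and "REPUBLICAN") and `candidatevotes`. Check the real names with `names(candidates)` and rename or recode as needed, since the downloaded file may use different labels.

```r
# Hypothetical helper for parts (c) and (d).
# Assumes columns: year, state, party ("DEMOCRAT"/"REPUBLICAN"), candidatevotes.
classify_states <- function(candidates, low = 0.4, high = 0.6) {
  # Sum votes by state, year and party (a party can appear on several rows)
  votes <- aggregate(candidatevotes ~ state + year + party,
                     data = candidates, FUN = sum)
  # Sort so the largest vote total comes first within each state-year
  votes <- votes[order(votes$state, votes$year, -votes$candidatevotes), ]
  # The first row of each state-year is that election's winner
  winners <- votes[!duplicated(votes[, c("state", "year")]), ]
  winners$party <- factor(winners$party, levels = c("DEMOCRAT", "REPUBLICAN"))

  # Number of elections each party won in each state
  wins <- as.data.frame.matrix(table(winners$state, winners$party))
  wins$state <- rownames(wins)
  rownames(wins) <- NULL

  # Share of elections won by Democrats (the Republican share is 1 minus this)
  wins$dem_share <- wins$DEMOCRAT / (wins$DEMOCRAT + wins$REPUBLICAN)
  wins$type <- ifelse(wins$dem_share %in% c(0, 1), "constant",
               ifelse(wins$dem_share >= low & wins$dem_share <= high,
                      "swing", "leaning"))
  wins[, c("state", "DEMOCRAT", "REPUBLICAN", "dem_share", "type")]
}
```

Rows with `type == "constant"` answer part (c). For part (d), `subset(classify_states(candidates), type == "swing")` keeps the swing states, and its `DEMOCRAT` and `REPUBLICAN` columns give the bar heights. Because only two parties are counted, a Democratic share between 40% and 60% implies the same range for the Republicans.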

Problem #3 (Total 10 points)

Refer to the data at the bottom of the page https://worldpopulationreview.com/state-rankings/high-school-graduation-rates-by-state.

(a) (2 points) Create a data frame for the data.

(b) (5 points) Create a data frame that is a subset of the data frame in part (a) and contains data only for the Midwest states (look up which states are in the Midwest). Sort this data frame by graduation rate in descending order. Then make the state column a factor whose levels follow the order in which the state names appear in the sorted data (not necessarily alphabetical!). Hint: check out the factor() function in R.

(c) (3 points) Create a bar plot using the ggplot2 package.
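The key step in part (b) is setting the factor levels to the sorted row order, so that a plot keeps the descending order instead of reverting to alphabetical order. The sketch below is hypothetical. It assumes the part (a) data frame has columns named `state` and `rate`, where `rate` is already numeric. A scraped table will likely use different headers and may store rates as text such as "91.5%", which would need converting first, for example with `as.numeric(sub("%", "", x))`. The state list is the U.S. Census Bureau's Midwest region.

```r
# The 12 states in the U.S. Census Bureau's Midwest region
midwest_states <- c("Illinois", "Indiana", "Iowa", "Kansas", "Michigan",
                    "Minnesota", "Missouri", "Nebraska", "North Dakota",
                    "Ohio", "South Dakota", "Wisconsin")

# Hypothetical helper for part (b); assumes columns `state` and numeric `rate`
midwest_sorted <- function(grad) {
  grad$state <- as.character(grad$state)
  mw <- grad[grad$state %in% midwest_states, ]
  # Highest graduation rate first
  mw <- mw[order(mw$rate, decreasing = TRUE), ]
  # Levels follow the sorted row order, not alphabetical order
  mw$state <- factor(mw$state, levels = mw$state)
  rownames(mw) <- NULL
  mw
}
```

For part (c), `ggplot(mw, aes(x = state, y = rate)) + geom_col()` would then draw the bars in descending order, because ggplot2 places a factor's categories in the order of its levels.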

Problem #4 (Total 10 points)

Cancer is a leading cause of death worldwide, accounting for an estimated 9.6 million deaths in 2018.

The most common cancers are:

• Lung (2.09 million cases)
• Breast (2.09 million cases)
• Colorectal (1.80 million cases)
• Prostate (1.28 million cases)
• Skin cancer (non-melanoma) (1.04 million cases)
• Stomach (1.03 million cases)

Create a single graph that displays all of the numeric information given above for the most common cancers.
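A minimal base-R sketch of one way to do this (the problem does not prescribe a package): one bar per cancer, with each bar labeled by its case count so that every number listed appears on the graph. The colors, margins and axis limit are arbitrary choices.

```r
# Cases for the most common cancers, in millions (2018), as listed above
cancer <- data.frame(
  type  = c("Lung", "Breast", "Colorectal", "Prostate",
            "Skin (non-melanoma)", "Stomach"),
  cases = c(2.09, 2.09, 1.80, 1.28, 1.04, 1.03)
)

# Widen the left margin so the long category names fit beside horizontal bars
op <- par(mar = c(5, 10, 4, 2))
bp <- barplot(cancer$cases, names.arg = cancer$type, horiz = TRUE, las = 1,
              xlim = c(0, 2.5), col = "steelblue",
              xlab = "Cases (millions)",
              main = "Most common cancers worldwide, 2018")
# barplot() returns the bar midpoints; print each value just right of its bar
text(cancer$cases, bp, labels = sprintf("%.2f", cancer$cases), pos = 4)
par(op)
```

Horizontal bars are used only because the category names are long; a vertical layout with rotated labels would work equally well.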

Problem #5 (Total 10 points)

The following shows traffic deaths on Minnesota roads in 2018, compared with those in 2017:

100 were speed-related, compared with 88 in 2017.

121 were alcohol-related, compared with 113 in 2017.

90 were not wearing their seat belts, compared with 78 in 2017.

58 were motorcyclists, compared with 53 in 2017.

7 were bicyclists, compared with 6 in 2017.

42 were pedestrians, compared with 42 in 2017.

Use ggplot to create a single graph that displays all of the numeric information given above.
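A sketch of one possible layout, assuming the ggplot2 package is installed: dodged bars for 2017 and 2018 within each category, with every bar labeled by its count. Dodged rather than stacked bars are used because these categories likely overlap (a single death can be both speed- and alcohol-related), so a stacked total would be misleading. The short category labels are paraphrases of the list above.

```r
library(ggplot2)

# Minnesota traffic deaths by category, 2017 vs. 2018, as listed above
traffic <- data.frame(
  category = rep(c("Speed-related", "Alcohol-related", "No seat belt",
                   "Motorcyclists", "Bicyclists", "Pedestrians"), times = 2),
  year   = factor(rep(c(2017, 2018), each = 6)),
  deaths = c(88, 113, 78, 53, 6, 42,    # 2017
             100, 121, 90, 58, 7, 42)   # 2018
)
# Keep the categories in the listed order rather than alphabetical order
traffic$category <- factor(traffic$category, levels = unique(traffic$category))

p <- ggplot(traffic, aes(x = category, y = deaths, fill = year)) +
  geom_col(position = position_dodge(width = 0.9)) +
  # Label every bar with its count so all numbers appear on the graph
  geom_text(aes(label = deaths), position = position_dodge(width = 0.9),
            vjust = -0.3, size = 3) +
  labs(title = "Minnesota traffic deaths by category, 2017 vs. 2018",
       x = NULL, y = "Deaths", fill = "Year") +
  theme(axis.text.x = element_text(angle = 30, hjust = 1))
print(p)
```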

Problem #6 (Total 14 points)

Use the Minnesota Water Quality Data in D2L to answer the following questions:

(a) (7 points) Create a map based on the longitude and latitude variables, with points colored by Physical Condition.

(b) (7 points) Create side-by-side boxplots of Secchi_Depth_Result for each Physical Condition result. Write one or two sentences explaining your graph.

Problem #7 (Total 10 points)

Use the worldwide smartphone market share data from https://www.idc.com/promo/smartphone-market-share/vendor.

Create a graph similar to the one shown on the webpage.

Problem #8 (Total 20 points)

The 2013 sales data (under "Content" on D2L) cover more than 1500 products across 10 stores in different cities. Certain attributes of each product and store are also defined.

Note: Sheet 2 of the data on D2L contains a description of the variables.

(a) (4 points) Create a data frame called Sales.

(b) (4 points) Make a bar graph of IFC. How many levels does IFC have? Is there a possible problem with the data? If so, correct it.

(c) (4 points) Make a graph of IFC vs. OT, colored by OLT.

(d) (4 points) Make a graph of IV vs. IFC, colored by OT.

(e) (4 points) All graphs should have an appropriate title.
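A common problem of the kind part (b) hints at is one category spelled several ways, which inflates the number of levels. The sketch below is hypothetical: it assumes IFC is a fat-content variable whose intended levels are "Low Fat" and "Regular", with variant spellings such as "LF", "low fat" and "reg". Inspect the actual values with `table()` before relying on this mapping. Any value the helper does not recognize becomes `NA`, so unexpected spellings surface instead of being silently kept.

```r
# Hypothetical helper: collapse variant spellings of fat-content labels.
# Assumes variants like "LF", "low fat" and "reg"; verify with table() first.
clean_fat_content <- function(x) {
  # Normalize case and stray spaces before matching
  x <- trimws(tolower(as.character(x)))
  out <- ifelse(x %in% c("lf", "low fat"), "Low Fat",
         ifelse(x %in% c("reg", "regular"), "Regular", NA_character_))
  factor(out, levels = c("Low Fat", "Regular"))
}
```

Usage would look like `Sales$IFC <- clean_fat_content(Sales$IFC)` followed by `table(Sales$IFC)`, where `Sales$IFC` is a placeholder for whatever the fat-content column is actually named in the file.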

 
