Analyzing Data Discussion
The charts below show a misrepresented view and an accurate view of the same data: life expectancy in five countries across the Americas, namely Brazil, Canada, Mexico, Panama, and the United States. Life expectancy can change based on many factors, such as economics, health, and education. However, this representation focuses strictly on how life expectancy changes over time. It does not attempt to explain or give reasons for those changes. This study answers only the following question:
How does life expectancy in Brazil, Canada, Mexico, Panama, and the United States change from 1960 to 2011?
Both charts show the same data, extracted from the file "Country_Data_V1" provided with this discussion. The first chart, titled "Misrepresented", shows a gradual increase in life expectancy for all five countries, with similar upward trends. It also shows that from 1960 to 2011 Canada had the highest life expectancy and Brazil the lowest. Looking at this chart, life expectancy does not seem to vary much from country to country, and by around 2011 the changes in life expectancy appear negligible for all of them. This apparent flatness is caused by the y-axis scale: it runs from 0 to 100, squeezing all five trends into the band between roughly 50 and 80 and hiding meaningful variation.
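To make the compression concrete, the share of the vertical axis that the data actually occupies is the span of the data divided by the span of the axis. The base-R sketch below computes that ratio; `axis_fill` is a hypothetical helper, and the 50 to 80 range is the approximate figure quoted above, not a value recomputed from the data file. It also ignores the small padding ggplot2 adds around axis limits by default.

```r
# Hypothetical helper: fraction of the axis height covered by the data.
# data_range and axis_limits are each c(low, high).
axis_fill <- function(data_range, axis_limits) {
  diff(data_range) / diff(axis_limits)
}

# Misrepresented chart: data spans roughly 50 to 80 on a 0 to 100 axis
axis_fill(c(50, 80), c(0, 100))  # 0.3
```

In other words, about 70 percent of the plotting height is empty space, which is why the five lines look bunched together.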
The chart titled "Corrected" represents life expectancy accurately. Its y-axis is not forced to start at zero; instead it covers only the range of the data, roughly 55 to 86, which clearly depicts the life expectancy trends for all five countries. Here a noticeable gap between Brazil and the United States becomes visible.
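The corrected chart in the script sets no explicit y limits, so ggplot2 fits the axis to the data and, by default, widens a continuous axis by 5 percent of the data range on each side. The base-R sketch below reproduces that arithmetic so the resulting axis range can be checked by hand; `padded_limits` is a hypothetical helper, not a ggplot2 function, and the 55 to 86 input is the approximate range quoted above.

```r
# Hypothetical helper: mimic ggplot2's default continuous-axis expansion,
# which pads the data range by mult * (range width) on each side.
padded_limits <- function(x, mult = 0.05) {
  r <- range(x, na.rm = TRUE)
  r + c(-1, 1) * mult * diff(r)
}

padded_limits(c(55, 86))  # 53.45 87.55
```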
Below is the R program used to generate and save the charts.
# Discussion Week 3: create two plots from the same data;
# one demonstrates a misrepresentation and the other
# presents the information properly.
# 1/18/2021
# Joseph Atalla
# Load the libraries
library(tidyverse)
library(funModeling)
library(ggthemes)
# Set the default working directory on Windows
setwd("C:/Users/joe/Documents/R/IT 530")
# Read the csv file retrieved from:
# https://ucumberlands.blackboard.com/bbcswebdav/pid-7489992-dt-message-rid-41763679_1/xid-41763679_1
df_all <- read.csv("Country_Data.csv")
# Select the columns of interest
# I calculated GDP per capita (GPD_PK) thinking I would use it;
# however, after exploring the data I decided not to.
df <- select(df_all,
             continent,
             country,
             year,
             gross_domestic_product,
             population,
             lifeExpectancy) %>%
  filter(country == "United States" |
           country == "Panama" |
           country == "Canada" |
           country == "Brazil" |
           country == "Mexico") %>%
  mutate(GPD_PK = gross_domestic_product / population)
# Check the data for NA values
df_status(df)
# Summary to learn more about data extremes.
summary(df)
# Create the misrepresented chart
p1 <- ggplot(df) +
  geom_line(aes(x = year, y = lifeExpectancy, colour = country), size = 2) +
  scale_colour_brewer(name = "Country", type = "qual", palette = "Set2") +
  labs(title = "The Americas: Life Expectancy Between 1960 and 2011 (Misrepresented)",
       caption = "data source: https://ucumberlands.blackboard.com/bbcswebdav/pid-7489992-dt-message-rid-41763679_1/xid-41763679_1",
       x = "Year",
       y = "Age") +
  # Forcing the y-axis to 0-100 squeezes all five trends into a narrow band
  scale_y_continuous(limits = c(0, 100)) +
  theme_bw()
# Create the correct chart
# No explicit y limits: ggplot2 fits the axis to the range of the data
p2 <- ggplot(df) +
  geom_line(aes(x = year, y = lifeExpectancy, colour = country), size = 2) +
  scale_colour_brewer(name = "Country", type = "qual", palette = "Set2") +
  labs(title = "The Americas: Life Expectancy Between 1960 and 2011 (Corrected)",
       caption = "data source: https://ucumberlands.blackboard.com/bbcswebdav/pid-7489992-dt-message-rid-41763679_1/xid-41763679_1",
       x = "Year",
       y = "Age") +
  theme_bw()
# Display Chart 1
p1
# Display Chart 2
p2
# Save both charts to the current working directory
ggsave(filename = "Misrepresented.png",
       plot = p1,
       bg = "transparent",
       path = ".")
ggsave(filename = "Corrected.png",
       plot = p2,
       bg = "transparent",
       path = ".")
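The charts answer the research question visually. A numeric summary could complement them: for each country, subtract the earliest recorded life expectancy from the latest. The base-R sketch below assumes a data frame with the `country`, `year`, and `lifeExpectancy` columns used in the script, and uses `%in%` to match the requested countries. `life_change` is a hypothetical helper, and the toy data frame holds placeholder values, not figures from the course data file.

```r
# Hypothetical helper: change in life expectancy from each country's
# earliest year to its latest year, for a chosen set of countries.
life_change <- function(df, countries) {
  # Keep only the requested countries and rows with a recorded value
  sub <- df[df$country %in% countries & !is.na(df$lifeExpectancy), ]
  # Sort by country, then year, so first/last values are earliest/latest
  sub <- sub[order(sub$country, sub$year), ]
  per_country <- split(sub$lifeExpectancy, sub$country)
  vapply(per_country, function(v) v[length(v)] - v[1], numeric(1))
}

# Placeholder data for illustration only (not real life expectancies)
toy <- data.frame(
  country = c("A", "A", "B", "B", "C"),
  year = c(2011, 1960, 1960, 2011, 1960),
  lifeExpectancy = c(4, 1, 2, 3, 5)
)
life_change(toy, c("A", "B"))  # A = 3, B = 1
```

Sorting by year before taking the first and last values keeps the result correct even when the rows are not stored in chronological order, as with country "A" in the toy data.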