
School of Computer & Information Sciences

ITS530 Analyzing and Visualizing Data

Introduction: R Advanced Graphs with ggplot2

ITS530 R Advanced Graphs


What is ggplot2?


Grammar of graphics:

Independently specify the building blocks of a plot,

and combine them to create a graphical display to your liking.

Building blocks of a graph include:

data

aesthetic mapping

geometric object

statistical transformations

scales

coordinate system

position adjustments

faceting
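To make the list concrete, here is a minimal sketch that names each building block explicitly in one ggplot() call. It uses R's built-in mtcars data set as a stand-in for the course data so it runs on its own; the variables chosen are illustrative only.

```r
library(ggplot2)

# Each building block of the grammar is specified independently, then combined with +
p <- ggplot(data = mtcars,                           # data
            mapping = aes(x = factor(cyl),           # aesthetic mapping
                          fill = factor(am))) +
  geom_bar(stat = "count",                           # geometric object + statistical transformation
           position = "dodge") +                     # position adjustment
  scale_fill_discrete(name = "Transmission (am)") +  # scale
  coord_flip() +                                     # coordinate system
  facet_wrap(~ vs)                                   # faceting

print(p)
```

Swapping any one block (for example coord_flip() for coord_cartesian(), or position = "stack" for "dodge") changes the display without touching the others.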

What is a geom?

Use a geom function to represent data points,

and use the geom's aesthetic properties to represent variables.

Each geom function returns a layer.
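A small sketch of the "each function returns a layer" idea, again using the built-in mtcars data as a stand-in: a geom call on its own is just a layer object, and + stacks layers onto one plot.

```r
library(ggplot2)

# geom_point() by itself is a layer object, not a finished plot
pts <- geom_point()
class(pts)   # includes "Layer"

# Adding layers with + draws them on top of each other
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +                                # layer 1: one point per car
  geom_smooth(method = "lm", formula = y ~ x)   # layer 2: linear fit with confidence band

print(p)
```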


[Figures: geom reference charts, with panels titled "2 Variable", "3 Variable", and "2 Variable contd."]

Different Geoms

You can get a list of available geometric objects using the code below:

help.search("geom_", package = "ggplot2")
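help.search() opens a help listing. If you only want the geom function names as a character vector, one possible alternative (not part of the original slide) is to filter the attached package's objects by name:

```r
library(ggplot2)

# Names of all ggplot2 objects that start with "geom_"
geoms <- ls("package:ggplot2", pattern = "^geom_")
length(geoms)
head(geoms)
```

The exact count depends on the installed ggplot2 version.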


ggplot2::geom_abline Reference lines: horizontal, vertical, and diagonal
ggplot2::geom_bar Bar charts
ggplot2::geom_bin2d Heatmap of 2d bin counts
ggplot2::geom_blank Draw nothing
ggplot2::geom_boxplot A box and whiskers plot (in the style of Tukey)
ggplot2::geom_contour 2d contours of a 3d surface
ggplot2::geom_count Count overlapping points
ggplot2::geom_density Smoothed density estimates
ggplot2::geom_density_2d Contours of a 2d density estimate
ggplot2::geom_dotplot Dot plot
ggplot2::geom_errorbarh Horizontal error bars
ggplot2::geom_hex Hexagonal heatmap of 2d bin counts
ggplot2::geom_freqpoly Histograms and frequency polygons
ggplot2::geom_jitter Jittered points
ggplot2::geom_crossbar Vertical intervals: lines, crossbars & errorbars
ggplot2::geom_map Polygons from a reference map
ggplot2::geom_path Connect observations
ggplot2::geom_point Points
ggplot2::geom_polygon Polygons
ggplot2::geom_qq A quantile-quantile plot
ggplot2::geom_quantile Quantile regression
ggplot2::geom_ribbon Ribbons and area plots
ggplot2::geom_rug Rug plots in the margins
ggplot2::geom_segment Line segments and curves
ggplot2::geom_smooth Smoothed conditional means
ggplot2::geom_spoke Line segments parameterised by location, direction and distance
ggplot2::geom_label Text
ggplot2::geom_raster Rectangles
ggplot2::geom_violin Violin plot
ggplot2::update_geom_defaults Modify geom/stat aesthetic defaults for future plots

Importing CSV files as data frames

setwd("/Users/…./UC/ITS530/Rprog")

dpc <- read.csv("dataset_price_personal_computers.csv", na.strings = "")

Usage:

read.csv(file, header = TRUE, sep = ",", quote = "\"", dec = ".", fill = TRUE, comment.char = "", ...)

na.strings = ""

na.strings: a character vector of strings which are to be interpreted as NA values. Blank fields are also considered to be missing values in logical, integer, numeric and complex fields. Note that the test happens after white space is stripped from the input, so na.strings values may need their own white space stripped in advance. Setting na.strings = "" therefore also turns blank fields in character columns into NA.
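A minimal sketch of what na.strings = "" changes, using a made-up two-column CSV passed through the text argument so no file is needed. Column b has one blank field; the default keeps it as an empty string in a character column (R 4.0.0 and later), while na.strings = "" turns it into NA.

```r
# Toy CSV text, invented for illustration: row 1 has a blank field in column b
csv <- "a,b\n1,\n2,x"

default_df  <- read.csv(text = csv)                   # default na.strings = "NA"
blank_na_df <- read.csv(text = csv, na.strings = "")  # treat "" as missing

default_df$b    # blank kept as an empty string
blank_na_df$b   # blank read as NA
```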


Converting columns to factors in R

dpc <- read.csv("dataset_price_personal_computers.csv", na.strings = "")

library(ggplot2)

table(dpc$ram)

table(dpc$price)

str(dpc)

# factors

dpc$speed <- as.factor(dpc$speed)

dpc$hd <- as.factor(dpc$hd)

dpc$ram <- as.factor(dpc$ram)

dpc$screen <- as.factor(dpc$screen)

dpc$cd <- as.factor(dpc$cd)

dpc$multi <- as.factor(dpc$multi)

dpc$premium <- as.factor(dpc$premium)
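The seven as.factor() lines above can also be written as one assignment. This is a sketch of an equivalent idiom, shown on a copy of the built-in mtcars data rather than the course file; the column names in cols are chosen only for illustration.

```r
# Work on a copy of mtcars (stand-in for dpc)
mt <- mtcars
cols <- c("cyl", "gear", "am")   # columns to treat as categorical

# lapply() applies as.factor() to each selected column;
# assigning back through mt[cols] keeps mt a data frame
mt[cols] <- lapply(mt[cols], as.factor)

str(mt[cols])
levels(mt$cyl)   # "4" "6" "8"
```

For the course data, cols would be c("speed", "hd", "ram", "screen", "cd", "multi", "premium") and mt would be dpc.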


Before the factor conversion:

str(dpc)

'data.frame': 6259 obs. of 11 variables:
$ X : int 1 2 3 4 5 6 7 8 9 10 ...
$ price : int 1499 1795 1595 1849 3295 3695 1720 1995 2225 2575 ...
$ speed : int 25 33 25 25 33 66 25 50 50 50 ...
$ hd : int 80 85 170 170 340 340 170 85 210 210 ...
$ ram : int 4 2 4 8 16 16 4 2 8 4 ...
$ screen : int 14 14 15 14 14 14 14 14 14 15 ...
$ cd : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 2 1 1 1 ...
$ multi : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
$ premium: Factor w/ 2 levels "no","yes": 2 2 2 1 2 2 2 2 2 2 ...
$ ads : int 94 94 94 94 94 94 94 94 94 94 ...
$ trend : int 1 1 1 1 1 1 1 1 1 1 ...

(cd, multi and premium are already factors here because, before R 4.0.0, read.csv() converted character columns to factors by default. In R 4.0.0 and later they are read as character unless stringsAsFactors = TRUE is set.)

After the factor conversion:

str(dpc)

'data.frame': 6259 obs. of 11 variables:
$ X : int 1 2 3 4 5 6 7 8 9 10 ...
$ price : int 1499 1795 1595 1849 3295 3695 1720 1995 2225 2575 ...
$ speed : Factor w/ 6 levels "25","33","50",..: 1 2 1 1 2 4 1 3 3 3 ...
$ hd : Factor w/ 59 levels "80","85","100",..: 1 2 9 9 24 24 9 2 11 11 ...
$ ram : Factor w/ 6 levels "2","4","8","16",..: 2 1 2 3 4 4 2 1 3 2 ...
$ screen : Factor w/ 3 levels "14","15","17": 1 1 2 1 1 1 1 1 1 2 ...
$ cd : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 2 1 1 1 ...
$ multi : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
$ premium: Factor w/ 2 levels "no","yes": 2 2 2 1 2 2 2 2 2 2 ...
$ ads : int 94 94 94 94 94 94 94 94 94 94 ...
$ trend : int 1 1 1 1 1 1 1 1 1 1 ...

ggplot() – geom_bar()

#bar plots

ggplot(dpc, aes(x=ram)) + geom_bar()

ggplot(dpc, aes(x=trend)) + geom_bar()


ggplot(dpc, aes(x=ram)) + theme_bw() + geom_bar()

ggplot(dpc, aes(x=ram)) + theme_bw() + geom_bar() + labs(x="RAM (MB)", y="Counts", title="Computer RAM")

ggplot(dpc, aes(x=ram, fill=screen)) + theme_bw() + geom_bar() + labs(x="RAM (MB)", y="Counts", title="Computer RAM by Screen")

#Bar plot of proportions (each bar rescaled to a total height of 1)

ggplot(dpc, aes(x=ram, fill=screen)) + theme_bw() + geom_bar(position="fill") + labs(x="RAM (MB)", y="Proportion", title="Computer RAM by Screen")
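position = "fill" puts proportions (0 to 1) on the y axis, not percentages. If percent labels are wanted, one option is to relabel the axis with scales::percent (the scales package is installed with ggplot2). This sketch uses mtcars as a stand-in for dpc.

```r
library(ggplot2)

# Every bar is rescaled to height 1; scales::percent shows 0.25 as "25%"
p <- ggplot(mtcars, aes(x = factor(cyl), fill = factor(am))) +
  theme_bw() +
  geom_bar(position = "fill") +
  scale_y_continuous(labels = scales::percent) +
  labs(x = "Cylinders", y = "Share of cars", fill = "am")

print(p)
```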


ggplot() – geom_bar(), facet commands


# side by side bars

ggplot(dpc, aes(x=ram, fill=screen)) + theme_bw() + geom_bar(position="dodge") + labs(x="RAM (MB)", y="Counts", title="Computer RAM by Screen")

#drill down using facets

ggplot(dpc, aes(x=ram, fill=screen)) + theme_bw() + facet_wrap(~premium) + geom_bar(position="dodge") + labs(x="RAM (MB)", y="Counts", title="Computer RAM by Screen")

#breakdown by premium (rows) and cd (columns)

ggplot(dpc, aes(x=ram, fill=screen)) + theme_bw() + facet_grid(premium~cd) + geom_bar(position="dodge") + labs(x="RAM (MB)", y="Counts", title="Computer RAM by Screen")
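When two variables are crossed, facet_grid(rows ~ cols) places one variable on the panel rows and the other on the panel columns, while facet_wrap() lays panels out in a single wrapping ribbon. A sketch with mtcars standing in for dpc (vs and am play the roles of premium and cd):

```r
library(ggplot2)

# Dodged bars inside a 2 x 2 grid of panels: rows = vs, columns = am
p <- ggplot(mtcars, aes(x = factor(cyl), fill = factor(gear))) +
  theme_bw() +
  facet_grid(vs ~ am) +
  geom_bar(position = "dodge") +
  labs(x = "Cylinders", y = "Counts", fill = "Gears")

print(p)
```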

ggplot(), geom_histogram()


#histogram

ggplot(dpc, aes(x=price)) + theme_bw() + geom_histogram() + labs(x="Price", y="Freq", title="Computer Prices")

ggplot(dpc, aes(x=price)) + theme_bw() + geom_histogram(binwidth =10) + labs(x="Price", y="Freq", title="Computer Prices")

ggplot(dpc, aes(x=price)) + theme_bw() + geom_histogram(binwidth =50) + labs(x="Price", y="Freq", title="Computer Prices")

ggplot(dpc, aes(x=price)) + theme_bw() + geom_histogram(binwidth =100) + labs(x="Price", y="Freq", title="Computer Prices")
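Without bins or binwidth, geom_histogram() falls back to 30 bins and prints a message suggesting a better value; binwidth instead sets the width of each bar in data units. The computed bins can be inspected with layer_data(). A sketch on mtcars$mpg (a stand-in for price):

```r
library(ggplot2)

# Default: 30 bins (a message about bins = 30 is printed when the plot is built)
p_default <- ggplot(mtcars, aes(x = mpg)) + geom_histogram()

# Explicit binwidth: each bar covers 5 mpg
p_wide <- ggplot(mtcars, aes(x = mpg)) + geom_histogram(binwidth = 5)

# One row per bin, with its count and edges (xmin, xmax)
d_default <- layer_data(p_default)
d_wide <- layer_data(p_wide)
nrow(d_default)
nrow(d_wide)
```

Wider bins give fewer, taller bars; every observation is still counted exactly once.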

ggplot(), geom_histogram()


#histogram

ggplot(dpc, aes(x=price)) + theme_bw() + geom_histogram(binwidth =100) + labs(x="Price (Binwidth=100)", y="Freq", title="Histogram-Computer Prices")

ggplot(dpc, aes(x=price, fill=ram)) + theme_bw() + geom_histogram(binwidth =100) + labs(x="Price (Binwidth=100)", y="Freq", title="Histogram-Computer Prices")

ggplot(dpc, aes(x=price, fill=screen)) + theme_bw() + geom_histogram(binwidth =100) + labs(x="Price (Binwidth=100)", y="Freq", title="Histogram-Computer Prices")

ggplot(dpc, aes(x=price, fill=screen)) + theme_bw() + facet_wrap(~premium) + geom_histogram(binwidth =100) + labs(x="Price (Binwidth=100)", y="Freq", title="Histogram-Computer Prices")

ggplot() – geom_boxplot()


#Box plot

ggplot(dpc, aes(x=screen, y=price)) + theme_bw() + geom_boxplot() + labs(x="Screen", y="Price", title="Box Plot-Computer Screen vs Price")

ggplot(dpc, aes(x=screen, y=price, fill=ram)) + theme_bw() + geom_boxplot() + labs(x="Screen", y="Price", title="Box Plot-Computer Screen vs Price")

ggplot(dpc, aes(x=screen, y=price, fill=ram)) + theme_bw() + facet_wrap(~cd) + geom_boxplot() + labs(x="Screen", y="Price", title="Box Plot-Computer Screen vs Price")
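In a ggplot2 box plot, the box spans the 25th to 75th percentiles with a line at the median, and the whiskers reach the most extreme points within 1.5 times the interquartile range of the box; points beyond that are drawn individually. The numbers behind each box can be read back with layer_data(). A sketch with mtcars standing in for dpc:

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) + theme_bw() + geom_boxplot()

# Per box: whisker ends (ymin/ymax), hinges (lower/upper) and median (middle)
stats <- layer_data(p)[, c("x", "ymin", "lower", "middle", "upper", "ymax")]
stats

# The medians agree with a direct per-group calculation
tapply(mtcars$mpg, mtcars$cyl, median)
```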

ggplot(), geom_point() (scatter plot)

#scatter plot

ggplot(dpc, aes(x=price, y=speed)) + geom_point() + theme_bw() + labs(x="Price", y="Speed", title="Speed vs Price")

ggplot(dpc, aes(x=price, y=ram)) + theme_bw() + geom_point() + labs(x="Price", y="RAM", title="RAM vs Price")
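Because speed and ram were converted to factors earlier, the y axis in these scatter plots is discrete, and computers sharing a value fall on one horizontal line where points overlap. One option is to jitter the points vertically only, which the sketch below does with position_jitter() (equivalent in spirit to geom_jitter() from the geom list). mtcars stands in for dpc; the seed just makes the jitter reproducible.

```r
library(ggplot2)

# Continuous x against a factor y; jitter only in the vertical direction
p <- ggplot(mtcars, aes(x = mpg, y = factor(cyl))) +
  theme_bw() +
  geom_point(position = position_jitter(width = 0, height = 0.2, seed = 1)) +
  labs(x = "MPG", y = "Cylinders")

print(p)
```

width = 0 keeps every x value exact, so only the overlap along the discrete axis is spread out.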


Questions?
