machine learning project

asifj1990
finalprojectq11.doc

Final Projects

Data Sets

Data Sets in R Packages

· List the Data Sets in an R Package

install.packages("gcookbook")

library(gcookbook)

try(data(package = "gcookbook"))  # lists the data sets shipped with the package

· Obtain Information about a Specific Data Set in an R Package

?gcookbook::countries
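
As a minimal sketch of the listing step, the same data() call can be pointed at the built-in datasets package. Using datasets instead of gcookbook is an assumption made here so that the example runs without installing anything.

```r
# List the data sets in a package; `datasets` stands in for gcookbook here
# (an assumption, chosen because it ships with base R).
listing <- data(package = "datasets")

# The result holds a character matrix with one row per data set.
items <- listing$results[, "Item"]
titles <- listing$results[, "Title"]

# Show the first few names with their one-line descriptions.
print(head(data.frame(Item = items, Title = titles), 5))

# Help for one specific data set (uncomment in an interactive session):
# ?datasets::mtcars
```

The Item column is the name to pass to data() or to look up with ?.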

© Patricia Hoffman PhD

Package gcookbook Data Sets

aapl: Apple stock data
anthoming: Homing in desert ants
cabbage_exp: Summary of cabbages data set
climate: Global climate temperature anomaly data from 1800 to 2011
corneas: Corneal thickness of eyes
countries: Health and economic data about countries around the world
heightweight: Height and weight of schoolchildren
isabel: Data from simulation of hurricane Isabel
marathon: Marathon and half-marathon times
pg_mean: Means of results from an experiment on plant growth
tophitters2001: Batting averages of the top hitters in Major League Baseball
uspopage: Age distribution of population in the United States, 1900-2002
uspopchange: Change in population of states in the U.S. between 2000 and 2010

And many more

Package: gcookbook

Data Set: countries

· Description

 Health and economic data about countries around the world from 1960-2010

· Variables

 Name: Name of country

 Code: Short country code

 Year

 GDP: Per capita Gross Domestic Product, in adjusted 2011 U.S. Dollars

 laborrate: Labor rate.

 healthexp: Health expenditures in U.S. Dollars.

 infmortality: Infant mortality per 1000 live births.

· Source

 World Bank: http://data.worldbank.org/

Resources: Datasets

· UCI Repository:

http://www.ics.uci.edu/~mlearn/MLRepository.html

· UCI KDD Archive:

http://kdd.ics.uci.edu/summary.data.application.html

· Statlib:

http://lib.stat.cmu.edu/

· Delve:

http://www.cs.utoronto.ca/~delve/

· SVM

· http://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf

· http://www.csie.ntu.edu.tw/~cjlin/papers/guide/data/

Resources: Datasets

· Stanford Large Network Dataset Collection:

http://snap.stanford.edu/data/

· Gapminder http://www.gapminder.org/

Specialized Collections

· KDnuggets:

· http://www.kdnuggets.com/datasets/index.html

· CMU StatLib

· http://lib.stat.cmu.edu/datasets/

· Gene Expression

· http://www.ncbi.nlm.nih.gov/geo/

· Cornell’s arXiv Bulk Data Access

· http://arxiv.org/help/bulk_data

· Amazon Web Services Public Data Sets

· http://aws.amazon.com/publicdatasets/

Government Data Sources

· http://www.data.gov/

· US Department of Housing and Urban Development housing scorecard at http://portal.hud.gov for housing data.

· Census Bureau at http://www.census.gov for economic and social demographic indicators.

· The Federal Reserve Bank of St. Louis at http://research.stlouisfed.org/fred2/ for economic and interest-rate statistics.

· US Department of the Treasury at www.treasury.gov/resource-center/data-chart-center for detailed information on yield curves and on treasury bill and bond prices.

· Bureau of Economic Analysis at http://www.bea.gov for comprehensive data sets on GDP and trade.

· Medicare Data: http://www.medicare.gov/hospitalcompare/search.html?AspxAutoDetectCookieSupport=1

· Bureau of Labor Statistics: http://www.bls.gov/

· Bureau of Transportation Statistics: http://www.transtats.bts.gov/DatabaseInfo.asp?DB_ID=120&Link=0

Open Government Sites

· USA Survey Data

· http://www.asdfree.com

· U.S. http://www.data.gov/

· List of cities/states with open data

· United Kingdom http://data.gov.uk/

· France http://www.data.gouv.fr/

· Australia http://data.gov.au/

· Germany https://www.govdata.de/

· Many more http://www.data.gov/opendatasites

Open Government Sites

· United Nations http://data.un.org/

· U.S. http://www.data.gov/

· List of cities/states with open data

· United Kingdom http://data.gov.uk/

· France http://www.data.gouv.fr/

· Ghana http://data.gov.gh/

· Australia http://data.gov.au/

· Germany https://www.govdata.de/

Using Slides & Text authors, Tan, Steinbach, and Kumar

Open Government Sites

· Hong Kong http://www.gov.hk/en/theme/psi/datasets/

· Japan http://www.data.go.jp/

· London (England) http://data.london.gov.uk/dataset

· Many more http://www.data.gov/opendatasites

Yahoo! Data Sets

http://webscope.sandbox.yahoo.com/

· Advertising and Market Data

· Competition Data

· Computing Systems Data

· Graph and Social Data

· Image Data

· Language Data

· Ratings and Classification Data

More Data Sites

· Info Chimps Market Place

· http://www.infochimps.com/marketplace

· Kaggle

· http://www.kaggle.com/

· Data Scientist

· http://blog.mortardata.com/post/67652898761/6-dataset-lists-curated-by-data-scientists

· Hilary Mason

· http://bitly.com/bundles/hmason/1

· Peter Skomoroch

· https://delicious.com/pskomoroch/dataset

· Jeff Hammerbacher

· http://www.quora.com/Jeff-Hammerbacher/Introduction-to-Data-Science-Data-Sets

· Gregory Piatetsky-Shapiro

http://www.kdnuggets.com/gps.html

Even More Data Sites

· http://factfinder2.census.gov/faces/nav/jsf/pages/index.xhtml

· https://archive.ics.uci.edu/ml/machine-learning-databases/00215/

· Google’s Public Data Sets

· http://www.google.com/publicdata/directory

· List of Data Sites & Contests

· http://www.rdatamining.com/resources/data

Free Financial Data Sources

· Yahoo Finance at http://finance.yahoo.com for stocks and fundamental analysis data.

· Robert Shiller's Online Data at http://www.econ.yale.edu/~shiller/data.htm for data used in predictive models of housing and stock market confidence.

· RBS Group databank at http://www.databank.rbs.com for currency exchange rates and commodity prices.

· Simian Savants Charts at http://www.sschart.com/cmedata.shtml for currency and stock market indices.

· Kumo at http://pages.swcp.com/stocks/ for all S&P 500 stocks.

· StockCharts at http://www.stockcharts.com

· Reuters at http://www.reuters.com for stocks and market indices.

· Nasdaq at http://www.nasdaq.com for corporate data and financial metrics.

APIs with R Interfaces

· Twitter and the twitteR package

· figshare and rfigshare

· PLoS and rplos

· rOpenSci

· Facebook and Rfacebook

· Google Maps and RgoogleMaps

· Tutorials

http://thinktostart.com/category/r-tutorials/

Financial Data using REST API

· Quandl at http://www.quandl.com

· Xignite Global Real Time at http://www.xignite.com/product/global-real-time-stock-quote-data/

· Goldb at http://www.goldb.org/ystockquote.html

Journal Paper

Resources: Journals

· Journal of Machine Learning Research

www.jmlr.org

http://jmlr.csail.mit.edu/papers/

· Machine Learning

· Annals of Statistics

· Journal of the American Statistical Association

· Journal of Statistical Software

http://www.jstatsoft.org/

Neural Networks

· Neural Computation

· Neural Networks

· IEEE Transactions on Neural Networks

· IEEE Transactions on Pattern Analysis and Machine Intelligence

Extra Topics To Explore

· New Packages

· Extra Functions

3000 + Packages

· List of Tasks: http://cran.r-project.org/web/views

· Keyword Search for Tasks or Functions:

http://rseek.org

· Search, Tag, and Review Packages:

http://crantastic.org/

· Package Downloads Over Time – App

· https://dgrtwo.shinyapps.io/cranview/

Packages

· List of All Packages

http://cran.r-project.org/web/packages/available_packages_by_name.html

· Popular Packages

http://www.r-statistics.com/2013/06/top-100-r-packages-for-2013-jan-may/

· Task Views

http://cran.r-project.org/web/views/ReproducibleResearch.html

· RStudio Quick List

https://support.rstudio.com/hc/en-us/articles/201057987-Quick-list-of-useful-R-packages

Package: caret

The caret package (short for classification and regression training) contains functions to streamline the model training process:

· Evaluate the effect of model tuning parameters on performance

· Choose the “optimal” model across these parameters

· Estimate model performance from a training set

Help Pages

· http://caret.r-forge.r-project.org/

Vignettes in package caret

· A Short Introduction to the caret Package (PDF, source, R code)

· Manual: cran.r-project.org/web/packages/caret/caret.pdf

· Introduction: cran.r-project.org/web/packages/caret/vignettes/caret.pdf

· Tutorial: www.edii.uclm.es/.../user_caret_2up.pd ...

Caret Functions

· createDataPartition()

creates a series of test/training partitions.

· createResample()

creates one or more bootstrap samples.

· createFolds()

splits the data into k groups.

· createTimeSlices()

creates cross-validation sample information to be used with time series data.
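
The ideas behind these helpers can be sketched in base R. This is not caret's implementation (createDataPartition(), for instance, samples within the classes of the outcome, which is skipped here); the iris data, the 80/20 split, and k = 5 are illustrative assumptions.

```r
set.seed(42)
n <- nrow(iris)

# Train/test partition: hold out 20% of the rows.
train_idx <- sample(n, size = floor(0.8 * n))
test_idx <- setdiff(seq_len(n), train_idx)

# Bootstrap resample: draw n row indices with replacement.
boot_idx <- sample(n, size = n, replace = TRUE)

# k folds: shuffle fold labels 1..k over the rows, then group the row indices.
k <- 5
fold_id <- sample(rep(seq_len(k), length.out = n))
folds <- split(seq_len(n), fold_id)

# Each fold serves once as the held-out set.
print(lengths(folds))
```

With caret installed, the corresponding helpers take the outcome vector (for example iris$Species) rather than a row count.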

plyr: Tools for splitting, applying and combining data

plyr is a set of tools that solves a common set of problems: you need to break a big problem down into manageable pieces, operate on each piece and then put all the pieces back together. For example, you might want to fit a model to each spatial location or time point in your study, summarise data by panels or collapse high-dimensional arrays to simpler summary statistics.

ddply(Orange, .(Tree), summarize, COVARIANCE = cov(age, circumference), CORRELATION = cor(age, circumference))

http://cran.r-project.org/web/packages/plyr/
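
For comparison, the same split-apply-combine pattern can be sketched with base R alone. This is a stand-in for the ddply() call above under the assumption that plyr is not installed, not a description of how plyr works internally; Orange is the built-in data set used in that call.

```r
# Work on a plain data frame copy of the built-in Orange data.
orange <- as.data.frame(Orange)

# Split: one data frame per tree.
pieces <- split(orange, orange$Tree)

# Apply: compute the two summaries for each piece.
summaries <- lapply(pieces, function(d) {
  data.frame(
    Tree = as.character(d$Tree[1]),
    COVARIANCE = cov(d$age, d$circumference),
    CORRELATION = cor(d$age, d$circumference)
  )
})

# Combine: stack the per-tree rows into one data frame.
result <- do.call(rbind, summaries)
print(result)
```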

zoo: Time Series

 An S3 class with methods for totally ordered indexed observations. It is particularly aimed at irregular time series of numeric vectors/matrices and factors. zoo's key design goals are independence of a particular index/date/time class and consistency with ts and base R by providing methods to extend standard generics.

 http://cran.r-project.org/web/packages/zoo/

tseries and timeSeries: Time series analysis and computational finance

Time series manipulations for:

· Econometrics

· Environmetrics

· Finance

· Time Series

· Web Technologies

http://cran.r-project.org/web/packages/tseries/

http://cran.r-project.org/web/packages/timeSeries/

Package: “stringr”

Description: stringr is a set of simple wrappers that make R's string functions more consistent, simpler and easier to use. It does this by ensuring that: function and argument names (and positions) are consistent, all functions deal with NA's and zero length character appropriately, and the output data structures from each function match the input data structures of other functions.

http://cran.r-project.org/web/packages/stringr/stringr.pdf

Text Mining Packages

· tm

· Vignettes: Extensions (PDF, source, R code)

· Introduction to the tm Package (PDF, source, R code)

· Rstem

· openNLP

· lsa

APIs

· twitteR

Download twitter data using R

http://www.r-bloggers.com/getting-started-with-twitter-in-r/

· Rfacebook

Download facebook data using R

http://thinktostart.com/analyzing-facebook-with-r/

· Rlinkedin

https://github.com/mpiccirilli/Rlinkedin

· httr: Talk to web APIs from R

http://cran.r-project.org/web/packages/httr/vignettes/quickstart.html

· jsonlite: converts data frames to and from JSON formatted data

R Parallel Computing

· snow: http://mran.revolutionanalytics.com/packages/info/?snow

· parallel: http://www.inside-r.org/r-doc/parallel

· pnmath: http://homepage.stat.uiowa.edu/~luke/R/experimental/

sparkR

Interactive R Programs at Scale

https://www.youtube.com/watch?v=CUX1SG9zTkU&index=1&list=PL-x35fyliRwiuc6qy9z2erka2VX8LY53x

http://blog.revolutionanalytics.com/2015/01/a-first-look-at-spark.html

Package: H2O

· Big Data Analytics Package

· Includes many model building tools

· The Open Source In-Memory, Prediction Engine for Big Data Science

· More

· Package dependencies include:

· RCurl, bitops, rjson, statmod, and tools

More Packages

· XML

· RMySQL : SQL

· xml2 : work with HTML and XML

· install.packages("xml2")

· readxl : work with Excel (built on the RapidXML C++ library)

· install.packages("readxl")

· http://rapidxml.sourceforge.net/

· readr : easy access to many types of tabular data

readr Package

· readr : easy access to many types of tabular data

· read_lines() works the same way as readLines(), but is a lot faster.

· read_file() reads a complete file into a string.

· type_convert() attempts to coerce all character columns to their appropriate type. This is useful if you need to do some manual munging (e.g. with regular expressions) to turn strings into numbers. It uses the same rules as the read_* functions.

· write_csv() writes a data frame out to a csv file. It’s quite a bit faster than write.csv() and it never writes row.names. It also escapes " embedded in strings in a way that read_csv() can read.

haven

· Haven makes it easy to read data from SAS, SPSS and Stata. Haven has the same goal as the foreign package, but it:

· Can read binary SAS7BDAT files.

· Can read Stata13 files.

· Always returns a data frame.

· Haven is a binding to the excellent ReadStat C library

igraph: Network analysis and visualization

· Routines for simple graphs and network analysis. igraph can handle large graphs very well and provides functions for generating random and regular graphs, graph visualization, centrality indices and much more.

· http://cran.r-project.org/web/packages/igraph/

Iterators and Object Orientation

· iterators: Support for iterators, which allow a programmer to traverse through all the elements of a vector, list, or other collection of data.

· http://cran.r-project.org/web/packages/iterators/

· proto: Prototype object-based programming

· sp: classes and methods for spatial data

Package Iterators

· Vignettes in package iterators

· iterators Manual (PDF, source, R code)

· Writing Custom Iterators (PDF, source, R code)

Dates and Times

· chron: Chronological objects which can handle dates and times

· timeDate: Rmetrics chronological and calendar objects

Visualization

 scales: Scale functions for graphics

 labeling: Axis Labeling

 maps: Draw Geographical Maps

 maptools: Tools for reading and handling spatial objects

 rgdal: Bindings for the Geospatial Data Abstraction Library

 vcd: Visualizing Categorical Data

Package: ggmap

· Install from GitHub

· devtools::install_github("dkahle/ggmap")

library(ggmap)  # provides the crime data set and qmplot()

downtown <- subset(crime, -95.39681 <= lon & lon <= -95.34188 & 29.73631 <= lat & lat <= 29.78400)

qmplot(lon, lat, data = downtown, maptype = "toner-background", color = I("blue"))

Package: rworldmap

· mapping of country level and gridded user datasets

· joins modern world maps with visualization options.

· Country borders

· Ukraine map code (get_map() and ggmap() come from the ggmap package):

· #install.packages('rworldmap', dep = TRUE)

· library(rworldmap)

· library(ggmap)

· mapUkr <- get_map(location = 'Ukraine', zoom = 5)

· ggmap(mapUkr)

More Visualization Packages

· lattice

· iplots

· ggplot2: http://wiki.stdout.org/rcookbook/Graphs/

· GGobi

· rggobi

· hexbin

· rCharts

Package rgl

· Package rgl: real-time 3D engine written in C++

· http://rgl.neoscientists.org/about.shtml

· http://cran.r-project.org/src/contrib/Descriptions/rgl.html

Interactive Graphics

· GGobi Home Page

· http://www.ggobi.org/

· GGobi Manual

· http://www.ggobi.org/docs/manual.pdf

· rggobi Introduction

· www.ggobi.org/rggobi/introduction.pdf

· rggobi Manual

· http://cran.r-project.org/web/packages/rggobi/rggobi.pdf

Model Building Packages

· rattle

· RWeka

· Rcmdr

· leaps

· forecast

· http://robjhyndman.com/hyndsight/revolutionr2013/

Rcmdr: R Commander

· A platform-independent basic-statistics GUI (graphical user interface) for R, based on the tcltk package.

· http://cran.r-project.org/web/packages/Rcmdr/

Extra Topics To Explore

· Timing Functions

· More Efficient Code

· apply and aggregate Functions

· Caching

Timing Code

· proc.time() returns how much CPU and elapsed time the running R process has used so far.

· system.time() times the evaluation of an expression.

· R has a profiler that records which functions are being run, many times per second. Rprof(filename) turns on the profiler; Rprof(NULL) turns it off.

· summaryRprof(filename) reports how much time was spent in each function.
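
A minimal sketch of these timing tools on a deliberately slow loop. The function slow_sum() and the workload sizes are hypothetical, chosen only so that there is something to measure.

```r
# A deliberately loop-heavy function (hypothetical workload).
slow_sum <- function(n) {
  total <- 0
  for (i in seq_len(n)) total <- total + sqrt(i)
  total
}

# system.time() evaluates the expression and reports user, system,
# and elapsed seconds.
timing <- system.time(result <- slow_sum(1e6))
print(timing[["elapsed"]])

# Profile a longer run: start the profiler, do the work, stop the profiler.
prof_file <- tempfile(fileext = ".out")
Rprof(prof_file)
invisible(slow_sum(5e6))
Rprof(NULL)

# summaryRprof() fails if the run was too short to record any samples,
# so the call is guarded here.
profile <- tryCatch(summaryRprof(prof_file), error = function(e) NULL)
if (!is.null(profile)) print(head(profile$by.self))
```

system.time() is usually enough for comparing two versions of one expression; the profiler is more useful when it is unclear which function dominates.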

Available Models in R

· linear models (lm)

· generalized linear models (glm)

· generalized additive models (gam)

· linear mixed effects models (lme)

· nonlinear mixed effects models (nlmer)

· linear discriminant analysis (lda)

· quadratic discriminant analysis (qda)

· quantile regression (rq)

· vector generalized additive models (vgam)

· trees (tree) (rpart)

· random forests (randomForest)

· lasso, ridge, and elastic net models (glmnet)

· support vector machines (svm)

· neural networks (nnet)

· non-linear models (nlm)

· k-nearest neighbors (knn)

· boosted ensembles (gbm)

· Naïve Bayes (NaiveBayes)
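
As a small illustration of the first two entries, the sketch below fits lm() and glm() to the built-in mtcars data. The choice of data set, predictors, and the 3000 lb example car are assumptions made for the example; the other models in the list live in add-on packages and follow a similar fit/predict pattern.

```r
# Linear model: fuel economy as a function of weight and horsepower.
fit_lm <- lm(mpg ~ wt + hp, data = mtcars)
print(coef(fit_lm))

# Generalized linear model: logistic regression for manual transmission
# (am = 1) as a function of weight.
fit_glm <- glm(am ~ wt, data = mtcars, family = binomial)

# Predicted probability of a manual transmission for a 3000 lb car
# (wt is recorded in units of 1000 lb).
p_manual <- predict(fit_glm, newdata = data.frame(wt = 3), type = "response")
print(p_manual)
```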

Apply Functions

· base::apply Apply Functions Over Array Margins

· base::by Apply a Function to a Data Frame Split by Factors

· base::eapply Apply a Function Over Values in an Environment

· base::lapply Apply a Function over a List or Vector

· base::mapply Apply a Function to Multiple List or Vector Arguments

· base::rapply Recursively Apply a Function to a List

· base::tapply Apply a Function Over a Ragged Array
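
A short sketch of four members of this family, using small vectors and the built-in mtcars data; the inputs are illustrative assumptions.

```r
# lapply: apply a function to each list element; always returns a list.
totals <- lapply(list(a = 1:3, b = 4:6), sum)

# mapply: walk several vectors in parallel.
pair_sums <- mapply(function(x, y) x + y, 1:3, 4:6)

# tapply: summarise a vector within groups defined by a second vector.
mpg_by_cyl <- tapply(mtcars$mpg, mtcars$cyl, mean)

# by: apply a function to a data frame split by a factor.
wt_by_gear <- by(mtcars, mtcars$gear, function(d) mean(d$wt))

print(totals)
print(pair_sums)
print(mpg_by_cyl)
print(wt_by_gear)
```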

Applying functions to matrices and data frames (2)

· Description: “Returns a vector or array or list of values obtained by applying a function to margins of an array or matrix.”

· apply(x, MARGIN, FUN, ...)

· x is a matrix or data frame

· MARGIN = 1 (rows) or 2 (columns)

· FUN is a function

· ... are optional parameters passed to FUN

apply Example

(MyFirstRLesson.r)

A <- matrix(c(2, 1, 5, 3), nrow = 2)

A
#      [,1] [,2]
# [1,]    2    5
# [2,]    1    3

apply(A,1,sum) # sum of rows
#[1] 7 4

apply(A,2,mean) # mean of columns
#[1] 1.5 4.0

apply(A,1,function(x) min(x)) # minimum of each row
# [1] 2 1

Aggregating data

· aggregate(x, by, FUN)

· x is the data object to be collapsed

· by is a list of variables that will be crossed to form the new observations

· FUN is a scalar function used to calculate the summary statistics that will make up the new observation values

apply Function

· apply(A, MARGIN, FUN, ...)

· A is an array, including a matrix

· MARGIN for matrix

· 1 indicates rows

· 2 indicates columns

· c(1,2) indicates rows and columns

· FUN indicates a function

· … are optional arguments to FUN
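
A brief sketch of the two less obvious parts of this signature: MARGIN = c(1,2) and the optional arguments passed through to FUN. The 2 x 2 matrix is an illustrative assumption.

```r
A <- matrix(c(2, 1, 5, 3), nrow = 2)

# MARGIN = c(1, 2): FUN is called once per cell, so the result keeps
# the shape of A.
squared <- apply(A, c(1, 2), function(x) x^2)

# Extra arguments after FUN are handed to FUN on every call; here
# probs = 0.5 asks quantile() for the median of each column.
col_medians <- apply(A, 2, quantile, probs = 0.5)

print(squared)
print(col_medians)
```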

Aggregate example

options(digits=3)

attach(mtcars)

aggdata <- aggregate(mtcars, by=list(cyl,gear), FUN=mean, na.rm=TRUE)

detach(mtcars)

aggdata

  Group.1 Group.2  mpg cyl disp  hp drat   wt qsec  vs   am gear carb
1       4       3 21.5   4  120  97 3.70 2.46 20.0 1.0 0.00    3 1.00
2       6       3 19.8   6  242 108 2.92 3.34 19.8 1.0 0.00    3 1.00
3       8       3 15.1   8  358 194 3.12 4.10 17.1 0.0 0.00    3 3.08
4       4       4 26.9   4  103  76 4.11 2.38 19.6 1.0 0.75    4 1.50
5       6       4 19.8   6  164 116 3.91 3.09 17.7 0.5 0.50    4 4.00
6       4       5 28.2   4  108 102 4.10 1.83 16.8 0.5 1.00    5 2.00
7       6       5 19.7   6  145 175 3.62 2.77 15.5 0.0 1.00    5 6.00
8       8       5 15.4   8  326 300 3.88 3.37 14.6 0.0 1.00    5 6.00
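
The same kind of grouping can be written without attach() by using aggregate()'s formula interface. The sketch below is an alternative spelling restricted to mpg for brevity; it is not the call shown on the slide.

```r
# Mean mpg for every observed combination of cylinders and gears.
agg_mpg <- aggregate(mpg ~ cyl + gear, data = mtcars, FUN = mean)
print(agg_mpg)

# Look up a single group: 4-cylinder cars with 3 gears.
mpg_4cyl_3gear <- agg_mpg$mpg[agg_mpg$cyl == 4 & agg_mpg$gear == 3]
print(mpg_4cyl_3gear)
```

The grouping columns keep their original names (cyl, gear) here instead of Group.1 and Group.2.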

Melt data

library(reshape)  # melt() and cast() come from the reshape package

md <- melt(mydata, id=c("id", "time"))

mydata

ID  Time  X1  X2
1   1     5   6
1   2     3   5
2   1     6   1
2   2     2   4

md

ID  Time  Variable  Value
1   1     X1        5
1   2     X1        3
2   1     X1        6
2   2     X1        2
1   1     X2        6
1   2     X2        5
2   1     X2        1
2   2     X2        4

Cast data without Aggregation

md <- melt(mydata, id=c("id", "time"))

ID  Time  Variable  Value
1   1     X1        5
1   2     X1        3
2   1     X1        6
2   2     X1        2
1   1     X2        6
1   2     X2        5
2   1     X2        1
2   2     X2        4

cast(md, id + time ~ variable)

ID  Time  X1  X2
1   1     5   6
1   2     3   5
2   1     6   1
2   2     2   4

cast(md, id + variable ~ time)

ID  Variable  Time1  Time2
1   X1        5      3
1   X2        6      5
2   X1        6      2
2   X2        1      4

cast(md, id ~ variable + time)

    X1     X1     X2     X2
ID  Time1  Time2  Time1  Time2
1   5      3      6      5
2   6      2      1      4

Cast with Aggregation

md <- melt(mydata, id=c("id", "time"))

ID  Time  Variable  Value
1   1     X1        5
1   2     X1        3
2   1     X1        6
2   2     X1        2
1   1     X2        6
1   2     X2        5
2   1     X2        1
2   2     X2        4

cast(md, id ~ variable, mean)

ID  X1  X2
1   4   5.5
2   4   2.5

cast(md, time ~ variable, mean)

Time  X1   X2
1     5.5  3.5
2     2.5  4.5

cast(md, id ~ time, mean)

ID  Time1  Time2
1   5.5    4
2   3.5    3
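
The aggregated tables above can be reproduced in base R, which may help when the reshape package is not installed. This is a stand-in sketch, not the reshape implementation: the long table is built by hand and tapply() plays the role of cast() with mean. The lower-case column names are assumptions matching the id= argument used above.

```r
# The wide data from the slides.
mydata <- data.frame(
  id = c(1, 1, 2, 2),
  time = c(1, 2, 1, 2),
  x1 = c(5, 3, 6, 2),
  x2 = c(6, 5, 1, 4)
)

# "Melt" by hand: one row per (id, time, variable) combination.
md <- data.frame(
  id = rep(mydata$id, 2),
  time = rep(mydata$time, 2),
  variable = rep(c("x1", "x2"), each = nrow(mydata)),
  value = c(mydata$x1, mydata$x2)
)

# "Cast" with aggregation: mean of value for each cell of the cross table.
by_id <- tapply(md$value, list(md$id, md$variable), mean)
by_time <- tapply(md$value, list(md$time, md$variable), mean)
by_id_time <- tapply(md$value, list(md$id, md$time), mean)

print(by_id)
print(by_time)
print(by_id_time)
```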

grep

· SCCCoal <- SCC$SCC[grepl("Coal", SCC$EI.Sector)]

· coalEISectors <- grep("[Cc]oal", SCC$EI.Sector)
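
The difference between the two calls is easier to see on a small vector. The sector strings below are made-up stand-ins for SCC$EI.Sector, which is not included in these slides.

```r
# Hypothetical sector labels standing in for SCC$EI.Sector.
sectors <- c(
  "Fuel Comb - Electric Generation - Coal",
  "Mobile - On-Road Gasoline",
  "Fuel Comb - Comm/Institutional - coal"
)

# grepl() returns one TRUE/FALSE per element; the pattern is case sensitive.
is_coal <- grepl("Coal", sectors)

# grep() returns the positions of the matches; [Cc] accepts either case.
coal_idx <- grep("[Cc]oal", sectors)

print(is_coal)
print(coal_idx)
print(sectors[coal_idx])
```

A logical vector from grepl() is convenient for subsetting, as in the first bullet, while grep() is handy when the positions themselves are needed.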

More Examples

· Sampling

· Working with Graphs

More to Explore

· help(plotmath)

· help(image)

· help(grep)

Caching

· Caching is recommended for calculations that take a long time

· http://userprimary.net/papers/weaver-paper-falcon.pdf
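
One simple way to cache is to store results in an environment keyed by the input. The sketch below is an illustrative in-memory approach, not the method described in the linked paper; slow_square() and the Sys.sleep() delay are hypothetical stand-ins for an expensive calculation.

```r
# The cache lives in its own environment; keys are the inputs as strings.
cache <- new.env()
n_computed <- 0

slow_square <- function(x) {
  key <- as.character(x)
  # Return the stored value if this input has been seen before.
  if (exists(key, envir = cache, inherits = FALSE)) {
    return(get(key, envir = cache))
  }
  n_computed <<- n_computed + 1  # count real computations
  Sys.sleep(0.1)                 # stand-in for a slow calculation
  value <- x^2
  assign(key, value, envir = cache)
  value
}

first <- slow_square(12)   # computed and stored
second <- slow_square(12)  # served from the cache
print(c(first, second, n_computed))
```

For results that should survive between sessions, the same idea can be applied with saveRDS() and readRDS() on a file instead of an environment.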
