data analysis and viz


How to Use the R Programming Language for Statistical Analyses

Part I: An Introduction to R

What Is R?

◼ a programming “environment”

◼ object-oriented

◼ similar to S-Plus

◼ freeware

◼ provides calculations on matrices

◼ excellent graphics capabilities

◼ supported by a large user network

What is R Not?

◼ a statistics software package

◼ menu-driven

◼ quick to learn

◼ a program with a complex graphical interface

Installing R

◼ www.r-project.org/

◼ download from CRAN

◼ select a download site

◼ download the base package at a minimum

◼ download contributed packages as needed

Tutorials

◼ From R website under “Documentation”

– “Manual” is the listing of official R documentation

• An Introduction to R

• R Language Definition

• Writing R Extensions

• R Data Import/Export

• R Installation and Administration

• The R Reference Index

Tutorials cont.

– “Contributed” documentation consists of tutorials and manuals created by R users

• Simple R

• R for Beginners

• Practical Regression and ANOVA Using R

– R FAQ

– Mailing Lists (listserv)

• r-help

Tutorials cont.

◼ Textbooks

– Venables & Ripley (2002). Modern Applied Statistics with S. New York: Springer-Verlag.

– Chambers (1998). Programming with Data: A Guide to the S Language. New York: Springer-Verlag.

R Basics

◼ objects

◼ naming convention

◼ assignment

◼ functions

◼ workspace

◼ history

Objects

◼ names

◼ types of objects: vector, factor, array, matrix, data.frame, ts, list

◼ attributes

– mode: numeric, character, complex, logical

– length: number of elements in object

◼ creation

– assign a value

– create a blank object
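
The attributes and creation steps above can be sketched as follows. This is a minimal illustration; the object names (v, s, empty.num, and so on) are made up for the example and do not come from the slides.

```r
# Create objects by assigning values (names are illustrative)
v <- c(2.5, 3.1, 4.8)            # numeric vector
s <- c("a", "b")                 # character vector

mode(v)      # "numeric"
mode(s)      # "character"
length(v)    # 3

# Create "blank" objects of a given mode and length
empty.num <- numeric(5)          # five zeros
empty.chr <- character(0)        # zero-length character vector
empty.lst <- vector("list", 2)   # list holding two NULL elements
```

Blank objects like these are typically filled in later, for example inside a loop.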

Naming Convention

◼ must start with a letter (A-Z or a-z)

◼ can contain letters, digits (0-9), and/or periods “.”

◼ case-sensitive

– mydata different from MyData

◼ avoid underscore “_” in old versions of R (it is valid in names as of R 1.9.0)
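
A short sketch of these naming rules, assuming a current R installation. The object names are illustrative, and make.names() is a base R helper that is not mentioned on the slides; it is used here only to show how R would repair invalid names.

```r
mydata <- 1
MyData <- 2
identical(mydata, MyData)    # FALSE: names are case-sensitive

my.data2 <- 3                # letters, digits, and periods are fine

# make.names() turns arbitrary strings into syntactically valid names
fixed <- make.names(c("mydata", "2data", "my data"))
fixed                        # "mydata" "X2data" "my.data"
```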

Assignment

◼ “<-” used to indicate assignment

– x<-c(1,2,3,4,5,6,7)

– x<-c(1:7)

– x<-1:4

◼ note: as of version 1.4, “=” is also a valid assignment operator
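
The assignment forms above can be compared directly. This is a small sketch; the extra names y, z, and w are introduced only so the results can be checked side by side.

```r
x <- c(1, 2, 3, 4, 5, 6, 7)   # explicit values
y <- c(1:7)                   # same values built from a sequence
z <- 1:7                      # c() is not needed around a sequence
w = 1:4                       # "=" also assigns at the top level

all(x == y)        # TRUE: same values
identical(y, z)    # TRUE: both are integer vectors
length(w)          # 4
```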

Functions

◼ actions can be performed on objects using functions (note: a function is itself an object)

◼ have arguments and options, often with defaults

◼ provide a result

◼ parentheses () are used to specify that a function is being called
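
As a rough illustration of arguments, defaults, and results, the sketch below uses the base function mean() and a hypothetical helper, rescale(), defined only for this example.

```r
x <- c(4, 8, NA, 15)

mean(x)                  # NA, because na.rm defaults to FALSE
mean(x, na.rm = TRUE)    # 9, the default is overridden

# A function is itself an object; without parentheses the name
# refers to the function instead of calling it
is.function(mean)        # TRUE

# Hypothetical user-defined function with a default argument
rescale <- function(v, k = 2) v * k
rescale(1:3)             # 2 4 6
rescale(1:3, k = 10)     # 10 20 30
```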

Let’s look at R

R Workspace & History

Workspace

◼ during an R session, all objects are stored in temporary working memory

◼ list objects

– ls()

◼ remove objects

– rm()

◼ objects that you want to access later must be saved in a “workspace”

– from the menu bar: File -> Save Workspace

– from the command line: save(x,file="MyData.Rdata")
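
A minimal sketch of the workspace commands above. Two details are assumptions rather than slide content: the file is written to a temporary directory so the example leaves nothing behind, and load() (the base R counterpart of save()) is used to restore the object.

```r
x <- 1:7
y <- "temporary"

ls()                              # lists the objects in the workspace
rm(y)                             # remove y
exists("y", inherits = FALSE)     # FALSE

# Save x to a file, remove it, then restore it with load()
rdata.file <- file.path(tempdir(), "MyData.Rdata")
save(x, file = rdata.file)
rm(x)
load(rdata.file)
x                                 # 1 2 3 4 5 6 7
```

To save every object at once instead of naming them individually, save.image() writes the whole workspace.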

History

◼ command line history

◼ can be saved, loaded, or displayed

– savehistory(file="MyData.Rhistory")

– loadhistory(file="MyData.Rhistory")

– history(max.show=Inf)

◼ during a session you can use the arrow keys to review the command history

Two most common object types for statistics:

◼ matrix

◼ data frame

Matrix

◼ a matrix is a vector with an additional attribute (dim) that defines the number of rows and columns

◼ only one mode (numeric, character, complex, or logical) allowed

◼ can be created using matrix()

x<-matrix(data=0,nrow=2,ncol=2)

or

x<-matrix(0,2,2)
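
The points above (a matrix is a vector plus dim, and holds a single mode) can be checked with a short sketch. The names v and m are illustrative.

```r
x <- matrix(data = 0, nrow = 2, ncol = 2)
dim(x)                    # 2 2

# A matrix is a vector with a dim attribute
v <- 1:6
dim(v) <- c(2, 3)         # now a 2 x 3 matrix, filled column by column
is.matrix(v)              # TRUE
v[2, 3]                   # 6

# Only one mode is allowed: mixing types coerces every element
m <- matrix(c(1, "a", 2, "b"), nrow = 2)
mode(m)                   # "character"
```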

Data Frame

◼ several modes allowed within a single data frame

◼ can be created using data.frame()

L<-LETTERS[1:4] #A B C D

x<-1:4 #1 2 3 4

data.frame(x,L) #create data frame

◼ attach() and detach(): attach() adds the data frame to the R search path, so R searches it when evaluating a variable; detach() removes it again

– objects in the attached data frame can be accessed by simply giving their names
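
A small sketch of attach() and detach() in use. The data frame d and its column names (score, grp) are hypothetical, chosen so that they do not clash with objects already in the workspace.

```r
# Hypothetical data frame mixing a numeric and a character column
d <- data.frame(score = c(10, 20, 30, 40), grp = LETTERS[1:4])

attach(d)                 # d is now on the search path
m <- mean(score)          # columns are found by name alone
detach(d)                 # remove d from the search path

m                         # 25
exists("score")           # FALSE once d is detached
```

Without attaching, the same column can be reached as d$score or through with(d, mean(score)), which avoids name clashes with workspace objects.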

Data Elements

◼ select only one element

– x[2]

◼ select range of elements

– x[1:3]

◼ select all but one element

– x[-3]

◼ slicing: including only part of the object

– x[c(1,2,5)]

◼ select elements based on logical operator

– x[x>3]
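
Each selection above can be tried on a small vector. The values in x are made up for illustration.

```r
x <- c(10, 20, 30, 40, 50)

x[2]             # 20              one element
x[1:3]           # 10 20 30        range of elements
x[-3]            # 10 20 40 50     all but the third element
x[c(1, 2, 5)]    # 10 20 50        selected positions
x[x > 30]        # 40 50           elements meeting a logical condition
```

Note that selection always uses square brackets; parentheses would make R look for a function named x.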

Data Import & Entry

Importing Data

◼ read.table()

– reads in data from an external file

◼ data.entry()

– create object first, then enter data

◼ c()

– concatenate

◼ scan()

– prompted data entry

◼ R has ODBC support (for example, the contributed RODBC package) for connecting to other programs
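
A minimal sketch of read.table(), c(), and scan(). The file name and its contents are invented for the example and written to a temporary directory first, so the code is self-contained. scan() normally prompts at the console; here its text argument supplies the input instead.

```r
# Write a small example file (hypothetical contents)
data.file <- file.path(tempdir(), "example.txt")
writeLines(c("id height", "1 150.5", "2 162.0", "3 171.2"), data.file)

# read.table(): first line holds the column names
d <- read.table(data.file, header = TRUE)
names(d)          # "id" "height"
d$height          # 150.5 162.0 171.2

# c(): concatenate values typed directly
v <- c(1.5, 2.5, 3.5)

# scan(): read values from a string rather than from the prompt
s <- scan(text = "4 5 6", quiet = TRUE)
s                 # 4 5 6
```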

Data entry & editing

◼ start editor and save changes

– data.entry(x)

◼ start editor, changes not saved

– de(x)

◼ start text editor

– edit(x)