Data Analysis and Analytics THis 2 parts

Chan28
CreatingaSASTablefromaCSVFileTranscript1.docx

: Creating a SAS Table from a CSV File Transcript

Watch the "Creating a SAS Table from a CSV File" video.

Creating a SAS table from a CSV file

1

In this video, you learn to

save an Excel worksheet as a

2

comma-separated-value file, use

the DATA step to read that

3

file, and create a SAS table.

Comma-separated-value, or CSV,

4

files are a common format for

data files that are stored as

5

plain text. Many software

applications, including SAS,

6

provide methods to export data

as CSV files. In order to use

7

the power of SAS to manage,

analyze, and present the data

8

in a CSV file, you need to

convert this text file to a

9

format that SAS can use: a SAS

table. Let's look at an example

10

using a Microsoft Excel file. I

want to use SAS to analyze the

11

data in testscores.xlsx. You

can see that testscores.xlsx

12

contains four columns of data.

The first row contains the

13

headers for each column. Note

that there is only one

14

worksheet in this Excel

workbook. To save this file in

15

CSV format, from the File menu,

I select Save As and navigate

16

to the location where I want to

save the file. Then in the Save

17

as type drop-down list, I

select CSV (Comma delimited)

18

and click Save. I know that

formatting and other Excel-

19

specific features will be lost

when I save the file in this

20

format, so I'll click Yes in

the pop-up dialog box. Now

21

let's close Excel. Because CSV

files are text files, they can

22

be viewed using any text

editor. I'll open

23

testscores.csv in Notepad. You

can see that each row is a

24

record with four fields, and

each field is separated with a

25

comma. As in Excel, the very

first record has a list of the

26

field identifiers, in this case

separated with a comma. You can

27

use a DATA step to create a SAS

table. The CSV file serves as

28

the input source. Let's go

ahead and write this program.

29

I'll start by typing a DATA

statement, beginning with the

30

keyword DATA. By the way, I've

disabled the autocomplete

31

feature to make it easier for

you to see what I'm typing. You

32

might choose to leave this

feature enabled as you work. I

33

specify the name of the new

table, testscores, and end the

34

statement with a semicolon.

Next I write an INFILE

35

statement to specify the CSV

file to be read. I type the

36

keyword infile, followed by the

location and name of the CSV

37

file in quotation marks. I need

to make sure that the path and

38

name are exactly correct. In

SAS Studio, I can select the

39

file in the Folders area,

right-click, and select

40

Properties. The paths match, so

I can continue with my program.

41

Before I end the INFILE

statement with a semicolon,

42

there are two options that I

need to include. First, I need

43

to tell SAS that this is a

delimited file, where each

44

field is separated by a comma.

The DLM= option in SAS

45

specifies an alternate

delimiter other than a blank. I

46

type dlm=, followed by the

comma character in quotation

47

marks. Second, I need to let

SAS know not to read the first

48

row of field identifiers as

data. I'll use the option

49

FIRSTOBS=2. This tells SAS to

read the data starting with the

50

second row. Otherwise, SAS

would start reading the first

51

row as data values, rather than

as field names. I'll type a

52

semicolon to complete the

statement. Now I'll write the

53

INPUT statement, which defines

the fields in each row. I

54

specify the keyword input,

followed by the list of columns

55

that SAS must define, in the

order in which they appear in

56

the CSV file from left to

right. To create a character

57

column, I specify a dollar sign

after the column name. Keep in

58

mind that you must follow the

SAS naming conventions when

59

naming columns and tables. SAS

column names can be 1 to 32

60

characters long. The name must

start with a letter or

61

underscore and can continue

with any combination of

62

numbers, letters, or

underscores. No blanks are

63

allowed. SAS column names can

be uppercase, lowercase, or

64

mixed case. SAS creates each

name in the same case that you

65

first specify it, and that is

how it appears in reports.

66

However, after a column has

been created, you can use the

67

name in any case in your

programming code without

68

affecting how it is stored. You

apply these same naming

69

conventions to SAS table names.

After I define the four columns

70

to be included, I'll end the

INPUT statement with a

71

semicolon. Then I complete the

program with a RUN statement.

72

Let's submit the code. I'm

going to check the log to make

73

sure that the program ran

without error. There are no

74

errors or warnings, so that's

good. Let's look at the notes.

75

I can see that SAS read 80

records from the file

76

testscores.csv and created the

table work.testscores. There

77

are four columns, or variables,

in the table. I can also look

78

at work.testscores by going to

the Libraries area in the

79

navigation pane. I'll drill

down, open the work library,

80

and double-click the new table,

testscores, to open it. I can

81

see I've created a SAS table

that is a copy of the original

82

CSV file. I have four columns

based on the names I used in

83

the INPUT statement. In this

video, you learned to save an

84

Excel worksheet as a comma-

separated-value file, use the

85

DATA step to read that file,

and create a SAS table.