Data Analysis and Analytics THis 2 parts
: Creating a SAS Table from a CSV File Transcript
Watch the "Creating a SAS Table from a CSV File" video.
Creating a SAS table from a CSV file
1
In this video, you learn to
save an Excel worksheet as a
2
comma-separated-value file, use
the DATA step to read that
3
file, and create a SAS table.
Comma-separated-value, or CSV,
4
files are a common format for
data files that are stored as
5
plain text. Many software
applications, including SAS,
6
provide methods to export data
as CSV files. In order to use
7
the power of SAS to manage,
analyze, and present the data
8
in a CSV file, you need to
convert this text file to a
9
format that SAS can use: a SAS
table. Let's look at an example
10
using a Microsoft Excel file. I
want to use SAS to analyze the
11
data in testscores.xlsx. You
can see that testscores.xlsx
12
contains four columns of data.
The first row contains the
13
headers for each column. Note
that there is only one
14
worksheet in this Excel
workbook. To save this file in
15
CSV format, from the File menu,
I select Save As and navigate
16
to the location where I want to
save the file. Then in the Save
17
as type drop-down list, I
select CSV (Comma delimited)
18
and click Save. I know that
formatting and other Excel-
19
specific features will be lost
when I save the file in this
20
format, so I'll click Yes in
the pop-up dialog box. Now
21
let's close Excel. Because CSV
files are text files, they can
22
be viewed using any text
editor. I'll open
23
testscores.csv in Notepad. You
can see that each row is a
24
record with four fields, and
each field is separated with a
25
comma. As in Excel, the very
first record has a list of the
26
field identifiers, in this case
separated with a comma. You can
27
use a DATA step to create a SAS
table. The CSV file serves as
28
the input source. Let's go
ahead and write this program.
29
I'll start by typing a DATA
statement, beginning with the
30
keyword DATA. By the way, I've
disabled the autocomplete
31
feature to make it easier for
you to see what I'm typing. You
32
might choose to leave this
feature enabled as you work. I
33
specify the name of the new
table, testscores, and end the
34
statement with a semicolon.
Next I write an INFILE
35
statement to specify the CSV
file to be read. I type the
36
keyword infile, followed by the
location and name of the CSV
37
file in quotation marks. I need
to make sure that the path and
38
name are exactly correct. In
SAS Studio, I can select the
39
file in the Folders area,
right-click, and select
40
Properties. The paths match, so
I can continue with my program.
41
Before I end the INFILE
statement with a semicolon,
42
there are two options that I
need to include. First, I need
43
to tell SAS that this is a
delimited file, where each
44
field is separated by a comma.
The DLM= option in SAS
45
specifies an alternate
delimiter other than a blank. I
46
type dlm=, followed by the
comma character in quotation
47
marks. Second, I need to let
SAS know not to read the first
48
row of field identifiers as
data. I'll use the option
49
FIRSTOBS=2. This tells SAS to
read the data starting with the
50
second row. Otherwise, SAS
would start reading the first
51
row as data values, rather than
as field names. I'll type a
52
semicolon to complete the
statement. Now I'll write the
53
INPUT statement, which defines
the fields in each row. I
54
specify the keyword input,
followed by the list of columns
55
that SAS must define, in the
order in which they appear in
56
the CSV file from left to
right. To create a character
57
column, I specify a dollar sign
after the column name. Keep in
58
mind that you must follow the
SAS naming conventions when
59
naming columns and tables. SAS
column names can be 1 to 32
60
characters long. The name must
start with a letter or
61
underscore and can continue
with any combination of
62
numbers, letters, or
underscores. No blanks are
63
allowed. SAS column names can
be uppercase, lowercase, or
64
mixed case. SAS creates each
name in the same case that you
65
first specify it, and that is
how it appears in reports.
66
However, after a column has
been created, you can use the
67
name in any case in your
programming code without
68
affecting how it is stored. You
apply these same naming
69
conventions to SAS table names.
After I define the four columns
70
to be included, I'll end the
INPUT statement with a
71
semicolon. Then I complete the
program with a RUN statement.
72
Let's submit the code. I'm
going to check the log to make
73
sure that the program ran
without error. There are no
74
errors or warnings, so that's
good. Let's look at the notes.
75
I can see that SAS read 80
records from the file
76
testscores.csv and created the
table work.testscores. There
77
are four columns, or variables,
in the table. I can also look
78
at work.testscores by going to
the Libraries area in the
79
navigation pane. I'll drill
down, open the work library,
80
and double-click the new table,
testscores, to open it. I can
81
see I've created a SAS table
that is a copy of the original
82
CSV file. I have four columns
based on the names I used in
83
the INPUT statement. In this
video, you learned to save an
84
Excel worksheet as a comma-
separated-value file, use the
85
DATA step to read that file,
and create a SAS table.