Biostatistics
DATA ANALYSIS ASSIGNMENT #1: Analysis of Heart Data Set
This assignment uses a data set that is included as part of the SAS software for training purposes. For this assignment, you will demonstrate that you can use the SAS OnDemand Web Editor to:
· open a data set
· print out information on the format of the data
· compute basic descriptive statistics, graphs, and tables
NOTE: Before you complete this assignment, you must sign up for SAS OnDemand for Academics for the course COH 602 offered at National University. A letter with a link explaining how to do this is posted on the course website.
1. Go to the SAS OnDemand for Academics website and log in to your account.
2. On your SAS OnDemand profile page, click on “SAS® Studio” (under “Applications”), and the SAS Web Editor should open.
3. In the left window of the Web Editor, you will see “Libraries” (under “Server Files and Folders”). Click the right arrow next to “Libraries” to expand it. Then click on the right arrow next to “My Libraries” to expand this section.
4. Click on the right arrow next to the library named “SASHELP,” scroll down until you see the dataset named “HEART,” and double-click on it.
- A Table View of the data set will open. This lets you see what the data set looks like, which will help the next exercise make sense.
- If you click on the dropdown arrow for the HEART dataset, you will be able to see all the variables in the dataset within the Libraries window. This will help you when writing your code.
5. In the right window, click on the tab that says “Code”. This will bring up a blank screen.
6. Type PROC CONTENTS DATA = SASHELP.HEART; RUN;
- NOTE: Do not forget the semicolons after “HEART” and “RUN”! The procedure will not run without them.
- This statement prints the contents of the data set named sashelp.heart, including the number of variables, the number of records, and information such as each variable’s name, type, and length. A partial example of this is shown below. FYI: The sashelp.heart data set comes from the Framingham Heart Study.
7. Click on the icon at the top that looks like a running man.
- Congratulations! You have now produced your first output.
8. You are now going to produce some frequency distributions. Select three of the variables in the HEART dataset (do not select ID as one of these). At least two of the variables must be numeric. You can find the variable names and whether they are numeric in the output from your PROC CONTENTS. A frequency table lists each distinct value of a variable along with the number of records that have that value (the count) and the percentage of all records that count represents. Type the following in your code window:
PROC FREQ DATA = SASHELP.HEART;
TABLES variable1 variable2 variable3;
RUN;
(Type the names of the variables you have chosen instead of variable1 etc.)
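As a hedged illustration of what a filled-in version might look like, the sketch below uses two character variables from SASHELP.HEART, chosen only as an example (substitute your own selections). The ORDER = FREQ and MISSING options are optional additions that are not part of the assignment: the first sorts each table from the most to the least common value, and the second counts missing values as their own category.

```sas
/* Illustrative only: Sex and Smoking_Status are example choices. */
PROC FREQ DATA = SASHELP.HEART ORDER = FREQ;
TABLES Sex Smoking_Status / MISSING;
RUN;
```

For a numeric variable with many distinct values, expect a long table, because each distinct value gets its own row.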
9. For the variables you selected in step 8, create vertical bar charts to give you a graphical representation of the data values you produced in the frequency table:
PROC CHART DATA = SASHELP.HEART;
VBAR variable1 variable2 variable3;
RUN;
(Type the names of the variables you have chosen instead of variable1 etc.)
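PROC CHART draws its bars as plain text and groups a numeric variable into intervals around midpoints. If you would also like a graphical version, one possible alternative (not required for this assignment) is PROC SGPLOT, sketched below. The variable names Sex and Cholesterol are example choices only. Note that SGPLOT accepts one variable per VBAR or HISTOGRAM statement, so each chart gets its own step.

```sas
/* Bar chart of counts for a categorical variable (example: Sex). */
PROC SGPLOT DATA = SASHELP.HEART;
VBAR Sex;
RUN;

/* Histogram for a numeric variable (example: Cholesterol). */
PROC SGPLOT DATA = SASHELP.HEART;
HISTOGRAM Cholesterol;
RUN;
```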
10. For only the numeric variables that you selected in step 8, compute descriptive statistics that will give you a clearer picture of your sample population for each variable. List only numeric variables in the VAR statement; a character variable there will cause an error. Try both blocks of SAS code below to see how the results they produce differ.
PROC UNIVARIATE DATA = SASHELP.HEART;
VAR variable1 variable2 variable3;
HISTOGRAM / NORMAL;
RUN;
OR YOU CAN USE
PROC MEANS DATA = SASHELP.HEART MEAN STD MEDIAN MODE;
VAR variable1 variable2 variable3;
RUN;
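The PROC MEANS call above does not print the interquartile range or the skewness. A minimal sketch of one way to request them in a single table is shown below; Cholesterol and Systolic are example numeric variables only, and MAXDEC = 2 is an optional setting that rounds the printed values to two decimal places.

```sas
/* Illustrative only: replace Cholesterol and Systolic with your own numeric variables. */
/* QRANGE is the interquartile range (Q3 minus Q1); NMISS counts missing values. */
PROC MEANS DATA = SASHELP.HEART N NMISS MEAN STD MEDIAN Q1 Q3 QRANGE MODE SKEWNESS MAXDEC = 2;
VAR Cholesterol Systolic;
RUN;
```

PROC UNIVARIATE reports these same quantities, but spread across several output tables.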
11. Write a one-page summary of your analysis in which you discuss the results of the statistical procedures you ran in SAS. Report the mean, standard deviation, median, interquartile range, and mode for each numeric variable, and report the count and percentage of each category (or level) for each categorical variable. Interpret what these values tell you about the sample population of people you are studying, based upon these variables. Using the histogram from PROC UNIVARIATE, as well as the mean and median values you produced, explain whether each numeric variable is positively skewed, negatively skewed, or symmetric. If you use the “skewness” statistic as part of your evidence for this part, make sure that you understand how to interpret it correctly (see notes below). For every statement you make about a variable in your interpretation, support that statement with evidence from the descriptive statistics you have produced. “Evidence” consists of actual statistics (report the numbers) that support what you are stating.
This paper should be typed and double-spaced in a Word document and submitted in Blackboard, in the same section where you downloaded this prompt. Do not forget to type your name in the document!
Below is the grading rubric with the criteria that I will be using for the grading. Please use this as a guide when writing your summary to ensure that you do not miss anything important.
| Grading Item | Points Possible | Points Earned | Comments |
| Descriptive statistics reported for each type of variable are appropriate & accurate | 2 | | |
| Interpretation of descriptive statistics for each variable is accurate | 4 | | |
| Discussion of skewness of distribution for each numeric variable is accurate | 4 | | |
| Summary has the correct format and length; writing is clear and understandable | 4 | | |
| TOTAL | 14 | | |
Interpreting Skewness
In everyday language, the terms “skewed” and “askew” are used to refer to something that is out of line or distorted on one side. When referring to the shape of frequency or probability distributions, “skewness” refers to asymmetry of the distribution. A distribution with an asymmetric tail extending out to the right is referred to as “positively skewed” or “skewed to the right,” while a distribution with an asymmetric tail extending out to the left is referred to as “negatively skewed” or “skewed to the left.” Skewness can range from minus infinity to positive infinity.
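As a rough guide to where the number comes from, the sample skewness that SAS reports can be written as follows, assuming the default divisor setting (VARDEF = DF) is in effect. Here n is the number of nonmissing values, x̄ is the sample mean, and s is the sample standard deviation.

```latex
\[
\text{Skewness} = \frac{n}{(n-1)(n-2)} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s} \right)^{3}
\]
```

Because each standardized deviation is cubed, values far above the mean contribute large positive terms and values far below the mean contribute large negative terms. A long right tail therefore pulls the statistic above zero, and a long left tail pulls it below zero.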
Interpreting the Skewness Statistic in SAS
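1. If Skewness = 0 (or is very close to 0), then the frequency distribution is approximately symmetrical. A normal distribution has a skewness of 0, but a skewness of 0 does not by itself show that the distribution is normal.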
2. If Skewness > 0 (has a positive value), then the frequency distribution is positively skewed.
3. If Skewness < 0 (has a negative value), then the frequency distribution is negatively skewed.
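To see how the sign of the statistic behaves, here is a small sketch on made-up numbers. The data set name skew_demo, the group labels, and the values are all invented for illustration and are not part of the assignment or of SASHELP.HEART.

```sas
/* Hypothetical toy data: two small groups of values. */
/* "sym" is symmetric about 5; "right" has one large value in its right tail. */
DATA skew_demo;
INPUT shape $ x @@;
DATALINES;
sym 1 sym 1 sym 1 sym 2 sym 8 sym 9 sym 9 sym 9
right 1 right 1 right 1 right 2 right 2 right 3 right 10
;
RUN;

/* Compare the mean, median, and skewness within each group. */
PROC MEANS DATA = skew_demo N MEAN MEDIAN SKEWNESS MAXDEC = 2;
CLASS shape;
VAR x;
RUN;
```

In the “sym” group the cubed deviations above and below the mean cancel, so the skewness is 0 (up to rounding) even though the values cluster at two ends rather than forming a bell shape. In the “right” group the single large value pulls the mean above the median, and the skewness is positive.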