SAS job
Lecture 8 Distribution Analysis in SAS Week 8.ppt
Data Management and Business Intelligence
5CC519
www.derby.ac.uk/engtech
Lecture 5 – Picturing Distributions and Outliers in SAS
Dr John Panneerselvam
College of Engineering and Technology, University of Derby
*
Distributions
*
Descriptive Analytics
The goals when you are describing data are to
- screen for unusual data values
- inspect the spread and shape of continuous variables
- characterize the central tendency
- draw preliminary conclusions about your data.
*
A Normal Distribution
You can use the skewness and kurtosis, together with the histogram, to help decide whether your sample follows a normal distribution.
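To make the two shape statistics concrete, here is a minimal sketch in Python (chosen only so it can be run without SAS). It assumes the bias-corrected sample formulas that, to my understanding, PROC UNIVARIATE reports under its default VARDEF=DF setting. The function names and the data are illustrative, not part of the lecture.

```python
import math


def _mean_and_std(xs):
    """Sample mean and standard deviation (divisor n - 1)."""
    n = len(xs)
    mean = sum(xs) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    return mean, std


def sample_skewness(xs):
    """Bias-corrected skewness; needs at least 3 values."""
    n = len(xs)
    if n < 3:
        raise ValueError("skewness needs at least 3 values")
    mean, std = _mean_and_std(xs)
    z3 = sum(((x - mean) / std) ** 3 for x in xs)
    return n / ((n - 1) * (n - 2)) * z3


def sample_excess_kurtosis(xs):
    """Bias-corrected excess kurtosis; needs at least 4 values."""
    n = len(xs)
    if n < 4:
        raise ValueError("kurtosis needs at least 4 values")
    mean, std = _mean_and_std(xs)
    z4 = sum(((x - mean) / std) ** 4 for x in xs)
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * z4 - correction


symmetric = [1, 2, 3, 4, 5]       # hypothetical data, no skew
right_skewed = [1, 1, 1, 2, 10]   # hypothetical data, long right tail
print(sample_skewness(symmetric), sample_excess_kurtosis(symmetric))
print(sample_skewness(right_skewed))
```

For a normal distribution both statistics are close to 0, so values far from 0 (in either direction) are a hint that the normal model may not fit. The histogram should still be inspected, because small samples give noisy estimates.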
*
The UNIVARIATE Procedure
General form of the UNIVARIATE procedure:
PROC UNIVARIATE DATA=SAS-data-set <options>;
   VAR variables;
   CDFPLOT <variables> </ options>;
   HISTOGRAM <variables> </ options>;
   PROBPLOT <variables> </ options>;
   QQPLOT <variables> </ options>;
   WEIGHT variable;
RUN;
*
Descriptive Statistics
- Mean
- Sum
- Sum of the Weights
- Standard Deviation
- Variance
- Coefficient of Variation
- Skewness
- Kurtosis
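As a rough illustration of what several of these moments mean, the sketch below computes them in Python with the standard `statistics` module. It assumes unweighted data (so the sum of the weights equals the number of observations) and the n - 1 divisor for the variance. The `describe` function and the data are hypothetical. It also returns the coefficient of variation, which PROC UNIVARIATE reports in the same table.

```python
import statistics


def describe(values):
    """Return a few basic moments for an unweighted sample."""
    n = len(values)
    mean = statistics.fmean(values)
    variance = statistics.variance(values)   # divisor n - 1
    std_dev = statistics.stdev(values)       # square root of the variance
    return {
        "n": n,
        "sum_weights": n,                    # every observation has weight 1
        "sum": sum(values),
        "mean": mean,
        "variance": variance,
        "std_dev": std_dev,
        # Coefficient of variation: spread as a percentage of the mean.
        "coeff_variation": 100 * std_dev / mean,
    }


thickness = [2, 4, 4, 4, 5, 5, 7, 9]         # hypothetical measurements
for name, value in describe(thickness).items():
    print(f"{name:>16}: {value}")
```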
*
Creating Histogram
title 'Analysis of Plating Thickness';
proc univariate data=Trans noprint;
histogram Thick;
run;
*
Two Way Comparative Histogram
title 'Results of Supplier Training Program';
proc univariate data=Disk noprint;
   class Supplier Year;
   histogram Width / intertile = 1.0
                     vaxis     = 0 10 20 30
                     ncols     = 2
                     nrows     = 2;
run;
*
Histogram with a Fit
proc univariate data=Robots;
   histogram Length /
      beta(theta=10 scale=0.5 fill)
      href      = 10
      hreflabel = 'Lower Bound'
      odstitle  = 'Fitted Beta Distribution of Offsets';
   inset n = 'Sample Size' /
      pos=ne cfill=blank;
run;
*
The UNIVARIATE Procedure
*
CDF with a Fit
proc univariate data=Cord noprint;
cdf Strength / normal;
inset normal(mu sigma);
run;
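The idea behind a CDF plot with a fitted curve can be sketched outside SAS as follows. This Python example builds the empirical CDF of a sample and compares it with a normal CDF whose parameters are estimated from the same data. It assumes mu is estimated by the sample mean and sigma by the sample standard deviation. The function names and the strength values are made up for illustration.

```python
from statistics import NormalDist, mean, stdev


def empirical_cdf(sample):
    """Return (value, cumulative proportion) pairs for the sorted sample."""
    xs = sorted(sample)
    n = len(xs)
    return [(x, (i + 1) / n) for i, x in enumerate(xs)]


def fit_normal(sample):
    """Normal distribution with mu and sigma estimated from the sample."""
    return NormalDist(mu=mean(sample), sigma=stdev(sample))


def max_cdf_gap(sample):
    """Largest gap between empirical and fitted CDF at the sample points."""
    fitted = fit_normal(sample)
    return max(abs(p - fitted.cdf(x)) for x, p in empirical_cdf(sample))


strength = [6.9, 7.1, 7.2, 7.3, 7.3, 7.4, 7.5, 7.6, 7.8, 8.0]  # hypothetical
fitted = fit_normal(strength)
print(f"mu={fitted.mean:.3f} sigma={fitted.stdev:.3f}")
print(f"largest gap between the two curves: {max_cdf_gap(strength):.3f}")
```

A small gap suggests the fitted normal curve tracks the data closely. On the SAS plot this corresponds to the smooth curve staying near the step function.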
*
Quantile Fit
symbol v=plus;
title 'Two-Parameter Lognormal Q-Q Plot for Diameters';
ods graphics off;
proc univariate data=ModifiedMeasures noprint;
   qqplot LogDiameter / normal(mu=est sigma=est)
                        square
                        vaxis=axis1;
   inset n mean (5.3) std (5.3) /
      pos = nw header = 'Summary Statistics';
   axis1 label=(a=90 r=0);
run;
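A Q-Q plot pairs each sorted observation with the quantile a reference distribution would predict for that rank. The Python sketch below computes those pairs for a normal reference. It assumes the plotting positions (i - 0.375) / (n + 0.25), which is my understanding of what the QQPLOT statement uses. The function name and data are illustrative only.

```python
from statistics import NormalDist


def normal_qq_points(sample):
    """Return (theoretical normal quantile, observed value) pairs."""
    xs = sorted(sample)
    n = len(xs)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    points = []
    for i, x in enumerate(xs, start=1):
        position = (i - 0.375) / (n + 0.25)   # assumed plotting position
        points.append((std_normal.inv_cdf(position), x))
    return points


log_diameter = [1.52, 1.61, 1.48, 1.70, 1.55, 1.66, 1.59]  # hypothetical
for theoretical, observed in normal_qq_points(log_diameter):
    print(f"{theoretical:8.3f}  {observed:6.2f}")
```

If the points fall close to a straight line, the normal model is plausible for the plotted variable. The intercept and slope of that line correspond to the location and scale of the fitted distribution.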
*
Outliers
*
Box-and-Whisker Plots
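The mean is denoted by a ◊.
- Upper whisker: the largest point within 1.5 × IQR above the box
- Top of the box: the 75th percentile
- Line inside the box: the 50th percentile (median)
- Bottom of the box: the 25th percentile
- Lower whisker: the smallest point within 1.5 × IQR below the box
- Outliers: points more than 1.5 × IQR from the box
Here IQR is the interquartile range, the distance between the 25th and 75th percentiles.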
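The 1.5 × IQR rule can be sketched in a few lines of Python. The example below is only an illustration: it uses the `inclusive` quartile method from the standard `statistics` module, which may not match the percentile definition SAS uses by default, and the function name and data are hypothetical.

```python
import statistics


def boxplot_summary(values):
    """Quartiles, fences, whisker ends and outliers for a schematic box plot."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    outliers = sorted(v for v in values if v < lower_fence or v > upper_fence)
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "lower_whisker": min(inside),   # smallest point inside the fences
        "upper_whisker": max(inside),   # largest point inside the fences
        "outliers": outliers,
    }


kwatts = [1, 2, 3, 4, 5, 6, 7, 8, 30]   # hypothetical readings
print(boxplot_summary(kwatts))
```

With these readings the quartiles are 3, 5 and 7, so the fences sit at -3 and 13. The value 30 lies beyond the upper fence and is flagged as an outlier, while the whiskers stop at 1 and 8.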
*
A box plot is made up of percentiles.
The blue box spans the interquartile range.
*
The BOXPLOT Procedure
General form of the BOXPLOT procedure:
PROC BOXPLOT <options>;
   BY variables;
   INSET keywords </ options>;
   INSETGROUP keywords </ options>;
   PLOT analysis-variable*group-variable <(block-variables)> <=symbol-variable> </ options>;
RUN;
*
Creating a Box Plot
proc boxplot data=Turbine;
   plot KWatts*Day / boxstyle = schematic
                     outbox   = OilSchematic;
run;
*
Enhancing a Box Plot
title 'Box Plot for Power Output';
proc boxplot data=Turbine;
   plot KWatts*Day;
   inset min mean max stddev /
      header = 'Overall Statistics'
      pos    = tm;
   insetgroup min max /
      header = 'Extremes by Day';
run;
*
References
- Big Data Fundamentals: Concepts, Drivers and Techniques (available in Week 5)
- SAS Programming 1: Essentials, SAS Institute
- SAS: The Power to Know, official documentation
*
Software testing week 8.pptx
Data Management and Business Intelligence 5CC519
Software Testing
Dr John Panneerselvam
College of Engineering and Technology, University of Derby
Sensitivity: Internal
Overview
▪ Definition of Software Testing
▪ Problems with Testing
▪ Benefits of Testing
▪ Effective Methods for Testing
Definition
Software testing is the process of executing a software system to determine whether it matches its specification and executes in its intended environment.
“Program testing can be a very effective way to show the presence of bugs, but it is hopelessly inadequate for showing their absence” [Dijkstra, 1972]
Cost of Delaying the Release of a Software Product
Timing is another important factor to consider.
For new products, the first to market often sells better than superior products that are released later.
Cutting Testing Costs Can Increase Other Costs
Customer support can be very expensive.
Fewer bugs = fewer calls.
Customers will look for more reliable solutions.
Software organizations must perform cost-benefit analyses to determine how much to spend on testing.
Problems with Testing
Since it is impossible to find every fault in a software system, bugs will be found by customers after the product is released.
Reasons that Bugs Escape Testing
User executed untested code.
User executed statements in a different order than was tested.
User entered an untested combination of inputs.
User’s operating environment was not tested.
Why Can’t Every Bug Be Found?
Too many possible paths.
Too many possible inputs.
Too many possible user environments.
Too Many Possible Paths
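A rough calculation shows how quickly the number of execution paths grows. The Python sketch below assumes a hypothetical program consisting of a single loop whose body contains a fixed number of alternative routes and which may repeat between 1 and some maximum number of times. The function name and the figures are illustrative assumptions.

```python
def count_paths(routes_per_iteration, max_iterations):
    """Count distinct paths through a loop that runs 1..max_iterations times.

    Each pass through the loop body can take any of `routes_per_iteration`
    routes, so a run with k iterations contributes routes ** k paths.
    """
    return sum(routes_per_iteration ** k for k in range(1, max_iterations + 1))


# Hypothetical loop: 5 routes through the body, up to 20 iterations.
paths = count_paths(5, 20)
print(f"{paths:,} distinct paths")
```

Even this small example yields more than 10^14 paths, so testing every path is out of reach for any realistic schedule.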
Too Many Possible Inputs
Programs take input in a variety of ways: mouse, keyboard, and other devices.
Both valid and invalid inputs must be tested.
Most importantly, there is an infinite number of input sequences to be tested.
Too Many Possible User Environments
Difficult to replicate the user’s combination of hardware, peripherals, OS, and applications.
Impossible to replicate a thousand-node network to test networking software.
Phases of the Software Process
Why No Testing Phase?
Testing must be done at every phase.
Testing of a phase must be built upon and checked against the results of the previous phase.
Non-execution-based testing is done in early phases (before executable code is produced).
Both execution-based and non-execution-based testing can be done in later phases.
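One automated form of non-execution-based testing is a static check that inspects source code without running it. The Python sketch below is a small illustration of that idea, not a method prescribed by the lecture. It parses a piece of source text with the standard `ast` module and reports the line numbers of bare `except:` clauses. The function name and the sample source are hypothetical.

```python
import ast


def find_bare_excepts(source):
    """Return line numbers of `except:` clauses that name no exception type.

    The source is only parsed, never executed.
    """
    tree = ast.parse(source)
    return sorted(
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, ast.ExceptHandler) and node.type is None
    )


SAMPLE = """\
def load(path):
    try:
        return open(path).read()
    except:
        return None
"""

print(find_bare_excepts(SAMPLE))
```

Because nothing is executed, a check like this can run on code that is incomplete or that cannot yet be run in its intended environment.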
Black-Box / White-Box Testing
Black-box tests are driven by the program’s specification.
White-box tests are driven by the program’s implementation.
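The difference can be illustrated with a small Python sketch. The `grade` function and its specification are invented for this example: scores from 0 to 39 fail, scores from 40 to 100 pass, and anything else is rejected. The black-box cases are picked from that specification alone, while the white-box cases are picked by reading the code so that every branch is exercised.

```python
def grade(score):
    """Hypothetical specification: 0-39 fail, 40-100 pass, otherwise reject."""
    if score < 0 or score > 100:
        raise ValueError("score must be between 0 and 100")
    if score >= 40:
        return "pass"
    return "fail"


def rejects(score):
    """Return True if grade() rejects the score with ValueError."""
    try:
        grade(score)
    except ValueError:
        return True
    return False


def black_box_cases():
    # Chosen from the specification alone: the edges of each stated range.
    return [
        grade(0) == "fail",
        grade(39) == "fail",
        grade(40) == "pass",
        grade(100) == "pass",
        rejects(-1),
        rejects(101),
    ]


def white_box_cases():
    # Chosen by reading the code: one input per branch outcome.
    return [
        rejects(-5),           # first condition true via score < 0
        rejects(150),          # first condition true via score > 100
        grade(75) == "pass",   # second condition true
        grade(10) == "fail",   # falls through to the final return
    ]


print(all(black_box_cases()), all(white_box_cases()))
```

The two sets overlap but are not the same. The black-box cases would still be valid if `grade` were rewritten, whereas the white-box cases would need revisiting whenever the branching changes.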