SAS job

profilestrength
week8.zip

Lecture 8 Distribution Analysis in SAS Week 8.ppt



Data Management and Business Intelligence
5CC519

www.derby.ac.uk/engtech



Lecture 5 – Picturing Distributions and Outliers in SAS

Dr John Panneerselvam

College of Engineering and Technology, University of Derby


*

Distributions

*

Descriptive Analytics

The goals when you are describing data are to

  • screen for unusual data values
  • inspect the spread and shape of continuous variables
  • characterize the central tendency
  • draw preliminary conclusions about your data.

*

A Normal Distribution

You can use the skewness and kurtosis, along with the histogram, to help decide whether your sample follows a normal distribution.
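As a rough illustration of this check, the sketch below requests the moments, normality tests, and a histogram with a fitted normal curve for a single variable. It assumes the sample data set sashelp.heart and its Cholesterol variable, which ship with SAS but are not part of the lecture data; any continuous variable could be substituted.

```sas
/* Assumption: sashelp.heart is a sample data set shipped with SAS,
   used here only as stand-in data. */
title 'Checking Normality of Cholesterol';

proc univariate data=sashelp.heart normal;   /* NORMAL requests tests for normality */
   var Cholesterol;
   histogram Cholesterol / normal;            /* overlay a fitted normal curve */
   inset n mean std skewness kurtosis / pos=ne;
run;

title;   /* clear the title so it does not carry over to later output */
```

SAS reports excess kurtosis, so skewness and kurtosis values both close to 0, together with a histogram that tracks the fitted curve, are consistent with a normal distribution.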

*

*

The UNIVARIATE Procedure

General form of the UNIVARIATE procedure:

PROC UNIVARIATE DATA=SAS-data-set <options>;
   VAR variables;
   CDFPLOT <variables> </ options>;
   HISTOGRAM variables </ options>;
   PROBPLOT variables </ options>;
   QQPLOT <variables> </ options>;
   WEIGHT variable;
RUN;

*

Descriptive Statistics

  • Mean
  • Sum
  • Sum of the Weights
  • Standard Deviation
  • Variance
  • Coefficient of Variation
  • Skewness
  • Kurtosis
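A minimal sketch of how statistics like these might be requested in one step. It assumes the sample data set sashelp.class (shipped with SAS, not part of the lecture files) and uses PROC MEANS rather than PROC UNIVARIATE only because its statistic keywords make the request explicit.

```sas
/* Assumption: sashelp.class is stand-in data; Height and Weight are its variables. */
proc means data=sashelp.class
           mean sum sumwgt std var cv skewness kurtosis
           maxdec=3;
   var Height Weight;
run;
```

Without a WEIGHT statement, every observation has weight 1, so the sum of the weights (SUMWGT) equals the number of nonmissing observations.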

*

Creating Histogram

title 'Analysis of Plating Thickness';

proc univariate data=Trans noprint;
   histogram Thick;
run;

*

Two Way Comparative Histogram

title 'Results of Supplier Training Program';

proc univariate data=Disk noprint;
   class Supplier Year;
   histogram Width / intertile = 1.0
                     vaxis     = 0 10 20 30
                     ncols     = 2
                     nrows     = 2;
run;

*

Histogram with a Fit

proc univariate data=Robots;
   histogram Length /
      beta(theta=10 scale=0.5 fill)
      href      = 10
      hreflabel = 'Lower Bound'
      odstitle  = 'Fitted Beta Distribution of Offsets';
   inset n = 'Sample Size' /
      pos=ne cfill=blank;
run;

*

The UNIVARIATE Procedure

*

CDF with a Fit

proc univariate data=Cord noprint;
   cdf Strength / normal;
   inset normal(mu sigma);
run;

*

Quantile Fit

symbol v=plus;

title 'Two-Parameter Lognormal Q-Q Plot for Diameters';

ods graphics off;

proc univariate data=ModifiedMeasures noprint;
   qqplot LogDiameter / normal(mu=est sigma=est)
                        square
                        vaxis=axis1;
   inset n mean (5.3) std (5.3) / pos = nw header = 'Summary Statistics';
   axis1 label=(a=90 r=0);
run;

*

Outliers

*

Box-and-Whisker Plots

The mean is denoted by a ◊.

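  • Upper whisker: the largest point within 1.5 × IQR (interquartile range) of the box
  • Top of the box: the 75th percentile
  • Line inside the box: the 50th percentile (median)
  • Bottom of the box: the 25th percentile
  • Lower whisker: the smallest point within 1.5 × IQR of the box
  • Outliers: points more than 1.5 × IQR from the box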

*

A box plot is made up of percentiles.

The blue box spans the interquartile range (the 25th to the 75th percentile).
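The 1.5 × IQR rule used by the box plot can also be applied directly to flag outlying observations. The following is a sketch only: it assumes the sample data set sashelp.class and its Weight variable (shipped with SAS, not part of the lecture data), and the data set and variable names Quartiles, Flagged, LowerFence, UpperFence and IsOutlier are made up for illustration.

```sas
/* Step 1: compute the 25th and 75th percentiles of the analysis variable. */
proc means data=sashelp.class noprint;
   var Weight;
   output out=Quartiles q1=Q1 q3=Q3;
run;

/* Step 2: attach the quartiles to every row and apply the 1.5 x IQR rule. */
data Flagged;
   if _n_ = 1 then set Quartiles(keep=Q1 Q3);   /* values are retained for all rows */
   set sashelp.class;
   IQR        = Q3 - Q1;
   LowerFence = Q1 - 1.5*IQR;
   UpperFence = Q3 + 1.5*IQR;
   /* Missing values are excluded so they are not reported as low outliers. */
   IsOutlier  = (not missing(Weight)) and
                (Weight < LowerFence or Weight > UpperFence);
run;

/* Step 3: list only the flagged observations. */
proc print data=Flagged noobs;
   where IsOutlier = 1;
   var Name Weight LowerFence UpperFence;
run;
```

Percentile definitions differ slightly between procedures and options, so the fences computed here may not match a plotted box exactly.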

*

The BOXPLOT Procedure

General form of the BOXPLOT procedure:

PROC BOXPLOT options;
   BY variables;
   INSET keywords </ options>;
   INSETGROUP keywords </ options>;
   PLOT analysis-variable*group-variable <(block-variables)> <=symbol-variable> </ options>;
RUN;

*

Creating a Box Plot

proc boxplot data=Turbine;
   plot KWatts*Day / boxstyle = schematic
                     outbox   = OilSchematic;
run;

*

Enhancing a Box Plot

title 'Box Plot for Power Output';

proc boxplot data=Turbine;
   plot KWatts*Day;
   inset min mean max stddev /
      header = 'Overall Statistics'
      pos    = tm;
   insetgroup min max /
      header = 'Extremes by Day';
run;

*

*

References

  • Big Data Fundamentals: Concepts, Drivers and Techniques (Available in Week 5)
  • SAS Programming 1: Essentials, SAS Institute.
  • SAS: The Power to Know, official documentation.

*


Software testing week 8.pptx

Data Management and Business Intelligence 5CC519


Software Testing

Dr John Panneerselvam

College of Engineering and Technology, University of Derby

Sensitivity: Internal


Overview

▪ Definition of Software Testing

▪ Problems with Testing

▪ Benefits of Testing

▪ Effective Methods for Testing


Definition

Software testing is the process of executing a software system to determine whether it matches its specification and executes in its intended environment.

“Program testing can be a very effective way to show the presence of bugs, but it is hopelessly inadequate for showing their absence” [Dijkstra, 1972]


Cost of Delaying the Release of a Software Product

Timing is another important factor to consider.

New products: The first to the market often sells better than superior products that are released later.


Cutting Testing Costs Can Increase Other Costs

Customer support can be very expensive.

Fewer bugs = fewer calls.

Customers will look for more reliable solutions.

Software organizations must perform cost-benefit analyses to determine how much to spend on testing.


Problems with Testing

Since it is impossible to find every fault in a software system, bugs will be found by customers after the product is released.


Reasons that Bugs Escape Testing

User executed untested code.

User executed statements in a different order than was tested.

User entered an untested combination of inputs.

User’s operating environment was not tested.


Why Can’t Every Bug Be Found?

Too many possible paths.

Too many possible inputs.

Too many possible user environments.


Too Many Possible Paths
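One way to see why exhaustive path testing is out of reach: a program with n independent two-way decisions has 2^n distinct paths. The sketch below tabulates that growth. The assumption of independent binary branches is a simplification introduced here, not something stated in the slides, and the names PathCount, Decisions and Paths are illustrative.

```sas
/* Assumption: each decision is an independent two-way branch,
   so the number of paths doubles with every decision added. */
data PathCount;
   do Decisions = 1, 10, 20, 30, 40;
      Paths = 2**Decisions;
      output;
   end;
   format Paths comma20.;
run;

proc print data=PathCount noobs;
run;
```

Loops make the situation worse, because every additional iteration count is a further path.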


Too Many Possible Inputs

Programs take input in a variety of ways: mouse, keyboard, and other devices.

Both valid and invalid inputs must be tested.

Most importantly, there is an infinite number of input sequences to be tested.


Too Many Possible User Environments

Difficult to replicate the user’s combination of hardware, peripherals, OS, and applications.

Impossible to replicate a thousand-node network to test networking software.


Phases of the Software Process


Why No Testing Phase?

Testing must be done at every phase.

Testing of a phase must be built upon and checked against the results of the previous phase.

Non-execution based testing is done in early phases (before executable code is produced).

Execution and non-execution based testing can be done in later phases.


Black-Box / White-Box Testing

Black-box tests are driven by the program’s specification.

White-box tests are driven by the program’s implementation.
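A small sketch of what a black-box test might look like in SAS. The specification is hypothetical and invented for this example: a score from 40 to 100 is a PASS, a score from 0 to 39 is a FAIL, and anything else is INVALID. The test cases are chosen from that specification alone (boundary values plus invalid inputs), and the data set and variable names are illustrative.

```sas
/* Test cases derived only from the (hypothetical) specification. */
data TestCases;
   input Score Expected :$8.;
   datalines;
0 FAIL
39 FAIL
40 PASS
100 PASS
-1 INVALID
101 INVALID
. INVALID
;
run;

/* Run the code under test on each case and compare with the expected result. */
data TestResults;
   set TestCases;
   length Actual $8 Verdict $4;

   /* Code under test: the grading logic. */
   if missing(Score) or Score < 0 or Score > 100 then Actual = 'INVALID';
   else if Score >= 40 then Actual = 'PASS';
   else Actual = 'FAIL';

   Verdict = ifc(Actual = Expected, 'ok', 'BUG');
run;

proc print data=TestResults noobs;
run;
```

A white-box approach would instead read the IF/ELSE logic and choose cases so that every branch and condition in it is exercised at least once.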
