statguru
Assignment brief and tutorials/Assignment Brief and Example presentation/~$C519 DMBI Assignment Brief.doc
Assignment brief and tutorials/Assignment Brief and Example presentation/5CC519 DMBI Assignment Brief.doc
|
University of Derby School of Computing and Mathematics
|
||
|
In-course Assignment Specification
|
||
|
Module Code and Title: 5CC519 Data management and Business Intelligence
|
||
|
Assignment No. and Title: A SAS-based information analysis and presentation system
|
||
|
Assessment Tutor: John Panneerselvam
|
Weighting Towards Module Grade: 100%
|
|
|
Date Set: 25/01/2018
|
Hand-In Deadline Date: Various: Design Justification: 29 March 2018, 11.59pm. Demonstration: 4 May 2018 |
|
|
|
||
|
Penalty for Late Submission Assignments are submitted electronically. The electronic submission area will continue to accept work after the due date. The date and time of submission will be recorded to identify late submissions. Please visit the regulations pages (http://www.derby.ac.uk/regs) for more information. Between the opening of the assignment submission and the deadline date, in most cases, you can submit and overwrite with a resubmission repeatedly. It's important you try to submit as early as you can to take advantage of the plagiarism checking function prior to your final submission. This will also mean that you've submitted something for your lecturer to mark if circumstances beyond your control mean that you can't complete your final submission. For more information on this please see our exceptional extenuating circumstances (http://www.derby.ac.uk/EEC) page. See http://www.derby.ac.uk/esub for more information.
|
||
|
|
||
|
|
Level of Collaboration: None |
|
|
|
||
|
Learning Outcomes covered in this Assignment: 1. Identify and critically evaluate the key requirements for effective data management, analysis and presentation to meet users’ needs 2. Develop a small management information reporting system that meets the users’ needs using a prototyping / Agile approach
|
||
|
Criteria for Assessment: See Portfolio Specification
|
The Task
Overall, you will specify, design and build a small management information analysis and presentation system to analyse the NHS Transparency RTT (Referral to Treatment Time) datasets from the following location https://www.england.nhs.uk/statistics/statistical-work-areas/rtt-waiting-times/rtt-data-2016-17/ and use the XLS spreadsheets for the Admitted Adjusted Commissioner statistics.
The first part of the Portfolio, due on Thursday 29th March 2018 at 11:59 pm in Turnitin, is a justified specification of the Management Information System (MIS).
The second part of the portfolio, which will be assessed on Friday 4th May 2018, will be a demonstration of your SAS based MIS. The demonstration schedule will be published by Week 10.
The relative weightings of the two components is:-
Design Justification 30%
SAS based MIS 70%
Part 1 - Design and Justification - 30%
This will be a short business style report of approximately 1000 words which provides an outline of the MIS and its design. In order to prepare yourself to design the system and write the report, you will need to carry out at the following steps.
You are required to use the template provided for the Outline Design Specification. Please do not remove the section marking criteria. They provide you with a checklist of what you need to achieve for each section.
Step 1
Understand the structure and meaning of the data from the RTT Monthly spreadsheets; you will be able to do this using both Excel and SAS. From this, identify and justify the key stakeholders who have an interest in the data presented in the spreadsheets.
Step 2
Choose one of the stakeholders from the above analysis and identify and fully justify no more than three key questions that can be answered by a suitable selection and presentation of data from up to five of the RTT monthly sets of data.
Step 3
Identify the relevant data and the appropriate selection, summarisation and presentation necessary to allow the stakeholder to pose their question and then gain a suitably presented answer. You may need to consider performance issues in designing a reasonably efficient means of storing your data (this could be done using the Log statistics from the SAS 9.4 system when you experiment with different approaches).
Step 4
Present and justify simple mock-ups of the relevant data entry and presentation interfaces. User interface can be developed using MENU, SELECT, FRAME, PROJECT etc. Select and deploy the appropriate user interface using SAS.
Step 5
Identify and justify how you will prove that the MIS system that will be developed based on this specification and provide suitable test schedule specifications and test conditions.
Report Content
The key aspects that must be addressed in the specification report are:
1) Identification of the Stakeholder chosen for this MIS and justification of up to three types of questions
2) Identification and justification for the chosen Data Architecture to support the selection and presentation of the data necessary to provide answers to the questions
3) Identification and justification for the chosen Systems Logic to support the selection and presentation of the data necessary to provide answers to the questions
4) Identification and justification for the chosen User Interface to support the selection and presentation of the data necessary to provide answers to the questions
5) Identification and justification of the Test Data and Schedule to ensure that the system meets the Stakeholders’ needs
Assessment Criteria for the Design Report
Each of the assessment section will have a weighting of 20%.
|
Section |
100%-70% |
60%-69% |
59%-50% |
49%-40% |
39%-35% |
34%-0% |
|
Stakeholders |
Identifies stakeholder and 3 Key Questions - comprehensive justification |
Identifies stakeholder and 3 Key Questions and good justification |
Identifies stakeholder and 2 Key Questions and some justification |
Identifies stakeholder and 1 Key Question - basic justification |
Identifies either stakeholder or some questions |
Little of merit |
|
Data Architecture |
Justifies best option for RTT data based on SAS Log metrics for best performance |
Justifies SAS structure based on 3rd Normal concepts, i.e. storage efficiency |
Basic justification of the SAS storage structure in relation to questions |
Defines single repository in SAS |
Defines RTT sources as XL |
Little of merit |
|
Systems Logic |
Clearly justified specification of logic, using appropriate techniques such as UML or Flow-charts
|
Clearly justified specification of key logic |
Clear specification of logic |
Basic selection and processing criteria |
Identification of simple criteria |
Little of merit |
|
User Interface |
Fully justified definition of all aspects of the user interface including “criteria” input screens and output reports and graphics |
Justified definition of most aspects of the user interface including output reports and graphics |
Definition of some aspects of the user interface including output reports and graphics |
Definition of a few aspects of the user interface including output graphics |
Identifies types of graphs that might answer a user’s question |
Little of merit |
|
Test Requirements |
Comprehensively defined and justified test schedules and test data |
Justifies test schedules and test data |
Defines test schedules and test data |
Defines testing process |
Identifies need for testing and suggests how it might be done |
Little of merit |
Part 2 - Build MIS in SAS - 70%
Part 2.1 (50%):
Using the Outline Specification from part one of the portfolio, you will develop a small MIS application in SAS that implements the intent of your specification.
Part 2.2 (20%):
Each week you have been given a set of exercises to test your understanding of the material covered that week. You are to demonstrate a selection of the exercises that you have done for marking which will be randomly selected from exercises 1-10. Questions will be asked to test your understanding. The assessment will be based on your understanding and the knowledge of SAS programming.
Part 2.1 and 2.2 will be demonstrated together to your tutor during the last week of the semester. The demonstration schedule will be available on Udo Blackboard.
Assessment Criteria for part 2.1
Out of the 50% in part 2.1, coding and use of SAS will have 30% and data will have 20% weighting.
|
Percentage Band |
Coding and use of SAS |
Data |
|
90%-100% |
Relevant SAS Source code pre-written and stored in SAS system. Code modules chosen and run from a menu system. Data selection menu driven with complex options. Graphic data presentation, good choice of output styles which are menu driven. |
Uses pre-imported data in a saved master file using the PCT detail data and SHA ID data. Optimises SAS storage structures to the types of questions provided. Optimises storage space by use of surrogate keys to natural key tables. Optimises use of permanent and work file structures. |
|
80%-89% |
Relevant SAS Source code pre-written and stored in SAS system. Code modules chosen and run from a menu system. Data selection menu driven with complex options. Graphic data presentation, good choice of output styles. |
Uses pre-imported data in a saved master file using the PCT detail data and SHA ID data. Optimises SAS storage structures to the types of questions provided. Optimises storage space by use of surrogate keys to natural key tables. |
|
70%-79% |
Relevant SAS Source code pre-written and stored in SAS system. Code modules chosen and run from a menu system. Data selection code pre-written with complex options. Graphic data presentation, good choice of output styles. |
Uses pre-imported data in a saved master file using the PCT detail data and SHA ID data. Optimises storage space by use of surrogate keys to natural key tables |
|
60%-69% |
Relevant SAS Source code pre-written and stored in SAS system. Code modules chosen and run manually. Data selection code pre-written but with complex options. Graphic data presentation, good choice of output styles. |
Uses pre-imported data in a saved master file using the PCT detail data and SHA ID data. |
|
50%-59% |
Relevant SAS Source code pre-written and stored in SAS system. Code modules chosen and run manually. Data selection code pre-written. Simple Graphic data presentation, some choice of output styles. |
Uses pre-imported data in a saved master file using the PCT detail data and SHA ID data for several months |
|
40%-49% |
Relevant SAS Source code pre-written and copy / pasted into SAS system. Data selection code pre-written. Simple Graphic data presentation, no choice of output styles.
|
Uses pre-imported data in a saved master file using the PCT detail data for several months |
|
35%-39% |
Relevant SAS Source code pre-written and copy / pasted into SAS system. Data selection code written into programme editor window. |
Uses data for a single month imported at run time from XLS file using the PCT detail data |
|
0%-34% |
Little of merit |
Little of merit |
6
Assignment brief and tutorials/Assignment Brief and Example presentation/Example.pptx
Historic Mortality Data
By Antony Paul, Adam Langley, Rashidi Ramazani & Ghanem Nasser
Aims and Objectives
Analyse the data
Present the data (SAS/ODS Graphics)
Trends in the data
Identifying key stakeholders
Introduction
Importing the data
Results of the data
Creating tables
Potential stakeholders
Key stakeholders
NHS (Patient records, research, CQC)
Councils (Records, burial records, council tax)
Lawyers/Solicitors (Referrals, wills, attorneys)
Insurance companies (Premium & rates, trends)
Funeral directors/providers (Resource management, location)
Care Quality Commission
National Audit Office (Census)
World Health Organisation
How to import a dataset *Using codes ()
libname home 'U:\Historic Mortality data\Historic Mortality data';
run;
proc import out=home
datafile='U:\Historic Mortality data\Historic Mortality data\PDH7484.csv'
dbms=csv replace;
run;
proc print;
run;
Result
How to Import data without coding
Go file - select import data
How to import data without using Code
-Select a data source
-click next
How to import data without using Code
Select Browse.
Select the file you want to import
How to import data without using Code
-Choose your library.
-Add member name.
-Click next / then click finish.
Merging datasets
Result
Rename Variables
Proc Means Statement
Year from 1963 - 1984
Proc Tabulate Statement
Creating group chart
Create a small dataset
2. PROC CHART
3. RESULT
Comparing gender mortality for ages 10 to 14 years from 1901 to 1910 England and Wales.
FACT: There was a
decrease in death rates by over 1,000
Assignment brief and tutorials/week 1/Exercises(1).pdf
Exercises
Level 1
1. Create a Report in an ODS Destination
a. Open the starter file r101e01. The program is not syntactically correct, so do not
submit it until after you make changes. If you have not already done so, submit the
libname program that is in your working directory in order to ensure that the class
data files are available to you.
Starter Program
** Close the LISTING destination;
ods <destination> file='filename.ext' style=<style>;
proc tabulate data=OrionRPT.customer_orders f=comma8. ;
title 'Quantity Ordered by Country and Type';
title2 'Low Volume Orders';
class customer_country order_type;
var quantity;
table customer_country,
order_type*quantity*sum;
keylabel sum = ' ';
run;
ods <destination> close;
** Reopen the LISTING destination;
b. Use the demonstration program as a model and modify the program as follows:
1) Replace the <destination> placeholder with the ODS destination that
creates a
Printer Control Language result file.
2) Replace the <style> placeholder with the PRINTER style for PCL output.
3) Use a valid name for the FILE= option. The correct file extension for a Printer
Control Language file is .PCL. Name the file r101s01.pcl and store the
output file in your working directory.
4) Close the LISTING destination at the top of the program.
5) Reopen the LISTING destination at the bottom of the program.
6) Submit the program.
c. Double-click the icon in the SAS Results window. Is there an application available to
render
the PCL file? You only need to answer this question Yes or No. (If an application is
available, it opens automatically. If an application is not available, you may see a
warning message or be asked to choose a program to open the file.) The purpose of
this task is to illustrate the behavior that occurs with a less frequently used ODS
destination.)
d. Modify the program code to create an HTML file that uses the EGDEFAULT style.
Name
the output file Exercise1d.HTML. Submit the program.
e. When the job completes, double-click the HTML icon in the SAS Results window to
launch a browser window.
f. Open the HTML file with Excel. Does the HTML output look the same in Excel as it
looks in the browser?
Level 2
2. Creating Two Different HTML Files and a PDF File
a. Open and modify the r101e02 starter program to use ODS HTML, ODS CHTML,
and ODS PDF statements to create three different output files in one job step. If you
have not already done so, submit the libname program that is found in your
working directory to ensure that the class data files are available to you. Code
STYLE=SASWEB for both of the HTML destinations. You can use the default style
for the PDF output file.
b. Open each of the result files. These are the desired results:
HTML Desired Results
CHTML Desired Results:
Even though you code STYLE=SASWEB for the ODS CHTML destination,
CHTML (or Compact HTML) does not use a full SAS style template
definition. This is one of the differences between CHTML and HTML. For
more information about CHTML, consult the ODS MARKUP
documentation.
PDF Desired Results
Level 3
3. Suppressing ODS Results
This Level 3 exercise shows you a way to completely suppress the icons in the Results
window.
a. In an interactive SAS session, icons for your ODS result files are automatically
tracked in the Results window. As each file is created, it is automatically opened for
viewing. You can also manually open your results by clicking on the output file icon
in the SAS Results window. The ODS NORESULTS statement changes this default
behavior. Look up the ODS RESULTS (or ODS NORESULTS) statement in the
documentation.
b. What is the reason given in the documentation for using the ODS NORESULTS
statement?
c. Use the ODS NORESULTS statement in the starter program, r101e03. What is the
effect of running your code with this statement?
d. How can you find and open your output files when ODS NORESULTS is in effect?
Assignment brief and tutorials/week 1/Introduction to SAS Programming.pptx
Data Management and Business Intelligence 5CC519
www.derby.ac.uk/engtech
Introduction to SAS Programming
Dr John Panneerselvam
College of Engineering and Technology, University of Derby
Contents
Discussion of Data Analysis
Base SAS
Introduction to SAS Programming
Executing the SAS Program
Summary
www.derby.ac.uk/engtech
Discussion of Data Analysis
Base SAS
Introduction to SAS Programming
Executing the SAS Program
Summary
www.derby.ac.uk/engtech
www.derby.ac.uk/engtech
Data Mining
Machine Learning
KDD – Knowledge Discovery in Databases
www.derby.ac.uk/engtech
Data Pre-processing
Data Mining
Post-processing
Raw Data
Information
Discussion of Data Analysis
Base SAS
Introduction to SAS Programming
Running the SAS Program
Summary
www.derby.ac.uk/engtech
SAS programming
SAS is an integrated system of software solutions
that enables you to perform the following tasks:
Data entry, retrieval, and management
Report writing and graphics design
Statistical and mathematical analysis
Business forecasting and decision support
Operations research and project management
Applications development
www.derby.ac.uk/engtech
Base SAS Base SAS is a highly flexible and integrated software environment that can be used in virtually any setting to access, manipulate, manage, store, analyze, and report on data.
www.derby.ac.uk/engtech
Base SAS provides the following:
A graphical user interface for administering SAS tasks
A highly flexible and extensible programming language
A rich library of prewritten, ready-to-use SAS procedures
The flexibility to run on all major operating environments such as Windows, UNIX, and z/OS (OS/390)
The access to virtually any data source such as DB2, Oracle, SYBASE, Teradata, SAP, and Microsoft Excel the support for most widely used character encodings for globalization
www.derby.ac.uk/engtech
Discussion of Data Analysis
Base SAS
Introduction to SAS Programming
Executing the SAS Program
Discussions
www.derby.ac.uk/engtech
Introduction to SAS Programming
SAS environment
Overview of SAS program
Basic steps in writing a SAS program
www.derby.ac.uk/engtech
SAS Windowing Environment
12
In the SAS windowing environment, windows are used to edit and execute programming statements, display the log, output, and Help facility, and more.
www.derby.ac.uk/engtech
Three Primary Windows
13
In the SAS windowing environment, you submit and view the results of a SAS program using three primary windows.
| contains the SAS program to submit. | |
| contains information about the processing of the SAS program, including any warning and error messages. | |
| contains reports generated by the SAS program. |
www.derby.ac.uk/engtech
Editor Windows
| Enhanced Editor | Program Editor |
| Only available in the Windows operating environment | Available in all operating environments |
| Default editor for Windows operating environment | Default editor for all operating environments except Windows |
| Multiple instances of the editor can be open at one time | Only one instance of the editor can be open at one time |
| Code does not disappear after it is submitted | Code disappears after it is submitted |
| Incorporates color-coding as you type | Incorporates color-coding after you press ENTER |
14
www.derby.ac.uk/engtech
Editor Windows
Enhanced Editor
Program Editor
www.derby.ac.uk/engtech
Log Window
Partial SAS Log
16
33 data work.NewSalesEmps;
34 length First_Name $ 12 Last_Name $ 18
35 Job_Title $ 25;
36 infile 'newemps.csv' dlm=',';
37 input First_Name $ Last_Name $
38 Job_Title $ Salary;
39 run;
NOTE: The infile 'newemps.csv' is:
File Name=S:\Workshop\newemps.csv,
RECFM=V,LRECL=256
NOTE: 71 records were read from the infile 'newemps.csv'.
The minimum record length was 28.
The maximum record length was 47.
NOTE: The data set WORK.NEWSALESEMPS has 71 observations and 4 variables.
40
proc print data=work.NewSalesEmps;
NOTE: Writing HTML Body file: sashtml.htm
42 run;
NOTE: There were 71 observations read from the data set WORK.NEWSALESEMPS.
www.derby.ac.uk/engtech
Results Viewer Window
In SAS®9.3 output is displayed using HTML by default.
www.derby.ac.uk/engtech
Output Window
18
Obs First_Name Last_Name Job_Title Salary
1 Satyakam Denny Sales Rep. II 26780
2 Monica Kletschkus Sales Rep. IV 30890
3 Kevin Lyon Sales Rep. I 26955
4 Petrea Soltau Sales Rep. II 27440
5 Marina Iyengar Sales Rep. III 29715
6 Shani Duckett Sales Rep. I 25795
7 Fang Wilson Sales Rep. II 26810
8 Michael Minas Sales Rep. I 26970
9 Amanda Liebman Sales Rep. II 27465
10 Vincent Eastley Sales Rep. III 29695
11 Viney Barbis Sales Rep. III 30265
12 Skev Rusli Sales Rep. II 26580
13 Narelle James Sales Rep. III 29990
14 Gerry Snellings Sales Rep. I 26445
15 Leonid Karavdic Sales Rep. II 27860
www.derby.ac.uk/engtech
Sas
SAS Program
19
DATA steps are typically used to create SAS data sets.
PROC steps are typically used to process SAS data sets (that is, generate reports and graphs, manage data, and sort data).
A SAS program is a sequence of steps that the user submits for execution.
Raw Data
DATA Step
Report
SAS Data Set
PROC Step
SAS Data Set
www.derby.ac.uk/engtech
Program statements
20
SAS steps begin with either of the following:
a DATA statement
a PROC statement
SAS detects the end of a step when it encounters one of the following:
a RUN statement (for most steps)
a QUIT statement (for some procedures)
the beginning of another step (DATA statement or PROC statement)
www.derby.ac.uk/engtech
This DATA step creates a temporary SAS data set named Work.NewSalesEmps by reading four fields from a raw data file.
The DATA step
21
data work.NewSalesEmps;
length First_Name $ 12
Last_Name $ 18 Job_Title $ 25;
infile 'newemps.csv' dlm=',';
input First_Name $ Last_Name $
Job_Title $ Salary;
run;
proc print data=work.NewSalesEmps;
run;
www.derby.ac.uk/engtech
This PROC PRINT step creates a listing report of the Work.NewSalesEmps data set.
The PROC step
22
data work.NewSalesEmps;
length First_Name $ 12
Last_Name $ 18 Job_Title $ 25;
infile 'newemps.csv' dlm=',';
input First_Name $ Last_Name $
Job_Title $ Salary;
run;
proc print data=work.NewSalesEmps;
run;
www.derby.ac.uk/engtech
The PROC step
This PROC MEANS step creates a summary report of the Work.NewSalesEmps data set with statistics for the variable Salary for each value of Job_Title.
23
data work.NewSalesEmps;
length First_Name $ 12
Last_Name $ 18 Job_Title $ 25;
infile 'newemps.csv' dlm=',';
input First_Name $ Last_Name $
Job_Title $ Salary;
run;
proc print data=work.NewSalesEmps;
run;
proc means data=work.NewSalesEmps;
class Job_Title;
var Salary;
run;
www.derby.ac.uk/engtech
Variables and Observations
www.derby.ac.uk/engtech
Obs First_Name Last_Name Job_Title Salary
1 Satyakam Denny Sales Rep. II 26780
2 Monica Kletschkus Sales Rep. IV 30890
3 Kevin Lyon Sales Rep. I 26955
4 Petrea Soltau Sales Rep. II 27440
5 Marina Iyengar Sales Rep. III 29715
6 Shani Duckett Sales Rep. I 25795
7 Fang Wilson Sales Rep. II 26810
8 Michael Minas Sales Rep. I 26970
9 Amanda Liebman Sales Rep. II 27465
10 Vincent Eastley Sales Rep. III 29695
11 Viney Barbis Sales Rep. III 30265
12 Skev Rusli Sales Rep. II 26580
13 Narelle James Sales Rep. III 29990
14 Gerry Snellings Sales Rep. I 26445
15 Leonid Karavdic Sales Rep. II 27860
Step Boundaries
SAS detects the end of the DATA step when it encounters the RUN statement.
SAS detects the end of the PROC PRINT step when it encounters the beginning of the PROC MEANS step.
25
data work.NewSalesEmps;
length First_Name $ 12
Last_Name $ 18 Job_Title $ 25;
infile 'newemps.csv' dlm=',';
input First_Name $ Last_Name $
Job_Title $ Salary;
run;
proc print data=work.NewSalesEmps;
proc means data=work.NewSalesEmps;
class Job_Title;
var Salary;
www.derby.ac.uk/engtech
Discussion of Data Analysis
Base SAS
Introduction to SAS Programming
Executing the SAS Program
Summary
www.derby.ac.uk/engtech
Program Syntax
The syntax in the program is colour coded to
highlight the following;
Step boundaries.
Keywords
Variables
Data set Names
www.derby.ac.uk/engtech
SAS program execution
Include/write the SAS program in the editor window.
To open a SAS program into your SAS session, select File Open or click and then select the file that you want to include. To open a program, your Program Editor must be active.
You can also issue the INCLUDE command to open (include) a SAS program into your SAS session.
With the Program Editor active, on the command bar type include and the name of the file containing the program.
Press ENTER.
The program is included in the Program Editor window.
www.derby.ac.uk/engtech
Program Execution (Cont..)
Submit the program.
To submit the program for execution, issue the SUBMIT command, click , or select Run Submit. The output from the program is displayed in the Output window.
Log window shows any warnings/errors of the program in the editor window.
The reports will be generated in the output window.
To clear the contents of the window, issue the CLEAR command, select Edit Clear All, or click .
www.derby.ac.uk/engtech
SAS Libraries
Work – temporary and default library
Create your own library and link it to the working folder
SAS datasets will be stored in the physical folder in reference to your library
Naming a SAS dataset
Derby.Name
www.derby.ac.uk/engtech
libref
Dataset
Name
The SAS help facility
31
www.derby.ac.uk/engtech
References
Tan, Steinbach, Kumar, Introduction to Data Mining, Pearson, 2014, ISBN: 10: 1-292-02615-4
SAS Programming 1: Essentials- SAS Institute.
SAS: The Power to Know- Official documentation.
www.derby.ac.uk/engtech
Assignment brief and tutorials/week 1/Lecture 1 - Introduction to ODS.pptx
Data Management and Business Analytics 5CC519
www.derby.ac.uk/engtech
Lecture 1 – Introduction to Output Delivery System
College of Engineering and Technology, University of Derby
Assignment brief and tutorials/week 1/Module Introduction_DBBI.pptx
Data Management and Business Intelligence 5CC519 Course Introduction
www.derby.ac.uk/engtech
www.derby.ac.uk/engtech
About the lecturer
College of Engineering and Technology
Dr. John Panneerselvam
Email: [email protected])
Lives in MS310 STEM Building, Markeaton Street
Telephone: 01332 592404
This module runs every Friday 2 pm to 6 pm, in MS215, Markeaton street
www.derby.ac.uk/engtech
Module Outcomes
Identify and critically evaluate the key requirements for effective data management, analysis and presentation to meet users’ needs
Develop a small management information reporting system that meets the users’ needs using a prototyping / Agile approach.
Proactive
And reactive
3
www.derby.ac.uk/engtech
Module Structure
Scheduled learning and teaching activities: 28%
Guided independent study: 76%
Lab based Workshops 24 hours
Lab based tutorials 24 hours
Guided Independent Study 152 hours
Total: 200 hours
Proactive
And reactive
4
www.derby.ac.uk/engtech
What we will cover
The principles of data analysis and normalisation as the foundation of the “store once and once only” principle for data integrity
The principles of data administration and security management
The requirements for optimisation of data management for the conflicting needs of real-time and MIS / EIS and BI purposes
The impact of User-centred analysis and participatory development on the development data structures and stores and systems that meet the needs of the organisation
The role, application and benefits of prototyping in the rapid development of systems, using RAD and Agile approaches
Techniques and practices for effective data analysis and presentation
The impact of cloud computing and storage on the management of and access to organisational data.
Together with a range of current, leading edge topics and technologies that affect data management and usage.
Proactive
And reactive
5
www.derby.ac.uk/engtech
Assessment
100% coursework - Portfolio
The module will be assessed by a single portfolio consisting of an individual application project supported by a short, business report which academically justifies the design decisions that underpin the finished application system, which will be built in the SAS environment.
Proactive
And reactive
6
www.derby.ac.uk/engtech
Assessment
You will be given opportunities, through both formative and
summative assessments, to reflect on your learning to
support professional and personal development
– Formative Assessment - the bidirectional process between
teacher and student to enhance, recognize and respond to the
learning
– Summative Assessment - summarizes the development of
learners at a particular time
Proactive
And reactive
7
www.derby.ac.uk/engtech
Formative Assessment
– In-class group presentation
– A Zip file containing your answers and codes of all your
exercises. (Each week you have been given a set of exercises
to do that tested your understanding of the material covered
that week.)
• You will not get extra marks by the formative assessment
• You WILL lose marks (individually) by NOT completing the
assessment
Proactive
And reactive
8
www.derby.ac.uk/engtech
Summative Assessment
CW: 100%
The assignment represents the coursework element of the module.
Assessment will be by means of a portfolio of learning artefacts demonstrating the achievement of all learning outcomes
The assignment should be carried out independently. That is, no collaboration is allowed.
An electronic copy should be handed-in via Udo.
Proactive
And reactive
9
www.derby.ac.uk/engtech
10
Assignment brief and tutorials/week 2/Lecture 2_Summary Reports with TABULATE Procedure.pptx
Data Management and Business Intelligence 5CC519
www.derby.ac.uk/engtech
Lecture 2 – Getting started with Proc TABULATE
Dr John Panneerselvam
College of Engineering and Technology, University of Derby
Chapter 2: Getting Started with the TABULATE Procedure
| 2.1 Introduction to the TABULATE Procedure |
| 2.2 Creating the Structure of the Table |
| 2.3 Adding Overall Summarizations |
| 2.4 Adding Additional Statistics |
‹#›
Copyright © 2012, SAS Institute Inc. All rights reserved.
2
Summary Reports
r102d01
‹#›
Copyright © 2012, SAS Institute Inc. All rights reserved.
3
TABULATE Procedure
The TABULATE procedure displays descriptive statistics in tabular format.
4
‹#›
Copyright © 2012, SAS Institute Inc. All rights reserved.
4
Dimensional Tables
The structure of the tables produced with the TABULATE procedure is one-, two-, or three-dimensional.
One variable or multiple variables can be in a dimension.
5
| Page dimension | Row dimension | Column dimension | |
| One-dimensional | | ||
| Two-dimensional | | | |
| Three-dimensional | | | |
‹#›
Copyright © 2012, SAS Institute Inc. All rights reserved.
5
One-Dimensional Table
A one-dimensional table contains a column dimension.
Country is in the column dimension.
Gender is in the column dimension.
6
‹#›
Copyright © 2012, SAS Institute Inc. All rights reserved.
6
Two-Dimensional Table
A two-dimensional table contains a row and a column dimension.
7
Gender is in the row dimension and Country is in the column dimension.
‹#›
Copyright © 2012, SAS Institute Inc. All rights reserved.
7
Three-Dimensional Table
A three-dimensional table contains a page, a row, and a column dimension.
Job_Title is in the page dimension, Gender is in the row dimension, and Country is in the column dimension.
8
‹#›
Copyright © 2012, SAS Institute Inc. All rights reserved.
8
2.01 Quiz
Specify the variables that appear in each dimension.
| Page dimension | Row dimension | Column dimension |
Is this table
one-dimensional,
two-dimensional, or three-dimensional?
‹#›
Copyright © 2012, SAS Institute Inc. All rights reserved.
9
2.01 Quiz – Correct Answer
Specify the variables that appear in each dimension.
| Page dimension | Row dimension | Column dimension |
| Gender | Job_Title | Country |
Is this table
one-dimensional,
two-dimensional, or three-dimensional?
‹#›
Copyright © 2012, SAS Institute Inc. All rights reserved.
10
Concatenating Variables
Concatenating variables places the output for one variable immediately next to the output for another variable in the same dimension.
In the following examples, Gender and Country are concatenated in either the
column or row dimension.
11
‹#›
Copyright © 2012, SAS Institute Inc. All rights reserved.
11
Crossing Variables
Crossing variables places the value of one variable within the value of another variable.
In the following examples, Country is crossed within Gender in either the column or row dimension.
12
‹#›
Copyright © 2012, SAS Institute Inc. All rights reserved.
12
Classification and Analysis Variables
Every variable specified in a dimension must be declared as a classification variable or an analysis variable.
13
| Classification Variable | Specifies which variable(s) are used to group the data. Can be character or numeric. N, the number of nonmissing values, is the default statistic. |
| Analysis Variable | Identifies the variable(s) for which statistics are calculated. Must be numeric. Tends to be continuous. SUM is the default statistic. |
‹#›
Copyright © 2012, SAS Institute Inc. All rights reserved.
13
Classification and Analysis Variables
For the following data:
Job_Title, Gender, and Country are reasonable classification variables.
Salary and Bonus are likely analysis variables.
14
‹#›
Copyright © 2012, SAS Institute Inc. All rights reserved.
14
Chapter 2: Getting Started with the TABULATE Procedure
| 2.1 Introduction to the TABULATE Procedure |
| 2.2 Creating the Structure of the Table |
| 2.3 Adding Overall Summarizations |
| 2.4 Adding Additional Statistics |
‹#›
Copyright © 2012, SAS Institute Inc. All rights reserved.
15
Objectives
Use the TABLE statement with commas to create one-, two-, and three-dimensional tables.
Use the CLASS statement to define classification variables.
Concatenate variables by using the blank in the TABLE statement.
Cross variables by using the asterisk in the TABLE statement.
Use the VAR statement to define analysis variables.
16
‹#›
Copyright © 2012, SAS Institute Inc. All rights reserved.
16
TABULATE Procedure
General form of the TABULATE procedure:
The TABLE statement describes the table structure.
Every variable that is specified in the TABLE statement must be defined as a classification variable (CLASS statement) or an analysis variable (VAR statement).
17
PROC TABULATE DATA=SAS-data-set <options>;
CLASS classification-variable(s);
VAR analysis-variable(s);
TABLE <page-expression>,
<row-expression>,
column-expression </ option(s)>;
RUN;
‹#›
Copyright © 2012, SAS Institute Inc. All rights reserved.
17
TABLE Statement
In the TABLE statement, the comma is used to separate the dimension expressions.
A one-dimensional table has a column expression only (no commas).
A two-dimensional table has a row expression and a column expression (one comma).
A three-dimensional table has a page, a row, and a column expression (two commas).
18
TABLE , , ;
row- expression
page- expression
column- expression
‹#›
Copyright © 2012, SAS Institute Inc. All rights reserved.
18
TABLE Statement
Examples:
19
TABLE , , ;
table Gender , Country;
table Country;
table Job_Title , Gender , Country;
‹#›
Copyright © 2012, SAS Institute Inc. All rights reserved.
19
One-Dimensional Table
A one-dimensional table contains a column dimension.
20
proc tabulate data=orion.sales;
class Country;
table Country;
run;
proc tabulate data=orion.sales;
class Gender;
table Gender;
run;
r102d02
‹#›
Copyright © 2012, SAS Institute Inc. All rights reserved.
20
One-Dimensional Table
Multiple TABLE statements can be in one step.
21
proc tabulate data=orion.sales;
class Country Gender;
table Country;
table Gender;
run;
N is the default statistic when there is only a CLASS statement and no VAR statement.
r102d02
‹#›
Copyright © 2012, SAS Institute Inc. All rights reserved.
21
Two-Dimensional Table
A two-dimensional table contains a row and a column dimension.
22
proc tabulate data=orion.sales;
class Country Gender;
table Gender, Country;
run;
r102d02
‹#›
Copyright © 2012, SAS Institute Inc. All rights reserved.
22
Three-Dimensional Table
A three-dimensional table contains a page, a row, and a column dimension.
23
r102d02
proc tabulate data=orion.sales;
class Country Gender Job_Title;
table Job_Title, Gender, Country;
run;
‹#›
Copyright © 2012, SAS Institute Inc. All rights reserved.
23
Concatenating Variables
Concatenating variables is accomplished by specifying a blank between variables in one dimension.
24
proc tabulate data=orion.sales;
class Country Gender Job_Title;
table Job_Title, Gender Country;
run;
r102d02
‹#›
Copyright © 2012, SAS Institute Inc. All rights reserved.
24
Crossing Variables
Crossing variables is accomplished by specifying an asterisk (*) between variables in one dimension.
25
proc tabulate data=orion.sales;
class Country Gender Job_Title;
table Gender*Country, Job_Title;
run;
r102d02
‹#›
Copyright © 2012, SAS Institute Inc. All rights reserved.
25
Analysis Variable
SUM is the default statistic for an analysis variable that is used in the TABLE statement.
26
proc tabulate data=orion.sales;
class Country Gender Job_Title;
var Salary;
table Job_Title, Country, Gender*Salary;
run;
Analysis variables are defined in the VAR statement.
r102d02
‹#›
Copyright © 2012, SAS Institute Inc. All rights reserved.
26
Analysis Variable
If you forget to define a variable as an analysis variable, you get an error.
The same type of error exists if you forget to define a variable as a classification variable.
27
880 proc tabulate data=orion.sales;
881 where Job_Title contains 'Rep';
882 class Country Gender Job_Title;
883 table Job_Title, Country, Gender*Salary;
884 run;
ERROR: The type of name (Salary) is unknown.
NOTE: The SAS System stopped processing this step because of
errors.
‹#›
Copyright © 2012, SAS Institute Inc. All rights reserved.
27
Chapter 2: Getting Started with the TABULATE Procedure
| 2.1 Introduction to the TABULATE Procedure |
| 2.2 Creating the Structure of the Table |
| 2.3 Adding Overall Summarizations |
| 2.4 Adding Additional Statistics |
‹#›
Copyright © 2012, SAS Institute Inc. All rights reserved.
28
Objectives
Use the keyword ALL to create overall summarizations.
Include the keyword ALL with concatenated and crossed variables.
Use parentheses with the keyword ALL to summarize within a parenthetical group.
Cross an analysis variable with the keyword ALL.
29
‹#›
Copyright © 2012, SAS Institute Inc. All rights reserved.
29
Business Scenario
Orion Star Country managers want overall summarizations added to the tabular reports about their sales employees.