
Student Assignment – Audits Data Analysis

Overview and Rationale

Being able to ask appropriate questions of data is an important part of the work of data analytics. It is also critical to be able to interpret the results of the analysis. This assignment is intended to familiarize you with the Audits data set and to get you thinking about key business questions you can answer from this data.

Course Outcomes

This assignment is directly linked to the following learning outcomes from the course syllabus:

· Visualize data in a compelling way to enable data-driven storytelling.

· Conduct basic analytic tasks programmatically using the R language.*

* R is not required for this assignment, but you may choose to use it if you feel confident in your R skills.

Assignment Summary

There are two Excel sheets with data – one concerning “Good Working Practice” audits (GWP_Audits_Data) and one concerning “Computerized System Quality Assurance” audits (CSQA_Audits_Data). Please see the accompanying Data Dictionary to understand the fields and values.

You may use any software to perform the analyses specified below. Collaboration is encouraged, but you must not submit identical assignments.

The assignment has three parts. The Appendix provides an example of how to answer the questions in Part I.

Part I

Please refer to the Vertex Data Dictionary as you review the Excel worksheets.

To understand the data, we first need to run descriptive statistics on the data set. For both the GxP Audits and the CSQA Audits sheets, we want to look at the following variables:

· Audit Status

· In USA or OUS

· GxP Area

· Audit Type

· Audit Method

· Proposed Quarter

Start by providing the following for each variable:

1. A table that provides the frequency and percent of each value

2. A graphic representation of the count of each value

3. A graphic representation of the percent of each value

4. What business question do your descriptive analyses answer? Briefly discuss the findings, including any unusual values. Are these values “out of range”? If so, the data cleaning is not complete: delete the out-of-range values and run the analysis again. For any variable where this applies, present both the analysis with the out-of-range values and the analysis with them removed.
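The frequency and percent table in item 1, together with the out-of-range check in item 4, can be sketched in Python using only the standard library. This is a minimal sketch, not a required method: it assumes each variable has already been read into a list of strings (for example, one Excel column), the helper name `frequency_table` is hypothetical, and the `allowed` set is a stand-in for the valid codes listed in the Data Dictionary. The demo rebuilds the Audit Status column from the counts in the Appendix example; the misspelled value "Closd" is a hypothetical out-of-range entry added only to show the before/after runs.

```python
from collections import Counter


def frequency_table(values, allowed=None):
    """Return (value, frequency, percent) rows, most frequent first.

    If `allowed` is given, values outside it are treated as out of range
    and dropped before counting, so percents are based on valid values only.
    """
    if allowed is not None:
        values = [v for v in values if v in allowed]
    counts = Counter(values)
    total = sum(counts.values())
    return [
        (value, freq, round(100 * freq / total, 1))
        for value, freq in counts.most_common()
    ]


# Audit Status counts taken from the Appendix example (96 GxP audits).
appendix_counts = {
    "Closed": 19, "Completed": 4, "In Progress": 18, "Scheduled": 11,
    "Pending": 14, "Not In Scope": 26, "Cancelled": 4,
}
statuses = [s for s, n in appendix_counts.items() for _ in range(n)]

for value, freq, pct in frequency_table(statuses):
    print(f"{value:<14}{freq:>5}{pct:>8.1f}")

# Hypothetical out-of-range entry (a typo) to show both analyses.
dirty = statuses + ["Closd"]
before = frequency_table(dirty)                               # includes "Closd"
after = frequency_table(dirty, allowed=set(appendix_counts))  # "Closd" removed
```

Running the table once on `dirty` and once with `allowed` set produces the two versions that item 4 asks for. The counts feed the count chart (item 2) and the percents feed the percent chart (item 3).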

Please first present your findings for the 2017 GxP Audits data and then the findings for the 2017 CSQA Audits data.

Note: The Appendix is only an example; you must complete your own analysis.

Part II

For each worksheet, compute the number of days elapsed between:

1. “Date of Intake” and “Date Q Sent”. Name that variable “Days_Intake_QSent”.

2. “Date Q Sent” and “Date Q Received”. Name that variable “Days_QSent_QReceived”. Based on the names of these two date fields, what do you think this variable means? Does it apply to all audits? Why or why not?

3. “Date On Site Scheduled” and “Audit Start Date”. Name that variable “Days_OnSiteScheduled_AuditStartDate”. Does this variable apply to all audits? Why or why not?

4. “Audit Start Date” and “Audit End Date”. Name that variable “Days_StartDate_EndDate”.

5. “Audit End Date” and “Date Final Report Due”. Name that variable “Days_AuditEnd_FinalReportDue”.

6. “Date Final Report Due” and “Date of Completion”. Name that variable “Days_FinalReportDue_CompletionDate”.
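Each of the six variables is the difference between two date fields. The sketch below is a minimal illustration, assuming the dates have already been parsed into `datetime.date` objects and that a blank cell is read as `None`. The helper `days_between` and the record layout are hypothetical. Returning `None` rather than a number when either date is missing keeps audits to which a variable does not apply out of the later summary statistics.

```python
from datetime import date


def days_between(start, end):
    """Days elapsed from `start` to `end`, or None if either date is missing."""
    if start is None or end is None:
        return None
    return (end - start).days


# Hypothetical audit records; keys mirror the date fields named above.
audits = [
    {"Date of Intake": date(2017, 1, 9), "Date Q Sent": date(2017, 1, 16)},
    {"Date of Intake": date(2017, 3, 1), "Date Q Sent": None},  # not sent yet
]

for audit in audits:
    audit["Days_Intake_QSent"] = days_between(
        audit["Date of Intake"], audit["Date Q Sent"]
    )

print([a["Days_Intake_QSent"] for a in audits])  # [7, None]
```

The same helper applies to the other five pairs of fields. A negative result means the second date precedes the first, which is usually worth flagging as a possible data-entry error before averaging.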

Then compute the mean and median of each of the six variables.
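Because some computed variables are blank for audits where they do not apply, those missing values should be excluded before summarizing rather than treated as zero. Below is a minimal sketch using Python's `statistics` module. The helper name `summarize` and the day counts are hypothetical.

```python
import statistics


def summarize(days):
    """Mean and median of a day-count variable, ignoring missing (None) values."""
    valid = [d for d in days if d is not None]
    if not valid:
        return None  # the variable does not apply to any audit in this sheet
    return {
        "n": len(valid),
        "mean": statistics.mean(valid),
        "median": statistics.median(valid),
    }


# Hypothetical Days_Intake_QSent values; None marks audits where it does not apply.
days_intake_qsent = [7, None, 12, 3, 40]
print(summarize(days_intake_qsent))  # {'n': 4, 'mean': 15.5, 'median': 9.5}
```

In this toy example, the single 40-day delay pulls the mean (15.5) well above the median (9.5). Reporting both statistics makes that kind of skew visible.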

Part III

Would you recommend merging the sheets “2017 GxP Audits” and “2017 CSQA Audits”? Why or why not?
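One input to this recommendation is how well the two sheets line up column by column. The sketch below uses the hypothetical helpers `compare_columns` and `stack_rows` on illustrative header lists, not the actual sheet layouts. It reports shared and sheet-specific columns. If the sheets were stacked, it adds a `Source` column (also an assumption) to record which sheet each row came from.

```python
def compare_columns(gxp_cols, csqa_cols):
    """Split two header lists into shared and sheet-specific columns."""
    gxp, csqa = set(gxp_cols), set(csqa_cols)
    return {
        "shared": sorted(gxp & csqa),
        "gxp_only": sorted(gxp - csqa),
        "csqa_only": sorted(csqa - gxp),
    }


def stack_rows(gxp_rows, csqa_rows):
    """Append the two sheets' rows, tagging each row with its source sheet."""
    return [dict(row, Source="GxP") for row in gxp_rows] + [
        dict(row, Source="CSQA") for row in csqa_rows
    ]


# Illustrative headers only; check the real sheets against the Data Dictionary.
common = ["Audit Status", "In USA or OUS", "GxP Area",
          "Audit Type", "Audit Method", "Proposed Quarter"]
gxp_headers = common
csqa_headers = common + ["Example CSQA-only field"]  # hypothetical extra column
print(compare_columns(gxp_headers, csqa_headers))
```

Matching column names are necessary but not sufficient for a meaningful merge. The values in shared columns also need to use the same codes in both sheets.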


Rubrics

Descriptives (ALY6000-CO5)

· Exceeds Standards: Accurately visualizes data in an innovative and compelling way that tells a story.

· Meets Standards: Accurately and creatively visualizes data in a useful way that tells a story.

· Approaching Standards: Visualizes data in a useful way, but visualizations may not be creative or may have errors.

· Below Standards: Data visualizations are accurate but not visually appealing, and they hinder the story being told.

Data Analysis (ALY6000-CO1)

· Exceeds Standards: Correctly calculates all values based on the data and highlights each mean and median value.

· Meets Standards: Correctly calculates all values based on the data, including the mean and median values.

· Approaching Standards: Calculates values, but not all are correct, or does not include mean and median values.

· Below Standards: Does not calculate all required values and does not include mean and median values.

Recommendations (ALY6000-CO4)

· Exceeds Standards: Makes compelling recommendations supported by coherent and valid reasoning.

· Meets Standards: Makes appropriate recommendations supported by logical reasons.

· Approaching Standards: Makes good recommendations, but reasoning is obscure or not valid.

· Below Standards: Recommendations do not make sense given the reasons provided, or recommendations are not supported by reasons.

Writing and Format

· Exceeds Standards: Assignment follows normal conventions of grammar and spelling and uses appropriate conventions of style and format that engage the reader.

· Meets Standards: Assignment follows normal conventions of grammar and spelling and has been carefully proofread. Appropriate conventions of style and format are used consistently.

· Approaching Standards: Minimal errors in spelling, grammar, and/or other writing conventions. Some transitions are choppy but not difficult to follow.

· Below Standards: Frequent errors in spelling, grammar, and/or other writing conventions that distract the reader. Transitions are choppy and difficult to follow. Limited connection to the topic.

Appendix

Assignment Part I Section a Example

Business Question:

What is the distribution of the status of the 2017 GxP Audits?

Analysis:

Descriptives Table

Audit Status          Frequency   Percent   Valid Percent
Valid
  Closed                     19      19.8            19.8
  Completed                   4       4.2             4.2
  In Progress                18      18.8            18.8
  Scheduled                  11      11.5            11.5
  Pending                    14      14.6            14.6
  Not In Scope               26      27.1            27.1
  Cancelled                   4       4.2             4.2
Total                        96     100.0           100.0

[Figure: Audit Status Count]

[Figure: Audit Status Percentages]

Discussion:

The data file includes information on 96 audits in 2017 for GxP areas. It is unclear whether the file includes all known GxP audits in 2017 or only a subset.

The largest share of GxP audits (27.1%) is “Not In Scope.”

Of the audits, 19.8% are closed and 4.2% are completed. It is unclear what distinguishes a “closed” audit from a “completed” one; we should ask the client whether two distinct values are really needed.

In addition, 18.8% of the audits are in progress, 11.5% are scheduled, and 14.6% are pending. For the pending audits, the dates of the audit process have not yet been established.

Finally, 4.2% of the audits were canceled. It may be useful to add a notes field recording the reason for each cancellation.