report
BUS105
Business Information
Systems
Workshop Week 7
Structured Data Management
(Introductory Analytics) Life Cycle
Workshop (Excel)22
Copyright Notice
COPYRIGHT COMMONWEALTH OF AUSTRALIA
Copyright Regulations 1969 WARNING
This material has been reproduced and communicated to you by or on behalf of Kaplan Higher
Education pursuant to Part VB of the Copyright Act 1968 (the Act). The material in
this communication may be subject to copyright under the Act. Any further reproduction
or communication of this material by you may be the subject of copyright protection under the Act.
Do not remove this notice
2
Lesson Learning Outcomes
1 Learn about the data analytics project
lifecycle
2 Do a hands-on exercise in excel with
reference to LO1
3 Interpret results as required
Excel Workshop Week 7 Vehicle Cost Analysis
Commons.wikipedia.org
Business Question: How much does it cost
to run a bus service?
Intechen.com
Today’s Tasks
• Please download today’s data file now
BUS105_ProximityBus_for_week_7.xlsx
• You will be doing a hands-on Microsoft Excel cost analysis
exercise in order to answer the business question:
How much does it cost to run a bus service?
• General Excel instructions will be followed by your specific
instructions.
• At the same time we will be learning about the data
analytics lifecycle and referring to it every now and then.
Data Analytics Lifecycle
Business Understanding
Data Understanding
Data Preparation
Data Modelling
Evaluation
Deployment
Kelleher, JD, MacNamee, B & D’Arcy A 2015, Fundamentals of machine learning for
predictive analytics, The MIT Press, Cambridge Massachusetts, p12-15.
Data
Kelleher, JD, MacNamee, B & D’Arcy A 2015, Fundamentals of machine learning for
predictive analytics, The MIT Press, Cambridge Massachusetts, p12-15.
Stage 1: Business Understanding
This is stage 1 of the data analytics lifecycle.
Some questions you should answer during this stage:
• What are your objectives/aims?
e.g. Is our bus company making a profit?
• What resources do you need to start the project?
e.g. Do we need an analyst? What software do we need?
• What are your business success criteria?
e.g. How can we maintain a bus good service and keep
costs below a certain level?
• In this workshop we will work with vehicle mileage and
cost data, draw charts and perform cost calculations
using excel in-built functions.
Opening the Excel Data File
Double click on the
BUS105_ProximityB
us_for_week_7.xlsx
file icon to open the
file in Excel.
Data Understanding
• Questions to ask at this stage:
• What data have you got and is it complete?
e.g. Bus ID, cost per km, km driven...
• What was the source? e.g. Maintenance department
• What other data would be useful,
• e.g. Bus ticket prices, number of passengers per day, …..
• Do you have a description of the data
e.g. (Data dictionary or encyclopedia)
This Photo by Unknown Author is
licensed under CC BY
Instructions
Clicking on a cell
makes it active
• Use the mouse
OR
• Use the arrow keys
to move around
How to Select a Cell
Cell is active when a heavy
border surrounds it.
© 2017 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part, except
for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected
website for classroom use.
Ischool.utexus.edu
Cell A1
Instructions To enter worksheet titles, numbers or text
– Open the file in Microsoft Excel
– Click on a cell to make the cell active
– Type desired text
– Click the ENTER button to complete the entry
– Move to the next cell of interest and repeat
Additional information:
(To cut and paste, use Ctrl C and Ctrl V as in Word)
Your instructions on next page…
How to Enter Items
© 2017 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part,
except for use as permitted in a license distributed with a certain product or service or otherwise on a password-
protected website for classroom use.
Now Enter Text
You will notice that some information is missing.
Your Instructions
• Click on cell B3 & enter “Cost per Km”
• Move to cell C6 and enter the missing value
14949.00
• Move to cell C7 and enter the missing value
14905.00
• Move to cell E3 and replace “M cost” with
“Mileage Cost”
Your File Should Look Like This
Are the first four columns complete?
Formulae With Simple Operators
Instructions
Using simple operators and relative cell reference
• Recall all formulae start with an = sign
• Simple operators for addition, subtraction, multiplication,
division and nth power are +, -, *, / and ^n where n is the
power, e.g. 5 squared is =5^2
• If we drag the cursor along, the cell addresses are changed
relative to position. This is called relative cell referencing.
Your Instructions
• Obtain mileage cost: Go to cell E4 in the Mileage column
of your worksheet, type “= B4*C4” ENTER
Fill Handle
Instructions :
How to copy a cell calculation to adjacent cells in a col/row
• With the cell containing the contents (e.g. E4), to fill down
the column, point to the fill handle to activate it (i.e. click
on the lower right hand corner of the active cell and a plus
sign should appear “+”)
• Hold on to the corner and drag the handle down the
column as required (i.e. to cell E12)
Your Instructions
• Copy the formula (using the fill handle) down the rest of
column E to cell E12
Mileage
Costs
Your File Should Look Like This
Using Simple Formulae Cont.
Your Instructions
• Obtain total cost: Go to cell F4 and type in “=D4 + E4”
ENTER
• Copy the formula (by dragging the cursor) down the rest
of column F to cell F12
• Obtain total cost per Km: Go to cell G4 and type in
“=F4/C4” ENTER
• Copy the formula (using the fill handle) down the rest of
column G to cell G12
Your File Should Look Like This
Finding Totals Using ‘SUM’
Instructions
To sum a column of numbers
– Click the first empty cell below the column of numbers to sum
– Click the AutoSum button on the HOME tab to display a formula
in the formula bar and in the active cell, for example
=SUM(B4:B12)
Your Instructions
• Highlight G4 to G12 and click on the decrease decimal places button
in the Number menu. Reduce the decimals to 2 places.
• Sum the columns using the sum formula in the “totals” row (row 13)
Instructions
To sum a column of numbers
– Click the first empty cell below the column of numbers to sum
– Click the AutoSum button on the HOME tab to display a formula in
the formula bar and in the active cell, for example, =SUM(B4:B12)
Your Instructions
• Find column totals using the SUM formula in the “Totals” row (row 13)
• Highlight G4 to G12 and click on the decrease decimal places button
in the Number menu. Reduce the decimals to 1 place.
Finding Totals Using ‘SUM’
Your File Should Look Like This
Data Preparation
• We have carried out a small amount of data preparation.
Some of the questions you should answer during this
stage:
• Have you considered your data storage and
maintenance capacity?
e.g. Do you need new software, cloud warehousing or
just a PC?
• Do you need to transform (data wrangling) or integrate
the data in any way?
e.g. Finding total cost, reducing numbers to one decimal
Stage 4: Modelling
Questions for the modelling phase:
• What models will you use?
e.g. Descriptive, predictive analytics or AI techniques
• How will you train/test and assess the models?
e.g. You will need a training data set if you are going to
use machine learning
Let’s look at some basic summary statistics, average, max
and min.
https://www.sv-europe.com/crisp-dm-methodology//
Absolute Versus Relative
Addressing
Table 3-6 Examples of Absolute, Relative, and Mixed Cell References
Cell Reference Type of Reference Meaning
$B$4 Absolute cell reference Both column and row references remain the same
when you copy this cell, because the cell references
are absolute
B4 Relative cell reference Both column and row references are relative. When
copied to another cell, both the column and row in the
cell reference are adjusted to reflect the new location
B$4 Mixed reference This cell reference is mixed. The column reference
changes when you copy this cell to another column
because it is relative. The row reference does not
change because it is absolute
$B4 Mixed reference This cell reference is mixed. The column reference
does not change because it is absolute. The row
reference changes when you copy this cell reference to
another row because it is relative
Absolute Versus Relative Address
Instructions
To enter a formula containing absolute cell references
– Given a selected cell, enter the formula and then press the F4 key
to change the most recently typed cell reference from a relative cell
reference to an absolute cell reference
Your Instructions
– Go to cell A17 in your spreadsheet and type in “9”
– Calculate the average using each total divided by 9 using absolute
referencing: Go to cell B14 and type in “=B13/$A$17”
– Apply this to cells C14 to G14 using the fill handle and adjust the
results to 2 decimal places
Find Max and Min
Instructions
To find the maximum or minimum of a range of cells type in
=max(start cell:end cell) for maximum
=min(start cell:end cell) for minimum
Your Instructions
• Fill in the “Highest” and “Lowest” column values in row 15
and 16, using =max(B4:B12) and =min(B4:B12)
• If required, change all values so that 2 decimal places are
displayed
Your File Should Look Like This
Stage 5: Evaluation
Questions regarding the evaluation and deployment
phases:
• How will you assess the results in terms of business
success criteria?
e.g. How are these results going to help the bus company?
• Have you reviewed all the modelling so far?
e.g. What other preliminary models we can learn from?
See evaluation activity on the next page
Activity 1:Evaluation
• Answer the following questions:
1. Which bus costs the most to run per km?
2. Which bus has (lowest mileage) driven the
least number of kilometres?
3. What is the lowest maintenance cost?
Stage 5: Deployment
Questions regarding the deployment phases:
• Next steps? Do you need to gather more data, carry out
another data mining project, or start deployment?
e.g. let’s try filtering and sorting values of interest
(see next page)
• How will you implement your findings?
e.g. find a way to reduce the cost of bus 701
Filtering Instructions
Filtering
• The editing menu has sort and filter commands
• To filter items based on a particular column: click on the column to be filtered
• Move your mouse to the editing menu and select filter a small box with an arrow will appear at the top of the column
• Clicking on the arrow reveal the items in the list
• Unticking individual boxes hides (filters out those items) and the filter box changes shape
• This command is good for removing BLANKS in data sets
• To display (unfilter) the list click on “select all” in the list of filter boxes, and your original data should be displayed
Filtering
We want to filter out the costs per Km less than
1.80
Your Instructions
1. Click on the top of column B of your spreadsheet
2. Take the cursor to the editing menu and select filter
3. Click on the filter box in column B to reveal the details
of data in column B
4. Untick the boxes with values lower than 1.80 to hide
them
Notice that the row
with the minimum
is now hidden too
Filtered column
Your File Should Look Like This
We want to sort the mileage costs while keeping the other
row information consistent with those costs
Your Instructions
1. First unfilter column B by ticking the “select all” box in the
filter options
2. Copy just the values of data block from cell A3 to G12 to
Sheet 1 by highlighting the data and using control C
3. Click on cell A1 in sheet 1, right click on your mouse and
select paste special, click on the values option and OK
4. Select column E, go to the sort menu and select “sort
smallest to largest”
The “Expand selection” box will appear, make sure the
expand selection option is checked and then press SORT
Sorting
Sorting
Your Worksheet Sheet 1 should look like this
Total Cost per bus as a Percentage of
the Entire Total Cost
B701 15%
B702 12%
B703 11%
B704 11%
B705 7%
B706 10%
B707 11%
B708 11%
B709 12%
TOTAL COST PER BUS AS A PERCENTAGE OF SUM TOTAL COST
An alternative representation of total costs
Activity 2: Interpretation
Answer these questions:
1. Which bus has the greatest mileage cost?
2. What is the maintenance cost of the bus of
interest in question 1?
3. Is it easier to interpret the table or pie chart?
Why?