Discussion 15 - 205
Chapter 3
Data Visualization
1
Vb
Bm
Mbm
Introduction
Data visualization involves:
Creating a summary table for the data.
Generating charts to help interpret, analyze, and learn from the data.
Uses of data visualization:
Helpful for identifying data errors.
Reduces the size of your data set by highlighting important relationships and trends in the data.
2
Vb
Bm
Mbm
2
Overview of Data Visualization
3
Vb
Bm
Mbm
Overview of Data Visualization
Data-ink ratio: Measures the proportion of what Tufte terms “data-ink” to the total amount of ink used in a table or chart.
Edward R. Tufte first described the data-ink ratio.
Helpful for creating effective tables and charts for data visualization.
Data-ink - Ink used in a table or chart that is necessary to convey the meaning of the data to the audience.
Non-data-ink - Ink used in a table or chart that serves no useful purpose in conveying the data to the audience.
4
Vb
Bm
Mbm
Table 3.1 - Example of a Low Data-Ink Ratio Table and Figure 3.3 - Example of a Low Data-Ink Ratio Chart
5
Vb
Bm
Mbm
Let us consider the case of Gossamer Industries, a firm that produces fine silk clothing products.
Gossamer is interested in tracking the sales of one of its most popular items, a particular style of women’s scarf.
Table 3.1 and Figure 3.3 provide examples of a table and chart with low data-ink ratios used to display sales of this style of women’s scarf.
The data used in this table and figure represent product sales by day.
5
Table 3.2 - Increasing the Data-Ink Ratio by Removing Unnecessary Gridlines and Figure 3.4 - Increasing the Data-Ink Ratio by Adding Labels to Axes and Removing Unnecessary Lines and Labels
6
Vb
Bm
Mbm
Table 3.2 shows a modified table in which all grid lines have been deleted except for those around the title of the table.
Deleting the grid lines in Table 3.1 increases the data-ink ratio because a larger proportion of the ink used in the table is used to convey the information
(the actual numbers).
Similarly, deleting the unnecessary horizontal lines in Figure 3.4 increases the data-ink ratio.
6
Tables
7
Vb
Bm
Mbm
7
Tables
Tables should be used when:
1. The reader needs to refer to specific numerical values.
2. The reader needs to make precise comparisons between different values and not just relative comparisons.
3. The values being displayed have different units or very different magnitudes.
8
Vb
Bm
Mbm
8
Tables
9
Table 3.3 - Table showing Exact Values for Costs and Revenues by Month for Gossamer Industries
Figure 3.5 - Line Chart of Monthly Costs and Revenues at Gossamer Industries
Vb
Bm
Mbm
Consider when the accounting department of Gossamer Industries is summarizing the company’s annual data for completion of its federal tax forms.
In this case, the specific numbers corresponding to revenues and expenses are important and not just the relative values.
Therefore, these data should be presented in a table similar to Table 3.3.
Similarly, if it is important to know exactly by how much revenues exceed expenses each month, then this would also be better presented as a table rather than as a line chart, as seen in Figure 3.5.
9
Figure 3.6 - Combined Line Chart and Table for Monthly Costs and Revenues at Gossamer Industries
10
Vb
Bm
Mbm
Figure 3.6 allows the reader to easily see the monthly changes in revenues and costs while also being able to refer to the exact numerical values.
10
Table 3.4 - Table displaying Headcount, Costs, and Revenues at Gossamer Industries
11
Vb
Bm
Mbm
Costs and revenues are measured in dollars ($), but head count is measured in number of employees.
Although all these values can be displayed on a line chart using multiple vertical axes, this is generally not recommended.
Because the values have widely different magnitudes (costs and revenues are in the tens of thousands, whereas headcount is approximately 10 each month), it would be difficult to interpret changes on a single chart.
Therefore, a table similar to Table 3.4 is recommended.
11
Tables
Table Design Principles:
Avoid using vertical lines in a table unless they are necessary for clarity.
Horizontal lines are generally necessary only for separating column titles from data values or when indicating that a calculation has taken place.
12
Vb
Bm
Mbm
12
Figure 3.7 - Comparing different Table Designs
13
Vb
Bm
Mbm
Figure 3.7 compares several forms of a table displaying Gossamer’s costs and revenue data.
Most people find Design D, with the fewest grid lines, easiest to read.
In this table, grid lines are used only to separate the column headings from the data and to indicate that a calculation has occurred to generate the Profits row and the Total column.
13
Table 3.5 - Larger Table Showing Revenues by Location for 12 Months of Data
14
Vb
Bm
Mbm
Table 3.5 breaks out the revenue data by location for nine cities and shows 12 months of revenue and cost data.
Every other column has been lightly shaded. This helps the reader quickly scan the table to see which values correspond with each month.
The horizontal line between the revenue for Academy and the Total row helps the reader differentiate the revenue data for each location and indicates that a calculation has taken place to generate the totals by month.
Columns of numerical values in a table should be right-aligned. This makes it easy to see differences in the magnitude of values.
If you are showing digits to the right of the decimal point, all values should include the same number of digits to the right of the decimal.
It is generally best to left-align text values within a column in a table, as in the Revenues by Location (the first) column of Table 3.5.
Column headings should either match the alignment of the data in the columns or be centered over the values, as in Table 3.5.
14
Tables
15
Crosstabulation: A useful type of table for describing data of two variables.
PivotTable: A crosstabulation in Microsoft Excel.
Vb
Bm
Mbm
15
Table 3.6 - Quality Rating and Meal Price for 300 Los Angeles Restaurants
16
Vb
Bm
Mbm
Data on the quality rating, meal price, and the usual wait time for a table during peak hours were collected for a sample of 300 Los Angeles area restaurants.
Table 3.6 shows the data for the first ten restaurants.
Quality ratings - categorical data; Meal prices - quantitative data.
16
Table 3.7 - Crosstabulation of Quality Rating and Meal Price for 300 Los Angeles Restaurants
17
The greatest number of restaurants in the sample (64) have a very good rating and a meal price in the $20–29 range.
Only two restaurants have an excellent rating and a meal price in the $10–19 range.
The right and bottom margins of the crosstabulation give the frequency of quality rating and meal price separately.
Vb
Bm
Mbm
A crosstabulation of the data for quality rating and meal price data is shown in Table 3.7.
The left and top margin labels define the classes for the two variables. In the left margin, the row labels (Good, Very Good, and Excellent) correspond to the three classes of the quality rating variable.
In the top margin, the column labels ($10–19, $20–29, $30–39, and $40–49) correspond to the four classes (or bins) of the meal price variable.
Each restaurant in the sample provides a quality rating and a meal price.
For example, restaurant 5 is identified as having a very good quality rating and a meal price of $33. This restaurant belongs to the cell in row 2 and column 3.
From the right margin, we see that data on quality ratings show 84 good restaurants, 150 very good restaurants, and 66 excellent restaurants. Similarly, the bottom margin shows the counts for the meal price variable. The value of 300 in the bottom right corner of the table indicates that 300 restaurants were included in this data set.
In constructing a crosstabulation, we simply count the number of restaurants that belong to each of the cells in the crosstabulation.
17
Figure 3.8 - Excel Worksheet Containing Restaurant Data
18
Vb
Bm
Mbm
Figure 3.8 illustrates Zagat’s restaurant data in Excel.
Each of the four columns in Figure 3.8 [Restaurant, Quality Rating, Meal Price ($), and Wait Time (min)] is considered a field by Excel.
18
Figure 3.9 - Initial PivotTable Field List and PivotTable Field Report for the Restaurant Data
19
Vb
Bm
Mbm
To create a PivotTable in Excel, we follow these steps:
Step 1: Click the INSERT tab on the Ribbon
Step 2: Click PivotTable in the Tables group
Step 3: When the Create PivotTable dialog box appears:
Choose Select a Table or Range
Enter A1:D301 in the Table/Range: box
Select New Worksheet as the location for the PivotTable Report
Click OK
The resulting initial PivotTable Field List and PivotTable Report are shown in Figure 3.9.
19
Figure 3.10 - Completed PivotTable Field List and A Portion of the PivotTable Report for the Restaurant Data (Columns H:AK are Hidden)
20
Vb
Bm
Mbm
Fields may be chosen to represent rows, columns, or values in the body of the PivotTable Report.
The following steps show how to use Excel’s PivotTable Field List to assign the Quality Rating field to the rows, the Meal Price ($) field to the columns, and the Restaurant field to the body of the PivotTable report.
Step 4: In the PivotTable Fields area, go to Drag fields between areas below:
Drag the Quality Rating field to the ROWS area
Drag the Meal Price ($) field to the COLUMNS area
Drag the Restaurant field to the VALUES area
Step 5: Click on Sum of Restaurant in the VALUES area
Step 6: Select Value Field Settings from the list of options
Step 7: When the Value Field Settings dialog box appears:
Under Summarize value field by, select Count
Click OK
Figure 3.10 shows the completed PivotTable Field List and a portion of the PivotTable worksheet as it now appears.
20
Figure 3.11 - Final PivotTable Report for the Restaurant Data
21
Vb
Bm
Mbm
To complete the PivotTable, we need to group the columns representing meal prices and place the row labels for quality rating in the proper order:
Step 8: Right-click in cell B4 or any cell containing a meal price column label
Step 9: Select Group from the list of options
Step 10: When the Grouping dialog box appears:
Enter 10 in the Starting at: box
Enter 49 in the Ending at: box
Enter 10 in the By: box
Click OK
Step 11: Right-click on “Excellent” in cell A5
Step 12: Select Move and click Move “Excellent” to End
The final PivotTable, shown in Figure 3.11, provides the same information as the crosstabulation in Table 3.7.
For instance, row 8 provides the frequency distribution for the data over the quantitative variable of meal price.
Seventy-eight restaurants have meal prices of $10 to $19.
Column F provides the frequency distribution for the data over the categorical variable of quality.
One hundred fifty restaurants have a quality rating of Very Good.
21
Figure 3.12 - Percent Frequency Distribution as a PivotTable for the Restaurant Data
22
Vb
Bm
Mbm
Percent frequency distribution for the restaurant data can be created using a PivotTable by using the following steps:
Step 1: Click the Count of Restaurant in the VALUES area
Step 2: Select Value Field Settings… from the list of options
Step 3: When the Value Field Settings dialog box appears, click the tab for Show Values As
Step 4: In the Show values as area, select % of Grand Total from the drop-down menu
Click OK
Figure 3.12 displays the percent frequency distribution for the Restaurant data as a Pivot-Table.
The figure indicates that 50 percent of the restaurants are in the Very Good quality category and that 26 percent have meal prices between $10 and $19.
22
Figure 3.13 - PivotTable Report for the Restaurant Data with Average Wait Times Added
23
Vb
Bm
Mbm
PivotTables may be used to display statistics other than a simple count of items.
Figure 3.11 to display summary information on wait times instead of meal prices.
Step 1: Click the Count of Restaurant field in the VALUES area
Select Remove Field
Step 2: Drag the Wait Time (min) to the VALUES area
Step 3: Click on Sum of Wait Time (min) in the VALUES area
Step 4: Select Value Field Settings… from the list of options
Step 5: When the Value Field Settings dialog box appears:
Under Summarize value field by, select Average
Click Number Format
In the Category: area, select Number
Enter 1 for Decimal places:
Click OK
When the Value Field Settings dialog box reappears, click OK
The completed PivotTable appears in Figure 3.13.
This PivotTable replaces the counts of restaurants with values for the average wait time for a table at a restaurant for each grouping of meal prices ($10–19, $20–29, $30–39, $40–49).
For instance, cell B7 indicates that the average wait time for a table at an Excellent restaurant with a meal price of $10–$19 is 25.5 minutes.
Column F displays the total average wait times for tables in each quality rating category.
We see that Excellent restaurants have the longest average waits of 35.2 minutes and that Good restaurants have average wait times of only 2.5 minutes.
Finally, cell D7 shows us that the longest wait times can be expected at Excellent restaurants with meal prices in the $30–$39 range (34 minutes).
23
Charts
24
Vb
Bm
Mbm
24
Charts
Charts (or graphs): Visual methods of displaying data.
Scatter chart: Graphical presentation of the relationship between two quantitative variables.
Trendline – A line that provides an approximation of the relationship between the variables.
Line chart: A line connects the points in the chart.
Useful for time series data collected over a period of time. (minutes, hours, days, years, etc.)
25
Vb
Bm
Mbm
25
Table 3.8 - Sample Data for the San Francisco Electronics Store
26
Vb
Bm
Mbm
We will investigate whether a relationship exists between the number of commercials shown and sales at the store the following week using a scatter chart.
The steps involved to create a scatter chart using Excel’s chart tools are as follows:
Copy the data in the file electronics to a new excel worksheet in columns A through C and rows 1 through 11.
Step 1: Select cells B2:C11
Step 2: Click the INSERT tab in the Ribbon
Step 3: Click the Insert Scatter (X,Y) or Bubble Chart button in the Charts group
Step 4: When the list of scatter chart subtypes appears, click the Scatter button
Step 5: Click the DESIGN tab under the Chart Tools Ribbon
Step 6: Click Add Chart Element in the Chart Layouts group
Select Chart Title, and click Above Chart
Click on the text box above the chart, and replace the text with Scatter Chart for the San Francisco electronics Store
Step 7: Click Add Chart Element in the Chart Layouts group
Select Axis Title, and click Primary Vertical
Click on the text box under the horizontal axis, and replace “Axis
Title” with number of Commercials
Step 8: Click Add Chart Element in the Chart Layouts group
Select Axis Title, and click Primary Horizontal
Click on the text box next to the vertical axis, and replace “Axis Title” with Sales ($100s)
Step 9: Right-click on the one of the horizontal grid lines in the body of the chart, and click Delete
Step 10: Right-click on the one of the vertical grid lines in the body of the chart, and click Delete
To add a linear trendline using Excel, we use the following steps:
Step 1: Right-click on one of the data points in the scatter chart, and select Add Trendline…
Step 2: When the Format Trendline task pane appears, select Linear under TRENDLINE OPTIONS
26
Figure 3.14 - Scatter Chart for the San Francisco Electronics Store
27
Vb
Bm
Mbm
The number of commercials (x) is shown on the horizontal axis, and sales (y) are shown on the vertical axis.
For week 1, x = 2 and y = 50. A point is plotted on the scatter chart at those coordinates; similar points are plotted for the other nine weeks.
Note that during two of the weeks, one commercial was shown, during two of the weeks, two commercials were shown, and so on.
The completed scatter chart in Figure 3.14 indicates a positive linear relationship (or positive correlation) between the number of commercials and sales: higher sales are associated with a higher number of commercials.
27
Table 3.9 - Monthly Sales Data of Air Compressors at Kirkland Industries
28
Vb
Bm
Mbm
Table 3.9 contains total sales amounts (in $100s) for air compressors during each month in the most recent calendar year. To create the line chart to this data, the steps are as follows:
Copy the data in the file Kirkland to a new excel worksheet in columns A and B; rows 1 through 13.
Step 1: Select cells A2:B13
Step 2: Click the INSERT tab on the Ribbon
Step 3: Click the Insert Line Chart button in the Charts group
Step 4: When the list of line chart subtypes appears, click the Line with Markers button under 2-D Line
This creates a line chart for sales with a basic layout and minimum formatting
Step 5: Select the line chart that was just created to reveal the CHART TOOLS Ribbon
Click the DESIGN tab under the CHART TOOLS Ribbon
Step 6: Click Add Chart Element in the Chart Layouts group
Select Axis Title from the drop-down menu
Click Primary Vertical
Click on the text box next to the vertical axis, and replace “Axis Title” with Sales ($100s)
Step 7: Click Add Chart Element in the Chart Layouts group
Select Chart Title from the drop-down menu
Click Above Chart
Click on the text box above the chart, and replace “Chart Title” with line Chart for Monthly Sales data
Step 8: Right-click on one of the horizontal lines in the chart, and click Delete
28
Figure 3.15 - Scatter Chart and Line Chart for Monthly Sales Data at Kirkland Industries
29
Vb
Bm
Mbm
29
Table 3.10 - Regional Sales Data by Month for Air Compressors at Kirkland Industries
30
Vb
Bm
Mbm
30
Figure 3.16 - Line Chart of Regional Sales Data at Kirkland Industries
31
Vb
Bm
Mbm
We can create a line chart in Excel that shows sales in both regions, as in Figure 3.16, by following similar steps but selecting cells A2:C14 in the file KirklandRegional before creating the line chart.
Sales in both the North and South regions seemed to follow the same increasing/decreasing pattern until October.
Starting in October, sales in the North continued to decrease while sales in the South increased.
We would probably want to investigate any changes that occurred in the North region around October.
31
Charts
Sparkline - Special type of line chart
Minimalist type of line chart that can be placed directly into a cell in Excel.
Contain no axes; they display only the line for the data.
Take up very little space, and they can be effectively used to provide information on overall trends for time series data.
32
Vb
Bm
Mbm
To create a sparkline in Excel:
Step 1: Click the INSERT tab on the Ribbon
Step 2: Click Line in the Sparklines group
Step 3: When the Create Sparklines dialog box opens,
Enter B3:B14 in the Data Range: box
Enter B15 in the Location Range: box
Click OK
Step 4: Copy cell B15 to cell C15
32
Figure 3.17 - Sparklines for the Regional Sales Data at Kirkland Industries
33
Vb
Bm
Mbm
The sparklines in cells B15 and C15 do not indicate the magnitude of sales in the North and South regions, but they do show the overall trend for these data.
Sales in the North appear to be decreasing in recent time, and sales in the South appear to be increasing overall.
Because sparklines are input directly into the cell in Excel, we can also type text directly into the same cell that will then be overlaid on the sparkline, or we can add shading to the cell, which will appear as the background.
In Figure 3.17, we have shaded cells B15 and C15 to highlight the sparklines.
As can be seen, sparklines provide an efficient and simple way to display basic information about a time series.
33
Charts
Bar Charts: Use horizontal bars to display the magnitude of the quantitative variable.
Column Charts: Use vertical bars to display the magnitude of the quantitative variable.
Bar and column charts are very helpful in making comparisons between categorical variables.
34
Vb
Bm
Mbm
34
Figure 3.18 - Bar Charts for Accounts Managed Data
35
Gentry manages the greatest number of accounts and Williams
the fewest.
Vb
Bm
Mbm
Consider the regional supervisor who wants to examine the number of accounts being handled by each manager. Figure 3.18 shows a bar chart created in Excel displaying these data. To create this bar chart in Excel:
Step 1: Select cells A2:B9
Step 2: Click the INSERT tab on the Ribbon
Step 3: Click the Insert Bar Chart button in the Charts group
Step 4: When the list of bar chart subtypes appears:
Click the Clustered Bar button in the 2-D Bar section
Step 5: Select the bar chart that was just created to reveal the CHART TOOLS ribbon
Click the DESIGN tab under the CHART TOOLS Ribbon
Step 6: Click Add Chart Element in the Chart Layouts group
Select Axis Title from the drop-down menu
Click Primary Horizontal
Click on the text box next to the vertical axis, and replace “Axis Title” with Accounts Managed
Step 7: Click Add Chart Element in the Chart Layouts group
Select Axis Title from the drop-down menu
Click Primary Vertical
Click on the text box next to the vertical axis, and replace “Axis Title” with Manager
Step 8: Click Add Chart Element in the Chart Layouts group
Select Chart Title from the drop-down menu
Click Above Chart
Click on the text box above the chart, and replace “Chart Title” with Bar Chart of Accounts Managed
Step 9: Right-click on one of the vertical lines in the chart, and click Delete
35
Figure 3.19 - Sorted Bar Chart for Accounts Managed Data
36
Vb
Bm
Mbm
We can make the bar chart in Figure 3.18 even easier to read by ordering the results by the number of accounts managed.
In the completed bar chart in Excel, shown in Figure 3.19, we can easily compare the relative number of accounts managed for all managers.
We can do this with the following steps:
Step 1: Select cells A1:B9
Step 2: Right-click any of the cells A1:B9
Choose Sort
Click Custom Sort
Step 3: When the Sort dialog box appears:
Make sure that the check box for My data has headers is checked
Choose Accounts Managed in the Sort by box under Column
Choose Smallest to Largest under Order
Click OK
36
Figure 3.20 - Bar Chart with Data Labels for Accounts Managed Data
37
Vb
Bm
Mbm
In order to interpret from the bar chart exactly how many accounts are assigned to each manager, we add data labels to the bar chart, as in Figure 3.20, which is created in Excel using the following steps:
Step 1: Select the bar chart just created to reveal the CHART TOOLS Ribbon
Step 2: Click DESIGN tab in the CHART TOOLS Ribbon
Step 3: Click Add Chart Element in the Chart Layouts group
Select Data Labels
Click Outside End
This adds labels of the number of accounts managed to the end of each bar so that the reader can easily look up exact values displayed in the bar chart.
37
Charts
Pie charts: Common form of chart used to compare categorical data.
Bubble chart: Graphical means of visualizing three variables in a two-dimensional graph.
Sometimes a preferred alternative to a 3-D graph.
Heat map: A two-dimensional graphical representation of data that uses different shades of color to indicate magnitude.
38
Vb
Bm
Mbm
38
Figure 3.21 - Pie Chart of Accounts Managed
39
Vb
Bm
Mbm
Figure 3.21 displays the data for the number of accounts managed in Figure 3.18.
Visually, it is still relatively easy to see that Gentry has the greatest number of accounts and that Williams has the fewest.
However, it is difficult to say whether Lopez or Francois has more accounts. Research has shown that people find it very difficult to perceive differences in area.
Compare Figure 3.21 to Figure 3.19.
Making visual comparisons is much easier in the bar chart than in the pie chart (particularly when using a limited number of colors for differentiation).
39
Table 3.11 - Sample Data on Billionaires per Country
40
Vb
Bm
Mbm
Suppose that we want to compare the number of billionaires in various countries.
Table 3.11 provides a sample of six countries, showing, for each country, the number of billionaires per 10 million residents, the per capita income, and the total number of billionaires.
40
Figure 3.22 - Bubble Chart Comparing Billionaires by Country
41
Vb
Bm
Mbm
Figure 3.22 shows the bubble chart that has been created by Excel.
Copy the sample data on billionaires from the webfile Billionaires to an Excel worksheet from columns A through D and rows 1 through 7.
Step 1: Select cells B2:D7
Step 2: Click the INSERT tab on the Ribbon
Step 3: In the Charts group, click Insert Scatter (X,Y) or Bubble Chart button
In the Bubble subgroup, click the Bubble button
Step 4: Select the chart that was just created to reveal the CHART TOOLS ribbon
Click the DESIGN tab under the CHART TOOLS Ribbon
Step 5: Click Add Chart Element in the Chart Layouts group
Choose Axis Title from the drop-down menu
Click Primary Horizontal
Click on the text box under the horizontal axis, and replace “Axis Title” with Billionaires per 10 Million residents
Step 6: Click Add Chart Element in the Chart Layouts group
Choose Axis Title from the drop-down menu
Click Primary Vertical
Click on the text box next to the vertical axis, and replace “Axis Title” with Per Capita income
Step 7: Click Add Chart Element in the Chart Layouts group
Choose Chart Title from the drop-down menu
Click Above Chart
Click on the text box above the chart, and replace “Chart Title” with Billionaires by Country
Step 8: Click Add Chart Element in the Chart Layouts group
Choose Gridlines from the drop-down menu
Deselect Primary Major Horizontal and Primary Major Vertical to remove the gridlines from the bubble chart
However, we can make this chart much more informative by taking a few additional steps to add the names of the countries:
Step 9: Click Add Chart Element in the Chart Layouts group
Choose Data Labels from the drop-down menu
Click More Data Label Options . . .
Step 10: Click the Label Options icon
Under LABEL OPTIONS, select Value from Cells, and click the Select Range button
Step 11: When the Data Label Range dialog box opens, select cells A2:A7 in the Worksheet.
This will enter the value “=SheetName!$A$2:$A$7” into the Select Data Label Range box
where “SheetName” is the name of the active Worksheet
Click OK
Step 12: In the Format Data Labels task pane, deselect Y Value in the LABEL OPTIONS area, and select Right under Label Position
41
Figure 3.23 - Bubble Chart Comparing Billionaires by Country with Data Labels added
42
Vb
Bm
Mbm
The completed bubble chart in Figure 3.23 enables us to easily associate each country with the corresponding bubble.
Hong Kong has the most billionaires per 10 million residents but that the United States has many more billionaires overall (Hong Kong has a much smaller population than the United States).
From the relative bubble sizes, we see that China and Russia also have many billionaires but that there are relatively few billionaires per 10 million residents in these countries and that these countries overall have low per capita incomes.
The United Kingdom, United States, and Hong Kong all have much higher per capita incomes.
42
Figure 3.24 - Heat Map and Sparklines for Same-Store Sales Data
43
Vb
Bm
Mbm
Figure 3.24 can be created in Excel by following these steps:
Step 1: Select cells B2:M17
Step 2: Click the HOME tab on the Ribbon
Step 3: Click Conditional Formatting in the Styles group
Choose Color Scales and click on Blue–White–Red Color Scale
To add the sparklines in Column N, we use the following steps:
Step 4: Select cell N2
Step 5: Click the INSERT tab on the Ribbon
Step 6: Click Line in the Sparklines group
Step 7: When the Create Sparklines dialog box opens:
Enter B2:M2 in the Data Range: box
Enter N2 in the Location Range: box and click OK
Step 8: Copy cell N2 to N3:N17
The heat map in Figure 3.24 helps the reader to easily identify trends and patterns.
The cells shaded grey in Figure 3.24 indicate declining same-store sales for the month, and cells shaded blue indicate increasing same-store sales for the month. Column N in Figure 3.24 also contains sparklines for the same-store sales data.
We can see that Austin has had positive increases throughout the year, while Pittsburgh has had consistently negative same-store sales results.
Same-store sales at Cincinnati started the year negative but then became increasingly positive after May.
In addition, we can differentiate between strong positive increases in Austin and less substantial positive increases in Chicago by means of color shadings.
A sales manager could use the heat map in Figure 3.24 to identify stores that may require intervention and stores that may be used as Models.
To avoid problems with interpreting differences in color, we can add the sparklines in Column N of Figure 3.24.
43
Charts
Additional charts for multiple variables
Stacked column chart: Allows the reader to compare the relative values of quantitative variables for the same category in a bar chart.
Clustered column (or bar) chart: An alternative chart to stacked column chart for comparing quantitative variables.
Scatter chart matrix: Useful chart for displaying multiple variables.
44
Vb
Bm
Mbm
Stacked column and bar charts should be used only when comparing a few quantitative variables and when there are large differences in the relative values of the quantitative variables within the category.
A scatter chart matrix allows the reader to easily see the relationships among multiple variables.
44
Figure 3.25 - Stacked Column Chart for Regional Sales Data for Kirkland Industries
45
Vb
Bm
Mbm
To create the stacked column chart shown in Figure 3.25, we use the following steps:
Step 1: Select cells A3:C14
Step 2: Click the INSERT tab on the Ribbon
Step 3: In the Charts group, click the Insert Column Chart button
Click the Stacked Column button under the 2-D Column
45
Figure 3.26 - Comparing Stacked, Clustered and Multiple Column Charts for the Regional Sales Data for Kirkland Industries
46
Vb
Bm
Mbm
Figure 3.26 shows that the multiple column charts require considerably more space than the stacked and clustered column charts.
However, when comparing many quantitative variables, using multiple charts can often be superior even if each chart must be made smaller.
46
Table 3.12 - Data for New York city Subboroughs
47
Vb
Bm
Mbm
Table 3.12 contains a partial listing of the data for each of New York City’s 55 subboroughs (a designation of a community within New York City) on monthly median rent, percentage of college graduates, poverty rate, and mean travel time to work.
We will use the scatter chart matrix to examine the relationship between these different categorical variables.
47
Figure 3.27 - Scatter Chart Matrix for New York City Rent Data
48
Vb
Bm
Mbm
Figure 3.27 displays a scatter chart matrix (scatter plot matrix) for data related to rentals in New York City.
Each scatter chart in the matrix is created in the same manner as for creating a single scatter chart.
Each column and row in the scatter chart matrix corresponds to one categorical variable.
For instance, row 1 and column 1 in Figure 3.27 correspond to the median monthly rent variable.
Row 2 and column 2 correspond to the percentage of college graduates variable.
Therefore, the scatter chart shown in the row 1, column 2 shows the relationship between median monthly rent (on the y-axis) and the percentage of college graduates (on the x-axis) in New York City subboroughs.
The scatter chart shown in row 2, column 3 shows the relationship between the percentage of college graduates (on the y-axis) and poverty rate (on the x-axis).
Figure 3.27 allows us to infer several interesting findings. Because the points in the scatter chart in row 1, column 2 generally get higher moving from left to right, this tells us that subboroughs with higher percentages of college graduates appear to have higher median monthly rents.
The scatter chart in row 1, column 3 indicates that subboroughs with higher poverty rates appear to have lower median monthly rents.
The data in row 2, column 3 show that subboroughs with higher poverty rates tend to have lower percentages of college graduates.
The scatter charts in column 4 show that the relationships between the mean travel time and the other categorical variables are not as clear as relationships in other columns.
Scatter chart matrix is created using the Excel Add-In XLMiner. Statistical software packages such as R, NCSS, and SAS can also be used to create these matrixes.
48
Charts
PivotChart: To summarize and analyze data with both a crosstabulation and charting, Excel pairs PivotCharts with PivotTables.
49
Vb
Bm
Mbm
Using the restaurant data introduced in Table 3.7 and Figure 3.7, we can create a PivotChart by taking the following steps:
Step 1: Click the INSERT tab on the Ribbon
Step 2: In the Charts group, choose PivotChart
Step 3: When the Create PivotChart dialog box appears:
Choose Select a Table or Range
Enter A1:D301 in the Table/Range: box
Choose New Worksheet as the location for the PivotTable Report
Click OK
Step 4: In the PivotChart Fields area, under Choose fields to add to report:
Drag the Quality Rating field to the AXIS (CATEGORIES) area
Drag the Meal Price ($) field to the LEGEND (SERIES) area
Drag the Wait Time (min) field to the VALUES area
Step 5: Click on Sum of Wait Time (min) in the Values area
Step 6: Click Value Field Settings… from the list of options that appear
Step 7: When the Value Field Settings dialog box appears:
Under Summarize value field by, choose Average
Click Number Format
In the Category: box, choose Number
Enter 1 for Decimal places:
Click OK
When the Value Field Settings dialog box reappears, click OK
Step 8: Right-click in cell B2 or any cell containing a meal price column label
Step 9: Select Group from the list of options that appears
Step 10: When the Grouping dialog box appears:
Enter 10 in the Starting at: box
Enter 49 in the Ending at: box
Enter 10 in the By: box
Click OK
Step 11: Right-click on “Excellent” in cell A5
Step 12: Select Move and click Move “Excellent” to End
49
Figure 3.28 - PivotTable and PivotChart for the Restaurant Data
50
Vb
Bm
Mbm
The completed PivotTable and PivotChart appear in Figure 3.28.
The PivotChart is a clustered column chart whose column heights correspond to the average wait times and are clustered into the categorical groupings of Good, Very Good, and Excellent.
The columns are shaded to differentiate the wait times at restaurants in the various meal price ranges.
Figure 3.28 shows that Excellent restaurants have longer wait times than Good and Very Good restaurants.
We also see that Excellent restaurants in the price range of $30–$39 have the longest wait times.
The PivotChart displays the same information as that of the PivotTable in Figure 3.13, but the column chart used here makes it easier to compare the restaurants based on quality rating and meal price.
50
Advanced Data Visualization
51
Vb
Bm
Mbm
Advanced Data Visualization
Parallel coordinates plot: Chart for examining data with more than two variables.
Includes a different vertical axis for each variable.
Each observation is represented by drawing a line on the parallel coordinates plot connecting each vertical axis.
The height of the line on each vertical axis represents the value taken by that observation for the variable corresponding to the vertical axis.
Treemap: Useful for visualizing hierarchical data along multiple dimensions.
52
Vb
Bm
Mbm
52
Figure 3.29 - Parallel Coordinates Plot for Baseball Data
53
Vb
Bm
Mbm
Figure 3.29 displays a parallel coordinates plot for a sample of Major League Baseball players.
The figure contains data for ten players who play first base (1B) and ten players who play second base (2B).
For each player, the leftmost vertical axis plots his total number of home runs (HR).
The center vertical axis plots the player’s total number of stolen bases (SB), and the rightmost vertical axis plots the player’s batting average.
Various colors differentiate 1B players from 2B players (1B players are in the lighter color, 2B player in the darker color).
We can make several interesting statements upon examining Figure 3.29.
The sample of 1B players tend to hit lots of home runs (HR) but have very few stolen bases (SB).
Conversely, the sample of 2B players steal more bases but generally have fewer HR, although some 2B players have many HR and many SB.
Finally, 1B players tend to have higher batting averages (AVG) than 2B players.
We may infer from Figure 3.29 that the traits of 1B players may be different from those of 2B players.
Players at 1B tend to be offensive stars who hit for power and average, whereas players at 2B are often faster and more agile in order to handle the defensive responsibilities of the position (traits that are not common in strong HR hitters).
Parallel coordinates plots, in which you can differentiate categorical variable values using color as in Figure 3.29, can be very helpful in identifying common traits across multiple dimensions.
53
Figure 3.30 - SmartMoney’s Map of the Market as An Example of A Treemap
54
Vb
Bm
Mbm
SmartMoney’s Map of the Market, shown in Figure 3.30, is a treemap for analyzing stock market performance.
In the Map of the Market, each rectangle represents a particular company (Apple, Inc. is highlighted in Figure 3.30).
The shading of the rectangle represents the overall performance of the company’s stock over the previous 52 weeks.
The Map of the Market is also divided into market sectors (Health Care, Financials, Oil & Gas, etc.).
The size of each company’s rectangle provides information on the company’s market capitalization size relative to the market sector and the entire market.
Figure 3.30 shows that Apple has a very large market capitalization relative to other firms in the Technology sector and that it has performed exceptionally well over the previous 52 weeks.
An investor can use the treemap in Figure 3.30 to quickly get an idea of the performance of individual companies relative to other companies in their market sector as well as the performance of entire market sectors relative to other sectors.
54
Advanced Data Visualization
Geographic Information systems charts
Geographic Information Systems (GIS): A system that merges maps and statistics to present data collected over different geographies.
Helps in interpreting data and observing patterns.
55
Vb
Bm
Mbm
55
Figure 3.31 - GIS Chart for Cincinnati Zoo Member Data
56
Vb
Bm
Mbm
Figure 3.31 displays a map of the Cincinnati, Ohio, metropolitan area showing the relative concentrations of Cincinnati Zoo members.
The more darkly shaded areas represent areas with a greater number of members.
The GIS chart in Figure 3.31 combines a heat map and a geographical map to help the reader analyze this data set.
A high concentration of zoo members in a band to the northeast of the zoo that includes the cities of Mason and Hamilton (circled). Similarly, the city of Florence towards the Southwest of the zoo.
These observations could prompt the zoo manager to identify the characteristics that the populations of Mason, Hamilton, and Florence share to learn what is leading them to be zoo members.
If these characteristics can be identified, the manager can then try to identify other nearby populations that share these characteristics as potential markets for increasing the number of zoo members.
56
Data Dashboards
57
Vb
Bm
Mbm
Data Dashboards
Data dashboard: Data visualization tool that illustrates multiple metrics and automatically updates these metrics as new data become available.
Key performance indicators (KPIs) in dashboards:
Automobile dashboard - Current speed, Fuel level, and oil pressure
Business dashboard - Financial position, inventory on hand, customer service metrics
58
Vb
Bm
Mbm
58
Data Dashboards
Principles of effective data dashboards
Should provide timely summary information on KPIs that are important to the user.
Should present all KPIs as a single screen that a user can quickly scan to understand the business’s current state of operations.
59
Vb
Bm
Mbm
59
Data Dashboards
Principles of effective data dashboards (contd.)
The KPIs displayed in the data dashboard should convey meaning to its user and be related to the decisions the user makes.
A data dashboard should call attention to unusual measures that may require attention.
Color should be used to call attention to specific values to differentiate categorical variables, but the use of color should be restrained.
60
Vb
Bm
Mbm
60
Figure 3.32 - Data Dashboard for the Grogan Oil Information Technology Call Center
61
Vb
Bm
Mbm
Grogan Oil Company which has offices located in three cities in Texas: Austin (its headquarters), Houston, and Dallas.
Grogan’s Information Technology (IT) call center, located in Austin, handles calls from employees regarding computer-related problems involving software, Internet, and e-mail issues.
The data dashboard shown in Figure 3.32, developed to monitor the performance of the call center, combines several displays to track the call center’s KPIs.
The line chart in the upper left-hand corner shows the call volume for each type of problem (Software, Internet, or E-mail) over time.
This chart shows that call volume is heavier during the first few hours of the shift, that calls concerning e-mail issues appear to decrease over time, and that the volume of calls regarding software issues are highest at midmorning.
The column chart in the upper right-hand corner of the dashboard shows the percentage of time that call center employees spent on each type of problem or were idle (not working on a call).
The clustered bar chart in the middle right of the dashboard shows the call volume by type of problem for each of Grogan’s offices.
This allows the IT manager to quickly identify if there is a particular type of problem by location.
The office in Austin seems to be reporting a relatively high number of issues with e-mail. If the source of the problem can be identified quickly, then the problem might be resolved quickly for many users all at once.
The bar chart shown in the middle left of the data dashboard displays the length of time for which each case has been unresolved, for each unresolved case that was received more than 15 minutes ago.
This chart enables Grogan to quickly monitor the key problem cases and decide whether additional resources may be needed to resolve them.
The frequency distribution chart in the bottom panel shows the length of time required for resolved cases during the current shift.
The Grogan Oil data dashboard presents data at the operational level, is updated in real time, and is used for operational decisions such as staffing levels.
Data dashboards may also be used at the tactical and strategic levels of management.
At the highest level, a more strategic dashboard would allow upper management to quickly assess the financial health of the company by monitoring more aggregate financial, service-level, and capacity utilization information.
61