Information Technology Management (Business Intelligence)

profilemalmershed
ASGN-8MovieMess.pdf

ASGN-8: Movie Mess Setting Our movie chain has a really bad ticket purchasing system - it allows

all sorts of problems with data entry, and lets people buy tickets for

combinations that don't exist yet (including movies that aren't even

showing in the years that we have a catalog for). While we had an

intern try to clean up the movie titles, they didn’t know how to combine their work with the transactions.

Also, there are a *lot* of other errors that need to be found and corrected.

Required Files  ASGN-8MovieMess.pbix – A Microsoft Power BI Desktop file with data pre-loaded in it

 ASGN-8 Movie Mess.pdf – these instructions

 ASGN-8 Answer File.docx – file to record your answers

Tools and Background You’ve learned about a few types of data problems from the

lecture slides. Now, let’s explore our messy data and see which

problems we can find (and maybe fix). While we could use

Excel, we’re going to try using Power BI’s query editor to

explore and shape the data. A Microsoft reference page on the query editor can be found at:

https://docs.microsoft.com/en-us/power-bi/desktop-query-overview

ABOUT THE QUERY EDITOR All of your ETL (Extract, Transform & Load) capabilities are on this screen. When you have cleaned your data here,

click ‘Close & Apply’ to load into your “Data View”. From there you can create links to other data in “Relationships

View”, and start building reports in “Report View”.

You will mostly use the Home, Transform & Add Column ribbons

Each change you make to clean up the data is stored here. If you need to, you can delete a step and redo it.

Your Queries will be listed here. DblClk a name to change it. You can also merge & append queries here.

Click on the Fieldname to select the column. RghtClk brings your up tools to use, and more tools are on the Transform ribbon

Click here to change a field’s Data Type. ‘ABC’ is Text. ‘1.2’ is Decimal Number.

Thank you to everyone for filling out

the form for ENGAGE-7:Let’s go to the

movies. Your responses were

combined with last semester’s to

create this data set representing the

movies you and your peers indicated

you would conceivably watch.

Getting Started 1. Download and open the file ASGN-8MovieMess.pbix

a. it is recommended you use Chrome or Firefox to download this file from Blackboard.

Microsoft Edge and Internet Explorer sometimes rename the .pbix file as a .zip file.

b. You can fix this by switching browsers, or by changing the extension from .zip to .pbix

2. Open the Query Editor

a. “To get to Query Editor, select Edit Queries

from the Home tab of Power BI Desktop.”

b. Source: https://docs.microsoft.com/en-

us/power-bi/desktop-query-overview

c. You should see a screen similar to that

below. If you are off-campus and receive an

error, you can fix it in one way, of two ways, both of which are explained in Appendix A:

i. Download the Boise State VPN (see Appendix A) – best if you’ve already started

working on the file

ii. Use the backup Excel file (see Appendix A) – best if no work is done yet

iii. If you are on EDUROAM and are asked for your credentials, see Appendix B

Ready to get started

Needs VPN and/or backup Excel file (see Appendix A at the end of this document)

Types of Data Problems (from the lecture): 1. Duplicate Data

2. Inconsistent values—due to spelling, spaces, punctuation, abbreviation

3. Blank cells – missing data that should be there

4. Numbers that aren’t numbers

5. Multiple Values per cell

6. Outliers & Incorrect Values

7. Inconsistent Related values

The intern ran the “Remove Duplicate Values” algorithm for us, so we can ignore “Duplicate Data” for the

rest of this assignment. The remaining 6 problems may be observed in one or more columns.

Two useful tools for finding data problems are (1) sorting and (2) observing the filter list.

Examine the “TicketTransactions” table, starting in the far right column 1. Question 1: In the column, “How would you like to pay for your ticket?”

a. Use the down-arrow to the right of the column name to see a list of the values you can

filter by. Which of the seven data problems do you observe? Record these in the answer

file template.

b. Take a screenshot of the filter list and insert it in the answer file.

2. Question 2: In the column, “Where will you purchase and print your ticket?”

a. How many rows contain “null” (a blank cell) for this column?

b. Which purchase location is least popular? (HINT: filter the values one by one and observe

the number of rows shown in the bottom left corner of the page)

3. Question 3: In the column “Will anyone else be joining your group (and purchasing their tic”

a. Which of the seven data problems do you observe?

b. Right-click the column name, select “Rename” and change the

column name to “OthersJoining?”. Take a screenshot of the top

of the column after making the change.

4. Question 4: In the column “How many tickets would you like to buy?”

a. How many transactions were for 5 or more tickets?

b. Do you think any of these amounts are an outlier? Why, or why not?

Thank goodness! The ticketing system didn’t appear to allow data errors to the individual columns

“Choose your ticket type,” “Choose your movie house,” and “Choose your movie format”… at least, I

think. Something bugs me about those. Let’s come back to them later.

5. Question 5: In the column “What showing time would you like to attend?” There are dates (many

12/30/1899) and in the column “What day would you like to watch the movie?” there are times

(set to 12:00:00 AM). These related values appear inconsistent. Let’s see

if we can fix this by changing the data types

a. Click on the icon to the left of the “What showing time would you

like to attend?” column title and change its data type to “Time”

b. Change the data type for the column “What day would you like to

watch the movie?” to “Date”. Take a screenshot of both column

names and the first couple rows and observe that the 12/30/1899

and 12:00:00 AM times are gone.

6. Question 6: In the column “What movie would you like to see?”

a. Which of the seven data problems do you observe?

b. How many different titles can you find for “Spider-Man:Homecoming ”? (HINT: even if

they look the same, each entry on the text filter list is there because of a unique character

arrangement)

Oh, I figured out what was wrong with “Choose your ticket type” column – I can’t multiply those dollar

amounts by number of tickets to calculate sales. That’s not the only problem, either.

7. Question 7: In the column “Choose your ticket type”

a. Which of the seven data problems do you observe?

Extra Credit Even though there are a lot of data inconsistencies in the movie titles, our intern was able to use a

PivotTable to identify all the unique values and define one consistent value for each title. Power BI lets

you combine two tables like this by merging two queries.

If you follow the steps below, you should be able to see how good of a job the intern did (defined as the

percentage of rows that match the crosswalk).

1. Select “Merge Queries as New”

2. Set the second table drop-down to “TitleCrosswalk”

3. Click the column name “What movie would you like to see?” in the table “TicketTransactions”

4. Click the column name “TrimmedTitle” in the table “TitleCrosswalk”, and observe that 507 of the

605 rows (83.8%) will match if the operation is completed.

5. This isn’t good enough, so click “Cancel”

We can get the percentage of matches higher by manipulating the text in the colum “What movie would

you like to see?”

Trim If we use the “Trim” transformation and try to merge again, the matched rows rise to 547 out of 605

(90.4%)

UPPERCASE We can similarly use the UPPERCASE transformation on both the

“What movie would you like to see?” column in the TicketTransactions

table and the “TrimmedTitle” column of the TitleCrosswalk table. This

last transformation will create a duplicate in the TitleCrosswalk table,

so you’ll need to run the Remove Rows  Remove Duplicates tool on

that table.

Final Merge After applying the Trim and UPPERCASE transformations as described

above, complete the final merge (change step 5 in the instructions

above to “OK” instead of “Cancel”), noting that all 605 records match.

When you do this, you’ll notice that the last column named

“TitleCrosswalk” looks strange. Click on the arrows to the right

of the column name and select just the “MovieTitle” column and

click “OK”.

Drag the MovieTitle column to the left until it’s next to the ALL

CAPS “What movie would you like to see?” column.

To earn extra credit, take a screenshot of the whole Power BI

window, making sure the Merge1 query, both title fields, and

some data values are visible.

Appendix A – Fix the DataSource.Error Did you receive an error similar to the one below? Not to worry, there are a couple options you can take

to work around this.

Option 1: Download the Boise State VPN (the same as faculty and staff)  Use the instructions and links under

“Download and Install the Boise State VPN

Software” at

https://oit.boisestate.edu/network/vpn-

services/

 Note: as a student in ITM310, a VPN account

has been created for you and you can use

the same software as faculty and staff, free

of charge

 Once you have run the VPN, logged in with

your credentials and connected to the

campus network, you can continue.

 Select one of the queries on the left-hand

list, then on the Home tab of the ribbon,

click “Refresh Preview” and the data should

populate

 Select the other query on the left-hand list

and click “Refresh Preview” again

 You should now be ready to start your

assignment

Unfortunately, an unforeseen technical

issue prevents students using their own

computer from connecting the Boise State-

hosted SQL Server database at this time

(even while on campus). While this is being

worked on by OIT, students who need to

use their home computer should follow the

instructions in “Appendix A Option 2:

Import Backup Data from Excel.”

Option 2: Import Backup data from Excel 1. Click the down arrow under “New Source” and choose Excel

2. Select the file ASGN-8-BackupData.xlsx (available on Blackboard)

3. Check the box next to both tables and click “Ok”

4. Navigate to the TitleCrosswalk (2) table, and on the Home tab of

the ribbon, in the “Transform” group, Click “Use First Row as

Headers”

5. [Optional] You can right-click the original “TicketTransactions” table

(the one with the “!” triangle next to it) and select “Delete” (repeat

for “TitleCrosswalk” )

6. You should now be ready to start the assignment

Appendix B – How to enter credentials in Power BI while on campus

This only works on the wireless internet connection EDUROAM (on any other wireless or wired network, a

VPN is required -- see Appendix A)

1. After opening up the Power BI file (ASGN-8MovieMess.pbix), click "Edit Queries"

2. When asked for credentials, enter your username and password as described and shown below.

a. Select "Windows" on the left

b. Click "Use alternate credentials" in the middle

c. in front of your username, type "boisestate\" (without the quotation marks)

d. Enter your password normally.

Unfortunately, an unforeseen technical

issue prevents students using their own

computer from connecting the Boise State-

hosted SQL Server database at this time

(even while on campus). While this is being

worked on by OIT, students who need to

use their home computer should follow the

instructions in “Appendix A Option 2:

Import Backup Data from Excel.”