Information Technology Management (Business Intelligence)
ASGN-8: Movie Mess Setting Our movie chain has a really bad ticket purchasing system - it allows
all sorts of problems with data entry, and lets people buy tickets for
combinations that don't exist yet (including movies that aren't even
showing in the years that we have a catalog for). While we had an
intern try to clean up the movie titles, they didn’t know how to combine their work with the transactions.
Also, there are a *lot* of other errors that need to be found and corrected.
Required Files ASGN-8MovieMess.pbix – A Microsoft Power BI Desktop file with data pre-loaded in it
ASGN-8 Movie Mess.pdf – these instructions
ASGN-8 Answer File.docx – file to record your answers
Tools and Background You’ve learned about a few types of data problems from the
lecture slides. Now, let’s explore our messy data and see which
problems we can find (and maybe fix). While we could use
Excel, we’re going to try using Power BI’s query editor to
explore and shape the data. A Microsoft reference page on the query editor can be found at:
https://docs.microsoft.com/en-us/power-bi/desktop-query-overview
ABOUT THE QUERY EDITOR All of your ETL (Extract, Transform & Load) capabilities are on this screen. When you have cleaned your data here,
click ‘Close & Apply’ to load into your “Data View”. From there you can create links to other data in “Relationships
View”, and start building reports in “Report View”.
You will mostly use the Home, Transform & Add Column ribbons
Each change you make to clean up the data is stored here. If you need to, you can delete a step and redo it.
Your Queries will be listed here. DblClk a name to change it. You can also merge & append queries here.
Click on the Fieldname to select the column. RghtClk brings your up tools to use, and more tools are on the Transform ribbon
Click here to change a field’s Data Type. ‘ABC’ is Text. ‘1.2’ is Decimal Number.
Thank you to everyone for filling out
the form for ENGAGE-7:Let’s go to the
movies. Your responses were
combined with last semester’s to
create this data set representing the
movies you and your peers indicated
you would conceivably watch.
Getting Started 1. Download and open the file ASGN-8MovieMess.pbix
a. it is recommended you use Chrome or Firefox to download this file from Blackboard.
Microsoft Edge and Internet Explorer sometimes rename the .pbix file as a .zip file.
b. You can fix this by switching browsers, or by changing the extension from .zip to .pbix
2. Open the Query Editor
a. “To get to Query Editor, select Edit Queries
from the Home tab of Power BI Desktop.”
b. Source: https://docs.microsoft.com/en-
us/power-bi/desktop-query-overview
c. You should see a screen similar to that
below. If you are off-campus and receive an
error, you can fix it in one way, of two ways, both of which are explained in Appendix A:
i. Download the Boise State VPN (see Appendix A) – best if you’ve already started
working on the file
ii. Use the backup Excel file (see Appendix A) – best if no work is done yet
iii. If you are on EDUROAM and are asked for your credentials, see Appendix B
Ready to get started
Needs VPN and/or backup Excel file (see Appendix A at the end of this document)
Types of Data Problems (from the lecture): 1. Duplicate Data
2. Inconsistent values—due to spelling, spaces, punctuation, abbreviation
3. Blank cells – missing data that should be there
4. Numbers that aren’t numbers
5. Multiple Values per cell
6. Outliers & Incorrect Values
7. Inconsistent Related values
The intern ran the “Remove Duplicate Values” algorithm for us, so we can ignore “Duplicate Data” for the
rest of this assignment. The remaining 6 problems may be observed in one or more columns.
Two useful tools for finding data problems are (1) sorting and (2) observing the filter list.
Examine the “TicketTransactions” table, starting in the far right column 1. Question 1: In the column, “How would you like to pay for your ticket?”
a. Use the down-arrow to the right of the column name to see a list of the values you can
filter by. Which of the seven data problems do you observe? Record these in the answer
file template.
b. Take a screenshot of the filter list and insert it in the answer file.
2. Question 2: In the column, “Where will you purchase and print your ticket?”
a. How many rows contain “null” (a blank cell) for this column?
b. Which purchase location is least popular? (HINT: filter the values one by one and observe
the number of rows shown in the bottom left corner of the page)
3. Question 3: In the column “Will anyone else be joining your group (and purchasing their tic”
a. Which of the seven data problems do you observe?
b. Right-click the column name, select “Rename” and change the
column name to “OthersJoining?”. Take a screenshot of the top
of the column after making the change.
4. Question 4: In the column “How many tickets would you like to buy?”
a. How many transactions were for 5 or more tickets?
b. Do you think any of these amounts are an outlier? Why, or why not?
Thank goodness! The ticketing system didn’t appear to allow data errors to the individual columns
“Choose your ticket type,” “Choose your movie house,” and “Choose your movie format”… at least, I
think. Something bugs me about those. Let’s come back to them later.
5. Question 5: In the column “What showing time would you like to attend?” There are dates (many
12/30/1899) and in the column “What day would you like to watch the movie?” there are times
(set to 12:00:00 AM). These related values appear inconsistent. Let’s see
if we can fix this by changing the data types
a. Click on the icon to the left of the “What showing time would you
like to attend?” column title and change its data type to “Time”
b. Change the data type for the column “What day would you like to
watch the movie?” to “Date”. Take a screenshot of both column
names and the first couple rows and observe that the 12/30/1899
and 12:00:00 AM times are gone.
6. Question 6: In the column “What movie would you like to see?”
a. Which of the seven data problems do you observe?
b. How many different titles can you find for “Spider-Man:Homecoming ”? (HINT: even if
they look the same, each entry on the text filter list is there because of a unique character
arrangement)
Oh, I figured out what was wrong with “Choose your ticket type” column – I can’t multiply those dollar
amounts by number of tickets to calculate sales. That’s not the only problem, either.
7. Question 7: In the column “Choose your ticket type”
a. Which of the seven data problems do you observe?
Extra Credit Even though there are a lot of data inconsistencies in the movie titles, our intern was able to use a
PivotTable to identify all the unique values and define one consistent value for each title. Power BI lets
you combine two tables like this by merging two queries.
If you follow the steps below, you should be able to see how good of a job the intern did (defined as the
percentage of rows that match the crosswalk).
1. Select “Merge Queries as New”
2. Set the second table drop-down to “TitleCrosswalk”
3. Click the column name “What movie would you like to see?” in the table “TicketTransactions”
4. Click the column name “TrimmedTitle” in the table “TitleCrosswalk”, and observe that 507 of the
605 rows (83.8%) will match if the operation is completed.
5. This isn’t good enough, so click “Cancel”
We can get the percentage of matches higher by manipulating the text in the colum “What movie would
you like to see?”
Trim If we use the “Trim” transformation and try to merge again, the matched rows rise to 547 out of 605
(90.4%)
UPPERCASE We can similarly use the UPPERCASE transformation on both the
“What movie would you like to see?” column in the TicketTransactions
table and the “TrimmedTitle” column of the TitleCrosswalk table. This
last transformation will create a duplicate in the TitleCrosswalk table,
so you’ll need to run the Remove Rows Remove Duplicates tool on
that table.
Final Merge After applying the Trim and UPPERCASE transformations as described
above, complete the final merge (change step 5 in the instructions
above to “OK” instead of “Cancel”), noting that all 605 records match.
When you do this, you’ll notice that the last column named
“TitleCrosswalk” looks strange. Click on the arrows to the right
of the column name and select just the “MovieTitle” column and
click “OK”.
Drag the MovieTitle column to the left until it’s next to the ALL
CAPS “What movie would you like to see?” column.
To earn extra credit, take a screenshot of the whole Power BI
window, making sure the Merge1 query, both title fields, and
some data values are visible.
Appendix A – Fix the DataSource.Error Did you receive an error similar to the one below? Not to worry, there are a couple options you can take
to work around this.
Option 1: Download the Boise State VPN (the same as faculty and staff) Use the instructions and links under
“Download and Install the Boise State VPN
Software” at
https://oit.boisestate.edu/network/vpn-
services/
Note: as a student in ITM310, a VPN account
has been created for you and you can use
the same software as faculty and staff, free
of charge
Once you have run the VPN, logged in with
your credentials and connected to the
campus network, you can continue.
Select one of the queries on the left-hand
list, then on the Home tab of the ribbon,
click “Refresh Preview” and the data should
populate
Select the other query on the left-hand list
and click “Refresh Preview” again
You should now be ready to start your
assignment
Unfortunately, an unforeseen technical
issue prevents students using their own
computer from connecting the Boise State-
hosted SQL Server database at this time
(even while on campus). While this is being
worked on by OIT, students who need to
use their home computer should follow the
instructions in “Appendix A Option 2:
Import Backup Data from Excel.”
Option 2: Import Backup data from Excel 1. Click the down arrow under “New Source” and choose Excel
2. Select the file ASGN-8-BackupData.xlsx (available on Blackboard)
3. Check the box next to both tables and click “Ok”
4. Navigate to the TitleCrosswalk (2) table, and on the Home tab of
the ribbon, in the “Transform” group, Click “Use First Row as
Headers”
5. [Optional] You can right-click the original “TicketTransactions” table
(the one with the “!” triangle next to it) and select “Delete” (repeat
for “TitleCrosswalk” )
6. You should now be ready to start the assignment
Appendix B – How to enter credentials in Power BI while on campus
This only works on the wireless internet connection EDUROAM (on any other wireless or wired network, a
VPN is required -- see Appendix A)
1. After opening up the Power BI file (ASGN-8MovieMess.pbix), click "Edit Queries"
2. When asked for credentials, enter your username and password as described and shown below.
a. Select "Windows" on the left
b. Click "Use alternate credentials" in the middle
c. in front of your username, type "boisestate\" (without the quotation marks)
d. Enter your password normally.
Unfortunately, an unforeseen technical
issue prevents students using their own
computer from connecting the Boise State-
hosted SQL Server database at this time
(even while on campus). While this is being
worked on by OIT, students who need to
use their home computer should follow the
instructions in “Appendix A Option 2:
Import Backup Data from Excel.”