accounting

profileheyman
ATeachingCaseApplyingBenford_studentversion_032320232.docx

Applying Benford’s Law to Detect Credit Card Fraud Using Microsoft Excel

CASE INTRODUCTION

According to a Nilson Report published in December 2021, global credit and debit card frauds resulted in losses amounting to $28.58 billion in 2020[footnoteRef:1]. The collective losses to card issuers and merchants are expected to grow to $408.50 billion over the next ten years. Given the enormous amount of card transactions and potential loss to fraudulent activities, it is crucial to take advantage of emerging information technology and data analytics techniques to identify the transactions with higher fraud risks. [1: https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=&ved=2ahUKEwj_ncPYh6f7AhXNF1kFHdbgDuwQFnoECBYQAQ&url=https%3A%2F%2Fnilsonreport.com%2Fupload%2Fcontent_promo%2FNilsonReport_Issue1209.pdf&usg=AOvVaw2drYIt3gN7wNwVE4J-N7LX]

You were recently hired by a commercial bank suffering from massive credit card losses. You learned through discussions with your coworkers that the fraud detection team had applied neural networks, decision trees, and other techniques to identify higher-risk transactions. However, your coworkers seemed to know little about Benford’s Law; thus, your supervisor wanted you to give a 10-minute crash course to demonstrate its power, applicability, and process.

While in school, you learned about Benford’s Law and how to use computer-aided auditing tools (CAATs) to efficiently and effectively identify suspicious transactions for further analysis. Without access to those specialized tools, you decided to use Microsoft Excel to show the process of applying Benford’s Law to fraud detection. It is important to note that Benford’s Law is more applicable to large and naturally occurring sets of numbers. While it is somewhat easy to explain why large datasets are needed for Benford’s Law to apply, you believe it is necessary and beneficial to use recent, eye-catching datasets to illustrate and exemplify “naturally occurring.” Thus, you went online to search for some publicly available datasets and found a dataset from the U.S. Center for Disease Control and Prevention (CDC) website . For planning your 10-minute crash course, you decided to include two parts. The first part, requirement 1-9, using a dataset from CDC,you will go through the data preparation process and the data analytics process to perform Benford’s Law. If you need helps in performing the required tasks, please refer to Appendix A for video tutorials. In the second part, you will use the credit card transaction dataset provided by your colleague and apply what you learn from the first part to credit card fraud detection.

CASE REQUIREMENTS

Objective 1 – Use Excel to extract and transform needed digits from datasets

You downloaded an Excel file from the U.S. Center for Disease Control and Prevention (CDC) website showing the number of confirmed COVID-19 cases per county on March 31, 2022 (see Excel worksheet titled “US Population by County”). Specifically, the dataset provides the name of the state and county and the following:

· county_population: County population (2019 Census estimate)

· health_service_area_number and health_service_area: A health service area is defined as one or more counties that are relatively self-contained with respect to the provision of routine hospital care. Each area is assigned a numeric code.

· covid_cases_per_100k: New COVID-19 cases per 100,000 population (7-day total)

Requirement 1 Microsoft Excel provides various text functions to extract characters from text strings. Use one of the following functions to show the first digit of the county population in Column G.

Extracting Texts from a Cell

=Left(cell, k)

Returns the first k characters in a text string

=Right(cell, k)

Returns the last k characters in a text string.

=Mid(cell, k, m)

Begins at character k of a text string and returns the next m characters.

For example, if the text YYYYMMDD locates in Cell A1 (e.g., 20150809), the following functions extract the necessary data and convert the data into date format: =Date(Left(A1,4),Mid(A1,5,2),Right(A1,2))

· Note that the Date function shows Year, Month, and Day

· Left(A1,4) extracts the first four letters from the left of Cell A1 as the Year

· Mid(A1,5,2) extracts two letters from the fifth character position of Cell A1 as the Month

· Right(A1,2) extracts two letters from the right of Cell A1 as the Day

Requirement 2 By default, Excel’s text functions store retrieved data as texts, even if they are numeric. To facilitate further data analyses and avoid potential issues, numeric data shall be stored as numbers ideally. For example, if you store numbers 1-10 as texts in a column and sort them, Excel will return 1, 10, 2, 3, …, and 9, which may result in unnecessary confusion and difficulties. To convert texts into numbers, you can create a new column or not. Use one of the following methods (or try all three methods as a practice) to convert texts in Column G into numbers without creating new columns.

Converting Texts into Numbers – Without Creating New Columns

1. Use the =Value() function Enclose the text function with =Value() to force the returned strings as values. For example, =Value(Right(A1,2)) extracts two letters from the right of Cell A1 and then converts the texts into values.

2. Forced calculations Texts are not calculatable. However, when Excel is forced to perform calculations on text, Excel automatically shows the results as numbers. For example, =Mid(A1,5,2)+0, =Mid(A1,5,2)-0, =Mid(A1,5,2)*1, =Mid(A1,5,2)/1, or =--Mid(A1,5,2) will first extract two letters from the fifth character position of Cell A1 and then convert the results as numbers, because of +0, -1, *1, /1, or double negative (--).

3. Use Paste Special and Multiply

· Select a blank cell beyond the scope of the dataset, type the number 1 into it, and then press Enter.

· Press CTRL + C to copy the cell.

· Select the cells that have numbers stored as text.

· On the Home tab, click Paste > Paste Special.

· Click Multiply, and then click OK. Excel multiplies each cell by 1, and in doing so, converts the text to numbers.

Objective 2 – Create a PivotTable to tabulate the first digit of county populations

Requirement 3 Create a PivotTable to tabulate the frequencies of the first digit of county populations. Note that Benford’s First-Digit Law applies to the leading digits from 1 to 9, not including zero. Thus, if your raw data contain numbers starting with zero, you must filter out the occurrence and frequencies of those records.

Requirement 4 Append a new column to your PivotTable and then enter =log(1+(1/cell)) to show the expected frequencies based on Benford’s First-Digit Law. Your results for Requirements 3 and 4 should be similar to the following table.

Objective 3 – Create a combo chart to contrast the actual vs. expected frequencies

Microsoft Excel treats all aggregated results within a PivotTable as a whole and refreshes them when users change its design or modify the underlying dataset. If you plan to use a PivotTable and other appended columns to create data visualization, it might be easier to copy the entire table and paste it somewhere else as values.

Numerical proportions are frequently visualized via various formats. Each format possesses particular strengths and weaknesses and thus is ideal for different purposes. You have created a PivotTable to display the actual and expected frequencies of each leading digit. Your goal is to show if the actual frequencies of the leading digits conform with their expected frequencies (based on Benford’s Law).

Requirement 5 One of the most popular and traditional data visualizations is the pie chart, which divides a circle into slices to illustrate numerical proportion. In a new worksheet titled “Pies,” create two pie charts showing each leading digit's actual and expected frequencies. Can you easily identify the leading digits where their actual frequency deviates from their expected frequency? Why or why not?

Requirement 6 The 100% stacked column chart shows the relative percentage of multiple data series in stacked columns, where the total of stacked columns always equals 100%. In a new worksheet titled “Stacked,” create a 100% stacked column chart with two bars showing each leading digit's actual and expected frequencies. Compared with two pie charts, can you quickly identify the leading digits where their actual frequency deviates from their expected frequency? Why or why not?

Requirement 7 The combo chart combines two or more charts to contrast data series. In a new worksheet titled “combo,” create a combo chart where actual frequencies are shown as bars and expected frequencies as a line. Your combo chart should look similar to the following graph. Can you quickly identify the leading digits where their actual frequency deviates from their expected frequency? Based on the visualizations you create for Requirements 5, 6, and 7, which visualization do you think is the best to visualize and identify anomalies? When should each chart be used for data visualization?

Chart, bar chart  Description automatically generated

Objective 4 – Understand “naturally occurring” numbers by using various examples

Requirement 8 After creating a combo chart above, you believe you have found an example of a “naturally occurring” dataset that meets Benford’s First-Digit Law. Based on how the I.D. numbers of U.S. health service areas were created, you believe their first digits would not follow Benford’s Law. In a new worksheet titled “I.D.,” create a PivotTable and a combo chart to verify your answers.

Requirement 9 The World Health Organization collects and provides COVID-19 data reported by 185 countries. You recently read an article about an empirical study analyzing those data. The research finds that COVID-19 data reported by countries with more functional democracies, higher incomes, and better healthcare systems follow Benford’s Law. On the other hand, data from autocratic regimes and less developed countries should be treated with more caution. You then see the column showing the number of confirmed COVID-19 cases in each U.S. county. Do you expect their first digits to follow Benford’s Law? Create a combo chart to verify your answers.

Objective 5 – Learn how Benford’s Law contributes to detecting suspicious transactions

To help you prepare for the crash course, your coworker provided a large, validated dataset of prior credit card transactions. Specifically, the dataset has 284,807 records, where –

· Time: The seconds elapsed between each transaction and the first transaction in the dataset

· Amount: The amount of each credit transaction

· Class: The result of prior validation, where 1= fraudulent and 0 = not.

The validated dataset has 284,807 records with 492 confirmed fraudulent credit card transactions. If an auditor uses simple random sampling, the probability of the audit sample containing fraudulent transactions is 492/284,807 or 0.17% - more like finding a needle in a haystack and prone to being overlooked. So, after using the recent and eye-catching COVID-19 data to draw your coworkers’ attention, your next task is to demonstrate how Benford’s Law can be used to identify suspicious transactions efficiently.

Requirement 10 In a new worksheet titled “CC,” create a data visualization using the actual credit card dataset and Benford’s First Digit Law. Based on your results, which leading digits should be investigated further? To demonstrate the benefits of Benford’s Law, you then compile a table to show the statistics for fraudulent vs. non-fraudulent. Examine the table carefully and itemize the benefits of applying Benford’s Law to fraud detection.

Requirement 11 Benford’s First-Digit Law only applies to leading numbers of 1-9. Thus, 16,804 records with zero as their leading digit were excluded from the sampling plan above. To alleviate your coworkers’ potential concerns, in a new worksheet titled “zero,” you create a PivotTable to show each leading digit's count, sum, and average transaction amounts of each leading digit. Consider the following questions when you respond to your coworkers’ potential concerns.

· What is the average amount for transactions starting with zero?

· What are the leading digits showing abnormally low average transaction amounts?

· Should you worry about bypassing transactions with zero as their leading digits?

Deliverable: Based on your analyses and findings above, prepare a one-page memo summarizing the strength, applicability, and process of using Benford’s Law. Submit your Excel and Word files to your instructor before the deadline.

Appendix A

Detect Deviation from the Benford’s Law Video List

The videos are available at

https://youtube.com/playlist?list=PLXTR-x25DxtOy8MLRvOujE9ZCz2-pofVB

First Objective (2 videos)

Extracting characters from text strings – 1 min 23 secs - This video demonstrates how to extract certain length of characters, with a specified starting position, from text strings.

Converting text characters to numbers – 1 min 42 secs - This video demonstrates how to convert extracted texts to numbers without creating any new column.

Second Objective (1 video)

Creating a PivotTable to show the frequency of a series of unique categories – 1 min 48 secs - This video demonstrates how to create a PivotTable to show the frequencies of a series of unique categories of a dataset.

Third Objective (1 video)

Creating a pie/100% stacked column/combo chart – 1 min 15 secs - This video demonstrates how to create a pie chart, 100% stacked column chart, and combo chart.

Fifth Objective (1 video)

Creating a PivotTable to show the average, minimum, and maximum amounts of a common attribute – 2 mins 10 secs - This video demonstrates how to create a PivotTable to tabulate the count, sum, and average amounts of a common attribute of a series of unique categories of a dataset.

image1.emf

image2.png