Case assignment
1
Benford’s Law: An Overview
Updated Fall 2021
2
The Basics
• Benford’s Law predicts the expected digit frequencies in a list of numbers.
• The first digit has been emphasized, but the law can be used for all digits.
Example: in 12,367 the first digit is 1.
Benford’s Law can answer the question: What’s the likelihood of 1 being the 1st digit in a set of numbers?
3
The Basics
• History
  • Frank Benford, a physicist at GE, noted that the pages of logarithm table books* covering numbers with initial digits 1 and 2 were more worn and dirty than the pages for 7, 8, and 9.
  • Benford hypothesized that the worn and dirty pages were that way because there were more numbers with low first digits.
  • Benford formulated the expected frequencies of the various digits in lists of numbers using a mathematical assumption based on geometric sequences.
*Log tables were used to do large multiplication problems, since multiplication is converted to addition after taking the log of the factors.
4
The Basics
• His findings: Lower digits have a higher probability in the 1st, 2nd, and 3rd positions, but the differences are much smaller and the probabilities closer together from about the 4th position onwards.
• POINT: Digits are not necessarily distributed uniformly over 0 to 9 (or 1 to 9 for the first digit).
5
Benford’s Law Formulae: First digit
• $P(D_1 = d_1) = \log_{10}\left(1 + \frac{1}{d_1}\right)$, for $d_1 = 1, 2, \ldots, 9$
We’ll confine ourselves to the 1st digit in the case.
6
Benford’s Law Formulae Second digit
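$$P(D_2 = d_2) = \sum_{d_1=1}^{9} \log_{10}\left(1 + \frac{1}{d_1 d_2}\right)$$
where $d_1 d_2$ denotes the two-digit number $10 d_1 + d_2$, and $d_2 = 0, 1, \ldots, 9$.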
Note that you sum over the 1st digit. If you want P(D2=1), you need to account for: 11, 21, 31, 41, 51, 61, 71, 81, 91.
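The second-digit sum can be checked with a few lines of Python. This is only a sketch: the function name `second_digit_prob` is made up for illustration, and `10 * d1 + d2` is how the two-digit number d1d2 is written in code.

```python
import math


def second_digit_prob(d2: int) -> float:
    """P(D2 = d2): sum over first digits d1 = 1..9 of log10(1 + 1/(d1d2))."""
    # 10 * d1 + d2 is the two-digit number d1d2 (11, 21, ..., 91 when d2 = 1).
    return sum(math.log10(1 + 1 / (10 * d1 + d2)) for d1 in range(1, 10))


# Print the ten second-digit probabilities; they should sum to 1.
for d2 in range(10):
    print(d2, round(second_digit_prob(d2), 5))
print("total:", round(sum(second_digit_prob(d) for d in range(10)), 10))
```

Note how much flatter these are than the first-digit proportions: the second digit runs only from roughly 12% (for 0) down to 8.5% (for 9).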
7
Benford’s Law Formulae: First 2 digits
• $P(D_1 D_2 = d_1 d_2) = \log_{10}\left(1 + \frac{1}{d_1 d_2}\right)$, where $d_1 d_2$ is the two-digit number from 10 to 99.
Benford 1st Digit Proportions
First Digit   Benford Proportion
1             0.30103000   [log10(2/1)]
2             0.17609126   [log10(3/2)]
3             0.12493874   [log10(4/3)]
4             0.09691001   [log10(5/4)]
5             0.07918125   [log10(6/5)]
6             0.06694679   [log10(7/6)]
7             0.05799195   [log10(8/7)]
8             0.05115252   [log10(9/8)]
9             0.04575749   [log10(10/9)]
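The first-digit proportions follow directly from the formula log10(1 + 1/d1). A minimal Python sketch (the names `first_digit_prob` and `benford` are illustrative only) that recomputes them:

```python
import math


def first_digit_prob(d1: int) -> float:
    """Benford first-digit proportion: log10(1 + 1/d1)."""
    return math.log10(1 + 1 / d1)


# One entry per first digit 1..9.
benford = {d: first_digit_prob(d) for d in range(1, 10)}

for d, p in benford.items():
    print(f"{d}  {p:.8f}")
print("total:", round(sum(benford.values()), 8))
```

The nine proportions sum to 1 because the logs telescope: log10(2/1) + log10(3/2) + ... + log10(10/9) = log10(10).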
8
Naïve Number Generators
• Naïve persons tend to make up numbers that do not conform to Benford’s Law.
  • This includes naïve fraudsters --- this is the power of Benford’s Law.
  • They tend to believe that 1st digits are distributed approximately uniformly (equal numbers of 1s, 2s, 3s, ..., 9s).
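To see how far the naïve uniform belief (1/9 for every first digit) sits from Benford’s Law, the two can be put side by side. This is just an illustrative sketch; the variable names are not from the slides.

```python
import math

uniform = 1 / 9  # the naive belief: every first digit equally likely
benford = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

# Show the gap between the naive guess and the Benford proportion per digit.
for d, p in benford.items():
    print(f"{d}  Benford {p:.5f}  uniform {uniform:.5f}  gap {p - uniform:+.5f}")

# Digits that occur MORE often under Benford than the uniform guess predicts.
above_uniform = [d for d in range(1, 10) if benford[d] > uniform]
print("more common than 1/9:", above_uniform)
```

Only the digits 1, 2, and 3 lead more often than 1/9 of the time; a fabricated list with "equal" first digits therefore has too few low digits and too many high ones.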
9
Real World Look at Naïve Number Generators
10
Naïve Number Generators
• I’ve had Fraud Exam students create fraudulent sales numbers:
Pretend that you are a budding fraudster --- just for a moment, not permanently. You need to “boost” sales for the year to meet analysts’ expectations. Your real sales are $13,000,000 and you need to bump them up by about 1.5% ($200,000) to get to the expectation of about $13,200,000.
11
Student-Generated Fraud Numbers
Naïve Number Generators
You have 2,031 legitimate sales transactions for the year that range from $1,000 to $10,000. If you were to create 30 fictitious sales amounts to boost your revenue (by about $200,000), what amounts would you create? List the 30 fictitious sales amounts you would record in the books and records to perpetrate the fraud.
12
Student-Generated Fraud Numbers
13
[Figure: Student First Digit vs. Benford’s Law. Percent (0 to 0.35) by First Digit (1 to 9); series: Class %, Benford; annotation: naïve uniform distribution (1/9th).]
Benford’s Law Applied to Real World Phenomena
14
15
[Figure: Illinois Cities 1st Digit Proportions versus Benford 1st Digit Proportions. Proportion (0 to 0.35) by First Digit (1 to 9); series: Illinois Cities, Benford.]
Illinois Cities Populations (Source: U.S. Census Bureau, Census 2010) – Actual Application of Benford’s Law
χ² = 1.989 (equal distributions); M.A.D. = 0.000322 (close conformity per Nigrini)
16
Intuitive Explanation of Benford’s Law
Use a census data example:
• Consider a city with a population of 10,000.
• Assume the population is growing at 10% per year.
• At 10,000 the first digit 1 will persist until the population doubles to 20,000 (7.27254 years) and the 1st digit goes to 2.
  • 19,999 → 20,000
• At 20,000 the first digit 2 will persist until the population goes up 50% to 30,000 (4.254164 years) and the 1st digit goes to 3.
  • 29,999 → 30,000
17
Intuitive Explanation of Benford’s Law
Use a census data example:
• At 30,000 the first digit 3 will persist until the population increases by 1/3 to 40,000 (3.018377 years) and the 1st digit goes to 4.
• At 40,000 the first digit 4 will persist until the population increases by 1/4 to 50,000 (2.34124 years) and the 1st digit goes to 5.
• And so on --- see the next slide.
18
Intuitive Explanation of Benford’s Law
Use a census data example:
• 50,000 to 60,000: 1.91293 years.
• 60,000 to 70,000: 1.61736 years.
• 70,000 to 80,000: 1.40102 years.
• 80,000 to 90,000: 1.23579 years.
• 90,000 to 100,000: 1.10545 years.
19
From Steps to Benford Proportion
From-To (1,000s)   Steps (Yrs.)
10-20              7.27254
20-30              4.25416
30-40              3.01838
40-50              2.34124
50-60              1.91293
60-70              1.61736
70-80              1.40102
80-90              1.23579
90-100             1.10545
Totals             24.15886
Going from 1 as the 1st digit back to 1 as the 1st digit.
Math Fun. Check the 24.15886 years to go from 10,000 to 100,000 at a 10% annual growth rate (1.1 = 1 + growth rate):
$$
\begin{aligned}
10{,}000 \times 1.1^{y} &= 100{,}000 \\
1.1^{y} &= 10 \\
y \log_{10}(1.1) &= \log_{10}(10) \\
y &= \log_{10}(10) / \log_{10}(1.1) = 1 / \log_{10}(1.1) \\
y &= 1 / 0.041392685 = 24.15886
\end{aligned}
$$
Bingo!
20
From Steps to Benford Proportion
From-To (1,000s)   Steps (Yrs.)   Proportion
10-20              7.27254 ❶      0.30103   (❶ 7.27254 ÷ ❷ 24.15886)
20-30              4.25416        0.17609
30-40              3.01838        0.12494
40-50              2.34124        0.09691
50-60              1.91293        0.07918
60-70              1.61736        0.06695
70-80              1.40102        0.05799
80-90              1.23579        0.05115
90-100             1.10545        0.04576
Totals             24.15886 ❷     1.00000
Look familiar? These are the Benford proportions.
Going from 1 as the 1st digit back to 1 as the 1st digit.
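The steps-to-proportion table can be reproduced in a few lines. This is a sketch under the slide’s assumptions (start at 10,000, 10% annual growth); the helper name `years_to_grow` is invented for illustration.

```python
import math

GROWTH = 0.10  # annual growth rate used in the census example


def years_to_grow(start: float, end: float, rate: float) -> float:
    """Years for `start` to reach `end` at a constant annual growth `rate`."""
    return math.log10(end / start) / math.log10(1 + rate)


# Years spent with each first digit d while growing from d*10,000 to (d+1)*10,000.
steps = {d: years_to_grow(d * 10_000, (d + 1) * 10_000, GROWTH) for d in range(1, 10)}
total = sum(steps.values())

for d, yrs in steps.items():
    print(f"{d * 10}-{(d + 1) * 10}  {yrs:.5f}  {yrs / total:.5f}")
print(f"Totals  {total:.5f}")
```

Changing `GROWTH` changes the years in each step but not the proportions, since every step is divided by the same log10(1 + rate).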
21
Steps - Years
• How’d I get the number of years that it takes for 10,000 to grow to 20,000 at a 10% annual growth rate?
  • What do we know? $10{,}000 \times 1.1^{y} = 20{,}000$, where $y$ = number of yrs. and 1.1 = 1 + growth rate.
  • Solve for $y$: $1.1^{y} = 20{,}000/10{,}000 = 2$
  • $\log_{10}(X^{y}) = y \log_{10} X$, so $y \log_{10}(1.1) = \log_{10}(2)$
  • $y = \log_{10}(2)/\log_{10}(1.1) = 0.30103/0.041393 = 7.27254$ years.
22
Intuitive Explanation of Benford’s Law-Example (10% Population Growth – 101 periods)
[Figure: Intuitive Example, n=101. Proportion (0 to 0.35) by First Digit (1 to 9); series: Population, Benford.]
If we grew more periods, the proportions would converge to Benford proportions.
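The 101-period example can be simulated directly. The sketch below assumes the same setup as the census example (10,000 growing 10% per period, periods 0 through 100); `first_digit` is an illustrative helper that reads the leading digit from scientific notation.

```python
from collections import Counter


def first_digit(x: float) -> int:
    """Leading digit of a positive number, read from its scientific notation."""
    return int(f"{x:.15e}"[0])


# Population at periods 0..100 (101 values), growing 10% per period.
populations = [10_000 * 1.1 ** k for k in range(101)]
counts = Counter(first_digit(p) for p in populations)

for d in range(1, 10):
    print(d, counts[d], round(counts[d] / len(populations), 5))
```

With only 101 values each digit’s proportion moves in steps of about 0.01, which is why the fit is rough; extending `range(101)` lets you watch the proportions settle toward the Benford values.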
23
Intuitive Explanation of Benford’s Law
Use a census data example:
• Larger numbers progress on to other numbers quicker than smaller numbers.
• So, lower first digits progress on to higher digits at a slower rate.
• The progression is a geometric sequence.
  • Every successive value is a fixed percentage increase over the value before it.
• Nature seems to favor geometric sequences.
• Data sets that fit Benford’s Law are called Benford Sets.
Look at the population example: It took the 1 7.27 yrs. to progress to 2, but it took the 9 only 1.1 yrs. to progress to 1 --- 6.6X faster.
That’s why there are more numbers with low first digits!
24
Proof of Stuff-1st Digit
• Why are Benford’s proportions simply the Log of the ratio of digit change (New/Old)? {From 10,000 to 20,000: Log10(20,000/10,000) = Log10(2), and Benford’s formula is Log10(1+1/1) = Log10(2)}
• Proof next.
Not that important; just for the mathematically curious.
25
Proof-1st Digit
Digit D   Change   Log10(ratio)    Benford’s Formula
1         1→2      Log10(2/1)      Log10(1+1/1) = Log10(2)
2         2→3      Log10(3/2)      Log10(1+1/2) = Log10(1.5)
3         3→4      Log10(4/3)      Log10(1+1/3) = Log10(1.3333)
4         4→5      Log10(5/4)      Log10(1+1/4) = Log10(1.25)
...
9         9→10     Log10(10/9)     Log10(1+1/9) = Log10(1.1111)
Not that important; just for the mathematically curious.
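For the curious, here is one way to write out why the growth rate drops out of the census example. The symbol g for the growth rate is introduced here only for the derivation; the slides use 10%.

```latex
% Years spent with first digit d, at growth rate g:
%   y_d = \log_{10}\!\big(\tfrac{d+1}{d}\big) / \log_{10}(1+g)
% Years for a full cycle (1 as first digit back to 1 as first digit):
%   y_{total} = \log_{10}(10) / \log_{10}(1+g) = 1 / \log_{10}(1+g)
\[
\frac{y_d}{y_{\text{total}}}
  = \frac{\log_{10}\!\left(\frac{d+1}{d}\right) / \log_{10}(1+g)}
         {1 / \log_{10}(1+g)}
  = \log_{10}\!\left(\frac{d+1}{d}\right)
  = \log_{10}\!\left(1 + \frac{1}{d}\right)
\]
```

The factor log10(1+g) cancels, so the share of time spent on each first digit is the Benford proportion for any constant positive growth rate, not just 10%.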
26
Some Theory
• Recent work (e.g., Hill 1996) has shown that Benford Sets are, essentially, the distribution of all distributions.
  • Technically: If distributions are selected at random, and random samples are taken from each of these distributions, the significant digits of the resulting collection will converge to the logarithmic (Benford) distribution.
27
Conforming to Benford’s Law
• General Mathematical Rule: To conform to Benford’s Law, the data, when ranked from smallest to largest, should approximate a geometric sequence.
  • In theory no two numbers should be the same.
  • But a small number of duplicates (as in accounting applications) should not invalidate the comparison to Benford’s Law.
28
Conforming to Benford’s Law
• Auditors generally decide whether a data set is expected to conform to the Law based on the type of data being analyzed.
  • If the data is expected to conform and does not, then auditors have a reason to question whether the data is valid and free of major recurring errors.
29
Conforming to Benford’s Law
• Practice has shown that the following 4 criteria must be met for a data set to conform to Benford’s Law:
1. The data set should describe the sizes of similar phenomena.
  • Examples: populations of towns (recall the real world example), areas of lakes, heights of mountains, market values of companies on the NYSE, daily sales volumes of companies on the NYSE, etc.
30
Conforming to Benford’s Law
2. There should be no built-in minimum or maximum values in the data set.
  • Example: a broker with a minimum commission charge of $50 --- a set of commissions would be biased toward numbers with a first digit of 5 and a second digit of 0.
3. The numbers should not be made up of assigned numbers --- numbers given to things in place of words.
  • Examples: SS#, bank acct. #, car license plate #, etc.
31
Conforming to Benford’s Law
4. The data set should have more small items than big items.
  • This reflects the fact that in naturally occurring data there are, for example,
    • more towns than cities,
    • more small companies than General Electrics,
    • more small lakes than big lakes, etc.
32
Conforming to Benford’s Law
• Effect of sample size.
  • Research has shown that the numbers in data sets should have 4 or more digits for a good fit with Benford’s Law.
  • When numbers have fewer than 4 digits, there is a slight bias in favor of the lower digits.
  • Generally, unless working with nothing but single-digit or double-digit numbers, the bias is not severe enough to merit an adjustment to the expected digit frequencies.
  • A large data set is required for a close fit to a Benford distribution.
33
Conforming to Benford’s Law
• Scale invariance.
  • If a Benford Set is multiplied by a nonzero constant, the resultant set will also be a Benford Set.
  • Pinkham (1961) showed that only the frequencies of a Benford Set have this property.
  • This could be a problem for auditors – you may not find double-counting errors since it’s just a scaling by 2.
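Scale invariance can be illustrated numerically. The sketch below is an assumption-laden toy, not an audit procedure: it reuses the census-style sequence 10,000 × 1.1^k as a stand-in Benford Set (2,000 periods is an arbitrary choice), doubles every value, and compares the first-digit fit before and after using the mean absolute deviation from the Benford proportions.

```python
import math
from collections import Counter

BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]  # digits 1..9


def first_digit(x: float) -> int:
    """Leading digit of a positive number, read from its scientific notation."""
    return int(f"{x:.15e}"[0])


def first_digit_mad(values) -> float:
    """Mean absolute deviation of first-digit proportions from Benford."""
    counts = Counter(first_digit(v) for v in values)
    n = len(values)
    return sum(abs(counts[d] / n - BENFORD[d - 1]) for d in range(1, 10)) / 9


base = [10_000 * 1.1 ** k for k in range(2000)]  # geometric sequence
doubled = [2 * v for v in base]                  # "double-counted" version

print("MAD, original:", round(first_digit_mad(base), 5))
print("MAD, doubled: ", round(first_digit_mad(doubled), 5))
```

Both versions stay close to Benford, which is the auditor’s problem in miniature: the doubled data looks just as "natural" as the original.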
34
Conforming to Benford’s Law
• Scale invariance.
  • More problems for auditors: When manipulations of accounting data are systematic (a constant percentage increase or decrease), Benford’s Law will not help detect them.
  • The same with systematic addition or subtraction of a constant.
But this type of manipulation is rare. As we have discussed, frauds are usually perpetrated/concentrated around period-end.
Goodness of Fit Tests
35
Goodness of Fit Tests
• When you use Benford’s law you will want to assess how well your data conforms to Benford’s law.
  • First, you should graph your digit proportions against the Benford proportions – a picture is worth 1,000 words.
36
Goodness of Fit Tests
  • Second, you should perform analytical goodness-of-fit tests to determine if your data conforms to Benford’s law. Two goodness-of-fit tests are often used by practitioners of Benford’s law:
    • the mean absolute deviation (MAD) test and
    • the chi-square test.
37
38
Mean Absolute Deviation (MAD)
• Sum the absolute deviations between the Benford proportion and the actual proportion (for each of the 9 digits). This is the total Absolute Deviation.
• Divide that sum by 9 to get the Mean Absolute Deviation.
Dr. Frank’s favorite and the easiest to use! And I recommend you start with the MAD.
RECALL: This is for the 1st digit!
39
Mean Absolute Deviation (MAD)
• Mark Nigrini (Digital Analysis Using Benford’s Law, 2000, Global Audit Publications, pages 118-119) suggests the following guidelines for evaluating the 1st digit MAD:
MAD Range         Evaluation
0.000 to 0.004    Close conformity
0.004 to 0.008    Acceptable conformity
0.008 to 0.012    Marginally acceptable
> 0.012           Nonconformity
40
MAD Intuitive Example
Data Summary
Digit   Benford   Data      Abs. Dev.
1       0.30103   0.32673   0.02570
2       0.17609   0.16832   0.00777
3       0.12494   0.11881   0.00613
4       0.09691   0.09901   0.00210
5       0.07918   0.06931   0.00987
6       0.06695   0.05941   0.00754
7       0.05799   0.05941   0.00142
8       0.05115   0.05941   0.00826
9       0.04576   0.03960   0.00616
Total                       0.07495
MAD (Total ÷ 9)             0.00833
Strong end of the “marginally acceptable” category.
You use the actual digit proportions!
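The MAD calculation for the Intuitive Example can be sketched as follows. The data proportions are copied from the table; the function name `nigrini_label` is made up, and because the published ranges share their endpoints, assigning a boundary value to the lower band is an assumption of this sketch.

```python
import math

# Actual first-digit proportions for digits 1..9, from the Intuitive Example.
data = [0.32673, 0.16832, 0.11881, 0.09901, 0.06931,
        0.05941, 0.05941, 0.05941, 0.03960]
benford = [math.log10(1 + 1 / d) for d in range(1, 10)]

abs_dev = [abs(p_actual - p_benford) for p_actual, p_benford in zip(data, benford)]
mad = sum(abs_dev) / 9  # mean over the 9 first digits


def nigrini_label(mad_value: float) -> str:
    """Map a 1st-digit MAD to Nigrini's guideline bands (boundary handling assumed)."""
    if mad_value <= 0.004:
        return "Close conformity"
    if mad_value <= 0.008:
        return "Acceptable conformity"
    if mad_value <= 0.012:
        return "Marginally acceptable"
    return "Nonconformity"


print(f"Total abs. dev. = {sum(abs_dev):.5f}, MAD = {mad:.5f}: {nigrini_label(mad)}")
```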
Chi-Square Test
• Chi-square is a statistical test commonly used to compare observed data with data we would expect to obtain according to a specific hypothesis.
  • The “specific hypothesis” when using Benford’s law is that the data conform to Benford’s law (i.e., the actual digit frequencies conform to the Benford proportions).
41
Chi-Square Test
• The null hypothesis is that the two distributions are the same (Benford).
  • In other words, the statistical null hypothesis is that the number of observations in each category (first digit) is equal to that predicted by a theory (Benford’s law), and the alternative hypothesis is that the observed numbers are different from the expected.
42
Chi-Square Test
• The null hypothesis is usually an extrinsic hypothesis, where you knew the expected proportions before doing the experiment (determining the 1st digit proportions).
43
Chi-Square Test
• You then use a mathematical relationship, in this case the chi-square distribution, to estimate the probability of obtaining that value of the test statistic.
  • If your calculated χ² value is less than the cut-off value, you would accept that your first digit distribution matches Benford’s law (you can’t reject the null).
44
45
Chi-Square Test
$$\chi^2 = \sum_{d_1=1}^{9} \frac{\left(AC_{d_1} - EC_{d_1}\right)^2}{EC_{d_1}}$$
• Where:
  • $AC_{d_1}$ = Actual Count for digit $d_1$
  • $EC_{d_1}$ = Expected Count (Benford Proportion × Population count, n) for digit $d_1$
RECALL: This is for the 1st digit!
46
Chi-Square Test
• Get the cut-off from Excel: =CHIINV(.05,8) = 15.507
  • CHIINV(significance level, degrees of freedom)
  • Degrees of freedom* = # digits – 1 (quick calc).
• In the Intuitive Example: χ² = 0.723 < 15.507, so we can’t reject the null hypothesis that both distributions are the same (the population growth distribution is a Benford distribution).
*dof is technically (rows-1) x (columns-1). In our example: (9-1)x(2-1)=8x1=8
Chi-Square Test
• If computed χ² > chi-square cut-off (from CHIINV(prob., DoF)), then reject the null: There is a significant difference between the data sets that is unlikely to be due to chance alone.
47
48
Chi-Square
Digit   Population   Benford   Expected Count   Actual Count   (A-E)^2    (A-E)^2 ÷ E
1       0.3267327    0.30103   30.40403         33             6.73906    0.22165
2       0.1683168    0.17609   17.78509         17             0.616366   0.034656
3       0.1188119    0.12494   12.61894         12             0.383087   0.030358
4       0.0990099    0.09691   9.78791          10             0.044982   0.004596
5       0.0693069    0.07918   7.99718          7              0.994368   0.12434
6       0.0594059    0.06695   6.76195          6              0.580568   0.085858
7       0.0594059    0.05799   5.85699          6              0.020452   0.003492
8       0.0594059    0.05115   5.16615          6              0.695306   0.134589
9       0.039604     0.04576   4.62176          4              0.386585   0.083645
Computed Chi Sq.        0.723184
Chi Sq. Cut-off, 5%     15.50731   =CHIINV(prob.=0.05, DoF=8)
Expected Count = Benford Prop. × Population size. For 1: 0.30103 × 101 = 30.40403
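The same computation can be sketched in Python. The actual counts come from the table; the expected counts use unrounded Benford proportions, so the result may differ from the spreadsheet in the later decimals. The cut-off is simply the 15.507 quoted from CHIINV(.05,8), hard-coded rather than computed.

```python
import math

actual = [33, 17, 12, 10, 7, 6, 6, 6, 4]  # actual counts for first digits 1..9
n = sum(actual)                           # population size (101)

# Expected count = Benford proportion x population size.
expected = [math.log10(1 + 1 / d) * n for d in range(1, 10)]

chi_sq = sum((a - e) ** 2 / e for a, e in zip(actual, expected))

CUTOFF_5PCT_DOF8 = 15.507  # from =CHIINV(.05,8), as quoted on the slide
reject_null = chi_sq > CUTOFF_5PCT_DOF8

print(f"chi-square = {chi_sq:.4f}, cut-off = {CUTOFF_5PCT_DOF8}, reject null: {reject_null}")
```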
Chi-Square
• An alternative Excel chi-square test is as follows:
  • CHISQ.TEST(Actual data range, Expected data range).
  • CHISQ.TEST returns the probability that a value of the χ² statistic at least as high as the value calculated by the χ² formula could have happened by chance under the assumption of independence.
49
Actual data range: actual digit frequencies.
Expected data range: expected digit freq. = Benford prop. × Population total.
Chi-Square
• You can relate CHIINV(prob., DoF) and CHISQ.TEST(Actual data range, Expected data range) as follows:
  • Use CHISQ.TEST to get a probability.
  • Now take that probability and put it in CHIINV (with the same DoF) and it will return a χ² cut-off equal to the computed χ² value.
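The link between the cut-off and the probability can be checked outside Excel. The sketch below uses a closed form for the chi-square right-tail probability that holds only for an even number of degrees of freedom (8 here); the function name is invented, and treating its output as what CHISQ.TEST would return assumes Excel uses 8 degrees of freedom for a 9-row range.

```python
import math


def chi_sq_right_tail(x: float, dof: int) -> float:
    """P(chi-square with `dof` degrees of freedom >= x), for even dof only."""
    if dof <= 0 or dof % 2 != 0:
        raise ValueError("this closed form only covers even, positive dof")
    half = x / 2
    # For dof = 2m: exp(-x/2) * sum_{i=0}^{m-1} (x/2)^i / i!
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(dof // 2))


p_example = chi_sq_right_tail(0.723184, 8)  # computed chi-square from the example
p_cutoff = chi_sq_right_tail(15.50731, 8)   # the 5% cut-off from CHIINV

print(f"p-value for 0.723184: {p_example:.5f}")
print(f"right-tail probability at 15.50731: {p_cutoff:.5f}")
```

The second line comes back at about 0.05, which is the round trip described above: CHIINV turns a probability into a cut-off, and the right-tail probability turns the cut-off back into the probability.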
50
Chi-Square Test
• A good review of chi-square goodness-of-fit can be found at http://www.biostathandbook.com/chigof.html (even though it’s in a biological science context).
51
Checked hyperlink 10/20/21, it’s ok.
52
Z-Stat & Confidence Interval
• Z-Statistic
$$Z = \frac{|p_o - p_e| - \frac{1}{2n}}{\sqrt{\frac{p_e (1 - p_e)}{n}}}$$
Where: $p_e$ = expected proportion (Benford digit proportion), $p_o$ = observed proportion (actual digit proportion), $n$ = number of observations.
The $\frac{1}{2n}$ term is a continuity correction factor and is only used when it is smaller than the first term in the numerator.
53
Z-Stat & Confidence Interval
• Z-Statistic 95% Confidence Interval
$$\text{Upper Bound} = p_e + 1.96 \sqrt{\frac{p_e (1 - p_e)}{n}} + \frac{1}{2n}$$
$$\text{Lower Bound} = p_e - 1.96 \sqrt{\frac{p_e (1 - p_e)}{n}} - \frac{1}{2n}$$
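A short sketch of the Z-statistic and the 95% bounds, applied to digit 1 of the Intuitive Example (33 of 101 values). The function names are illustrative; the continuity correction is applied only when it is smaller than |po – pe|, as the slide describes.

```python
import math


def z_stat(p_obs: float, p_exp: float, n: int) -> float:
    """Z-statistic for one digit, with the 1/(2n) continuity correction."""
    diff = abs(p_obs - p_exp)
    correction = 1 / (2 * n)
    if correction < diff:  # only use the correction when it is the smaller term
        diff -= correction
    return diff / math.sqrt(p_exp * (1 - p_exp) / n)


def conf_interval_95(p_exp: float, n: int) -> tuple[float, float]:
    """(lower, upper) 95% bounds around the expected (Benford) proportion."""
    half_width = 1.96 * math.sqrt(p_exp * (1 - p_exp) / n) + 1 / (2 * n)
    return p_exp - half_width, p_exp + half_width


n = 101
p_exp = math.log10(2)  # Benford proportion for first digit 1
p_obs = 33 / n         # observed proportion for first digit 1

z = z_stat(p_obs, p_exp, n)
lower, upper = conf_interval_95(p_exp, n)
print(f"Z = {z:.4f}; 95% bounds = ({lower:.5f}, {upper:.5f}); observed = {p_obs:.5f}")
```

The observed proportion falls inside the bounds and Z is well under 1.96, consistent with the chi-square result for the same example.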
54
Intuitive Explanation of Benford’s Law-Example
[Figure: Intuitive Example, n=100. Proportion (0 to 0.45) by First Digit (1 to 9); series: Population, Benford, UB, LB.]
Benford Case – XYZ Corp.
• You can find the case instructions and data on Compass2g: Module IIIC → Benford’s Law Material: Readings + Lecture → Benford Case
55
Benford CASE – XYZ Corp.
• 7,320 sales amounts
• Total sales $29,590,385
• Use Benford’s Law to check the 1st digit.
  • How do things look? Goodness-of-fit?
  • Fraud?
  • No fraud?
56
Deliverable →
Sales data in an Excel file in Compass2g.
Benford CASE – XYZ Corp.
• Memo on your findings:
  • Pictures (graph of XYZ vs. Benford).
  • Goodness of fit stats (at least chi-square and M.A.D.)
  • Does the data fit?
  • What might your findings mean? What now?
  • KISS: Keep It Short & Simple.
  • About 1 page of verbiage.
57
Benford CASE
• Case is due BEFORE class on Mon., 11/15/21.
• You don’t need to use any software other than Excel to do the case.
58
Excel Hints
• Use LEFT: =LEFT(target cell, # of left-side digits)
  • In b1, =LEFT(a1,1) → returns the first digit of the number in cell a1.
• Use COUNTIF: =COUNTIF(data range,"=criteria")
  • =COUNTIF(b1:b7300,"=1") → counts the 1s in cells b1 thru b7300 (and likewise for =2, =3, . . . , =9).
59
Assume your actual data is in column A (a1:a7300).
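Excel is all the case requires, but for anyone who wants to cross-check their counts, the LEFT and COUNTIF steps have a simple Python analogue. The six sales amounts below are made-up placeholders, not the XYZ Corp. data, and the helper name is illustrative.

```python
from collections import Counter

# Hypothetical sales amounts (placeholders only, NOT the case data).
sales = [1234.50, 2750.00, 1999.99, 8400.00, 1050.25, 3100.00]


def first_digit(amount: float) -> int:
    """Leading digit of a nonzero amount (the LEFT(cell, 1) step)."""
    return int(f"{abs(amount):.15e}"[0])


# Count how many amounts start with each digit (the COUNTIF step).
counts = Counter(first_digit(a) for a in sales)
proportions = {d: counts[d] / len(sales) for d in range(1, 10)}

for d in range(1, 10):
    print(d, counts[d], round(proportions[d], 4))
```

The resulting proportions are what you would compare with the Benford proportions in the graph and in the MAD and chi-square calculations.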