Journal of Accounting, Auditing & Finance
2015, Vol. 30(4) 541–557 © The Author(s) 2015
Reprints and permissions: sagepub.com/journalsPermissions.nav
DOI: 10.1177/0148558X15584051 jaf.sagepub.com
Persistent Patterns in Stock Returns, Stock Volumes, and Accounting Data in the U.S. Capital Markets
Mark J. Nigrini 1
Abstract
Benford’s Law gives the expected frequencies of the digits in tabulated data. The expected frequencies show a large bias toward the low digits. An analysis of the Center for Research in Security Prices (CRSP) data shows that the daily returns have a near-perfect fit to Benford’s Law. The daily volumes also have a close fit to Benford’s Law but there are deviations due to round lot trading and the fact that some of the data are rounded to the nearest hundred. An analysis of Compustat data also shows a close fit to Benford’s Law with some explainable deviations. The expected returns and the abnormal returns used in event studies over an extended period showed that these numbers also conformed to Benford’s Law. Recent studies have divided a population into subsets and then tested the subsets for conformity to Benford’s Law. The conclusions are that the subsets with the weakest fit to Benford were fraudulent. The problems with this approach are discussed, and these include statistical considerations, issues with using Compustat data, other plausible explanations for a lack of conformity, and the fact that there is no clear link between a change in the leading digit of a number and the materiality of the dollar value of the change.
Keywords
Benford’s Law, stock returns, stock volumes, fraud detection, event studies
Introduction
Ball and Brown (1968) showed that there was a relationship between stock price changes
and the information contained in earnings reports. A few years earlier, Fama (1965a,
1965b) made the conceptual breakthrough of framing the random walks of stock prices as a
function of information flows. The random walk phrase was popularized by Malkiel
(1973). Google Scholar shows that these four studies have been cited more than 15,000
times confirming that the topics of stock prices and accounting data taken by themselves or
seen together are important areas of study. The objective of this study is to show that there
is the same consistent, persistent, and interesting pattern in the random walk of stock
prices, the stock volumes associated with those same stock prices, the expected and
abnormal returns calculated in event studies, and in the numbers shown in earnings reports.
This regularity, namely, Benford’s Law, has also been seen in naturally occurring earth science data.
1 West Virginia University, Morgantown, USA
Corresponding Author:
Mark J. Nigrini, College of Business and Economics, West Virginia University, Morgantown, WV 26506, USA.
Email: [email protected]
Tracks Paper
In response to the growing number of studies that use Benford’s Law to identify financial statement fraud and economic statistics fraud, this study also shows that there are other
possible non-fraud explanations for nonconformity to Benford’s Law. In addition to non-fraud explanations, there are methodological issues with using the first digits to identify
manipulation, and statistical issues with using Compustat data.
Benford’s Law
Benford (1938) hypothesized that more real-world numbers started with 1s and 2s than
started with 8s or 9s. He analyzed the first digits of numbers from diverse sources (such as
the drainage areas of rivers, scientific constants, and population counts), and his results
showed that 1 was the first digit 30.6% of the time and that 2 was the first digit 18.5% of
the time. A positive number $x$ can be written as $S(x) \times 10^k$, where $S(x) \in [1, 10)$ is the significand and $k$ is an integer (called the exponent). For example, the number 1,964 can be
written as $1.964 \times 10^3$. The integer part of the significand is the first digit. Zero, by definition, is inadmissible as a first digit. Benford made the assumption that the ordered values
of a data set form a geometric sequence, and using calculus he developed the expected frequencies of the digits in tabulated data. The formulas are shown below, with $D_1$ representing the first digit and $D_1D_2$ representing the first-two digits of a number:
$$\mathrm{Prob}(D_1 = d_1) = \log\left(1 + \frac{1}{d_1}\right), \quad d_1 \in \{1, 2, \ldots, 9\}, \tag{1}$$
$$\mathrm{Prob}(D_1D_2 = d_1d_2) = \log\left(1 + \frac{1}{d_1d_2}\right), \quad d_1d_2 \in \{10, 11, 12, \ldots, 99\}, \tag{2}$$
where Prob indicates the probability of observing the event in parentheses, and log refers
to the common logarithm. For example, the probability of the first-two digits being 19 is
.0223 (log(1 + 1/19)). For the first digit there is a large bias toward the lower digits (1, 2, and 3). From
the third digit onward, the probabilities are close to being uniform at .10 for any of the possible 10 digits, 0 to 9.
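Equations 1 and 2 can be evaluated directly. The following is a minimal sketch; the function names are illustrative and do not come from the study.

```python
import math

def benford_first_digit():
    """Expected proportions for first digits 1..9 (Equation 1)."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def benford_first_two_digits():
    """Expected proportions for first-two digits 10..99 (Equation 2)."""
    return {d: math.log10(1 + 1 / d) for d in range(10, 100)}

p1 = benford_first_digit()
p2 = benford_first_two_digits()

# Probability that the first-two digits are 19 (the example in the text).
print(round(p2[19], 4))
# Each set of probabilities sums to 1 because the logarithms telescope.
print(round(sum(p1.values()), 6), round(sum(p2.values()), 6))
```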
The basis of Benford’s Law is that the mantissas of the logs of the numbers are uniformly distributed. For example, the logarithm of 1,964 is 3.2931 and the mantissa is the
0.2931 fractional part and the characteristic is the integer value of 3. Leemis, Schmeiser,
and Evans (“LSE,” 2000) state that if $W$ is uniformly distributed $U(a, b)$, where $a$ and $b$
are real numbers with $a < b$, and if the interval $(10^a, 10^b)$ covers an integer number of orders of magnitude, then the first digit of the random variable $T = 10^W$ satisfies Benford’s
Law (“Benford”) exactly. They presumably meant that the distribution of the digits of the
possible values of $T$ would conform to Benford. So if $b - a$ is an integer, and if the logarithms are uniformly distributed, then the exponentiated numbers ($10^W$) will conform to
Benford. When the logs of the numbers are uniformly distributed, the numbers themselves
will form a perfect, or a near-perfect, geometric sequence of the form,
$$S_n = a r^{n-1}, \tag{3}$$
where a is the first element of the sequence and r is the ratio of the (n + 1)th element divided by the nth element. A geometric sequence with N elements will have n spanning
the range [1, 2, 3, . . ., N].
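The link between a geometric sequence and Benford’s Law can be illustrated numerically. The sketch below builds a sequence that spans exactly two orders of magnitude and compares its first-digit proportions with Equation 1. The sequence length, the starting value of 10, and the helper names are assumptions chosen only for illustration.

```python
import math
from collections import Counter

def geometric_sequence(a, r, n_terms):
    """Return S_n = a * r**(n - 1) for n = 1..n_terms (Equation 3)."""
    return [a * r ** (n - 1) for n in range(1, n_terms + 1)]

def first_digit(x):
    """First significant digit, read from scientific notation."""
    return int(f"{abs(x):.15e}"[0])

N = 10_000            # hypothetical sequence length
r = 10 ** (2 / N)     # ratio chosen so the sequence spans two orders of magnitude
seq = geometric_sequence(10.0, r, N)

counts = Counter(first_digit(x) for x in seq)
# Largest gap between an actual first-digit proportion and the Benford proportion.
max_gap = max(abs(counts[d] / N - math.log10(1 + 1 / d)) for d in range(1, 10))
print(f"largest first-digit deviation from Benford: {max_gap:.5f}")
```

Because the logs of the sequence are evenly spaced over an integer number of orders of magnitude, the deviations are limited to counting effects at the bin boundaries.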
Benford’s Law has some interesting properties. The scale-invariance property
(Pinkham, 1961) states that if the numbers in a Benford Set (a set of numbers that conforms to Benford’s Law) were all multiplied by a (nonzero) constant, then the new data
set would also be a Benford Set. The implication is that if Benford’s Law applies to stock
market or accounting data, then it should do so regardless of the source currency. The
law is also base invariant, which means that if the numbers in a Benford Set were converted to (say) base 8 (where 1,964 becomes 3,654) and if the expected probabilities in
Equations 1 and 2 were recalculated, then the base 8 numbers will conform to the base 8
probabilities (Berger & Hill, 2015). The law is also power invariant in that if each
numeric value in a Benford Set is raised to a power in the sequence {1.5, 2.0, 2.5,
3.0, . . .}, then the new data set would also be a Benford Set. This is a variation on the
scale-invariance property.
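The base 8 example can be checked with a short sketch. It converts 1,964 to base 8 and recalculates the first-digit probabilities with the logarithm taken to base 8, which is the standard generalization of Equation 1 to another base. The helper names are illustrative, and the conversion routine only handles bases up to 10.

```python
import math

def to_base(n, base):
    """Digits of a positive integer n in the given base (base <= 10), as a string."""
    digits = []
    while n > 0:
        n, rem = divmod(n, base)
        digits.append(str(rem))
    return "".join(reversed(digits)) or "0"

def benford_first_digit_base(base):
    """First-digit probabilities in the given base: log_base(1 + 1/d), d = 1..base-1."""
    return {d: math.log(1 + 1 / d, base) for d in range(1, base)}

octal = to_base(1964, 8)
p8 = benford_first_digit_base(8)
print(octal)
print({d: round(p, 4) for d, p in p8.items()})
```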
Nigrini and Miller (2007) analyzed streamflow records for 140 years. Their data conformed almost perfectly to Benford using the Mean Absolute Deviation (MAD) as the conformity measure. The formula is shown in Equation 4:
$$\text{Mean absolute deviation} = \frac{\sum_{i=1}^{K} |AP - EP|}{K}, \tag{4}$$
where EP denotes the expected proportion, AP the actual proportion, and K represents the
number of bins (which equals 90 for the first-two digits).
The streamflow MAD of 0.00013 meant that there were only small differences between
the actual proportions and the Benford proportions. There is no measure of significance for
the MAD, but Nigrini (2011) contains a table that states that MAD values from 0 to 0.0012
constitute a close conformity to Benford.
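Equation 4 for the first-two digits (K = 90) can be sketched as follows. The digit extraction through scientific-notation formatting is an implementation choice made here and is not described in the study. The test data are a hypothetical geometric sequence and not the streamflow records.

```python
import math
from collections import Counter

def first_two_digits(x):
    """First-two significant digits (10..99) of a nonzero number."""
    return int(f"{abs(x):.15e}".replace(".", "")[:2])

def mad_first_two(values):
    """Mean absolute deviation between actual and Benford proportions (Equation 4)."""
    counts = Counter(first_two_digits(v) for v in values)
    n = len(values)
    total = 0.0
    for d in range(10, 100):
        ap = counts[d] / n              # actual proportion
        ep = math.log10(1 + 1 / d)      # expected (Benford) proportion
        total += abs(ap - ep)
    return total / 90                   # K = 90 bins for the first-two digits

# Hypothetical data: logs evenly spaced over one order of magnitude.
values = [10 ** (1 + k / 9000) for k in range(9000)]
mad = mad_first_two(values)
print(f"MAD = {mad:.5f}")
```

A value below 0.0012 would fall in the close conformity range quoted from Nigrini (2011).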
Stock Returns
Ley (1996) analyzed the daily returns of the Dow Jones Industrial Average (DJIA) from
1900 to 1993 and the daily returns of the S&P index from 1926 to 1993. The first digits
showed a reasonable conformity to Benford. The MAD values were 0.0047 and
0.0043, which constitute a close conformity result. Rodriguez (2004) analyzed capital
market data, and his results showed that the annual rates of return (from the Ibbotson
stocks, bonds, bills, and inflation data) conformed to Benford, but with only 76 records in
the data set, the chi-square test is somewhat forgiving. His results also showed that the
daily returns of the DJIA did not conform to Benford.
The daily returns of security issues in the Center for Research in Security Prices (CRSP)
database were analyzed. The data used were the Stock/Security Files in the Annual Update
file. The options selected were as follows:
Date range: 1/1/2000 to 12/31/2013.
Company codes: ‘‘Search the entire database.’’
Conditional statements: Share Code (shrcd) < 20.
Time series information: Price, Holding Period Return.
The query produced a table with 20,174,725 records. The stock returns are reported to six
decimal places. Records with an absolute value less than 0.00001 were deleted because
values from 0.000001 to 0.000009 do not have explicit first-two digits. A number recorded
as 0.000008 could be any number from 0.00000750 to 0.00000849. Some 1.15 million
returns were equal to zero, 3,200 returns were values from 0.000001 to 0.000009, and
360,000 (American Stock Exchange [AMEX]) returns were missing (null). This left N =
18,667,795. The first-two digits test was first used by Nigrini and Mittermaier (1997) because
it is more informative than the first digits test. The results are shown in Panel A of Figure 1.
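The record deletions described above amount to a simple filter. In this sketch, representing a missing return as `None` is an assumption about the data layout, and the function name is illustrative.

```python
def clean_returns(returns, min_abs=0.00001):
    """Keep only returns that have explicit first-two digits.

    Drops missing values (None, an assumed encoding), zero returns, and
    returns whose absolute value is below min_abs (0.000001 to 0.000009
    at six decimal places).
    """
    return [r for r in returns if r is not None and abs(r) >= min_abs]

sample = [0.0, None, 0.000008, -0.00001, 0.0125, -0.04]
kept = clean_returns(sample)
print(kept)
```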
The monotonically decreasing line in Panel A represents the expected proportions of
Benford’s Law. The Benford proportions start at 0.0414 at 10 on the x axis and decrease
steadily to a low of 0.0044. The bars show the actual proportions and the bar at 50 indicates that the actual proportion for 50 was 0.0098. The first-two digits have a close conformity to Benford with a MAD of 0.00046. There are small visible spikes (excesses) at 20
and 25 and a slight tendency for systematic spikes at the multiples of 10. The absolute
daily returns that occurred most often were 0.04, 0.05, 0.025, 0.041667, 0.033333,
0.066667, and 0.047619. Each of these values occurred between 25,000 and 29,000 times.
With 18.7 million records, the spikes caused by these number duplications were small.
A plot of the ordered logs of the daily returns is shown in Panel B of Figure 1. Daily
returns present some issues when it comes to an analysis of the logs because the log of a
negative number is undefined. This issue was solved by taking the logs of the absolute
values of the daily return. The “log” of −0.01 was calculated to be −2.00, and so −2.00
was the “log” of both −0.01 and 0.01. The graph of the ordered logs is upward sloping as
would be expected from ordered values. This graph has the same shape as the streamflow
graph in Nigrini and Miller (2007). The digit patterns (and the log patterns) of the daily
returns are the same patterns that were observed in the earth science data.
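The treatment of negative returns can be sketched in a few lines. Zero returns are skipped because their log is undefined; the function name is illustrative.

```python
import math

def ordered_abs_logs(returns):
    """Common logs of the absolute returns, ranked from smallest to largest."""
    return sorted(math.log10(abs(r)) for r in returns if r != 0)

# A return of -0.01 and a return of 0.01 share the same "log" of -2.
logs = ordered_abs_logs([-0.01, 0.01, 0.1, 0.0])
print(logs)
```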
The next step in the analysis was the preparation of a histogram of the returns. The his-
togram was based on 19,819,978 returns after the deletion of the null values and is shown
in Figure 2.
Figure 1. First-two digits and ordered logs of daily returns. Note. Panel A shows the line of Benford’s Law and the actual proportions as bars of the first-two digits of the daily
returns, and Panel B shows the ordered logs of the daily returns over the same period.
The histogram shows 125 intervals (with an interval width of 0.0016) from −0.10 to +0.10. The large spike in the center of the graph is the [0.0000, 0.0016) interval with a proportion of 0.086. The proportion of returns that were negative was 0.4741, whereas the
proportion of returns that were positive was 0.4680, with slightly less than 6% of the
returns being equal to exactly 0. The median return was 0.
The data have a near-perfect fit to Benford. Berger and Hill (2015) note that
none of the familiar classical probability distributions or random variables, such as e.g. normal,
uniform, exponential, beta, binomial, or gamma distributions are Benford. Specifically, no uni-
form distribution is even close to Benford, no matter how large its range or how it is centered.
(p. 36)
They also note that an exponential distribution with a mean equal to 1 comes close to
being Benford, and that ‘‘a log-normal random variable with large variance (compared with
its mean) is practically indistinguishable from a Benford random variable.’’
The histogram was analyzed using the curve fitting function of the software package
TableCurve 2D. The best fitting density function was the Pearson IV distribution. This distribution was ignored because the pdf is complex and can fit almost any set of continuous
data. The best fit from the familiar distributions was the Cauchy distribution with an $r^2$ of .872. The fitted Cauchy distribution is the line shown in Figure 2, which is a simple, symmetric, unimodal shape that is similar in shape to the standard normal pdf.
Stock Volumes
The query using the stock returns options returned 20,174,725 records. Daily volumes less
than 10 shares for the day were deleted because the integers from 1 to 9 do not have
explicit first-two digits, leaving 19,120,349 records. The digit patterns are shown in Panel A
of Figure 3.
Figure 2. Daily stock returns and a fitted Cauchy distribution.
Note. A histogram of the ordinary share stock returns and a fitted curve from the Cauchy distribution.
The digits of the daily volumes show a close conformity with a MAD of 0.00070. MADs
less than 0.0012 qualify as being close conformity. There are systematic spikes at the multiples of 10 (10, 20, . . ., 90). A review of the number frequencies shows that the daily
volumes that occurred most often were 100, 200, 500, 1,000, 300, 400, and 600. It seems
that the high frequency volumes are amounts that are the result of investors avoiding odd-lot
trading. The odd-lot premium has almost disappeared in recent years. The CRSP documentation also reports that “our source for the NYSE/AMEX reports the numbers rounded to the
nearest hundred.” The systematic spikes are there because of a tendency to trade in multiples
of 100 and the fact that New York Stock Exchange (NYSE) reports rounded numbers.
The ordered (ranked from smallest to largest) logs of the stock volumes are shown in
Panel B of Figure 3. The line is either upward sloping or (as can be seen at y = 2) it has a
short segment with a zero slope. This happens when a number (such as 100, which is $10^2$)
has a high enough frequency to cause a visible horizontal segment. This shape is similar to
the ordered log pattern for the streamflow data in Nigrini and Miller (2007).
Accounting Data
Prior studies have analyzed accounting data for conformity to Benford. The dollar amounts
of the invoices approved for payment by a NYSE-listed company were analyzed in Nigrini
and Mittermaier (1997), and the dollar amounts of the invoices approved for payment by a
software company were analyzed in Drake and Nigrini (2000). Carslaw (1988), Thomas
(1989), and Nigrini (2005) analyzed earnings releases for signs of rounding up around psychological reference points (such as $100 million).
The Compustat data used were the Fundamentals Annual section of the North America/
Monthly Updates. The options selected were all fiscal years from 2000 to 2013 and All 382
Balance Sheet items, 328 Income Statement items, 66 Cash Flow items, and 114
Miscellaneous items.
The data were downloaded in four groups. Each group had 157,000 firm years.
Companies do not use every item in every group. Some fields were used by 130,000 firms,
whereas others were unused or were used by fewer than 1,000 firms. The digits of the
accounting data are shown in Figure 4.
Figure 3. First-two digits and ordered logs of daily stock volumes. Note. Panel A shows the first-two digits of the daily volumes and Panel B shows the ordered logs of the daily volumes.
Compustat reports amounts in millions to three decimal places. Amounts from 0.001 to
0.009 were deleted because these amounts do not have an explicit second digit. An amount
reported as 0.002 could be any number from 0.00150 to 0.00249. The Compustat data have
MADs ranging from 0.00037 for the Cash Flow items to 0.00166 for the Income Statement
items. If the data are aggregated (N = 19,189,396), the result is MAD of 0.00085 which is
comfortably below the 0.0012 upper bound for close conformity. A plot of the ordered logs
shows the same patterns as can be seen in Figures 1 and 3.
Figure 4. First-two digits of Compustat data.
Note. The graphs show the first-two digits of the Compustat financial statement items.
A number duplications test shows that the following 12 amounts occurred most often:
All Compustat            Income Statement
Amount    Count          Amount (US$)    Count
0.010     179,131        0.01            123,296
0.020      97,617        0.02             83,069
0.030      76,473        0.03             64,479
0.040      62,110        0.04             51,857
0.050      57,298        0.05             44,771
0.100      53,114        0.06             38,348
0.060      46,926        0.07             33,070
0.070      40,388        0.08             29,869
1.000      39,429        0.10             29,858
0.080      37,100        0.09             26,917
0.090      33,567        0.11             22,198
0.200      31,903        0.12             21,520
The Income Statement items have disproportionately high counts for the 0.01 to 0.10
numbers. These numbers caused the systematic spikes at 10, 20, 30, . . . , 90. A review of
the items with high 0.01 to 0.10 counts shows that these items are all related to various
Earnings Per Share (EPS) calculations or items showing the EPS effect of transactions
(e.g., GLEPS Gain/Loss Basic EPS Effect). These EPS items caused the Income Statement
to have the largest MAD.
Abnormal Stock Returns
Event studies are used in accounting and finance to study the impact of an event on the
daily (or monthly) return of a security. Events could include the promulgation of a new
accounting rule, a merger, or a dividend announcement. These studies partition the return
into the part due to the new news (the event) and the part due to macroeconomic news.
The part of the daily return that is due to the event is called the abnormal return.
The first step in this analysis was to delete the records for stocks that did not have daily
returns for the full 14-year period from 2000 to 2013 (both years inclusive). Including
stocks with random within-period starting or ending dates would have added a layer of
complexity to the programming that might have introduced errors in the results. The 14-year period had 3,521 trading days. There were 2,405 stock issues with complete daily
return data.
The next step was to delete firms with low trading volumes. The calculation of abnormal
returns is complicated by non-synchronous trading, which occurs when stocks do not trade
all the way to the closing bell. To reduce this source of noise, stocks were deleted that had
more than 250 days with trading volumes of 100 shares or less in the period. This deletion
reduced the number of firms by 270, which left 2,135 firms each with 3,521 daily returns
for 14 years (N = 7,517,335).
For each stock issue, the first 250 days (essentially the calendar year 2000) were used as
the estimation period. Abnormal returns were calculated for all companies for Day 251.
Thereafter, the returns for Day 2 to Day 251 were used to calculate an abnormal return for
all companies for Day 252 and so on. With 3,521 days and a moving 250-day estimation
window, each company had 3,271 abnormal returns. The expected return in each case was
calculated as follows:
$$E(R_0 \mid R_{M0}) = \text{Intercept} + \text{Slope} \times R_{M0}, \tag{5}$$
where R is the return for the firm, RM is the return for the market, and the parameters
Intercept and Slope are related to the linear structure of the market model.
The abnormal return is the excess of the actual return, R0, over the expected return calculated using Equation 5. There were 6,983,585 abnormal returns (2,135 firms with abnormal returns for 3,271 days).
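The moving 250-day estimation window can be sketched as below. The study does not state how Intercept and Slope were estimated, so ordinary least squares is an assumption here, and the function names and the synthetic series are hypothetical.

```python
import math

def ols(x, y):
    """Least-squares intercept and slope for y regressed on x (assumed estimator)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def rolling_abnormal_returns(firm, market, window=250):
    """Expected returns (Equation 5) and abnormal returns for each day after the first window."""
    expected, abnormal = [], []
    for t in range(window, len(firm)):
        # Estimate the market model on the preceding `window` days only.
        intercept, slope = ols(market[t - window:t], firm[t - window:t])
        e = intercept + slope * market[t]
        expected.append(e)
        abnormal.append(firm[t] - e)    # actual return minus expected return
    return expected, abnormal

# Synthetic check: a firm that follows the market model exactly has no abnormal return.
market = [0.01 * math.sin(i) for i in range(300)]
firm = [0.001 + 1.2 * m for m in market]
expected, abnormal = rolling_abnormal_returns(firm, market)
print(len(abnormal), max(abs(a) for a in abnormal))
```

With 3,521 daily returns per firm, the same loop would produce 3,521 − 250 = 3,271 abnormal returns, which matches the count in the text.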
The expected return results are shown in Panel A of Figure 5. There is a near-perfect
conformity to Benford with a MAD of just 0.00032. The actual proportions marginally
exceed the Benford proportions in the 10 to 15 range, and the opposite occurs in the 16 to
47 range. The results are consistent, in that both the actual returns (with a MAD of 0.00046
in Figure 1) that were used to calculate the expected returns and the expected returns themselves conformed to Benford.
A histogram of the expected returns is shown in Panel B of Figure 5. Once again, the
Cauchy distribution provides a good fit to the data with an $r^2$ of .9804. The digits of the
abnormal returns are shown in Panel A of Figure 6.
The abnormal returns in Figure 6 have a remarkably close conformity to Benford with a
MAD of 0.00044. The actual proportions are almost perfectly monotonically decreasing.
A histogram of the abnormal returns is shown in Panel B. The histogram is slightly positively skewed, presumably because returns have no limit on the upside. The Cauchy distribution provides a close fit with an $r^2$ of .9994. These results suggest that researchers could
test their abnormal returns against Benford, and that nonconformity might indicate a systematic error in the calculations. However, conformity to Benford does not mean that the
sample is complete (a random sample of a Benford Set should give a Benford Set). It also
does not confirm that the researcher has correctly identified the event dates, and it
does not mean that the researcher has chosen the “best” stock index for the regression
models. Benford is not a guarantor of the validity of the study.
The tests were repeated using the CRSP Equal-Weighted (EW) index, and the results
(not shown) were nearly identical. A random index was simulated with a range of returns
uniformly distributed (−0.10, +0.10). The expected returns and the abnormal returns were calculated for each of the stocks in the same way as was done previously. The results
were surprising, in that the expected returns showed just a small increase in the MAD. The
fictional abnormal returns also had a close conformity to Benford with a MAD of just
0.00049, which is close to the previous results. Benford can therefore not be used to test
the accuracy of the market returns used in event studies.
Figure 5. First-two digits and histogram of expected returns.
Note. Panel A shows the first-two digits of the expected returns and Panel B shows a histogram of the expected returns.
Figure 6. First-two digits and histogram of abnormal returns.
Note. Panel A shows the first-two digits and Panel B shows a histogram of the abnormal returns.
The Benford–Cauchy fits in Figures 2, 5, and 6 indicate that there is a relationship
between the simple, symmetric, unimodal Cauchy distribution and conformity to Benford.
This Benford–Cauchy relationship was first documented by Rodriguez (2004).
Analysis of Data Subsets
Recent studies have divided economic populations into subsets. The conclusions were that
the worst fits to Benford were fraudulent. For example, Rauch, Göttsche, Brähler, and
Engel (‘‘RGBE,’’ 2011) analyze macroeconomic data for 16 countries for 11 years with
130 records per year for each country. The Greek data had the worst conformity to
Benford. They conclude that as data issues were identified by the European Commission,
this confirmed the effectiveness of Benford as a detector of such manipulations. They justify their subset approach by noting that “it is sufficient that the conditional probability of
a Benford distribution is higher for non-manipulated data than for manipulated data.”
The subset rankings depend, in part, on which reported amounts were included in the
analysis, on the chosen conformity measure, and on the time period analyzed. Also, willful
manipulation would mean that some data categories would be susceptible to overstatement
and others susceptible to an understatement. A country might want to understate its debt
and overstate some categories of social spending. With a small sample of highly aggregated
data, and with incentives to inflate some numbers and to deflate others, it is not clear that
conformity to Benford’s Law has any relationship to fraudulent manipulation. For example,
nonconformity could be caused by an expenditure (or income) line item that starts at (say)
9,000 and ends at 9,800. The amounts infuse the data with extra first digit 9s that inflate
the chi-square statistics. If the numbers had started at (say) 10,000 and had ended at
10,900, then the series would infuse the data with extra first digit 1s, but as 1s have a high
expected count, the effect on the chi-square statistic is muted.
The RGBE results show that Belgium, Austria, Ireland, and Finland had similar poor fits
to Benford’s Law, while Portugal had the second best fit to Benford. Table 1 shows the
RGBE rankings and adds the rank for each of the countries for the Institutions portion of
the Global Competitiveness Index (World Economic Forum, 2010 at www.weforum.org).
The Institutional rankings in Table 1 take into account the quality of the government’s
management of public finances and the administrative framework within which everyone
interacts to generate income and wealth in the economy. This presumably includes the
quality and accuracy of government statistics. The Spearman Rank Correlation coefficient
is .082, which is an insignificant (p = .762) correlation. The lack of any link between
Benford and fraud was confirmed by Gonzalez-Garcia and Pastor (2009) who tested conformity to Benford against the Reports on the Observance of Standards and Codes (see http://www.imf.org/external/NP/rosc/rosc.aspx) of the International Monetary Fund (IMF). They
found that macroeconomic data as a whole conform to Benford’s Law but that the
conformity of various subsets was not a reliable indicator of data quality. There was no
‘‘pattern of consistency’’ between conformity to Benford and the data quality ratings in the
IMF’s Reports. They concluded that nonconformity did not reliably signal poor quality
macroeconomic data.
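The rank correlation reported for Table 1 can be reproduced from the two columns of the table. The sketch below uses the textbook Spearman formula for untied ranks; the significance level is not recomputed here. The function names are illustrative.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n; assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result

def spearman_rho(x, y):
    """Spearman rank correlation for untied data: 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Table 1, in RGBE rank order from the Netherlands (1) to Greece (16).
rgbe = list(range(1, 17))
institutions = [12, 54, 9, 34, 26, 53, 50, 30, 92, 13, 89, 4, 24, 15, 29, 84]

rho = spearman_rho(rgbe, institutions)
print(round(rho, 3))
```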
Other studies have adopted the RGBE subset approach. Google Scholar (http://scholar.google.com) reports that the RGBE paper has 57 citations. Rauch, Göttsche, and
Langenegger (2014) analyzed military expenditures reported to the United Nations Office
for Disarmament Affairs. They concluded that the United States and the United Kingdom
have the lowest quality military data because they had the worst conformity to Benford.
The best fits to Benford were for Romania and Russia.
To show that factors other than fraud can affect the conformity of subsets, the CRSP
subsets were analyzed as if the test was meant to detect ‘‘CRSP fraud.’’ The stock returns
of individual companies were analyzed to see which stock price patterns generated the largest deviations from Benford. The stock prices of the four companies with the highest
MADs are shown in Panels A to D of Figure 7.
The companies in Panels A to D are all mutual funds with large holdings of fixed
income securities. Their stock prices were relatively stable from 2000 to the financial crisis
of 2008 when they showed only small declines compared with the rest of the market. The
stocks showed small steady price gains for 2011, a year when the overall market treaded
water. The results show that long-term price stability with small daily changes is one condition that produces large MADs for the stock returns. A histogram of the stock returns (not
shown) shows that the daily returns of these stocks are less dispersed than the conforming
returns (shown in Panels E and F). A review of the first-two digits of the daily returns for
the AllianceBernstein Income Fund (ACG; the highest MAD) shows large spikes at 12, 24,
and 36. These spikes are caused by the 0.01 changes with a stock price of $8.00 (a daily
return of 0.00125), the 0.02 changes with a stock price marginally above $8.00, and also
the 0.03 changes which are responsible for the spike at 36. The spikes are the result of
0.01, 0.02, and 0.03 changes for a stock price that is stuck just above $8.00.
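The arithmetic behind these spikes can be checked with a small sketch. The prices below are hypothetical values near $8.00 and are not taken from the ACG data. For a 0.01 change the first-two digits are 12 across this range, while 0.02 and 0.03 changes give 24 and 36 once the price sits between roughly $8.11 and $8.33.

```python
def first_two_digits(x):
    """First-two significant digits (10..99) of a nonzero number."""
    return int(f"{abs(x):.15e}".replace(".", "")[:2])

def tick_digits(price, ticks=(0.01, 0.02, 0.03)):
    """First-two digits of the daily returns produced by small price changes."""
    return [first_two_digits(t / price) for t in ticks]

# Hypothetical prices for a stock that is stuck just above $8.00.
for price in (8.00, 8.10, 8.20, 8.30):
    print(price, tick_digits(price))
```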
The Nuveen Municipal Value Fund (NUV) result in Panel B is also due to a low level
of volatility. The first-two digits showed a large spike at 10, and a review of the data
showed that this spike was caused by many 0.01 changes with a stock price marginally above
$9.00 (e.g., 0.01/9.20). In sharp contrast to the low volatility of the Panel A to D stocks, the
two stocks in Panels E and F have a high volatility, as can be seen from the stock
price ranges (Juniper Networks from $4.43 to $344.00 and Citigroup from $1.02 to
$77.44). The best company MAD of 0.0012 exceeds the population MAD of 0.00046. The
deviations from Benford in the 2,135 subset graphs offset each other to give a population
result that is better than the best subset. Stock price volatility (which has nothing to do
with fraud) is the driver for conformity to Benford for the individual companies.
Table 1. Conformity to Benford and Competitiveness Rankings.
Country        RGBE    Institutions
Netherlands       1              12
Portugal          2              54
Luxembourg        3               9
Malta             4              34
France            5              26
Spain             6              53
Slovenia          7              50
Cyprus            8              30
Italy             9              92
Germany          10              13
Slovakia         11              89
Finland          12               4
Ireland          13              24
Austria          14              15
Belgium          15              29
Greece           16              84
Note. The table shows the conformity to Benford rankings of RGBE and the Institutional Competitiveness rankings
of the World Economic Forum. RGBE = Rauch, Göttsche, Brähler, and Engel.
Figure 7. Stocks with a weak or strong conformity to Benford.
Note. Panels A to D show the stock prices for the four stocks with daily returns that have the weakest
conformity to Benford. Panels E and F show the stock prices of the two stocks (Juniper Networks and Citigroup)
with the best conformity to Benford each with MADs of 0.0012.
Alali and Romero (2013) divide a decade into five subsets: 2001-2002, 2003-2004, . . .,
and 2009-2010. They use Compustat financial statement numbers and the conformity of the
periods to Benford’s Law to ‘‘find different indicators of manipulation during the periods.’’
The authors note that they are using deviations from Benford, which might indicate manipulation. Following their approach, the daily returns were analyzed on a year-by-year basis
to identify the CRSP calendar years with the most ‘‘potential manipulation.’’ The total
number of records varied from 1.1 million records to 1.6 million records because of
changes in the number of stock issues, the deletion of returns of 0, and the variation in the
number of trading days per year. For each of the 14 years, the fit was excellent with MADs
ranging from 0.00051 to 0.00129. Once again, the best annual MAD of 0.00051 was
greater than the population MAD of 0.00046 (shown in Figure 1). Over the 14-year period,
the market suffered two large declines and two periods of growth, and it is remarkable that
the digit patterns remained stable from year to year. The highest MAD of 0.00129 was for
2000 and the second highest MAD of 0.00081 was for 2001. These high MADs were
caused by trading in eighths and sixteenths, which restricted the possible number of returns
(and the first-two digits of those returns) for any stock for any particular day. The high
MADs had nothing to do with fraud. The annual graphs (not shown) showed regular small
spikes at 20. There were excessive duplications of 0.020408. This return was caused by a
stock at $6.125 increasing to $6.25, $12.25 to $12.50, $24.50 to $25.00, $49.00 to $50.00,
and $98.00 to $100.00, and this ‘‘pricing in eighths increase’’ happened just enough times
to cause a spike (and a deviation on the graph).
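The price moves listed above all give the same return, which can be confirmed with exact fractions. This is a minimal sketch using only the prices quoted in the text.

```python
from fractions import Fraction

# (old price, new price) pairs quoted in the text.
moves = [("6.125", "6.25"), ("12.25", "12.50"), ("24.50", "25.00"),
         ("49.00", "50.00"), ("98.00", "100.00")]

# Exact return for each move: new / old - 1.
returns = [Fraction(new) / Fraction(old) - 1 for old, new in moves]

print(returns[0])                    # the exact return as a fraction
print(round(float(returns[0]), 6))   # the return at six decimal places
```

Every move equals 1/49, so the first-two digits of the recorded return are 20 in each case.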
The departures from Benford for the volume data can also be explained with non-fraud
reasons. The volume data in Figure 3 showed spikes at the multiples of 10. These MAD-
inflating spikes were a result of trading in lots of 100 and also the fact that the NYSE data
were reported to the nearest hundred shares for the day. For example, the actual volume
123 would be reported as 100. This would cause MAD-inflating spikes at the multiples of
10 for all actual daily volumes from 50 (rounded up to 100) to 1,049.
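The rounding effect can be illustrated with a minimal Python sketch. The half-up rounding rule and the function names are assumptions made for the illustration; the text states only that volumes were reported to the nearest hundred and that 50 was rounded up to 100.

```python
from collections import Counter

def round_to_hundred(volume):
    """Round half up to the nearest 100 shares (assumed rounding rule)."""
    return (volume + 50) // 100 * 100

def first_two_digits(n):
    """First-two digits of a positive integer of at least two digits."""
    return int(str(n)[:2])

# Every actual daily volume from 50 to 1,049 shares
reported = [round_to_hundred(v) for v in range(50, 1050)]

# Tally the first-two digits of the reported (rounded) volumes
counts = Counter(first_two_digits(r) for r in reported)
for digits in sorted(counts):
    print(digits, counts[digits])
```

All 1,000 reported volumes land on first-two digits that are multiples of 10, which is the source of the spikes described above.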
To show how a volume that treads water around an average can produce a large MAD, the daily
stock volumes of Apple Inc. (AAPL) and Facebook, Inc. (FB) were downloaded from
Yahoo Finance (finance.yahoo.com) from their first day of trading to the time of writing.
Apple was listed in December 1980 and its history includes the success of the Macintosh,
a period of decline, a return to profitability, a record market capitalization, and four stock
splits. An analysis of the stock volumes shows an acceptable conformity with a MAD of
0.00149 (and N = 8,552). In contrast, Facebook was listed in May 2012 and an analysis of
the daily volumes showed a MAD of 0.00596 (with N = 623), which is above the lower
bound of 0.0022 for nonconformity. The high MAD for Facebook was because one third of
the volumes were in the 30 million to 50 million range, which caused spikes in the 30 to
49 interval. A company whose volume is stuck in a narrow range will tend to
have a high MAD, which results from both the narrow volume range and the relatively small number of records.
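The narrow-range effect can be sketched in Python with synthetic numbers. The two series below are artificial and are not the Apple or Facebook data; the first-two digit MAD is computed here as the average absolute difference between actual and expected proportions over the 90 first-two digit combinations, which is an assumption about the calculation used in the paper.

```python
import math

# Benford's expected proportions for the first-two digits 10..99
EXPECTED = {d: math.log10(1 + 1 / d) for d in range(10, 100)}

def first_two_digits(x):
    """First-two significant digits (10-99) of a positive number."""
    s = f"{x:.10e}"               # e.g. '3.0020000000e+07'
    return int(s[0] + s[2])

def mad(values):
    """Mean absolute deviation between actual and Benford proportions."""
    n = len(values)
    counts = dict.fromkeys(EXPECTED, 0)
    for v in values:
        counts[first_two_digits(v)] += 1
    return sum(abs(counts[d] / n - EXPECTED[d]) for d in EXPECTED) / 90

# Synthetic series (hypothetical, for illustration only)
wide = [1000 * 10 ** (k / 2000) for k in range(8000)]    # four orders of magnitude
narrow = [30_000_000 + 20_000 * k for k in range(1000)]  # stuck between 30M and 50M

print(f"wide-range MAD:   {mad(wide):.5f}")
print(f"narrow-range MAD: {mad(narrow):.5f}")
```

The wide series stays below the 0.0022 bound while the narrow series exceeds it, even though neither series involves any manipulation.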
There are three conformity-related issues with using firm-year Compustat data. First,
Compustat replaces originally reported data with the new data when a company restates its
past results. Their data are a combination of original data that will remain unchanged,
restated numbers that have replaced some original data, and original data that will be changed
at some time in the future. It is not clear how this data mixture affects the conformity
of the subsets (the financial statements for a company for one year). Second, any subset
analysis should avoid the inclusion of totals or subtotals. For example, Microsoft’s 2012
Form 10-K shows that the three components of its inventories amount to $210 million, $96
million, and $831 million, respectively. Compustat shows INVRM, INVWIP, and INVFG
at these values, but it also has another field INVT (Total Inventories) at $1,137 million.
INVT is a derived total that cannot be manipulated on its own, and including this total in any analysis introduces an extra
first-two digits 11 into the MAD calculation. Including subtotals (and totals
such as total current assets) probably improves the conformity of the subsets overall
because these fields have close conformities to Benford. Third, Compustat data are standardized
to ensure “consistent and comparable data across companies, industries and business
cycles.” The 2012 Microsoft current liabilities include Securities lending payable of $814
and Other of $3,151. Compustat combines these amounts and shows Current Liabilities
Other (LCOXDR) of $3,965. Also, long-term unearned revenue of $1,406 and other long-
term liabilities of $8,208 are summed to give $9,614 for their field Liabilities-Other-Total
(LO). Standardization has the effect of replacing some financial statement numbers with
other numbers with the result that researchers end up using different digits for the MAD
calculation than would be the case if the actual reported numbers were used.
In an analysis of subsets, high MADs are generally associated with a small number of
records in a subset. A subset with only one record will have a first-two digit MAD of at
least 0.0213. With two records, the MAD will still be at least 0.0205. An analysis of the
382 Compustat Balance Sheet fields showed that, in general, those items that had a low
number of records had the highest MADs. The items that were applicable to the fewest
companies generally had the highest nonconformity. The correlation between N and the
MAD was -0.426, meaning that lower counts were associated with higher MADs. The carrying
value of common stock (CSTKCV) stood out from the group of Balance Sheet items
because it had a high count of 92,500 and a relatively high MAD of 0.0118. A review of
the data showed that this was because one half of the amounts were either $0.01 or $1.00.
This field is an anomaly because it is not a balance sheet line item that gets included in the
sum of assets or liabilities or equity. These amounts are not ledger balance dollar amounts.
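The lower bounds quoted above for subsets with one or two records can be reproduced by brute force. The sketch below assumes that the first-two digit MAD is the sum of the absolute differences between actual and expected proportions divided by 90; the function name is hypothetical.

```python
import math
from itertools import combinations_with_replacement

DIGITS = range(10, 100)
EXPECTED = {d: math.log10(1 + 1 / d) for d in DIGITS}

def mad_from_digits(digits):
    """First-two digit MAD for a list of first-two digit values."""
    n = len(digits)
    return sum(abs(digits.count(d) / n - EXPECTED[d]) for d in DIGITS) / 90

# Smallest achievable MAD over every possible one-record subset
best_one = min(mad_from_digits([d]) for d in DIGITS)

# Smallest achievable MAD over every possible two-record subset
best_two = min(mad_from_digits(list(pair))
               for pair in combinations_with_replacement(DIGITS, 2))

print(f"N = 1: {best_one:.4f}")
print(f"N = 2: {best_two:.4f}")
```

Under this assumption the minimums round to 0.0213 and 0.0205, so a very small subset cannot show conformity no matter which digits it contains.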
The high MADs for items with only a few records also apply to Compustat’s income
statement fields. The correlation between N and the MAD is -0.358, meaning that the items
used by only a few companies had the highest MADs. The exceptions (high MAD and high
N) were all for items related to EPS calculations that were variations on the standard EPS
calculation or the EPS effect of some loss or gain. None of these items were ledger balance
amounts. For example, S&P Core Earnings EPS Diluted (SPCED) had a high MAD
because of a pattern that was similar to, but more pronounced than, the one in Panel B in Figure 4. The
spikes at the multiples of 10 were caused by high counts for 0.01, 0.02, 0.09, and 0.10.
There were no cash flow anomalies (high MAD and high N), and the correlation between
the MAD and the number of records was -0.556. There were three anomalies for the miscellaneous
items, and these were all related to options (the life of options in years, the
risk-free rate, and the volatility assumption as a percentage).
Any subset analysis also suffers from a bluntness issue. A number such as 100 can be
increased by 99.99%, and it will still have the same first digit, namely 1. A number such as
900 can be increased by 11.1%, and it will still have the same first digit, namely 9.
Relatively large increases for 100 or 200 will leave the first digits unchanged. But numbers
such as 800 or 900 will change first digits for comparatively small increases. Using calculus,
it can be shown that a random number drawn from a Benford Set can be increased on
average by 22.86% without a change in the first digit. At the extreme, a company with a
close conformity can increase every number to the maximum extent possible (e.g.,
Microsoft can change its Other liabilities of $3,151 to $3,965 or $3,999) and the MAD
would remain unchanged. A close conformity firm can increase every number by an average
of 22.86% and the MAD would be unchanged. The scale-invariance property makes
this issue even more serious. If a company has a close conformity to begin with, then every
number can be multiplied by any constant and the MAD would remain unchanged.
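The 22.86% figure can be checked numerically. The derivation is not given in the text, so the sketch below rests on one assumption: a number x from a Benford Set has a mantissa whose base-10 logarithm is uniform on [0, 1), and with first digit d it can grow by at most (d + 1)/x - 1 before the first digit changes. Averaging that headroom gives (1 + 1/2 + ... + 1/9)/ln(10) - 1.

```python
import math

# Closed form under the stated assumption: H_9 / ln(10) - 1
closed_form = sum(1 / d for d in range(1, 10)) / math.log(10) - 1

# Numerical check: midpoint rule over the uniform log-mantissa u in [0, 1)
steps = 100_000
total = 0.0
for i in range(steps):
    x = 10 ** ((i + 0.5) / steps)          # mantissa in [1, 10)
    first_digit = math.floor(x)
    total += (first_digit + 1) / x - 1     # largest increase keeping the digit
numeric = total / steps

print(f"closed form: {closed_form:.4%}")
print(f"numeric:     {numeric:.4%}")
```

Both values agree with the 22.86% average stated above, which supports reading that figure as the mean headroom to the next first digit.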
The subset studies do not demonstrate mathematically that manipulations will always
increase the MAD. If the manipulation changes a first digit 2 (when the proportion of first digit 2s is at
17.6% or less) to a 3 (when the proportion of first digit 3s is already at 12.5% or higher), the effect will be
to increase the MAD. But, if the manipulation changes a first digit 2 (when there is a spike
at 2) to a first digit 3 (when the actual first digit 3 proportion is below 12.5%), the effect will
be to decrease the MAD. The effect on the MAD of a 2→3 change is therefore indeterminate.
If Microsoft erroneously reported Securities lending payable of $814 and Other
Liabilities of $3,965, the first digits of the two amounts would be unchanged at 8 and 3, respectively.
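The indeterminate direction of a change from a first digit 2 to a first digit 3 can be shown with two hypothetical sets of first-digit counts. The counts below are invented for the illustration, and the first-digit MAD is assumed to be the sum of absolute differences between actual and expected proportions divided by 9.

```python
import math

# Benford's expected first-digit proportions for digits 1..9
EXPECTED = [math.log10(1 + 1 / d) for d in range(1, 10)]

def first_digit_mad(counts):
    """First-digit MAD from a list of nine counts (digits 1..9)."""
    n = sum(counts)
    return sum(abs(c / n - e) for c, e in zip(counts, EXPECTED)) / 9

def move_2_to_3(counts, k):
    """Change k numbers from a first digit 2 to a first digit 3."""
    out = list(counts)
    out[1] -= k
    out[2] += k
    return out

# Hypothetical counts for 1,000 numbers each
shortfall_at_2 = [301, 170, 130, 97, 79, 67, 58, 51, 47]  # 2s at 17.0%, 3s at 13.0%
spike_at_2 = [301, 190, 110, 97, 79, 67, 58, 51, 47]      # 2s at 19.0%, 3s at 11.0%

for name, counts in [("shortfall at 2", shortfall_at_2), ("spike at 2", spike_at_2)]:
    before = first_digit_mad(counts)
    after = first_digit_mad(move_2_to_3(counts, 10))
    print(f"{name}: MAD {before:.5f} -> {after:.5f}")
```

The same manipulation raises the MAD in the first case and lowers it in the second, so the MAD alone cannot signal that the change occurred.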
There is also a disconnection issue because of the lack of a relationship between the
first digit of a number and the materiality of the number. Microsoft’s income statement
shows revenues of $73,723. An error that overstated revenues by $10,000 would change
the first digit 7 to 8. Their balance sheet reports an income tax liability of $789. An error
that overstated the tax liability by $100 would also change that first digit 7 to an 8. The
effect on the company MAD would be the same for a $100 million error and for a $10,000
million error. However, the effect of the tax error on EPS would be small but the effect of
the revenue error would be $1.19 (ignoring any income tax on the extra profits). The large
error is more likely to change the digits of the other affected numbers (net
income, accounts receivable, and retained earnings) but the effect of these secondary
changes on the MAD is indeterminate. For example, the net income would rise from
$16,978 to $26,978, and retained earnings would rise from $566 to $10,566. The loss of
the first digit 1 for net income would be offset by the gain in a first digit 1 for retained
earnings. The net effect would be a loss of a first digit 5 and a gain of a first digit 2, which
could move the MAD in either direction.
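The net digit changes in this example can be tallied with a few lines of Python. The amounts are those given above; the dictionary layout and variable names are assumptions made for the illustration.

```python
from collections import Counter

def first_digit(amount):
    """First digit of a positive integer amount."""
    return int(str(amount)[0])

# Secondary amounts before the $10,000 revenue overstatement (from the text)
before = {"net income": 16_978, "retained earnings": 566}
after = {name: value + 10_000 for name, value in before.items()}

digits_before = Counter(first_digit(v) for v in before.values())
digits_after = Counter(first_digit(v) for v in after.values())

# Counter subtraction keeps only the positive differences
net_lost = digits_before - digits_after
net_gained = digits_after - digits_before

print("first digits lost:  ", dict(net_lost))
print("first digits gained:", dict(net_gained))
```

The first digit 1 appears once before and once after, so it cancels and only the loss of a 5 and the gain of a 2 remain.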
The disconnection issue is a reminder that financial statement fraud can be achieved by
manipulating only a few financial statement line items. If revenue is inflated, it will inflate
accounts receivable, retained earnings, and perhaps income tax expense and income taxes
payable. It is unlikely that only five numeric changes will inflate a company’s MAD to the
extent that it becomes an outlier with a high MAD.
Conclusion
Daily stock returns and stock volumes have a close conformity to Benford’s Law, which
gives the expected frequencies of the various digits (0-9) in tabulated data. The expected
returns and the abnormal returns generated by accounting and finance event studies also
have a close conformity to Benford as do the financial statement amounts reported on the
Compustat database.
Some recent studies have divided a population into subsets, tested the subsets for conformity
to Benford, and concluded that the subsets with the weakest fit to Benford were fraudulent.
With this approach, the rankings depend on the time period chosen, the conformity
measure used, and the line items (financial statement line items or economic statistics)
used in the analysis. The rankings are also influenced by any standardization steps performed
on the data. This approach ignores valid non-fraud reasons for nonconformity and
the fact that the manipulation of a financial statement amount might have no effect on the
first digits. There is also no relationship between a change in the first digit and the magnitude
or the materiality of the error.
Diekmann and Jann (2010) state that to validly use Benford’s Law to detect fraud, one
has to demonstrate that correctly stated data conform to Benford’s Law, while manipulated
data follow a different distribution. Benford’s Law is most applicable to fraud schemes
where one person invents all the numbers and all the numbers are fictitious, such as the fraudulent
vendor scheme described in Nigrini (1999). Benford’s Law is also applicable when
many people all have the same incentive to manipulate numbers in the same way, and the
effect on the digits of those numbers is predictable. Examples would include the upper and
lower limits in the income tax code, and a good example is described in Christian and Gupta
(1993). Another example is the early rounding-up study of Carslaw (1988).
Author’s Note
The Compustat and the CRSP data are available from Wharton Research Data Services (WRDS) at
https://wrds-web.wharton.upenn.edu/wrds/
Acknowledgments
I hereby extend my thanks to the Editor-in-Chief Bharat Sarath, the associate editor, and the reviewer
for their insightful comments and suggestions. The article also benefitted from the comments of workshop
participants at West Virginia University and from discussions with Ted Hill, Steven Miller,
Dick Riley, and Jack Dorminey. I would also like to express my gratitude to my dissertation chairman
Wallace Wood and to Marty Levy on my dissertation committee for believing, all those years ago,
that the largely unknown phenomenon called Benford’s Law was worthy of being a dissertation topic.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/
or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this
article.
References
Alali, F., & Romero, S. (2013). Benford’s Law: Analyzing a decade of financial data. Journal of
Emerging Technologies in Accounting, 10, 1-39.
Ball, R., & Brown, P. (1968). An empirical evaluation of accounting income numbers. Journal of
Accounting Research, 6, 159-178.
Benford, F. (1938). The law of anomalous numbers. Proceedings of the American Philosophical
Society, 78, 551-572.
Berger, A., & Hill, T. (2015). An introduction to Benford’s Law. Princeton, NJ: Princeton University
Press.
Carslaw, C. (1988). Anomalies in income numbers: Evidence of goal oriented behavior. The
Accounting Review, 63, 321-327.
Christian, C., & Gupta, S. (1993). New evidence on “secondary evasion.” The Journal of the
American Taxation Association, 15(1), 72-93.
Diekmann, A., & Jann, B. (2010). Benford’s Law and fraud detection: Facts and legends. German
Economic Review, 11, 397-401.
Drake, P., & Nigrini, M. (2000). Computer assisted analytical procedures using Benford’s Law.
Journal of Accounting Education, 18, 127-146.
Fama, E. (1965a). The behavior of stock-market prices. The Journal of Business, 38, 34-105.
Fama, E. (1965b). Random walks in stock market prices. Financial Analysts Journal, 21(5), 55-59.
Gonzalez-Garcia, J., & Pastor, G. (2009, January). Benford’s Law and macroeconomic data quality
(IMF Working Paper No. 09/10). Washington, DC: International Monetary Fund.
Leemis, L., Schmeiser, B., & Evans, D. (2000). Survival distributions satisfying Benford’s Law. The
American Statistician, 54(3), 1-6.
Ley, E. (1996). On the peculiar distribution of the US stock indices first digits. The American
Statistician, 50, 311-314.
Malkiel, B. (1973). A random walk down Wall Street. New York, NY: W.W. Norton.
Nigrini, M. (1999). Fraud detection: I’ve got your number. Journal of Accountancy, 187(5), 79-83.
Nigrini, M. (2005). An assessment of the change in the incidence of earnings management around the
Enron-Andersen episode. Review of Accounting and Finance, 4(1), 92-110.
Nigrini, M. (2011). Forensic analytics: Methods and techniques for forensic accounting investigations.
Hoboken, NJ: John Wiley.
Nigrini, M., & Miller, S. (2007). Benford’s Law applied to hydrology data: Results and relevance to
other geophysical data. Mathematical Geology, 39, 469-490.
Nigrini, M., & Mittermaier, L. (1997). The use of Benford’s Law as an aid in analytical procedures.
Auditing: A Journal of Practice & Theory, 16(2), 52-67.
Pinkham, R. (1961). On the distribution of first significant digits. Annals of Mathematical Statistics,
32, 1223-1230.
Rauch, B., Göttsche, M., Brähler, G., & Engel, S. (2011). Fact and fiction in EU-governmental economic
data. German Economic Review, 12, 243-255.
Rauch, B., Göttsche, M., & Langenegger, S. (2014). Detecting problems in military expenditure data
using digital analysis. Defense and Peace Economics, 25, 97-111.
Rodriguez, R. (2004). Reducing false alarms in the detection of human influence on data. Journal of
Accounting, Auditing, & Finance, 19, 141-158.
Thomas, J. (1989). Unusual patterns in reported earnings. The Accounting Review, 64, 773-787.
Copyright of Journal of Accounting, Auditing & Finance is the property of Sage Publications Inc. and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use.