
Journal of Accounting, Auditing & Finance

2015, Vol. 30(4) 541–557 © The Author(s) 2015

Reprints and permissions: sagepub.com/journalsPermissions.nav

DOI: 10.1177/0148558X15584051 jaf.sagepub.com

Persistent Patterns in Stock Returns, Stock Volumes, and Accounting Data in the U.S. Capital Markets

Mark J. Nigrini 1

Abstract

Benford’s Law gives the expected frequencies of the digits in tabulated data. The expected frequencies show a large bias toward the low digits. An analysis of the Center for Research in Security Prices (CRSP) data shows that the daily returns have a near-perfect fit to Benford’s Law. The daily volumes also have a close fit to Benford’s Law but there are deviations due to round lot trading and the fact that some of the data are rounded to the nearest hundred. An analysis of Compustat data also shows a close fit to Benford’s Law with some explainable deviations. The expected returns and the abnormal returns used in event studies over an extended period showed that these numbers also conformed to Benford’s Law. Recent studies have divided a population into subsets and then tested the subsets for conformity to Benford’s Law. The conclusions are that the subsets with the weakest fit to Benford were fraudulent. The problems with this approach are discussed, and these include statistical considerations, issues with using Compustat data, other plausible explanations for a lack of conformity, and the fact that there is no clear link between a change in the leading digit of a number and the materiality of the dollar value of the change.

Keywords

Benford’s Law, stock returns, stock volumes, fraud detection, event studies

Introduction

Ball and Brown (1968) showed that there was a relationship between stock price changes

and the information contained in earnings reports. A few years earlier, Fama (1965a,

1965b) made the conceptual breakthrough of framing the random walks of stock prices as a

function of information flows. The random walk phrase was popularized by Malkiel

(1973). Google Scholar shows that these four studies have been cited more than 15,000

times confirming that the topics of stock prices and accounting data taken by themselves or

seen together are important areas of study. The objective of this study is to show that there

is the same consistent, persistent, and interesting pattern in the random walk of stock

prices, the stock volumes associated with those same stock prices, the expected and abnormal returns calculated in event studies, and in the numbers shown in earnings reports. This regularity, namely, Benford’s Law, has also been seen in naturally occurring earth science data.

1 West Virginia University, Morgantown, USA

Corresponding Author:

Mark J. Nigrini, College of Business and Economics, West Virginia University, Morgantown, WV 26506, USA.

In response to the growing number of studies that use Benford’s Law to identify financial statement fraud and economic statistics fraud, this study also shows that there are other possible non-fraud explanations for nonconformity to Benford’s Law. In addition to non-fraud explanations, there are methodological issues with using the first digits to identify manipulation, and statistical issues with using Compustat data.

Benford’s Law

Benford (1938) hypothesized that more real-world numbers started with 1s and 2s than started with 8s or 9s. He analyzed the first digits of numbers from diverse sources (such as the drainage areas of rivers, scientific constants, and population counts), and his results showed that 1 was the first digit 30.6% of the time and that 2 was the first digit 18.5% of the time. A positive number $x$ can be written as $S(x) \times 10^k$, where $S(x) \in [1, 10)$ is the significand and $k$ is an integer (called the exponent). For example, the number 1,964 can be written as $1.964 \times 10^3$. The integer part of the significand is the first digit. Zero, by definition, is inadmissible as a first digit. Benford made the assumption that the ordered values of a data set form a geometric sequence, and using calculus he developed the expected frequencies of the digits in tabulated data. The formulas are shown below, with $D_1$ representing the first digit and $D_1D_2$ representing the first-two digits of a number:

$$\mathrm{Prob}(D_1 = d_1) = \log\left(1 + \frac{1}{d_1}\right), \quad d_1 \in \{1, 2, \ldots, 9\}, \tag{1}$$

$$\mathrm{Prob}(D_1D_2 = d_1d_2) = \log\left(1 + \frac{1}{d_1d_2}\right), \quad d_1d_2 \in \{10, 11, 12, \ldots, 99\}, \tag{2}$$

where Prob indicates the probability of observing the event in parentheses, and log refers

to the common logarithm. For example, the probability of the first-two digits being 19 is

.0223 (log(1 + 1/19)). For the first-two digits there is a large bias toward the lower digits (1, 2, and 3). From the third digit onward, the probabilities are close to being uniform at .10 for any of the possible 10 digits, 0 to 9.
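
The expected proportions in Equations 1 and 2 can be tabulated directly. The following is a minimal sketch in Python; the function names are illustrative and are not taken from the study.

```python
import math

def first_digit_probs():
    """Expected first-digit proportions (Equation 1), keyed by digit 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def first_two_digit_probs():
    """Expected first-two digit proportions (Equation 2), keyed by 10..99."""
    return {d: math.log10(1 + 1 / d) for d in range(10, 100)}

p1 = first_digit_probs()
p2 = first_two_digit_probs()

# The worked example in the text: first-two digits equal to 19.
print(round(p2[19], 4))

# Each set of proportions sums to 1 because the logs telescope.
print(round(sum(p1.values()), 10), round(sum(p2.values()), 10))
```

The sums equal 1 because the terms log(d + 1) - log(d) telescope to log(10) - log(1) for the first digits and to log(100) - log(10) for the first-two digits.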

The basis of Benford’s Law is that the mantissas of the logs of the numbers are uniformly distributed. For example, the logarithm of 1,964 is 3.2931 and the mantissa is the 0.2931 fractional part and the characteristic is the integer value of 3. Leemis, Schmeiser, and Evans (‘‘LSE,’’ 2000) state that if $W$ is uniformly distributed $U(a, b)$, where $a$ and $b$ are real numbers with $a < b$, and if the interval $(10^a, 10^b)$ covers an integer number of orders of magnitude, then the first digit of the random variable $T = 10^W$ satisfies Benford’s Law (‘‘Benford’’) exactly. They presumably meant that the distribution of the digits of the possible values of $T$ would conform to Benford. So if $b - a$ is an integer, and if the logarithms are uniformly distributed, then the exponentiated numbers ($10^W$) will conform to Benford. When the logs of the numbers are uniformly distributed, the numbers themselves will form a perfect, or a near-perfect, geometric sequence of the form,


$$S_n = ar^{n-1}, \tag{3}$$

where a is the first element of the sequence and r is the ratio of the (n + 1)th element divided by the nth element. A geometric sequence with N elements will have n spanning

the range [1, 2, 3, . . ., N].
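
The link between a geometric sequence and Benford’s Law can be illustrated numerically. The sketch below builds a geometric sequence whose logs are evenly spaced over exactly three orders of magnitude and compares its first-digit proportions with Equation 1. The parameters (N = 30,000, a = 1, three orders of magnitude) are assumptions chosen for illustration and do not come from the study.

```python
import math
from collections import Counter

def geometric_sequence(a, r, n_elements):
    """Return S_n = a * r**(n - 1) for n = 1..N (Equation 3)."""
    return [a * r ** (n - 1) for n in range(1, n_elements + 1)]

def first_digit(x):
    """First significant digit, read from scientific notation."""
    return int(f"{abs(x):.15e}"[0])

# Hypothetical parameters: the sequence spans exactly three orders of
# magnitude, so its log10 values are evenly spaced on [0, 3).
N = 30_000
ratio = 10 ** (3 / N)
sequence = geometric_sequence(1.0, ratio, N)

counts = Counter(first_digit(x) for x in sequence)
max_gap = max(
    abs(counts[d] / N - math.log10(1 + 1 / d)) for d in range(1, 10)
)
print(f"largest first-digit gap from Benford: {max_gap:.6f}")
```

The gap is small because evenly spaced logs are a discrete stand-in for uniformly distributed mantissas.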

Benford’s Law has some interesting properties. The scale-invariance property (Pinkham, 1961) states that if the numbers in a Benford Set (a set of numbers that conforms to Benford’s Law) were all multiplied by a (nonzero) constant, then the new data set would also be a Benford Set. The implication is that if Benford’s Law applies to stock market or accounting data, then it should do so regardless of the source currency. The law is also base invariant, which means that if the numbers in a Benford Set were converted to (say) base 8 (where 1,964 becomes 3,654) and if the expected probabilities in Equations 1 and 2 were recalculated, then the base 8 numbers will conform to the base 8 probabilities (Berger & Hill, 2015). The law is also power invariant in that if each numeric value in a Benford Set is raised to a power in the sequence {1.5, 2.0, 2.5, 3.0, . . .}, then the new data set would also be a Benford Set. This is a variation on the scale-invariance property.
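
Scale invariance can be checked on a synthetic Benford Set. The following sketch assumes a set with evenly spaced logs and an arbitrary multiplier of 0.79 (a stand-in for a currency conversion rate); neither is taken from the study. It also confirms the base 8 representation of 1,964 quoted above.

```python
import math
from collections import Counter

def first_digit(x):
    """First significant digit, read from scientific notation."""
    return int(f"{abs(x):.15e}"[0])

def max_first_digit_gap(values):
    """Largest absolute gap between actual and Benford first-digit proportions."""
    counts = Counter(first_digit(v) for v in values)
    n = len(values)
    return max(abs(counts[d] / n - math.log10(1 + 1 / d)) for d in range(1, 10))

# Hypothetical Benford Set: logs evenly spaced over three orders of magnitude.
benford_set = [10 ** (3 * i / 30_000) for i in range(30_000)]
rescaled = [0.79 * x for x in benford_set]  # 0.79 is an arbitrary constant

gap_original = max_first_digit_gap(benford_set)
gap_rescaled = max_first_digit_gap(rescaled)
print(f"{gap_original:.6f} {gap_rescaled:.6f}")

# Base 8 digits of 1,964, as quoted in the text.
base8 = format(1964, "o")
print(base8)
```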

Nigrini and Miller (2007) analyzed streamflow records for 140 years. Their data conformed almost perfectly to Benford using the Mean Absolute Deviation (MAD) as the conformity measure. The formula is shown in Equation 4:

$$\text{Mean absolute deviation} = \frac{\sum_{i=1}^{K} \lvert AP - EP \rvert}{K}, \tag{4}$$

where EP denotes the expected proportion, AP the actual proportion, and K represents the

number of bins (which equals 90 for the first-two digits).

The streamflow MAD of 0.00013 meant that there were only small differences between

the actual proportions and the Benford proportions. There is no measure of significance for

the MAD, but Nigrini (2011) contains a table that states that MAD values from 0 to 0.0012

constitute a close conformity to Benford.
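
A first-two digits MAD along the lines of Equation 4 might be computed as follows. This is a sketch only: the digit extraction through scientific-notation formatting is an implementation choice made here, and the sample data are synthetic rather than the streamflow records.

```python
import math
from collections import Counter

CLOSE_CONFORMITY_UPPER = 0.0012  # first-two digits bound cited from Nigrini (2011)

def first_two_digits(x):
    """Return the first-two digits (10..99) of a nonzero number."""
    text = f"{abs(x):.15e}"  # e.g. '1.964000000000000e+03'
    return int(text[0] + text[2])

def mad_first_two(values):
    """Mean absolute deviation over the K = 90 first-two digit bins."""
    counts = Counter(first_two_digits(v) for v in values)
    n = sum(counts.values())
    total = 0.0
    for d in range(10, 100):
        expected = math.log10(1 + 1 / d)   # EP
        actual = counts.get(d, 0) / n      # AP
        total += abs(actual - expected)
    return total / 90

# Synthetic sample with evenly spaced logs (an assumption for illustration).
sample = [10 ** (3 * i / 30_000) for i in range(30_000)]
mad = mad_first_two(sample)
print(f"MAD = {mad:.5f}, close conformity: {mad <= CLOSE_CONFORMITY_UPPER}")
```

Reading the digits from a formatted string avoids the truncation errors that can occur when a value such as 0.29 is multiplied by a power of 10 and cast to an integer.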

Stock Returns

Ley (1996) analyzed the daily returns of the Dow Jones Industrial Average (DJIA) from

1900 to 1993 and the daily returns of the S&P index from 1926 to 1993. The first digits

showed a reasonable conformity to Benford. The MAD values were 0.0047 and

0.0043, which constitute a close conformity result. Rodriguez (2004) analyzed capital

market data, and his results showed that the annual rates of return (from the Ibbotson

stocks, bonds, bills, and inflation data) conformed to Benford, but with only 76 records in

the data set, the chi-square test is somewhat forgiving. His results also showed that the

daily returns of the DJIA did not conform to Benford.

The daily returns of security issues in the Center for Research in Security Prices (CRSP)

database were analyzed. The data used were the Stock/Security Files in the Annual Update

file. The options selected were as follows:

Date range: 1/1/2000 to 12/31/2013.

Company codes: ‘‘Search the entire database.’’


Conditional statements: Share Code (shrcd) < 20.

Time series information: Price, Holding Period Return.

The query produced a table with 20,174,725 records. The stock returns are reported to six

decimal places. Records with an absolute value less than 0.00001 were deleted because

values from 0.000001 to 0.000009 do not have explicit first-two digits. A number recorded

as 0.000008 could be any number from 0.00000750 to 0.00000849. Some 1.15 million

returns were equal to zero, 3,200 returns were values from 0.000001 to 0.000009, and

360,000 (American Stock Exchange [AMEX]) returns were missing (null). This left N =

18,667,795. The first-two digits test was first used by Nigrini and Mittermaier (1997) because

it is more informative than the first digits test. The results are shown in Panel A of Figure 1.

The monotonically decreasing line in Panel A represents the expected proportions of Benford’s Law. The Benford proportions start at 0.0414 at 10 on the x axis and decrease steadily to a low of 0.0044. The bars show the actual proportions and the bar at 50 indicates that the actual proportion for 50 was 0.0098. The first-two digits have a close conformity to Benford with a MAD of 0.00046. There are small visible spikes (excesses) at 20 and 25 and a slight tendency for systematic spikes at the multiples of 10. The absolute daily returns that occurred most often were 0.04, 0.05, 0.025, 0.041667, 0.033333, 0.066667, and 0.047619. Each of these values occurred between 25,000 and 29,000 times. With 18.7 million records, the spikes caused by these number duplications were small.
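
The record deletions and the number duplications count described above could be sketched as follows. The input list is made up for illustration, and the function names are hypothetical; the actual CRSP extraction and tooling are not described in the source.

```python
from collections import Counter

# Below this absolute value, six-decimal returns lack explicit first-two digits.
MIN_ABS_RETURN = 0.00001

def clean_returns(raw_returns):
    """Drop missing values, zeros, and absolute values below 0.00001."""
    kept = []
    for r in raw_returns:
        if r is None:                 # missing (null) return
            continue
        if abs(r) < MIN_ABS_RETURN:   # zero, or 0.000001 to 0.000009
            continue
        kept.append(r)
    return kept

def most_common_abs_returns(returns, top=3):
    """Number duplications test: most frequent absolute returns."""
    return Counter(round(abs(r), 6) for r in returns).most_common(top)

raw = [0.04, -0.04, 0.0, None, 0.000008, 0.05, 0.041667, -0.05, 0.04]
kept = clean_returns(raw)
duplicates = most_common_abs_returns(kept)
print(len(kept), duplicates)
```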

A plot of the ordered logs of the daily returns is shown in Panel B of Figure 1. Daily returns present some issues when it comes to an analysis of the logs because the log of a negative number is undefined. This issue was solved by taking the logs of the absolute values of the daily returns. The ‘‘log’’ of −0.01 was calculated to be −2.00, and so −2.00 was the ‘‘log’’ of both −0.01 and 0.01. The graph of the ordered logs is upward sloping as would be expected from ordered values. This graph has the same shape as the streamflow graph in Nigrini and Miller (2007). The digit patterns (and the log patterns) of the daily returns are the same patterns that were observed in the earth science data.
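
The ordered-logs calculation can be sketched in a few lines. The sample returns are made up, and skipping zero returns is an assumption made here because the log of zero is undefined.

```python
import math

def ordered_logs(returns):
    """Sorted log10 of absolute returns; zeros are skipped (log undefined)."""
    return sorted(math.log10(abs(r)) for r in returns if r != 0)

sample_returns = [-0.01, 0.01, 0.05, -0.002, 0.0]  # made-up values
logs = ordered_logs(sample_returns)
print([round(v, 4) for v in logs])
```

Both −0.01 and 0.01 map to the same log value, which is how repeated absolute returns produce flat segments in a plot of ordered logs.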

The next step in the analysis was the preparation of a histogram of the returns. The histogram was based on 19,819,978 returns after the deletion of the null values and is shown in Figure 2.

Figure 1. First-two digits and ordered logs of daily returns. Note. Panel A shows the line of Benford’s Law and the actual proportions as bars of the first-two digits of the daily

returns, and Panel B shows the ordered logs of the daily returns over the same period.


The histogram shows 125 intervals (with an interval width of 0.0016) from −0.10 to +0.10. The large spike in the center of the graph is the [0.0000, 0.0016) interval with a proportion of 0.086. The proportion of returns that were negative was 0.4741, whereas the proportion of returns that were positive was 0.4680, with slightly less than 6% of the returns being equal to exactly 0. The median return was 0.

The data have a near-perfect fit to Benford. Berger and Hill (2015) note that

none of the familiar classical probability distributions or random variables, such as e.g. normal, uniform, exponential, beta, binomial, or gamma distributions are Benford. Specifically, no uniform distribution is even close to Benford, no matter how large its range or how it is centered. (p. 36)

They also note that an exponential distribution with a mean equal to 1 comes close to being Benford, and that ‘‘a log-normal random variable with large variance (compared with its mean) is practically indistinguishable from a Benford random variable.’’

The histogram was analyzed using the curve fitting function of the software package TableCurve 2D. The best fitting density function was the Pearson IV distribution. This distribution was ignored because the pdf is complex and can fit almost any set of continuous data. The best fit from the familiar distributions was the Cauchy distribution with an $r^2$ of .872. The fitted Cauchy distribution is the line shown in Figure 2, which is a simple, symmetric, unimodal shape that is similar in shape to the standard normal pdf.

Stock Volumes

The query using the stock returns options returned 20,174,725 records. Daily volumes less than 10 shares for the day were deleted because the integers from 1 to 9 do not have explicit first-two digits, leaving 19,120,349 records. The digit patterns are shown in Panel A of Figure 3.

Figure 2. Daily stock returns and a fitted Cauchy distribution. Note. A histogram of the ordinary share stock returns and a fitted curve from the Cauchy distribution.

The digits of the daily volumes show a close conformity with a MAD of 0.00070. MADs less than 0.0012 qualify as being close conformity. There are systematic spikes at the multiples of 10 (10, 20, . . ., 90). A review of the number frequencies shows that the daily volumes that occurred most often were 100, 200, 500, 1,000, 300, 400, and 600. It seems that the high frequency volumes are the result of investors avoiding odd-lot trading. The odd-lot premium has almost disappeared in recent years. The CRSP documentation also reports that ‘‘our source for the NYSE/AMEX reports the numbers rounded to the nearest hundred.’’ The systematic spikes are there because of a tendency to trade in multiples of 100 and the fact that the New York Stock Exchange (NYSE) reports rounded numbers.

The ordered (ranked from smallest to largest) logs of the stock volumes are shown in Panel B of Figure 3. The line is either upward sloping or (as can be seen at y = 2) it has a short segment with a zero slope. This happens when a number (such as 100, which is $10^2$) has a high enough frequency to cause a visible horizontal segment. This shape is similar to the ordered log pattern for the streamflow data in Nigrini and Miller (2007).

Accounting Data

Prior studies have analyzed accounting data for conformity to Benford. The dollar amounts

of the invoices approved for payment by a NYSE-listed company were analyzed in Nigrini

and Mittermaier (1997), and the dollar amounts of the invoices approved for payment by a

software company were analyzed in Drake and Nigrini (2000). Carslaw (1988), Thomas

(1989), and Nigrini (2005) analyzed earnings releases for signs of rounding up around psychological reference points (such as $100 million).

The Compustat data used were the Fundamentals Annual section of the North America/Monthly Updates. The options selected were all fiscal years from 2000 to 2013 and all 382 Balance Sheet items, 328 Income Statement items, 66 Cash Flow items, and 114 Miscellaneous items.

The data were downloaded in four groups. Each group had 157,000 firm years.

Companies do not use every item in every group. Some fields were used by 130,000 firms,

whereas others were unused or were used by fewer than 1,000 firms. The digits of the

accounting data are shown in Figure 4.

Figure 3. First-two digits and ordered logs of daily stock volumes. Note. Panel A shows the first-two digits of the daily volumes and Panel B shows the ordered logs of the daily volumes.


Compustat reports amounts in millions to three decimal places. Amounts from 0.001 to 0.009 were deleted because these amounts do not have an explicit second digit. An amount reported as 0.002 could be any number from 0.00150 to 0.00249. The Compustat data have MADs ranging from 0.00037 for the Cash Flow items to 0.00166 for the Income Statement items. If the data are aggregated (N = 19,189,396), the result is a MAD of 0.00085, which is comfortably below the 0.0012 upper bound for close conformity. A plot of the ordered logs shows the same patterns as can be seen in Figures 1 and 3.

A number duplications test shows that the following 12 amounts occurred most often:

All Compustat            Income Statement
Amount      Count        Amount (US$)      Count
0.010     179,131        0.01            123,296
0.020      97,617        0.02             83,069
0.030      76,473        0.03             64,479
0.040      62,110        0.04             51,857
0.050      57,298        0.05             44,771
0.100      53,114        0.06             38,348
0.060      46,926        0.07             33,070
0.070      40,388        0.08             29,869
1.000      39,429        0.10             29,858
0.080      37,100        0.09             26,917
0.090      33,567        0.11             22,198
0.200      31,903        0.12             21,520

Figure 4. First-two digits of Compustat data. Note. The graphs show the first-two digits of the Compustat financial statement items.


The Income Statement items have disproportionately high counts for the 0.01 to 0.10

numbers. These numbers caused the systematic spikes at 10, 20, 30, . . . , 90. A review of

the items with high 0.01 to 0.10 counts shows that these items are all related to various

Earnings Per Share (EPS) calculations or items showing the EPS effect of transactions

(e.g., GLEPS Gain/Loss Basic EPS Effect). These EPS items caused the Income Statement

to have the largest MAD.

Abnormal Stock Returns

Event studies are used in accounting and finance to study the impact of an event on the

daily (or monthly) return of a security. Events could include the promulgation of a new

accounting rule, a merger, or a dividend announcement. These studies partition the return

into the part due to the new news (the event) and the part due to macroeconomic news.

The part of the daily return that is due to the event is called the abnormal return.

The first step in this analysis was to delete the records for stocks that did not have daily

returns for the full 14-year period from 2000 to 2013 (both years inclusive). Including

stocks with random within-period starting or ending dates would have added a layer of

complexity to the programming that might have introduced errors in the results. The 14-year period had 3,521 trading days. There were 2,405 stock issues with complete daily

return data.

The next step was to delete firms with low trading volumes. The calculation of abnormal

returns is complicated by non-synchronous trading, which occurs when stocks do not trade

all the way to the closing bell. To reduce this source of noise, stocks were deleted that had

more than 250 days with trading volumes of 100 shares or less in the period. This deletion

reduced the number of firms by 270, which left 2,135 firms each with 3,521 daily returns

for 14 years (N = 7,517,335).

For each stock issue, the first 250 days (essentially the calendar year 2000) were used as the estimation period. Abnormal returns were calculated for all companies for Day 251.

Thereafter, the returns for Day 2 to Day 251 were used to calculate an abnormal return for

all companies for Day 252 and so on. With 3,521 days and a moving 250-day estimation

window, each company had 3,271 abnormal returns. The expected return in each case was

calculated as follows:

$$E(R_0 \mid R_{M0}) = \text{Intercept} + \text{Slope} \times R_{M0}, \tag{5}$$

where R is the return for the firm, RM is the return for the market, and the parameters

Intercept and Slope are related to the linear structure of the market model.

The abnormal return is the excess of the actual return, $R_0$, over the expected return calculated using Equation 5. There were 6,983,585 abnormal returns (2,135 firms with abnormal returns for 3,271 days).
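
The moving-window calculation can be sketched for a single stock as follows. The sketch assumes that Intercept and Slope are estimated by ordinary least squares over the preceding 250 days, which the text does not state explicitly, and it uses simulated returns in place of CRSP data.

```python
import random

def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def rolling_abnormal_returns(stock, market, window=250):
    """Expected (Equation 5) and abnormal returns from a moving window."""
    expected, abnormal = [], []
    for t in range(window, len(stock)):
        # Estimate on the 250 days that precede day t (OLS is an assumption).
        intercept, slope = ols_fit(market[t - window:t], stock[t - window:t])
        e = intercept + slope * market[t]
        expected.append(e)
        abnormal.append(stock[t] - e)   # actual minus expected
    return expected, abnormal

# Simulated returns for one hypothetical stock over 3,521 trading days.
rng = random.Random(0)
n_days = 3521
market = [rng.gauss(0.0, 0.01) for _ in range(n_days)]
stock = [0.0002 + 1.1 * m + rng.gauss(0.0, 0.02) for m in market]

expected, abnormal = rolling_abnormal_returns(stock, market)
print(len(abnormal))  # 3,521 days less the 250-day estimation window
```

With 3,521 days and a 250-day window, the function returns 3,271 abnormal returns per stock, which matches the count given in the text.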

The expected return results are shown in Panel A of Figure 5. There is a near-perfect

conformity to Benford with a MAD of just 0.00032. The actual proportions marginally

exceed the Benford proportions in the 10 to 15 range, and the opposite occurs in the 16 to

47 range. The results are consistent, in that both the actual returns (with a MAD of 0.00046 in Figure 1) that were used to calculate the expected returns and the expected returns themselves conformed to Benford.


A histogram of the expected returns is shown in Panel B of Figure 5. Once again, the Cauchy distribution provides a good fit to the data with an $r^2$ of .9804. The digits of the abnormal returns are shown in Panel A of Figure 6.

The abnormal returns in Figure 6 have a remarkably close conformity to Benford with a MAD of 0.00044. The actual proportions are almost perfectly monotonically decreasing. A histogram of the abnormal returns is shown in Panel B. The histogram is slightly positively skewed, presumably because returns have no limit on the upside. The Cauchy distribution provides a close fit with an $r^2$ of .9994. These results suggest that researchers could test their abnormal returns against Benford, and that nonconformity might indicate a systematic error in the calculations. However, conformity to Benford does not mean that the sample is complete (a random sample of a Benford Set should give a Benford Set). It also does not confirm that the researcher has correctly identified the event dates, and it does not mean that the researcher has chosen the ‘‘best’’ stock index for the regression models. Benford is not a guarantor of the validity of the study.

The tests were repeated using the CRSP Equal-Weighted (EW) index, and the results (not shown) were nearly identical. A random index was simulated with a range of returns uniformly distributed over (−0.10, +0.10). The expected returns and the abnormal returns were calculated for each of the stocks in the same way as was done previously. The results were surprising, in that the expected returns showed just a small increase in the MAD. The fictional abnormal returns also had a close conformity to Benford with a MAD of just 0.00049, which is close to the previous results. Benford therefore cannot be used to test the accuracy of the market returns used in event studies.

Figure 5. First-two digits and histogram of expected returns. Note. Panel A shows the first-two digits of the expected returns and Panel B shows a histogram of the expected returns.

Figure 6. First-two digits and histogram of abnormal returns. Note. Panel A shows the first-two digits and Panel B shows a histogram of the abnormal returns.

The Benford–Cauchy fits in Figures 2, 5, and 6 indicate that there is a relationship

between the simple, symmetric, unimodal Cauchy distribution and conformity to Benford.

This Benford–Cauchy relationship was first documented by Rodriguez (2004).

Analysis of Data Subsets

Recent studies have divided economic populations into subsets. The conclusions were that

the worst fits to Benford were fraudulent. For example, Rauch, Göttsche, Brähler, and

Engel (‘‘RGBE,’’ 2011) analyze macroeconomic data for 16 countries for 11 years with

130 records per year for each country. The Greek data had the worst conformity to

Benford. They conclude that as data issues were identified by the European Commission, this confirmed the effectiveness of Benford as a detector of such manipulations. They justify their subset approach by noting that ‘‘it is sufficient that the conditional probability of a Benford distribution is higher for non-manipulated data than for manipulated data.’’

The subset rankings depend, in part, on which reported amounts were included in the

analysis, on the chosen conformity measure, and on the time period analyzed. Also, willful

manipulation would mean that some data categories would be susceptible to overstatement

and others susceptible to an understatement. A country might want to understate its debt

and overstate some categories of social spending. With a small sample of highly aggregated

data, and with incentives to inflate some numbers and to deflate others, it is not clear that

conformity to Benford’s Law has any relationship to fraudulent manipulation. For example,

nonconformity could be caused by an expenditure (or income) line item that starts at (say)

9,000 and ends at 9,800. The amounts infuse the data with extra first digit 9s that inflate

the chi-square statistics. If the numbers had started at (say) 10,000 and had ended at

10,900, then the series would infuse the data with extra first digit 1s, but as 1s have a high

expected count, the effect on the chi-square statistic is muted.

The RGBE results show that Belgium, Austria, Ireland, and Finland had similar poor fits

to Benford’s Law, while Portugal had the second best fit to Benford. Table 1 shows the RGBE rankings and adds the rank for each of the countries for the Institutions portion of the Global Competitiveness Index (World Economic Forum, 2010 at www.weforum.org).

The Institutional rankings in Table 1 take into account the quality of the government’s

management of public finances and the administrative framework within which everyone

interacts to generate income and wealth in the economy. This presumably includes the

quality and accuracy of government statistics. The Spearman Rank Correlation coefficient

is .082, which is an insignificant (p = .762) correlation. The lack of any link between Benford and fraud was confirmed by Gonzalez-Garcia and Pastor (2009) who tested conformity to Benford against the Reports on the Observance of Standards and Codes (see http://www.imf.org/external/NP/rosc/rosc.aspx) of the International Monetary Fund (IMF). They found that macroeconomic data as a whole conform to Benford’s Law but that the conformity of various subsets was not a reliable indicator of data quality. There was no ‘‘pattern of consistency’’ between conformity to Benford and the data quality ratings in the IMF’s Reports. They concluded that nonconformity did not reliably signal poor quality macroeconomic data.
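
The Spearman coefficient of .082 can be reproduced from the two columns of Table 1. The sketch below re-ranks the Institutions values from 1 to 16 within the sample and applies the no-ties formula; the helper names are illustrative, and the p value is not recomputed here.

```python
# (country, RGBE rank, Institutions rank) as listed in Table 1.
table_1 = [
    ("Netherlands", 1, 12), ("Portugal", 2, 54), ("Luxembourg", 3, 9),
    ("Malta", 4, 34), ("France", 5, 26), ("Spain", 6, 53),
    ("Slovenia", 7, 50), ("Cyprus", 8, 30), ("Italy", 9, 92),
    ("Germany", 10, 13), ("Slovakia", 11, 89), ("Finland", 12, 4),
    ("Ireland", 13, 24), ("Austria", 14, 15), ("Belgium", 15, 29),
    ("Greece", 16, 84),
]

def ranks(values):
    """Rank values from 1 (smallest) upward; assumes there are no ties."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

def spearman_rho(x, y):
    """Spearman rank correlation using the no-ties formula."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

rgbe = [row[1] for row in table_1]
institutions = [row[2] for row in table_1]
rho = spearman_rho(rgbe, institutions)
print(round(rho, 3))
```

The sum of squared rank differences is 624, so the coefficient is 1 − 6(624)/(16 × 255), or about .082.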


Other studies have copied their subset approach. Google Scholar (http://scholar.google.com) reports that the RGBE paper has 57 citations. Rauch, Göttsche, and

Langenegger (2014) analyzed military expenditures reported to the United Nations Office

for Disarmament Affairs. They concluded that the United States and the United Kingdom

have the lowest quality military data because they had the worst conformity to Benford.

The best fits to Benford were for Romania and Russia.

To show that factors other than fraud can affect the conformity of subsets, the CRSP

subsets were analyzed as if the test was meant to detect ‘‘CRSP fraud.’’ The stock returns

of individual companies were analyzed to see which stock price patterns generated the largest deviations from Benford. The stock prices of the four companies with the highest

MADs are shown in Panels A to D of Figure 7.

The companies in Panels A to D are all mutual funds with large holdings of fixed

income securities. Their stock prices were relatively stable from 2000 to the financial crisis

of 2008 when they showed only small declines compared with the rest of the market. The

stocks showed small steady price gains for 2011, a year when the overall market treaded

water. The results show that long-term price stability with small daily changes is one condition that produces large MADs for the stock returns. A histogram of the stock returns (not

shown) shows that the daily returns of these stocks are less dispersed than the conforming

returns (shown in Panels E and F). A review of the first-two digits of the daily returns for

the AllianceBernstein Income Fund (ACG; the highest MAD) shows large spikes at 12, 24,

and 36. These spikes are caused by the 0.01 changes with a stock price of $8.00 (a daily

return of 0.00125), the 0.02 changes with a stock price marginally above $8.00, and also

the 0.03 changes which are responsible for the spike at 36. The spikes are the result of

0.01, 0.02, and 0.03 changes for a stock price that is stuck just above $8.00.
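
The arithmetic behind the spikes at 12, 24, and 36 can be checked with a short sketch. The price of $8.20 is a hypothetical value standing in for a price ‘‘just above $8.00’’; it is not taken from the ACG data.

```python
def first_two_digits(x):
    """Return the first-two digits (10..99) of a nonzero number."""
    text = f"{abs(x):.15e}"
    return int(text[0] + text[2])

price = 8.20  # hypothetical price just above $8.00
spikes = []
for tick in (0.01, 0.02, 0.03):
    daily_return = tick / price
    spikes.append(first_two_digits(daily_return))
    print(tick, round(daily_return, 6), first_two_digits(daily_return))
```

When the price barely moves, one-, two-, and three-cent changes map to the same few first-two digit bins day after day, which concentrates the returns in those bins.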

The Nuveen Municipal Value Fund (NUV) result in Panel B is also due to a low level of volatility. The first-two digits showed a large spike at 10, and a review of the data showed that this was caused by many 0.01 changes with a stock price marginally above 9.00 (e.g., 0.01/9.20). In sharp contrast to the low volatility of the stocks in Panels A to D, the two stocks in Panels E and F have a high volatility as can be seen by looking at the stock price ranges (Juniper Networks from $4.43 to $344.00 and Citigroup from $1.02 to $77.44). The best company MAD of 0.0012 exceeds the population MAD of 0.00046. The deviations from Benford in the 2,135 subset graphs offset each other to give a population result that is better than the best subset. Stock price volatility (which has nothing to do with fraud) is the driver for conformity to Benford for the individual companies.

Table 1. Conformity to Benford and Competitiveness Rankings.

Country        RGBE   Institutions
Netherlands       1             12
Portugal          2             54
Luxembourg        3              9
Malta             4             34
France            5             26
Spain             6             53
Slovenia          7             50
Cyprus            8             30
Italy             9             92
Germany          10             13
Slovakia         11             89
Finland          12              4
Ireland          13             24
Austria          14             15
Belgium          15             29
Greece           16             84

Note. The table shows the conformity to Benford rankings of RGBE and the Institutional Competitiveness rankings of the World Economic Forum. RGBE = Rauch, Göttsche, Brähler, and Engel.

Figure 7. Stocks with a weak or strong conformity to Benford. Note. Panels A to D show the stock prices for the four stocks with daily returns that have the weakest conformity to Benford. Panels E and F show the stock prices of the two stocks (Juniper Networks and Citigroup) with the best conformity to Benford each with MADs of 0.0012.


Alali and Romero (2013) divide a decade into five subsets: 2001-2002, 2003-2004, . . .,

and 2009-2010. They use Compustat financial statement numbers and the conformity of the

periods to Benford’s Law to ‘‘find different indicators of manipulation during the periods.’’

The authors note that they are using deviations from Benford, which might indicate manipulation. Following their approach, the daily returns were analyzed on a year-by-year basis

to identify the CRSP calendar years with the most ‘‘potential manipulation.’’ The total

number of records varied from 1.1 million records to 1.6 million records because of

changes in the number of stock issues, the deletion of returns of 0, and the variation in the

number of trading days per year. For each of the 14 years, the fit was excellent with MADs

ranging from 0.00051 to 0.00129. Once again, the best annual MAD of 0.00051 was

greater than the population MAD of 0.00046 (shown in Figure 1). Over the 14-year period,

the market suffered two large declines and two periods of growth, and it is remarkable that

the digit patterns remained stable from year to year. The highest MAD of 0.00129 was for

2000 and the second highest MAD of 0.00081 was for 2001. These high MADs were

caused by trading in eighths and sixteenths, which restricted the possible number of returns

(and the first-two digits of those returns) for any stock for any particular day. The high

MADs had nothing to do with fraud. The annual graphs (not shown) showed regular small

spikes at 20. There were excessive duplications of 0.020408. This return was caused by a

stock at $6.125 increasing to $6.25, $12.25 to $12.50, $24.50 to $25.00, $49.00 to $50.00,

and $98.00 to $100.00, and this ‘‘pricing in eighths increase’’ happened just enough times

to cause a spike (and a deviation on the graph).
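
The arithmetic behind the duplicated return can be checked directly. The sketch below is illustrative only: the helper `first_two_digits` is a hypothetical name, not something from the article, and exact fractions are used so that no rounding is involved. It shows that each of the listed price moves is a return of exactly 1/49, whose first-two digits are 20.

```python
from fractions import Fraction

def first_two_digits(x):
    """Return the first-two significant digits (10-99) of a nonzero number."""
    s = f"{abs(float(x)):.15e}"   # scientific notation, e.g. '2.040816326530612e-02'
    return int(s[0] + s[2])       # leading digit plus the digit after the point

# Price moves quoted in the text (old price, new price).
moves = [("6.125", "6.25"), ("12.25", "12.50"), ("24.50", "25.00"),
         ("49.00", "50.00"), ("98.00", "100.00")]

# Exact simple returns: new / old - 1.
returns = [Fraction(new) / Fraction(old) - 1 for old, new in moves]

for (old, new), r in zip(moves, returns):
    print(f"${old} -> ${new}: return = {float(r):.6f}, "
          f"first-two digits = {first_two_digits(r)}")
```

Every move in the list is a 2% step up from a price that is 98% of a round number, which is why the same return recurs across very different price levels.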

The departures from Benford for the volume data can also be explained with non-fraud

reasons. The volume data in Figure 3 showed spikes at the multiples of 10. These MAD-

inflating spikes were a result of trading in lots of 100 and also the fact that the NYSE data

were reported to the nearest hundred shares for the day. For example, the actual volume

123 would be reported as 100. This would cause MAD-inflating spikes at the multiples of

10 for all actual daily volumes from 50 (rounded up to 100) to 1,049.
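
A minimal sketch of the rounding effect follows. It assumes round-half-up to the nearest hundred (consistent with 50 being reported as 100 in the text); the function names are hypothetical and the loop covers only the range of actual volumes mentioned above.

```python
def first_two_digits(n):
    """Return the first-two significant digits (10-99) of a positive number."""
    s = f"{abs(float(n)):.15e}"
    return int(s[0] + s[2])

def reported_volume(actual):
    """Round a share count to the nearest hundred, halves rounding up (assumed)."""
    return ((actual + 50) // 100) * 100

# Every actual volume from 50 to 1,049 shares, as in the text.
digits_seen = {first_two_digits(reported_volume(v)) for v in range(50, 1050)}

print(reported_volume(123))   # the example from the text
print(sorted(digits_seen))    # only multiples of 10 survive the rounding
```

After rounding, the 1,000 distinct actual volumes collapse onto ten reported values, and all of them have first-two digits that are multiples of 10.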

To show a large MAD difference due to treading water around an average, the daily

stock volumes of Apple Inc. (AAPL) and Facebook, Inc. (FB) were downloaded from

Yahoo Finance (finance.yahoo.com) from their first day of trading to the time of writing.

Apple was listed in December, 1980 and its history includes the success of the Macintosh,

a period of decline, a return to profitability, a record market capitalization, and four stock

splits. An analysis of the stock volumes shows an acceptable conformity with a MAD of

0.00149 (and N = 8,552). In contrast, Facebook was listed in May, 2012 and an analysis of

the daily volumes showed a MAD of 0.00596 (with N = 623), which is above the lower

bound of 0.0022 for nonconformity. The high MAD for Facebook was because one third of

the volumes were in the 30 million to 50 million range, which caused spikes in the 30 to

49 interval. A company whose daily volume is stuck in a narrow range will tend to

have a high MAD, which results from both the narrow volume range and the small number of records.
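
The narrow-range effect can be illustrated with simulated data. The sketch below does not use the Apple or Facebook volumes; it assumes a hypothetical "wide" series that is log-uniform over five orders of magnitude and a "mixed" series in which one third of the records are drawn uniformly from the 30 million to 50 million range. The sample size, the seed, and all names are assumptions made for illustration.

```python
import math
import random

def first_two_digits(x):
    """Return the first-two significant digits (10-99) of a positive number."""
    s = f"{abs(float(x)):.15e}"
    return int(s[0] + s[2])

# Benford's expected proportions for the first-two digits 10..99.
EXPECTED = {d: math.log10(1 + 1 / d) for d in range(10, 100)}

def mad_first_two(values):
    """Mean absolute deviation between actual and Benford first-two digit proportions."""
    counts = dict.fromkeys(range(10, 100), 0)
    for v in values:
        counts[first_two_digits(v)] += 1
    n = len(values)
    return sum(abs(counts[d] / n - EXPECTED[d]) for d in EXPECTED) / 90

rng = random.Random(2015)   # fixed seed so the run is repeatable
n = 6000

# Simulated volumes spread evenly (in logs) from 10 thousand to 1 billion.
wide = [10 ** rng.uniform(4, 9) for _ in range(n)]

# One third of the volumes stuck between 30 million and 50 million.
mixed = ([rng.uniform(30e6, 50e6) for _ in range(n // 3)]
         + [10 ** rng.uniform(4, 9) for _ in range(n - n // 3)])

mad_wide = mad_first_two(wide)
mad_mixed = mad_first_two(mixed)
print(f"wide range MAD : {mad_wide:.5f}")
print(f"mixed range MAD: {mad_mixed:.5f}")
```

In this simulation the excess records in the 30 to 49 interval push the MAD of the mixed series above the 0.0022 nonconformity bound even though no number was manipulated.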

There are three conformity-related issues with using firm-year Compustat data. First,

Compustat replaces originally reported data with the new data when a company restates its

past results. Their data are a combination of original data that will remain unchanged,

restated numbers that have replaced some original data, and original data that will be chan-

ged at some time in the future. It is not clear how this data mixture affects the conformity

of the subsets (the financial statements for a company for one year). Second, any subset

analysis should avoid the inclusion of totals or subtotals. For example, Microsoft’s 2012

Form 10-K shows that the three components of its inventories amount to $210 million, $96

million, and $831 million, respectively. Compustat shows INVRM, INVWIP, and INVFG

at these values, but it also has another field INVT (Total Inventories) at $1,137 million.

INVT cannot be manipulated, and including this total in any analysis introduces an extra

first-two digits 11 into the MAD calculation. Including subtotals (and totals

such as total current assets) probably improves the conformity of the subsets overall

because these fields have close conformities to Benford. Third, Compustat data are standar-

dized to ensure ‘‘consistent and comparable data across companies, industries and business

cycles.’’ The 2012 Microsoft current liabilities include Securities lending payable of $814

and Other of $3,151. Compustat combines these amounts and shows Current Liabilities

Other (LCOXDR) of $3,965. Also, long-term unearned revenue of $1,406 and other long-

term liabilities of $8,208 are summed to give $9,614 for their field Liabilities-Other-Total

(LO). Standardization has the effect of replacing some financial statement numbers with

other numbers with the result that researchers end up using different digits for the MAD

calculation than would be the case if the actual reported numbers were used.

In an analysis of subsets, high MADs are generally associated with a small number of

records in a subset. A subset with only one record will have a first-two digit MAD of at

least 0.0213. With two records, the MAD will still be at least 0.0205. An analysis of the

382 Compustat Balance Sheet fields showed that, in general, those items that had a low

number of records had the highest MADs. The items that were applicable to the fewest

companies generally had the highest nonconformity. The correlation between N and the

MAD was −0.426, meaning that lower counts were associated with higher MADs. The

carrying value of common stock (CSTKCV) stood out from the group of Balance Sheet items

because it had a high count of 92,500 and a relatively high MAD of 0.0118. A review of

the data showed that this was because one half of the amounts were either $0.01 or $1.00.

This field is an anomaly because it is not a balance sheet line item that gets included in the

sum of assets or liabilities or equity. These amounts are not ledger balance dollar amounts.
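
The lower bounds quoted above for subsets with one or two records can be reproduced by brute force. The sketch below assumes the first-two digit MAD is the sum of the absolute differences between the actual and Benford proportions divided by 90; the function and variable names are illustrative.

```python
import math
from itertools import combinations_with_replacement

DIGITS = range(10, 100)
# Benford's expected proportions for the first-two digits 10..99.
EXPECTED = {d: math.log10(1 + 1 / d) for d in DIGITS}

def mad_from_digits(first_two):
    """First-two digit MAD for a list of first-two digit values."""
    n = len(first_two)
    return sum(abs(first_two.count(d) / n - EXPECTED[d]) for d in DIGITS) / 90

# Smallest possible MAD when the subset holds a single record.
min_mad_one = min(mad_from_digits([d]) for d in DIGITS)

# Smallest possible MAD over every possible pair of first-two digits.
min_mad_two = min(mad_from_digits(list(pair))
                  for pair in combinations_with_replacement(DIGITS, 2))

print(f"minimum MAD, N = 1: {min_mad_one:.4f}")
print(f"minimum MAD, N = 2: {min_mad_two:.4f}")
```

Both minimums are roughly ten times the 0.0022 nonconformity bound, so a very small subset is classed as nonconforming whatever its numbers are.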

The high MADs for items with only a few records also apply to Compustat’s income

statement fields. The correlation between N and the MAD is −0.358, meaning that the items

used by only a few companies had the highest MADs. The exceptions (high MAD and high

N) were all for items related to EPS calculations that were variations on the standard EPS

calculation or the EPS effect of some loss or gain. None of these items were ledger balance

amounts. For example, S&P Core Earnings EPS Diluted (SPCED) had a high MAD

because of a pattern that was similar but more pronounced than Panel B in Figure 4. The

spikes at the multiples of 10 were caused by high counts for 0.01, 0.02, 0.09, and 0.10.

There were no cash flow anomalies (high MAD and high N), and the correlation between

the MAD and the number of records was −0.556. There were three anomalies for the

miscellaneous items, and these were all related to the options (the life of options in years, the

risk-free rate, and the volatility assumption as a percentage).

Any subset analysis also suffers from a bluntness issue. A number such as 100 can be

increased by 99.99%, and it will still have the same first digit, namely 1. A number such as

900 can be increased by 11.1%, and it will still have the same first digit, namely 9.

Relatively large increases for 100 or 200 will leave the first digits unchanged. But numbers

such as 800 or 900 will change first digits for comparatively small increases. Using calcu-

lus, it can be shown that a random number drawn from a Benford Set can be increased on

average by 22.86% without a change in the first digit. At the extreme, a company with a

close conformity can increase every number to the maximum extent possible (e.g.,

Microsoft can change its Other liabilities of $3,151 to $3,965 or $3,999) and the MAD

would remain unchanged. A close conformity firm can increase every number by an aver-

age of 22.86% and the MAD would be unchanged. The scale-invariance property makes

this issue even more serious. If a company has a close conformity to begin with, then every

number can be multiplied by any constant and the MAD would remain unchanged.
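
The 22.86% figure can be reproduced under one plausible reading of the calculation; the article does not show its derivation, so the following is a sketch under stated assumptions. Assume the significand x of a Benford number has density 1/(x ln 10) on [1, 10), and that a number with first digit d can grow by up to (d + 1)/x − 1 before its first digit changes. Integrating over each digit interval gives an average headroom of (1 + 1/2 + ... + 1/9)/ln 10 − 1. The code evaluates this closed form and checks it with a midpoint-rule integration; the names are illustrative.

```python
import math

# Closed form: (sum of 1/d for d = 1..9) / ln(10) - 1.
H9 = sum(1 / d for d in range(1, 10))
closed_form = H9 / math.log(10) - 1

def headroom(x):
    """Largest proportional increase (as a limit) that keeps the first digit of x."""
    return (math.floor(x) + 1) / x - 1

# Numerical check: if x = 10**u with u uniform on [0, 1), then x is Benford.
# Average the headroom over evenly spaced midpoints of u.
n = 100_000
numeric = sum(headroom(10 ** ((i + 0.5) / n)) for i in range(n)) / n

print(f"closed form: {closed_form:.4%}")
print(f"numeric    : {numeric:.4%}")
```

The bound is a limit rather than an attainable value: a number such as 100 can be raised by 99.99% but not by a full 100% without moving to a first digit of 2.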

The subset studies fail to demonstrate using mathematics that manipulations will always

increase the MAD. If the manipulation changes a first digit 2 (when the proportion of first digit 2s is at

17.6% or less) to a 3 (when the proportion of first digit 3s is already at 12.5% or higher), the effect will be

to increase the MAD. But, if the manipulation changes a first digit 2 (when we have a spike

at 2) to a first digit 3 (when the actual first digit 3 proportion is below 12.5%), the effect will

be to decrease the MAD. The effect on the MAD of a 2→3 change is therefore indeterminate.

If Microsoft erroneously reported Securities lending payable of $814 and Other

Liabilities of $3,965, the first digits of both amounts would be unchanged at 8 and 3, respectively.
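
A small numerical illustration shows that a change of first digit from 2 to 3 can move the MAD in either direction. The two sets of first-digit counts below are hypothetical (each sums to 1,000 records), and the first-digit MAD is assumed to be the sum of absolute deviations from Benford divided by 9.

```python
import math

# Benford's expected first-digit proportions for digits 1..9.
EXPECTED = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def mad_first_digit(counts):
    """First-digit MAD for a dict mapping digit -> count."""
    n = sum(counts.values())
    return sum(abs(counts[d] / n - EXPECTED[d]) for d in EXPECTED) / 9

def shift(counts, src, dst, k):
    """Return a copy of counts with k records moved from digit src to digit dst."""
    moved = dict(counts)
    moved[src] -= k
    moved[dst] += k
    return moved

# Case 1: 2s slightly below expected, 3s slightly above expected.
near_benford = dict(zip(range(1, 10), [301, 176, 125, 97, 79, 67, 58, 51, 46]))
# Case 2: a spike at 2 and a shortfall at 3.
spike_at_two = dict(zip(range(1, 10), [301, 196, 105, 97, 79, 67, 58, 51, 46]))

for label, counts, k in [("near Benford", near_benford, 1),
                         ("spike at 2", spike_at_two, 10)]:
    before = mad_first_digit(counts)
    after = mad_first_digit(shift(counts, 2, 3, k))
    direction = "increases" if after > before else "decreases"
    print(f"{label}: MAD {before:.5f} -> {after:.5f} ({direction})")
```

The same kind of change raises the MAD in the first case and lowers it in the second, so the direction depends entirely on where the proportions stood before the change.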

There is also a disconnection issue because of the lack of a relationship between the

first digit of a number and the materiality of the number. Microsoft’s income statement

shows revenues of $73,723. An error that overstated revenues by $10,000 would change

the first digit 7 to 8. Their balance sheet reports an income tax liability of $789. An error

that overstated the tax liability by $100 would also change that first digit 7 to an 8. The

effect on the company MAD would be the same for a $100 million error and for a $10,000

million error. However, the effect of the tax error on EPS would be small but the effect of

the revenue error would be $1.19 (ignoring any income tax on the extra profits). The large

error is more likely to affect the digits of other numbers that would be affected (net

income, accounts receivable, and retained earnings) but the effect of these secondary

changes on the MAD is indeterminate. For example, the net income would rise from

$16,978 to $26,978, and retained earnings would rise from $566 to $10,566. The loss of

the first digit 1 for net income would be offset by the gain in a first digit 1 for retained

earnings. The net effect would be a loss of a first digit 5 and a gain of a first digit 2, which

could move the MAD in any direction.
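
The first-digit changes in this example can be tabulated with a few lines of code. The amounts are those quoted above (in $ millions); the helper name and the layout of the scenarios are illustrative.

```python
def first_digit(amount):
    """Return the leading digit of a nonzero whole-dollar amount."""
    return int(str(abs(int(amount)))[0])

# (line item, reported amount, overstatement), all in $ millions.
scenarios = [
    ("Revenue", 73_723, 10_000),
    ("Income tax liability", 789, 100),
    ("Net income", 16_978, 10_000),
    ("Retained earnings", 566, 10_000),
]

for name, reported, error in scenarios:
    misstated = reported + error
    print(f"{name:<22} {reported:>7,} -> {misstated:>7,}   "
          f"first digit {first_digit(reported)} -> {first_digit(misstated)}")
```

The $100 million and $10,000 million errors produce the identical 7-to-8 change, while the two secondary effects of the revenue error change a 1 to a 2 and a 5 to a 1.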

The disconnection issue is a reminder that financial statement fraud can be achieved by

manipulating only a few financial statement line items. If revenue is inflated, it will inflate

accounts receivable, retained earnings, and perhaps income tax expense and income taxes

payable. It is unlikely that only five numeric changes will inflate a company’s MAD to the

extent that it becomes an outlier with a high MAD.

Conclusion

Daily stock returns and stock volumes have a close conformity to Benford’s Law, which

gives the expected frequencies of the various digits (0-9) in tabulated data. The expected

returns and the abnormal returns generated by accounting and finance event studies also

have a close conformity to Benford as do the financial statement amounts reported on the

Compustat database.

Some recent studies have divided a population into subsets, tested the subsets for confor-

mity to Benford, and concluded that the subsets with the weakest fit to Benford were frau-

dulent. With this approach, the rankings depend on the time period chosen, the conformity

measure used, and the line items (financial statement line items or economic statistics)

used in the analysis. The rankings are also influenced by any standardization steps per-

formed on the data. This approach ignores valid non-fraud reasons for nonconformity and

the fact that the manipulation of a financial statement amount might have no effect on the

first digits. There is also no relationship between a change in the first digit and the magni-

tude or the materiality of the error.

Diekmann and Jann (2010) state that to validly use Benford’s Law to detect fraud, one

has to demonstrate that correctly stated data conform to Benford’s Law, while manipulated

data follow a different distribution. Benford’s Law is most applicable to fraud schemes

where one person invents all the numbers and all the numbers are fictitious such as the frau-

dulent vendor scheme described in Nigrini (1999). Benford’s Law is also applicable when

many people all have the same incentive to manipulate numbers in the same way, and the

effect on the digits of those numbers is predictable. Examples would include the upper and

lower limits in the income tax code, and a good example is described in Christian and Gupta

(1993). Another example is the early rounding-up study of Carslaw (1988).

Author’s Note

The Compustat and the CRSP data are available from Wharton Research Data Services (WRDS) at

https://wrds-web.wharton.upenn.edu/wrds/

Acknowledgments

I hereby extend my thanks to the Editor-in-Chief Bharat Sarath, the associate editor, and the reviewer

for their insightful comments and suggestions. The article also benefitted from the comments of work-

shop participants at West Virginia University and from discussions with Ted Hill, Steven Miller,

Dick Riley, and Jack Dorminey. I would also like to express my gratitude to my dissertation chairman

Wallace Wood and to Marty Levy on my dissertation committee for believing, all those years ago,

that the largely unknown phenomenon called Benford’s Law was worthy of being a dissertation topic.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/

or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this

article.

References

Alali, F., & Romero, S. (2013). Benford’s Law: Analyzing a decade of financial data. Journal of

Emerging Technologies in Accounting, 10, 1-39.

Ball, R., & Brown, P. (1968). An empirical evaluation of accounting income numbers. Journal of

Accounting Research, 6, 159-178.

Benford, F. (1938). The law of anomalous numbers. Proceedings of the American Philosophical

Society, 78, 551-572.

Berger, A., & Hill, T. (2015). An introduction to Benford’s Law. Princeton, NJ: Princeton University

Press.

Carslaw, C. (1988). Anomalies in income numbers: Evidence of goal oriented behavior. The

Accounting Review, 63, 321-327.

Christian, C., & Gupta, S. (1993). New evidence on ‘‘secondary evasion.’’ The Journal of the

American Taxation Association, 15(1), 72-93.

Diekmann, A., & Jann, B. (2010). Benford’s Law and fraud detection: Facts and legends. German

Economic Review, 11, 397-401.

Drake, P., & Nigrini, M. (2000). Computer assisted analytical procedures using Benford’s Law.

Journal of Accounting Education, 18, 127-146.

Fama, E. (1965a). The behavior of stock-market prices. The Journal of Business, 38, 34-105.

Fama, E. (1965b). Random walks in stock market prices. Financial Analysts Journal, 21(5), 55-59.

Gonzalez-Garcia, J., & Pastor, G. (2009, January). Benford’s Law and macroeconomic data quality

(IMF Working Paper No. 09/10). Washington, DC: International Monetary Fund.

Leemis, L., Schmeiser, B., & Evans, D. (2000). Survival distributions satisfying Benford’s Law. The

American Statistician, 54(3), 1-6.

Ley, E. (1996). On the peculiar distribution of the US stock indices first digits. The American

Statistician, 50, 311-314.

Malkiel, B. (1973). A random walk down Wall Street. New York, NY: W.W. Norton.

Nigrini, M. (1999). Fraud detection: I’ve got your number. Journal of Accountancy, 187(5), 79-83.

Nigrini, M. (2005). An assessment of the change in the incidence of earnings management around the

Enron-Andersen episode. Review of Accounting and Finance, 4(1), 92-110.

Nigrini, M. (2011). Forensic analytics: Methods and techniques for forensic accounting investiga-

tions. Hoboken, NJ: John Wiley.

Nigrini, M., & Miller, S. (2007). Benford’s Law applied to hydrology data: Results and relevance to

other geophysical data. Mathematical Geology, 39, 469-490.

Nigrini, M., & Mittermaier, L. (1997). The use of Benford’s Law as an aid in analytical procedures.

Auditing: A Journal of Practice & Theory, 16(2), 52-67.

Pinkham, R. (1961). On the distribution of first significant digits. Annals of Mathematical Statistics,

32, 1223-1230.

Rauch, B., Göttsche, M., Brähler, G., & Engel, S. (2011). Fact and fiction in EU-governmental eco-

nomic data. German Economic Review, 12, 243-255.

Rauch, B., Göttsche, M., & Langenegger, S. (2014). Detecting problems in military expenditure data

using digital analysis. Defense and Peace Economics, 25, 97-111.

Rodriguez, R. (2004). Reducing false alarms in the detection of human influence on data. Journal of

Accounting, Auditing, & Finance, 19, 141-158.

Thomas, J. (1989). Unusual patterns in reported earnings. The Accounting Review, 64, 773-787.
