
Benford’s Law: Cool math and fraud detector

Since I enjoyed writing about the St. Petersburg Paradox, here’s an interesting mathematical law for you.

I noticed a weird distribution while looking at the market indexes the other day. Today, I see the S&P trading at 1099, the Dow at 10273, and the NASDAQ at 2266. Strange. Two indexes begin with the number 1 and the third with a 2. Looking at the other indexes, I see lots more beginning with 1s (there's an 1860, an 1865, and an 11566). There are indexes that start with most of the other digits, but none starting with 5, 8, or 9. Calculating the numbers:

  • 29% of the indexes begin with a 1
  • 12% with a 2
  • 18% with a 3
  • 12% with a 4
  • 0% with a 5
  • 24% with a 6 (3 of the 4 are Russell indexes, which I bet is not coincidence)
  • 6% with a 7
  • 0% with an 8
  • 0% with a 9
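If you'd like to redo this kind of tally yourself, here is a minimal Python sketch. It assumes the index levels are positive whole numbers. The post only quotes six of the 17 levels (1099, 10273, 2266, 1860, 1865, and 11566), so those are all it uses. `leading_digit` and `leading_digit_shares` are hypothetical helper names, not from any library.

```python
from collections import Counter

def leading_digit(n: int) -> int:
    """First digit of a positive whole number, e.g. 1099 -> 1."""
    if n <= 0:
        raise ValueError("expected a positive integer")
    return int(str(n)[0])

def leading_digit_shares(values: list[int]) -> dict[int, float]:
    """Fraction of the values whose first digit is 1, 2, ..., 9."""
    counts = Counter(leading_digit(v) for v in values)
    return {d: counts[d] / len(values) for d in range(1, 10)}

# Only the index levels quoted above; the other 11 indexes aren't listed.
quoted = [1099, 10273, 2266, 1860, 1865, 11566]
for digit, share in leading_digit_shares(quoted).items():
    print(f"{digit}: {share:.0%}")
```

Given all 17 index levels as input, the same function should reproduce the percentages in the list.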

It's not a perfect distribution, but the first three digits (59%) clearly show up far more often than the last three (6%). Why is this?

It’s Benford’s law at work!

Benford’s law, also called the first-digit law, states that in lists of numbers from many (but not all) real-life sources of data, the leading digit is distributed in a specific, non-uniform way. According to this law, the first digit is 1 almost one third of the time, and larger digits occur as the leading digit with lower and lower frequency, to the point where 9 as a first digit occurs less than one time in twenty. This distribution of first digits arises whenever the logarithms of a set of values are distributed uniformly (strictly speaking, when the fractional parts of those logarithms are), as is approximately the case with many measurements of real-world values that span several orders of magnitude.

This counter-intuitive result has been found to apply to a wide variety of data sets, including electricity bills, street addresses, stock prices, population numbers, death rates, lengths of rivers, physical and mathematical constants, and processes described by power laws (which are very common in nature). The result holds regardless of the base in which the numbers are expressed (except for trivial bases), although the exact proportions change.

Here’s a table of the leading-digit distribution Benford’s law predicts:

Leading digit   Frequency
1               30.1%
2               17.6%
3               12.5%
4               9.7%
5               7.9%
6               6.7%
7               5.8%
8               5.1%
9               4.6%
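The percentages in this table come from a simple formula: the probability that the leading digit is d is log10(1 + 1/d), or log_b(1 + 1/d) in base b. Here's a minimal Python sketch that reproduces the table; `benford_probability` is just an illustrative name.

```python
import math

def benford_probability(d: int, base: int = 10) -> float:
    """Benford probability that the leading digit is d in the given base: log_base(1 + 1/d)."""
    if not 1 <= d < base:
        raise ValueError("d must be a nonzero digit of the base")
    return math.log(1 + 1 / d, base)

for d in range(1, 10):
    print(f"{d} {benford_probability(d):.1%}")
```

The terms telescope (log(2/1) + log(3/2) + ... + log(b/(b − 1)) = log b), so the probabilities add up to exactly 1 in any base; only the individual proportions change with the base.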

Here’s an example showing why Benford’s law works:

Suppose a quantity increases continuously and doubles every year, so that it is twice its original value after one year, four times its original value after two years, eight times its original value after three years, and so on. When this quantity reaches a value of 100, it will have a leading digit of 1 for a full year, reaching 200 at the end of that year. Over the course of the next year, the value increases from 200 to 400; it will have a leading digit of 2 for a little over seven months, and 3 for the remaining five months. In the third year, the leading digit will pass through 4, 5, 6, and 7, spending less and less time with each succeeding digit, reaching 800 at the end of the year. Early in the fourth year, the leading digit will pass through 8 and 9. The leading digit returns to 1 when the value reaches 1000, and the process starts again, taking a year to double from 1000 to 2000. From this example, it can be seen that if the value is sampled at uniformly distributed random times throughout those years, it is more likely to be measured when the leading digit is 1, and successively less likely to be measured with higher leading digits.
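The doubling example can be checked directly. Growing from d×100 to (d+1)×100 takes log2((d+1)/d) years, and a full cycle from 100 to 1000 takes log2(10) years. So the share of time spent with leading digit d is log2((d+1)/d) / log2(10) = log10(1 + 1/d), which is exactly Benford's proportion. The sketch below computes those times and, as a cross-check, samples the growing value at uniformly random times. The sample size, number of cycles, and seed are arbitrary choices of mine, not from the post.

```python
import math
import random
from collections import Counter

CYCLE_YEARS = math.log2(10)  # years needed to grow tenfold when doubling yearly

def years_with_leading_digit(d: int) -> float:
    """Years spent growing from d*100 to (d+1)*100 when the value doubles every year."""
    return math.log2((d + 1) / d)

def sampled_shares(samples: int = 50_000, cycles: int = 5, seed: int = 1) -> dict[int, float]:
    """Sample 100 * 2**t at uniformly random times t spanning whole tenfold cycles."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(samples):
        t = rng.uniform(0, cycles * CYCLE_YEARS)
        value = 100 * 2 ** t
        counts[int(str(int(value))[0])] += 1  # leading digit of the sampled value
    return {d: counts[d] / samples for d in range(1, 10)}

sampled = sampled_shares()
for d in range(1, 10):
    months = 12 * years_with_leading_digit(d)
    print(f"{d}: {months:5.2f} months per cycle, "
          f"{months / (12 * CYCLE_YEARS):.1%} of the time, sampled {sampled[d]:.1%}")
```

The computed times match the story: 12 months with a leading 1, about 7.02 months with a 2, and about 4.98 months with a 3.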

One of the more interesting applications of Benford’s law is fraud detection:

Based on the plausible assumption that people who make up figures tend to distribute their digits fairly uniformly, a simple comparison of first-digit frequency distribution from the data with the expected distribution according to Benford’s law ought to show up any anomalous results.[5] Following this idea, Mark Nigrini showed that Benford’s law could be used as an indicator of accounting and expenses fraud.[6] In the United States, evidence based on Benford’s law is legally admissible in criminal cases at the federal, state, and local levels.[7]
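One simple way to run the comparison described above is a chi-square goodness-of-fit test of the observed first-digit counts against the Benford proportions. This is a sketch of the general idea, not Nigrini's specific procedure. The "made-up" data is a hypothetical stand-in (three-digit amounts drawn uniformly at random), and the powers of 2 serve as a sequence whose leading digits are known to follow Benford's law. The cutoff 15.507 is the 95th percentile of the chi-square distribution with 8 degrees of freedom.

```python
import math
import random
from collections import Counter

BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}
CHI2_CRITICAL_8DF_5PCT = 15.507  # 95th percentile of chi-square with 9 - 1 = 8 degrees of freedom

def first_digit(n: int) -> int:
    """First digit of a positive whole number."""
    return int(str(n)[0])

def benford_chi_square(values: list[int]) -> float:
    """Chi-square statistic comparing first-digit counts with Benford's expected counts."""
    counts = Counter(first_digit(v) for v in values)
    n = len(values)
    return sum((counts[d] - n * p) ** 2 / (n * p) for d, p in BENFORD.items())

def looks_suspicious(values: list[int]) -> bool:
    """Flag the data if the first-digit distribution departs from Benford at the 5% level."""
    return benford_chi_square(values) > CHI2_CRITICAL_8DF_5PCT

# Hypothetical "made-up" figures: three-digit amounts picked uniformly at random.
rng = random.Random(42)
invented = [rng.randint(100, 999) for _ in range(1000)]
# Powers of 2 are a classic sequence whose leading digits follow Benford's law.
powers_of_two = [2 ** k for k in range(1, 1001)]

print(f"invented:      chi2 = {benford_chi_square(invented):8.1f}, flagged = {looks_suspicious(invented)}")
print(f"powers of two: chi2 = {benford_chi_square(powers_of_two):8.1f}, flagged = {looks_suspicious(powers_of_two)}")
```

A flag like this is only a reason to look closer: as the post notes, not every real data set follows Benford's law in the first place.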

Benford’s law has been invoked as evidence of fraud in the 2009 Iranian elections.[8]

I am always looking out for cool mathematical laws and puzzles. If you know about any, forward them to me or leave a comment.


My favorite paradox

What is your favorite paradox? You mean you don’t have one? I thought everybody had a favorite paradox…

Here is mine, the St. Petersburg paradox, first introduced nearly 300 years ago.

In a game of chance, you pay a fixed fee to enter, and then a fair coin will be tossed repeatedly until a tail first appears, ending the game. The pot starts at 1 dollar and is doubled every time a head appears. You win whatever is in the pot after the game ends. Thus you win 1 dollar if a tail appears on the first toss, 2 dollars if a head appears on the first toss and a tail on the second, 4 dollars if a head appears on the first two tosses and a tail on the third, 8 dollars if a head appears on the first three tosses and a tail on the fourth, etc. In short, you win \(2^{k-1}\) dollars if the coin is tossed k times until the first tail appears.
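The rules translate directly into a few lines of code. This sketch uses Python's `random` module as the fair coin; the function name `play_st_petersburg` and the fixed seed are illustrative choices.

```python
import random

def play_st_petersburg(rng: random.Random) -> int:
    """Play one round: toss until the first tail and return the pot, 2**(k-1) dollars after k tosses."""
    pot = 1
    while rng.random() < 0.5:  # heads with probability 1/2: double the pot and toss again
        pot *= 2
    return pot  # a tail ended the game

rng = random.Random(2024)
payouts = [play_st_petersburg(rng) for _ in range(10)]
print(payouts)
```

Most rounds pay 1 or 2 dollars; the rare long runs of heads are where the big pots come from.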

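What would be a fair price to pay for entering the game? To answer this, we need to consider the average payout: with probability 1/2 you win 1 dollar; with probability 1/4 you win 2 dollars; with probability 1/8 you win 4 dollars, and so on. The expected value is thus

\[ E = \sum_{k=1}^{\infty} \frac{1}{2^k} \cdot 2^{k-1} = \frac{1}{2}\cdot 1 + \frac{1}{4}\cdot 2 + \frac{1}{8}\cdot 4 + \cdots = \frac{1}{2} + \frac{1}{2} + \frac{1}{2} + \cdots = \infty \]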

This sum diverges to infinity, and so the expected win for the player of this game, at least in its idealized form, in which the casino has unlimited resources, is an infinite amount of money. This means that the player should almost surely come out ahead in the long run, no matter how much he pays to enter; while a large payoff comes along very rarely, when it eventually does it will typically be far more than the amount of money that he has already paid to play. According to the usual treatment of deciding when it is advantageous and therefore rational to play, one should therefore play the game at any price if offered the opportunity.
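To see the divergence concretely, cut the game off after n tosses and add up the expected value exactly. Each outcome contributes (1/2^k)·2^(k−1) = 1/2, so the truncated expectation is n/2 and grows without bound. A minimal sketch using exact fractions; `truncated_expectation` is a hypothetical helper name.

```python
from fractions import Fraction

def truncated_expectation(n: int) -> Fraction:
    """Sum of P(first tail on toss k) * payout for k = 1..n, computed exactly."""
    return sum((Fraction(1, 2 ** k) * 2 ** (k - 1) for k in range(1, n + 1)), Fraction(0))

for n in (1, 10, 100, 1000):
    print(n, truncated_expectation(n))
```

Every extra toss the casino is willing to honor adds another 50 cents to the expected value, no matter how unlikely that toss is.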

Pretty straightforward. The paradox is that, even though the expected win of this “game” is infinite, nobody in their right mind would give more than a few dollars to play.

There are various solutions given for this game, though one stands out in my mind: Nobody has an infinite amount of money!

The classical St. Petersburg lottery assumes that the casino has infinite resources. This assumption is often criticized as unrealistic, particularly in connection with the paradox, which involves the reactions of ordinary people to the lottery. Of course, the resources of an actual casino (or any other potential backer of the lottery) are finite. More importantly, the expected value of the lottery only grows logarithmically with the resources of the casino. As a result, the expected value of the lottery, even when played against a casino with the largest resources realistically conceivable, is quite modest. If the total resources (or total maximum jackpot) of the casino are W dollars, then L = 1 + floor(log2(W)) is the maximum number of times the casino can play before it no longer covers the next bet. The expected value E of the lottery then becomes:

\[ E = \sum_{k=1}^{L} \frac{1}{2^k} \cdot 2^{k-1} + \sum_{k=L+1}^{\infty} \frac{1}{2^k} \cdot W = \frac{L}{2} + \frac{W}{2^L} \]

The following table shows the expected value E of the game with various potential backers and their bankroll W (with the assumption that if you win more than the bankroll you will be paid what the bank has):

Backer              Bankroll            Expected value of lottery
Friendly game       $100                $4.28
Millionaire         $1,000,000          $10.95
Billionaire         $1,000,000,000      $15.93
Bill Gates (2008)   $58,000,000,000     $18.84
U.S. GDP (2007)     $13.8 trillion      $22.78
World GDP (2007)    $54.3 trillion      $23.77
Googolaire          $10^100             $167.07

Notes: The estimated net worth of Bill Gates is from Forbes. The GDP data are as estimated for 2007 by the International Monetary Fund, where one trillion dollars equals $10^12 (one million times one million dollars). A “googolaire” is a hypothetical person worth a googol dollars ($10^100).
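The table can be reproduced in a few lines. Tosses 1 through L pay 2^(k−1) with probability 1/2^k, contributing 1/2 each; any longer game (total probability 1/2^L) pays the whole bankroll W, so E = L/2 + W/2^L with L = 1 + floor(log2(W)). For a positive whole-dollar bankroll, 1 + floor(log2(W)) is exactly the number of bits in W, so this sketch uses Python's `int.bit_length()` to avoid floating-point error in the logarithm. Bankrolls are entered as whole dollars, and `capped_expectation` is an illustrative name.

```python
def capped_expectation(bankroll: int) -> float:
    """Expected payout when the casino pays at most its bankroll W (in whole dollars)."""
    if bankroll < 1:
        raise ValueError("bankroll must be at least 1 dollar")
    L = bankroll.bit_length()  # equals 1 + floor(log2(W)) for W >= 1
    # Outcomes k = 1..L pay 2**(k-1) with probability 1/2**k (1/2 each);
    # longer games (probability 1/2**L) pay the whole bankroll.
    return L / 2 + bankroll / 2 ** L

backers = [
    ("Friendly game", 100),
    ("Millionaire", 10**6),
    ("Billionaire", 10**9),
    ("Bill Gates (2008)", 58 * 10**9),
    ("U.S. GDP (2007)", 138 * 10**11),   # $13.8 trillion
    ("World GDP (2007)", 543 * 10**11),  # $54.3 trillion
    ("Googolaire", 10**100),
]
for name, w in backers:
    print(f"{name:<18} ${capped_expectation(w):.2f}")
```

This also shows why the values grow so slowly: multiplying the bankroll by 1,000 adds only about 10 bits to W, or roughly $5 to the expected value.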

A rational person might not find the lottery worth even the modest amounts in the above table, suggesting that the naive decision model of the expected return causes essentially the same problems as for the infinite lottery. Even so, the possible discrepancy between theory and reality is far less dramatic.

If you have your own “favorite” paradox, share it with me by posting a comment below.