Friday, September 05, 2008

Benford's Law


I was recently looking at a tax blog carnival because it included one of my posts.

One of the other articles was about how, if you get "creative" with your taxes, there is a greater chance that the IRS will flag it because of Benford's Law.

It turns out that Benford's Law is pretty interesting. After reading that article, I found more information in the Wikipedia entry on Benford's Law.

The Law deals with the distribution of the first digit in many naturally occurring lists of numbers. This could be addresses, stock prices, population counts, etc., especially when the values span several orders of magnitude.

Now, your first thought would probably be that the first digit is spread evenly across the digits 1 through 9. But this turns out to be wrong.

Instead, there is about a 30% chance that the first digit would be 1 and the frequency decreases for larger digits - a 9 only has about a 4.6% chance to be a leading digit.

The distribution is logarithmic.
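One way to see this for yourself is to count leading digits in a list of numbers that grows across many orders of magnitude. Here is a minimal Python sketch; the choice of the first 1000 powers of 2 as the data set and the helper name `leading_digit_shares` are my own assumptions for illustration, not something taken from the articles mentioned here.

```python
from collections import Counter


def leading_digit_shares(values):
    """Return the share of each leading digit (1-9) among positive integers."""
    counts = Counter(int(str(v)[0]) for v in values)
    total = sum(counts.values())
    return {d: counts[d] / total for d in range(1, 10)}


# Illustrative data set (an assumption): 2**1 through 2**1000.
powers_of_two = [2 ** n for n in range(1, 1001)]
shares = leading_digit_shares(powers_of_two)

for digit, share in shares.items():
    print(f"{digit}: {share:.3f}")
```

The share for 1 should land near 30%, and the shares should shrink as the digit gets larger.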

The exact formula is that a digit d (where d runs from 1 to b-1, in a base b >= 2) occurs as the leading digit with probability logb((d+1)/d), that is, the base-b logarithm of (d+1)/d.
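The formula is easy to evaluate directly. The following is a small Python sketch, assuming a hypothetical helper named `benford_probability`; it just applies the logarithm above for each digit.

```python
import math


def benford_probability(digit, base=10):
    """Probability that `digit` is the leading digit in the given base."""
    if base < 2 or not 1 <= digit <= base - 1:
        raise ValueError("need base >= 2 and 1 <= digit <= base - 1")
    # log base b of (d + 1) / d
    return math.log((digit + 1) / digit, base)


# Print the base-10 table for the digits 1 through 9.
for d in range(1, 10):
    print(f"{d}: {benford_probability(d):.3f}")
```

The nine probabilities add up to 1, because the product of the fractions 2/1, 3/2, ..., 10/9 collapses to 10, and the base-10 logarithm of 10 is 1.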

So, for our normal base 10 numbering system, this means that the probability of 1 being the leading digit would be log10(2/1) = log10(2). This would be about 0.301.

I checked this on my Unix system using nawk, by raising 10 to the power 0.3 and confirming that the result is very close to 2:

[580]-> nawk 'BEGIN {print 10^(.3)}'
1.99526

For 9, it would be log10(10/9) = log10(1.111...), and thus about 0.046. Again, raising 10 to that power should give back roughly 1.111:

[581]-> nawk 'BEGIN {print 10^(.046)}'
1.11173
