Important alert: (current site time 7/15/2013 7:43:42 PM EDT)
 

article

An application of Benford's Law using SQL

Email
Submitted on: 10/22/2009 9:03:44 AM
By: Justin Beerley  
Level: Beginner
User Rating: By 2 Users
Compatibility: SQL Server 2000, SQL Server 7.0, SQL Server 6.5 and earlier, Oracle
Views: 10873
 
     Benford's Law maintains several purposes, one of which is forensic accounting. Condsider the fact that people who make up figures tend to distribute their digits fairly uniformly, a simple comparison of first-digit frequency distribution from the data with the expected distribution according to Benford's law ought to show up any anomalous results. Hence, one could use it as an indicator of fraud relating to accounting and expenses. The example I've submitted is a general overview to accomplish the latter.

 
 
Terms of Agreement:   
By using this article, you agree to the following terms...   
  1. You may use this article in your own programs (and may compile it into a program and distribute it in compiled format for languages that allow it) freely and with no charge.
  2. You MAY NOT redistribute this article (for example to a web site) without written permission from the original author. Failure to do so is a violation of copyright laws.   
  3. You may link to this article from another website, but ONLY if it is not wrapped in a frame. 
  4. You will abide by any additional copyright restrictions which the author may have placed in the article or article's description.
				
The law states that the frequency of the leading digit in a set of true randomly distributed numbers should occur with a probability that follows:
P(d) = log_b(d+1)-log_b(d) = log_b(1+1/d)

In base 10, this means that the number 1 should appear 30% of the time as the first digit, 2 should appear 17% of the time as the first digit, and so on up until 9, which should appear 4.5% of the time as the first digit. The full base 10 list is as follows:

1 - 30.1%
2 - 17.6%
3 - 12.5%
4 - 9.7%
5 - 7.9%
6 - 6.7%
7 - 5.8%
8 - 5.1%
9 - 4.6%

This is general overview of the SQL statements you will need to execute:

// Returns the number of values, not including NULL values, of the specified column
SELECT COUNT(column_name) FROM table_name

// Returns the number of values with 1 as the leading digit, which we will call 1_cnt
SELECT COUNT(column_name) FROM table_name
WHERE column_name LIKE '1%'

// Returns the number of values with 2 as the leading digit, which we will call 2_cnt
SELECT COUNT(column_name) FROM table_name
WHERE column_name LIKE '2%'
.
.
.
// Returns the number of values with 9 as the leading digit, which we will call 9_cnt
SELECT COUNT(column_name) FROM table_name
WHERE column_name LIKE '9%'

Now that we are done the counting, we can check the frequency of the leading digit against the chart. For our example, we will use a standard deviation of SQRT(2) to check if our data corresponds with the frequency of digits given by Benford Law.

Example:
if 28 < (1_cnt/total_cnt)*100 < 32 then
The data fits for the numbers with leading digit 1
if 15 < (2_cnt/total_cnt)*100 < 19 then
The data fits for the numbers with leading digit 2
.
.
.
if 2 < (9_cnt/total_cnt)*100 < 6 then
The data fits for the numbers with leading digit 9


Report Bad Submission
Use this form to tell us if this entry should be deleted (i.e contains no code, is a virus, etc.).
This submission should be removed because:

Your Vote

What do you think of this article (in the Beginner category)?
(The article with your highest vote will win this month's coding contest!)
Excellent  Good  Average  Below Average  Poor (See voting log ...)
 

Other User Comments

10/29/2009 4:02:50 PMAndy Archer

Very interesting. Downright mystical. I was not aware of this phenomenon prior to reading your submission. I took a look at the street number column in the Address table of one of our production Oracle databases. A couple of million rows. Sure enough, "1" as the first character got the big count, with smaller and smaller counts for subsequent first character digits. You learn something new every day...
(If this comment was disrespectful, please report it.)

 
12/31/2009 3:45:24 AMthucnv

khong
(If this comment was disrespectful, please report it.)

 
12/31/2009 3:48:29 AMthucnv

ko
(If this comment was disrespectful, please report it.)

 

Add Your Feedback
Your feedback will be posted below and an email sent to the author. Please remember that the author was kind enough to share this with you, so any criticisms must be stated politely, or they will be deleted. (For feedback not related to this particular article, please click here instead.)
 

To post feedback, first please login.