Important alert: (current site time 5/25/2013 4:30:40 AM EDT) Brand new: recommended reading for every programming language on the site | revamped discussion board | 7 Source Code CDs

# An application of Benford's Law using SQL

Email
 Submitted on: 10/22/2009 9:03:44 AM By: Justin Beerley Level: Beginner User Rating: By 2 Users Compatibility: SQL Server 2000, SQL Server 7.0, SQL Server 6.5 and earlier, Oracle Views: 10729

Benford's Law maintains several purposes, one of which is forensic accounting. Condsider the fact that people who make up figures tend to distribute their digits fairly uniformly, a simple comparison of first-digit frequency distribution from the data with the expected distribution according to Benford's law ought to show up any anomalous results. Hence, one could use it as an indicator of fraud relating to accounting and expenses. The example I've submitted is a general overview to accomplish the latter.

 Terms of Agreement:    By using this article, you agree to the following terms...    You may use this article in your own programs (and may compile it into a program and distribute it in compiled format for languages that allow it) freely and with no charge. You MAY NOT redistribute this article (for example to a web site) without written permission from the original author. Failure to do so is a violation of copyright laws.    You may link to this article from another website, but ONLY if it is not wrapped in a frame.  You will abide by any additional copyright restrictions which the author may have placed in the article or article's description. ``` The law states that the frequency of the leading digit in a set of true randomly distributed numbers should occur with a probability that follows: P(d) = log_b(d+1)-log_b(d) = log_b(1+1/d) In base 10, this means that the number 1 should appear 30% of the time as the first digit, 2 should appear 17% of the time as the first digit, and so on up until 9, which should appear 4.5% of the time as the first digit. The full base 10 list is as follows: 1 - 30.1% 2 - 17.6% 3 - 12.5% 4 - 9.7% 5 - 7.9% 6 - 6.7% 7 - 5.8% 8 - 5.1% 9 - 4.6% This is general overview of the SQL statements you will need to execute: // Returns the number of values, not including NULL values, of the specified column SELECT COUNT(column_name) FROM table_name // Returns the number of values with 1 as the leading digit, which we will call 1_cnt SELECT COUNT(column_name) FROM table_name WHERE column_name LIKE '1%' // Returns the number of values with 2 as the leading digit, which we will call 2_cnt SELECT COUNT(column_name) FROM table_name WHERE column_name LIKE '2%' . . . // Returns the number of values with 9 as the leading digit, which we will call 9_cnt SELECT COUNT(column_name) FROM table_name WHERE column_name LIKE '9%' Now that we are done the counting, we can check the frequency of the leading digit against the chart. For our example, we will use a standard deviation of SQRT(2) to check if our data corresponds with the frequency of digits given by Benford Law. Example: if 28 < (1_cnt/total_cnt)*100 < 32 then The data fits for the numbers with leading digit 1 if 15 < (2_cnt/total_cnt)*100 < 19 then The data fits for the numbers with leading digit 2 . . . if 2 < (9_cnt/total_cnt)*100 < 6 then The data fits for the numbers with leading digit 9```

Use this form to tell us if this entry should be deleted (i.e contains no code, is a virus, etc.).
This submission should be removed because:

(The article with your highest vote will win this month's coding contest!)
Excellent  Good  Average  Below Average  Poor (See voting log ...)

10/29/2009 4:02:50 PMAndy Archer

Very interesting. Downright mystical. I was not aware of this phenomenon prior to reading your submission. I took a look at the street number column in the Address table of one of our production Oracle databases. A couple of million rows. Sure enough, "1" as the first character got the big count, with smaller and smaller counts for subsequent first character digits. You learn something new every day...
(If this comment was disrespectful, please report it.)

12/31/2009 3:45:24 AMthucnv

khong
(If this comment was disrespectful, please report it.)

12/31/2009 3:48:29 AMthucnv

ko
(If this comment was disrespectful, please report it.)