SQL Server 2000, SQL Server 7.0, SQL Server 6.5 and earlier, Oracle
Views:
10873
Benford's Law maintains several purposes, one of which is forensic accounting. Condsider the fact that people who make up figures tend to distribute their digits fairly uniformly, a simple comparison of first-digit frequency distribution from the data with the expected distribution according to Benford's law ought to show up any anomalous results. Hence, one could use it as an indicator of fraud relating to accounting and expenses. The example I've submitted is a general overview to accomplish the latter.
Terms of Agreement:
By using this article, you agree to the following terms...
You may use
this article in your own programs (and may compile it into a program and distribute it in compiled format for languages that allow it) freely and with no charge.
You MAY NOT redistribute this article (for example to a web site) without written permission from the original author. Failure to do so is a violation of copyright laws.
You may link to this article from another website, but ONLY if it is not wrapped in a frame.
You will abide by any additional copyright restrictions which the author may have placed in the article or article's description.
The law states that the frequency of the leading digit in a set of true randomly distributed numbers should occur with a probability that follows:
P(d) = log_b(d+1)-log_b(d) = log_b(1+1/d)
In base 10, this means that the number 1 should appear 30% of the time as the first digit, 2 should appear 17% of the time as the first digit, and so on up until 9, which should appear 4.5% of the time as the first digit. The full base 10 list is as follows:
This is general overview of the SQL statements you will need to execute:
// Returns the number of values, not including NULL values, of the specified column
SELECT COUNT(column_name) FROM table_name
// Returns the number of values with 1 as the leading digit, which we will call 1_cnt
SELECT COUNT(column_name) FROM table_name
WHERE column_name LIKE '1%'
// Returns the number of values with 2 as the leading digit, which we will call 2_cnt
SELECT COUNT(column_name) FROM table_name
WHERE column_name LIKE '2%'
.
.
.
// Returns the number of values with 9 as the leading digit, which we will call 9_cnt
SELECT COUNT(column_name) FROM table_name
WHERE column_name LIKE '9%'
Now that we are done the counting, we can check the frequency of the leading digit against the chart. For our example, we will use a standard deviation of SQRT(2) to check if our data corresponds with the frequency of digits given by Benford Law.
Example:
if 28 < (1_cnt/total_cnt)*100 < 32 then
The data fits for the numbers with leading digit 1
if 15 < (2_cnt/total_cnt)*100 < 19 then
The data fits for the numbers with leading digit 2
.
.
.
if 2 < (9_cnt/total_cnt)*100 < 6 then
The data fits for the numbers with leading digit 9
Very interesting. Downright mystical. I was not aware of this phenomenon prior to reading your submission. I took a look at the street number column in the Address table of one of our production Oracle databases. A couple of million rows. Sure enough, "1" as the first character got the big count, with smaller and smaller counts for subsequent first character digits. You learn something new every day... (If this comment was disrespectful, please report it.)
ko (If this comment was disrespectful, please report it.)
Add Your Feedback
Your feedback will be posted below and an email sent to
the author. Please remember that the author was kind enough to
share this with you, so any criticisms must be stated politely, or they
will be deleted. (For feedback not related to this particular article, please
click here instead.)