
Jaro-Winkler String Comparison

Submitted on: 2/21/2015 8:58:00 AM
By: Ernanie F. Gregorio Jr. (from psc cd)  
Level: Beginner
Compatibility: VB 6.0
Views: 4812
 
I think this is the first Jaro-Winkler algorithm here on PSC.

Description: The Jaro–Winkler distance (Winkler, 1990) is a measure of similarity between two strings. It is a variant of the Jaro distance metric (Jaro, 1989, 1995) and is mainly used in record linkage (duplicate detection). Despite the name, it behaves as a similarity score: the higher the Jaro–Winkler value for two strings, the more similar they are. The metric was designed for, and works best on, short strings such as person names. The score is normalized so that 0 means no similarity and 1 means an exact match. This implementation ignores case and leading/trailing blanks, and, as a non-standard shortcut, also returns 1 whenever the keyword occurs inside the compared string.

References:
http://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance
lingpipe: http://lingpipe-blog.com/2006/12/13/code-spelunking-jaro-winkler-string-comparison/
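For reference, the standard definitions from the Wikipedia article cited above can be written as follows. Here m is the number of matching characters, t is half the number of transposed matches, ℓ is the length of the common prefix (capped at 4, as in the code's `For i = 1 To 4` loop), and p = 0.1 is Winkler's scaling factor, the same constant the code uses. The MARTHA/MARHTA pair is a commonly quoted example; the arithmetic below is worked out by hand rather than produced by running the VB function.

```latex
\begin{align*}
d_j &= \begin{cases}
  0 & m = 0 \\
  \frac{1}{3}\left(\frac{m}{|s_1|} + \frac{m}{|s_2|} + \frac{m - t}{m}\right) & m > 0
\end{cases} \\
d_w &= d_j + \ell\, p\, (1 - d_j), \qquad \ell \le 4,\; p = 0.1
\end{align*}

% MARTHA vs. MARHTA: m = 6, t = 1 (the T/H pair is swapped), \ell = 3 ("MAR")
\begin{align*}
d_j &= \tfrac{1}{3}\left(\tfrac{6}{6} + \tfrac{6}{6} + \tfrac{5}{6}\right) = \tfrac{17}{18} \approx 0.944 \\
d_w &= \tfrac{17}{18} + 3 \cdot 0.1 \cdot \left(1 - \tfrac{17}{18}\right) = \tfrac{17.3}{18} \approx 0.961
\end{align*}
```

The prefix bonus is what lets MARHTA score higher than its plain Jaro value: the shared "MAR" pulls the score a further 30% of the way toward 1.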
 
code:
'**************************************
' Name: Jaro-Winkler String Comparison
' Description: I think this is the first Jaro-Winkler algorithm here on PSC.
' The Jaro-Winkler distance (Winkler, 1990) is a measure of similarity between
' two strings. It is a variant of the Jaro distance metric (Jaro, 1989, 1995)
' and is mainly used in record linkage (duplicate detection). The higher the
' score for two strings, the more similar they are. The metric was designed
' for, and works best on, short strings such as person names. The score is
' normalized so that 0 means no similarity and 1 means an exact match.
' References:
' http://en.wikipedia.org/wiki/Jaro%E2%80%93Winkler_distance
' lingpipe: http://lingpipe-blog.com/2006/12/13/code-spelunking-jaro-winkler-string-comparison/
' By: Ernanie F. Gregorio Jr. (from psc cd)
'**************************************

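Public Function JaroWinkler(ByVal prmKeyword As String, ByVal prmCompareTo As String) As Double
    Dim iProximity As Long        ' match window: max distance between paired characters
    Dim i As Long
    Dim x As Long
    Dim iFrom As Long
    Dim iTo As Long
    Dim iLenKey As Long
    Dim iLenCmp As Long
    Dim iMatchCharacters As Long
    Dim iTransposeCount As Long
    Dim iPrefix As Long
    Dim dJaro As Double
    Dim bKeyMatched() As Boolean  ' True if prmKeyword(i) found a partner
    Dim bCmpMatched() As Boolean  ' True if prmCompareTo(x) is already paired

    ' comparison ignores case and leading/trailing blanks
    ' (ByVal above keeps these changes from leaking back to the caller)
    prmCompareTo = UCase$(Trim$(prmCompareTo))
    prmKeyword = UCase$(Trim$(prmKeyword))
    iLenKey = Len(prmKeyword)
    iLenCmp = Len(prmCompareTo)

    ' return 1 if the strings are the same
    If prmCompareTo = prmKeyword Then
        JaroWinkler = 1
        Exit Function
    End If
    ' an empty string has no similarity to a non-empty one
    If iLenKey = 0 Or iLenCmp = 0 Then
        JaroWinkler = 0
        Exit Function
    End If
    ' author's shortcut (not part of standard Jaro-Winkler):
    ' return 1 if the keyword occurs inside the compared string
    If InStr(1, prmCompareTo, prmKeyword) > 0 Then
        JaroWinkler = 1
        Exit Function
    End If

    ' matching characters may be at most floor(max(len1, len2) / 2) - 1
    ' positions apart; "\" is integer division, and the window is never negative
    If iLenCmp >= iLenKey Then
        iProximity = (iLenCmp \ 2) - 1
    Else
        iProximity = (iLenKey \ 2) - 1
    End If
    If iProximity < 0 Then iProximity = 0

    ReDim bKeyMatched(1 To iLenKey)
    ReDim bCmpMatched(1 To iLenCmp)

    ' pass 1: pair each keyword character with the first unused equal
    ' character of prmCompareTo inside [i - iProximity, i + iProximity]
    For i = 1 To iLenKey
        iFrom = i - iProximity
        If iFrom < 1 Then iFrom = 1
        iTo = i + iProximity
        If iTo > iLenCmp Then iTo = iLenCmp
        For x = iFrom To iTo
            If Not bCmpMatched(x) Then
                If Mid$(prmKeyword, i, 1) = Mid$(prmCompareTo, x, 1) Then
                    bKeyMatched(i) = True
                    bCmpMatched(x) = True
                    iMatchCharacters = iMatchCharacters + 1
                    Exit For
                End If
            End If
        Next x
    Next i

    If iMatchCharacters = 0 Then
        JaroWinkler = 0
        Exit Function
    End If

    ' pass 2: walk the matched characters of both strings in order;
    ' every position where they differ is half a transposition
    x = 1
    For i = 1 To iLenKey
        If bKeyMatched(i) Then
            Do While Not bCmpMatched(x)
                x = x + 1
            Loop
            If Mid$(prmKeyword, i, 1) <> Mid$(prmCompareTo, x, 1) Then
                iTransposeCount = iTransposeCount + 1
            End If
            x = x + 1
        End If
    Next i
    iTransposeCount = iTransposeCount \ 2

    dJaro = ((iMatchCharacters / iLenKey) + _
             (iMatchCharacters / iLenCmp) + _
             ((iMatchCharacters - iTransposeCount) / iMatchCharacters)) / 3

    ' Winkler bonus: length of the common prefix, up to 4 characters
    For i = 1 To 4
        If i > iLenKey Or i > iLenCmp Then Exit For
        If Mid$(prmKeyword, i, 1) <> Mid$(prmCompareTo, i, 1) Then Exit For
        iPrefix = iPrefix + 1
    Next i

    ' prefix scaling factor 0.1, as in Winkler's proposal
    JaroWinkler = dJaro + 0.1 * iPrefix * (1 - dJaro)
End Function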
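In the standard definition, two equal characters count as a match only if they are no more than ⌊max(|s1|, |s2|)/2⌋ - 1 positions apart (the window that iProximity is meant to hold). Each character of the second string may be paired at most once, and t is half the number of positions where the two sequences of matched characters disagree. The hand trace below of DWAYNE (keyword) against DUANE, another commonly cited pair, shows each step. The figures are worked by hand, not output captured from the VB function.

```latex
\begin{align*}
d &= \left\lfloor \tfrac{\max(6,\,5)}{2} \right\rfloor - 1 = 2 \\
\text{pairs} &: \text{D}_1 \to \text{D}_1,\; \text{W}_2 \to \varnothing,\; \text{A}_3 \to \text{A}_3,\;
               \text{Y}_4 \to \varnothing,\; \text{N}_5 \to \text{N}_4,\; \text{E}_6 \to \text{E}_5 \\
m &= 4,\quad \text{matched order is DANE in both strings} \Rightarrow t = 0 \\
d_j &= \tfrac{1}{3}\left(\tfrac{4}{6} + \tfrac{4}{5} + \tfrac{4}{4}\right) = \tfrac{37}{45} \approx 0.822 \\
\ell &= 1 \;(\text{only the leading D}) \Rightarrow
d_w = \tfrac{37}{45} + 0.1 \cdot \left(1 - \tfrac{37}{45}\right) = \tfrac{37.8}{45} = 0.84
\end{align*}
```

W and Y find no partner because DUANE has no W or Y within two positions of them. N pairs with the N one position to its left, which the window allows.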


Other 2 submission(s) by this author

 


Report Bad Submission
Use this form to tell us if this entry should be deleted (i.e contains no code, is a virus, etc.).
This submission should be removed because:

Your Vote

What do you think of this code (in the Beginner category)?
(The code with your highest vote will win this month's coding contest!)
Excellent  Good  Average  Below Average  Poor (See voting log ...)
 

Other User Comments


 There are no comments on this submission.
 

Add Your Feedback
Your feedback will be posted below and an email sent to the author. Please remember that the author was kind enough to share this with you, so any criticisms must be stated politely, or they will be deleted. (For feedback not related to this particular code, please click here instead.)
 

To post feedback, first please login.