TOP

Fuzzy Lookup for LibreOffice Calc

FUZZYLOOKUP() Description

We all know the well-known VLOOKUP() function that helps us combine data from different tables. However, this function has one significant drawback - it cannot combine similar values, that is, if there is an error in the word, then there will be no match.

To be able to combine approximate values, we can create our own function. Let's call it FuzzyLookup().

Let's imagine that we have two lists. Both have approximately the same elements, but they may be written slightly differently. The task is to find for each element in the first list the most similar element from the second list, i.e. implement a search for the nearest maximally similar text.

The big question, in this case, is what to consider the criterion of "similarity". Just the number of matching characters? Is the number of consecutive matches? Should character case or spaces be considered? What to do with different arrangement of words in a phrase? There are many options and there is no single solution - for each situation one or the other will be better than others.

In our case, we implement the simplest option - search by the maximum number of character matches. It's not perfect, but it works pretty well for most situations.


BASIC code for function FuzzyLookup

To add the FuzzyLookup function, open the Tools - Macros - Edit Macros... menu, select Module1 and copy the following text into the module:


Function FuzzyLOOKUP(LookupValue As String, SrcTable As Variant, Optional SimThreshold As Single) As String 
  ' moonexcel.com.ua 
  Dim Str       As String  
  Dim CellArray As Variant
  Dim StrArray  As Variant
  
  If IsMissing(SimThreshold) Then SimThreshold  = 0
  
  Str      = LCase(LookupValue)
  StrArray = Split(Str)
  StrExt   = UBound(StrArray)  
            
  For Each Cell In SrcTable
                          
    CellArray = Split(LCase(Cell))
    CellExt   = UBound(CellArray)    	    
    CellRate  = 0
	
    ' We check each word in the search phrase 
    For x = 0 To StrExt 
    
      StrWord = StrArray(x)	  
      If Len(StrWord) = 0 Then GoTo continue_x
	  MaxStrWordRate = 0
	  
      ' We check each word in the next cell from the original table of values 
      For i = 0 To CellExt
        
        CellWord = CellArray(i)
		If Len(CellWord) = 0 Then GoTo continue_i
   
        FindCharNum = OccurrenceNum(StrWord, CellWord)
        StrWordRate = FindCharNum / Max(Len(StrWord),Len(CellWord))
		
        If StrWordRate > MaxStrWordRate Then MaxStrWordRate = StrWordRate
		continue_i:
      Next i		    
		    		    
      CellRate = CellRate + MaxStrWordRate
	  continue_x:
    Next x               
        
    ' We keep the best match 
    If CellRate > MaxCellRate Then    
      MaxCellRate = CellRate
      BestCell    = Cell          
       
      FindCharNum = OccurrenceNum(Str, Cell)
      SimRate     = FindCharNum / Max(Len(Str),Len(Cell))
    End If       
        
  Next Cell
    
  IF SimRate >= SimThreshold Then 
    IF SimThreshold = -1 Then
      ReturnValue = BestCell + " (" + Format(SimRate, "0.00") + ")"
    ElseIf SimThreshold = -2 Then
      ReturnValue = Format(SimRate, "0.00")
    Else
      ReturnValue = BestCell
    End If
  Else 
    ReturnValue = ""
  End If    
  
  FuzzyLOOKUP = ReturnValue
End Function


Function OccurrenceNum(ByVal SourceString As String, ByVal TargetString As String)
  For i = 1 To Len(SourceString)	         		        	        
    ' We are looking for the occurrence of each symbol 
    Position = InStr(1, TargetString, Mid(SourceString, i, 1), 1)  
    ' We increase the counter of coincidences 
    If Position > 0 Then		  
      Count = Count + 1
      ' Remove the found symbol 
      TargetString = Left(TargetString, Position - 1) + Right(TargetString, Len(TargetString) - Position)   
    End If
  Next i    
  OccurrenceNum = Count
End Function


Function Max(ByVal value1 As Variant, ByVal value2 As Variant)  
  If value1 > value2 Then
	Result = value1
  Else
	Result = value2
  End If
  Max = Result
End Function

Next, close Macro Editor and return to the LibreOffice Calc worksheet - now you can use our new FuzzyLookup() function.

Using the YouLibreCalc extension

You can also use the function FUZZYLOOKUP() by setting the extension " YouLibreCalc.oxt ". After that, this function will be available in all files that will be opened in LibreOffice Calc.