Is there a way to measure string similarity in Google BigQuery?

礼貌的吻别 2020-12-03 15:35

I'm wondering if anyone knows of a way to measure string similarity in BigQuery.

Seems like it would be a neat function to have.

My case is that I need to compare

7 Answers
  • 2020-12-03 16:25

    Ready to use shared UDFs - Levenshtein distance:

    SELECT fhoffa.x.levenshtein('felipe', 'hoffa')
     , fhoffa.x.levenshtein('googgle', 'goggles')
     , fhoffa.x.levenshtein('is this the', 'Is This The')
    
    6  2  0
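
For anyone who wants to see what these numbers mean outside BigQuery, here is a minimal Python sketch of Levenshtein (edit) distance. It is not the source of `fhoffa.x.levenshtein`. The lowercasing step is an assumption, inferred only from the 0 in the third column above, where the two strings differ just by case.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of single-character insertions,
    deletions, or substitutions needed to turn a into b."""
    # Assumption: compare case-insensitively, to match the sample output above.
    a, b = a.lower(), b.lower()

    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or keep) the character
            ))
        prev = curr
    return prev[-1]


print(levenshtein("felipe", "hoffa"),
      levenshtein("googgle", "goggles"),
      levenshtein("is this the", "Is This The"))
```

This keeps only two rows of the dynamic-programming table in memory, so it stays cheap even for longer strings.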
    

    Soundex:

    SELECT fhoffa.x.soundex('felipe')
     , fhoffa.x.soundex('googgle')
     , fhoffa.x.soundex('guugle')
    
    F410  G240  G240
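
The codes above are consistent with classic American Soundex, which can be sketched in Python as follows. This is an illustration of the standard rules, not the code behind `fhoffa.x.soundex`, and the shared UDF may treat edge cases (empty input, non-letters) differently.

```python
# Digit assigned to each consonant group in classic Soundex.
SOUNDEX_CODES = {
    **dict.fromkeys("bfpv", "1"),
    **dict.fromkeys("cgjkqsxz", "2"),
    **dict.fromkeys("dt", "3"),
    "l": "4",
    **dict.fromkeys("mn", "5"),
    "r": "6",
}


def soundex(word: str) -> str:
    """Return the 4-character Soundex code (letter + 3 digits) for word."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""  # assumption: empty result for input without letters

    first = letters[0]
    result = first.upper()
    last_code = SOUNDEX_CODES.get(first, "")
    for c in letters[1:]:
        if c in "hw":
            continue  # h and w do not break a run of equal codes
        code = SOUNDEX_CODES.get(c, "")
        if code and code != last_code:
            result += code
        last_code = code  # vowels reset the run, so a repeat is coded again
    return (result + "000")[:4]  # pad with zeros, then truncate to 4 chars


print(soundex("felipe"), soundex("googgle"), soundex("guugle"))
```

Because 'googgle' and 'guugle' map to the same code, grouping or joining on the Soundex value matches spellings that sound alike.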
    

    Fuzzy choose one:

    SELECT fhoffa.x.fuzzy_extract_one('jony' 
      , (SELECT ARRAY_AGG(name) 
       FROM `fh-bigquery.popular_names.gender_probabilities`) 
      #, ['john', 'johnny', 'jonathan', 'jonas']
    )
    
    johnny
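
The idea behind "choose one" is to score every candidate against the query and keep the best. Below is a rough Python sketch that uses the standard library's `difflib.SequenceMatcher` ratio as the score. That scorer is a stand-in and may not match what `fhoffa.x.fuzzy_extract_one` does internally. The function name is reused only for illustration, and the candidate list is the one from the commented-out line in the query above.

```python
from difflib import SequenceMatcher
from typing import Optional, Sequence


def fuzzy_extract_one(query: str, choices: Sequence[str]) -> Optional[str]:
    """Return the choice most similar to query, or None if choices is empty."""
    if not choices:
        return None

    def score(choice: str) -> float:
        # ratio() is 2 * matched_chars / total_chars, in the range [0, 1].
        return SequenceMatcher(None, query.lower(), choice.lower()).ratio()

    # max() keeps the first candidate when several share the best score.
    return max(choices, key=score)


candidates = ["john", "johnny", "jonathan", "jonas"]
print(fuzzy_extract_one("jony", candidates))
```

On this small list the sketch also picks 'johnny', but with a different scorer or a larger candidate set the winner can change.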
    

    How-to:

    • https://medium.com/@hoffa/new-in-bigquery-persistent-udfs-c9ea4100fd83