Is there a way to measure string similarity in Google BigQuery?

前端未结


礼貌的吻别

I'm wondering if anyone knows of a way to measure string similarity in BigQuery.

Seems like it would be a neat function to have.

My case is that I need to compare


7 answers

你的背包

2020-12-03 16:25

Ready-to-use shared UDFs. Levenshtein distance:

SELECT fhoffa.x.levenshtein('felipe', 'hoffa')
 , fhoffa.x.levenshtein('googgle', 'goggles')
 , fhoffa.x.levenshtein('is this the', 'Is This The')

6  2  0
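To see what those numbers mean, here is a minimal local sketch in Python of the classic dynamic-programming Levenshtein distance, where each insertion, deletion, or substitution costs 1. It is not the source of the shared UDF. The third result above (0 for 'is this the' vs 'Is This The') suggests the shared UDF ignores case, so the sketch lowercases both inputs by default; the `ignore_case` flag and the `similarity` helper (which turns the distance into a 0 to 1 score by dividing by the longer string's length) are hypothetical additions for illustration.

```python
def levenshtein(a: str, b: str, ignore_case: bool = True) -> int:
    """Edit distance between a and b (insert/delete/substitute all cost 1)."""
    if ignore_case:  # assumption: mirrors the case-insensitive result above
        a, b = a.lower(), b.lower()
    # prev[j] = distance between the current prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Hypothetical normalization: 1.0 for identical strings, 0.0 for fully different."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest
```

With these definitions, `levenshtein('felipe', 'hoffa')`, `levenshtein('googgle', 'goggles')` and `levenshtein('is this the', 'Is This The')` give 6, 2 and 0, the same values as the query above. A distance alone favors short strings, which is why a length-normalized score like `similarity` is often more useful for ranking.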

Soundex:

SELECT fhoffa.x.soundex('felipe')
 , fhoffa.x.soundex('googgle')
 , fhoffa.x.soundex('guugle')

F410  G240  G240
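A minimal Python sketch of American Soundex, assuming the shared UDF follows the standard rules: keep the first letter, map the remaining consonants to digits, drop vowels, collapse equal adjacent codes (letters separated only by h or w count as adjacent), then pad or truncate to four characters. It reproduces the three codes above, but the UDF's handling of edge cases such as digits, punctuation, or empty input may differ.

```python
# Standard Soundex digit for each consonant; vowels, h, w, y map to nothing.
SOUNDEX_CODES = {
    **dict.fromkeys("bfpv", "1"),
    **dict.fromkeys("cgjkqsxz", "2"),
    **dict.fromkeys("dt", "3"),
    "l": "4",
    **dict.fromkeys("mn", "5"),
    "r": "6",
}


def soundex(word: str) -> str:
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    result = first.upper()
    prev = SOUNDEX_CODES.get(first, "")  # first letter's code still blocks a repeat
    for ch in letters[1:]:
        code = SOUNDEX_CODES.get(ch, "")
        if code and code != prev:
            result += code
        # h and w do not separate equal codes; vowels do (they reset prev)
        if ch not in "hw":
            prev = code
    return (result + "000")[:4]
```

Because 'googgle' and 'guugle' differ only in vowels and a doubled consonant, both collapse to G240, which is exactly the kind of phonetic match Soundex is meant for.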

Fuzzy-match a string against a list and pick the best candidate:

SELECT fhoffa.x.fuzzy_extract_one('jony'
  # candidates: every name in a public table
  , (SELECT ARRAY_AGG(name)
   FROM `fh-bigquery.popular_names.gender_probabilities`)
  # or pass a literal array of candidates instead:
  #, ['john', 'johnny', 'jonathan', 'jonas']
)

johnny
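The scoring function behind `fuzzy_extract_one` isn't shown here, so the following is only an approximation: a Python sketch that returns the candidate with the highest `difflib.SequenceMatcher` ratio. That ratio is a swapped-in scorer from the standard library, not necessarily what the shared UDF uses, and the name `extract_one` is hypothetical. On the four-name literal array from the commented-out line it also returns 'johnny'; against the full names table it may pick a different name.

```python
from difflib import SequenceMatcher
from typing import Iterable, Optional


def extract_one(query: str, choices: Iterable[str]) -> Optional[str]:
    """Return the choice most similar to query (case-insensitive), or None if empty."""
    q = query.lower()
    best, best_score = None, -1.0
    for choice in choices:
        # ratio() = 2 * matched characters / total characters, in [0, 1]
        score = SequenceMatcher(None, q, choice.lower()).ratio()
        if score > best_score:  # strict '>' keeps the first choice on ties
            best, best_score = choice, score
    return best


print(extract_one('jony', ['john', 'johnny', 'jonathan', 'jonas']))
```

Here 'johnny' scores 0.8 against 'jony' (it shares both "jo" and "ny"), ahead of 'john' at 0.75. Scanning every candidate is linear in the list size, which is fine for a handful of names but is worth keeping in mind when the candidate array comes from a large table.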

How-to:

https://medium.com/@hoffa/new-in-bigquery-persistent-udfs-c9ea4100fd83

