Question
How can I do fuzzy string matching within PowerShell scripts?
I have several sets of people's names scraped from different sources, stored in an array. When I add a new name, I'd like to compare it with the existing names and, if they match fuzzily, treat them as the same person. For example, given this data set:
@("George Herbert Walker Bush",
"Barbara Pierce Bush",
"George Walker Bush",
"John Ellis (Jeb) Bush" )
I'd like to get the following outputs for the given inputs:
"Barbara Bush" -> @("Barbara Pierce Bush")
"George Takei" -> @("")
"George Bush" -> @("George Herbert Walker Bush","George Walker Bush")
At minimum, I'd like the matching to be case-insensitive and, if possible, flexible enough to tolerate some misspelling.
As far as I can tell, the standard libraries don't provide this functionality. Is there an easy-to-install module that can do it?
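For reference, here is a minimal, module-free sketch of the behaviour described above. It assumes that "fuzzy" means: every word of the query must be within one edit (Levenshtein distance, compared case-insensitively) of some word in a stored name. `Get-EditDistance`, `Find-FuzzyName` and the one-edit threshold are hypothetical choices made for illustration; they are not part of PowerShell or of any module.

```powershell
# Hypothetical helper (not from any module): case-insensitive Levenshtein
# distance, i.e. the minimum number of single-character insertions,
# deletions and substitutions that turn $First into $Second.
function Get-EditDistance {
    param([string]$First, [string]$Second)
    $s = $First.ToLowerInvariant()
    $t = $Second.ToLowerInvariant()
    # Keep only two rows of the dynamic-programming table.
    $prev = [int[]](0..$t.Length)
    for ($i = 1; $i -le $s.Length; $i++) {
        $curr = New-Object 'int[]' ($t.Length + 1)
        $curr[0] = $i
        for ($j = 1; $j -le $t.Length; $j++) {
            $cost = if ($s[$i - 1] -ceq $t[$j - 1]) { 0 } else { 1 }
            $deletion     = $prev[$j] + 1
            $insertion    = $curr[$j - 1] + 1
            $substitution = $prev[$j - 1] + $cost
            $curr[$j] = [Math]::Min([Math]::Min($deletion, $insertion), $substitution)
        }
        $prev = $curr
    }
    return $prev[$t.Length]
}

# Hypothetical helper: returns every stored name in which each word of
# $Query is within $MaxTypos edits of at least one word of the name.
function Find-FuzzyName {
    param([string]$Query, [string[]]$Names, [int]$MaxTypos = 1)
    $queryWords = @($Query -split '\s+' | Where-Object { $_ })
    foreach ($name in $Names) {
        # Drop punctuation such as the parentheses in "(Jeb)" before splitting.
        $nameWords = @(($name -replace '[^\w\s]', '') -split '\s+' | Where-Object { $_ })
        $allWordsFound = $true
        foreach ($word in $queryWords) {
            $closeWords = @($nameWords | Where-Object { (Get-EditDistance $word $_) -le $MaxTypos })
            if ($closeWords.Count -eq 0) { $allWordsFound = $false; break }
        }
        if ($allWordsFound) { $name }
    }
}

$names = @(
    "George Herbert Walker Bush",
    "Barbara Pierce Bush",
    "George Walker Bush",
    "John Ellis (Jeb) Bush"
)

Find-FuzzyName "Barbara Bush" $names   # Barbara Pierce Bush
Find-FuzzyName "George Takei" $names   # no output
Find-FuzzyName "george bush"  $names   # George Herbert Walker Bush, George Walker Bush
Find-FuzzyName "Barbra Bush"  $names   # Barbara Pierce Bush ("Barbra" is one edit away)
```

A fixed threshold of one edit per word is crude: Levenshtein distance counts a transposition such as "Goerge" as two edits, and for very short words a single edit is a large fraction of the word. Scaling the threshold with word length would be a natural refinement.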
Answer 1:
Searching the PowerShell Gallery for the term "fuzzy", I found this package: Communary.PASM.
It can be installed with:
PS> Install-Package Communary.PASM
The project is hosted on GitHub; I used its examples file as a reference.
Here are my examples:
PS> $colors = @("Red", "Orange", "Yellow", "Green", "Blue", "Violet", "Sky Blue" )
PS> $colors | Select-FuzzyString Red
Score Result
----- ------
300 Red
This is a perfect match, which earns the maximum score of 100 per character (3 × 100 = 300).
PS> $colors | Select-FuzzyString gren
Score Result
----- ------
295 Green
It tolerates a few missing characters.
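The results in these examples are consistent with a subsequence-style match, in which every character of the query must appear in the candidate in the same order while extra characters in between are allowed. That is an inference from the outputs shown here, not something taken from the Communary.PASM source, and the sketch below ignores scoring entirely; `Test-Subsequence` is a hypothetical name.

```powershell
# Hypothetical helper (not part of Communary.PASM): $true when every character
# of $Pattern appears in $Text in the same order, ignoring case.
function Test-Subsequence {
    param([string]$Pattern, [string]$Text)
    $p = $Pattern.ToLowerInvariant()
    $t = $Text.ToLowerInvariant()
    $found = 0   # number of pattern characters matched so far
    foreach ($ch in $t.ToCharArray()) {
        if ($found -lt $p.Length -and $ch -eq $p[$found]) { $found++ }
    }
    return $found -eq $p.Length
}

Test-Subsequence "gren"   "Green"     # True  (one "e" is simply skipped)
Test-Subsequence "blue"   "Sky Blue"  # True
Test-Subsequence "vioret" "Violet"    # False ("violet" contains no "r")
```

Under this reading, a missing letter is harmless but a substituted letter (such as the "r" in "vioret") can never be matched, so the query finds nothing.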
PS> $colors | Select-FuzzyString blue
Score Result
----- ------
400 Blue
376 Sky Blue
Multiple values can be returned with different scores.
PS> $colors | Select-FuzzyString vioret
# No output
However, it does not tolerate even a small misspelling, so I also tried Select-ApproximateString:
PS> $colors | Select-ApproximateString vioret
Violet
Its API is different: it returns only a single match, or nothing. It may also return nothing in cases where Select-FuzzyString does return matches.
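Which algorithm Select-ApproximateString uses is not stated here. Assuming a Levenshtein-style edit distance (an assumption, not confirmed from the module's source), the distance $d(i,j)$ between the first $i$ characters of $s$ and the first $j$ characters of $t$ follows the standard recurrence below, and the lower-cased query "vioret" is exactly one substitution away from "violet", which is why an edit-distance criterion accepts it.

```latex
\begin{aligned}
d(i,0) &= i, \qquad d(0,j) = j,\\
d(i,j) &= \min\bigl\{\, d(i-1,j) + 1,\; d(i,j-1) + 1,\; d(i-1,j-1) + [\,s_i \neq t_j\,] \,\bigr\},\\
d(\texttt{vioret}, \texttt{violet}) &= 1 \quad (\texttt{r} \to \texttt{l} \text{ at position } 4).
\end{aligned}
```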
This was tested with PowerShell Core v6.0.0-beta.9 on macOS and Communary.PASM 1.0.43.
Source: https://stackoverflow.com/questions/47256003/fuzzy-string-match-in-powershell