Question
I wrote this function to normalize gender values to M or F by checking each value against string arrays of known synonyms. It works fine, but my manager told me to use a Dictionary, which he said is more efficient. I have no idea how to do that. Can anyone help me understand how this can be done? Thanks.
Public Function AutoGender(ByVal dt As DataTable) As DataTable
    Dim Gender As String = ""
    Dim Mkeywords() As String = {"boy", "boys", "male", "man", "m", "men", "guy"}
    Dim Fkeywords() As String = {"girl", "girls", "female", "woman", "f", "women", "chick"}
    Dim row As DataRow
    For Each row In dt.Rows
        If Mkeywords.Contains(row("Gender").ToString.ToLower) Then
            Gender = "M"
            row("Gender") = Gender
        ElseIf Fkeywords.Contains(row("Gender").ToString.ToLower) Then
            Gender = "F"
            row("Gender") = Gender
        End If
    Next
    Return dt
End Function
Answer 1:
Here is an example of how you could use a Dictionary(Of String, String) to look up whether a synonym is known:
Shared GenderSynonyms As New Dictionary(Of String, String) From
    {{"boy", "M"}, {"boys", "M"}, {"male", "M"}, {"man", "M"}, {"m", "M"}, {"men", "M"}, {"guy", "M"},
     {"girl", "F"}, {"girls", "F"}, {"female", "F"}, {"woman", "F"}, {"f", "F"}, {"women", "F"}, {"chick", "F"}}

Public Function AutoGender(ByVal dt As DataTable) As DataTable
    If dt.Columns.Contains("Gender") Then
        For Each row As DataRow In dt.Rows
            ' Field(Of String) returns Nothing for DBNull, so skip such rows
            ' instead of calling ToLower on Nothing.
            Dim oldGender = row.Field(Of String)("Gender")
            If oldGender Is Nothing Then Continue For
            Dim newGender As String = Nothing
            ' TryGetValue does the lookup and fetches the value in one step.
            If GenderSynonyms.TryGetValue(oldGender.ToLower(), newGender) Then
                row.SetField("Gender", newGender)
            End If
        Next
    End If
    Return dt
End Function
Note that I've used a collection initializer to fill the Dictionary, which is a convenient way to initialize collections from literals. You could also use the Add method.
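As a minimal sketch of the Add alternative mentioned above, the same table could be built in a loop. The module name GenderLookupDemo and the helper BuildGenderSynonyms are hypothetical names chosen for this illustration, not part of the original code:

```vbnet
Imports System
Imports System.Collections.Generic

Module GenderLookupDemo
    ' Hypothetical helper: builds the synonym table with Add
    ' instead of a collection initializer.
    Function BuildGenderSynonyms() As Dictionary(Of String, String)
        Dim synonyms As New Dictionary(Of String, String)
        For Each word As String In New String() {"boy", "boys", "male", "man", "m", "men", "guy"}
            synonyms.Add(word, "M")
        Next
        For Each word As String In New String() {"girl", "girls", "female", "woman", "f", "women", "chick"}
            synonyms.Add(word, "F")
        Next
        Return synonyms
    End Function

    Sub Main()
        Dim synonyms As Dictionary(Of String, String) = BuildGenderSynonyms()
        Dim code As String = Nothing
        If synonyms.TryGetValue("woman", code) Then
            Console.WriteLine(code) ' prints F
        End If
    End Sub
End Module
```

Keep in mind that Add throws an ArgumentException if the key is already present, so each synonym may appear only once across both lists.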
Edit: Another approach, which might be more concise, is to use two HashSet(Of String), one for the male synonyms and one for the female:
Shared maleSynonyms As New HashSet(Of String) From
    {"boy", "boys", "male", "man", "m", "men", "guy"}
Shared femaleSynonyms As New HashSet(Of String) From
    {"girl", "girls", "female", "woman", "f", "women", "chick"}

Public Function AutoGender(ByVal dt As DataTable) As DataTable
    If dt.Columns.Contains("Gender") Then
        For Each row As DataRow In dt.Rows
            ' Field(Of String) returns Nothing for DBNull, so skip such rows.
            Dim oldGender = row.Field(Of String)("Gender")
            If oldGender Is Nothing Then Continue For
            oldGender = oldGender.ToLower()
            If maleSynonyms.Contains(oldGender) Then
                row.SetField("Gender", "M")
            ElseIf femaleSynonyms.Contains(oldGender) Then
                row.SetField("Gender", "F")
            End If
        Next
    End If
    Return dt
End Function
Like the keys of a Dictionary, the elements of a HashSet must be unique, so it cannot contain duplicate strings. Unlike a Dictionary, it holds no key-value pairs, only a set of values.
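The uniqueness rule behaves slightly differently in the two collections, which a short sketch can show. The module name UniquenessDemo is made up for this illustration; the calls used are the standard HashSet(Of T).Add and Dictionary(Of TKey, TValue).Add:

```vbnet
Imports System
Imports System.Collections.Generic

Module UniquenessDemo
    Sub Main()
        Dim maleSynonyms As New HashSet(Of String) From {"boy", "man"}
        ' HashSet.Add returns False for a duplicate and leaves the set unchanged.
        Console.WriteLine(maleSynonyms.Add("boy")) ' prints False
        Console.WriteLine(maleSynonyms.Add("guy")) ' prints True
        Console.WriteLine(maleSynonyms.Count)      ' prints 3

        Dim genders As New Dictionary(Of String, String) From {{"boy", "M"}}
        Try
            ' Dictionary.Add throws when the key already exists.
            genders.Add("boy", "M")
        Catch ex As ArgumentException
            Console.WriteLine("duplicate key rejected")
        End Try
    End Sub
End Module
```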
Answer 2:
Simply change both of your arrays to dictionaries, and use ContainsKey instead of Contains.
Dim Mkeywords = New Dictionary(Of String, String) From
    {{"boy", ""}, {"boys", ""}, {"male", ""}, {"man", ""}, {"m", ""}, {"men", ""}, {"guy", ""}}
(and do the same for the female keywords)
However, as you might have noticed, I put in all those empty strings. This is because dictionaries have values as well as keys, and since we're not using the values, I made them empty strings. To get the same O(1) lookup while avoiding all the extraneous values, you can use a HashSet in a similar manner.
All you have to change now is, as I said, to use ContainsKey (or, if you go the HashSet route, it's still just Contains):
If Mkeywords.ContainsKey(row("Gender").ToString.ToLower) Then
One final note: this will only be "more efficient" if the keyword lists grow considerably. As you have it right now, with only those few elements, a dictionary may even be slower than the arrays.
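If you want to check this on your own machine, a rough timing sketch along the following lines could help. The module name LookupTiming, the helper TimeLookups, and the iteration count are all assumptions made for this illustration. The numbers depend on hardware and runtime, and the lambda call adds the same overhead to both runs, so treat the result as a rough indication only:

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Diagnostics
Imports System.Linq

Module LookupTiming
    ' Hypothetical helper: runs the given lookup repeatedly
    ' and returns the elapsed time in milliseconds.
    Function TimeLookups(ByVal lookup As Func(Of String, Boolean), ByVal iterations As Integer) As Long
        Dim hits As Integer = 0
        Dim sw As Stopwatch = Stopwatch.StartNew()
        For i As Integer = 1 To iterations
            If lookup("guy") Then hits += 1
        Next
        sw.Stop()
        Return sw.ElapsedMilliseconds
    End Function

    Sub Main()
        Dim keywords() As String = {"boy", "boys", "male", "man", "m", "men", "guy"}
        Dim keywordSet As New HashSet(Of String)(keywords)
        Const iterations As Integer = 1000000

        ' Linear scan of the array (Enumerable.Contains from System.Linq).
        Console.WriteLine("Array:   {0} ms", TimeLookups(Function(s) keywords.Contains(s), iterations))
        ' Hash-based lookup.
        Console.WriteLine("HashSet: {0} ms", TimeLookups(Function(s) keywordSet.Contains(s), iterations))
    End Sub
End Module
```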
Source: https://stackoverflow.com/questions/11135488/how-to-use-dictionary-in-vb-net