Question
I have the following regex (VB.NET) to parse the name entry out of a vCard:
Dim options As New RegexOptions()
options = RegexOptions.IgnoreCase Or RegexOptions.Multiline Or RegexOptions.IgnorePatternWhitespace
regex = New Regex("(?<strElement>(N)) (;[^:]*)? (;CHARSET=UTF-8)? (:(?<strSurname>([^;\n\r]*))) (;(?<strGivenName>([^;\n\r]*)))? (;(?<strMidName>([^;\n\r]*)))? (;(?<strPrefix>([^;\n\r]*)))? (;(?<strSuffix>[^;\n\r]*))?", options)
m = regex.Match(s)
If m.Success Then
Surname = m.Groups("strSurname").Value
GivenName = m.Groups("strGivenName").Value
MiddleName = m.Groups("strMidName").Value
Prefix = m.Groups("strPrefix").Value
Suffix = m.Groups("strSuffix").Value
End If
It works when I have a vCard like:
BEGIN:VCARD
VERSION:2.1
N:Bacon;Kevin;Francis;Mr.;Jr.
FN: Mr. Kevin Francis Bacon Jr.
ORG:Movies.com
But it doesn't work correctly when the vCard is like this:
BEGIN:VCARD
VERSION:2.1
N:Bacon;Kevin
FN:Kevin Bacon
ORG:Movies.com
The regex assigns Kevin to <strSuffix> rather than to <strGivenName> as I wanted. How can I fix this?
The regex is adapted from this question: vCard regex
Answer 1:
The following regex pattern should work:
^N(?:;(?!CHARSET=UTF-8)[^:]*|)(?:;CHARSET=UTF-8|):(?<strSurname>[^;\n\r]*);?(?<strGivenName>[^;\n\r]*);?(?<strMidName>[^;\n\r]*);?(?<strPrefix>[^;\n\r]*);?(?<strSuffix>[^;\n\r]*)
See this example and this example.
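As a minimal sketch of how the pattern above could be used from VB.NET, the program below runs it against the short vCard from the question. The module name `VCardNameDemo` and the helper `ParseName` are hypothetical names chosen for illustration. It uses only the IgnoreCase and Multiline options, on the assumption that IgnorePatternWhitespace is not needed because the pattern contains no layout whitespace.

```vb
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module VCardNameDemo
    ' Pattern from the answer above, split across lines only for readability.
    Private Const NamePattern As String =
        "^N(?:;(?!CHARSET=UTF-8)[^:]*|)(?:;CHARSET=UTF-8|):" &
        "(?<strSurname>[^;\n\r]*);?(?<strGivenName>[^;\n\r]*);?" &
        "(?<strMidName>[^;\n\r]*);?(?<strPrefix>[^;\n\r]*);?" &
        "(?<strSuffix>[^;\n\r]*)"

    ' Hypothetical helper: returns surname, given name, middle name, prefix
    ' and suffix (in that order), or Nothing when no N entry is found.
    Function ParseName(card As String) As String()
        Dim options As RegexOptions = RegexOptions.IgnoreCase Or RegexOptions.Multiline
        Dim m As Match = Regex.Match(card, NamePattern, options)
        If Not m.Success Then Return Nothing
        Return New String() {
            m.Groups("strSurname").Value,
            m.Groups("strGivenName").Value,
            m.Groups("strMidName").Value,
            m.Groups("strPrefix").Value,
            m.Groups("strSuffix").Value}
    End Function

    Sub Main()
        ' The short vCard from the question, joined with CRLF line endings.
        Dim shortCard As String = String.Join(vbCrLf,
            "BEGIN:VCARD", "VERSION:2.1", "N:Bacon;Kevin",
            "FN:Kevin Bacon", "ORG:Movies.com")
        Dim parts As String() = ParseName(shortCard)
        Console.WriteLine("Surname={0}, GivenName={1}, Suffix='{2}'",
                          parts(0), parts(1), parts(4))
    End Sub
End Module
```

Because every separator is written as `;?` and every component as `[^;\n\r]*`, the components fill from left to right, so a card with only two name parts leaves the trailing groups empty rather than shifting Kevin into the suffix.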
Answer 2:
I would avoid parsing each line with its own regex and instead tokenize every line, then let the code that consumes the tokens decide whether optional items are missing. Here is a pattern that simply splits each line into its code and its data items (use the ExplicitCapture and Multiline options):
^(?<Code>[^:]+)(:)((?<Tokens>[^;\r\n]+)(;?))+
That shifts the emphasis to individual code objects that handle the business logic of whether data is missing. Failures are then no longer regex failures but post-processing failures in the business logic, which IMHO are easier to debug and maintain.
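The tokenizing approach might look roughly like the sketch below. It relies on the fact that .NET keeps every capture of a repeated group in `Group.Captures`. The names `VCardTokenizerDemo`, `Tokenize` and `TokenOrEmpty` are hypothetical, and the dictionary layout is only one possible way to hold the result.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module VCardTokenizerDemo
    ' Pattern from the answer above, unchanged.
    Private Const LinePattern As String =
        "^(?<Code>[^:]+)(:)((?<Tokens>[^;\r\n]+)(;?))+"

    ' Hypothetical helper: maps each line's code (N, FN, ORG, ...) to its tokens.
    ' Assumption: a repeated code simply overwrites the earlier entry.
    Function Tokenize(card As String) As Dictionary(Of String, List(Of String))
        Dim result As New Dictionary(Of String, List(Of String))()
        Dim options As RegexOptions =
            RegexOptions.ExplicitCapture Or RegexOptions.Multiline
        For Each m As Match In Regex.Matches(card, LinePattern, options)
            Dim tokens As New List(Of String)()
            ' Each repetition of the Tokens group is stored as one capture.
            For Each c As Capture In m.Groups("Tokens").Captures
                tokens.Add(c.Value)
            Next
            result(m.Groups("Code").Value) = tokens
        Next
        Return result
    End Function

    ' Hypothetical business-logic helper: a missing optional item becomes "".
    Function TokenOrEmpty(tokens As List(Of String), index As Integer) As String
        Return If(index < tokens.Count, tokens(index), "")
    End Function

    Sub Main()
        Dim card As String = String.Join(vbCrLf,
            "BEGIN:VCARD", "VERSION:2.1", "N:Bacon;Kevin",
            "FN:Kevin Bacon", "ORG:Movies.com")
        Dim nameTokens As List(Of String) = Tokenize(card)("N")
        Console.WriteLine("Surname={0}, GivenName={1}, Suffix='{2}'",
                          TokenOrEmpty(nameTokens, 0),
                          TokenOrEmpty(nameTokens, 1),
                          TokenOrEmpty(nameTokens, 4))
    End Sub
End Module
```

One caveat with the pattern as written: `[^;\r\n]+` requires at least one character, so an empty component such as the one in `N:Bacon;;Francis` ends the token run after Bacon. If empty components must keep their position, the post-processing code would need a different way to split the data part.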
Source: https://stackoverflow.com/questions/13420851/trying-to-parse-out-vcard-name-entry-with-regex