Trying to parse out vCard name entry with Regex

你离开我真会死。 Posted on 2019-12-11 10:58:16

Question


I have the following regex (VB) to parse the name entry out of a vCard:

        ' s holds the vCard text; the pattern is written with spaces, hence IgnorePatternWhitespace
        Dim options As RegexOptions = RegexOptions.IgnoreCase Or RegexOptions.Multiline Or RegexOptions.IgnorePatternWhitespace
        Dim regex As New Regex("(?<strElement>(N)) (;[^:]*)? (;CHARSET=UTF-8)? (:(?<strSurname>([^;\n\r]*))) (;(?<strGivenName>([^;\n\r]*)))? (;(?<strMidName>([^;\n\r]*)))? (;(?<strPrefix>([^;\n\r]*)))? (;(?<strSuffix>[^;\n\r]*))?", options)
        Dim m As Match = regex.Match(s)
        If m.Success Then
            Surname = m.Groups("strSurname").Value
            GivenName = m.Groups("strGivenName").Value
            MiddleName = m.Groups("strMidName").Value
            Prefix = m.Groups("strPrefix").Value
            Suffix = m.Groups("strSuffix").Value
        End If

It works when I have a vCard like:

BEGIN:VCARD
VERSION:2.1
N:Bacon;Kevin;Francis;Mr.;Jr.
FN: Mr. Kevin Francis Bacon Jr.
ORG:Movies.com

But it doesn't work correctly when the vCard is like this:

BEGIN:VCARD
VERSION:2.1
N:Bacon;Kevin
FN:Kevin Bacon
ORG:Movies.com

The regex assigns Kevin to <strSuffix> rather than to <strGivenName> as I wanted. How can I fix this?

The regex was adapted from the question "vCard regex".


Answer 1:


The following regex pattern should work:

^N(?:;(?!CHARSET=UTF-8)[^:]*|)(?:;CHARSET=UTF-8|):(?<strSurname>[^;\n\r]*);?(?<strGivenName>[^;\n\r]*);?(?<strMidName>[^;\n\r]*);?(?<strPrefix>[^;\n\r]*);?(?<strSuffix>[^;\n\r]*)

See this example and this example.
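
As a rough sketch of how the pattern above might be wired into VB, the module below runs it against the short vCard from the question. The module name VCardNameDemo and the helper ParseName are made up for illustration and are not part of the original answer. It assumes RegexOptions.Multiline so that ^ can match at the start of the N line, and it leaves out IgnorePatternWhitespace because this pattern contains no decorative spaces.

```vb
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module VCardNameDemo
    ' Pattern from the answer above, unchanged; only split across lines for readability.
    Private Const NamePattern As String = "^N(?:;(?!CHARSET=UTF-8)[^:]*|)(?:;CHARSET=UTF-8|):" &
        "(?<strSurname>[^;\n\r]*);?(?<strGivenName>[^;\n\r]*);?" &
        "(?<strMidName>[^;\n\r]*);?(?<strPrefix>[^;\n\r]*);?(?<strSuffix>[^;\n\r]*)"

    ' Hypothetical helper: returns {surname, given, middle, prefix, suffix},
    ' or Nothing when the text contains no N line.
    Function ParseName(card As String) As String()
        Dim m As Match = Regex.Match(card, NamePattern,
                                     RegexOptions.IgnoreCase Or RegexOptions.Multiline)
        If Not m.Success Then Return Nothing
        Return New String() {
            m.Groups("strSurname").Value,
            m.Groups("strGivenName").Value,
            m.Groups("strMidName").Value,
            m.Groups("strPrefix").Value,
            m.Groups("strSuffix").Value
        }
    End Function

    Sub Main()
        ' The short vCard from the question, with only two name components.
        Dim card As String = String.Join(vbCrLf,
            "BEGIN:VCARD", "VERSION:2.1", "N:Bacon;Kevin", "FN:Kevin Bacon", "ORG:Movies.com")
        Dim parts As String() = ParseName(card)
        Console.WriteLine("Surname={0}, GivenName={1}, Suffix='{2}'",
                          parts(0), parts(1), parts(4))
    End Sub
End Module
```

Because each separator is written as ;? and each field as [^;\n\r]*, components missing from the end of the N line simply come back as empty strings, while the ones that are present stay in their positions.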




Answer 2:


I would avoid parsing each line with its own regex and tokenize each line instead, then let the code that consumes the tokens determine whether any (optional) items are missing. Here is a pattern that simply tokenizes each line into its code and data items (use the RegexOptions.ExplicitCapture and RegexOptions.Multiline options).

^(?<Code>[^:]+)(:)((?<Tokens>[^;\r\n]+)(;?))+

That puts the emphasis on creating individual code objects that handle the business logic of whether data is missing or not. Failures are then no longer regex failures but failures in business-logic post-processing, which IMHO are easier to debug and maintain.
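
One way the tokenizing approach might look in VB is sketched below. It relies on the .NET Captures collection to read every value the repeated Tokens group matched. The names VCardTokenizerDemo, Tokenize and TokenOrEmpty are illustrative assumptions, not part of the original answer, and the dictionary layout is just one possible shape for the post-processing step.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module VCardTokenizerDemo
    ' Pattern from the answer above, unchanged.
    Private Const LinePattern As String = "^(?<Code>[^:]+)(:)((?<Tokens>[^;\r\n]+)(;?))+"

    ' Hypothetical helper: maps each line's code (e.g. "N", "FN") to its ';'-separated tokens.
    Function Tokenize(card As String) As Dictionary(Of String, List(Of String))
        Dim result As New Dictionary(Of String, List(Of String))(StringComparer.OrdinalIgnoreCase)
        Dim options As RegexOptions = RegexOptions.ExplicitCapture Or RegexOptions.Multiline
        For Each m As Match In Regex.Matches(card, LinePattern, options)
            Dim tokens As New List(Of String)
            ' A repeated group keeps every value it matched in its Captures collection.
            For Each c As Capture In m.Groups("Tokens").Captures
                tokens.Add(c.Value)
            Next
            result(m.Groups("Code").Value) = tokens
        Next
        Return result
    End Function

    ' Hypothetical post-processing rule: a token that is not present is treated as empty.
    Function TokenOrEmpty(tokens As List(Of String), index As Integer) As String
        Return If(index < tokens.Count, tokens(index), String.Empty)
    End Function

    Sub Main()
        Dim card As String = String.Join(vbCrLf,
            "BEGIN:VCARD", "VERSION:2.1", "N:Bacon;Kevin", "FN:Kevin Bacon", "ORG:Movies.com")
        Dim nameTokens As List(Of String) = Tokenize(card)("N")
        Console.WriteLine("Surname={0}, GivenName={1}, Suffix='{2}'",
                          TokenOrEmpty(nameTokens, 0),
                          TokenOrEmpty(nameTokens, 1),
                          TokenOrEmpty(nameTokens, 4))
    End Sub
End Module
```

Two limitations of this sketch are worth keeping in mind. The Tokens group requires at least one character, so the pattern as written stops at an empty component in the middle of a line (for example N:Bacon;;Francis yields only Bacon). Any parameters such as ;CHARSET=UTF-8 also end up inside Code, so the post-processing would have to strip them before looking up N.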



Source: https://stackoverflow.com/questions/13420851/trying-to-parse-out-vcard-name-entry-with-regex
