How to extract substring in parentheses using Regex pattern

孤城傲影 2020-12-09 11:36

This is probably a simple problem, but unfortunately I wasn't able to get the results I wanted...

Say, I have the following line:

"Wouldn't It B

6 answers
  • 2020-12-09 11:55

    Edit: After examining your document, the problem is that there are non-breaking spaces before the parentheses, not regular spaces. So this regex should work: ""[ \xA0]*\(([^)]+)\)

    ""       'quote (twice to escape)
    [ \xA0]* 'zero or more regular or non-breaking (\xA0) spaces
    \(       'left parenthesis
    (        'open capturing group
    [^)]+    'anything not a right parenthesis
    )        'close capturing group
    \)       'right parenthesis
    

    In a function:

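    Public Function GetStringInParens(search_str As String) As String
        ' Requires a reference to "Microsoft VBScript Regular Expressions 5.5"
        Dim regEx As New VBScript_RegExp_55.RegExp
        Dim matches As VBScript_RegExp_55.MatchCollection

        GetStringInParens = ""
        regEx.Pattern = """[ \xA0]*\(([^)]+)\)"
        ' Execute returns an empty collection when nothing matches
        Set matches = regEx.Execute(search_str)
        If matches.Count > 0 Then
            ' SubMatches(0) is the text captured between the parentheses
            GetStringInParens = matches(0).SubMatches(0)
        End If
    End Function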
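
    As a usage sketch, the pattern above can be exercised on a string that contains a non-breaking space before the parenthesis. This variant is late-bound (it uses CreateObject instead of a project reference), so it can be pasted into a module as-is. The names GetStringInParensLateBound and DemoNonBreakingSpace are hypothetical, and the sample line is only assumed to resemble the ones in your document:

    ```vba
    ' Late-bound variant: no project reference to the RegExp library is needed.
    ' The function name is hypothetical, chosen for this sketch.
    Public Function GetStringInParensLateBound(ByVal search_str As String) As String
        Dim regEx As Object
        Dim matches As Object

        Set regEx = CreateObject("VBScript.RegExp")
        regEx.Pattern = """[ \xA0]*\(([^)]+)\)"

        Set matches = regEx.Execute(search_str)
        If matches.Count > 0 Then
            GetStringInParensLateBound = matches(0).SubMatches(0)
        End If
    End Function

    Sub DemoNonBreakingSpace()
        Dim songMeta As String
        ' ChrW(160) is the non-breaking space (\xA0); assumed to sit before the "("
        songMeta = """Wouldn't It Be Nice""" & ChrW(160) & "(B. Wilson/Asher/Love)"
        Debug.Print GetStringInParensLateBound(songMeta)
    End Sub
    ```

    With that sample input, DemoNonBreakingSpace prints B. Wilson/Asher/Love to the Immediate Window.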
    
  • 2020-12-09 11:59

    This function worked on your example string:

    Function GetArtist(songMeta As String) As String
      Dim parts() As String
      Dim artist As String
      ' split string by "(" and take last portion
      parts = Split(songMeta, "(")
      artist = parts(UBound(parts))
      ' remove closing parenthesis
      artist = Replace(artist, ")", "")
      ' assign the result to the function name so it is actually returned
      GetArtist = artist
    End Function
    

    Ex:

    Sub Test()
    
      Dim songMeta As String
    
      songMeta = """Wouldn't It Be Nice"" (B. Wilson/Asher/Love)"
    
      Debug.Print GetArtist(songMeta)
    
    End Sub
    

    prints "B. Wilson/Asher/Love" to the Immediate Window.

    It also solves the problem alan mentioned. Ex:

    Sub Test()
    
      Dim songMeta As String
    
      songMeta = """Wouldn't (It Be) Nice"" (B. Wilson/Asher/Love)"
    
      Debug.Print GetArtist(songMeta)
    
    End Sub
    

    also prints "B. Wilson/Asher/Love" to the Immediate Window. It will not work, of course, if the artist names themselves include parentheses.

  • 2020-12-09 12:06

    I think you need a better data file ;) You might want to consider pre-processing the file into a temp file, modifying the outliers that don't fit your pattern so that they do. It's a bit time-consuming, but it is always difficult when a data file lacks consistency.
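
    A minimal sketch of what such a pre-processing step might look like for a single line, assuming the only outliers are the non-breaking spaces mentioned in another answer. The function name NormalizeSongLine is hypothetical, and reading from and writing to the temp file is left out:

    ```vba
    ' Hypothetical helper: normalizes the whitespace in one line of the data file.
    Public Function NormalizeSongLine(ByVal rawLine As String) As String
        Dim cleaned As String

        ' replace non-breaking spaces (U+00A0) with regular spaces
        cleaned = Replace(rawLine, ChrW(160), " ")

        ' collapse runs of spaces into a single space
        Do While InStr(cleaned, "  ") > 0
            cleaned = Replace(cleaned, "  ", " ")
        Loop

        ' drop leading and trailing spaces
        NormalizeSongLine = Trim$(cleaned)
    End Function
    ```

    After this step, a pattern that only expects regular spaces before the parenthesis should be enough for lines that deviate in this one way.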

  • 2020-12-09 12:09

    Not strictly an answer to your question, but sometimes, for things this simple, good ol' string functions are less confusing and more concise than Regex.

    Function BetweenParentheses(s As String) As String
        BetweenParentheses = Mid(s, InStr(s, "(") + 1, _
            InStr(s, ")") - InStr(s, "(") - 1)
    End Function
    

    Usage:

    Debug.Print BetweenParentheses("""Wouldn't It Be Nice"" (B. Wilson/Asher/Love)")
    'B. Wilson/Asher/Love
    

    EDIT: @alan points out that this will falsely match the contents of parentheses in the song title. This is easily circumvented with a little modification:

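    Function BetweenParentheses(s As String) As String
        Dim iEndQuote As Long
        Dim iLeftParenthesis As Long
        Dim iRightParenthesis As Long

        ' position of the title's closing quote (0 if the line has no quotes)
        iEndQuote = InStrRev(s, """")
        ' InStr raises an error for a start position of 0, so fall back to 1
        If iEndQuote = 0 Then iEndQuote = 1
        iLeftParenthesis = InStr(iEndQuote, s, "(")
        iRightParenthesis = InStr(iEndQuote, s, ")")

        ' only extract when a ")" follows the "(" after the title
        If iLeftParenthesis <> 0 And iRightParenthesis > iLeftParenthesis Then
            BetweenParentheses = Mid(s, iLeftParenthesis + 1, _
                iRightParenthesis - iLeftParenthesis - 1)
        End If
    End Function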
    

    Usage:

    Debug.Print BetweenParentheses("""Wouldn't It Be Nice"" (B. Wilson/Asher/Love)")
    'B. Wilson/Asher/Love
    Debug.Print BetweenParentheses("""Don't talk (yell)""")
    ' returns empty string
    

    Of course this is less concise than before!

  • 2020-12-09 12:14

    This is a nice regex:

    ".*\(([^)]*)
    

    In VBA/VBScript:

    ' Requires a reference to "Microsoft VBScript Regular Expressions 5.5"
    ' SubjectString holds the line to search
    Dim myRegExp As RegExp
    Dim myMatches As MatchCollection
    Dim myMatch As Match
    Dim ResultString As String
    Set myRegExp = New RegExp
    myRegExp.Pattern = """.*\(([^)]*)"
    Set myMatches = myRegExp.Execute(SubjectString)
    ResultString = ""
    If myMatches.Count >= 1 Then
        Set myMatch = myMatches(0)
        ' the pattern has a single capturing group, so its text is SubMatches(0)
        If myMatch.SubMatches.Count >= 1 Then
            ResultString = myMatch.SubMatches(0)
        End If
    End If
    

    This matches

    Put Your Head on My Shoulder
    

    in

    "Don't Talk (Put Your Head on My Shoulder)"  
    

    Update 1

    I let the regex loose on your doc file and it matches as requested. I'm quite sure the regex is fine. I'm not fluent in VBA/VBScript, but my guess is that's where it goes wrong.

    If you want to discuss the regex further, that's fine with me. I'm not eager to start digging into this VBScript API, which looks arcane.

    Given the new input the regex is tweaked to

    ".*".*\(([^)]*)
    

    So that it doesn't falsely match (Put Your Head on My Shoulder) which appears inside the quotes.
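
    A minimal sketch of how the tweaked pattern could be wrapped in a VBA function, assuming late binding through CreateObject so that no library reference is needed. The function name GetCreditAfterTitle is hypothetical and not part of the original answer:

    ```vba
    ' Hypothetical wrapper around the tweaked pattern  ".*".*\(([^)]*)
    Public Function GetCreditAfterTitle(ByVal subjectString As String) As String
        Dim re As Object
        Dim matches As Object

        Set re = CreateObject("VBScript.RegExp")
        ' quotes are doubled inside a VBA string literal
        re.Pattern = """.*"".*\(([^)]*)"

        Set matches = re.Execute(subjectString)
        ' the single capturing group holds the text after the last "("
        If matches.Count > 0 Then
            GetCreditAfterTitle = matches(0).SubMatches(0)
        End If
    End Function

    Sub DemoTweakedPattern()
        ' parentheses inside the quoted title are skipped
        Debug.Print GetCreditAfterTitle("""Wouldn't (It Be) Nice"" (B. Wilson/Asher/Love)")
        ' no parentheses after the closing quote: nothing is matched
        Debug.Print GetCreditAfterTitle("""Don't Talk (Put Your Head on My Shoulder)""")
    End Sub
    ```

    The first call prints B. Wilson/Asher/Love and the second prints an empty line, because the pattern requires an opening parenthesis after the closing quote.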

  • 2020-12-09 12:16

    This is another regex, tested with VBScript: (?:\()(.*)(?:\))


    Data = """Wouldn't It Be Nice"" (B. Wilson/Asher/Love)"
    wscript.echo Extract(Data)
    '---------------------------------------------------------------
    Function Extract(Data)
    Dim strPattern,oRegExp,Matches
    strPattern = "(?:\()(.*)(?:\))"
    Set oRegExp = New RegExp
    oRegExp.IgnoreCase = True 
    oRegExp.Pattern = strPattern
    set Matches = oRegExp.Execute(Data) 
    If Matches.Count > 0 Then Extract = Matches(0).SubMatches(0)
    End Function
    '---------------------------------------------------------------
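
    One caveat worth sketching: .* is greedy, so when the title itself contains parentheses the capture runs from the first "(" to the last ")". The VBA snippet below compares the pattern above with one possible variation that is anchored to the end of the line. The variation \(([^()]*)\)\s*$ and the helper name FirstGroup are assumptions made for this sketch, not part of the original answer:

    ```vba
    ' Hypothetical helper: returns the first capture group of regexPattern
    ' in data, or "" when there is no match.
    Function FirstGroup(ByVal data As String, ByVal regexPattern As String) As String
        Dim re As Object
        Dim matches As Object

        Set re = CreateObject("VBScript.RegExp")
        re.Pattern = regexPattern
        Set matches = re.Execute(data)
        If matches.Count > 0 Then FirstGroup = matches(0).SubMatches(0)
    End Function

    Sub DemoGreedyVersusAnchored()
        Dim data As String
        data = """Wouldn't (It Be) Nice"" (B. Wilson/Asher/Love)"

        ' greedy .* spans from the first "(" to the last ")"
        Debug.Print FirstGroup(data, "(?:\()(.*)(?:\))")

        ' anchored to the end of the line: only the trailing group is captured
        Debug.Print FirstGroup(data, "\(([^()]*)\)\s*$")
    End Sub
    ```

    The first line printed is It Be) Nice" (B. Wilson/Asher/Love and the second is B. Wilson/Asher/Love. The anchored variation only helps if the credits are always the last parenthesized group on the line.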
    