Extracting Text Between Brackets with Regex

情书的邮戳 2021-01-19 12:24

In sentences like:

"[x] Alpha

[33] Beta"

I extract an array of bracketed data as ([x], [33])

using VBA regex Pattern:

\"(\\         


        
4 answers
  • 2021-01-19 13:04

    Try this:

    \[(x)\]|\[(\d*)\]
    

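    Keep anything you don't want captured outside the parentheses; () creates a capturing group.

    Explanation

    Because the two groups sit on opposite sides of the alternation, each match fills only one of them: x arrives in $1 and 33 in $2.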
    

    .NET sample

    Here is a sample I put together, although it has been a long time since I worked with VB. Much of it may be unnecessary, but it should help you understand the idea better.

    Imports System.Text.RegularExpressions
    
    Module Example
       Public Sub Main()
          Dim text As String = "[x] Alpha      [33] Beta]"
          Dim pattern As String = "\[(x)\]|\[(\d*)\]"
    
          ' Instantiate the regular expression object.
          Dim r As Regex = new Regex(pattern, RegexOptions.IgnoreCase)
    
          ' Match the regular expression pattern against a text string.
          Dim m As Match = r.Match(text)
          Dim matchCount As Integer = 0
          Do While m.Success
             matchCount += 1
             Console.WriteLine("Match" & matchCount)
             ' Group 1 holds "x", group 2 holds the digits; only one is filled per match.
             Dim i As Integer
             For i = 1 To 2
                Dim g As Group = m.Groups(i)
                Console.WriteLine("Group" & i & "='" & g.ToString() & "'")
                ' List every capture recorded for this group (none if the group did not take part).
                Dim cc As CaptureCollection = g.Captures
                Dim j As Integer
                For j = 0 To cc.Count - 1
                   Dim c As Capture = cc(j)
                   Console.WriteLine("Capture" & j & "='" & c.ToString() _
                      & "', Position=" & c.Index)
                Next
             Next
             ' Advance to the next bracketed token in the text.
             m = m.NextMatch()
          Loop
       End Sub
    End Module
    
  • 2021-01-19 13:11

    Put capturing groups around only the subpatterns whose values you need.

    Use

    "\[(x)\]|\[(\d*)\]"
    

    (or \d+ if you need to match at least 1 digit, as * means zero or more occurrences, and + means one or more occurrences).
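
    The difference between the two quantifiers shows up on empty brackets. The following is a minimal sketch, not part of the original answer; HasMatch and CompareQuantifiers are hypothetical names, and the regex object is created with late binding so no library reference is assumed.

    ```vba
    ' Hypothetical helper: True if sPattern matches anywhere in sText.
    Function HasMatch(ByVal sPattern As String, ByVal sText As String) As Boolean
        Dim re As Object
        Set re = CreateObject("VBScript.RegExp") ' late binding, no reference needed
        re.Pattern = sPattern
        HasMatch = re.Test(sText)
    End Function

    Sub CompareQuantifiers()
        ' \d* accepts zero digits, so empty brackets still match: prints True
        Debug.Print HasMatch("\[(\d*)\]", "[] Gamma")
        ' \d+ requires at least one digit, so empty brackets are skipped: prints False
        Debug.Print HasMatch("\[(\d+)\]", "[] Gamma")
    End Sub
    ```

    If empty brackets should never end up in the result array, \d+ is probably the safer choice.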

    Or, use the generic pattern to extract anything inside the square brackets without the brackets:

    "\[([^\][]+)]"
    

    With the alternation pattern, pick the right SubMatches index by checking the submatch length: because the two groups are alternatives, one of them is always empty. Just change your For loop to:

    For Each oMatch In .Execute(SourceString)
        ReDim Preserve arrMatches(lngCount)
        If Len(oMatch.SubMatches(0)) > 0 Then
            arrMatches(lngCount) = oMatch.SubMatches(0)
        Else
            arrMatches(lngCount) = oMatch.SubMatches(1)
        End If
        ' Debug.Print arrMatches(lngCount) ' - This outputs x and 33 with your data
        lngCount = lngCount + 1
    Next
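
    With the generic pattern mentioned above there is only one capturing group, so the length check is not needed. A minimal sketch under that assumption follows; BracketContents and DemoBracketContents are hypothetical names, the variable names mirror the loop above, and Global is set explicitly because the original setup code is not shown here.

    ```vba
    ' Hypothetical helper: returns the text found inside each [...] pair.
    Function BracketContents(ByVal SourceString As String) As Variant
        Dim re As Object, oMatch As Object
        Dim arrMatches() As String
        Dim lngCount As Long

        Set re = CreateObject("VBScript.RegExp") ' late binding, no reference needed
        re.Global = True                         ' find every match, not just the first
        re.Pattern = "\[([^\][]+)]"

        For Each oMatch In re.Execute(SourceString)
            ReDim Preserve arrMatches(lngCount)
            ' The single group always holds the bracket contents.
            arrMatches(lngCount) = oMatch.SubMatches(0)
            lngCount = lngCount + 1
        Next

        If lngCount > 0 Then
            BracketContents = arrMatches
        Else
            BracketContents = Array() ' empty result when nothing matched
        End If
    End Function

    Sub DemoBracketContents()
        Dim v As Variant
        ' Prints x, then 33
        For Each v In BracketContents("[x] Alpha" & vbCrLf & "[33] Beta")
            Debug.Print v
        Next
    End Sub
    ```

    Note that this variant accepts any bracketed text, not only x or digits, so it is broader than the alternation pattern.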
    
  • 2021-01-19 13:16

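    Building the array without regex, assuming lines are separated by carriage returns:

    For Each Value In Split(SourceString, Chr(13))
      ' Skip lines with no "[...]" pair; indexing the Split results would fail on them
      If InStr(Value, "[") > 0 And InStr(Value, "]") > InStr(Value, "[") Then
        ReDim Preserve arrMatches(lngCount)
        arrMatches(lngCount) = Split(Split(Value, "]")(0), "[")(1)
        lngCount = lngCount + 1
      End If
    Next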
    
  • 2021-01-19 13:22

    With Excel and VBA you can strip the brackets after the regex extraction:

    Sub qwerty()
        ' Requires a reference to Microsoft VBScript Regular Expressions 5.5 (Tools > References)
    
        Dim inpt As String, outpt As String
        Dim MColl As MatchCollection, temp2 As String
        Dim regex As RegExp, L As Long
    
        inpt = "38c6v5hrk[x]537fhvvb"
    
        Set regex = New RegExp
        regex.Pattern = "(\[x\])|(\[\d*\])"
        Set MColl = regex.Execute(inpt)
        temp2 = MColl(0).Value
    
        L = Len(temp2) - 2
        outpt = Mid(temp2, 2, L)
    
        MsgBox inpt & vbCrLf & outpt
    End Sub
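
    The routine above only looks at the first match. A minimal sketch of the same strip-the-brackets idea applied to every match follows; StripAllBrackets is a hypothetical name, the input string is taken from the question, and late binding is used so no reference is assumed.

    ```vba
    Sub StripAllBrackets()
        Dim re As Object, m As Object
        Dim inpt As String

        inpt = "[x] Alpha [33] Beta"

        Set re = CreateObject("VBScript.RegExp") ' late binding, no reference needed
        re.Global = True                         ' return all matches
        re.Pattern = "\[x\]|\[\d+\]"

        For Each m In re.Execute(inpt)
            ' m.Value includes both brackets; drop the first and last character.
            Debug.Print Mid(m.Value, 2, Len(m.Value) - 2)
        Next
    End Sub
    ```

    Since the brackets are removed with Mid, no capturing groups are needed in the pattern.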
    
