Get Full Range Text to String

不羁的心 提交于 2021-01-05 09:12:24

问题


I'm writing a script that looks through my outgoing emails and searches for frequent stylistic errors I make. It locates them using regex and then highlights them yellow. Code:

Public Sub highlightBadForm()

    Dim oWordDoc As Object
    Dim oMatches As Object
    Dim oRange As Range
    Dim strText As String
    Dim lngFindFrom As Long
    Dim varMtch As Variant

    Set oWordDoc = Application.ActiveInspector.WordEditor
    strText = LCase(oWordDoc.Range.Text)

    lngFindFrom = InStr(strText, "from: ")
    If lngFindFrom > 0 Then
        strText = Left(strText, lngFindFrom - 1)
    End If

    Set oMatches = extractMatches(strText, getBadStrs)
    If Not oMatches Is Nothing Then
        For Each varMtch In oMatches
                Set oRange = oWordDoc.Range(varMtch.firstindex, varMtch.firstindex + varMtch.Length)
                oRange.HighlightColorIndex = wdYellow
        Next varMtch
    End If

    Set oRange = Nothing
    Set oWordDoc = Nothing
    Set oMatches = Nothing
End Sub

extractMatches is a private function implementing VBA's RegEx engine. getBadStrs returns the regular expression containing the errors.

It all works unless I've embedded hyperlinks in my email. If so, oWordDoc.Range.Text returns only the anchor text of the links, not the links (and any other characters Word pads the hyperlinks with - I don't know what they might be). As a result, varMtch.firstindex is correct for strText but not oRange so the text it highlights is offset by several characters.

I tried to assemble the full oRange text by looping through the hyperlinks in oRange and adding the link text to the string assuming it would be included in oRange. Something like:

Dim lngEndLnk as Long
Set oRange = oWordDoc.Range

For Each varMtch In oRange.Hyperlinks
    strText = strText & oWordDoc.Range(lngEndLnk, varMtch.Range.Start)
    strText = strText & varMtch.TextToDisplay & varMtch.Name
    lngEndLnk = varMtch.Range.End
Next varMtch

If lngEndLnk = 0 Then
    strText = oRange.text
Else
    strText = strText & oWordDoc.Range(lngEndLnk, oWordDoc.Range.End)
End If

That reduced the offset, but there still is one. Also, if I were to include a linked image in the email, the .Anchor property of varMtch fails so I'd have to come up with another workaround.

Is there a more straightforward way to get a String containing all the characters of the Range object so the regex indices line up?


回答1:


You can access the hyperlink address using the hyperlinks collection of a document:

Private Sub CommandButton1_Click()
    strtext = ActiveDocument.Range.Text
    MsgBox (strtext)
    For Each hLink In Documents(1).Hyperlinks
     MsgBox (hLink.Address)
    Next hLink
End Sub

This first displays all the text in a document, and then loops through each hyperlink displaying its URL.

This can then be used through your RegEx.

For more information and examples, see hyperlinks.




回答2:


I ended up with a similar solution to @slightly snarky. I don't know that it's better so I won't mark it as the solution. Happy for comments on pros and cons, in case there's a clear winner I'm just not seeing.

Personally, I like looping the character collection and probably should use it in my code, this works. I find using the position array to highlight matches much less intuitive than constructing a string from the range. For my purposes padding the string with # in place of the zero-length characters in oWordDoc.Range works, but I also know it won't work for everybody.

Public Sub highlightBadForm()

    Dim oWordDoc As Object
    Dim oMatches As Object
    Dim oRange As Range
    Dim strText As String
    Dim lngFindFrom As Long, lngC As Long, lngPrevLen As Long
    Dim varMtch As Variant

    Set oWordDoc = Application.ActiveInspector.WordEditor

    For lngC = 0 To oWordDoc.Range.End - 1
        strText = strText & oWordDoc.Range(lngC, lngC + 1)
        If Len(strText) = lngPrevLen Then
            strText = strText & "#"
        End If
        lngPrevLen = lngPrevLen + 1
    Next lngC
    strText = LCase(strText)

    lngFindFrom = InStr(strText, "from: ")
    If lngFindFrom > 0 Then
        strText = Left(strText, lngFindFrom - 1)
    End If

    Set oMatches = extractMatches(strText, getBadStrs)
    If Not oMatches Is Nothing Then
        For Each varMtch In oMatches
                Set oRange = oWordDoc.Range(varMtch.FirstIndex, varMtch.FirstIndex + varMtch.Length)
                oRange.HighlightColorIndex = wdYellow
        Next varMtch
    End If

    Set oRange = Nothing
    Set oWordDoc = Nothing
    Set oMatches = Nothing
End Sub



回答3:


The key to this seems to be that when you iterate through a Range looking at each "position" in the range, e.g. via something like

With ActiveDocument.Range
  For i = 0 to .End - 1
    Debug.Print i,Ascw(.Range(i,i+1).Text)
  Next
End With

The Range does contain all the characters in the code of a field such as HYPERLINK field, and all the characters in its result (which might be displayed or it might be hidden text). But in some cases a Range may contain additional characters which are never displayed. For example, if you have a field code such as { SET x 123 } then the Range contains what are in effect the field braces and the code " SET X 123 ", but before the field end brace it also contains a marker followed by the value "123". But the SET field does not display its result.

That makes it difficult to construct a "find" string that's the same length as the Range.

But Range.Text is the same text as the concatenation of all the characters in Range.Characters, and each Character in that Collection is a Range that contains the .Start position

So that lets us get the .Start and .End as the following example shows.

This assumes you are working with the ActiveDocument in Word, and have some text, a HYPERLINK field (say), and possibly other fields, with the text "test1" in various places.

I haven't done much testing so it may still need tweaking.

Sub HighlightFinds()
Dim match As VBScript_RegExp_55.match
Dim matches As VBScript_RegExp_55.MatchCollection
Dim rng1 As Word.Range
Dim rng2 As Word.Range
Set rng1 = ActiveDocument.Content
Set rng2 = ActiveDocument.Content ' or rng1.Duplicate

' When you do this, rng1.Text returns the text of the field *codes* but
' not the field *results*, and so does rng1.Characters
'rng1.TextRetrievalMode.IncludeFieldCodes = True
' when you do this, it returns the *results* but not the *codes*
rng1.TextRetrievalMode.IncludeFieldCodes = False

' You could do both, one after the other, to try to get all the matches

' You might also need to set .TextRetrievalMode.IncludeHiddenText

With New VBScript_RegExp_55.RegExp
  .Pattern = "test1"
  .Global = True
  Set matches = .Execute(rng1.Text)
End With
For Each match In matches
  rng2.SetRange rng1.Characters(match.FirstIndex + 1).Start, rng1.Characters(match.FirstIndex + 1 + match.Length).End
  rng2.HighlightColorIndex = wdYellow
Next
Set matches = Nothing
Set rng2 = Nothing
Set rng1 = Nothing
End Sub


来源:https://stackoverflow.com/questions/62040233/get-full-range-text-to-string

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!