Text with \n matching in regex and OpenRefine

て烟熏妆下的殇ゞ submitted on 2019-12-11 17:55:34

Question


I'm trying to filter a text that contains newlines in OpenRefine.

The input is:

Them Spanish girls love me like I'm Aventura
I'm the man, y'all don't get it, do ya?
Type of money, everybody acting like they knew ya
Go Uptown, New York City, bitch
Them Spanish girls love me like I'm Aventura
Tell Uncle Luke I'm out in Miami, too
Them Spanish girls love me like I'm Aventura

The expected result would be:

Type of money, everybody acting like they knew ya
Go Uptown, New York City, bitch
Them Spanish girls love me like I'm Aventura

I'm trying to get the line with the keyword and the lines before and after.

My code to do it with standard regex looks like this:

/((.*\n){2})^.*\b(New York)\b.*((.*\n){3})/m

But that doesn't work in OpenRefine. I tried the following, but it only returns 'null':

value.match(/.*(\New York)/.*)

Does anyone have an idea how I could do it? I really need to keep the line breaks, so I can't do a replace(/\n/,'') before the match.


Answer 1:


The brand-new OpenRefine 3 has a find() function that is much more user-friendly than match().

I think this regex should do the trick:

value.find(/(.*\n){1}.+New York.+(\n.*){1}/).join('\n')

If for some reason you prefer to stay in OpenRefine 2.8, Python/Jython offers an alternative:

import re
matches = re.findall(r".+?\n.+New York.+\n.+", value)
return "\n".join(matches)
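
Outside OpenRefine, the same idea can be tried in plain Python. The following is only a minimal sketch, not part of the original answer: the function name lines_around and its before/after parameters are hypothetical, and the pattern is a variant of the one above that anchors on line starts with re.MULTILINE so the amount of context can be changed.

```python
import re

SAMPLE = "\n".join([
    "Them Spanish girls love me like I'm Aventura",
    "I'm the man, y'all don't get it, do ya?",
    "Type of money, everybody acting like they knew ya",
    "Go Uptown, New York City, bitch",
    "Them Spanish girls love me like I'm Aventura",
    "Tell Uncle Luke I'm out in Miami, too",
    "Them Spanish girls love me like I'm Aventura",
])


def lines_around(text, keyword, before=1, after=1):
    """Return every match of `keyword` together with its context lines.

    Hypothetical helper: `before` and `after` are the number of full
    lines required before and after the line containing the keyword.
    """
    pattern = re.compile(
        r"(?:^.*\n){" + str(before) + r"}"   # full lines before the keyword line
        + r"^.*" + re.escape(keyword) + r".*$"  # the line containing the keyword
        + r"(?:\n.*){" + str(after) + r"}",  # full lines after the keyword line
        re.MULTILINE,
    )
    return [m.group(0) for m in pattern.finditer(text)]


if __name__ == "__main__":
    print("\n".join(lines_around(SAMPLE, "New York")))
```

Note that a keyword on the first or last line of the text will not match under this sketch, because the pattern requires the full number of context lines on both sides.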


Answer 2:


If you would rather avoid regex entirely and simply read the text and print the matching line together with the line before and the line after it, here is one way to do it in VBA, assuming the text is in cell A1 in Excel:

Public Sub TestMe()

    Dim inputString As String
    inputString = Range("A1")

    Dim lookForWord As String
    lookForWord = "New York"

    Dim inputArr As Variant
    inputArr = Split(inputString, vbLf)

    Dim line As Variant
    Dim previousLine As String
    Dim foundWord As Boolean
    Dim linesAfter As Long: linesAfter = 1

    For Each line In inputArr
        If InStr(1, line, lookForWord) Then
            previousLine = previousLine & vbCrLf & line
            foundWord = True
        Else
            If foundWord And linesAfter Then
                previousLine = previousLine & vbCrLf & line
                linesAfter = linesAfter - 1
            ElseIf linesAfter Then
                previousLine = line
            End If
        End If
    Next line

    ' Not on a Long is bitwise (Not 1 = -2, which is truthy), so compare with 0 explicitly
    If linesAfter = 0 Then Debug.Print previousLine

End Sub

Split() parses the text into an array with one element per line. The linesAfter variable controls how many lines after the keyword should be displayed.
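
For comparison with the Jython option mentioned in the first answer, the same regex-free approach can be sketched in Python. This is an illustrative sketch under stated assumptions rather than a translation of the macro above: the function name context_lines and the before/after parameters are hypothetical, and, unlike the macro, it also returns a result when fewer context lines are available at the start or end of the text.

```python
SAMPLE = "\n".join([
    "Them Spanish girls love me like I'm Aventura",
    "I'm the man, y'all don't get it, do ya?",
    "Type of money, everybody acting like they knew ya",
    "Go Uptown, New York City, bitch",
    "Them Spanish girls love me like I'm Aventura",
    "Tell Uncle Luke I'm out in Miami, too",
    "Them Spanish girls love me like I'm Aventura",
])


def context_lines(text, keyword, before=1, after=1):
    """Return the lines containing `keyword` plus their neighbours.

    Hypothetical helper: splits on "\n", collects the indices of every
    matching line and of `before`/`after` lines around it, then joins
    the kept lines in their original order.
    """
    lines = text.split("\n")
    keep = set()
    for i, line in enumerate(lines):
        if keyword in line:
            start = max(0, i - before)            # clamp at the first line
            end = min(len(lines), i + after + 1)  # clamp at the last line
            keep.update(range(start, end))
    return "\n".join(lines[i] for i in sorted(keep))


if __name__ == "__main__":
    print(context_lines(SAMPLE, "New York"))
```

Collecting indices in a set keeps overlapping context windows from producing duplicate lines when the keyword occurs on neighbouring lines.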



Source: https://stackoverflow.com/questions/50758805/text-with-n-matching-in-regex-and-openrefine
