Question
This is my first question here.
I have a log file made up of many similar entries ("hits"):
Region: AR
OnlineID: Atl_Tuc
---Start---
FIFA 18 Legacy Edition
---END---

Region: FR
OnlineID: jubtrrzz
---Start---
FIFA 19
Undertale
Pro Evolution Soccer™ 2018
---END---

Region: US
OnlineID: Cu128yi
---Start---
KINGDOM HEARTS HD 1.5 +2.5 ReMIX
---END---

Region: RO
OnlineID: Se116
---Start---
Real Farm
EA SPORTS™ FIFA 20
LittleBigPlanet™ 3
---END---

Region: US
OnlineID: CAJ5Y
---Start---
Madden NFL 18: G.O.A.T. Super Bowl Edition
---END---
I want to find every hit that contains fifa (as a plain string). fifa is only an example; in general I need to find all hits that contain some given string.
The best I have come up with so far is this regex: (?s)(?=^\r\n)(.*?)(fifa)(.*?)(?=\r\n\r\n)
But when I use it, it selects everything, including hits without fifa, until it reaches a hit that does contain fifa, so a single match sometimes spans more than one hit.
The second problem is that I can't use .* inside (fifa), because it causes the wrong selection.
What can I do?
The right output should be like this:
Region: AR
OnlineID: Atl_Tuc
---Start---
FIFA 18 Legacy Edition
---END---

Region: FR
OnlineID: jubtrrzz
---Start---
FIFA 19
Undertale
Pro Evolution Soccer™ 2018
---END---

Region: RO
OnlineID: Se116
---Start---
Real Farm
EA SPORTS™ FIFA 20
LittleBigPlanet™ 3
---END---
Answer 1:
You can use
(?si)(?:^(?<!.)|\R{2})\K(?:(?!\R{2}).)*?\bfifa\b.*?(?=\R{2}|\z)
See the regex demo
Details
(?si) - s makes . match line break chars (same as ". matches newline" ON) and i turns case insensitive matching ON
(?:^(?<!.)|\R{2}) - matches start of a file or two line break sequences
\K - omits the matched line breaks
(?:(?!\R{2}).)*? - any char, 0 or more occurrences but as few as possible, not starting a double line break sequence
\bfifa\b - whole word fifa
.*? - any 0+ chars as few as possible
(?=\R{2}|\z) - up to the double line break or end of file.
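The pattern above relies on \K, \R and \z, which Python's standard re module does not all support. As a rough sketch of the same idea in Python (not part of the original answer), a capture group can stand in for \K, \r?\n for \R, and \Z for \z. The names BLANK_LINE and find_paragraphs are made up for this illustration, and it assumes hits are separated by exactly one blank line:

```python
import re

# Two consecutive line breaks (LF or CRLF); stands in for \R{2}.
BLANK_LINE = r"(?:\r?\n){2}"

def find_paragraphs(text, word):
    """Return every blank-line-separated paragraph that contains `word`."""
    pattern = re.compile(
        rf"(?:\A|{BLANK_LINE})"           # start of text or a blank line
        rf"((?:(?!{BLANK_LINE}).)*?"      # group 1 replaces \K: chars that stay inside the paragraph
        rf"\b{re.escape(word)}\b"         # the whole word
        rf".*?)"                          # rest of the paragraph, as short as possible
        rf"(?={BLANK_LINE}|\Z)",          # stop before a blank line or the end of the text
        re.S | re.I,
    )
    return [m.group(1) for m in pattern.finditer(text)]

log = (
    "Region: AR\nOnlineID: Atl_Tuc\n---Start---\nFIFA 18 Legacy Edition\n---END---\n\n"
    "Region: US\nOnlineID: Cu128yi\n---Start---\nKINGDOM HEARTS HD 1.5 +2.5 ReMIX\n---END---\n\n"
    "Region: RO\nOnlineID: Se116\n---Start---\nReal Farm\nEA SPORTS™ FIFA 20\n---END---"
)
hits = find_paragraphs(log, "fifa")
print(len(hits))  # 2 (the AR and RO hits)
```

Because the lookahead at the end does not consume the blank line, the next match can start from it, so consecutive hits are both found.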
Now, if you want to match a paragraph with fifa and then 20 on one of its lines, use
(?si)(?:^(?<!.)|\R{2})\K(?:(?!\R{2}).)*?(?-s:\bfifa\b.*\b20\b).*?(?=\R{2}|\z)
The (?-s:\bfifa\b.*\b20\b) is a modifier group where . stops matching line breaks, and it matches a whole word fifa, then any 0+ chars other than line break chars, as many as possible, and then a 20 as a whole word.
See this regex demo.
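To see what the same-line restriction does in isolation, here is a small Python sketch (an illustration, not code from the answer). It checks one paragraph at a time and simply leaves the DOTALL flag off, which has the same effect on . as the (?-s:...) group. The helper name same_line_match and the second sample paragraph are invented for the example:

```python
import re

def same_line_match(paragraph, first, second):
    """True if `first` is followed by `second` on a single line of `paragraph`."""
    # re.S is not set, so "." cannot cross a line break, which is what the
    # (?-s:...) group achieves inside the larger pattern.
    pattern = rf"\b{re.escape(first)}\b.*\b{re.escape(second)}\b"
    return re.search(pattern, paragraph, re.I) is not None

same_line = "Real Farm\nEA SPORTS™ FIFA 20\nLittleBigPlanet™ 3"
split_lines = "FIFA 19\nTop 20 Hits"  # made-up paragraph: the two words sit on different lines

print(same_line_match(same_line, "fifa", "20"))    # True
print(same_line_match(split_lines, "fifa", "20"))  # False
```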
Answer 2:
It would be better not to use regex for this entire problem. I would use something simpler to cut the log file into pieces, 1 piece per paragraph.
Then use a regex to see if each paragraph is a "hit" or not.
Here is some Python code:
# read the file contents into a string
with open('/input/log/file/path/here', 'r') as f:
    log_text = f.read().strip()
# split the string into separate paragraphs
paragraphs = log_text.split('\n\n')
# filter the paragraphs to the ones you want
filtered_paragraphs = filter(is_wanted, paragraphs)
# recombine the filtered paragraphs into a new log string
new_log_text = '\n\n'.join(filtered_paragraphs)
# output new log text into new file
with open('/output/log/file/path/here', 'w') as f:
    f.write(new_log_text)
And of course you will need to define the is_wanted function:
import re

def is_wanted(paragraph):
    # discard first three and last line to get paragraph content
    p_content = '\n'.join(paragraph.split('\n')[3:-1])
    # input any regex pattern here, such as 'FIFA'. You can pass it into
    # the function as a variable if you need it to be customizable
    return bool(re.search(r'FIFA', p_content))
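The comment in is_wanted mentions passing the pattern in as a variable. One possible way to do that is sketched below; it is only an illustration, and filter_log, the extra pattern parameter and the use of functools.partial are my additions, not part of the answer. It also matches case-insensitively, which the original is_wanted does not:

```python
import re
from functools import partial

def is_wanted(paragraph, pattern):
    # Same slicing as the original: drop the three header lines and the ---END--- line
    p_content = '\n'.join(paragraph.split('\n')[3:-1])
    # Assumption: case-insensitive matching, so 'fifa' also finds 'FIFA'
    return bool(re.search(pattern, p_content, re.IGNORECASE))

def filter_log(log_text, pattern):
    """Keep only the blank-line-separated paragraphs whose content matches `pattern`."""
    paragraphs = log_text.strip().split('\n\n')
    kept = filter(partial(is_wanted, pattern=pattern), paragraphs)
    return '\n\n'.join(kept)
```

With this, filter_log(log_text, r'fifa|undertale') would keep any hit whose game list mentions either title, while text in the Region and OnlineID lines is never searched.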
Source: https://stackoverflow.com/questions/64840204/regex-to-find-a-multi-line-string-that-includes-another-string-between-lines