Regex Split at beginning of line containing word

笑着哭i, submitted 2019-12-12 13:01:06

Question


I'm trying to split a text into paragraphs each time a line contains a certain word. I already managed to split the text at the beginning of that word, but not at the beginning of the line containing that word. What's the right expression?

This is what I have:

 string[] paragraphs = Regex.Split(text, @"(?=INT.|EXT.)");

I also want to drop any empty paragraphs from the array.

This is the input:

INT. LOCATION - DAY 
Lorem ipsum dolor sit amet, consectetur adipiscing elit. 

LOCATION - EXT.
Morbi cursus dictum tempor. Phasellus mattis at massa non porta. 

LOCATION INT. - NIGHT

I want to split it into paragraphs while keeping the same layout.

The result I get is:

INT. LOCATION - DAY 
Lorem ipsum dolor sit amet, consectetur adipiscing elit. 

LOCATION - 

EXT.
Morbi cursus dictum tempor. Phasellus mattis at massa non porta. 

LOCATION 

INT. - NIGHT

The new paragraphs start at the word, not at the beginning of the line.
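One way to see why is to list where the lookahead actually matches. The sketch below (the class and method names are illustrative, not from the question) uses the question's pattern with the dots escaped, so `.` means a literal dot. It prints the column of each zero-width match inside its line. Because the lookahead only asserts "INT. or EXT. comes next", it succeeds right before the keyword, wherever that keyword sits in the line.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SplitPointDemo
{
    // Returns the column (offset from the start of its line) of every match of `pattern`.
    public static List<int> MatchColumns(string text, string pattern)
    {
        var columns = new List<int>();
        foreach (Match m in Regex.Matches(text, pattern))
        {
            // Index just after the previous '\n', or 0 when the match is on the first line.
            int lineStart = m.Index == 0 ? 0 : text.LastIndexOf('\n', m.Index - 1) + 1;
            columns.Add(m.Index - lineStart);
        }
        return columns;
    }

    public static void Main()
    {
        string text = "INT. LOCATION - DAY\n" +
                      "Lorem ipsum dolor sit amet.\n" +
                      "LOCATION - EXT.\n" +
                      "LOCATION INT. - NIGHT\n";

        // The question's lookahead with escaped dots: zero-width, placed before the keyword.
        foreach (int col in MatchColumns(text, @"(?=INT\.|EXT\.)"))
            Console.WriteLine($"split point at column {col}");
    }
}
```

For this input it reports columns 0, 11 and 9. Only the first split point is at a line start. The other two are mid-line, which is the behaviour described above.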

This is the desired result:

Paragraph 1

INT. LOCATION - DAY 
Lorem ipsum dolor sit amet, consectetur adipiscing elit. 

Paragraph 2

LOCATION - EXT.
Morbi cursus dictum tempor. Phasellus mattis at massa non porta. 

Paragraph 3

LOCATION INT. - NIGHT

Each paragraph should always start at the beginning of the line containing the word INT. or EXT., not at the word itself.


Answer 1:


Regex.Split(text, "(?=^.+?INT|^.+?EXT)", RegexOptions.Multiline);

Check it against this test input:

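using System;
using System.Text.RegularExpressions;

string text = "INT. LOCATION - DAY\n" +
              "Lorem ipsum dolor sit amet, consectetur adipiscing elit.\n" +
              "LOCATION - EXT.\n" +
              "Morbi cursus dictum tempor. Phasellus mattis at massa non porta.\n" +
              "LOCATION INT. - NIGHT\n";

// Zero-width split at the start of every line that has at least one
// character followed later on the same line by INT or EXT.
string[] res = Regex.Split(text, "(?=^.+?INT|^.+?EXT)", RegexOptions.Multiline);

// res.Length instead of res.Count(), which would need System.Linq.
for (int i = 0; i < res.Length; i++)
{
    int paragraphNumber = i + 1;
    Console.WriteLine("paragraph " + paragraphNumber + "\n" + res[i]);
}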


Output:

paragraph 1
INT. LOCATION - DAY
Lorem ipsum dolor sit amet, consectetur adipiscing elit.

paragraph 2
LOCATION - EXT.
Morbi cursus dictum tempor. Phasellus mattis at massa non porta.

paragraph 3
LOCATION INT. - NIGHT
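The pattern above works for this sample, but a couple of details are worth knowing. `.+?` needs at least one character before the keyword, so a later line that *starts* with `INT.` gets no split point of its own. The unescaped `INT`/`EXT` also match inside words such as `POINT`. The sketch below is one possible variant, not the answerer's code, and the `SceneSplitter` name and the `INT. OFFICE` sample line are made up for illustration. It anchors at `^` with `RegexOptions.Multiline`, uses `.*` so the keyword may come first on the line, and requires the whole word followed by a literal dot. Because this version also matches at the very start of the text, `Regex.Split` returns a leading empty string. The LINQ `Where` filter drops that entry and any whitespace-only pieces, which also covers the question's request to remove empty paragraphs.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class SceneSplitter
{
    // Zero-width match at the start of any line containing the whole word INT. or EXT.
    // Multiline makes ^ match after every '\n'; '.' never crosses a '\n', so the
    // lookahead only inspects the current line.
    static readonly Regex SceneStart =
        new Regex(@"^(?=.*\b(?:INT|EXT)\.)", RegexOptions.Multiline);

    public static string[] Split(string text) =>
        SceneStart.Split(text)
                  // Drop the leading "" produced by the match at index 0, plus blank pieces.
                  .Where(p => !string.IsNullOrWhiteSpace(p))
                  .ToArray();

    public static void Main()
    {
        string text = "INT. LOCATION - DAY\n" +
                      "Lorem ipsum dolor sit amet.\n\n" +
                      "LOCATION - EXT.\n" +
                      "Morbi cursus dictum tempor.\n\n" +
                      "INT. OFFICE - NIGHT\n"; // hypothetical line that starts with the keyword

        string[] paragraphs = Split(text);
        for (int i = 0; i < paragraphs.Length; i++)
            Console.WriteLine($"paragraph {i + 1}\n{paragraphs[i]}");
    }
}
```

With this input the sketch yields three paragraphs, and `INT. OFFICE - NIGHT` gets its own entry. The `.+?` version would leave it attached to paragraph 2. The blank lines stay at the end of each paragraph, so the original layout is preserved.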


Source: https://stackoverflow.com/questions/30705320/regex-split-at-beginning-of-line-containing-word
