JavaScript RegExp for splitting text into sentences and keeping the delimiter

青春惊慌失措 2020-11-28 08:31

I am trying to use JavaScript's split to get the sentences out of a string while keeping the delimiters, e.g. !, ? and .

So far I have

sentences = text.split(/[\\\         


        
5 Answers
  • 2020-11-28 09:22

    A slight improvement on mircealungu's answer:

    string.match(/[^.?!]+[.!?]+[\])'"`’”]*/g);
    
    • There's no need for the opening parenthesis at the beginning.
    • Runs of punctuation such as '...', '!!!' and '!?' are kept inside the sentence.
    • Any number of closing square brackets and closing parentheses are included. [Edit: different closing quotation marks added]
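
A quick way to see what the trailing character class adds is to run the pattern on text whose sentences end in closing quotes or brackets. This is only a sketch: the sample string and the variable names are made up for illustration, and only the regular expression itself comes from the answer above.

```javascript
// Pattern from the answer above: sentence body, one or more terminators,
// then any closing brackets or quotation marks.
const sentencePattern = /[^.?!]+[.!?]+[\])'"`’”]*/g;

// Hypothetical sample text, chosen so that sentences end with " and ).
const sample = 'He said "Stop!" (Really?!) Then he left...';

const pieces = sample.match(sentencePattern);
console.log(pieces);
// [ 'He said "Stop!"', ' (Really?!)', ' Then he left...' ]
```

Each piece after the first keeps its leading space, because the pattern treats everything that is not a terminator as part of the sentence body.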
  • 2020-11-28 09:23

    Try this instead:

    sentences = text.split(/[\.!\?]/);
    

    ? is a special character in regular expressions, so it is escaped here (inside a character class such as [...] the escape is optional).

    Sorry, I misread your question. If you want to keep the delimiters, you need to use match rather than split; see this question
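
To make the difference concrete, here is a small sketch comparing plain split, split with a capturing group, and match. The sample string and the variable names are illustrative assumptions and not part of the original answer.

```javascript
const text = "Hello there. How are you? Fine!";

// Plain split drops the delimiters and leaves a trailing empty string.
const dropped = text.split(/[.!?]/);
console.log(dropped);
// [ 'Hello there', ' How are you', ' Fine', '' ]

// With a capturing group, split keeps each delimiter as its own element.
const separate = text.split(/([.!?])/);
console.log(separate);
// [ 'Hello there', '.', ' How are you', '?', ' Fine', '!', '' ]

// match keeps every delimiter attached to its sentence.
const attached = text.match(/[^.!?]+[.!?]+/g);
console.log(attached);
// [ 'Hello there.', ' How are you?', ' Fine!' ]
```

The capturing-group form is useful when the delimiters are needed as separate tokens; match is the simpler choice when each sentence should carry its own punctuation.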

  • 2020-11-28 09:26

    The following is a small addition to Larry's answer that also matches parenthetical sentences:

    text.match(/\(?[^\.\?\!]+[\.!\?]+\)?/g);
    

    Applied to:

    text = "If he's restin', I'll wake him up! (Shouts at the cage.) " +
        "'Ello, Mister Polly Parrot! (Owner hits the cage.) There, he moved!!!";
    

    gives:

    ["If he's restin', I'll wake him up!", " (Shouts at the cage.)", 
    " 'Ello, Mister Polly Parrot!", " (Owner hits the cage.)", " There, he moved!!!"]
    
  • 2020-11-28 09:32

    You need to use match, not split.

    Try this.

    var str = "I like turtles. Do you? Awesome! hahaha. lol!!! What's going on????";
    var result = str.match( /[^\.!\?]+[\.!\?]+/g );
    
    var expect = ["I like turtles.", " Do you?", " Awesome!", " hahaha.", " lol!!!", " What's going on????"];
    console.log( result.join(" ") === expect.join(" ") )
    console.log( result.length === 6);
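
One thing the snippet above does not cover: with the g flag, match returns null rather than an empty array when nothing matches, and every sentence after the first keeps its leading space. A minimal sketch of a wrapper that handles both follows. The function name splitSentences is hypothetical, and trimming is an assumption about what the caller wants.

```javascript
// Hypothetical helper: returns trimmed sentences, or [] when nothing matches.
function splitSentences(str) {
  const matches = str.match(/[^.!?]+[.!?]+/g);
  // match() with the g flag returns null when there is no match at all.
  if (matches === null) {
    return [];
  }
  // Drop the whitespace left over from the gap between sentences.
  return matches.map((sentence) => sentence.trim());
}

console.log(splitSentences("I like turtles. Do you? Awesome!"));
// [ 'I like turtles.', 'Do you?', 'Awesome!' ]

console.log(splitSentences("no terminator here"));
// []
```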
    
  • 2020-11-28 09:36

    Improving on Mia's answer, here is a version that also captures a final sentence with no closing punctuation:

    string.match(/[^.?!]+[.!?]+[\])'"`’”]*|.+/g)
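
The |.+ alternative is tried only at positions where the first alternative fails, which is typically a trailing fragment with no terminator. The sketch below compares the two patterns on a made-up sample string; the variable names are illustrative. Note that . does not match line breaks unless the s flag is set, so an unterminated tail spanning several lines would come back as one element per line.

```javascript
// Pattern from the answer above, with the fallback alternative.
const withFallback = /[^.?!]+[.!?]+[\])'"`’”]*|.+/g;
// Same pattern without the fallback, for comparison.
const withoutFallback = /[^.?!]+[.!?]+[\])'"`’”]*/g;

// Hypothetical sample text whose last sentence has no terminator.
const sample = "First sentence. Second one! and a trailing fragment";

const strict = sample.match(withoutFallback);
console.log(strict);
// [ 'First sentence.', ' Second one!' ]

const lenient = sample.match(withFallback);
console.log(lenient);
// [ 'First sentence.', ' Second one!', ' and a trailing fragment' ]
```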
    