Split string into sentences - ignoring abbreviations for splitting

無奈伤痛 2020-12-21 04:29

I'm trying to split this string into sentences, but I need to handle abbreviations, which always have the fixed format x.y. as a standalone word:

content = "Thi

2 Answers
  •  礼貌的吻别
    2020-12-21 05:10

    The solution is to match and capture the abbreviations, then build the replacement with a callback that puts abbreviations back unchanged and marks real sentence ends for splitting:

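    // Group 1: an x.y. abbreviation; Group 2: sentence-ending punctuation
    // followed by whitespace, provided a letter comes next.
    var re = /\b(\w\.\w\.)|([.?!])\s+(?=[A-Za-z])/g;
    var str = 'This is a long string with some numbers 123.456,78 or 100.000 and e.g. some abbreviations in it, which shouldn\'t split the sentence. Sometimes there are problems, i.e. in this one. here and abbr at the end x.y.. cool.';
    var result = str.replace(re, function(m, g1, g2){
      // Keep abbreviations as-is; turn "punctuation + whitespace" into
      // the punctuation followed by a "\r" split marker.
      return g1 ? g1 : g2 + "\r";
    });
    var arr = result.split("\r");
    document.body.innerHTML = "<pre>" + JSON.stringify(arr, null, 4) + "</pre>";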
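    The snippet above writes to document.body, so it only runs in a browser. Below is a minimal sketch of the same regex-plus-callback technique as a reusable function, assuming a plain JavaScript runtime such as Node.js. The names splitSentences, SENTENCE_RE and SENTINEL are hypothetical, and the marker is swapped from "\r" to "\u0000" (an assumption) so that carriage returns already present in the input do not trigger extra splits.

```javascript
// Hypothetical split marker; "\u0000" is unlikely to occur in normal text.
const SENTINEL = "\u0000";

// Same pattern as the answer:
// Group 1: an x.y. abbreviation starting at a word boundary.
// Group 2: . ? or ! followed by whitespace, when a letter comes next.
const SENTENCE_RE = /\b(\w\.\w\.)|([.?!])\s+(?=[A-Za-z])/g;

function splitSentences(text) {
  // replace() with a global regex starts from index 0 on every call,
  // so reusing SENTENCE_RE across calls is safe.
  const marked = text.replace(SENTENCE_RE, (match, abbr, punct) => {
    // Abbreviations are returned unchanged, so their dots never split.
    if (abbr) {
      return abbr;
    }
    // Sentence end: keep the punctuation, drop the whitespace, add marker.
    return punct + SENTINEL;
  });
  return marked.split(SENTINEL);
}

const str =
  "This is a long string with some numbers 123.456,78 or 100.000 and e.g. " +
  "some abbreviations in it, which shouldn't split the sentence. Sometimes " +
  "there are problems, i.e. in this one. here and abbr at the end x.y.. cool.";

console.log(JSON.stringify(splitSentences(str), null, 4));
```

    For the sample string this yields four entries, the last two being "here and abbr at the end x.y.." and "cool.". "here" starts a new sentence even though it is lowercase, because the lookahead accepts any ASCII letter.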

    Regex explanation:

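    • \b(\w\.\w\.) - match and capture into Group 1 an abbreviation that starts at a word boundary: a word character, a ., another word character and a final . (such as i.e. or x.y.). Because this branch consumes the abbreviation's dots, the second branch can never split there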
    • | - or...
    • ([.?!])\s+(?=[A-Za-z]):
      • ([.?!]) - match and capture into Group 2 either . or ? or !
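      • \s+ - match 1 or more whitespace characters (they are consumed, so they do not appear in the split result)...
      • (?=[A-Za-z]) - that are followed by an ASCII letter. The lookahead checks the letter without consuming it, so the letter stays at the start of the next sentence.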
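    To see the two alternatives in action, here is a small sketch that lists every match of the same regex and reports which group captured it. It assumes String.prototype.matchAll (ES2020, available in current browsers and Node.js 12+); describeMatches and the sample sentence are hypothetical.

```javascript
// Same pattern as the answer; matchAll requires the g flag.
const re = /\b(\w\.\w\.)|([.?!])\s+(?=[A-Za-z])/g;

function describeMatches(text) {
  const result = [];
  for (const m of text.matchAll(re)) {
    if (m[1] !== undefined) {
      // Group 1 matched: an x.y. abbreviation, which is kept as-is.
      result.push({ index: m.index, kind: "abbreviation", text: m[0] });
    } else {
      // Group 2 matched: punctuation plus whitespace. The letter checked
      // by the lookahead is not part of m[0].
      result.push({ index: m.index, kind: "boundary", text: m[0] });
    }
  }
  return result;
}

console.log(describeMatches("See e.g. this. Then 100.000 more? Yes"));
```

    For that sample the matches are "e.g." (abbreviation) at index 4, ". " (boundary) at index 13 and "? " (boundary) at index 32. 100.000 produces no match, because its dot is followed by a digit rather than whitespace.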
