JavaScript: remove Arabic text diacritics dynamically

栀梦 2020-12-29 11:51

How can I remove Arabic diacritics dynamically? I'm designing an ebook ("chm") that has multiple HTML pages containing Arabic text, but sometimes the search engine needs to highlight matches, so

5 answers
  •  小鲜肉 (OP)
    2020-12-29 12:30

    I wrote this function which handles strings with mixed Arabic and English characters, removing special characters (including diacritics) and normalizing some Arabic characters like converting all ة's into ه's.

    var normalize_text = function(text) {
    
      //remove special characters
      text = text.replace(/([^\u0621-\u063A\u0641-\u064A\u0660-\u0669a-zA-Z 0-9])/g, '');
    
      //normalize Arabic
      text = text.replace(/(آ|إ|أ)/g, 'ا');
      text = text.replace(/(ة)/g, 'ه');
      text = text.replace(/(ئ|ؤ)/g, 'ء');
      text = text.replace(/(ى)/g, 'ي');
    
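      //convert Arabic-Indic digits (U+0660 to U+0669) to their ASCII counterparts.
      var starter = 0x660;
      for (var i = 0; i < 10; i++) {
        // replace() returns a new string, so assign the result back; split/join replaces every occurrence
        text = text.split(String.fromCharCode(starter + i)).join(String.fromCharCode(48 + i));
      }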
    
      return text;
    }
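
The function above also drops punctuation and rewrites some letters. If only the diacritics need to go (for example, so a search term can match text regardless of vowel marks), a narrower sketch might look like the following. The names `stripArabicDiacritics` and `matchesIgnoringDiacritics` are hypothetical, and the character ranges are an assumption: U+064B to U+065F (tashkil and related combining marks) plus U+0670 (superscript alef). Adjust them if your text uses other marks.

```javascript
// Hypothetical helper, not part of the answer above: strips Arabic
// combining marks and leaves letters, digits, and punctuation untouched.
// Assumed ranges: U+064B-U+065F (tashkil and related marks), U+0670 (superscript alef).
function stripArabicDiacritics(text) {
  return text.replace(/[\u064B-\u065F\u0670]/g, '');
}

// Hypothetical usage: check whether a search term occurs in page text,
// ignoring diacritics on both sides.
function matchesIgnoringDiacritics(pageText, searchTerm) {
  var haystack = stripArabicDiacritics(pageText);
  var needle = stripArabicDiacritics(searchTerm);
  return haystack.indexOf(needle) !== -1;
}
```

Because this only deletes combining marks, the original page text can stay as it is; the stripped copy is used just for comparison.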
    
    
