Javascript Find Emoji in String and Parse

吃可爱长大的小学妹 提交于 2021-02-04 17:19:07

问题


After TONS of research, I have found how to parse emoji in realtime using the Twemoji library.

Now, I need to figure out how to identify if there's emoji within some text, grab the position of that emoji and execute the parsing function.

Some example text can be

It is a great day 😀.

Need to find the 😀 within the whole string and use the following function to get its hex code, return the surrogate pairs and parse with the Twemoji library.

function entityForSymbolInContainer(selector) {
    var code = data.message.body.codePointAt(0);
    var codeHex = code.toString(16);
    while (codeHex.length < 4) {
        codeHex = "0" + codeHex;
    }

    return codeHex;
}

// Get emoji hex code
    var emoji = entityForSymbolInContainer(data.message.body);
// For given an HEX codepoint, returns UTF16 surrogate pairs
    var emoji = twemoji.convert.fromCodePoint(emoji);
// Given a generic string, it will replace all emoji with an <img> tag
    var emoji = twemoji.parse(emoji);

I am using the following check to see if there's emoji within the text. Problem is that for a simple grinning face (😀) it doesn't alert me. However, if I type in the "shirt and tie" (👔) it will alert me to that.

var string = "It is a great day 😀.";
var emojiRegex = /([\uE000-\uF8FF]|\uD83C[\uDF00-\uDFFF]|\uD83D[\uDC00-\uDDFF])/g;

if (string.match(emojiRegex)) {
    alert("emoji found");
}

Please help on the issue of the regex not picking up the emoji. After that, I should be able to just find that within the string.

Thank you!


回答1:


This post gives a very comprehensive regex for matching emojis with a very good explanation. He bases his regex on the one published by lodash library.

(?:[\u2700-\u27bf]|(?:\ud83c[\udde6-\uddff]){2}|[\ud800-\udbff][\udc00-\udfff]|[\u0023-\u0039]\ufe0f?\u20e3|\u3299|\u3297|\u303d|\u3030|\u24c2|\ud83c[\udd70-\udd71]|\ud83c[\udd7e-\udd7f]|\ud83c\udd8e|\ud83c[\udd91-\udd9a]|\ud83c[\udde6-\uddff]|[\ud83c[\ude01-\ude02]|\ud83c\ude1a|\ud83c\ude2f|[\ud83c[\ude32-\ude3a]|[\ud83c[\ude50-\ude51]|\u203c|\u2049|[\u25aa-\u25ab]|\u25b6|\u25c0|[\u25fb-\u25fe]|\u00a9|\u00ae|\u2122|\u2139|\ud83c\udc04|[\u2600-\u26FF]|\u2b05|\u2b06|\u2b07|\u2b1b|\u2b1c|\u2b50|\u2b55|\u231a|\u231b|\u2328|\u23cf|[\u23e9-\u23f3]|[\u23f8-\u23fa]|\ud83c\udccf|\u2934|\u2935|[\u2190-\u21ff])

https://medium.com/@thekevinscott/emojis-in-javascript-f693d0eb79fb




回答2:


This determines if there is an emoji in the comment.

var unified_emoji_ranges = ['\ud83c[\udf00-\udfff]','\ud83d[\udc00-\ude4f]','\ud83d[\ude80-\udeff]'];

var reg = new RegExp(unified_emoji_ranges.join('|'), 'g');

var string = "It is a great day 😀.";

if (string.match(reg)) {
    alert("emoji found");
}



回答3:


The problem:

JavaScript defines strings as sequences of UTF-16 code units, not as sequences of characters or code points.

(quoted from source below)

You have to set up the RegExp with surrogate pairs:

I have found a good solution/exlanation here parsing emoji unicode in javascript that does without an extra library. And here's an online Surrogate Pair Calculator.

And in your case:

/\uD83D\uDE00/

regex101




回答4:


In case anyone is still looking for a solution in JS to find emoji's in string.

Can use the following library (emoji-regex).

Here is an example converting all the emojis to Unicode hexadecimal numerical representation of character in a given string:

import emojiRegex  from 'emoji-regex/RGI_Emoji.js';
const emojiRegexPattern = emojiRegex();
const stringThatMightHaveEmojis = ...; //some string that can contain emoji's..

stringThatMightHaveEmojis.replace(emojiRegexPattern,(m, idx) => {
      return `${m.codePointAt(0).toString(16)}]`;
    })

There are more examples in the documentation of the library.

Plus a helpful article I stumbled upon explaining parsing emoji's, codePointAt can be found here



来源:https://stackoverflow.com/questions/37089427/javascript-find-emoji-in-string-and-parse

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!