Question
After a lot of research, I have found out how to parse emoji in real time using the Twemoji library.
Now I need to figure out how to detect whether some text contains an emoji, grab the position of that emoji, and run the parsing function on it.
Some example text:
It is a great day 😀.
I need to find the 😀 within the whole string, use the following function to get its hex code, convert that to the surrogate pair, and parse it with the Twemoji library.
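function entityForSymbolInContainer(text) {
  // Read the code point at index 0 of the given text
  // (the original ignored its parameter and read data.message.body directly)
  var code = text.codePointAt(0);
  var codeHex = code.toString(16);
  // Left-pad to at least four hex digits
  while (codeHex.length < 4) {
    codeHex = "0" + codeHex;
  }
  return codeHex;
}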
// Get emoji hex code
var emojiHex = entityForSymbolInContainer(data.message.body);
// Given a hex code point, returns the UTF-16 surrogate pair
var emojiChar = twemoji.convert.fromCodePoint(emojiHex);
// Given a generic string, replaces all emoji with an <img> tag
var emojiHtml = twemoji.parse(emojiChar);
I am using the following check to see whether there is an emoji in the text. The problem is that it doesn't alert me for a simple grinning face (😀). However, if I type in the "shirt and tie" (👔), it does alert me.
var string = "It is a great day 😀.";
var emojiRegex = /([\uE000-\uF8FF]|\uD83C[\uDF00-\uDFFF]|\uD83D[\uDC00-\uDDFF])/g;
if (string.match(emojiRegex)) {
alert("emoji found");
}
Please help with the regex not picking up the emoji. Once that works, I should be able to find the emoji within the string.
Thank you!
Answer 1:
The post linked below gives a comprehensive regex for matching emoji, along with a good explanation. The author bases the regex on the one published by the lodash library.
(?:[\u2700-\u27bf]|(?:\ud83c[\udde6-\uddff]){2}|[\ud800-\udbff][\udc00-\udfff]|[\u0023-\u0039]\ufe0f?\u20e3|\u3299|\u3297|\u303d|\u3030|\u24c2|\ud83c[\udd70-\udd71]|\ud83c[\udd7e-\udd7f]|\ud83c\udd8e|\ud83c[\udd91-\udd9a]|\ud83c[\udde6-\uddff]|\ud83c[\ude01-\ude02]|\ud83c\ude1a|\ud83c\ude2f|\ud83c[\ude32-\ude3a]|\ud83c[\ude50-\ude51]|\u203c|\u2049|[\u25aa-\u25ab]|\u25b6|\u25c0|[\u25fb-\u25fe]|\u00a9|\u00ae|\u2122|\u2139|\ud83c\udc04|[\u2600-\u26FF]|\u2b05|\u2b06|\u2b07|\u2b1b|\u2b1c|\u2b50|\u2b55|\u231a|\u231b|\u2328|\u23cf|[\u23e9-\u23f3]|[\u23f8-\u23fa]|\ud83c\udccf|\u2934|\u2935|[\u2190-\u21ff])
https://medium.com/@thekevinscott/emojis-in-javascript-f693d0eb79fb
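One alternative in the regex above, [\ud800-\udbff][\udc00-\udfff], matches any high surrogate followed by a low surrogate, so it catches the grinning face from the question. A minimal sketch using only that alternative (the variable names are illustrative, and this pattern is broader than "emoji", since it also matches other characters outside the Basic Multilingual Plane):

```javascript
// Matches any UTF-16 surrogate pair (a high surrogate followed by a low surrogate).
// Broader than "emoji": it also matches other supplementary-plane characters.
const anySurrogatePair = /[\ud800-\udbff][\udc00-\udfff]/g;

const text = "It is a great day 😀.";

// match() returns null when nothing matches, so fall back to an empty array.
const pairs = text.match(anySurrogatePair) || [];

console.log(pairs.length > 0 ? "emoji found" : "no emoji");
```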
Answer 2:
This determines if there is an emoji in the comment.
var unified_emoji_ranges = ['\ud83c[\udf00-\udfff]','\ud83d[\udc00-\ude4f]','\ud83d[\ude80-\udeff]'];
var reg = new RegExp(unified_emoji_ranges.join('|'), 'g');
var string = "It is a great day 😀.";
if (string.match(reg)) {
alert("emoji found");
}
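To get the position of each emoji, as the question asks, the same ranges can be run through String.prototype.matchAll, which reports an index for every match. A minimal sketch, assuming an environment that supports matchAll (ES2020); findEmojiPositions is a hypothetical helper name, and the ranges are the ones from this answer, so they do not cover every emoji:

```javascript
// Ranges taken from the answer above, written as one regex literal.
// They do not cover every emoji.
const emojiPattern = /\ud83c[\udf00-\udfff]|\ud83d[\udc00-\ude4f]|\ud83d[\ude80-\udeff]/g;

// Hypothetical helper: returns each matched emoji with its index,
// measured in UTF-16 code units from the start of the string.
function findEmojiPositions(text) {
  return Array.from(text.matchAll(emojiPattern), (match) => ({
    emoji: match[0],
    index: match.index,
  }));
}

const positions = findEmojiPositions("It is a great day 😀.");
console.log(positions);
```

Each entry's emoji value could then be handed to the parsing step from the question.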
Answer 3:
The problem:
JavaScript defines strings as sequences of UTF-16 code units, not as sequences of characters or code points.
(quoted from source below)
You have to set up the RegExp with surrogate pairs:
I found a good solution and explanation in "parsing emoji unicode in javascript", which works without an extra library. There is also an online Surrogate Pair Calculator.
And in your case:
/\uD83D\uDE00/
regex101
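The surrogate pair for a code point can also be computed by hand, which shows why the regex in the question misses the grinning face: U+1F600 encodes as \uD83D\uDE00, and the low surrogate \uDE00 falls outside the question's [\uDC00-\uDDFF] range, whereas the shirt and tie (U+1F454) encodes as \uD83D\uDC54, which falls inside it. A minimal sketch of the standard UTF-16 calculation; toSurrogatePair is a hypothetical helper name:

```javascript
// Hypothetical helper: converts a supplementary-plane code point
// into its UTF-16 [high, low] surrogate pair.
function toSurrogatePair(codePoint) {
  if (codePoint < 0x10000 || codePoint > 0x10ffff) {
    throw new RangeError("code point is not in a supplementary plane");
  }
  const offset = codePoint - 0x10000;
  const high = 0xd800 + (offset >> 10);   // top 10 bits of the offset
  const low = 0xdc00 + (offset & 0x3ff);  // bottom 10 bits of the offset
  return [high, low];
}

const [high, low] = toSurrogatePair(0x1f600);
console.log(high.toString(16), low.toString(16)); // d83d de00
```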
Answer 4:
In case anyone is still looking for a JavaScript solution for finding emoji in a string:
You can use the emoji-regex library.
Here is an example that converts every emoji in a given string to its hexadecimal Unicode code point:
import emojiRegex from 'emoji-regex/RGI_Emoji.js';
const emojiRegexPattern = emojiRegex();
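const stringThatMightHaveEmojis = ...; // some string that can contain emoji
// replace() returns a new string, so keep the result
const converted = stringThatMightHaveEmojis.replace(emojiRegexPattern, (m, idx) => {
  // Hex value of the first code point of the matched emoji
  return `${m.codePointAt(0).toString(16)}`;
});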
There are more examples in the library's documentation.
I also stumbled upon a helpful article that explains parsing emoji and codePointAt.
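Note that codePointAt(0) returns only the first code point of a match, while some emoji (flags, for example) consist of several. A minimal sketch that lists every code point of a matched emoji, relying on the fact that string iteration in JavaScript proceeds by code point; toCodePointHex is a hypothetical helper name, not part of emoji-regex:

```javascript
// Hypothetical helper: returns the hex value of every code point
// in the given emoji, joined with "-".
function toCodePointHex(emoji) {
  // Array.from iterates a string by code point, not by UTF-16 code unit.
  return Array.from(emoji, (ch) => ch.codePointAt(0).toString(16)).join("-");
}

console.log(toCodePointHex("\u{1F600}"));          // 1f600 (grinning face)
console.log(toCodePointHex("\u{1F1FA}\u{1F1F8}")); // 1f1fa-1f1f8 (two regional indicators)
```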
Source: https://stackoverflow.com/questions/37089427/javascript-find-emoji-in-string-and-parse