I need to remove emoji from a string using a regex, without removing the CJK (Chinese, Japanese, Korean) characters. I tried this regex:
REGEX =
One more alternative
"Scheiße! I hate emoji
This very short regex covers all the emoji listed on getemoji.com so far:
[\u{1F300}-\u{1F5FF}\u{1F1E6}-\u{1F1FF}\u{2700}-\u{27BF}\u{1F900}-\u{1F9FF}\u{1F600}-\u{1F64F}\u{1F680}-\u{1F6FF}\u{2600}-\u{26FF}]
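For what it's worth, here is a minimal sketch of how that character class could be used with gsub. The names EMOJI and strip_emoji are made up for illustration. Inside a character class `|` is a literal pipe rather than alternation, so the sketch lists the ranges back to back. CJK characters fall outside all of these ranges and are left alone.

```ruby
# Hypothetical constant name; the ranges are the ones from the regex above.
# No `|` between them: inside [...] a pipe would match a literal "|".
EMOJI = /[\u{1F300}-\u{1F5FF}\u{1F1E6}-\u{1F1FF}\u{2700}-\u{27BF}\u{1F900}-\u{1F9FF}\u{1F600}-\u{1F64F}\u{1F680}-\u{1F6FF}\u{2600}-\u{26FF}]/

# Hypothetical helper: returns a copy of the text with matching code points removed.
def strip_emoji(text)
  text.gsub(EMOJI, '')
end

p strip_emoji("漢字😀かな🚀한글")  # => "漢字かな한글"
p strip_emoji("a|b")               # => "a|b" (the pipe survives)
```

The surrounding whitespace is untouched, so you may still want a squeeze(' ').strip afterwards if emoji were separated by spaces.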
Careful: the answer from Aray has some side effects.
"-".gsub(/[^\p{L}\s]+/, '').squeeze(' ').strip
=> ""
even though the input is supposed to be a simple minus sign (-), which is not an emoji.
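To make the side effect easier to see, here is a small sketch that wraps the same pattern in a helper (letters_only and LETTERS_ONLY are names I made up). Because the class is "anything that is not a letter or whitespace", digits and punctuation are removed along with the emoji.

```ruby
# Same pattern as in the snippet above; the constant name is hypothetical.
LETTERS_ONLY = /[^\p{L}\s]+/

# Hypothetical helper reproducing the gsub/squeeze/strip chain.
def letters_only(text)
  text.gsub(LETTERS_ONLY, '').squeeze(' ').strip
end

p letters_only("-")              # => ""
p letters_only("call 555-1234")  # => "call"
p letters_only("漢字 😀!")        # => "漢字"
```

So it does keep CJK text, but only use it if losing every digit and punctuation mark is acceptable.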
REGEX = /[^\u{1F600}-\u{1F6FF}\s]/
or
REGEX = /[\u{1F600}-\u{1F6FF}\s]/
REGEX = /[\u{1F600}-\u{1F6FF}]/
REGEX = /[^\u{1F600}-\u{1F6FF}]/
because your original regex seems to indicate that you are trying to find everything that is neither an emoji nor whitespace, and I don't know why you would want to do that.
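A quick sketch of the difference between the negated and the non-negated class when passed to gsub (the method names are only for illustration):

```ruby
# Non-negated class: matches only code points inside the range,
# so gsub removes them and keeps everything else.
def remove_emoticons(text)
  text.gsub(/[\u{1F600}-\u{1F6FF}]/, '')
end

# Negated class: matches everything outside the range,
# so gsub removes the rest and keeps only those code points.
def keep_only_emoticons(text)
  text.gsub(/[^\u{1F600}-\u{1F6FF}]/, '')
end

p remove_emoticons("漢字 abc 😀")     # => "漢字 abc "
p keep_only_emoticons("漢字 abc 😀")  # => "😀"
```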
Also:
the emoji are in 1F300-1F6FF rather than just 1F600-1F6FF; you may want to change that
if you want to remove all astral characters (for example, because you are dealing with software that doesn't support all of Unicode), you should use 10000-10FFFF.
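Both suggestions could be sketched roughly like this; the constant and method names are hypothetical. One caveat worth knowing for the CJK case: some rarer ideographs (CJK Extension B and later, starting at U+20000) are astral themselves, so the second pattern removes those too.

```ruby
# Wider emoji range from the first point (hypothetical constant name).
EMOJI_WIDE = /[\u{1F300}-\u{1F6FF}]/

# Everything outside the Basic Multilingual Plane, from the second point.
ASTRAL = /[\u{10000}-\u{10FFFF}]/

def strip_wide_emoji(text)
  text.gsub(EMOJI_WIDE, '')
end

def strip_astral(text)
  text.gsub(ASTRAL, '')
end

p strip_wide_emoji("漢字🌀😀")        # => "漢字"
p strip_astral("漢字😀")              # => "漢字"
# U+20BB7 is a CJK Extension B ideograph, so it is removed as well.
p strip_astral("\u{20BB7}野家")       # => "野家"
```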
EDIT: You almost certainly want REGEX = /[\u{1F600}-\u{1F6FF}]/ or similar. Your original regex matched everything that is not whitespace and not in the range 0-\u1F6F. Since spaces are whitespace, English letters are in the range 0-\u1F6F, and Chinese characters are in neither, the regex matched the Chinese characters and removed them.
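Here is a small sketch of that parsing issue. I am assuming the original pattern used the four-digit form (\u1F600 without braces), as the range 0-\u1F6F mentioned above suggests; FOUR_DIGIT is my reconstruction for illustration, not necessarily the exact regex from the question. In Ruby, \u without braces takes exactly four hex digits, so \u1F600 is U+1F60 followed by a literal "0".

```ruby
# "\u1F600" is two characters (U+1F60 and "0"); "\u{1F600}" is one.
p "\u1F600".length    # => 2
p "\u{1F600}".length  # => 1

# Assumed reconstruction of the original pattern (hypothetical name).
# It parses as: NOT (U+1F60, the range "0"..U+1F6F, "F", whitespace).
# Ruby may warn about a duplicated range here when run with -w.
FOUR_DIGIT = /[^\u1F600-\u1F6FF\s]/

# Intended pattern, with braces around the five-digit code points.
BRACED = /[\u{1F600}-\u{1F6FF}]/

def strip_with(pattern, text)
  text.gsub(pattern, '')
end

p strip_with(FOUR_DIGIT, "abc 漢字 😀")  # => "abc  " (CJK and emoji both gone)
p strip_with(BRACED, "abc 漢字 😀")      # => "abc 漢字 "
```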