I am currently creating a chat and can\'t seem to find a way to stop users from posting special characters that mess with formatting of the chat and lagging end users out of the
Contrary to many answers you'll find on StackOverflow, it is trivial to sanitize "Zalgo" text with a regex engine that supports matching on Unicode categories. PHP's preg_*
functions use the PCRE library. If PCRE is compiled with --enable-unicode-properties
, you can strip all Unicode combining marks using:
$sanitized = preg_replace('/\pM/u', '', $zalgo);
Or allow a certain maximum of consecutive combining marks, say one:
$sanitized = preg_replace('/(\pM)\pM+/u', '\1', $zalgo);
Or two:
$sanitized = preg_replace('/(\pM{2})\pM+/u', '\1', $zalgo);
This will turn Zalgo text like
T̫̺̳o̬̜ ì̬͎̲̟nv̖̗̻̣̹̕o͖̗̠̜̤k͍͚̹͖̼e̦̗̪͍̪͍ ̬ͅt̕h̠͙̮͕͓e̱̜̗͙̭ ̥͔̫͙̪͍̣͝ḥi̼̦͈̼v҉̩̟͚̞͎e͈̟̻͙̦̤-m̷̘̝̱í͚̞̦̳n̝̲̯̙̮͞d̴̺̦͕̫ ̗̭̘͎͖r̞͎̜̜͖͎̫͢ep͇r̝̯̝͖͉͎̺e̴s̥e̵̖̳͉͍̩̗n̢͓̪͕̜̰̠̦t̺̞̰i͟n҉̮̦̖̟g̮͍̱̻͍̜̳ ̳c̖̮̙̣̰̠̩h̷̗͍̖͙̭͇͈a̧͎̯̹̲̺̫ó̭̞̜̣̯͕s̶̤̮̩̘.̨̻̪̖͔ ̳̭̦̭̭̦̞́I̠͍̮n͇̹̪̬v̴͖̭̗̖o̸k҉̬̤͓͚̠͍i͜n̛̩̹͉̘̹g͙ ̠̥ͅt̰͖͞h̫̼̪e̟̩̝ ̭̠̲̫͔fe̤͇̝̱e͖̮̠̹̭͖͕l͖̲̘͖̠̪i̢̖͎̮̗̯͓̩n̸̰g̙̱̘̗͚̬ͅ ͍o͍͍̩̮͢f̖͓̦̥ ̘͘c̵̫̱̗͚͓̦h͝a̝͍͍̳̣͖͉o͙̟s̤̞.̙̝̭̣̳̼͟ ̢̻͖͓̬̞̰̦W̮̲̝̼̩̝͖i͖͖͡ͅt̘̯͘h̷̬̖̞̙̰̭̳ ̭̪̕o̥̤̺̝̼̰̯͟ṳ̞̭̤t̨͚̥̗ ̟̺̫̩̤̳̩o̟̰̩̖ͅr̞̘̫̩̼d̡͍̬͎̪̺͚͔e͓͖̝̙r̰͖̲̲̻̠.̺̝̺̟͈ ̣̭T̪̩̼h̥̫̪͔̀e̫̯͜ ̨N̟e҉͔̤zp̮̭͈̟é͉͈ṛ̹̜̺̭͕d̺̪̜͇͓i̞á͕̹̣̻n͉͘ ̗͔̭͡h̲͖̣̺̺i͔̣̖̤͎̯v̠̯̘͖̭̱̯e̡̥͕-m͖̭̣̬̦͈i͖n̞̩͕̟̼̺͜d̘͉ ̯o̷͇̹͕̦f̰̱ ̝͓͉̱̪̪c͈̲̜̺h̘͚a̞͔̭̰̯̗̝o̙͍s͍͇̱͓.̵͕̰͙͈ͅ ̯̞͈̞̱̖Z̯̮̺̤̥̪̕a͏̺̗̼̬̗ḻg͢o̥̱̼.̺̜͇͡ͅ ̴͓͖̭̩͎̗ ̧̪͈̱̹̳͖͙H̵̰̤̰͕̖e̛ ͚͉̗̼̞w̶̩̥͉̮h̩̺̪̩͘ͅọ͎͉̟ ̜̩͔̦̘ͅW̪̫̩̣̲͔̳a͏͔̳͖i͖͜t͓̤̠͓͙s̘̰̩̥̙̝ͅ ̲̠̬̥Be̡̙̫̦h̰̩i̛̫͙͔̭̤̗̲n̳͞d̸ ͎̻͘T̛͇̝̲̹̠̗ͅh̫̦̝ͅe̩̫͟ ͓͖̼W͕̳͎͚̙̥ą̙l̘͚̺͔͞ͅl̳͍̙̤̤̮̳.̢ ̟̺̜̙͉Z̤̲̙̙͎̥̝A͎̣͔̙͘L̥̻̗̳̻̳̳͢G͉̖̯͓̞̩̦O̹̹̺!̙͈͎̞̬ *
into something like
T̫o̬ ì̬nv̖o͖k͍e̦ ̬t̕h̠e̱ ̥ḥi̼v҉e͈-m̷í͚n̝d̴ ̗r̞ep͇r̝e̴s̥e̵n̢t̺i͟n҉g̮ ̳c̖h̷a̧ó̭s̶.̨ ̳I̠n͇v̴o̸k҉i͜n̛g͙ ̠t̰h̫e̟ ̭fe̤e͖l͖i̢n̸g̙ ͍o͍f̖ ̘c̵h͝a̝o͙s̤.̙ ̢W̮i͖t̘h̷ ̭o̥ṳ̞t̨ ̟o̟r̞d̡e͓r̰.̺ ̣T̪h̥e̫ ̨N̟e҉zp̮é͉ṛ̹d̺i̞á͕n͉ ̗h̲i͔v̠e̡-m͖i͖n̞d̘ ̯o̷f̰ ̝c͈h̘a̞o̙s͍.̵ ̯Z̯a͏ḻg͢o̥.̺ ̴ ̧H̵e̛ ͚w̶h̩ọ͎ ̜W̪a͏i͖t͓s̘ ̲Be̡h̰i̛n̳d̸ ͎T̛h̫e̩ ͓W͕ą̙l̘l̳.̢ ̟Z̤A͎L̥G͉O̹!̙ *