问题
I'm looking for a maximum number of unicode combining characters that appear after a non-combining one in a realistic natural text.
I know that in unicode text there can be an arbitrary number of combinings placed anywhere in the text. However, I am writing a specialized application that has to operate under constrained resources and because of that and other technical reasons displaying an arbitrary number of combining chars after a non-combining one is not an option. However I would still like to display natural languages properly if possible and support for a small number of combinings should not be a problem.
My intuition that natural languages don't need more than some two or three combinings after a proper char, but I'm not sure and can't find any source on that number.
回答1:
Ok, for a lack of a better answer, here's what I did (for future reference if needed):
I ended up using a SmallVec -like thing with a threshold of 8 bytes before allocation and some 50 bytes upper limit (text stored in UTF-8). That should make everyone happy I think and performance doesn't suffer.
Take those numbers with a pinch of salt, they are arbitrary and I might tune them anyway.
来源:https://stackoverflow.com/questions/50272889/what-is-a-realistic-maximum-number-of-unicode-combining-characters