Question
Python and tkinter process Unicode characters correctly, but they do not display them correctly.
I am using Python 3.1 and tkinter on Ubuntu, and I am trying to use Tamil Unicode characters.
All the processing is done correctly, but the display is wrong.
Here is the wrong display (as in tkinter):
https://docs.google.com/leaf?id=0B7YA7kky_NEoM2U3MzI5NGUtNTk2NC00MzYzLTk1N2YtMTJjYTA0Yjc0MmE1&hl=en_GB&authkey=CKORhugK
Here is the correct display (as in gedit):
https://docs.google.com/leaf?id=0B7YA7kky_NEoNDBmMzYzOWEtMjY5Ny00NWM5LWE0MWYtMTg1ZDVhOGQ2MmEz&hl=en_GB&authkey=CPWhi74J
Can someone please help with this?
Answer 1:
It's hard to diagnose a program without code. See if you can boil down the code to something short that exhibits the problem, and post that.
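A short reproduction of the kind asked for might look like the sketch below. It is only an assumption about what the original program does: the `SAMPLE` string and the `show_sample` helper are made up for illustration, using the two code points (U+0BAE, U+0BC6) mentioned elsewhere in this thread.

```python
# Tamil letter MA followed by vowel sign E (hypothetical sample text).
# A renderer that shapes Tamil correctly draws these as one syllable.
SAMPLE = "\u0bae\u0bc6"


def show_sample(text=SAMPLE):
    """Open a small Tk window that contains only the given text."""
    # Imported here so the sample string can be inspected without a display.
    import tkinter as tk

    root = tk.Tk()
    root.title("Tamil rendering test")
    tk.Label(root, text=text).pack(padx=20, pady=20)
    root.mainloop()
```

Calling `show_sample()` on a machine with a display and comparing the window against the same text in gedit should show whether the glyph order differs.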
I'm not familiar with Tamil glyphs, and they're pretty small, but looking at the screenshots, it looks like all the glyphs are there but certain glyphs are getting swapped, am I right?
(Hmm, I guess this should have been a "comment", not an "answer". Still finding my way around this site.)
Answer 2:
I faced a similar problem and found that it came from the Zero Width Joiner (U+200D), which I had been using to explicitly tell the rendering engine to join two characters. That worked in 2010, but it looks like the rendering engine has changed since then (which I have only now become aware of), and in 2011 the joiner itself creates the problem; it broke my working code. I had to remove the explicit zero width joiners to get my code working again. Hope this helps.
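Removing the joiners can be done with a plain string replacement. This is a minimal sketch, not the code from my program: `strip_zwj` and the `joined` input are hypothetical names chosen for the example.

```python
ZWJ = "\u200d"  # ZERO WIDTH JOINER


def strip_zwj(text):
    """Return text with every explicit zero width joiner removed."""
    return text.replace(ZWJ, "")


# Hypothetical input: a consonant and a vowel sign with a joiner in between.
joined = "\u0bae" + ZWJ + "\u0bc6"
cleaned = strip_zwj(joined)
print([hex(ord(ch)) for ch in cleaned])
```

Whether this helps depends on the rendering engine in use; it only applies if the text actually contains U+200D.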
Answer 3:
It looks like Tk is mishandling things like 'Class Zero Combining Marks', see: http://www.unicode.org/versions/Unicode6.0.0/ch04.pdf#G124820 (Table 4-4)
I assume one of the sequences that does not show correctly is the code point pair U+0BA9 U+0BC6 (TAMIL SYLLABLE NNNE), where U+0BC6 is a reordrant class zero combining mark according to the Unicode standard, which basically means its glyph is drawn before the consonant it follows in the text, so the two appear swapped.
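The properties of those two code points can be checked from Python with the standard `unicodedata` module. The sketch below only inspects the characters (the `describe` helper is a name invented here); it says nothing about how Tk itself treats them.

```python
import unicodedata


def describe(text):
    """Return (code point, name, general category, combining class) per character."""
    return [
        (
            "U+%04X" % ord(ch),
            unicodedata.name(ch),
            unicodedata.category(ch),
            unicodedata.combining(ch),
        )
        for ch in text
    ]


for row in describe("\u0ba9\u0bc6"):
    print(row)
```

The combining class reported for U+0BC6 is 0, which is why it falls under the "class zero" marks in Table 4-4 even though it is a combining mark.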
The only way to fix it is to file a bug at the Tk bug tracker and hope it gets fixed.
Answer 4:
Since I do not know how to comment on other people's comments and answers, I am replying here.
@Bryan Oakley I do not think the font is the problem here; its rendering is. For example, when I type the two Unicode characters U+0BAE and U+0BC6, they should be combined into a single Tamil character displayed as "மெ". But I think tkinter has no rendering engine for displaying some Unicode scripts.
@Vamana Yes, Indian languages have a 'combined single character notation', which requires two Unicode characters, as I said above. When I type, say, charA and then charB, the display should render a single character, say charBA. But it displays charAB (which is wrong).
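To make the charA/charB point concrete, here is a small illustration of stored order versus drawing order. It is a deliberately simplified sketch and not a fix for tkinter: `visual_order` is a hypothetical helper, it only handles the three Tamil vowel signs that are drawn to the left of the consonant, and a real rendering engine does far more than this.

```python
# Tamil vowel signs drawn to the left of their consonant: E, EE, AI.
PREBASE_VOWEL_SIGNS = {"\u0bc6", "\u0bc7", "\u0bc8"}


def visual_order(text):
    """Return the characters of text in the order their glyphs are drawn.

    Simplified illustration: a pre-base vowel sign is swapped with the
    single character stored immediately before it.
    """
    chars = list(text)
    for i in range(1, len(chars)):
        if chars[i] in PREBASE_VOWEL_SIGNS:
            chars[i - 1], chars[i] = chars[i], chars[i - 1]
    return chars


stored = "\u0bae\u0bc6"  # charA (consonant), then charB (vowel sign)
print(["U+%04X" % ord(ch) for ch in visual_order(stored)])
```

The stored text stays as charA followed by charB; only the drawing order changes, and that reordering step is what appears to be missing in the tkinter display.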
@schlenk Yes, you are correct. I initially used IDLE, then tried running Python in the Linux console; both rendered the Tamil text wrongly. Hence I came to tkinter, and that is also in vain. I am currently using file I/O. Now I think I should learn how to make a simple web page with Python for input and output, so that the browser renders the text correctly.
Source: https://stackoverflow.com/questions/5166488/tkinter-cannot-display-unicode-characters-correctly