latin1

How to detect latin1 and UTF-8?

Submitted by 我只是一个虾纸丫 on 2019-11-29 12:11:05
I am extracting strings from an XML file, and even though it should be pure UTF-8, it is not. My idea was to:

    #!/usr/bin/perl
    use warnings;
    use strict;
    use Encode qw(decode encode);
    use Data::Dumper;

    my $x = "m\x{e6}gtig";
    my $y = "m\x{c3}\x{a6}gtig";
    my $a = encode('UTF-8', $x);
    my $b = encode('UTF-8', $y);

    print Dumper $x;
    print Dumper $y;
    print Dumper $a;
    print Dumper $b;

    if ($x eq $y) { print "1\n"; }
    if ($x eq $a) { print "2\n"; }
    if ($a eq $y) { print "3\n"; }
    if ($a eq $b) { print "4\n"; }
    if ($x eq $b) { print "5\n"; }
    if ($y eq $b) { print "6\n"; }

which outputs

    $VAR1 = 'm�gtig';
    $VAR1 =
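A common approach to this kind of check (not taken from the original post) is to attempt a strict UTF-8 decode and fall back to latin1 when it fails. A minimal Python sketch of that idea, assuming the input is raw bytes read from the file:

    def guess_encoding(raw: bytes) -> str:
        """Return 'utf-8' if the bytes decode strictly as UTF-8, else assume latin1.

        Note: pure-ASCII data is valid in both encodings, and every byte string
        is valid latin1, so this is a heuristic, not a proof.
        """
        try:
            raw.decode('utf-8', errors='strict')
            return 'utf-8'
        except UnicodeDecodeError:
            return 'latin1'

    # The two byte sequences from the Perl snippet above:
    print(guess_encoding(b"m\xe6gtig"))      # latin1 (a lone 0xE6 is not valid UTF-8)
    print(guess_encoding(b"m\xc3\xa6gtig"))  # utf-8  (0xC3 0xA6 is 'æ')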

MySQL Workbench charset

Submitted by 匆匆过客 on 2019-11-28 22:32:31
Is there any way to change the MySQL Workbench charset? My schema uses UTF-8, and when I view the table data (saved as UTF-8) or add data manually, it appears with charset errors; MySQL Workbench probably uses LATIN1.

sbrbot: I think the OP was asking about the charset that Workbench uses in its editor and how to set up Workbench to use UTF-8 in the GUI, not how to set the default charset for a database table in Workbench. At the moment one can set a table's charset in Workbench, but regardless of that setting Workbench will represent data in its GUI using the Latin1 charset! Also inserting data into UTF
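This does not change the Workbench GUI setting itself, but one way to rule out a client-side charset mismatch is to read the same rows over an explicitly UTF-8 connection. A minimal sketch using the mysql-connector-python package, with placeholder credentials and table name:

    import mysql.connector  # pip install mysql-connector-python

    # Placeholder connection details; charset forces a UTF-8 client connection.
    conn = mysql.connector.connect(
        host='localhost', user='user', password='secret',
        database='mydb', charset='utf8mb4',
    )
    cur = conn.cursor()
    cur.execute("SELECT name FROM mytable LIMIT 5")
    for (name,) in cur:
        print(name)  # comes back as a proper Python str, independent of any GUI
    conn.close()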

Converting mysql tables from latin1 to utf8

Submitted by 柔情痞子 on 2019-11-28 18:24:05
I'm trying to convert some MySQL tables from latin1 to utf8. I'm using the following command, which mostly seems to work.

    ALTER TABLE tablename CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;

However, on one table I get an error about a duplicate key entry. This is caused by a unique index on a "name" field. It seems that when converting to utf8, any "special" characters are indexed as their plain English equivalents. For example, there is already a record with a name field value of "Dru". When converting to utf8, a record with "Drü" is considered a duplicate. The same with "Patrick" and
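The collision comes from the collation rather than the charset: under utf8_general_ci, 'u' and 'ü' compare as equal, so the unique index sees "Dru" and "Drü" as the same key. One workaround (a sketch of the idea, not necessarily the asker's eventual fix) is to convert with an accent-sensitive collation such as utf8_bin; issued from Python with mysql-connector-python, with placeholder connection details and table name:

    import mysql.connector  # pip install mysql-connector-python

    conn = mysql.connector.connect(host='localhost', user='user',
                                   password='secret', database='mydb')
    cur = conn.cursor()

    # utf8_bin compares raw code points, so 'Dru' and 'Drü' remain distinct
    # and the unique index on `name` no longer collides during conversion.
    cur.execute("ALTER TABLE tablename "
                "CONVERT TO CHARACTER SET utf8 COLLATE utf8_bin")
    conn.commit()
    conn.close()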

ISO 8859-1 filename not decoding

Submitted by 左心房为你撑大大i on 2019-11-28 13:51:34
I'm extracting files from MIME messages in a Python milter and am running into issues with files named like this:

    =?ISO-8859-1?Q?Certificado=5FZonificaci=F3n=5F2010=2Epdf?=

I can't seem to decode this name into UTF. In order to solve a prior ISO-8859-1 issue, I started passing all filenames to this function:

    def unicodeConvert(self, fname):
        normalized = False
        while normalized == False:
            try:
                fname = unicodedata.normalize('NFKD', unicode(fname, 'utf-8')).encode('ascii', 'ignore')
                normalized = True
            except UnicodeDecodeError:
                fname = fname.decode('iso-8859-1')#.encode('utf-8')
                normalized = True
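That filename is an RFC 2047 "encoded-word", so it needs MIME header decoding before any charset conversion. A minimal Python 3 sketch using only the standard library (the milter plumbing is omitted):

    from email.header import decode_header

    raw = '=?ISO-8859-1?Q?Certificado=5FZonificaci=F3n=5F2010=2Epdf?='

    parts = []
    for value, charset in decode_header(raw):
        if isinstance(value, bytes):
            # decode_header reports the declared charset (here 'iso-8859-1')
            parts.append(value.decode(charset or 'ascii'))
        else:
            parts.append(value)

    fname = ''.join(parts)
    print(fname)  # Certificado_Zonificación_2010.pdf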

Python Latin Characters and Unicode

Submitted by ぐ巨炮叔叔 on 2019-11-28 10:25:26
Question: I have a tree structure in which keywords may contain some Latin characters. I have a function which loops through all leaves of the tree and adds each keyword to a list under certain conditions. Here is the code I have for adding these keywords to the list:

    print "Adding: " + self.keyword
    leaf_list.append(self.keyword)
    print leaf_list

If the keyword in this case is université, then my output is:

    Adding: université
    ['universit\xc3\xa9']

It appears that the print function properly shows the
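The list output is not a second encoding problem: printing a list shows each element's repr(), and here the keyword is a UTF-8 byte string, so its repr shows the raw bytes. A small Python 3 illustration of the same effect (the names are made up for the example):

    keyword = b'universit\xc3\xa9'      # UTF-8 bytes, as stored in the tree
    leaf_list = [keyword]

    print(keyword.decode('utf-8'))      # université  (text prints readably)
    print(leaf_list)                    # [b'universit\xc3\xa9']  (list shows repr of each element)

    # Decoding before appending keeps the list displayable:
    leaf_list = [keyword.decode('utf-8')]
    print(leaf_list)                    # ['université']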

Django character latin1 mysql

Submitted by 我只是一个虾纸丫 on 2019-11-28 05:58:36
Question: I created a Django application with an existing MySQL database. The problem is that the database encoding is latin1_general_c, and the UTF-8 characters are saved like this: ñ => Ã±, ó => Ã³. I need to present the information on that page correctly, but Django shows the database information like this: recepciÃ³n, 4 oficinas, 2 baÃ±os, and I need it shown like this: recepción, 4 oficinas, 2 baños. For many reasons I can't change the database to utf8. What do I do to show the information the correct way?

Answer 1:
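When UTF-8 bytes have been stored through (and read back over) a latin1 connection, the resulting text can often be repaired in Python by reversing the mis-decode. A minimal sketch of that idea, not necessarily the accepted answer:

    def fix_mojibake(text: str) -> str:
        """Undo UTF-8 bytes that were mis-decoded as latin1 (e.g. 'Ã±' -> 'ñ')."""
        try:
            return text.encode('latin1').decode('utf-8')
        except (UnicodeEncodeError, UnicodeDecodeError):
            return text  # already clean, or not a simple double-encoding

    print(fix_mojibake('recepciÃ³n, 4 oficinas, 2 baÃ±os'))
    # recepción, 4 oficinas, 2 baños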

Python 3 chokes on CP-1252/ANSI reading

Submitted by 只谈情不闲聊 on 2019-11-27 16:12:54
I'm working on a series of parsers where I get a bunch of tracebacks from my unit tests like:

    File "c:\Python31\lib\encodings\cp1252.py", line 23, in decode
        return codecs.charmap_decode(input,self.errors,decoding_table)[0]
    UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 112: character maps to <undefined>

The files are opened with open() with no extra arguments. Can I pass extra arguments to open() or use something in the codecs module to open these differently? This came up with code that was written in Python 2 and converted to 3 with the 2to3 tool. UPDATE: it turns out
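In Python 3, open() in text mode accepts encoding (and errors) arguments, which is the usual way around the default-codepage decode failure. A small sketch, with the file path as a placeholder:

    # 'data.txt' is a placeholder path; pick the encoding the files are really in.
    with open('data.txt', encoding='latin-1') as fh:  # latin-1 maps every byte, so it never raises
        text = fh.read()

    # Or keep the intended encoding but tolerate stray bytes such as 0x81:
    with open('data.txt', encoding='cp1252', errors='replace') as fh:
        text = fh.read()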

How can I detect non-western characters?

Submitted by 陌路散爱 on 2019-11-27 14:55:58
Question: I want to disallow certain UTF-8 input (server-side), e.g. eastern languages, where example input might be "伊". However, I do want to keep supporting other Latin or "Latin-like" characters, such as the Welsh ŵ and ŷ, so checking against latin-1 is not possible. What are my options? (If language-specific, PHP preferred.) Thanks very much. Reasoning: browser support for a lot of non-western characters is often missing (e.g. on a different browser I just see a box in the question above), so
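The question prefers PHP, but the underlying task is script detection rather than charset detection. Here is a small Python sketch of one such approach, using Unicode character names from the standard library (the 'LATIN' test is a heuristic, not an official script property):

    import unicodedata

    def is_western(text: str) -> bool:
        """Rough check: every letter must be a LATIN letter; digits,
        punctuation and whitespace are allowed through."""
        for ch in text:
            if ch.isalpha() and 'LATIN' not in unicodedata.name(ch, ''):
                return False
        return True

    print(is_western('Llanfair ŵ ŷ'))  # True  (Welsh letters are LATIN ... WITH CIRCUMFLEX)
    print(is_western('伊'))             # False (CJK UNIFIED IDEOGRAPH-4F0A)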
