latin1

PostgreSQL ignores dashes when ordering

你离开我真会死。 posted on 2019-12-05 15:29:35
I have a PostgreSQL 8.4 database that was created with the da_DK.utf8 locale:

    dbname=> show lc_collate;
     lc_collate
    ------------
     da_DK.utf8
    (1 row)

When I select from a table and order by a character varying column, I get what looks to me like strange behaviour: PostgreSQL ignores dashes that prefix a value. For example,

    select name from mytable order by name asc;

may return something like

          name
    ----------------
     Ad...
     Ae...
     Ag...
     - Ak....
     At....

The dash prefix seems to be ignored. I can fix this issue by converting the column to latin1 when ordering:

    select name from mytable

MySql varchar change from Latin1 to UTF8

感情迁移 posted on 2019-12-04 16:10:16
In a MySQL table I'm using the latin1 character set to store text in a varchar field. Since our website is now supported in more countries, we need UTF-8 support instead. What will happen if I change these fields to UTF-8? Is it safe to do this, or will it mess up the data inside these fields? Is there anything I need to think about when changing the field to UTF-8? Thanks!

MySQL handles this nicely:

    CREATE TEMPORARY TABLE t1 ( c VARCHAR(10) ) CHARACTER SET = "latin1";
    INSERT INTO t1 VALUES ("æøå");
    SELECT * FROM t1; # 'æøå'
    ALTER TABLE t1 CHARACTER SET = "utf8";
    SELECT * FROM t1; # 'æøå'
    DROP
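To make the difference between the two character sets concrete, here is a minimal Python sketch (not part of the original answer) showing how the same sample string is laid out as bytes in latin1 and in UTF-8. The helper name latin1_to_utf8 is made up for illustration.

```python
# Illustrative only: the sample string from the snippet in both encodings.
text = "æøå"

latin1_bytes = text.encode("latin-1")  # one byte per character
utf8_bytes = text.encode("utf-8")      # two bytes per character for these letters


def latin1_to_utf8(raw: bytes) -> bytes:
    """Re-encode bytes that are known to be latin1 as UTF-8 (hypothetical helper)."""
    return raw.decode("latin-1").encode("utf-8")


print(latin1_bytes.hex(), utf8_bytes.hex())
```

The characters are the same in both cases; only the stored bytes differ, which is why a conversion has to decode with the old charset before encoding with the new one.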

Strip down everything, except alphanumeric and European characters in PHP

天涯浪子 posted on 2019-12-04 10:25:19
I am working on validating my commenting script, and I need to strip out all non-alphanumeric characters except those used in Western Europe. My plan is to regex out all non-alphanumeric characters with:

    preg_replace("/[^A-Za-z0-9 ]/", '', $string);

But so far that also strips out all European characters and the £ sign, so "Café Rouge" becomes "Caf Rouge". How can I add an array of European characters to the above regex? The array is:

    £, €, á, à, â, ä, æ, ã, å, è, é, ê, ë, î, ï, í, ì, ô, ö, ò, ó, ø, õ, û, ü, ù, ú, ÿ, ñ, ß

I use UTF-8.

SOLUTION:

    $comment = preg_replace('/[^\p{Latin}\d\s\p{P}]/u', '', $comment);

and
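For comparison, the same idea can be sketched in Python. The standard re module has no \p{Latin}, so this version approximates it by checking Unicode character names and categories; keep_latin is a hypothetical helper and only roughly mirrors the PHP pattern.

```python
import unicodedata


def keep_latin(text: str) -> str:
    """Keep Latin letters, decimal digits, whitespace and punctuation; drop the rest.

    Approximation: a letter counts as Latin if its Unicode name contains "LATIN".
    """
    kept = []
    for ch in text:
        category = unicodedata.category(ch)
        is_latin_letter = category.startswith("L") and "LATIN" in unicodedata.name(ch, "")
        is_digit = category == "Nd"
        is_punctuation = category.startswith("P")
        if is_latin_letter or is_digit or ch.isspace() or is_punctuation:
            kept.append(ch)
    return "".join(kept)


print(keep_latin("Café Rouge"))
```

In this sketch £ and € are dropped, because Unicode classifies them as currency symbols (category Sc) rather than punctuation; an extra check for Sc would keep them.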

NodeJS decodeURIComponent not working properly

家住魔仙堡 posted on 2019-12-04 09:28:18
When I tried to decode the string below in Node.js using decodeURIComponent:

    var decoded = decodeURI('Ulysses%20Guimar%C3%A3es%20-%20lado%20par');
    console.log(decoded);

I got

    Ulysses GuimarÃ£es - lado par

instead of

    Avenida Ulysses Guimarães - lado par

But when I use the same code on the client side (browser) I get the right character, 'ã'. Is there a way to convert from Ã£ to ã in a Node script?

I cannot reproduce it in the 0.10 or 0.11 versions of Node. You can convert the first to the second using

    new Buffer('Ulysses GuimarÃ£es - lado par', 'binary').toString('utf8')

but it's a workaround, not a solution
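The garbled output is what you get when UTF-8 bytes are read as latin1. A small Python sketch of both the effect and the same kind of workaround (fix_mojibake is a name invented here; this is not the Node code from the answer):

```python
from urllib.parse import unquote

encoded = "Ulysses%20Guimar%C3%A3es%20-%20lado%20par"

# unquote() decodes the percent-escaped bytes as UTF-8 by default.
decoded = unquote(encoded)

# Forcing latin1 reproduces the garbled text from the question.
garbled = unquote(encoded, encoding="latin-1")


def fix_mojibake(text: str) -> str:
    """Undo 'UTF-8 bytes read as latin1' by reversing the wrong decode step."""
    return text.encode("latin-1").decode("utf-8")


print(garbled)
print(fix_mojibake(garbled))
```

Like the Buffer trick, this only repairs text that was mis-decoded in exactly this way; it will raise an error or produce nonsense on text that was decoded correctly.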

How do I convert a column to ASCII on the fly without saving to check for matches with an external ASCII string?

扶醉桌前 posted on 2019-12-04 03:49:17
Question

I have a member search function where you can give parts of names, and it should return all members whose username, firstname or lastname matches that input. The problem is that some names contain 'weird' characters, like the é in Renée, and the user doesn't want to type the weird character but the plain ASCII substitute e. In PHP I convert the input string to ASCII using iconv (just in case someone types weird characters). In the database however I should also convert
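The "fold to ASCII, then compare" idea from the question can be sketched in Python as follows. This is an illustration of the technique, not the PHP/iconv or SQL code the question is about; to_ascii and matches are hypothetical names.

```python
import unicodedata


def to_ascii(text: str) -> str:
    """Decompose accented letters (NFKD) and drop anything that is not ASCII."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


def matches(query: str, *fields: str) -> bool:
    """True if the ASCII-folded query occurs in any ASCII-folded field."""
    needle = to_ascii(query).lower()
    return any(needle in to_ascii(field).lower() for field in fields)


print(to_ascii("Renée"))
```

Letters that have no decomposition, such as ø or ß, are simply dropped by this approach, so a real search would probably need an explicit replacement table for them.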

Python converting latin1 to UTF8

核能气质少年 posted on 2019-12-04 02:34:06
In Python 2.7, how do you convert a latin1 string to UTF-8? For example, I'm trying to convert é to UTF-8.

    >>> "é"
    '\xe9'
    >>> u"é"
    u'\xe9'
    >>> u"é".encode('utf-8')
    '\xc3\xa9'
    >>> print u"é".encode('utf-8')
    Ã©

The letter is é, which is LATIN SMALL LETTER E WITH ACUTE (U+00E9). The UTF-8 byte encoding is: c3a9. The latin1 byte encoding is: e9. How do I get the UTF-8 encoded version of a latin1 string? Could someone give an example of how to convert the é?

To decode a byte sequence from latin1 to Unicode, use the .decode() method:

    >>> '\xe9'.decode('latin1')
    u'\xe9'

Python uses \xab escapes for
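The snippet above is Python 2.7. For reference, a minimal sketch of the same round trip in Python 3, where bytes and text are separate types (variable names are illustrative):

```python
# latin1-encoded input: a single byte for é
latin1_bytes = b"\xe9"

# Step 1: decode the bytes into a Unicode string using the source encoding.
text = latin1_bytes.decode("latin-1")

# Step 2: encode the Unicode string into the target encoding.
utf8_bytes = text.encode("utf-8")

print(repr(text), utf8_bytes)
```

The two steps are the same in both Python versions: decode with the encoding the bytes are actually in, then encode to the one you want.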

Convert character from UTF-8 to ISO-8859-1 manually

纵饮孤独 posted on 2019-12-04 02:00:35
Question

I have the character "ö". If I look in this UTF-8 table I see it has the hex value F6. If I look in the Unicode table I see that "ö" has the indices E0 and 16. If I add both I get the hex value of the code point, F6. This is the binary value 1111 0110.

1) How do I get from the hex value F6 to the indices E0 and 16?
2) I don't know how to get from F6 to the two bytes C3 B6 ...

Because I didn't get the results, I tried to go the other way. "ö" is represented in ISO-8859-1 as "Ã¶". In the

When to use utf-8 and when to use latin1 in MySQL?

十年热恋 posted on 2019-12-03 12:48:53
I know that MySQL defaults to latin1 encoding, and apparently it takes 1 byte to store a character in latin1 and 3 bytes to store a character in utf-8. Is that correct? I am working on a site that I hope will be used globally. Do I absolutely need utf-8, or can I get away with latin1? Also, I tried to change some tables from latin1 to utf8 but got this error:

    Specified key was too long; max key length is 1000 bytes

Does anyone know the solution to this? Should I really solve it, or might latin1 be enough? Thanks, Alex

it takes 1 byte to store a character in
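A rough way to see where the error comes from is to do the arithmetic. The sketch below assumes that the index length is budgeted at the maximum bytes per character of the charset (1 for latin1, 3 for MySQL's utf8) and takes the 1000-byte limit from the error message; the function name is made up.

```python
MAX_KEY_BYTES = 1000  # limit quoted in the error message


def max_indexable_chars(bytes_per_char: int, max_key_bytes: int = MAX_KEY_BYTES) -> int:
    """How many characters fit in a key if each is budgeted at bytes_per_char."""
    return max_key_bytes // bytes_per_char


# Actual UTF-8 sizes vary per character: ASCII 1 byte, é 2 bytes, € 3 bytes.
sizes = {ch: len(ch.encode("utf-8")) for ch in ("a", "é", "€")}

print(max_indexable_chars(1), max_indexable_chars(3), sizes)
```

Under these assumptions a key that fit comfortably in latin1 can exceed the limit after the switch, even though most Western European text needs only 1 or 2 bytes per character in practice.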

Convert character from UTF-8 to ISO-8859-1 manually

笑着哭i posted on 2019-12-01 10:39:26
I have the character "ö". If I look in this UTF-8 table I see it has the hex value F6. If I look in the Unicode table I see that "ö" has the indices E0 and 16. If I add both I get the hex value of the code point, F6. This is the binary value 1111 0110.

1) How do I get from the hex value F6 to the indices E0 and 16?
2) I don't know how to get from F6 to the two bytes C3 B6 ...

Because I didn't get the results, I tried to go the other way. "ö" is represented in ISO-8859-1 as "Ã¶". In the UTF-8 table I can see that "Ã" has the decimal value 195 and "¶" has the decimal value 182. Converted
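The step from F6 to C3 B6 follows the two-byte UTF-8 layout 110xxxxx 10xxxxxx: the low six bits of the code point go into the second byte and the remaining high bits into the first. A small sketch (the function name is invented for illustration):

```python
def utf8_encode_two_byte(code_point: int) -> bytes:
    """Manually encode a code point in the two-byte UTF-8 range U+0080..U+07FF."""
    if not 0x80 <= code_point <= 0x7FF:
        raise ValueError("code point is outside the two-byte UTF-8 range")
    lead = 0b11000000 | (code_point >> 6)                  # 110xxxxx: high bits
    continuation = 0b10000000 | (code_point & 0b00111111)  # 10xxxxxx: low six bits
    return bytes([lead, continuation])


encoded = utf8_encode_two_byte(0xF6)  # U+00F6, "ö"
print(encoded.hex(), list(encoded))
```

For 0xF6 = 1111 0110, the high bits are 11 and the low six bits are 110110, giving 1100 0011 (C3, decimal 195) and 1011 0110 (B6, decimal 182), which are the two values the question found in the table.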

How to detect latin1 and UTF-8?

不羁的心 posted on 2019-11-30 09:06:47
Question

I am extracting strings from an XML file, and even though it should be pure UTF-8, it is not. My idea was to

    #!/usr/bin/perl
    use warnings;
    use strict;
    use Encode qw(decode encode);
    use Data::Dumper;

    my $x = "m\x{e6}gtig";
    my $y = "m\x{c3}\x{a6}gtig";
    my $a = encode('UTF-8', $x);
    my $b = encode('UTF-8', $y);

    print Dumper $x;
    print Dumper $y;
    print Dumper $a;
    print Dumper $b;

    if ($x eq $y) { print "1\n"; }
    if ($x eq $a) { print "2\n"; }
    if ($a eq $y) { print "3\n"; }
    if ($a eq $b) { print "4\n"
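One common heuristic for this kind of mixed input, sketched here in Python rather than Perl, is to try a strict UTF-8 decode first and fall back to latin1 only if that fails. This is an assumption about what the asker wants, not the approach from the truncated Perl script, and the function name is hypothetical.

```python
from typing import Tuple


def decode_utf8_or_latin1(raw: bytes) -> Tuple[str, str]:
    """Return (text, encoding_used), preferring strict UTF-8 over latin1."""
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Every byte sequence is valid latin1, so this fallback never fails.
        return raw.decode("latin-1"), "latin-1"


print(decode_utf8_or_latin1(b"m\xe6gtig"))
print(decode_utf8_or_latin1(b"m\xc3\xa6gtig"))
```

The heuristic works because latin1 text with accented letters is rarely valid UTF-8 by accident, but it cannot tell the difference when the bytes happen to be valid in both encodings.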