So I have a weird truncate issue! Can't find a specific answer on this.
So basically there's an issue with an apparent ISO character ½ that truncates the rest of the
Did you call set_charset() on your MySQLi database connection? It's required to properly use real_escape_string().
$db = new mysqli(...);
$db->set_charset('utf8');
Setting session variables in your connection is not enough -- those affect what happens on the server side. set_charset() also tells the client library which encoding is in use, and that is what real_escape_string() relies on.
See the PHP reference for mysqli::real_escape_string for details.
Something in your code isn't handling the string as UTF8. It could be your PHP/HTML, it could be in your connection to the DB, or it could be the DB itself - everything has to be set as UTF8 consistently, and if anything isn't, the string will get truncated exactly as you see when passing across a UTF8/non-UTF8 boundary.
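To see which side of that boundary a given string is on, you can check its bytes directly. This is a minimal sketch, assuming the mbstring extension is loaded; the sample strings are made up for illustration and are not taken from the question.

```php
<?php
// Illustrative sample: "1½ cups" encoded as ISO-8859-1, where ½ is the single byte 0xBD
$latin1 = "1\xBD cups";
// The same text encoded as UTF-8, where ½ is the two bytes 0xC2 0xBD
$utf8 = "1\xC2\xBD cups";

// A lone 0xBD byte is not a valid UTF-8 sequence, so the first check fails
$latin1IsValid = mb_check_encoding($latin1, 'UTF-8');
$utf8IsValid = mb_check_encoding($utf8, 'UTF-8');

var_dump($latin1IsValid); // bool(false)
var_dump($utf8IsValid);   // bool(true)
```

If a string that should be UTF-8 fails this check, something upstream (the form, the page, or the connection) is still producing ISO-8859-1.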
I will assume your DB is UTF8 compliant - that is easiest to check. Note that the character set and collation can be set at the server level, the database level, the table level, and the column level within the table. Setting a UTF8 character set on the column should override anything else for storage, but the others will still kick in when talking to the DB if they're not also UTF8. If you're not sure, explicitly set the connection to UTF8 after you open it:
// MYSQL_ATTR_INIT_COMMAND only works as a constructor option, so on an open connection run the command directly
$dbh->exec("SET NAMES 'utf8'");
Now your DB & connection are UTF8, make sure your web page is too. Again, this can be set in more than one place (.htaccess, php.ini). If you're not sure / don't have access, just override whatever PHP is picking up as default at the top of your page:
<?php ini_set('default_charset', 'UTF-8'); ?>
Note that you want the above right at the start, before any text is output from your page. Once text gets output, it is potentially too late to try and specify an encoding - you may already be locked into whatever is default on your server. I also then repeat this in my headers (possibly overkill):
<head>
<meta charset="UTF-8">
<meta http-equiv="Content-type" content="text/html; charset=UTF-8">
</head>
And I override it on forms where I'm taking data as well:
<FORM NAME="utf8-test" METHOD="POST" ACTION="utf8-test.php" enctype="multipart/form-data" accept-charset="UTF-8">
To be honest, if you've set the encoding at the top, my understanding is that the other overrides aren't required. I keep them anyway, because they don't break anything and I'd rather state the encoding explicitly than let the server make assumptions.
Finally, you mentioned that in phpMyAdmin you inserted the string and it looked as expected - are you sure though that the phpMyAdmin pages are UTF8? I don't think they are. When I store UTF8 data from my PHP code, it views like raw 8-bit characters in phpMyAdmin. If I take the same string and store it directly in phpMyAdmin, it looks 'correct'. So I'm guessing phpMyAdmin is using the default character set of my local server, not necessarily UTF8.
For example, the following string stored from my web page:
I can’t wait
Reads like this in my phpMyAdmin:
I canâ€™t wait
So be careful when testing that way, as you don't really know what encoding phpMyAdmin is using for display or DB connection.
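That kind of display mismatch can be reproduced without a database. The sketch below assumes the mbstring extension and only illustrates the symptom (UTF-8 bytes being read as Windows-1252); it is not a claim about what phpMyAdmin actually does internally.

```php
<?php
// "I can’t wait" with the apostrophe (U+2019) stored as its three UTF-8 bytes
$stored = "I can\xE2\x80\x99t wait";

// Simulate a viewer that treats each of those bytes as a separate
// Windows-1252 character and then renders the result as UTF-8
$misread = mb_convert_encoding($stored, 'UTF-8', 'Windows-1252');

echo $misread, "\n"; // I canâ€™t wait
```

The stored bytes are correct in both cases; only the interpretation differs, which is why the data can look wrong in one tool and right in another.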
If you're still having issues, try my code below. First I create a table to store the text in UTF8:
CREATE TABLE IF NOT EXISTS `utf8_test` (
    `id` int(11) NOT NULL AUTO_INCREMENT,
    `my_text` varchar(8000) NOT NULL,
    PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=1;
And here's some PHP to test it. It basically takes your input on a form, echoes that input back at you, and stores/retrieves the text from the DB. Like I said, if you view the data directly in phpMyAdmin, you might find it doesn't look right there, but through the page below it should always appear as expected, due to the page & db connection both being locked to UTF8.
<?php
// Override whatever is set in php.ini
ini_set('default_charset', 'UTF-8');
// The following should not be required with the above override
//header('Content-Type:text/html; charset=UTF-8');
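// Open the database; the charset in the DSN sets the connection encoding (PHP 5.3.6+)
$dbh = new PDO('mysql:dbname=utf8db;host=127.0.0.1;charset=utf8', 'root', 'password', array(
    // Fallback for older PHP versions that ignore the DSN charset.
    // MYSQL_ATTR_INIT_COMMAND only takes effect when passed to the constructor.
    PDO::MYSQL_ATTR_INIT_COMMAND => "SET NAMES 'utf8'"
));
// Tell MySQL to do the parameter replacement, not PDO
$dbh->setAttribute(PDO::ATTR_EMULATE_PREPARES, false);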
// Throw exceptions (and break the code) if a query is bad
$dbh->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$id = 0;
if (isset($_POST["StoreText"]))
{
    $stmt = $dbh->prepare('INSERT INTO utf8_test (my_text) VALUES (:my_text)');
    $stmt->execute(array(':my_text' => $_POST['my_text']));
    $id = $dbh->lastInsertId();
}
?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta charset="UTF-8">
<meta http-equiv="Content-type" content="text/html; charset=UTF-8">
<title>UTF-8 Test</title>
</head>
<body>
<?php
// If something was posted, output it
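if (isset($_POST['my_text']))
{
    echo "POSTED<br>\n";
    // Escape for HTML output, declaring the encoding explicitly
    echo htmlspecialchars($_POST['my_text'], ENT_QUOTES, 'UTF-8') . "<br>\n";
}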
// If something was written to the database, read it back, and output it
if ($id > 0)
{
    $stmt = $dbh->prepare('SELECT my_text FROM utf8_test WHERE id = :id');
    $stmt->execute(array(':id' => $id));
    if ($result = $stmt->fetch())
    {
        echo "STORED<br>\n";
        // Escape for HTML output, declaring the encoding explicitly
        echo htmlspecialchars($result['my_text'], ENT_QUOTES, 'UTF-8') . "<br>\n";
    }
}
// Create a form to take some user input
echo "<FORM NAME=\"utf8-test\" METHOD=\"POST\" ACTION=\"utf8-test.php\" enctype=\"multipart/form-data\" accept-charset=\"UTF-8\">";
echo "<br>";
echo "<textarea name=\"my_text\" rows=\"20\" cols=\"90\">";
// If something was posted, include it on the form
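if (isset($_POST['my_text']))
{
    // Escape so that markup in the posted text cannot close the textarea
    echo htmlspecialchars($_POST['my_text'], ENT_QUOTES, 'UTF-8');
}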
echo "</textarea>";
echo "<br>";
echo "<INPUT TYPE = \"Submit\" Name = \"StoreText\" VALUE=\"Store It\" />";
echo "</FORM>";
?>
<br>
</body>
</html>
Check into mb_convert_encoding if you can't change the way the data is handled. Otherwise, do yourself a favor and get your encoding on the same page before it gets out of hand. UTF-8 encodes characters outside ASCII as multibyte sequences, which a reader expecting ISO-8859-1 (Latin-1) will not interpret correctly.
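As a rough sketch of that approach: the helper below is hypothetical (the name to_utf8 is made up here) and assumes that any input which is not already valid UTF-8 was encoded as ISO-8859-1. That assumption will not hold for every data source, so treat it as a starting point only.

```php
<?php
// Hypothetical helper: normalise input to UTF-8, assuming anything that
// is not already valid UTF-8 was encoded as ISO-8859-1.
function to_utf8($text)
{
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text; // already valid UTF-8, leave untouched to avoid double encoding
    }
    return mb_convert_encoding($text, 'UTF-8', 'ISO-8859-1');
}

// ½ as a single ISO-8859-1 byte is converted to its two-byte UTF-8 form
echo bin2hex(to_utf8("\xBD")), "\n";     // c2bd
// ½ already in UTF-8 passes through unchanged
echo bin2hex(to_utf8("\xC2\xBD")), "\n"; // c2bd
```

The validity check matters because converting a string that is already UTF-8 a second time produces the doubled characters people usually describe as garbage.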
Finally, I've run into this when various combinations of htmlentities, htmlspecialchars and html_entity_decode are used.
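One way those functions can bite, shown as a minimal sketch with made-up sample data: htmlspecialchars returns an empty string when the input is not valid in the encoding you name (unless ENT_SUBSTITUTE or ENT_IGNORE is set), and html_entity_decode produces different bytes depending on the charset argument. Whether this matches the original problem is an assumption.

```php
<?php
// "1½ cups" as ISO-8859-1 bytes (½ is the single byte 0xBD)
$latin1 = "1\xBD cups";

// Invalid UTF-8 input with no substitution flag yields an empty string
$escaped = htmlspecialchars($latin1, ENT_QUOTES, 'UTF-8');
var_dump($escaped); // string(0) ""

// With ENT_SUBSTITUTE the bad byte becomes U+FFFD instead
$substituted = htmlspecialchars($latin1, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
echo bin2hex($substituted), "\n"; // 31efbfbd2063757073

// Decoding the same entity gives different bytes per target charset
$asLatin1 = html_entity_decode('&frac12;', ENT_QUOTES, 'ISO-8859-1');
$asUtf8 = html_entity_decode('&frac12;', ENT_QUOTES, 'UTF-8');
echo bin2hex($asLatin1), ' ', bin2hex($asUtf8), "\n"; // bd c2bd
```

Passing the same explicit charset to every one of these calls avoids relying on defaults, which have changed between PHP versions.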