I am using PHP Simple HTML DOM Parser http://simplehtmldom.sourceforge.net/
to fetch data like Page Title, Meta Description and Meta Tags from other domains and
If I switch browser encoding to UTF-8, it works.
So you're simply not setting the correct HTTP header to designate your document to be UTF-8 encoded and the browser is interpreting it in some other encoding. Use:
header('Content-Type: text/html; charset=utf-8');
@deceze and @Shakti thanks for your help.
+1 for the article link posted by deceze (Handling Unicode Front to Back in a Web App); it is also worth reading Understanding encoding.
After reading your comments, answer and of course those two articles, I finally solved my issue.
Here are the steps I have taken so far to solve this issue:
Set header('Content-Type: text/html; charset=utf-8'); at the top of my init.php file.
Set the database connection encoding with mysql_set_charset('utf8', $connection_link_id);
Passed the charset explicitly when escaping output: $meta_title = htmlentities(trim($meta_title_raw), ENT_QUOTES, 'UTF-8');
Now the issue seems to be solved, BUT I still have to do the following to solve it in FULL:
Get the encoding of the source page ($source_charset).
Convert the fetched text to UTF-8 using iconv(). Example: iconv($source_charset, "UTF-8", $meta_title_raw);
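The iconv() conversion step could be wrapped roughly as follows. This is only a sketch: to_utf8() is a hypothetical helper name, and the sample string assumes the source page was served as ISO-8859-1.

```php
<?php
// Hypothetical helper (not part of any library): convert text from a
// known source charset to UTF-8, leaving it untouched if conversion fails.
function to_utf8(string $text, string $source_charset): string
{
    // Nothing to do if the source is already declared as UTF-8.
    if (strcasecmp($source_charset, 'UTF-8') === 0) {
        return $text;
    }
    // iconv() returns false on failure, e.g. for an unknown charset name.
    $converted = @iconv($source_charset, 'UTF-8', $text);
    return $converted === false ? $text : $converted;
}

// Assumed sample input: "é" is the single byte 0xE9 in ISO-8859-1.
$meta_title_raw = "Caf\xE9 ";
$meta_title = htmlentities(
    trim(to_utf8($meta_title_raw, 'ISO-8859-1')),
    ENT_QUOTES,
    'UTF-8'
);
```

Converting first and escaping afterwards matters here, because htmlentities() with 'UTF-8' expects its input to already be valid UTF-8.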
To get $source_charset I will probably have to use some tricks or several checks in sequence, such as looking at the HTTP response headers and the meta tag. I found a good answer at Detect encoding.
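A rough sketch of such a multi-step check is shown below. The function name detect_source_charset(), its parameters, and the ISO-8859-1 fallback are all assumptions made for illustration; a real page can still declare the wrong charset, so treat the result as a best guess.

```php
<?php
// Hypothetical helper: guess the charset of a fetched page.
// $content_type is the raw Content-Type response header value (may be empty),
// $html is the raw page body, $fallback is an assumed default.
function detect_source_charset(
    string $content_type,
    string $html,
    string $fallback = 'ISO-8859-1'
): string {
    $pattern = 'charset\s*=\s*["\']?([\w.:-]+)';

    // 1. HTTP header, e.g. "text/html; charset=ISO-8859-2".
    if (preg_match('/' . $pattern . '/i', $content_type, $m)) {
        return strtoupper($m[1]);
    }
    // 2. <meta charset="..."> or
    //    <meta http-equiv="Content-Type" content="...; charset=...">.
    if (preg_match('/<meta[^>]+' . $pattern . '/i', $html, $m)) {
        return strtoupper($m[1]);
    }
    // 3. With the /u modifier preg_match() fails on invalid UTF-8,
    //    so a successful match means the bytes are valid UTF-8.
    if (preg_match('//u', $html) === 1) {
        return 'UTF-8';
    }
    // 4. Last resort: the assumed default.
    return $fallback;
}
```

The returned name can then be passed as the first argument to iconv().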
Let me know if there are any improvements to make or any faults in my steps above.
I had the same problem with Romanian characters. Nothing worked until I used
header('Content-Type: text/html; charset=ISO-8859-2');
ISO-8859-2 (Latin-2) is the character set for Central and Eastern European languages. So find the character set your content is actually encoded in and declare it in the header.