file_get_contents() Breaks Up UTF-8 Characters

Anonymous (unverified), submitted 2019-12-03 01:18:02

Question:

$html = file_get_contents('http://example.com/foreign.html'); 

How can I solve this?

UPDATE:

I tried both saving the HTML to a file and outputting it with UTF-8 encoding. Neither works, so it means file_get_contents() is already returning broken HTML.

UPDATE2:

    Test

Answer 1:

I had a similar problem with Polish text.

I tried:

$fileEndEnd = mb_convert_encoding($fileEndEnd, 'UTF-8', mb_detect_encoding($fileEndEnd, 'UTF-8', true)); 

I tried:

$fileEndEnd = utf8_encode ( $fileEndEnd ); 

I tried:

$fileEndEnd = iconv( "UTF-8", "UTF-8", $fileEndEnd ); 

And then -

$fileEndEnd = mb_convert_encoding($fileEndEnd, 'HTML-ENTITIES', "UTF-8"); 

This last one worked perfectly!
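One caveat on the last approach: the 'HTML-ENTITIES' pseudo-encoding in mbstring is deprecated as of PHP 8.2. A minimal sketch of a similar effect using mb_encode_numericentity() is below. The function name to_numeric_entities is made up for illustration, and unlike 'HTML-ENTITIES' it emits only numeric entities (never named ones such as &oacute;), so the output is not byte-for-byte the same.

```php
<?php
// Hypothetical helper (not from the original answer): replace every
// non-ASCII character in a UTF-8 string with a decimal numeric entity.
function to_numeric_entities(string $utf8): string
{
    // Conversion map: [first code point, last code point, offset, mask].
    // 0x80..0x10FFFF covers everything outside ASCII.
    $map = [0x80, 0x10FFFF, 0, 0x1FFFFF];

    return mb_encode_numericentity($utf8, $map, 'UTF-8');
}

// "Zażółć" written with escapes so the example does not depend on
// the encoding this file is saved in.
echo to_numeric_entities("Za\u{017C}\u{00F3}\u{0142}\u{0107}"), "\n";
// prints Za&#380;&#243;&#322;&#263;
```

ASCII characters, including markup such as < and >, pass through untouched, so this can be applied to a whole HTML document.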



Answer 2:

Solution suggested in the comments of the PHP manual entry for file_get_contents:

function file_get_contents_utf8($fn) {
    $content = file_get_contents($fn);
    return mb_convert_encoding($content, 'UTF-8',
        mb_detect_encoding($content, 'UTF-8, ISO-8859-1', true));
}

You might also try your luck with http://php.net/manual/en/function.mb-internal-encoding.php
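A variation on the helper above, as a rough sketch: instead of guessing with mb_detect_encoding(), it checks whether the bytes are already valid UTF-8 with mb_check_encoding() and converts only when they are not. The name ensure_utf8 and the ISO-8859-1 fallback are assumptions for illustration; the fallback has to match whatever the remote server actually sends.

```php
<?php
// Hypothetical helper: return $content as UTF-8.
// $fallback is an assumed source encoding, used only when the input
// is not already valid UTF-8.
function ensure_utf8(string $content, string $fallback = 'ISO-8859-1'): string
{
    if (mb_check_encoding($content, 'UTF-8')) {
        // Already valid UTF-8: converting again would double-encode it.
        return $content;
    }

    return mb_convert_encoding($content, 'UTF-8', $fallback);
}

// "\xE9" is "é" in ISO-8859-1 and is not valid UTF-8 on its own.
echo bin2hex(ensure_utf8("\xE9")), "\n";     // prints c3a9
// "\xC3\xA9" is "é" in UTF-8 and is returned unchanged.
echo bin2hex(ensure_utf8("\xC3\xA9")), "\n"; // prints c3a9
```

This trades detection for a single explicit assumption, which tends to be easier to reason about when you know which legacy encoding the source uses.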



Answer 3:

Alright. I have found out that file_get_contents() is not causing this problem. There is a different reason, which I discuss in another question. Silly me.

See this question: Why Does DOM Change Encoding?



Answer 4:

I think you simply have a double conversion of the character encoding there :D

It may be because you opened an HTML document within an HTML document, so in the end you have something that looks like this:

   Test....... 

The use of mb_detect_encoding therefore may lead you to other issues.
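To make the double-conversion idea concrete, here is a small sketch of what happens when text that is already UTF-8 is converted to UTF-8 a second time as if it were ISO-8859-1, and how that one extra step can be reversed. Both function names are hypothetical, and the choice of ISO-8859-1 as the wrongly assumed encoding is an assumption.

```php
<?php
// Hypothetical: simulate the bug by treating UTF-8 bytes as ISO-8859-1
// and converting them to UTF-8 again.
function double_encode(string $utf8): string
{
    return mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');
}

// Hypothetical: undo exactly one extra ISO-8859-1 -> UTF-8 conversion.
function undo_double_encode(string $broken): string
{
    return mb_convert_encoding($broken, 'ISO-8859-1', 'UTF-8');
}

$original = "\u{00E9}";               // "é", bytes C3 A9 in UTF-8
$broken   = double_encode($original); // each byte became its own character

echo bin2hex($original), "\n";                   // prints c3a9
echo bin2hex($broken), "\n";                     // prints c383c2a9
echo bin2hex(undo_double_encode($broken)), "\n"; // prints c3a9
```

The broken bytes are still valid UTF-8, which is why encoding detection on such a string reports UTF-8 and does not reveal the problem.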



Answer 5:

Try this too:

$url = 'http://www.domain.com/';
$html = file_get_contents($url);

// Convert from UTF-8 to ISO-8859-1, transliterating characters that have no equivalent
$html = iconv('UTF-8', 'ISO-8859-1//TRANSLIT', $html);


Answer 6:

Also, urlencode() did not work because it converts the space character to +. For percent-encoding it must be %20.

This one worked!

$url = rawurlencode($url);
$url = str_replace("%3A", ":", $url);
$url = str_replace("%2F", "/", $url);

$data = file_get_contents($url);
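A slightly different way to get the same effect, sketched under the assumption that only the path part of the URL contains spaces or non-ASCII characters: encode each path segment separately, so the scheme, host and slashes never need to be restored afterwards. The helper name encode_path_segments and the example path are made up.

```php
<?php
// Hypothetical helper: percent-encode every segment of a path while
// leaving the "/" separators intact.
function encode_path_segments(string $path): string
{
    return implode('/', array_map('rawurlencode', explode('/', $path)));
}

// The difference the answer describes:
echo urlencode('a b'), "\n";    // prints a+b
echo rawurlencode('a b'), "\n"; // prints a%20b

// Spaces become %20 and non-ASCII bytes are percent-encoded.
echo encode_path_segments("/files/my file \u{0142}.html"), "\n";
// prints /files/my%20file%20%C5%82.html
```

The encoded path can then be appended to the scheme and host before calling file_get_contents(). Query strings would need separate handling, which this sketch does not attempt.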


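Answer 7:

I am working with 35000 lines of data.

$f = fopen("veri1.txt", "r");
$i = 0;
// Loop until fgets() returns false, so a failed read is never passed to the converter
while (($line = fgets($f)) !== false) {
    $i++;
    echo mb_convert_encoding($line, 'HTML-ENTITIES', "UTF-8");
}
fclose($f);

This code converts my strange characters into normal ones.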


