Question:
$html = file_get_contents('http://example.com/foreign.html');
How can I solve this?
UPDATE:
I tried both saving the HTML to a file and outputting it with UTF-8 encoding. Neither works, which means file_get_contents() is already returning broken HTML.
UPDATE2:
Test
Answer 1:
I had a similar problem with Polish.
I tried:
$fileEndEnd = mb_convert_encoding($fileEndEnd, 'UTF-8', mb_detect_encoding($fileEndEnd, 'UTF-8', true));
I tried:
$fileEndEnd = utf8_encode ( $fileEndEnd );
I tried:
$fileEndEnd = iconv( "UTF-8", "UTF-8", $fileEndEnd );
And then:
$fileEndEnd = mb_convert_encoding($fileEndEnd, 'HTML-ENTITIES', "UTF-8");
This last one worked perfectly!
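A note for newer PHP versions: the 'HTML-ENTITIES' pseudo-encoding in mbstring has been deprecated since PHP 8.2 (as has utf8_encode()). Below is a minimal sketch of one possible alternative, assuming the input is already valid UTF-8 and the mbstring extension is loaded. The helper name to_numeric_entities is made up for illustration; it emits numeric entities only.

```php
<?php
// Hypothetical helper (not from the answer above): replace every non-ASCII
// character with a numeric HTML entity and leave ASCII untouched.
function to_numeric_entities(string $utf8): string
{
    // Convmap format: [first code point, last code point, offset, mask].
    $convmap = [0x80, 0x10FFFF, 0, 0x1FFFFF];
    return mb_encode_numericentity($utf8, $convmap, 'UTF-8');
}

// Polish sample text written with code point escapes so the file encoding does not matter.
echo to_numeric_entities("za\u{17C}\u{F3}\u{142}\u{107}"), "\n"; // za&#380;&#243;&#322;&#263;
```

Because the result is plain ASCII, it displays the same way regardless of which charset the browser assumes for the page.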
Answer 2:
Solution suggested in the comments of the PHP manual entry for file_get_contents
function file_get_contents_utf8($fn) {
    $content = file_get_contents($fn);
    return mb_convert_encoding($content, 'UTF-8',
        mb_detect_encoding($content, 'UTF-8, ISO-8859-1', true));
}
You might also try your luck with http://php.net/manual/en/function.mb-internal-encoding.php
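Since mb_detect_encoding() can only guess, a slightly more defensive variant is to check whether the bytes are already valid UTF-8 and convert only when they are not. This is a rough sketch, not the manual's code: the name ensure_utf8 is hypothetical, and the ISO-8859-1 fallback is an assumption about the source page rather than something that is detected.

```php
<?php
// Hypothetical helper: return $content as UTF-8, converting only when needed.
// $fallback is an assumed source encoding; adjust it for the site being fetched.
function ensure_utf8(string $content, string $fallback = 'ISO-8859-1'): string
{
    if (mb_check_encoding($content, 'UTF-8')) {
        return $content; // already valid UTF-8, leave it untouched
    }
    return mb_convert_encoding($content, 'UTF-8', $fallback);
}

// Usage: "\xE9" is "é" in ISO-8859-1 and is not valid UTF-8 on its own.
echo bin2hex(ensure_utf8("\xE9")), "\n"; // c3a9
```

Skipping the conversion for valid UTF-8 input avoids encoding the same text twice.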
Answer 3:
Alright. I have found out that file_get_contents() is not causing this problem. There's a different reason, which I talk about in another question. Silly me.
See this question: Why Does DOM Change Encoding?
Answer 4:
I think you simply have a double conversion of the character encoding there :D
It may be because you opened an HTML document within an HTML document, so you end up with something that looks like this:
Test.......
The use of mb_detect_encoding may therefore lead you to other issues.
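To illustrate what a double conversion does at the byte level, here is a small sketch (assuming the mbstring extension is loaded). It takes text that is already UTF-8, wrongly treats it as ISO-8859-1, and converts it again. The variable names are only for illustration.

```php
<?php
$utf8 = "\xC3\xA9"; // "é" already encoded as UTF-8 (two bytes)

// Treating those bytes as ISO-8859-1 and converting again encodes each byte separately.
$double = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');

echo bin2hex($utf8), "\n";   // c3a9
echo bin2hex($double), "\n"; // c383c2a9, which a browser renders as "Ã©"

// Reversing the extra step recovers the original bytes.
$fixed = mb_convert_encoding($double, 'ISO-8859-1', 'UTF-8');
echo bin2hex($fixed), "\n";  // c3a9
```

If the output shows pairs like "Ã©" where a single accented letter should be, this kind of extra conversion is a likely cause.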
Answer 5:
Try this too
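$url = 'http://www.domain.com/';
$html = file_get_contents($url);
// iconv() takes (from, to, string): as written, this converts UTF-8 to ISO-8859-1.
// To convert ISO-8859-1 to UTF-8 instead, use iconv('ISO-8859-1', 'UTF-8', $html).
$html = iconv('UTF-8', 'ISO-8859-1//TRANSLIT', $html);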
Answer 6:
Also, urlencode() did not work because it converts the space character to a + character. It must be %20 for percent-encoding.
This one worked!
$url = rawurlencode($url);
$url = str_replace("%3A", ":", $url);
$url = str_replace("%2F", "/", $url);
$data = file_get_contents($url);
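The approach above encodes the whole URL and then restores ":" and "/", so characters such as "?", "&" and "=" in a query string would stay encoded. A possible alternative, sketched here under the assumption that only the path part needs encoding, is to percent-encode each path segment separately. The helper name encode_path and the sample path are hypothetical.

```php
<?php
// Hypothetical helper: percent-encode each path segment but keep the "/" separators.
function encode_path(string $path): string
{
    return implode('/', array_map('rawurlencode', explode('/', $path)));
}

echo urlencode('my file.html'), "\n";         // my+file.html
echo rawurlencode('my file.html'), "\n";      // my%20file.html
echo encode_path('/docs/my file.html'), "\n"; // /docs/my%20file.html
```

The scheme and host can then be prepended unchanged before passing the result to file_get_contents().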
Answer 7:
I am working with 35000 lines of data.
$f = fopen("veri1.txt", "r");
$i = 0;
// Read line by line; fgets() returns false at end of file.
while (($line = fgets($f)) !== false) {
    $i++;
    echo mb_convert_encoding($line, 'HTML-ENTITIES', "UTF-8");
}
fclose($f);
This code converts my strange characters into normal ones.