Get contents of BODY without DOCTYPE, HTML, HEAD and BODY tags

What I am trying to do is include an HTML file within a PHP system (not a problem), but that HTML file also needs to be usable on its own, for various reasons, so I need to know how to get the contents of its BODY without the DOCTYPE, HTML, HEAD and BODY tags.

7 Answers

广开言路

2021-02-12 18:21
You may want to use the PHP tidy extension, which can repair invalid XHTML structures (the kind that make DOMDocument fail to load the markup) and can also extract only the body:
```
$tidy = new tidy();
$htmlBody = $tidy->repairString($html, array(
    'output-xhtml' => true,
    'show-body-only' => true,
), 'utf8');
```
Then load the extracted body into a DOMDocument:
```
$xml = new DOMDocument();
$xml->loadHTML($htmlBody);
```
Then traverse, extract, or move the XML nodes around as needed, and save the result:
```
$output = $xml->saveXML();
```
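If the tidy extension is not available, one alternative is to let libxml collect parse errors internally instead of raising PHP warnings. The following is only a minimal sketch of that idea; the helper name `loadLenient` and the sample markup are made up for illustration.

```php
<?php
// Hypothetical helper: load possibly malformed HTML without emitting warnings.
function loadLenient(string $html): DOMDocument
{
    // Collect libxml parse errors internally instead of raising PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($html);

    // Discard the collected errors and restore the previous setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $doc;
}

// Usage with illustrative markup that leaves <p> and <b> unclosed.
$doc = loadLenient('<html><body><p>Unclosed <b>bold</body></html>');
$body = $doc->getElementsByTagName('body')->item(0);
echo $body->textContent, "\n";
```

The parser repairs the structure on a best-effort basis, so the resulting tree may differ from what tidy would produce.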

谎友^

2021-02-12 18:28

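A solution with only one instance of DOMDocument and without loops:

```
$d = new DOMDocument();
$d->loadHTML(file_get_contents('/path/to/my.html'));
$body = $d->getElementsByTagName('body')->item(0);
// Passing a node to saveHTML() requires PHP >= 5.3.6; the output
// includes the <body> tags themselves.
echo $d->saveHTML($body);
```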
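Because `saveHTML($body)` serializes the `<body>` element together with its contents, the wrapper tags still have to be removed if only the inner markup is wanted. Here is a minimal sketch of one way to do that while keeping the single serialization call. The sample markup is made up, and the pattern assumes the serialized `<body>` start tag contains no literal `>` inside an attribute value.

```php
<?php
// Illustrative input; in practice this would come from the HTML file.
$source = '<!DOCTYPE html><html><head><title>Demo</title></head>'
        . '<body class="page"><p>Hello</p></body></html>';

$d = new DOMDocument();
$d->loadHTML($source);
$body = $d->getElementsByTagName('body')->item(0);

// saveHTML($body) yields "<body ...>...</body>"; trim only the outer tags.
$serialized = $d->saveHTML($body);
$inner = preg_replace('~^\s*<body\b[^>]*>|</body>\s*$~i', '', $serialized);

echo trim($inner), "\n";
```

This stays loop-free, at the cost of a small regular expression applied to output that the serializer has already normalized.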

不要未来只要你来

2021-02-12 18:30

Use a DOM parser. This is untested, but it ought to do what you want:

```
$domDoc = new DOMDocument();
$domDoc->loadHTMLFile('/path/to/file');
$body = $domDoc->getElementsByTagName('body')->item(0);
foreach ($body->childNodes as $child) {
    // Note: C14N() canonicalizes the representation of the node,
    // but that's not necessarily a bad thing.
    echo $child->C14N();
}
```

If you want to avoid canonicalization, you can use this version (thanks to @Jared Farrish)

时光说笑

2021-02-12 18:31

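This may be a solution (client-side JavaScript rather than PHP). I tried it and it works fine.

```
function parseHTML(string) {
    var parser = new DOMParser(),
        result = parser.parseFromString(string, "text/html");
    // result.body is the parsed <body> element; innerHTML is its contents
    // without the surrounding tags, even when the input has a DOCTYPE.
    return result.body.innerHTML;
}
```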

栀梦

2021-02-12 18:32

```
$site = file_get_contents("http://www.google.com/");

// "i" makes the match case-insensitive; "s" lets "." match newlines.
if (preg_match("/<body[^>]*>(.*?)<\/body>/is", $site, $matches)) {
    echo $matches[1];
}
```
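The regular-expression approach can be wrapped in a function that makes the "no `<body>` found" case explicit. This is only a sketch: the name `bodyViaRegex` and the sample strings are hypothetical, and a regular expression will not cope with every document (for example, a literal `</body>` inside a script or comment).

```php
<?php
// Hypothetical helper: return the inner HTML of <body>, or null if none is found.
function bodyViaRegex(string $html): ?string
{
    // "i" = case-insensitive, "s" = let "." match newlines.
    if (preg_match('~<body\b[^>]*>(.*?)</body>~is', $html, $matches) === 1) {
        return $matches[1];
    }
    return null;
}

// Illustrative inputs: one full page, one fragment without a body element.
$page = "<html><head><title>T</title></head>\n<BODY class=\"x\">\n<p>Hi</p>\n</BODY></html>";
var_dump(bodyViaRegex($page));
var_dump(bodyViaRegex('<p>no body</p>'));
```

Returning `null` lets the caller decide whether to fall back to the whole string or treat the input as an error.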

无人及你

2021-02-12 18:39

Use DOMDocument to keep what you need rather than strip what you don't need (PHP >= 5.3.6):

```
$d = new DOMDocument;
$d->loadHTMLFile($fileLocation);
$body = $d->getElementsByTagName('body')->item(0);
// Emulate innerHTML on $body by enumerating its child nodes
// and saving them individually.
foreach ($body->childNodes as $childNode) {
  echo $d->saveHTML($childNode);
}
```
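For the use case in the question (embedding a standalone HTML file in a PHP page), the loop above could be packaged as a function that returns a string instead of echoing. This is a sketch under assumptions: the function name `bodyInnerHtml` is invented here, and the input is taken as a string rather than a file path so the example runs on its own.

```php
<?php
// Hypothetical helper: return everything inside <body> as one string.
function bodyInnerHtml(string $html): string
{
    $d = new DOMDocument();
    $d->loadHTML($html);

    $body = $d->getElementsByTagName('body')->item(0);
    if ($body === null) {
        // No body element was found or created by the parser.
        return '';
    }

    // Serialize each child of <body> and concatenate the pieces.
    $inner = '';
    foreach ($body->childNodes as $childNode) {
        $inner .= $d->saveHTML($childNode);
    }
    return $inner;
}

// Illustrative standalone document.
$standalone = '<!DOCTYPE html><html><head><title>T</title></head>'
            . '<body><h1>Title</h1><p>Text</p></body></html>';
echo bodyInnerHtml($standalone), "\n";
```

In the including page, the same call could be fed from `file_get_contents()` on the HTML file, so the file itself stays a complete, valid document.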
