Get contents of BODY without DOCTYPE, HTML, HEAD and BODY tags

情话喂你 2021-02-12 17:45

What I am trying to do is include an HTML file within a PHP system (not a problem) but that HTML file also needs to be usable on its own, for various reasons, so I need to know

7 answers
  •  广开言路
    2021-02-12 18:21

    You may want to use the PHP tidy extension, which can repair invalid XHTML (markup that would otherwise make the DOMDocument load fail) and can also extract only the body:

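    // Repair the markup and keep only what is inside <body>.
    $tidy = new tidy();
    $htmlBody = $tidy->repairString($html, array(
        'output-xhtml' => true,   // emit well-formed XHTML
        'show-body-only' => true, // omit the DOCTYPE, <html>, <head> and <body> wrappers
    ), 'utf8');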
    

    Then load the extracted body into a DOMDocument:

    $xml = new DOMDocument();
    $xml->loadHTML($htmlBody);
    

    Then traverse, extract, or move the nodes around as needed, and save the result:

    $output = $xml->saveXML();
    
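One caveat with the save step: loadHTML() wraps a fragment in html and body elements again, so calling saveXML() with no argument serializes those wrappers (and a DOCTYPE) along with the content. A minimal sketch of one way around that, using only DOMDocument, is to serialize each child of the body node individually. The helper name bodyInnerHtml and the sample $page string are hypothetical, chosen only for illustration, and the sketch assumes the input is close enough to valid HTML for DOMDocument to parse without tidy.

```php
<?php
// Hypothetical helper (not part of the original answer): returns the markup
// found inside <body>, without the DOCTYPE, <html>, <head> or <body> tags.
function bodyInnerHtml(string $html): string
{
    $doc = new DOMDocument();

    // Collect parser warnings internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }

    $out = '';
    foreach ($body->childNodes as $child) {
        // Passing a node to saveHTML() serializes only that node,
        // not the whole document.
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

// Sample input, assumed for illustration only.
$page = '<!DOCTYPE html><html><head><title>Demo</title></head>'
      . '<body><h1>Hello</h1><p>World</p></body></html>';

echo bodyInnerHtml($page), PHP_EOL;
```

The same loop should work on the DOMDocument built from the tidy output above, so the two approaches can be combined when the input markup is unreliable.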
