DOMDocument appendXML with special characters

庸人自扰 2020-12-07 05:25

I am retrieving some HTML strings from my database and I would like to parse these strings into my DOMDocument. The problem is that the DOMDocument gives warnings at special characters…

  •  有刺的猬
    2020-12-07 05:56

    That's a tricky one because it's actually multiple issues in one.

    Like Tomalak points out, there is no &nbsp; in XML. So you did the right thing specifying a DOMImplementation, because in XHTML there is &nbsp;. But for DOM to know that the document is XHTML, you have to load and validate against the DTD. The DTD is located at

    http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd

    but because there are millions of requests to that page daily, the W3C decided to block access to it unless a User-Agent is sent in the request. To supply a User-Agent, you have to create a custom stream context.

    In code:

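    // make sure DOM passes a User Agent when it fetches the DTD
    libxml_set_streams_context(stream_context_create(array(
        'http' => array(
            'user_agent' => 'PHP libxml agent',
        ),
    )));

    // specify the implementation
    $imp = new DOMImplementation;
    // create a DTD (here: for XHTML)
    $dtd = $imp->createDocumentType(
        'html',
        '-//W3C//DTD XHTML 1.0 Transitional//EN',
        'http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd'
    );
    // then create a DOMDocument with the configured DTD
    $dom = $imp->createDocument(NULL, "html", $dtd);
    $dom->encoding = 'UTF-8';
    // load and validate against the DTD so that &nbsp; is a known entity
    $dom->validate();

    // append markup containing the entity (element names are illustrative)
    $fragment = $dom->createDocumentFragment();
    $fragment->appendXML('
        <head><title>XHTML test</title></head>
        <body><p>Some text with a &nbsp; entity</p></body>
    ');
    $dom->documentElement->appendChild($fragment);
    $dom->formatOutput = TRUE;
    echo $dom->saveXml();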

    This still takes some time to complete (don't ask me why), but in the end you'll get (reformatted for SO):

    <head>
      <title>XHTML test</title>
    </head>
    <body>
      <p>Some text with a &nbsp; entity</p>
    </body>

    Also see DOMDocument::validate() problem
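
To see the underlying problem in isolation, here is a minimal sketch that needs no DTD and no network access. It assumes a plain DOMDocument with a placeholder `root` element (both hypothetical, not from the question). It shows that appendXML() rejects the named entity when no DTD has been loaded, while the numeric character reference `&#160;` for the same character is accepted. Using the numeric reference is a different technique from the DTD approach above, so treat it as an alternative to weigh, not as part of the original answer.

```php
<?php
// Collect libxml errors instead of letting them surface as PHP warnings.
libxml_use_internal_errors(true);

// Plain XML document; 'root' is a placeholder element for this sketch.
$dom  = new DOMDocument('1.0', 'UTF-8');
$root = $dom->appendChild($dom->createElement('root'));

// Named entity: no DTD is loaded, so &nbsp; is not declared.
$named   = $dom->createDocumentFragment();
$namedOk = $named->appendXML('<p>Some text with a &nbsp; entity</p>');

// Print whatever libxml reported for the failed parse, then reset the buffer.
foreach (libxml_get_errors() as $error) {
    echo trim($error->message), PHP_EOL;
}
libxml_clear_errors();

// Numeric character reference for the same character (U+00A0).
$numeric   = $dom->createDocumentFragment();
$numericOk = $numeric->appendXML('<p>Some text with a &#160; entity</p>');
if ($numericOk) {
    $root->appendChild($numeric);
}

var_dump($namedOk);   // should be false: the entity is undefined
var_dump($numericOk); // should be true: numeric references need no DTD
echo $dom->saveXML();
```

The trade-off is that the numeric reference is resolved to the actual no-break space character, so the serialized output no longer contains the literal text `&nbsp;`. If the entity reference itself must survive, the DTD-based approach is the one to use.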
