DOMDocument appendXML with special characters

庸人自扰 2020-12-07 05:25

I am retrieving some HTML strings from my database and I would like to parse these strings into my DOMDocument. The problem is that the DOMDocument gives warnings at specia

5 answers
  •  有刺的猬
    2020-12-07 05:56

    That's a tricky one because it's actually multiple issues in one.

    As Tomalak points out, there is no &nbsp; entity in XML. So you did the right thing by specifying a DOMImplementation, because XHTML does define &nbsp;. But for DOM to know that the document is XHTML, you have to load and validate against the DTD. The DTD is located at

    http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd
    

    but because there are millions of requests to that page daily, the W3C decided to block access to it unless a User-Agent is sent with the request. To supply a User-Agent, you have to create a custom stream context.

    In code:

    // make sure DOM passes a User Agent when it fetches the DTD
    libxml_set_streams_context(
        stream_context_create(
            array(
                'http' => array(
                    'user_agent' => 'PHP libxml agent',
                )
            )
        )
    );
    
    // specify the implementation
    $imp = new DOMImplementation;
    
    // create a DTD (here: for XHTML)
    $dtd = $imp->createDocumentType(
        'html',
        '-//W3C//DTD XHTML 1.0 Transitional//EN',
        'http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd'
    );
    
    // then create a DOMDocument with the configured DTD
    $dom = $imp->createDocument(NULL, "html", $dtd);
    $dom->encoding = 'UTF-8';
    $dom->validate();
    
    $fragment = $dom->createDocumentFragment();
    $fragment->appendXML('
        <head><title>XHTML test</title></head>
        <body><p>Some text with a &nbsp; entity</p></body>
    ');
    $dom->documentElement->appendChild($fragment);
    $dom->formatOutput = TRUE;
    echo $dom->saveXml();

    This still takes some time to complete (don't ask me why), but in the end you'll get (reformatted for SO):

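    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    <html>
        <head>
            <title>XHTML test</title>
        </head>
        <body>
            <p>Some text with a &nbsp; entity</p>
        </body>
    </html>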
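To illustrate the point that plain XML has no &nbsp;, here is a minimal sketch, not part of the original answer, that appends the same fragment to a DOMDocument without any DTD: once with the named entity and once with the numeric character reference &#160; (the same no-break space character). The variable names and the bare html root element are assumptions made for the example. It only relies on appendXML() returning false on failure and on libxml_use_internal_errors() collecting parser messages instead of emitting warnings.

```php
<?php
// Collect libxml parser errors instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// A plain document with no DTD attached (hypothetical minimal setup).
$dom  = new DOMDocument('1.0', 'UTF-8');
$root = $dom->appendChild($dom->createElement('html'));

// Named entity: &nbsp; is not declared anywhere, so the chunk is not well-formed XML.
$named   = $dom->createDocumentFragment();
$namedOk = $named->appendXML('<p>Some text with a &nbsp; entity</p>');

// Numeric character reference for U+00A0: valid in any XML document, no DTD needed.
$numeric   = $dom->createDocumentFragment();
$numericOk = $numeric->appendXML('<p>Some text with a &#160; entity</p>');
if ($numericOk) {
    $root->appendChild($numeric);
}

// Print whatever the parser reported for the failed fragment.
foreach (libxml_get_errors() as $error) {
    echo trim($error->message), PHP_EOL;
}
libxml_clear_errors();

var_dump($namedOk, $numericOk);
```

If the HTML strings only use a handful of named entities, replacing them with numeric references before calling appendXML() may avoid fetching the DTD at all. Whether that is acceptable depends on how varied the stored markup is.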

    Also see DOMDocument::validate() problem
