How to solve Ampersand (&) conversion issue in XML?

北恋 2021-01-21 06:07

I am creating an XML file using XMLDocument, but when an XML node gets '&' as data, it is converted to "&amp;". I need the actual value, which is '&'. Can anyone...

4 Answers
  • 2021-01-21 06:18

    A bare & is illegal in an XML document (outside of CDATA sections; see @rsp's answer), so this is not possible. If there is a verbatim ampersand in your node data, it has to be encoded as &amp;.

    This is not a problem in practice, because any XML reader will decode &amp; back to a literal & when parsing the XML file.
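
    To see the round trip, here is a minimal sketch in Python's standard library (not the asker's XMLDocument API; the element name "name" and the sample text are made up for illustration). The serializer writes the escaped form, and the parser hands back the raw character:

```python
import xml.etree.ElementTree as ET

# Build a node whose text contains a raw ampersand.
node = ET.Element("name")  # "name" is a hypothetical element name
node.text = "D&C"

# Serializing escapes the ampersand, as well-formed XML requires.
serialized = ET.tostring(node, encoding="unicode")
print(serialized)  # <name>D&amp;C</name>

# Parsing the serialized form gives back the original text.
parsed = ET.fromstring(serialized)
print(parsed.text)  # D&C
```

    So the &amp; only exists in the file on disk; code that reads the node value sees a plain &.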

  • 2021-01-21 06:21

    I guess you can use the line below. An option like "repair-full" should make the parser accept a bare & and treat it as a literal &.

    let $InputXML := xdmp:unquote($inputSearchDetails, "", ("format-xml", "repair-full"))

  • 2021-01-21 06:24

    If it is really necessary to have unescaped ampersands in your XML representation, you can use CDATA sections at the expense of the <![CDATA[ start and ]]> end around your character data.
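
    As a rough illustration of what that looks like, here is a sketch using Python's xml.dom.minidom (an assumption for demonstration only; the element name "name" is made up). The ampersand stays unescaped in the serialized text, wrapped in the CDATA markers:

```python
from xml.dom import minidom

# Build <name><![CDATA[D&C]]></name>; "name" is a hypothetical element name.
doc = minidom.Document()
root = doc.createElement("name")
doc.appendChild(root)
root.appendChild(doc.createCDATASection("D&C"))

# The CDATA section is written verbatim, so the ampersand is not escaped.
cdata_xml = root.toxml()
print(cdata_xml)  # <name><![CDATA[D&C]]></name>

# A parser still reports the same character data to the application.
reparsed = minidom.parseString(cdata_xml)
print(reparsed.documentElement.firstChild.data)  # D&C
```

    Note that the data cannot itself contain the sequence ]]>, since that would end the section early.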

  • 2021-01-21 06:24

    I once had this situation where I wanted to preserve raw ampersands in XML. Though your parser may not be the same as mine (I use MarkLogic), the following still applies to your situation with any XML parser:

    Issues with the ampersand character

        The ampersand character can be tricky to construct in an XQuery string, as it is an escape character to the XQuery parser. The ways to construct the ampersand character in XQuery are:
    
        1. Use the XML entity syntax (for example, &amp;).
        2. Use a CDATA element (<![CDATA[element content here]]>), which tells the XQuery parser to read the content as character data.
        3. Use the repair option on xdmp:document-load, xdmp:document-get, or xdmp:unquote.

        Source: https://help.marklogic.com/knowledgebase/article/View/55/0/xquery-ampersand-in-string
    

    Obviously, the first option listed above, which is to escape ampersands, was not the direction we wanted to go. We wanted raw ampersands, not the escaped entity.
    The second option seemed like a good idea at first, and I played around with CDATA elements for a very long time. CDATA stands for "character data": everything inside it is treated as character data, not real XML. After playing around with some examples, I discovered that you could potentially make CDATA return ampersands, but CDATA elements are VERY unfriendly. For instance, creating dynamic CDATA elements is near impossible; you cannot simply wrap an XML structure inside a CDATA. CDATA is meant to have static, predefined characters inside of it. If there is an effective way of using CDATA, I was not able to find it.
    xdmp:quote and xdmp:unquote do the trick that we need, though not in the way that we would expect them to. For example:

    let $xml := <rootNode title="test"><firstLevel type="crazy"><secondLevel reason="testing">D&amp;C</secondLevel><secondLevel owner="clint">D&amp;C</secondLevel></firstLevel></rootNode>
    return xdmp:quote($xml//secondLevel[1])
    (: Returns <secondLevel reason="testing">D&amp;C</secondLevel> :)
    

    But

    let $xml := <rootNode title="test"><firstLevel type="crazy"><secondLevel reason="testing">D&amp;C</secondLevel><secondLevel owner="clint">D&amp;C</secondLevel></firstLevel></rootNode>
    return xdmp:quote($xml//secondLevel[1]/node())
    (: Returns D&C - an unescaped ampersand! :)
    

    The second example gives us the unescaped ampersand, but only because the object we pass to xdmp:quote is a text node, not an element. In the first example, where we quote the element, we get back the text version of the XML, but with D&amp;C, the escaped ampersand. Thus, for xdmp:quote to give us a string with raw ampersands, the content containing the ampersand must be quoted as stand-alone text.
    From here, there are probably a few different directions we could go, and my idea is surely not the most elegant or efficient. I decided to write a recursive function that rebuilds the XML as a string, calling xdmp:quote only on pure text nodes so that their ampersands stay unescaped.

    declare function local:stringify($xml)
    {
      if (xdmp:node-kind($xml) eq "text") then
        xdmp:quote($xml, <options xmlns="xdmp:quote">
                      <method>text</method>
                    </options>)
      else if (xdmp:node-kind($xml) eq "element") then
          fn:string-join(
            (fn:concat("<", fn:local-name($xml)),
            for $attr in $xml/@*
              return fn:concat(' ', fn:local-name($attr), '="', $attr, '"'),
            ">",
            for $node in $xml/node()
              return local:stringify($node),
            fn:concat("</", fn:local-name($xml), ">")
          ), "")
      else ()
    };
    
    let $xml := <rootNode title="test"><firstLevel type="crazy"><secondLevel reason="testing">D&amp;C</secondLevel><secondLevel owner="clint">D&amp;C</secondLevel></firstLevel></rootNode>
    
    
    return local:stringify($xml)
    (: Returns <rootNode title="test"><firstLevel type="crazy"><secondLevel reason="testing">D&C</secondLevel><secondLevel owner="clint">D&C</secondLevel></firstLevel></rootNode> :)
    

    So while this solution does not allow a raw ampersand to exist in XML that is passed around in our application, it does let us pass around this packaged XML as long as it is treated as text.
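
    To make that caveat concrete, here is a small check in Python's standard library (used only as a stand-in for "any XML parser"; it is not part of the MarkLogic code above). It feeds a fragment of the stringified output back into a parser:

```python
import xml.etree.ElementTree as ET

# A fragment of the stringified output, with a raw ampersand in the text.
raw = '<secondLevel reason="testing">D&C</secondLevel>'

# A conforming parser rejects it, because a bare & is not well-formed XML.
try:
    ET.fromstring(raw)
    well_formed = True
except ET.ParseError:
    well_formed = False
print(well_formed)  # False

# Escaping the ampersand again makes it parseable, and the parsed text is D&C.
# (A blanket replace is only safe here because the string has no other entities.)
escaped = raw.replace("&", "&amp;")
text = ET.fromstring(escaped).text
print(text)  # D&C
```

    In other words, the string is fine for display or for a consumer that expects plain text, but it has to be re-escaped before anything parses it as XML.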
