How to solve Ampersand (&) conversion issue in XML?

北恋 2021-01-21 06:07

I am creating an XML file using XMLDocument, but when an XML node gets '&' as data, it is converted to "&amp;". I need the actual value, which is '&'. Can anyone...

4 answers
  •  时光说笑
    2021-01-21 06:24

    I once had this situation where I wanted to preserve raw ampersands in XML. Though your parser may not be the same as mine (I use MarkLogic), the following still applies to your situation with any XML parser:

    Issues with the ampersand character

        The ampersand character can be tricky to construct in an XQuery string, as it is an escape character to the XQuery parser. The ways to construct the ampersand character in XQuery are:
    
        Use the XML entity syntax (for example, &amp;).
        Use a CDATA element (<![CDATA[...]]>), which tells the XQuery parser to read the content as character data.
        Use the repair option on xdmp:document-load, xdmp:document-get, or xdmp:unquote.
        https://help.marklogic.com/knowledgebase/article/View/55/0/xquery-ampersand-in-string
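
The first two options in the list above are not specific to XQuery. As a minimal sketch outside MarkLogic, assuming Python's standard `xml.etree.ElementTree` parser and an element name borrowed from the examples further down, both the entity form and the CDATA form parse to the same raw `&` character, while a bare ampersand is rejected as not well-formed:

```python
import xml.etree.ElementTree as ET

# Entity syntax: the parser decodes &amp; to a single '&' character.
from_entity = ET.fromstring("<secondLevel>D&amp;C</secondLevel>")

# CDATA section: the content is read as character data, so no escaping is needed.
from_cdata = ET.fromstring("<secondLevel><![CDATA[D&C]]></secondLevel>")

print(from_entity.text)  # D&C
print(from_cdata.text)   # D&C

# A bare ampersand is not well-formed XML, so the parser refuses it.
try:
    ET.fromstring("<secondLevel>D&C</secondLevel>")
    bare_accepted = True
except ET.ParseError:
    bare_accepted = False

print(bare_accepted)  # False
```

In other words, the escaped form only exists in the serialized document; the value a program reads back from the node is the plain `&`.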
    

    Obviously, the first option listed above, which is to escape ampersands, was not the direction we wanted to go. We wanted raw ampersands, not the escaped entity.
    The second option seemed like a good idea at first, and I played around with CDATA elements for a very long time. Everything inside a CDATA section is treated as character data, not as real XML. After trying some examples, I discovered that you could potentially make CDATA return ampersands, but CDATA elements are VERY unfriendly. For instance, creating CDATA elements dynamically is nearly impossible: you cannot simply wrap an XML structure inside a CDATA section. CDATA is meant to hold static, predefined characters. If there is an effective way of using CDATA, I was not able to find it.
    xdmp:quote and xdmp:unquote do the trick that we need, though not in the way we would expect them to. For example:

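    (: markup reconstructed; the <firstLevel> wrapper name is assumed :)
    let $xml := <firstLevel><secondLevel>D&amp;C</secondLevel><secondLevel>D&amp;C</secondLevel></firstLevel>
    return xdmp:quote($xml//secondLevel[1])
    (: Returns <secondLevel>D&amp;C</secondLevel> :)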
    

    But

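    (: markup reconstructed; the <firstLevel> wrapper name is assumed :)
    let $xml := <firstLevel><secondLevel>D&amp;C</secondLevel><secondLevel>D&amp;C</secondLevel></firstLevel>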
    return xdmp:quote($xml//secondLevel[1]/node())
    (: Returns D&C - an unescaped ampersand! :)
    

    The second example gives us the unescaped ampersand, but only because the object we pass to xdmp:quote is a text node, not an element. In the first example, where we quote the element, we get back the text version of the XML with D&amp;C, an escaped ampersand. Thus, in order for xdmp:quote to give us a string with raw ampersands, the object containing the ampersand must be stand-alone text.
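
The same distinction between quoting an element and quoting its text shows up in other XML libraries. A rough analogue, assuming Python's standard `xml.etree.ElementTree` in place of `xdmp:quote` and a hypothetical `<firstLevel>` wrapper around the two `secondLevel` elements:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<firstLevel><secondLevel>D&amp;C</secondLevel>"
    "<secondLevel>D&amp;C</secondLevel></firstLevel>"
)
first = root.find("secondLevel")

# Serializing the element re-escapes the ampersand so the output stays well-formed.
as_markup = ET.tostring(first, encoding="unicode")

# The text content on its own is an ordinary string holding the raw character.
as_text = first.text

print(as_markup)  # <secondLevel>D&amp;C</secondLevel>
print(as_text)    # D&C
```

The escaping belongs to the serialized markup, not to the data: as soon as only the text is taken, the ampersand comes back unescaped.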
    From here, there are probably a few different directions we could go, and my idea is surely not the most elegant or efficient. But I decided to make a recursive function, parsing all the XML as text, and allowing an xdmp:quote of pure text for ampersands.

    declare function local:stringify($xml)
    {
      if (xdmp:node-kind($xml) eq "text") then
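        (: options element reconstructed; assumes the text output method :)
        xdmp:quote($xml, <options xmlns="xdmp:quote">
                      <method>text</method>
                    </options>)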
      else if (xdmp:node-kind($xml) eq "element") then
          fn:string-join(
            (fn:concat("<", fn:local-name($xml)),
            for $attr in $xml/@*
              return fn:concat(' ', fn:local-name($attr), '="', $attr, '"'),
            ">",
            for $node in $xml/node()
              return local:stringify($node),
            fn:concat("</", fn:local-name($xml), ">")
          ), "")
      else ()
    };
    
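    (: markup reconstructed; the <firstLevel> wrapper name is assumed :)
    let $xml := <firstLevel><secondLevel>D&amp;C</secondLevel><secondLevel>D&amp;C</secondLevel></firstLevel>

    return local:stringify($xml)
    (: Returns <firstLevel><secondLevel>D&C</secondLevel><secondLevel>D&C</secondLevel></firstLevel> :)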
    

    So while this solution does not allow an ampersand to exist in XML that is passed around in our application, it does allow this packaged XML that is being treated as text to be passed around.
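
To make that trade-off concrete outside MarkLogic, here is a sketch of the same recursive idea, assuming Python's standard `xml.etree.ElementTree`, a hypothetical `stringify` helper, a hypothetical `<firstLevel>` wrapper, and input without namespaces. It rebuilds the markup by hand with raw ampersands, and then shows that the result can no longer be parsed as XML:

```python
import xml.etree.ElementTree as ET


def stringify(elem):
    """Rebuild the markup by hand, leaving text and attribute values unescaped."""
    attrs = "".join(f' {name}="{value}"' for name, value in elem.attrib.items())
    parts = [f"<{elem.tag}{attrs}>", elem.text or ""]
    for child in elem:
        parts.append(stringify(child))
        parts.append(child.tail or "")  # text that follows the child element
    parts.append(f"</{elem.tag}>")
    return "".join(parts)


root = ET.fromstring(
    "<firstLevel><secondLevel>D&amp;C</secondLevel>"
    "<secondLevel>D&amp;C</secondLevel></firstLevel>"
)
packaged = stringify(root)
print(packaged)

# The packaged string contains raw ampersands, so it is text, not XML.
try:
    ET.fromstring(packaged)
    reparsed = True
except ET.ParseError:
    reparsed = False

print(reparsed)  # False
```

Any consumer that expects well-formed XML will reject the packaged string, which is why it has to be handled as plain text from that point on.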
