How should I load an XML file that has comments and spaces in it? Using XMLGET on the root element, I'm not able to get the child elements

你离开我真会死。 submitted on 2019-12-25 00:34:33

Question


(Submitting on behalf of a Snowflake User)


Using:

<clinical_study>
 <!-- This xml conforms to an XML Schema at:
  https://clinicaltrials.gov/ct2/html/images/info/public.xsd -->
 <required_header>
  <download_date>ClinicalTrials.gov processed this data on September 13, 2019</download_date>
  <link_text>Link to the current ClinicalTrials.gov record.</link_text>
  <url>https://clinicaltrials.gov/show/NCT00010010</url>
 </required_header>
 <id_info>
  <org_study_id>CDR0000068431</org_study_id>
  <secondary_id>NYU-0004</secondary_id>
  <secondary_id>P-UPJOHN-NYU-0004</secondary_id>
  <secondary_id>NCI-G00-1906</seco

I'm getting NULL instead of the root element's contents. I've read "How to Easily Load and Query XML Data with Snowflake Part 2" from Snowflake's documentation, and am using:

SELECT XMLGET(src_xml, 'clinical_study'):"$",
*
FROM STG_XML
;

...but it returns NULL when I try to get the contents of the root element with the SQL above.


Any ideas, recommendations, and/or workarounds?


Answer 1:


As Mike Walton has stated, the XML is incomplete (which prevents others from readily reproducing the NULLs that the OP is asking about). Once the open XML elements are closed, the cause of the NULL is that "clinical_study" is the root node, and XMLGET retrieves elements within the root node, not the root node itself. To return the contents of the root node, you can use the expression:

src_xml:"$" AS clinical_study_contents

Here is a simple test harness to demonstrate this, as well as a valid use of XMLGET (to extract the contents of the "id_info" element):

WITH STG_XML AS (
  SELECT PARSE_XML($1) AS src_xml
    FROM VALUES
           ($$
<clinical_study>
 <!-- This xml conforms to an XML Schema at:
  https://clinicaltrials.gov/ct2/html/images/info/public.xsd -->
 <required_header>
  <download_date>ClinicalTrials.gov processed this data on September 13, 2019</download_date>
  <link_text>Link to the current ClinicalTrials.gov record.</link_text>
  <url>https://clinicaltrials.gov/show/NCT00010010</url>
 </required_header>
 <id_info>
  <org_study_id>CDR0000068431</org_study_id>
  <secondary_id>NYU-0004</secondary_id>
  <secondary_id>P-UPJOHN-NYU-0004</secondary_id>
  <secondary_id>NCI-G00-1906</secondary_id>
 </id_info>
</clinical_study>
$$)
)
SELECT src_xml:"$" AS clinical_study_contents
      ,XMLGET(src_xml, 'id_info') as id_info_element
      ,*
  FROM STG_XML
;
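
To go one level further and pull individual values out of "id_info", XMLGET calls can be nested, and the optional third argument (a 0-based instance number) selects among repeated tags such as "secondary_id". The following is a minimal sketch rather than part of the original answer: the shortened XML and the column aliases are illustrative assumptions, and it has not been run against a live account.

```sql
-- Sketch only: trimmed sample data and alias names are assumptions.
WITH STG_XML AS (
  SELECT PARSE_XML($1) AS src_xml
    FROM VALUES
           ($$
<clinical_study>
 <required_header>
  <url>https://clinicaltrials.gov/show/NCT00010010</url>
 </required_header>
 <id_info>
  <org_study_id>CDR0000068431</org_study_id>
  <secondary_id>NYU-0004</secondary_id>
  <secondary_id>P-UPJOHN-NYU-0004</secondary_id>
 </id_info>
</clinical_study>
$$)
)
SELECT -- Inner XMLGET finds <id_info> under the root; outer XMLGET finds its child.
       XMLGET(XMLGET(src_xml, 'id_info'), 'org_study_id'):"$"::STRING AS org_study_id
       -- Third argument is the 0-based instance of a repeated tag.
      ,XMLGET(XMLGET(src_xml, 'id_info'), 'secondary_id', 0):"$"::STRING AS first_secondary_id
      ,XMLGET(XMLGET(src_xml, 'id_info'), 'secondary_id', 1):"$"::STRING AS second_secondary_id
  FROM STG_XML
;
```

Each XMLGET returns an element object, so :"$" is still needed at the end to get the element's contents before casting.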



Answer 2:


Here is a good blog post on the topic:

https://community.snowflake.com/s/article/Querying-Nested-XML-in-Snowflake

Also, please find below a way to query nested XML elements.

    Sample XML:

    <?xml version="1.0"?>
    <comtec version="2008">
        <customer_transport_order>
            <id>2880ORO</id>
            <order_number>99833104701</order_number>
            <priority>0</priority>
            <order_date>2019-03-22</order_date>
            <order_kind>
                <code>VMI</code>
                <name>VMI</name>
            </order_kind>
            <operational>true</operational>
            <order_status>
                <code>cancel</code>
                <name>cancel</name>
                <status_kind>cancel</status_kind>
            </order_status>
            <contact>
                <id>CEN143096</id>
                <code>CEN127431</code>
                <name>SOUTHERN UNITED ENTERPRISES</name>
            </contact>
        </customer_transport_order>
    </comtec>

    Sample Query:


        select
               XMLGET( cust.value, 'order_number' ):"$"::integer as cust_order,
               XMLGET( cust.value, 'order_date' ):"$"::string as cust_date,
               XMLGET( orderkind.value, 'code' ):"$"::string as order_kind,
               XMLGET( contactval.value, 'id' ):"$"::string as contactval,
               XMLGET( contactval.value, 'code' ):"$"::string as contactcode,
               XMLGET( contactval.value, 'name' ):"$"::string as contactname
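        from
            dept_emp_addr
            -- one row per child of the root element
            , lateral FLATTEN(dept_emp_addr.xmldata:"$") cust
            -- children of each order, flattened twice so that <order_kind>
            -- and <contact> can be filtered independently
            , lateral FLATTEN(cust.value:"$") orderkind
            , lateral FLATTEN(cust.value:"$") contactval
          -- keep only the rows whose element matches the expected tag
          where cust.value like '<customer_transport_order>%'
          AND orderkind.value like '<order_kind>%'
          AND contactval.value like '<contact>%'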
          ORDER BY cust_order;
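
When each document holds a single customer_transport_order and the tag names are known in advance, the same columns can likely be reached with nested XMLGET calls instead of FLATTEN joins and LIKE filters. This is a hedged sketch, not the blog's method: the CTE stands in for the dept_emp_addr table, the XML is a trimmed copy of the sample above, the "orders" CTE name and column aliases are assumptions, and it has not been run against a live account.

```sql
-- Sketch only: the CTEs replace the real table for illustration.
WITH dept_emp_addr AS (
  SELECT PARSE_XML($1) AS xmldata
    FROM VALUES
           ($$
<comtec version="2008">
    <customer_transport_order>
        <id>2880ORO</id>
        <order_number>99833104701</order_number>
        <order_date>2019-03-22</order_date>
        <order_kind>
            <code>VMI</code>
            <name>VMI</name>
        </order_kind>
        <contact>
            <id>CEN143096</id>
            <code>CEN127431</code>
            <name>SOUTHERN UNITED ENTERPRISES</name>
        </contact>
    </customer_transport_order>
</comtec>
$$)
),
orders AS (
  -- Step from the root <comtec> down to the order element once.
  SELECT XMLGET(xmldata, 'customer_transport_order') AS cust
    FROM dept_emp_addr
)
SELECT XMLGET(cust, 'order_number'):"$"::INTEGER AS cust_order
      ,XMLGET(cust, 'order_date'):"$"::STRING AS cust_date
       -- Nested XMLGET walks order -> order_kind -> code.
      ,XMLGET(XMLGET(cust, 'order_kind'), 'code'):"$"::STRING AS order_kind
      ,XMLGET(XMLGET(cust, 'contact'), 'id'):"$"::STRING AS contact_id
      ,XMLGET(XMLGET(cust, 'contact'), 'code'):"$"::STRING AS contact_code
      ,XMLGET(XMLGET(cust, 'contact'), 'name'):"$"::STRING AS contact_name
  FROM orders
;
```

The FLATTEN version above remains the better fit when a document contains many orders, since it produces one row per order without needing an instance number.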




Source: https://stackoverflow.com/questions/58486204/how-should-i-load-xml-file-which-has-comments-and-spaces-in-them-and-then-using
