Question
(Submitting on behalf of a Snowflake User)
Using:
<clinical_study>
<!-- This xml conforms to an XML Schema at:
https://clinicaltrials.gov/ct2/html/images/info/public.xsd -->
<required_header>
<download_date>ClinicalTrials.gov processed this data on September 13, 2019</download_date>
<link_text>Link to the current ClinicalTrials.gov record.</link_text>
<url>https://clinicaltrials.gov/show/NCT00010010</url>
</required_header>
<id_info>
<org_study_id>CDR0000068431</org_study_id>
<secondary_id>NYU-0004</secondary_id>
<secondary_id>P-UPJOHN-NYU-0004</secondary_id>
<secondary_id>NCI-G00-1906</seco
I'm getting NULL instead of the root element's contents. I've read "How to Easily Load and Query XML Data with Snowflake Part 2" from Snowflake's documentation, and am using:
SELECT XMLGET(src_xml, 'clinical_study'):"$",
*
FROM STG_XML
;
...but it returns NULL when I try to get the contents of the root element with the SQL above.
Any ideas, recommendations, and/or workarounds?
Answer 1:
As Mike Walton has stated, the XML is incomplete, which prevents others from readily reproducing the NULLs that the OP is asking about. Once the open XML elements are closed, the reason XMLGET returns NULL is that "clinical_study" is the root node, and XMLGET retrieves elements nested within the node it is given. src_xml is itself the root node, so it has no child named "clinical_study" to find. To return the contents of the root node, use the expression:
src_xml:"$" AS clinical_study_contents
Here is a simple test harness to demonstrate this, as well as a valid use of XMLGET (to extract the contents of the "id_info" element):
WITH STG_XML AS (
SELECT PARSE_XML($1) AS src_xml
FROM VALUES
($$
<clinical_study>
<!-- This xml conforms to an XML Schema at:
https://clinicaltrials.gov/ct2/html/images/info/public.xsd -->
<required_header>
<download_date>ClinicalTrials.gov processed this data on September 13, 2019</download_date>
<link_text>Link to the current ClinicalTrials.gov record.</link_text>
<url>https://clinicaltrials.gov/show/NCT00010010</url>
</required_header>
<id_info>
<org_study_id>CDR0000068431</org_study_id>
<secondary_id>NYU-0004</secondary_id>
<secondary_id>P-UPJOHN-NYU-0004</secondary_id>
<secondary_id>NCI-G00-1906</secondary_id>
</id_info>
</clinical_study>
$$)
)
SELECT src_xml:"$" AS clinical_study_contents
,XMLGET(src_xml, 'id_info') as id_info_element
,*
FROM STG_XML
;
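As a further sketch in the same spirit, XMLGET calls can be nested to reach the text of an element inside "id_info", and the optional third argument of XMLGET selects a particular occurrence of a repeated tag. The shortened XML and the column aliases (org_study_id, second_secondary_id) are assumptions made for illustration, not part of the original question:

```sql
WITH STG_XML AS (
    SELECT PARSE_XML($1) AS src_xml
    FROM VALUES
    ($$
    <clinical_study>
      <required_header>
        <url>https://clinicaltrials.gov/show/NCT00010010</url>
      </required_header>
      <id_info>
        <org_study_id>CDR0000068431</org_study_id>
        <secondary_id>NYU-0004</secondary_id>
        <secondary_id>P-UPJOHN-NYU-0004</secondary_id>
      </id_info>
    </clinical_study>
    $$)
)
SELECT -- Inner XMLGET finds id_info under the root; outer XMLGET finds org_study_id under id_info.
       XMLGET(XMLGET(src_xml, 'id_info'), 'org_study_id'):"$"::STRING AS org_study_id
       -- The third argument is the zero-based instance number of a repeated tag.
      ,XMLGET(XMLGET(src_xml, 'id_info'), 'secondary_id', 1):"$"::STRING AS second_secondary_id
FROM STG_XML
;
```

Because the instance number is zero-based, 1 selects the second "secondary_id" element. For an unknown number of repeated elements, a LATERAL FLATTEN over the parent's contents is likely the better fit.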
Answer 2:
Here is a good blog post:
https://community.snowflake.com/s/article/Querying-Nested-XML-in-Snowflake
Also, below is a way to query nested XML elements.
Sample XML:
<?xml version="1.0"?>
<comtec version="2008">
<customer_transport_order>
<id>2880ORO</id>
<order_number>99833104701</order_number>
<priority>0</priority>
<order_date>2019-03-22</order_date>
<order_kind>
<code>VMI</code>
<name>VMI</name>
</order_kind>
<operational>true</operational>
<order_status>
<code>cancel</code>
<name>cancel</name>
<status_kind>cancel</status_kind>
</order_status>
<contact>
<id>CEN143096</id>
<code>CEN127431</code>
<name>SOUTHERN UNITED ENTERPRISES</name>
</contact>
</customer_transport_order>
</comtec>
Sample Query:
select
XMLGET( cust.value, 'order_number' ):"$"::integer as cust_order,
XMLGET( cust.value, 'order_date' ):"$"::string as cust_date,
XMLGET( orderkind.value, 'code' ):"$"::string as order_kind,
XMLGET( contactval.value, 'id' ):"$"::string as contactval,
XMLGET( contactval.value, 'code' ):"$"::string as contactcode,
XMLGET( contactval.value, 'name' ):"$"::string as contactname
from
dept_emp_addr
, lateral FLATTEN(dept_emp_addr.xmldata:"$") cust
, lateral FLATTEN(cust.value:"$") orderkind
, lateral FLATTEN(cust.value:"$") contactval
where cust.value like '<customer_transport_order>%' AND orderkind.value like '<order_kind>%'
AND contactval.value like '<contact>%'
ORDER BY cust_order;
Source: https://stackoverflow.com/questions/58486204/how-should-i-load-xml-file-which-has-comments-and-spaces-in-them-and-then-using