Converting XML files to JSON or CSV?

孤街醉人 提交于 2020-05-28 04:28:05

问题


I have complex XML files with nested elements. I built a process to handle using SSIS and T-SQL. We utilize Azure Data Factory and I'd like to explore converting XML files to JSON or CSV, since those are supported by ADF and XML is not.

It appears logic apps is one option. Has anyone had other luck with taking XML and converting within a pipeline?

Current Workflow: pick up XML files from folder, drop to on network drives, bulk insert XML into a staging row, parse XML to various SQL tables for analysis.

Sample:

<HEADER>
<SurveyID> 1234 </SURVEYID>
  <RESPONSES>
      <VAR>Question1</VAR>
      <VALUE>Answer1</VALUE>
  </RESPONSES>
  <RESPONSES>
      <VAR>Question2</VAR>
      <VALUE>Answer2</VALUE>
  </RESPONSES>
<SurveyID> 1234 </SURVEYID>
 <RESPONSES>
      <VAR>Question1</VAR>
      <VALUE>DifferentAnswer</VALUE>
  </RESPONSES>
</HEADER>

Note: I don't need to know how to parse XML, that is done. I also know that you can execute SSIS within ADF. I am looking at alternatives to the overall process.


回答1:


I'm not sure why this question got downvoted - I had a similar need a few months ago. The situation was exacerbated by the fact the XML we receive is poorly formatted and wouldn't even parse correctly. To solve this, I wrote a .NET console app and deployed it to Azure Batch. It reads the XML from Blob Storage, corrects the formatting errors, then parses the XML and outputs it to a JSON file back in Blob Storage. ADF supports Azure Batch through the "Custom" activity, and so this plugs right into our pipeline. Depending on your data structure, you could output it to CSV if that is more appropriate.

The tricky bits of using Azure Batch from ADF are in passing and processing parameter data. In the ADF configuration, these are listed under "Extended properties":

These properties are available to the Batch job at runtime in a JSON file named "activity.json":In the Console app, you will need to access the JSON file to read the extended properties:

var activity_json = File.ReadAllText("activity.json");
dynamic activity = JsonConvert.DeserializeObject(activity_json);

parameters.Add("alertId", activity.typeProperties.extendedProperties.AlertId.ToString());
parameters.Add("hashKey", activity.typeProperties.extendedProperties.HashKey.ToString());
parameters.Add("startTime", activity.typeProperties.extendedProperties.StartTime.ToString());
parameters.Add("endTime", activity.typeProperties.extendedProperties.EndTime.ToString());

The property names are Case Sensitive. {Note that in this example I am writing them to a "parameters" Dictionary - I do that so I can run the Console app either locally or in Azure Batch.] There are a few other "interesting" aspects to using Azure Batch, but this is the biggest hurdle in my opinion.

| improve this answer | |

来源:https://stackoverflow.com/questions/59938374/converting-xml-files-to-json-or-csv

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!