How to transform XML data using a Data Factory pipeline

不打扰是莪最后的温柔 submitted on 2020-11-30 12:07:27

Question


How do we save data inside of an XML payload to blob storage?

input

<root>
  <alexIsAwesome>yes he is</alexIsAwesome>
  <bytes>sdfsdfjijOIJOISJDFQPWORPJkjsdlfkjlksdf==</bytes>
</root>

desired result

<root>
  <alexIsAwesome>yes he is</alexIsAwesome>
  <bytes>/blob/path/toSavedPayload</bytes>
</root>
  1. Save the contents of <bytes> somewhere in blob storage.
  2. Replace the contents of <bytes> with the URI of where they were saved.

How do we use data factory to extract a node from XML and save it to blob?


Answer 1:


Currently, ADF doesn't support XML natively, but there are two workarounds:

  1. Write your own code and run it through an ADF custom activity.
  2. Use SSIS, which has built-in support for XML as a source.



Answer 2:


In that case you have to use custom code. I would choose one of these options:

  • Azure Functions, for simple data processing only
  • Azure Databricks, if you need to process large XML data
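
Whichever host runs the custom code, the core transformation is small. Below is a minimal Python sketch of it, assuming the <bytes> element holds Base64 text and sits directly under the root. The names externalize_bytes, save_blob and fake_save, as well as the payloads/ naming scheme, are made up for illustration; the in-memory store only stands in for blob storage so the sketch runs without an Azure account.

```python
import base64
import uuid
import xml.etree.ElementTree as ET
from typing import Callable


def externalize_bytes(xml_text: str,
                      save_blob: Callable[[str, bytes], str],
                      tag: str = "bytes") -> str:
    """Save the Base64 payload of <tag> via save_blob and leave its URI behind."""
    root = ET.fromstring(xml_text)
    node = root.find(tag)  # direct child of the root element
    if node is None or not (node.text or "").strip():
        return xml_text  # nothing to externalize
    payload = base64.b64decode(node.text.strip())
    blob_name = f"payloads/{uuid.uuid4().hex}.bin"  # hypothetical naming scheme
    node.text = save_blob(blob_name, payload)  # save_blob returns the blob's URI
    return ET.tostring(root, encoding="unicode")


# In-memory stand-in for blob storage (an assumption, not an Azure API).
store = {}


def fake_save(name: str, data: bytes) -> str:
    store[name] = data
    return f"/blob/path/{name}"


sample = ("<root><alexIsAwesome>yes he is</alexIsAwesome>"
          "<bytes>aGVsbG8=</bytes></root>")
result = externalize_bytes(sample, fake_save)
print(result)
```

In an Azure Function or a Databricks notebook, fake_save would be swapped for a function that uploads the bytes with the storage SDK of your choice and returns the resulting blob URI; the XML handling itself would stay the same.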



Answer 3:


As Azure Data Factory does not support XML natively, I would suggest using an SSIS package.

  1. In the Data Flow task, add an XML source and read the bytes from the XML into a variable of the DT_IMAGE data type.
  2. Create a Script task that uploads the byte array (DT_IMAGE) obtained in step 1 to Azure Blob Storage, as shown below. The code is slightly modified from a referenced Stack Overflow post to fit this requirement.
using System.IO;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Auth;
using Microsoft.WindowsAzure.Storage.Blob;

// Retrieve the storage account from its connection string
// ("StorageKey" is a placeholder for the real connection string).
CloudStorageAccount storageAccount = CloudStorageAccount.Parse("StorageKey");

// Create the blob client.
CloudBlobClient blobClient = storageAccount.CreateCloudBlobClient();

// Retrieve a reference to a previously created container.
CloudBlobContainer container = blobClient.GetContainerReference("mycontainer");

// Retrieve a reference to a blob named "myblob".
CloudBlockBlob blockBlob = container.GetBlockBlobReference("myblob");

// Variable values are exposed as object, so cast to byte[].
byte[] byteArrayIn = (byte[])Dts.Variables["User::ImageVariable"].Value;

// Create or overwrite the "myblob" blob with the contents of the byte array.
using (var memoryStream = new MemoryStream(byteArrayIn))
{
    blockBlob.UploadFromStream(memoryStream);
}
  3. Now host this SSIS package in the SSIS runtime in Azure Data Factory and execute it.

SSIS Runtime in Azure DataFactory



Source: https://stackoverflow.com/questions/56224407/how-to-transform-xml-data-using-datafactory-pipeline
