Question
How do we save data from inside an XML payload to blob storage?
Input:
<root>
<alexIsAwesome>yes he is</alexIsAwesome>
<bytes>sdfsdfjijOIJOISJDFQPWORPJkjsdlfkjlksdf==</bytes>
</root>
Desired result:
<root>
<alexIsAwesome>yes he is</alexIsAwesome>
<bytes>/blob/path/toSavedPayload</bytes>
</root>
- save the bytes somewhere in blob storage
- replace the bytes with the URI of the location where they were saved
How do we use Data Factory to extract a node from the XML and save it to blob storage?
Answer 1:
Currently, ADF doesn't support XML natively, but:
- You could write your own code and run it with an ADF Custom activity.
- SSIS has built-in support for XML as a source, so it may be worth a look.
Answer 2:
In this case you have to use custom code. I would choose one of these options:
- Azure Functions: only for simple data processing
- Azure Databricks: if you need to process large XML data
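The custom code suggested above, whether it runs in an Azure Function or an ADF Custom activity, could look roughly like the following minimal C# sketch. It assumes the <bytes> node holds base64 text and that the actual blob upload is passed in as a delegate. The class XmlBlobOffloader, the method Offload, and the upload callback are hypothetical names, not part of any Azure API. In a real deployment, the callback would call a blob storage SDK and return the blob's URI.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: offloads base64 payloads from an XML document and
// replaces each one with the location returned by an upload callback.
public static class XmlBlobOffloader
{
    // elementName: local name of the node that holds base64 data (e.g. "bytes").
    // upload: hypothetical callback that stores the bytes and returns their URI/path.
    public static string Offload(string xml, string elementName, Func<byte[], string> upload)
    {
        XDocument doc = XDocument.Parse(xml);

        // ToList() snapshots the matches so we don't modify the tree while
        // enumerating the lazy Descendants() query.
        foreach (XElement node in doc.Descendants(elementName).ToList())
        {
            // Assumption: the node text is base64-encoded binary data.
            byte[] payload = Convert.FromBase64String(node.Value.Trim());

            // Store the bytes, then replace the node's content with the returned location.
            node.Value = upload(payload);
        }

        // XDocument.ToString() re-indents the output.
        return doc.ToString();
    }
}
```

For example, with <bytes>aGVsbG8=</bytes> and a callback that returns "/blob/path/toSavedPayload", the callback receives the bytes of "hello" and the element becomes <bytes>/blob/path/toSavedPayload</bytes>. Because the upload is injected, the XML handling can be tested without a storage account. XDocument loads the whole document into memory, which is fine for small payloads but is one reason to consider Databricks for large XML.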
Answer 3:
As Azure Data Factory does not support XML natively, I would suggest using an SSIS package.
- In a Data Flow task, use an XML Source and read the bytes from the XML into a variable of the DT_IMAGE data type.
- Create a Script Task that uploads the byte array (DT_IMAGE) obtained in step 1 to Azure Blob Storage, as shown below. The code is adapted, with slight modifications for this requirement, from another Stack Overflow post.
using System.IO;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Auth;
using Microsoft.WindowsAzure.Storage.Blob;
// The using directives go at the top of the script file; the statements below go inside the Script Task's Main() method.
// Retrieve the storage account from its connection string ("StorageKey" is a placeholder).
CloudStorageAccount storageAccount = CloudStorageAccount.Parse("StorageKey");
// Create the blob client.
CloudBlobClient blobClient = storageAccount.CreateCloudBlobClient();
// Retrieve a reference to a previously created container.
CloudBlobContainer container = blobClient.GetContainerReference("mycontainer");
// Retrieve a reference to a blob named "myblob".
CloudBlockBlob blockBlob = container.GetBlockBlobReference("myblob");
// Read the byte array from the SSIS variable (Value is typed as object, so cast it).
byte[] byteArrayIn = (byte[])Dts.Variables["User::ImageVariable"].Value;
// Create or overwrite the "myblob" blob with the contents of the byte array.
using (var memoryStream = new MemoryStream(byteArrayIn))
{
    blockBlob.UploadFromStream(memoryStream);
}
// blockBlob.Uri now holds the address that can replace the bytes in the XML.
- Finally, host this SSIS package in the Azure-SSIS Integration Runtime in Azure Data Factory and execute it.
Reference: SSIS Runtime in Azure Data Factory
Source: https://stackoverflow.com/questions/56224407/how-to-transform-xml-data-using-datafactory-pipeline