Question
I am trying to copy data from a SQL table in an on-premises SQL Server and upload it to DocumentDB using a custom activity in an Azure Data Factory pipeline. Can anyone tell me how I can accomplish this using IDotNetActivity or any other interface or class?
Answer 1:
Actually, a custom activity cannot access on-premises data today.
Similar question here: On-Prem SQL connection throwing SqlException in Datafactory custom activity
The solution is to copy the on-premises data to the cloud first, then run the custom activity against the cloud storage. wBob shared a good sample above.
If you have to complete it in one activity, you can set up a VNet and ExpressRoute to connect your Azure public cloud with your on-premises environment.
Answer 2:
I got this to work with conventional Azure Data Factory (ADF) tasks; no custom task is required. I wouldn't make things more complicated than they need to be, particularly with these components, which can be hard to debug.
The following sample shows:
- A linked service of type OnPremisesSqlServer.
- A linked service of type DocumentDb.
- An input dataset of type SqlServerTable (named SQLServerDataset).
- An output dataset of type DocumentDbCollection.
- A pipeline with a Copy activity that uses SqlSource and DocumentDbCollectionSink.
Linked Service of type OnPremisesSqlServer:
{
    "name": "OnPremLinkedService",
    "properties": {
        "type": "OnPremisesSqlServer",
        "description": "",
        "typeProperties": {
            "connectionString": "Data Source=<servername - required for credential encryption>;Initial Catalog=<databasename - required for credential encryption>;Integrated Security=False;User ID=<username>;Password=<password>;",
            "gatewayName": "<Name of the gateway that the Data Factory service should use to connect to the on-premises SQL Server database - required for credential encryption>",
            "userName": "<Specify user name if you are using Windows Authentication>",
            "password": "<Specify password for the user account>"
        }
    }
}
Linked Service of type DocumentDb:
{
    "name": "DocumentDbLinkedService",
    "properties": {
        "type": "DocumentDb",
        "typeProperties": {
            "connectionString": "AccountEndpoint=<EndpointUrl>;AccountKey=<AccessKey>;Database=<Database>"
        }
    }
}
Input Dataset of type SqlServerTable:
{
    "name": "SQLServerDataset",
    "properties": {
        "structure": [
            { "name": "Id", "type": "Int32" },
            { "name": "FirstName", "type": "String" },
            { "name": "MiddleName", "type": "String" },
            { "name": "LastName", "type": "String" }
        ],
        "published": false,
        "type": "SqlServerTable",
        "linkedServiceName": "OnPremLinkedService",
        "typeProperties": {
            "tableName": "dbo.Users"
        },
        "availability": {
            "frequency": "Day",
            "interval": 1
        },
        "external": true,
        "policy": {}
    }
}
Output Dataset of type DocumentDbCollection:
{
    "name": "PersonDocumentDbTableOut",
    "properties": {
        "structure": [
            { "name": "Id", "type": "Int32" },
            { "name": "Name.First", "type": "String" },
            { "name": "Name.Middle", "type": "String" },
            { "name": "Name.Last", "type": "String" }
        ],
        "published": false,
        "type": "DocumentDbCollection",
        "linkedServiceName": "DocumentDbLinkedService",
        "typeProperties": {
            "collectionName": "Person"
        },
        "availability": {
            "frequency": "Day",
            "interval": 1
        }
    }
}
Pipeline with Copy activity using SqlSource and DocumentDbCollectionSink:
{
    "name": "PipelineTemplate 3",
    "properties": {
        "description": "On prem to DocDb test",
        "activities": [
            {
                "type": "Copy",
                "typeProperties": {
                    "source": {
                        "type": "SqlSource"
                    },
                    "sink": {
                        "type": "DocumentDbCollectionSink",
                        "writeBatchSize": 2,
                        "writeBatchTimeout": "00:00:00"
                    },
                    "translator": {
                        "type": "TabularTranslator",
                        "columnMappings": "Id: Id, FirstName: Name.First, MiddleName: Name.Middle, LastName: Name.Last"
                    }
                },
                "inputs": [
                    {
                        "name": "SQLServerDataset"
                    }
                ],
                "outputs": [
                    {
                        "name": "PersonDocumentDbTableOut"
                    }
                ],
                "policy": {
                    "timeout": "1.00:00:00",
                    "concurrency": 1,
                    "retry": 3
                },
                "scheduler": {
                    "frequency": "Day",
                    "interval": 1
                },
                "name": "CopyActivityTemplate"
            }
        ],
        "start": "2016-10-05T00:00:00Z",
        "end": "2016-10-05T00:00:00Z",
        "isPaused": false,
        "hubName": "adfdocdb2_hub",
        "pipelineMode": "Scheduled"
    }
}
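The definitions above can be pasted directly into the ADF authoring editor in the Azure portal, or deployed programmatically. Below is a minimal sketch of the latter, assuming the ADF v1 .NET SDK (the Microsoft.Azure.Management.DataFactories package) and its raw-JSON create-or-update helpers; the resource group name and file path are placeholders:
using System.IO;
using Microsoft.Azure.Management.DataFactories;
using Microsoft.Azure.Management.DataFactories.Models;

static void DeployPipeline(DataFactoryManagementClient client)
{
    // 'client' is assumed to be already authenticated for your subscription.
    // Resource group and factory names are hypothetical placeholders.
    string resourceGroup = "<resource group>";
    string factoryName = "adfdocdb2";

    // Push the pipeline JSON shown above from a local file.
    client.Pipelines.CreateOrUpdateWithRawJsonContent(
        resourceGroup,
        factoryName,
        "PipelineTemplate 3",
        new PipelineCreateOrUpdateWithRawJsonContentParameters(
            File.ReadAllText("pipeline.json")));
}
The linked services and datasets would be deployed the same way (via the corresponding LinkedServices and Datasets operations) before the pipeline that references them.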
Answer 3:
I was able to solve the problem. The solution is to write code in the custom activity itself that copies data from the on-premises SQL Server to DocumentDB, using the code below:
using System;
using System.Data;
using System.Data.SqlClient;
using System.Security;
using System.Threading.Tasks;
using Microsoft.Azure.Documents;
using Microsoft.Azure.Documents.Client;

public async Task CopyDataFromTo(string source)
{
    try
    {
        DataTable dtSource = new DataTable();
        string EndpointUrl = "https://yourendpoint.documents.azure.com:443/";
        string AuthorizationKey = "*****";

        // DocumentClient accepts the account key as a SecureString.
        SecureString authKey = new SecureString();
        foreach (char c in AuthorizationKey.ToCharArray())
        {
            authKey.AppendChar(c);
        }

        // Pull the source rows from the on-prem SQL Server into memory.
        SqlDataAdapter adapSource = new SqlDataAdapter("SELECT * FROM YourTable", source);
        adapSource.Fill(dtSource);

        // Reuse a single client for all writes rather than creating one per row.
        using (var client = new DocumentClient(new Uri(EndpointUrl), authKey))
        {
            foreach (DataRow Dr in dtSource.Rows)
            {
                // Shape each row as the document to insert.
                var docFirst = new
                {
                    UserID = Int32.Parse(Dr["ColumnOne"].ToString()),
                    UserAlias = Dr["ColumnTwo"].ToString()
                };

                Document newDocument = await client.CreateDocumentAsync(
                    UriFactory.CreateDocumentCollectionUri("DatabaseName", "CollectionName"),
                    docFirst);
            }
        }
    }
    catch (Exception)
    {
        // Rethrow without resetting the stack trace ('throw Ex;' would lose it).
        throw;
    }
}
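Since the question asks about IDotNetActivity, here is a minimal sketch of how a method like CopyDataFromTo could be wired into ADF's (v1) custom activity model. The class name and the hardcoded connection string are illustrative assumptions, not part of the original answer:
using System.Collections.Generic;
using Microsoft.Azure.Management.DataFactories.Models;
using Microsoft.Azure.Management.DataFactories.Runtime;

public class CopySqlToDocDbActivity : IDotNetActivity
{
    public IDictionary<string, string> Execute(
        IEnumerable<LinkedService> linkedServices,
        IEnumerable<Dataset> datasets,
        Activity activity,
        IActivityLogger logger)
    {
        // In a real activity the connection string would be resolved from the
        // input dataset's linked service; hardcoded here for brevity.
        string sourceConnectionString = "<SQL Server connection string>";

        logger.Write("Copying rows from SQL Server to DocumentDB...");

        // CopyDataFromTo is the async method shown above; Execute is
        // synchronous, so block until the copy completes.
        CopyDataFromTo(sourceConnectionString).Wait();

        // Return an (empty) property bag for downstream activities.
        return new Dictionary<string, string>();
    }

    // ... CopyDataFromTo as defined above ...
}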
Answer 4:
Thanks, Charles. It turns out you are right. The solution I implemented was:
Part 1:
Implemented a Data Factory pipeline to move data from on-premises databases to staging DocumentDB collections.
Part 2:
Used a custom activity to combine data from the different staged collections in DocumentDB into a new DocumentDB collection with the required output data, as sketched below.
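For Part 2, the combining step could look roughly like this. It is only a sketch under assumed names (the StagingDb database and the UsersStaged and CombinedOutput collections are hypothetical), using the same DocumentDB client API as the code in Answer 3:
using System;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.Azure.Documents.Client;

public static async Task CombineStagedCollectionsAsync(DocumentClient client)
{
    // Read all documents from a staged collection. Small result sets are
    // assumed; larger ones would need paging via AsDocumentQuery().
    var stagedUsers = client.CreateDocumentQuery<dynamic>(
        UriFactory.CreateDocumentCollectionUri("StagingDb", "UsersStaged"))
        .AsEnumerable()
        .ToList();

    foreach (var user in stagedUsers)
    {
        // Shape the combined output document as required, e.g. joining in
        // fields looked up from the other staged collections.
        var combined = new
        {
            UserID = user.UserID,
            UserAlias = user.UserAlias
        };

        await client.CreateDocumentAsync(
            UriFactory.CreateDocumentCollectionUri("StagingDb", "CombinedOutput"),
            combined);
    }
}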
Source: https://stackoverflow.com/questions/39894991/copy-from-onprem-sql-server-to-documentdb-using-custom-activity-in-adf-pipeline