Copy From OnPrem SQL server to DocumentDB using custom activity in ADF Pipeline

随声附和 提交于 2019-12-02 08:32:05

问题


I am trying to copy data from SQL Table in a on-prem sql server and upload it to a Document DB using custom activity in Azure data factory pipeline. Can anyone tell me how can I accomplish it using IDotNetActivity or any other interface or class.


回答1:


Actually, Custom activity cannot access on-prem data today.

Similar question here: On-Prem SQL connection throwing SqlException in Datafactory custom activity

The solution is copy on-prem data to cloud. Then run custom activity against cloud storage. wBob shared good sample above.

If you have to complete it in one activity, you can setup vNet and ExpressRoute to connect your Azure public cloud with your onprem environment.




回答2:


I got this to work with conventional Azure Data Factory (ADF) tasks. No custom task is required. I wouldn't make things more complicated than they need to be particularly with these components which can be hard to debug.

The following sample shows:

  1. A linked service of type OnPremisesSqlServer.
  2. A linked service of type DocumentDb.
  3. An input dataset of type SQLServerDataset.
  4. An output dataset of type DocumentDbCollection.
  5. The pipeline with Copy activity that uses SqlSource and DocumentDbCollectionSink.

Linked Service of type On Premises SQL Server:

{
    "name": "OnPremLinkedService",
    "properties": {
        "type": "OnPremisesSqlServer",
        "description": "",
        "typeProperties": {
            "connectionString": "Data Source=<servername - required for credential encryption>;Initial Catalog=<databasename - required for credential encryption>;Integrated Security=False;User ID=<username>;Password=<password>;",
            "gatewayName": "<Name of the gateway that the Data Factory service should use to connect to the on-premises SQL Server database - required for credential encryption>",
            "userName": "<Specify user name if you are using Windows Authentication>",
            "password": "<Specify password for the user account>"
        }
    }
}

Linked Service of type DocumentDB:

{
    "name": "DocumentDbLinkedService",
    "properties": {
        "type": "DocumentDb",
        "typeProperties": {
            "connectionString": "AccountEndpoint=<EndpointUrl>;AccountKey=<AccessKey>;Database=<Database>"
        }
    }
}

Input Dataset of type SqlServerTable:

{
    "name": "SQLServerDataset",
    "properties": {
        "structure": [
            {
                "name": "Id",
                "type": "Int32"
            },
            {
                "name": "FirstName",
                "type": "String"
            },
            {
                "name": "MiddleName",
                "type": "String"
            },
            {
                "name": "LastName",
                "type": "String"
            }
        ],
        "published": false,
        "type": "SqlServerTable",
        "linkedServiceName": "OnPremLinkedService",
        "typeProperties": {
            "tableName": "dbo.Users"
        },
        "availability": {
            "frequency": "Day",
            "interval": 1
        },
        "external": true,
        "policy": {}
    }
}

Output Dataset of type DocumentDbCollection:

{
    "name": "PersonDocumentDbTableOut",
    "properties": {
        "structure": [
            {
                "name": "Id",
                "type": "Int32"
            },
            {
                "name": "Name.First",
                "type": "String"
            },
            {
                "name": "Name.Middle",
                "type": "String"
            },
            {
                "name": "Name.Last",
                "type": "String"
            }
        ],
        "published": false,
        "type": "DocumentDbCollection",
        "linkedServiceName": "DocumentDbLinkedService",
        "typeProperties": {
            "collectionName": "Person"
        },
        "availability": {
            "frequency": "Day",
            "interval": 1
        }
    }
}

Pipeline with Copy activity using SqlSource and DocumentDbCollectionSink:

{
    "name": "PipelineTemplate 3",
    "properties": {
        "description": "On prem to DocDb test",
        "activities": [
            {
                "type": "Copy",
                "typeProperties": {
                    "source": {
                        "type": "SqlSource"
                    },
                    "sink": {
                        "type": "DocumentDbCollectionSink",
                        "writeBatchSize": 2,
                        "writeBatchTimeout": "00:00:00"
                    },
                    "translator": {
                        "type": "TabularTranslator",
                        "columnMappings": "id: id, FirstName: Name.First, MiddleName: Name.Middle, LastName: Name.Last"
                    }
                },
                "inputs": [
                    {
                        "name": "SQLServerDataset"
                    }
                ],
                "outputs": [
                    {
                        "name": "PersonDocumentDbTableOut"
                    }
                ],
                "policy": {
                    "timeout": "1.00:00:00",
                    "concurrency": 1,
                    "retry": 3
                },
                "scheduler": {
                    "frequency": "Day",
                    "interval": 1
                },
                "name": "CopyActivityTemplate"
            }
        ],
        "start": "2016-10-05T00:00:00Z",
        "end": "2016-10-05T00:00:00Z",
        "isPaused": false,
        "hubName": "adfdocdb2_hub",
        "pipelineMode": "Scheduled"
    }
}



回答3:


I was able to solve the problem. The solution is to write the code in custom activity itself that copies data from On-Prem SQL Server to DocumentDB by using the below code:

 public async Task CopyDataFromTo(string source)
    {
        try
        {
            DataTable dtSource = new DataTable();
            string EndpointUrl = "https://yourendpoint.documents.azure.com:443/";
            string AuthorizationKey = "*****";
            SecureString authKey = new SecureString();
            foreach(char c in AuthorizationKey.ToCharArray())
            {
                authKey.AppendChar(c);
            }
            SqlDataAdapter adapSource = new SqlDataAdapter("Select * From YourTable", source);
            adapSource.Fill(dtSource);
            foreach (DataRow Dr in dtSource.Rows)
            {
                dynamic docFirst = new
                {
                    UserID = Int32.Parse(Dr["ColumnOne"].ToString()),
                    UserAlias = Dr["ColumnTwo"].ToString()
                };
                using (var client = new DocumentClient(new Uri(EndpointUrl), authKey))
                {
                    Document newDocument = await client.CreateDocumentAsync(UriFactory.CreateDocumentCollectionUri("DatabaseName", "CollectionName"), docFirst);
                };
            }
        }
        catch (Exception Ex)
        {
            throw Ex;
        }
    }



回答4:


Thanks Charles. Turns out you are right. The solution I implemented was:

Part 1:

Implemented a data factory pipeline to move data from on-prem databases to staging DocumentDB collections.

Part 2:

Used custom activity to combine data from different collections(staged) in documentdb to create a new documentdb collection with required output data.



来源:https://stackoverflow.com/questions/39894991/copy-from-onprem-sql-server-to-documentdb-using-custom-activity-in-adf-pipeline

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!