Question
We are creating a DataFactory to run a pySpark job that uses an HDInsight on-demand cluster.
The problem is that the job needs additional Python dependencies, such as numpy, that are not installed on the cluster.
We believe the way to do this is to configure a Script Action for the HDInsightOnDemandLinkedService, but we cannot find this option in DataFactory or in LinkedServices.
Is there another way to automate installing the dependencies on the HDInsightOnDemand cluster?
Answer 1:
Script Actions are currently not supported for HDInsightOnDemandLinkedService. As a workaround, you can use Azure Automation to run a PowerShell script that does the following:
- create the HDInsight cluster
- execute the Script Action
- run the pipeline in your DataFactory
- delete the cluster
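The order of the steps above matters mostly for cleanup: the cluster should be deleted even if the Script Action or the pipeline fails, otherwise it keeps running and billing. Below is a minimal sketch of that control flow in Python rather than PowerShell. It makes no Azure calls. The function name `run_with_temporary_cluster` and its four callables are hypothetical placeholders, and you would supply your own implementations (for example, wrappers around the Azure SDK or CLI).

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_temporary_cluster(
    create_cluster: Callable[[], str],
    run_script_action: Callable[[str], None],
    run_pipeline: Callable[[str], T],
    delete_cluster: Callable[[str], None],
) -> T:
    """Run the four steps in order and always delete the cluster.

    All four callables are hypothetical placeholders supplied by the caller;
    this function only fixes the ordering and the cleanup guarantee.
    """
    # Step 1: create the cluster and remember its name.
    cluster_name = create_cluster()
    try:
        # Step 2: install the extra dependencies (e.g. numpy) on the cluster.
        run_script_action(cluster_name)
        # Step 3: run the DataFactory pipeline against that cluster.
        return run_pipeline(cluster_name)
    finally:
        # Step 4: delete the cluster even if step 2 or step 3 raised.
        delete_cluster(cluster_name)


if __name__ == "__main__":
    # Stub implementations, for illustration only.
    result = run_with_temporary_cluster(
        create_cluster=lambda: "example-cluster",
        run_script_action=lambda name: print(f"script action on {name}"),
        run_pipeline=lambda name: f"pipeline finished on {name}",
        delete_cluster=lambda name: print(f"deleted {name}"),
    )
    print(result)
```

If cluster creation itself fails, no name is returned and nothing is deleted, so a partially created cluster would need separate handling.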
Source: https://stackoverflow.com/questions/49456110/how-to-create-a-hdinsightondemand-linkedservice-with-a-script-action-in-data-fac