How to use the Azure Python SDK to provision a Databricks service?

北城余情 submitted on 2020-08-20 11:08:25

Question


[Previously in this post I asked how to provision a Databricks service without any workspace. Now I'm asking how to provision a service with a workspace, as the first scenario seems infeasible.]

As a cloud admin I'm asked to write a script using the Azure Python SDK which will provision a Databricks service for one of our big data dev teams.

I can't find much online about Databricks within the Azure Python SDK other than https://azuresdkdocs.blob.core.windows.net/$web/python/azure-mgmt-databricks/0.1.0/azure.mgmt.databricks.operations.html

and

https://azuresdkdocs.blob.core.windows.net/$web/python/azure-mgmt-databricks/0.1.0/azure.mgmt.databricks.html

These appear to offer some help provisioning a workspace, but I am not quite there yet.

What am I missing?

EDITS:

Thanks to @Laurent Mazuel and @Jim Xu for their help.

Here's the code I'm running now, and the error I'm receiving:

client = DatabricksClient(credentials, subscription_id)
workspace_obj = client.workspaces.get("example_rg_name", "example_databricks_workspace_name")
WorkspacesOperations.create_or_update(
workspace_obj,
"example_rg_name",
"example_databricks_workspace_name",
custom_headers=None,
raw=False,
polling=True
)

error:

TypeError: create_or_update() missing 1 required positional argument: 'workspace_name'

I'm a bit puzzled by that error as I've provided the workspace name as the third parameter, and according to this documentation, that's just what this method requires.
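
One plausible reading of this TypeError, sketched here with a hypothetical stand-in class (FakeWorkspacesOperations is not the SDK's code and only mirrors the parameter order): when a method is called on the class itself rather than on an instance, the first positional argument is bound to self, every later argument shifts left by one slot, and the last parameter is left unfilled.

```python
class FakeWorkspacesOperations:
    """Hypothetical stand-in that mirrors only the parameter order."""

    def create_or_update(self, parameters, resource_group_name, workspace_name):
        # Echo the arguments back so the binding is easy to inspect.
        return (parameters, resource_group_name, workspace_name)


def call_on_class():
    """Call the method on the class; the first argument is consumed as self."""
    try:
        FakeWorkspacesOperations.create_or_update(
            {"location": "westus"},
            "example_rg_name",
            "example_databricks_workspace_name",
        )
    except TypeError as exc:
        return str(exc)
    return None


def call_on_instance():
    """Call the method on an instance; all three arguments land where expected."""
    ops = FakeWorkspacesOperations()
    return ops.create_or_update(
        {"location": "westus"},
        "example_rg_name",
        "example_databricks_workspace_name",
    )


if __name__ == "__main__":
    print(call_on_class())
    print(call_on_instance())
```

In the SDK the instance is reached through the client, which is why client.workspaces.create_or_update(...) does not hit this particular error.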

I also tried the following code:

client = DatabricksClient(credentials, subscription_id)
workspace_obj = client.workspaces.get("example_rg_name", "example_databricks_workspace_name")
client.workspaces.create_or_update(
workspace_obj,
"example_rg_name",
"example_databricks_workspace_name"
)

Which results in:

 Traceback (most recent call last):
   File "./build_azure_visibility_core.py", line 112, in <module>
     ca_databricks.create_or_update_databricks(SUB_PREFIX)
   File "/home/gitlab-runner/builds/XrbbggWj/0/SA-Cloud/azure-visibility-core/expd_az_databricks.py", line 34, in create_or_update_databricks
     self.databricks_workspace_name
   File "/home/gitlab-runner/builds/XrbbggWj/0/SA-Cloud/azure-visibility-core/azure-visibility-core/lib64/python3.6/site-packages/azure/mgmt/databricks/operations/workspaces_operations.py", line 264, in create_or_update
     **operation_config
   File "/home/gitlab-runner/builds/XrbbggWj/0/SA-Cloud/azure-visibility-core/azure-visibility-core/lib64/python3.6/site-packages/azure/mgmt/databricks/operations/workspaces_operations.py", line 210, in _create_or_update_initial
     body_content = self._serialize.body(parameters, 'Workspace')
   File "/home/gitlab-runner/builds/XrbbggWj/0/SA-Cloud/azure-visibility-core/azure-visibility-core/lib64/python3.6/site-packages/msrest/serialization.py", line 589, in body
     raise ValidationError("required", "body", True)
 msrest.exceptions.ValidationError: Parameter 'body' can not be None.
 ERROR: Job failed: exit status 1

So line 589 in serialization.py raises the error, but I don't see what in my code is causing it. Thanks to all who have generously assisted!


Answer 1:


You need to create a Databricks client; workspaces are accessed through it:

client = DatabricksClient(credentials, subscription_id)
workspace = client.workspaces.get(resource_group_name, workspace_name)

I don't think creating a service without a workspace is even possible. If you try to create a Databricks service in the portal, you will see that a workspace name is required there as well. So with the SDK, I would look at the documentation for client.workspaces.create_or_update.

(I work at MS in the SDK team)




Answer 2:


With help from @Laurent Mazuel and support engineers at Microsoft, I have a solution:

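from azure.mgmt.databricks import DatabricksClient  # azure-mgmt-databricks 0.1.0

# Full ARM ID of the managed resource group passed as managedResourceGroupId.
managed_resource_group_ID = "/subscriptions/" + subscription_id + "/resourceGroups/" + managed_rg_name
client = DatabricksClient(credentials, subscription_id)
# Kept from the original answer; the result is not used by the call below.
workspace_obj = client.workspaces.get(rg_name, databricks_workspace_name)
# The first argument is the workspace definition (the request body), passed as a dict.
client.workspaces.create_or_update(
    {
        "managedResourceGroupId": managed_resource_group_ID,
        "sku": {"name": "premium"},
        "location": location
    },
    rg_name,
    databricks_workspace_name
).wait()  # block until the long-running operation finishes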
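
The request body used above can also be assembled and checked before the SDK is called. The following is a minimal sketch under stated assumptions: the helper names managed_resource_group_id and build_workspace_parameters are hypothetical (they are not part of azure-mgmt-databricks), and the dictionary keys are simply the ones used in the answer. Validating the inputs up front means an empty value fails with a readable message instead of surfacing later inside the serializer.

```python
def managed_resource_group_id(subscription_id, managed_rg_name):
    """Build the ARM ID string in the same shape the answer uses."""
    return "/subscriptions/{}/resourceGroups/{}".format(
        subscription_id, managed_rg_name
    )


def build_workspace_parameters(subscription_id, managed_rg_name, location,
                               sku_name="premium"):
    """Return the workspace definition dict; reject empty inputs early.

    Hypothetical helper: it only reproduces the three keys from the answer.
    """
    required = {
        "subscription_id": subscription_id,
        "managed_rg_name": managed_rg_name,
        "location": location,
        "sku_name": sku_name,
    }
    for name, value in required.items():
        if not value:
            raise ValueError("{} must be a non-empty string".format(name))
    return {
        "managedResourceGroupId": managed_resource_group_id(
            subscription_id, managed_rg_name
        ),
        "sku": {"name": sku_name},
        "location": location,
    }


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(build_workspace_parameters("0000", "managed-rg", "westus"))
```

The returned dict would then be passed as the first argument, for example client.workspaces.create_or_update(parameters, rg_name, databricks_workspace_name).wait(), with client created as in the answer.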


Source: https://stackoverflow.com/questions/62902691/how-to-use-the-azure-python-sdk-to-provision-a-databricks-service
