Write dataframe to blob using azure databricks

Asked by 渐次进展 on 2021-01-22 17:41

Is there any link or sample code showing how to write a dataframe to Azure Blob Storage using Python (not using the PySpark module)?

1 Answer
  • 2021-01-22 18:07

    Below is a code snippet for writing a dataframe as CSV data directly to an Azure Blob Storage container from an Azure Databricks notebook. Note that it uses the Spark DataFrame writer rather than plain Python.

    # Configure access to the storage account globally using the account key
    # (storage_name, sas_key, output_container_name, and dataframe are assumed
    # to be defined earlier in the notebook)
    spark.conf.set(
      "fs.azure.account.key.%s.blob.core.windows.net" % storage_name,
      sas_key)
    
    output_container_path = "wasbs://%s@%s.blob.core.windows.net" % (output_container_name, storage_name)
    output_blob_folder = "%s/wrangled_data_folder" % output_container_path
    
    # Write the dataframe as a single CSV file to blob storage
    (dataframe
     .coalesce(1)
     .write
     .mode("overwrite")
     .option("header", "true")
     .format("csv")  # built-in CSV source; replaces the legacy "com.databricks.spark.csv"
     .save(output_blob_folder))
    
    # Get the name of the wrangled-data CSV file that was just saved to Azure
    # blob storage (Spark names it with a "part-" prefix)
    files = dbutils.fs.ls(output_blob_folder)
    output_file = [x for x in files if x.name.startswith("part-")]
    
    # Move the wrangled-data CSV file from the sub-folder (wrangled_data_folder)
    # to the root of the blob container, renaming it in the process
    dbutils.fs.mv(output_file[0].path, "%s/predict-transform-output.csv" % output_container_path)
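
    Spark writes the output as a folder containing one or more part-* files rather than a single named file; coalesce(1) forces a single part file, and the dbutils.fs.mv step then moves and renames that file so it ends up as one predictably named CSV at the container root.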
    

    Example: notebook

    Output: Dataframe written to blob storage using Azure Databricks
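
    Since the question asks for plain Python without PySpark, a minimal sketch using pandas and the azure-storage-blob SDK (v12) could look like the following; the connection string, container name, and blob name below are hypothetical placeholders.

    import pandas as pd
    from azure.storage.blob import BlobServiceClient

    # Hypothetical placeholders -- replace with your own values
    connection_string = "<your-storage-account-connection-string>"
    container_name = "output-container"
    blob_name = "predict-transform-output.csv"

    df = pd.DataFrame({"col1": [1, 2], "col2": ["a", "b"]})

    # Serialize the dataframe to CSV in memory (no local file needed)
    csv_data = df.to_csv(index=False)

    # Upload the CSV text to the blob, overwriting any existing blob
    blob_client = BlobServiceClient.from_connection_string(connection_string) \
        .get_blob_client(container=container_name, blob=blob_name)
    blob_client.upload_blob(csv_data, overwrite=True)

    This avoids Spark entirely, so it also works outside Databricks; the trade-off is that the whole dataframe is serialized in memory, which only suits data that fits on a single machine.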
