Question
I need to connect to Azure Data Lake Storage Gen2 (ADLS) from an Azure Function, read a file, process it with Python (PySpark), and write the result back to the data lake, so both my input and output bindings would point at ADLS. Is there an ADLS binding for Azure Functions in Python? Any suggestions would be appreciated.
Thanks, Anten D
Answer 1:
Update:
1. To read the data, you can use the blob input binding.
2. To write the data, you cannot use the blob output binding, because it does not create a Data Lake object. Azure Functions does not support an ADLS output binding, so the write logic has to go in the body of the function.
These are the bindings that Azure Functions supports:
https://docs.microsoft.com/en-us/azure/azure-functions/functions-triggers-bindings?tabs=csharp#supported-bindings
Below is a simple code example:
import logging
import azure.functions as func
from azure.storage.filedatalake import DataLakeServiceClient

def main(req: func.HttpRequest, inputblob: func.InputStream) -> func.HttpResponse:
    # Connection string for the storage account (the account key is redacted)
    connect_str = "DefaultEndpointsProtocol=https;AccountName=0730bowmanwindow;AccountKey=xxxxxx;EndpointSuffix=core.windows.net"
    datalake_service_client = DataLakeServiceClient.from_connection_string(connect_str)
    myfilesystem = "test"
    myfile = "FileName.txt"
    file_system_client = datalake_service_client.get_file_system_client(myfilesystem)
    # Create the target file and get a client for it
    file_client = file_system_client.create_file(myfile)
    # Read the blob supplied by the blob input binding
    data = inputblob.read()
    inputstr = data.decode("utf-8")
    logging.info("length of data is %d", len(data))
    filesize_previous = 0
    logging.info("length of currentfile is %d", filesize_previous)
    # Offsets and lengths are byte counts, so append the raw bytes
    file_client.append_data(data, offset=filesize_previous, length=len(data))
    # Commit everything up to the given byte position
    file_client.flush_data(filesize_previous + len(data))
    return func.HttpResponse(
        "This is a test." + inputstr,
        status_code=200
    )
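The write step above can be pulled into a small helper, which makes it easier to reuse and to check without a storage account. This is only a sketch: `write_text_to_adls` is a hypothetical name, not part of the SDK, and it assumes the object passed in behaves like the `FileSystemClient` returned by `get_file_system_client`. It encodes the text first so that the length and flush position are byte counts rather than character counts, which differ for non-ASCII text.

```python
def write_text_to_adls(file_system_client, path: str, text: str) -> int:
    """Create `path` and write `text` as UTF-8; return the number of bytes written.

    Hypothetical helper: `file_system_client` is assumed to expose
    create_file(), and the returned file client append_data() and
    flush_data(), as azure.storage.filedatalake does.
    """
    data = text.encode("utf-8")  # lengths and offsets below are in bytes
    file_client = file_system_client.create_file(path)
    file_client.append_data(data, offset=0, length=len(data))
    file_client.flush_data(len(data))  # commit up to the final byte position
    return len(data)
```

Inside the function body this would be called as `write_text_to_adls(file_system_client, myfile, inputstr)`.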
Original Answer:
I think the docs below will help you:
How to read:
https://docs.microsoft.com/en-us/azure/azure-functions/functions-bindings-storage-blob-input?tabs=csharp
How to write:
https://docs.microsoft.com/en-us/python/api/azure-storage-file-datalake/azure.storage.filedatalake.datalakeserviceclient?view=azure-python
By the way, don't use the blob output binding. Reading can be done with a binding, but writing cannot. The Blob Storage service and the Data Lake service are based on different objects: using the blob input binding to read files is completely fine, but the blob output binding does not create an object based on the Data Lake service, so please do not use it to write files.
Let me know whether the docs above help; if not, I will add a simple Python example.
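For the read side, the blob input binding has to be declared in the function's function.json so that the runtime can pass the blob in as a function parameter. A minimal sketch of that declaration follows, built as a Python dict so it can be dumped to JSON. The parameter name `inputblob`, the blob path `test/input.txt`, and the app setting name `AzureWebJobsStorage` are assumptions for illustration; adjust them to your own function and storage account.

```python
import json

# Hypothetical values: the binding name must match the function's parameter
# name; the path and connection setting are placeholders.
FUNCTION_JSON = {
    "scriptFile": "__init__.py",
    "bindings": [
        # HTTP trigger that starts the function
        {"type": "httpTrigger", "direction": "in", "name": "req",
         "authLevel": "function", "methods": ["get", "post"]},
        # HTTP response returned by the function
        {"type": "http", "direction": "out", "name": "$return"},
        # Blob input binding used to read the file
        {"type": "blob", "direction": "in", "name": "inputblob",
         "path": "test/input.txt", "connection": "AzureWebJobsStorage"},
    ],
}


def render_function_json() -> str:
    """Return the binding declaration as the text of function.json."""
    return json.dumps(FUNCTION_JSON, indent=2)


if __name__ == "__main__":
    print(render_function_json())
```

Note that there is no output binding for the data lake here; the write still happens in the function body through the Data Lake client.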
Source: https://stackoverflow.com/questions/64527808/azure-function-binding-for-azure-data-lake-python