Connect Azure Event Hubs with Data Lake Store

感情迁移 提交于 2019-12-05 14:11:46

I am assuming you want to ingest data from EventHubs to Data Lake Store on a regular basis. Like Nava said, you can use Azure Stream Analytics to get data from EventHub into Azure Storage Blobs. Thereafter you can use Azure Data Factory (ADF) to copy data on a scheduled basis from Blobs to Azure Data Lake Store. More details on using ADF are available here: https://azure.microsoft.com/en-us/documentation/articles/data-factory-azure-datalake-connector/. Hope this helps.

== March 17, 2016 update.

Support for Azure Data Lake Store as an output for Azure Stream Analytics is now available. https://blogs.msdn.microsoft.com/streamanalytics/2016/03/14/integration-with-azure-data-lake-store/ . This will be the best option for your scenario.

Sachin Sheth

Program Manager, Azure Data Lake

In addition to Nava's reply: you can query data in a Windows Azure Blob Storage container with ADLA/U-SQL as well. Or you can use the Blob Store to ADL Storage copy service (see https://azure.microsoft.com/en-us/documentation/articles/data-lake-store-copy-data-azure-storage-blob/).

One way would be to write a process to read messages from the event hub event hub API and writes them into a Data Lake Store. Data Lake SDK.

Another alternative would be to use Steam Analytics to get data from Event Hub into a Blob, and Azure Automation to run a powershell that would read the data from the blob and write into a data lake store.

Not taking credit for this, but sharing with the community:

It is also possible to archive the Events (look into properties\archive), this leaves an Avro blob.

Then using the AvroExtractor you can convert the records into Json as described in Anthony's blob: http://anthonychu.ca/post/event-hubs-archive-azure-data-lake-analytics-usql/

One of the ways would be to connect your EventHub to Data Lake using EventHub capture functionality (Data Lake and Blob Storage is currently supported). Event Hub would write to Data Lake every N mins interval or once data size threshold is reached. It is used to optimize storage "write" operations as they are expensive on a high scale.

The data is stored in Avro format, so if you want to query it using USQL you'd have to use an Extractor class. Uri gave a good reference to it https://anthonychu.ca/post/event-hubs-archive-azure-data-lake-analytics-usql/.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!