delta-lake

Is it possible to connect to databricks deltalake tables from adf

Submitted by 廉价感情 on 2020-07-03 10:10:30
Question: I'm looking for a way to connect to Databricks Delta Lake tables from ADF and other Azure services (such as Data Catalog). I don't see a Databricks data store listed among the ADF data sources. On a similar question, "Is possible to read an Azure Databricks table from Azure Data Factory?", @simon_dmorias seems to have suggested using an ODBC connection to reach the Databricks tables. I tried to set up the ODBC connection, but it requires an IR (integration runtime) to be set up. There are 2 options I see when creating the IR.
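A minimal sketch of that ODBC route, assuming the Databricks Simba Spark ODBC driver is installed on the machine running the integration runtime; the host, HTTP path, token and table name below are placeholders, not values taken from the question:

import pyodbc

# Placeholders throughout: fill in the workspace host, the cluster's HTTP path
# and a personal access token. AuthMech=3 with UID=token is the token-based
# login described in the Databricks ODBC driver documentation.
conn = pyodbc.connect(
    "Driver={Simba Spark ODBC Driver};"
    "Host=adb-1234567890123456.7.azuredatabricks.net;"
    "Port=443;"
    "HTTPPath=sql/protocolv1/o/0/0123-456789-abcde123;"
    "SSL=1;ThriftTransport=2;AuthMech=3;"
    "UID=token;PWD=<personal-access-token>",
    autocommit=True,
)

# Read a Delta table registered in the metastore through the cluster's Thrift endpoint.
for row in conn.cursor().execute("SELECT * FROM my_delta_table LIMIT 10"):
    print(row)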

How to import Delta Lake module in Zeppelin notebook and pyspark?

Submitted by 左心房为你撑大大i on 2020-06-14 07:56:11
Question: I am trying to use Delta Lake in a Zeppelin notebook with pyspark, and it seems the module cannot be imported successfully, e.g.

%pyspark
from delta.tables import *

It fails with the following error:

ModuleNotFoundError: No module named 'delta'

However, there is no problem saving/reading the data frame using the delta format, and the module loads successfully when using Scala Spark (%spark). Is there any way to use Delta Lake in Zeppelin with pyspark?

Answer 1: Finally managed to load it on Zeppelin pyspark
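One commonly reported workaround, sketched below, is to expose the Python files bundled inside the delta-core jar to the pyspark interpreter; the jar path and version are assumptions and should match whatever is already on the Spark interpreter's classpath:

%pyspark
# Assumption: the delta-core jar is already available to the interpreter
# (for example via spark.jars.packages = io.delta:delta-core_2.11:0.6.1).
# Registering the same jar as a pyFile lets Python's zipimport find the
# bundled delta/tables.py, so the import below stops failing.
sc.addPyFile("/path/to/delta-core_2.11-0.6.1.jar")

from delta.tables import DeltaTable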

Where is the Delta table location stored?

Submitted by 北战南征 on 2020-03-25 21:59:29
Question: We just migrated to Databricks Delta from Parquet, using the Hive metastore. So far everything seems to work fine. When I print out the location of the new Delta table using DESCRIBE EXTENDED my_table, the location is correct, although it is different from the one found in the hiveMetastore database. When I access the hiveMetastore database I can successfully identify the target table (the provider is also correctly set to delta). To retrieve the previous information I am executing a join between
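For reference, a small sketch of reading the location that Spark itself reports; my_table is a placeholder for the table in question:

# DESCRIBE EXTENDED returns (col_name, data_type, comment) rows; the row whose
# col_name is 'Location' carries the path the Delta table actually reads from.
location = (spark.sql("DESCRIBE EXTENDED my_table")
                 .filter("col_name = 'Location'")
                 .select("data_type")
                 .first()[0])
print(location)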

How to use Delta Lake with spark-shell?

Submitted by 一世执手 on 2020-01-09 11:27:27
Question: I'm trying to write a Spark DataFrame as a Delta table. It works fine in my IDE (IntelliJ), but with the same dependencies and versions it does not work in my Spark REPL (spark-shell).

Spark version: 2.4.0
Scala version: 2.11.8

Dependencies in IntelliJ (dependencies for the whole project; kindly ignore the ones that are not relevant):

compile 'org.scala-lang:scala-library:2.11.8'
compile 'org.scala-lang:scala-reflect:2.11.8'
compile 'org.scala-lang:scala-compiler:2.11.8'
compile 'org.scala-lang.modules:scala-parser
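For spark-shell, the Delta Lake quickstart approach is to pull the artifact in at launch rather than relying on a build file; the 0.4.0 version below is an assumption for the Scala 2.11 line, and Delta releases are pinned to minimum Spark versions, so the 2.4.x patch level matters:

spark-shell --packages io.delta:delta-core_2.11:0.4.0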

How to write / writeStream each row of a dataframe into a different delta table

Submitted by 南笙酒味 on 2019-12-13 18:38:14
Question: Each row of my dataframe holds CSV content. I am struggling to save each row into a different, specific table. I believe I need to use foreach or a UDF to accomplish this, but it is simply not working. All the content I managed to find amounts to simple prints inside foreach calls, or code using .collect() (which I really don't want to use). I also found the repartition approach, but that doesn't let me choose where each row goes.

rows = df.count()
df.repartition(rows).write.csv(
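One way this is commonly handled in Structured Streaming is foreachBatch, sketched below; the routing column table_name, the streaming dataframe df_stream and the /mnt/tables base path are invented for illustration:

# Route each micro-batch's rows to a Delta table chosen per row. Only the
# distinct table names are collected to the driver, not the row content itself.
def route_to_tables(batch_df, batch_id):
    targets = [r[0] for r in batch_df.select("table_name").distinct().collect()]
    for target in targets:
        (batch_df.filter(batch_df.table_name == target)
                 .write
                 .format("delta")
                 .mode("append")
                 .save("/mnt/tables/{}".format(target)))

(df_stream.writeStream
          .foreachBatch(route_to_tables)
          .option("checkpointLocation", "/mnt/tables/_chk")
          .start())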

Append only new aggregates based on groupby keys

Submitted by Deadly on 2019-12-11 19:47:21
Question: I have to process some files which arrive daily. The information has a primary key of (date, client_id, operation_id), so I created a stream which appends only new data into a Delta table:

operations\
    .repartition('date')\
    .writeStream\
    .outputMode('append')\
    .trigger(once=True)\
    .option("checkpointLocation", "/mnt/sandbox/operations/_chk")\
    .format('delta')\
    .partitionBy('date')\
    .start('/mnt/sandbox/operations')

This is working fine, but I need to summarize this information grouped by (date
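A sketch of one way to maintain such a summary: run a second stream over the Delta table and upsert the per-group aggregates with a MERGE inside foreachBatch. The summary path, the count() aggregate and the assumption that the summary table already exists are all illustrative, and the Python merge API needs a reasonably recent delta-core version:

from delta.tables import DeltaTable

summary = (spark.readStream
                .format("delta")
                .load("/mnt/sandbox/operations")
                .groupBy("date", "client_id")
                .count())

def upsert_summary(batch_df, batch_id):
    # Assumes /mnt/sandbox/operations_summary was created beforehand as a Delta table.
    target = DeltaTable.forPath(spark, "/mnt/sandbox/operations_summary")
    (target.alias("t")
           .merge(batch_df.alias("s"), "t.date = s.date AND t.client_id = s.client_id")
           .whenMatchedUpdateAll()
           .whenNotMatchedInsertAll()
           .execute())

(summary.writeStream
        .foreachBatch(upsert_summary)
        .outputMode("update")
        .option("checkpointLocation", "/mnt/sandbox/operations_summary/_chk")
        .trigger(once=True)
        .start())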

streaming aggregate not writing into sink

Submitted by 情到浓时终转凉″ on 2019-12-08 06:05:12
Question: I have to process some files which arrive daily. The information has a primary key of (date, client_id, operation_id), so I created a stream which appends only new data into a Delta table:

operations\
    .repartition('date')\
    .writeStream\
    .outputMode('append')\
    .trigger(once=True)\
    .option("checkpointLocation", "/mnt/sandbox/operations/_chk")\
    .format('delta')\
    .partitionBy('date')\
    .start('/mnt/sandbox/operations')

This is working fine, but I need to summarize this information grouped by (date
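If the symptom is specifically that a streaming aggregation never reaches the sink, note that append output mode combined with an aggregation needs a watermark before anything is emitted. One hedged alternative is complete output mode against a Delta sink, which rewrites the aggregate table on every trigger and so needs no watermark; the paths and the count() aggregate below are assumptions:

(spark.readStream
      .format("delta")
      .load("/mnt/sandbox/operations")
      .groupBy("date", "client_id")
      .count()
      .writeStream
      .format("delta")
      .outputMode("complete")
      .option("checkpointLocation", "/mnt/sandbox/operations_summary/_chk")
      .trigger(once=True)
      .start("/mnt/sandbox/operations_summary"))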