If I have a Scala paragraph with a DataFrame, can I share it with Python and use it there? (As I understand it, PySpark uses Py4J.)
I tried this:
Scala paragraph: <
You can register the DataFrame as a temporary table in Scala:
// Spark 2.x and later; in Spark 1.x use df.registerTempTable("df")
df.createTempView("df")
and read it in Python with SQLContext.table:
df = sqlContext.table("df")
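The Python paragraph only sees the table once the Scala paragraph has run. A minimal sketch of a guarded read, assuming the `sqlContext` object that Zeppelin provides in Python paragraphs; `read_shared_table` is a hypothetical helper name, not part of Zeppelin or Spark:

```python
def read_shared_table(sql_context, name):
    """Return the table registered as `name`, or fail with a clear message.

    `read_shared_table` is a hypothetical helper. It relies only on the
    SQLContext methods `tableNames` and `table`.
    """
    if name not in sql_context.tableNames():
        raise ValueError(
            "table %r is not registered; run the Scala paragraph first" % name
        )
    return sql_context.table(name)

# In a Zeppelin Python paragraph (assumes the provided `sqlContext`):
# df = read_shared_table(sqlContext, "df")
```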
If you really want to use put / get, you'll have to build the Python DataFrame from scratch:
// Scala paragraph
z.put("df", df: org.apache.spark.sql.DataFrame)
# Python paragraph
from pyspark.sql import DataFrame
df = DataFrame(z.get("df"), sqlContext)
To plot with matplotlib you'll have to convert the DataFrame to a local Python object with either collect or toPandas:
pdf = df.toPandas()
Please note that this fetches all of the data to the driver, so the result has to fit in the driver's memory.
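Since everything ends up on the driver, it can help to cap the row count before converting. A rough sketch, assuming `df` is a PySpark DataFrame like the one above; the helper name `to_local_pandas` and the `max_rows` default are made up for illustration:

```python
def to_local_pandas(df, max_rows=10000):
    """Collect at most `max_rows` rows of a Spark DataFrame into pandas.

    `to_local_pandas` and the default of 10000 are illustrative assumptions.
    `limit` and `toPandas` are the DataFrame methods being used.
    """
    return df.limit(max_rows).toPandas()

# In a Zeppelin Python paragraph:
# pdf = to_local_pandas(df, max_rows=1000)
# pdf.plot()  # pandas plotting, which draws with matplotlib
```

Note that limit keeps whichever rows Spark returns first, not a random sample, so the plot may not be representative of the full data.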
See also: Moving Spark DataFrame from Python to Scala within Zeppelin