Zeppelin: Scala DataFrame to Python

走了就别回头了 2020-12-05 11:46

If I have a Scala paragraph with a DataFrame, can I share it with a Python paragraph and use it there? (As I understand it, PySpark uses Py4J.)

I tried this:

Scala paragraph: <

1 Answer
  • 2020-12-05 12:29

    You can register the DataFrame as a temporary view in Scala:

    // Spark 2.x and later; in Spark 1.x use df.registerTempTable("df") instead
    df.createTempView("df")
    

    and read it in Python with SQLContext.table:

    df = sqlContext.table("df")
    

    If you really want to use put / get, you'll have to build the Python DataFrame from scratch:

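    // Scala paragraph: store the DataFrame in the ZeppelinContext
    z.put("df", df: org.apache.spark.sql.DataFrame)
    
    # Python paragraph: wrap the Java object returned by z.get
    from pyspark.sql import DataFrame
    
    df = DataFrame(z.get("df"), sqlContext)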
    

    To plot with matplotlib, you'll have to convert the DataFrame to a local Python object with either collect or toPandas:

    pdf = df.toPandas()
    

    Please note that either call fetches the data to the driver, so the result has to fit in the driver's memory.
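
Because the conversion pulls rows to the driver, it can help to cap the row count on the Spark side first. The following is a minimal sketch, not part of the original answer: the helper name `to_local_pandas` and the `max_rows` parameter are hypothetical, and it assumes `df` is a PySpark DataFrame, using only its `limit` and `toPandas` methods.

```python
def to_local_pandas(df, max_rows=10000):
    """Return at most max_rows rows of a Spark DataFrame as a pandas DataFrame.

    limit() is applied on the Spark side, so only the capped result
    is shipped to the driver by toPandas().
    max_rows is a hypothetical safety cap, not a Spark or Zeppelin setting.
    """
    if max_rows <= 0:
        raise ValueError("max_rows must be positive")
    return df.limit(max_rows).toPandas()
```

In a Python paragraph this could be called as `pdf = to_local_pandas(sqlContext.table("df"), max_rows=5000)` before handing `pdf` to matplotlib. Note that `limit` without an ordering returns an arbitrary subset, so sort or sample first if the choice of rows matters.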

    See also: Moving Spark DataFrame from Python to Scala within Zeppelin
