Converting a pandas DataFrame to a Spark DataFrame in Zeppelin

南旧 2021-02-04 05:16

I am new to Zeppelin. I have a use case where I have a pandas DataFrame, and I need to visualize it using Zeppelin's built-in charts, but I do not have a clear approach for this.

3 Answers
  • 2021-02-04 05:28

    I've just copied and pasted your code in a notebook and it works.

    %pyspark
    import pandas as pd
    from pyspark.sql import SQLContext
    print sc                                 # SparkContext provided by the %pyspark interpreter
    df = pd.DataFrame([("foo", 1), ("bar", 2)], columns=("k", "v"))
    print type(df)
    print df
    sqlCtx = SQLContext(sc)
    sqlCtx.createDataFrame(df).show()        # convert the pandas DataFrame to a Spark DataFrame and display it
    
    <pyspark.context.SparkContext object at 0x10b0a2b10>
    <class 'pandas.core.frame.DataFrame'>
         k  v
    0  foo  1
    1  bar  2
    +---+-+
    |  k|v|
    +---+-+
    |foo|1|
    |bar|2|
    +---+-+
    

    I am using this version: zeppelin-0.5.0-incubating-bin-spark-1.4.0_hadoop-2.3.tgz
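
    On newer Zeppelin builds that ship with Spark 2.x or later, the entry point is the SparkSession rather than SQLContext; the %pyspark interpreter exposes it as spark. A minimal sketch of the same conversion, assuming a Spark 2.x+ interpreter:

    %pyspark
    import pandas as pd

    pdf = pd.DataFrame([("foo", 1), ("bar", 2)], columns=("k", "v"))
    # `spark` is the SparkSession injected by the %pyspark interpreter on
    # Spark 2.x+; createDataFrame converts the pandas DataFrame directly.
    spark.createDataFrame(pdf).show()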

  • 2021-02-04 05:46

    The following works for me with Zeppelin 0.6.0, Spark 1.6.2 and Python 3.5.2:

    %pyspark
    import pandas as pd
    df = pd.DataFrame([("foo", 1), ("bar", 2)], columns=("k", "v"))
    # z.show() renders the Spark DataFrame with Zeppelin's built-in table/chart view
    z.show(sqlContext.createDataFrame(df))
    

    which renders as:

    [screenshot: the DataFrame rendered in Zeppelin's built-in table/chart view]
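
    As an alternative to z.show(), the converted DataFrame can be registered as a temporary table and queried from a %sql paragraph, which also renders with Zeppelin's built-in charts. A minimal sketch, assuming Spark 1.6 (the table name pandas_df is just an example):

    %pyspark
    import pandas as pd

    df = pd.DataFrame([("foo", 1), ("bar", 2)], columns=("k", "v"))
    # Register under a name so the %sql interpreter can query and chart it.
    sqlContext.createDataFrame(df).registerTempTable("pandas_df")

    A separate %sql paragraph (e.g. SELECT k, v FROM pandas_df) then shows the result with the chart toolbar.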

  • 2021-02-04 05:49

    Try setting the SPARK_HOME and PYTHONPATH variables in bash and then re-running it:

        export SPARK_HOME=/path/to/spark
        export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/build:$PYTHONPATH
        export PYTHONPATH=$SPARK_HOME/python/lib/py4j-0.8.2.1-src.zip:$PYTHONPATH
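
    After restarting Zeppelin, a quick way to check that the interpreter picked up these paths is to confirm that sc is defined; a minimal check in a %pyspark paragraph:

        %pyspark
        # If SPARK_HOME and PYTHONPATH are wired up correctly, `sc` is a live
        # SparkContext and this prints the Spark version instead of failing.
        print(sc.version)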
    