I am new to Zeppelin. I have a use case in which I have a pandas DataFrame, and I need to visualize it using Zeppelin's built-in charts. I do not have a clear approach here.
I've just copied and pasted your code into a notebook and it works.
%pyspark
import pandas as pd
from pyspark.sql import SQLContext

print(sc)  # the SparkContext that Zeppelin provides to the paragraph

# Build a small pandas DataFrame
df = pd.DataFrame([("foo", 1), ("bar", 2)], columns=("k", "v"))
print(type(df))
print(df)

# Convert it to a Spark DataFrame and display it
sqlCtx = SQLContext(sc)
sqlCtx.createDataFrame(df).show()
<pyspark.context.SparkContext object at 0x10b0a2b10>
<class 'pandas.core.frame.DataFrame'>
     k  v
0  foo  1
1  bar  2
+---+-+
| k|v|
+---+-+
|foo|1|
|bar|2|
+---+-+
I am using this version: zeppelin-0.5.0-incubating-bin-spark-1.4.0_hadoop-2.3.tgz
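If you want Zeppelin's built-in charts rather than the plain-text output of .show(), one route (a sketch, assuming Zeppelin's %table display system, where any paragraph output starting with %table is rendered with the chart controls) is to print the pandas DataFrame in that tab-separated format yourself:

```python
import pandas as pd

df = pd.DataFrame([("foo", 1), ("bar", 2)], columns=("k", "v"))

# Zeppelin renders output that begins with "%table" using its built-in
# table/chart display: columns are tab-separated, rows newline-separated.
table = "%table " + "\t".join(df.columns) + "\n" + \
        "\n".join("\t".join(str(v) for v in row) for row in df.values)
print(table)
```

This avoids going through Spark entirely, which may be simpler when the data already fits in a pandas DataFrame.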
The following works for me with Zeppelin 0.6.0, Spark 1.6.2 and Python 3.5.2:
%pyspark
import pandas as pd

df = pd.DataFrame([("foo", 1), ("bar", 2)], columns=("k", "v"))

# z is Zeppelin's ZeppelinContext; z.show() renders a Spark DataFrame
# with the built-in table/chart display
z.show(sqlContext.createDataFrame(df))
which renders the DataFrame in Zeppelin's built-in table/chart view.
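If you prefer the interactive chart controls of a %sql paragraph, a common pattern (a sketch; the temp-table name pandas_df is my own choice) is to register the converted DataFrame as a temporary table and query it from a separate paragraph:

```
%pyspark
sqlContext.createDataFrame(df).registerTempTable("pandas_df")
```

```
%sql
select * from pandas_df
```

The %sql paragraph's result is automatically rendered with Zeppelin's built-in charts.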
Try setting the SPARK_HOME and PYTHONPATH variables in bash, then rerun it:
export SPARK_HOME=/path/to/spark
export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/build:$PYTHONPATH
export PYTHONPATH=$SPARK_HOME/python/lib/py4j-0.8.2.1-src.zip:$PYTHONPATH