How to use matplotlib to plot pyspark sql results

长情又很酷 2021-01-02 05:29

I am new to PySpark. I want to plot the result of a SQL query using matplotlib, but I am not sure which function to use. I searched for a way to convert the SQL result to a pandas DataFrame and then plot it.

1 answer
  • 2021-01-02 05:58

    I found a solution: convert the Spark SQL DataFrame to a pandas DataFrame with toPandas(), then plot it. Sample code is below.

    from pyspark.sql import Row
    from pyspark.sql import HiveContext
    import pyspark
    from IPython.display import display
    import matplotlib
    import matplotlib.pyplot as plt
    # Jupyter/IPython magic: render figures inline in the notebook
    %matplotlib inline

    sc = pyspark.SparkContext()
    sqlContext = HiveContext(sc)

    # Build a small Spark DataFrame from an in-memory list
    test_list = [(1, 'hasan'), (2, 'nana'), (3, 'dad'), (4, 'mon')]
    rdd = sc.parallelize(test_list)
    people = rdd.map(lambda x: Row(id=int(x[0]), name=x[1]))
    schemaPeople = sqlContext.createDataFrame(people)

    # Register it as a temp table
    sqlContext.registerDataFrameAsTable(schemaPeople, "test_table")
    df1 = sqlContext.sql("SELECT * FROM test_table")

    # Collect the query result to the driver as a pandas DataFrame, then plot
    pdf1 = df1.toPandas()
    pdf1.plot(kind='barh', x='name', y='id', colormap='winter_r')
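
    For newer Spark versions (2.0 and later), the same idea can probably be written with SparkSession instead of HiveContext. The following is only a rough sketch, not a tested replacement for the code above: the names plot_spark_result, max_rows and demo are made up for illustration, and the row cap is an assumption added because toPandas() pulls every row onto the driver.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for scripts; in a notebook use %matplotlib inline instead


def plot_spark_result(spark_df, x, y, max_rows=1000):
    """Hypothetical helper: cap the row count, convert to pandas, draw a horizontal bar chart."""
    # limit() runs in Spark, so at most max_rows rows are collected to the driver
    pdf = spark_df.limit(max_rows).toPandas()
    # pandas delegates to matplotlib and returns a matplotlib Axes object
    return pdf.plot(kind="barh", x=x, y=y, colormap="winter_r")


def demo():
    """Assumed usage with a local SparkSession (requires pyspark and a Java runtime)."""
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[1]").appName("plot-demo").getOrCreate()
    people = spark.createDataFrame(
        [(1, "hasan"), (2, "nana"), (3, "dad"), (4, "mon")], ["id", "name"]
    )
    # createOrReplaceTempView is the SparkSession-era way to register a temp table
    people.createOrReplaceTempView("test_table")
    df1 = spark.sql("SELECT * FROM test_table")

    ax = plot_spark_result(df1, x="name", y="id")
    ax.figure.savefig("people.png")  # hypothetical output file name
    spark.stop()
```

    Calling demo() should write the chart to people.png. For a large table it is usually better to aggregate or filter in SQL first and only convert the small result to pandas.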
    