In Pyspark HiveContext what is the equivalent of SQL OFFSET?

旧时难觅i 2021-01-25 22:24

Or, a more specific question: how can I process large amounts of data that do not fit into memory at once? With OFFSET I was trying to do hiveContext.sql("select ... lim

1 Answer
  •  鱼传尺愫
     2021-01-25 22:45

    Your code will look like this:

      from pyspark.sql import HiveContext
      hiveContext = HiveContext(sc)
      hiveContext.sql("""
          WITH result AS (
              SELECT column1, column2, column3,
                     ROW_NUMBER() OVER (ORDER BY columnname) AS RowNum
              FROM tablename
          )
          SELECT column1, column2, column3
          FROM result
          WHERE RowNum >= offset_value AND RowNum < (offset_value + limit_value)
      """).show()
    

    Note: update the variables column1, columnname, tablename, offset_value, and limit_value according to your requirements. ROW_NUMBER() is 1-based, so offset_value here is the row number of the first row you want to keep.
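    The same WITH ... ROW_NUMBER() pagination pattern can be tried locally without a Spark cluster. The sketch below uses Python's built-in sqlite3 (window functions need SQLite >= 3.25) to show the windowed query; the table `tablename` and the helper `fetch_page` are illustrative assumptions, not part of the answer above.

    ```python
    import sqlite3

    # In-memory table standing in for the Hive table "tablename".
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tablename (id INTEGER, val TEXT)")
    con.executemany(
        "INSERT INTO tablename VALUES (?, ?)",
        [(i, "v%d" % i) for i in range(1, 11)],  # 10 sample rows
    )

    def fetch_page(con, offset_value, limit_value):
        # Same WITH ... ROW_NUMBER() shape as the Hive query above.
        # RowNum is 1-based, so offset_value is the first row number kept.
        query = """
            WITH result AS (
                SELECT id, val,
                       ROW_NUMBER() OVER (ORDER BY id) AS RowNum
                FROM tablename
            )
            SELECT id, val FROM result
            WHERE RowNum >= ? AND RowNum < ?
        """
        return con.execute(
            query, (offset_value, offset_value + limit_value)
        ).fetchall()

    # Walk the table one page at a time instead of loading it all at once.
    page = fetch_page(con, 4, 3)  # rows with RowNum 4, 5, 6
    ```

    In Spark you would substitute the offset and limit into the SQL string (or use parameterized queries where your version supports them) and call .show() or write each page out, keeping only one window of rows in the driver at a time.
    
    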
