In PySpark HiveContext, what is the equivalent of SQL OFFSET?


旧时难觅i 2021-01-25 22:24

Or, more specifically: how can I process large amounts of data that do not fit into memory at once? With OFFSET I was trying to do hiveContext.sql("select ... lim

1 answer

鱼传尺愫 (original poster)

2021-01-25 22:45

Your code will look like this:

from pyspark.sql import HiveContext
hiveContext = HiveContext(sc)  # sc: an existing SparkContext
# ROW_NUMBER() starts at 1, so skipping OFFSETvalue rows means RowNum > OFFSETvalue
hiveContext.sql("""
    WITH result AS
    (SELECT column1, column2, column3, ROW_NUMBER() OVER (ORDER BY columnname) AS RowNum FROM tablename)
    SELECT column1, column2, column3 FROM result
    WHERE RowNum > OFFSETvalue AND RowNum <= (OFFSETvalue + limitvalue)
""").show()

Note: Replace the placeholders column1, column2, column3, columnname, tablename, OFFSETvalue, and limitvalue with the values that match your requirements.
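To go through a large table page by page, the ROW_NUMBER() query can be generated once per offset. The following is only a minimal sketch under stated assumptions: `build_page_query` and `page_offsets` are hypothetical helper names that do not come from PySpark, and the table and column names are the same placeholders used in the answer. It treats the offset as the number of rows to skip, so it filters on RowNum > offset because ROW_NUMBER() starts at 1.

```python
def build_page_query(table, columns, order_by, offset, limit):
    """Return a Spark SQL string that emulates `LIMIT limit OFFSET offset`.

    Hypothetical helper: names are interpolated as-is, so only pass
    trusted identifiers.
    """
    cols = ", ".join(columns)
    return (
        "WITH result AS ("
        "SELECT {cols}, ROW_NUMBER() OVER (ORDER BY {order_by}) AS RowNum "
        "FROM {table}) "
        "SELECT {cols} FROM result "
        "WHERE RowNum > {offset} AND RowNum <= {upper}"
    ).format(cols=cols, order_by=order_by, table=table,
             offset=offset, upper=offset + limit)


def page_offsets(total_rows, page_size):
    """Return the offset of every page needed to cover total_rows rows."""
    return range(0, total_rows, page_size)


# Usage with the hiveContext from the answer (not executed here):
# total = hiveContext.sql("SELECT COUNT(*) AS n FROM tablename").collect()[0]["n"]
# for offset in page_offsets(total, 10000):
#     query = build_page_query("tablename", ["column1", "column2"],
#                              "columnname", offset, 10000)
#     hiveContext.sql(query).show()
```

Keep in mind that a window with no PARTITION BY clause makes Spark move all rows into a single partition to number them, and each page query repeats that work, so this pattern may be slow on very large tables.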
