Or a more specific question would be how can I process large amounts of data that do not fit into memory at once? With OFFSET I was trying to do hiveContext.sql(\"select ... lim
You code will look like
from pyspark.sql import HiveContext
hiveContext = HiveContext(sc)
hiveContext.sql(" with result as
( SELECT colunm1 ,column2,column3, ROW_NUMBER() OVER (ORDER BY columnname) AS RowNum FROM tablename )
select colunm1 ,column2,column3 from result where RowNum >= OFFSEtvalue and RowNum < (OFFSEtvalue +limtvalue ").show()
Note: Update below variables according your requirement tcolunm1 , tablename, OFFSEtvalue, limtvalue