How to remove header and footer from a DataFrame?

小蘑菇 2021-01-24 07:23

I am reading a text (not CSV) file that has a header, content, and a footer using

spark.read.format("text").option("delimiter","|")...load(file)
4 answers
  •  被撕碎了的回忆
    2021-01-24 08:11

    Assuming your text file has a JSON header and footer, here is a Spark SQL approach.

    Sample Data

    {"":[{:},{:}]}
    

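    Here the header and footer can be stripped with the following three lines (assuming there is no tilde, ~, in the data). Using ~ as the delimiter makes Spark read each whole line into a single column, _c0: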

    jsonToCsvDF=spark.read.format("com.databricks.spark.csv").option("delimiter", "~").load()
    
    jsonToCsvDF.createOrReplaceTempView("json_to_csv")
    
    spark.sql("SELECT SUBSTR(`_c0`,5,length(`_c0`)-5) FROM json_to_csv").coalesce(1).write.option("header","false").mode("overwrite").text()
    

    The output will then look like:

    [{:},{:}]
    

    Hope it helps.
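
To check the SUBSTR offsets used in the answer above, here is a minimal plain-Python sketch that mimics Spark SQL's 1-based SUBSTR on the sample line. The helper name spark_substr is hypothetical and only handles positive start positions; the offsets 5 and length-5 are tied to the sample's 4-character prefix and 1-character suffix, so a different header or footer would presumably need different values.

```python
def spark_substr(s: str, pos: int, length: int) -> str:
    """Hypothetical helper: mimic Spark SQL SUBSTR for positive, 1-based positions."""
    start = pos - 1  # convert the 1-based SQL position to a 0-based Python index
    return s[start:start + length]


# Sample line from the answer: a 4-character prefix {"": and a 1-character suffix }
sample = '{"":[{:},{:}]}'

# Same arguments as SUBSTR(`_c0`, 5, length(`_c0`) - 5) in the query
body = spark_substr(sample, 5, len(sample) - 5)
print(body)
```

Starting at position 5 skips the four prefix characters, and taking length-5 characters leaves out those four plus the single closing brace.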
