How to parse a csv that uses ^A (i.e. \001) as the delimiter with spark-csv?

盖世英雄少女心 2020-12-29 08:53

I'm terribly new to Spark, Hive, big data, Scala, and all of this. I'm trying to write a simple function that takes an sqlContext, loads a CSV file from S3, and returns a DataFrame.

2 Answers
  • 2020-12-29 09:10

    If you check the GitHub page, there is a delimiter parameter for spark-csv (as you also noted). Use it like this:

    val df = sqlContext.read
        .format("com.databricks.spark.csv")
        .option("header", "true") // Use first line of all files as header
        .option("inferSchema", "true") // Automatically infer data types
        .option("delimiter", "\u0001")
        .load("cars.csv")
    
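    Putting that together with what the question asked for, a minimal sketch of a helper that takes an sqlContext and an S3 path and returns a DataFrame could look like the following. The function name and the example path are illustrative assumptions, not part of the spark-csv API.

    import org.apache.spark.sql.{DataFrame, SQLContext}

    // Hypothetical helper: load a ^A-delimited CSV from S3 via spark-csv.
    def loadCtrlADelimited(sqlContext: SQLContext, path: String): DataFrame = {
      sqlContext.read
        .format("com.databricks.spark.csv")
        .option("header", "true")      // first line holds column names
        .option("inferSchema", "true") // let spark-csv guess column types
        .option("delimiter", "\u0001") // ^A (\001) as the field separator
        .load(path)                    // e.g. "s3n://my-bucket/data/cars.csv"
    }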
  • 2020-12-29 09:15

    With Spark 2.x and the CSV API, use the sep option:

    val df = spark.read
      .option("sep", "\u0001")
      .csv("path_to_csv_files")
    
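    As a quick usage sketch on top of that, the SparkSession setup, extra options, and path below are assumptions added for illustration; only the sep option itself comes from the answer above.

    import org.apache.spark.sql.SparkSession

    // Build (or reuse) a SparkSession; the appName is arbitrary here.
    val spark = SparkSession.builder()
      .appName("ctrl-a-csv")
      .getOrCreate()

    val df = spark.read
      .option("sep", "\u0001")        // ^A (\001) as the field separator
      .option("header", "true")       // treat the first row as column names
      .option("inferSchema", "true")  // infer column types from the data
      .csv("s3a://my-bucket/path/to/files")  // hypothetical S3 location

    df.printSchema()  // sanity check that columns were split on ^A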