How to convert .txt / .csv file to ORC format

花落未央 2020-12-31 22:33

For a requirement I need to convert a delimited text file to ORC (Optimized Row Columnar) format, and I have to run the conversion at regular intervals.

2 Answers
  • 2020-12-31 22:54

    You can insert text data into an ORC table with a command such as:

    insert overwrite table orcTable select * from textTable;
    

    The first table, orcTable, is created by the following command:

    create table orcTable(name string, city string) stored as orc;
    

    And textTable has the same structure as orcTable; it can be created over the delimited input file as sketched below.
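
    If textTable does not already exist, a minimal sketch of creating it on top of a delimited file could look like the following (the comma delimiter and the HDFS path are assumptions for illustration; adjust them to your file):

    -- text table with the same columns as orcTable, reading a comma-delimited file
    create table textTable(name string, city string)
    row format delimited
    fields terminated by ','
    stored as textfile;

    -- load the delimited file into the text table (the path is hypothetical)
    load data inpath '/data/input.csv' into table textTable;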

  • 2020-12-31 23:17

    You can use Spark DataFrames to convert a delimited file to ORC format very easily. You can also impose a schema and select only specific columns.

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.sql.DataFrame;
    import org.apache.spark.sql.hive.HiveContext;

    public class OrcConvert {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf().setAppName("OrcConvert");
            JavaSparkContext jsc = new JavaSparkContext(conf);
            HiveContext hiveContext = new HiveContext(jsc);

            String inputPath = args[0];
            String outputPath = args[1];

            // Read the delimited input file via the spark-csv data source
            // (the delimiter here is the \001 / ^A control character).
            DataFrame inputDf = hiveContext.read().format("com.databricks.spark.csv")
                    .option("quote", "'").option("delimiter", "\001")
                    .load(inputPath);

            // Write the DataFrame out as ORC files.
            inputDf.write().orc(outputPath);
        }
    }
    

    Make sure all dependencies are met and that Hive is running, since HiveContext is used; currently, ORC format in Spark is only supported through HiveContext.
