For a requirement I want to convert a delimited text file to ORC (Optimized Row Columnar) format, and I have to run the conversion at regular intervals.
You can insert text data into an ORC table with a command like this:
insert overwrite table orcTable select * from textTable;
The first table, orcTable, is created by the following command:
create table orcTable(name string, city string) stored as orc;
And textTable has the same structure as orcTable.
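For completeness, textTable could be defined as a plain delimited table along the following lines; the tab delimiter and the load path are assumptions here, so adjust them to your actual file:

create table textTable(name string, city string)
row format delimited fields terminated by '\t'
stored as textfile;

-- hypothetical path; point it at your delimited file
load data local inpath '/path/to/input.txt' into table textTable;

Since this has to run at regular intervals, you can simply re-run the load and the insert statements for each new file (use insert into instead of insert overwrite if you want to append rather than replace the existing rows).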
You can use Spark DataFrames to convert a delimited file to ORC format very easily. You can also impose a schema and keep only specific columns (see the sketch after the dependency note below).
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.hive.HiveContext;

public class OrcConvert {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("OrcConvert");
        JavaSparkContext jsc = new JavaSparkContext(conf);
        HiveContext hiveContext = new HiveContext(jsc);

        String inputPath = args[0];
        String outputPath = args[1];

        // Read the delimited input with the spark-csv package;
        // \001 (Ctrl-A) is the field delimiter here.
        DataFrame inputDf = hiveContext.read().format("com.databricks.spark.csv")
                .option("quote", "'").option("delimiter", "\001")
                .load(inputPath);

        // Write the same data back out in ORC format.
        inputDf.write().orc(outputPath);
    }
}
Make sure all dependencies are met and that Hive is running, since the example uses HiveContext; at the time of writing, Spark supports the ORC format only through HiveContext.
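As mentioned above, you can also impose a schema and keep only specific columns. Here is a minimal sketch of that variant; the class name OrcConvertWithSchema and the columns name, city, and age are just assumptions for illustration:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.hive.HiveContext;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

public class OrcConvertWithSchema {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("OrcConvertWithSchema");
        JavaSparkContext jsc = new JavaSparkContext(conf);
        HiveContext hiveContext = new HiveContext(jsc);

        // Assumed column names and types; replace them with those of your file.
        StructType schema = DataTypes.createStructType(new StructField[] {
                DataTypes.createStructField("name", DataTypes.StringType, true),
                DataTypes.createStructField("city", DataTypes.StringType, true),
                DataTypes.createStructField("age", DataTypes.IntegerType, true)
        });

        DataFrame inputDf = hiveContext.read().format("com.databricks.spark.csv")
                .schema(schema) // impose the schema instead of inferring it
                .option("delimiter", "\001")
                .load(args[0]);

        // Keep only the columns you need before writing the ORC output.
        inputDf.select("name", "city").write().orc(args[1]);

        jsc.stop();
    }
}

Imposing the schema gives you typed columns without an extra pass over the data for inference, and dropping unused columns before the write keeps the ORC output smaller.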