GUI or CLI to create parquet file

Posted by 你。 on 2021-01-29 13:14:38

Question


I want to give the people I work with a tool for creating Parquet files to be used in unit tests of modules that read and process such files.

I use ParquetViewer to view the content of Parquet files, but I would also like a tool for creating (sample) Parquet files. Is there a tool that creates Parquet files through a GUI or, failing that, a practical CLI?

Note: I would prefer a cross-platform solution, but failing that I am looking for a Windows/MinGW solution that I can use at work, where I cannot choose the OS :\


Answer 1:


parquet-cli, which is written in Java, can convert a CSV file to Parquet.

(This example was run on Windows.)

test.csv contains:

emp_id,dept_id,name,created_at,updated_at
1,1,"test1","2019-02-17 10:00:00","2019-02-17 12:00:00"
2,2,"test2","2019-02-17 10:00:00","2019-02-17 12:00:00"
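If the sample CSV needs to be regenerated for each test run, it could be written by a small script instead of by hand. The following is only a sketch using Python's standard csv module; the function name write_sample_csv and the constants HEADER and ROWS are made up for illustration and are not part of parquet-cli.

```python
import csv
from pathlib import Path

# Column names and rows copied from the test.csv sample above.
HEADER = ["emp_id", "dept_id", "name", "created_at", "updated_at"]
ROWS = [
    [1, 1, "test1", "2019-02-17 10:00:00", "2019-02-17 12:00:00"],
    [2, 2, "test2", "2019-02-17 10:00:00", "2019-02-17 12:00:00"],
]


def write_sample_csv(path):
    """Write the sample rows to `path` as CSV and return it as a Path.

    Hypothetical helper. The csv module only quotes fields when needed,
    so the strings are written unquoted; a CSV reader treats them the
    same as the quoted values in the hand-written sample.
    """
    path = Path(path)
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(HEADER)
        writer.writerows(ROWS)
    return path
```

Calling write_sample_csv("test.csv") would produce a file that can then be fed to the conversion step.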

On Windows, it requires winutils. Download it and set the HADOOP_HOME environment variable.

$ set HADOOP_HOME=D:\development\hadoop

Clone parquet-mr, build all modules, and run the 'convert-csv' command of parquet-cli.

$ cd parquet-cli
$ java -cp target/classes;target/dependency/* org.apache.parquet.cli.Main convert-csv C:\Users\foo\Downloads\test.csv -o C:\Users\foo\Downloads\test-csv.parquet
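The command above uses ';' as the classpath separator, which is the Windows convention; on Linux and macOS Java expects ':'. Since the question asks for something cross-platform, the command line could be assembled by a small wrapper that picks the separator for the current OS. This is a sketch only; build_parquet_cli_command is a hypothetical helper, and the classpath entries and the '-o' option are taken from the command shown above.

```python
import os


def build_parquet_cli_command(
    subcommand,
    *args,
    classpath_entries=("target/classes", "target/dependency/*"),
):
    """Return the argv list for running parquet-cli from its build directory.

    Hypothetical helper. os.pathsep is ';' on Windows and ':' elsewhere,
    which matches the separator Java expects in -cp on each platform.
    """
    classpath = os.pathsep.join(classpath_entries)
    return [
        "java",
        "-cp",
        classpath,
        "org.apache.parquet.cli.Main",
        subcommand,
        *args,
    ]


# Example: the same conversion as above, with placeholder file names.
cmd = build_parquet_cli_command(
    "convert-csv", "test.csv", "-o", "test-csv.parquet"
)
```

The resulting list could be passed to subprocess.run(cmd, check=True, cwd="parquet-cli"), assuming parquet-cli has been built there. Java expands the '*' in the classpath itself, so no shell globbing is needed.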

The 'cat' command shows the content of that Parquet file.

$ java -cp target/classes;target/dependency/* org.apache.parquet.cli.Main cat C:\Users\foo\Downloads\test-csv.parquet
{"emp_id": 1, "dept_id": 1, "name": "test1", "created_at": "2019-02-17 10:00:00", "updated_at": "2019-02-17 12:00:00"}
{"emp_id": 2, "dept_id": 2, "name": "test2", "created_at": "2019-02-17 10:00:00", "updated_at": "2019-02-17 12:00:00"}
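Because 'cat' prints one JSON object per row, its output can be checked in a unit test without a Parquet reader. A minimal sketch, assuming the output has been captured as text; parse_cat_output is a hypothetical helper and SAMPLE_OUTPUT simply repeats the two lines shown above.

```python
import json

# The two output lines printed by the 'cat' command above.
SAMPLE_OUTPUT = """\
{"emp_id": 1, "dept_id": 1, "name": "test1", "created_at": "2019-02-17 10:00:00", "updated_at": "2019-02-17 12:00:00"}
{"emp_id": 2, "dept_id": 2, "name": "test2", "created_at": "2019-02-17 10:00:00", "updated_at": "2019-02-17 12:00:00"}
"""


def parse_cat_output(text):
    """Parse one JSON object per non-empty line into a list of dicts."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


records = parse_cat_output(SAMPLE_OUTPUT)
```

Each record is a plain dict, so a test can compare it directly with the rows that went into test.csv.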


Source: https://stackoverflow.com/questions/57560049/gui-or-cli-to-create-parquet-file
