Question
I want to give the people I work with a tool for creating Parquet files to be used in unit tests of modules that read and process such files.
I use ParquetViewer to view the contents of Parquet files, but I would like a tool for making (sample) Parquet files. Is there a tool for creating Parquet files with a GUI, or failing that, a practical CLI?
Note: I would prefer a cross-platform solution, but failing that I am looking for a Windows/MinGW solution that I can use at work, where I cannot choose the OS :\
Answer 1:
parquet-cli, which is written in Java, can convert CSV to Parquet.
(This example is on Windows.)
test.csv contains:
emp_id,dept_id,name,created_at,updated_at
1,1,"test1","2019-02-17 10:00:00","2019-02-17 12:00:00"
2,2,"test2","2019-02-17 10:00:00","2019-02-17 12:00:00"
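If several sample inputs are needed for unit tests, a CSV in the same shape as test.csv can be generated rather than typed by hand. The following is a minimal sketch using only the Python standard library; the helper names (make_rows, to_csv_text) and the pattern used to fill each row are assumptions made for illustration, not part of parquet-cli.

```python
import csv
import io

# Column names taken from the test.csv sample above.
HEADER = ["emp_id", "dept_id", "name", "created_at", "updated_at"]


def make_rows(count):
    """Build `count` sample rows (hypothetical fill pattern: ids count up from 1)."""
    return [
        [i, i, f"test{i}", "2019-02-17 10:00:00", "2019-02-17 12:00:00"]
        for i in range(1, count + 1)
    ]


def to_csv_text(rows):
    """Render rows as CSV text: unquoted header, quoted strings, bare integers."""
    buf = io.StringIO()
    # Write the header by hand so it stays unquoted, as in the sample file.
    buf.write(",".join(HEADER) + "\n")
    # QUOTE_NONNUMERIC quotes string fields and leaves integers unquoted.
    writer = csv.writer(buf, quoting=csv.QUOTE_NONNUMERIC, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    # Write a two-row file equivalent to the sample shown above.
    with open("test.csv", "w", newline="") as f:
        f.write(to_csv_text(make_rows(2)))
```

Changing the argument to make_rows gives larger inputs for the same conversion step.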
It requires winutils on Windows. Download it and set the HADOOP_HOME environment variable.
$ set HADOOP_HOME=D:\development\hadoop
Clone parquet-mr, build everything, and run the 'convert-csv' command of parquet-cli.
$ cd parquet-cli
$ java -cp target/classes;target/dependency/* org.apache.parquet.cli.Main convert-csv C:\Users\foo\Downloads\test.csv -o C:\Users\foo\Downloads\test-csv.parquet
The 'cat' command shows the contents of the resulting Parquet file.
$ java -cp target/classes;target/dependency/* org.apache.parquet.cli.Main cat C:\Users\foo\Downloads\test-csv.parquet
{"emp_id": 1, "dept_id": 1, "name": "test1", "created_at": "2019-02-17 10:00:00", "updated_at": "2019-02-17 12:00:00"}
{"emp_id": 2, "dept_id": 2, "name": "test2", "created_at": "2019-02-17 10:00:00", "updated_at": "2019-02-17 12:00:00"}
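The two java invocations above use ';' as the classpath separator, which is Windows-specific; on Linux or macOS the separator is ':'. Since the question asks for something cross-platform, here is a hedged sketch of a small Python wrapper that builds the same commands with the separator for the current OS. The function names are hypothetical, and the sketch assumes it is run from the parquet-cli directory after the build, with the same target/classes and target/dependency layout as in the commands above.

```python
import os
import subprocess

# Main class and subcommands as used in the commands above.
MAIN_CLASS = "org.apache.parquet.cli.Main"


def classpath(sep=os.pathsep):
    """Join the classpath entries with the OS separator (';' on Windows, ':' elsewhere)."""
    return sep.join(["target/classes", "target/dependency/*"])


def convert_csv_command(csv_path, parquet_path, sep=os.pathsep):
    """Argument list for: convert-csv <csv_path> -o <parquet_path>."""
    return ["java", "-cp", classpath(sep), MAIN_CLASS,
            "convert-csv", csv_path, "-o", parquet_path]


def cat_command(parquet_path, sep=os.pathsep):
    """Argument list for: cat <parquet_path>."""
    return ["java", "-cp", classpath(sep), MAIN_CLASS, "cat", parquet_path]


if __name__ == "__main__":
    # Passing a list avoids shell globbing, so java itself expands the
    # 'target/dependency/*' classpath wildcard.
    subprocess.run(convert_csv_command("test.csv", "test-csv.parquet"), check=True)
    subprocess.run(cat_command("test-csv.parquet"), check=True)
```

On Windows, winutils and HADOOP_HOME are still needed as described above; the wrapper only removes the separator difference.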
Source: https://stackoverflow.com/questions/57560049/gui-or-cli-to-create-parquet-file