I want to use parquet in one of my projects as columnar storage. But i dont want to depends on hadoop/hdfs libs. Is it possible to use parquet outside of hdfs? Or What is th
What type of data do you have in Parquet? You don't require HDFS to read Parquet files. It is definitely not a pre-requisite. We use parquet files at Incorta for our staging tables. We do not ship with a dependency on HDFS, however, you can store the files on HDFS if you want. Obviously, we at Incorta can read directly from the parquet files, but you can also use Apache Drill to connect, use file:/// as the connection and not hdfs:/// See below for an example.
To read or write Parquet data, you need to include the Parquet format in the storage plugin format definitions. The dfs plugin definition includes the Parquet format.
{
"type" : "file",
"enabled" : true,
"connection" : "file:///",
"workspaces" : {
"json_files" : {
"location" : "/incorta/tenants/demo//drill/json/",
"writable" : false,
"defaultInputFormat" : json
}
},