How to read and write a Map from/to a Parquet file in Java or Scala?

一整个雨季 2021-01-04 06:58

Looking for a concise example of how to read and write a Map from/to a Parquet file in Java or Scala.

Here is the expected structure, usin…

3 Answers
  • 2021-01-04 07:20

    Apache Drill is your answer!

    Convert to Parquet: you can use the CTAS (CREATE TABLE AS) feature in Drill. By default, Drill creates a folder of Parquet files after executing the query below. You can substitute any query, and Drill writes the output of your query into Parquet files:

    create table file_parquet as select * from dfs.`/data/file.json`;
    

    Convert from Parquet: we also use the CTAS feature here, but we ask Drill to use a different format for writing the output:

    alter session set `store.format`='json';
    create table file_json as select * from dfs.`/data/file.parquet`;
    

    Refer to http://drill.apache.org/docs/create-table-as-ctas-command/ for more information.
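
    For example, a small JSON input with a map-like field round-trips through the two CTAS statements above (the path and data here are hypothetical, just to illustrate the shape):

    ```json
    { "id": 1, "mymap": { "a": 1, "b": 2 } }
    ```

    Drill infers the nested object as a map-like struct when writing Parquet, and the second CTAS (with `store.format` set to `json`) writes it back out as nested JSON.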

  • 2021-01-04 07:24

    I'm not very familiar with Parquet, but from here:

    import java.io.File;

    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericData;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.avro.generic.GenericRecordBuilder;
    import org.apache.hadoop.fs.Path;

    import parquet.avro.AvroParquetReader;
    import parquet.avro.AvroParquetWriter;

    import com.google.common.collect.ImmutableMap;
    import com.google.common.io.Resources;

    // Parse the Avro schema describing a record with a map field.
    Schema schema = new Schema.Parser().parse(Resources.getResource("map.avsc").openStream());

    File tmp = File.createTempFile(getClass().getSimpleName(), ".tmp");
    tmp.deleteOnExit();
    tmp.delete();
    Path file = new Path(tmp.getPath());

    AvroParquetWriter<GenericRecord> writer =
        new AvroParquetWriter<GenericRecord>(file, schema);

    // Write a record with an empty map.
    ImmutableMap<String, Integer> emptyMap = new ImmutableMap.Builder<String, Integer>().build();
    GenericData.Record record = new GenericRecordBuilder(schema)
        .set("mymap", emptyMap).build();
    writer.write(record);
    writer.close();

    // Read the record back and check the map survived the round trip.
    AvroParquetReader<GenericRecord> reader = new AvroParquetReader<GenericRecord>(file);
    GenericRecord nextRecord = reader.read();

    assertNotNull(nextRecord);
    assertEquals(emptyMap, nextRecord.get("mymap"));
    

    In your situation, replace ImmutableMap (from Google's Guava library) with a standard Map, as below:

    import java.util.HashMap;
    import java.util.Map;

    Schema schema = new Schema.Parser().parse(Resources.getResource("map.avsc").openStream());

    File tmp = File.createTempFile(getClass().getSimpleName(), ".tmp");
    tmp.deleteOnExit();
    tmp.delete();
    Path file = new Path(tmp.getPath());

    AvroParquetWriter<GenericRecord> writer =
        new AvroParquetWriter<GenericRecord>(file, schema);

    // Start from a mutable map this time.
    Map<String, Object> map = new HashMap<String, Object>();

    // Not empty any more.
    map.put("SOMETHING", new SOMETHING());
    GenericData.Record record = new GenericRecordBuilder(schema)
        .set("mymap", map).build();
    writer.write(record);
    writer.close();

    AvroParquetReader<GenericRecord> reader = new AvroParquetReader<GenericRecord>(file);
    GenericRecord nextRecord = reader.read();

    assertNotNull(nextRecord);
    assertEquals(map, nextRecord.get("mymap"));
    

    I didn't test the code, but give it a try.
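
    The map.avsc file referenced above is not shown in the question; a minimal schema matching the mymap field used in the code might look like this (an assumption on my part, not taken from the original post):

    ```json
    {
      "type": "record",
      "name": "MapRecord",
      "fields": [
        { "name": "mymap", "type": { "type": "map", "values": "int" } }
      ]
    }
    ```

    Avro map keys are always strings; only the value type (here `int`) is declared in the schema.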

  • 2021-01-04 07:42

    I doubt there is a readily available solution to this. When you talk about Maps, it is still possible to create an Avro schema from one, provided the values of the map are of a primitive type, or of a complex type whose fields are in turn primitive.

    In your case,

    • If you have a Map<String, Integer>, Avro will create a schema whose map values are int.
    • If you have a Map<String, CustomObject>:
      • a. If CustomObject has fields of primitive types (int, float, char, ...), the schema generation will be valid and can then be used to successfully convert to Parquet.
      • b. If CustomObject has fields that are non-primitive, the generated schema will be malformed and the resulting ParquetWriter will fail.

    To work around this, you can convert your object to JSON and then use the Apache Spark libraries to convert it to Parquet.
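
    That route can be sketched as follows. This is a minimal, untested sketch assuming Spark SQL is on the classpath; the class name and the input/output paths are hypothetical:

    ```java
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SaveMode;
    import org.apache.spark.sql.SparkSession;

    public class JsonToParquet {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                .appName("json-to-parquet")
                .master("local[*]")
                .getOrCreate();

            // Spark infers a schema from the JSON; nested objects
            // become structs (or maps, depending on the data).
            Dataset<Row> df = spark.read().json("/data/file.json");

            // Write the inferred schema out as Parquet.
            df.write().mode(SaveMode.Overwrite).parquet("/data/file.parquet");

            spark.stop();
        }
    }
    ```

    Once serialized as JSON, the non-primitive fields that break Avro schema generation are just nested JSON objects, which Spark handles without a hand-written schema.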
