Enum value implementing Writable interface of Hadoop

孤独总比滥情好 2021-01-19 03:17

Suppose I have an enumeration:

public enum SomeEnumType implements Writable {
  A(0), B(1);

  private int value;

  private SomeEnumType(int value) {
    th         


        
3 answers
  • 2021-01-19 04:04

    My normal and preferred solution for enums in Hadoop is serializing the enums through their ordinal value.

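    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;

    import org.apache.hadoop.io.Writable;

    public class EnumWritable implements Writable {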
    
        static enum EnumName {
            ENUM_1, ENUM_2, ENUM_3
        }
    
        private int enumOrdinal;
    
        // never forget your default constructor in Hadoop Writables
        public EnumWritable() {
        }
    
        public EnumWritable(Enum<?> arbitraryEnum) {
            this.enumOrdinal = arbitraryEnum.ordinal();
        }
    
        public int getEnumOrdinal() {
            return enumOrdinal;
        }
    
        @Override
        public void readFields(DataInput in) throws IOException {
            enumOrdinal = in.readInt();
        }
    
        @Override
        public void write(DataOutput out) throws IOException {
            out.writeInt(enumOrdinal);
        }
    
        public static void main(String[] args) {
            // use it like this:
            EnumWritable enumWritable = new EnumWritable(EnumName.ENUM_1);
            // let Hadoop do the write and read stuff
            EnumName yourDeserializedEnum = EnumName.values()[enumWritable.getEnumOrdinal()];
        }
    
    }
    

    Obviously this has a drawback: ordinals can change. If you swap ENUM_2 and ENUM_3 and then read a previously serialized file, you will silently get the wrong constant back.
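To make the ordinal problem concrete, here is a minimal sketch that uses only `java.io` (no Hadoop on the classpath). `ColorV1` and `ColorV2` are hypothetical names standing in for the same enum before and after two constants were swapped; the helper methods mirror the `writeInt`/`readInt` calls from the class above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical "before" and "after" versions of one enum (not from the question).
enum ColorV1 { RED, GREEN, BLUE }
enum ColorV2 { RED, BLUE, GREEN } // GREEN and BLUE were swapped

class OrdinalDriftDemo {

    // Serialize only the ordinal, as write() in the answer does.
    static byte[] writeOrdinal(Enum<?> value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(value.ordinal());
        }
        return bytes.toByteArray();
    }

    // Read the ordinal back, as readFields() in the answer does.
    static int readOrdinal(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return in.readInt();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] stored = writeOrdinal(ColorV1.GREEN); // written by the old code
        int ordinal = readOrdinal(stored);

        System.out.println(ColorV1.values()[ordinal]); // prints GREEN
        System.out.println(ColorV2.values()[ordinal]); // prints BLUE: wrong constant, no error
    }
}
```

Nothing fails at read time, which is what makes this kind of bug hard to spot in old files.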

    So if you know the enum class beforehand, you can write the name of your enum and use it like this:

     enumInstance = EnumName.valueOf(in.readUTF());
    

    This uses slightly more space, but it is safe against reordering or inserting enum constants. Renaming a constant will still break previously written data.

    The full example would look like this:

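    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;

    import org.apache.hadoop.io.Writable;

    public class EnumWritable implements Writable {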
    
        static enum EnumName {
            ENUM_1, ENUM_2, ENUM_3
        }
    
        private EnumName enumInstance;
    
        // never forget your default constructor in Hadoop Writables
        public EnumWritable() {
        }
    
        public EnumWritable(EnumName e) {
            this.enumInstance = e;
        }
    
        public EnumName getEnum() {
            return enumInstance;
        }
    
        @Override
        public void write(DataOutput out) throws IOException {
            out.writeUTF(enumInstance.name());
        }
    
        @Override
        public void readFields(DataInput in) throws IOException {
            enumInstance = EnumName.valueOf(in.readUTF());
        }
    
        public static void main(String[] args) {
            // use it like this:
            EnumWritable enumWritable = new EnumWritable(EnumName.ENUM_1);
            // let Hadoop do the write and read stuff
            EnumName yourDeserializedEnum = enumWritable.getEnum();
    
        }
    
    }
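One thing the name-based version leaves open is what happens when the stream contains a name the enum no longer has: `valueOf` throws an unchecked `IllegalArgumentException`. The sketch below shows one possible way to turn that into an `IOException`. It is plain `java.io` so it runs without Hadoop; `JobState` and `NameCodec` are hypothetical names, not part of the answer above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical enum used only for this illustration.
enum JobState { PENDING, RUNNING, DONE }

class NameCodec {

    // Write the constant's name, as write() in the full example does.
    static void writeState(DataOutput out, JobState state) throws IOException {
        out.writeUTF(state.name());
    }

    // Read the name back and report unknown names as an I/O problem
    // instead of letting IllegalArgumentException escape.
    static JobState readState(DataInput in) throws IOException {
        String name = in.readUTF();
        try {
            return JobState.valueOf(name);
        } catch (IllegalArgumentException e) {
            throw new IOException("Unknown JobState constant: " + name, e);
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            writeState(out, JobState.RUNNING);
        }
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            System.out.println(readState(in)); // prints RUNNING
        }
    }
}
```

Whether an unknown name should be an error or should map to some default constant depends on your data; this sketch simply picks the strict option.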
    
  • 2021-01-19 04:05

    WritableUtils has convenience methods that make this easier.

    WritableUtils.writeEnum(dataOutput, enumData);
    enumData = WritableUtils.readEnum(dataInput, MyEnum.class);
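For readers without Hadoop at hand, here is a rough stand-in for that pair of helpers, written against plain `java.io`. As far as I know `WritableUtils` also serializes the constant's name, but this sketch uses `writeUTF`/`readUTF`, so treat it as an illustration of the idea and not as wire-compatible with Hadoop. `EnumIo` and `Compression` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical enum used only for this illustration.
enum Compression { NONE, GZIP, SNAPPY }

class EnumIo {

    // Works for any enum: only the constant's name goes on the wire.
    static void writeEnum(DataOutput out, Enum<?> value) throws IOException {
        out.writeUTF(value.name());
    }

    // The caller supplies the enum class, so one method serves every enum type.
    static <T extends Enum<T>> T readEnum(DataInput in, Class<T> type) throws IOException {
        return Enum.valueOf(type, in.readUTF());
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            writeEnum(out, Compression.SNAPPY);
        }
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            Compression restored = readEnum(in, Compression.class);
            System.out.println(restored); // prints SNAPPY
        }
    }
}
```

The generic signature is the main point: passing `MyEnum.class` is what lets a single helper replace a hand-written `valueOf` call per enum.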
    
  • 2021-01-19 04:13

    I don't know anything about Hadoop, but based on the documentation of the interface, you could probably do it like this:

    public void readFields(DataInput in) throws IOException {
         // do nothing
    }
    
    public static SomeEnumType read(DataInput in) throws IOException {
        int value = in.readInt();
        if (value == 0) {
            return SomeEnumType.A;
        }
        else if (value == 1) {
            return SomeEnumType.B;
        }
        else {
            throw new IOException("Invalid value " + value);
        }
    }
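The static `read` is needed because enum constants are fixed singletons: `readFields` cannot turn the constant `A` into `B`, so deserialization has to return a different constant instead. Here is a self-contained sketch of that idea with a matching `write`. The question's constructor body is cut off, so `this.value = value` is my assumption, and the enum deliberately does not implement Hadoop's `Writable` so that it compiles with `java.io` alone.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

enum SomeEnumType {
    A(0), B(1);

    private final int value;

    // Assumed constructor body; the original question is truncated here.
    SomeEnumType(int value) {
        this.value = value;
    }

    // Write the explicit value, not the ordinal, so reordering constants is harmless.
    void write(DataOutput out) throws IOException {
        out.writeInt(value);
    }

    // Look the constant up by its stored value instead of hard-coding each case.
    static SomeEnumType read(DataInput in) throws IOException {
        int value = in.readInt();
        for (SomeEnumType candidate : values()) {
            if (candidate.value == value) {
                return candidate;
            }
        }
        throw new IOException("Invalid value " + value);
    }
}

class SomeEnumTypeDemo {
    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            SomeEnumType.B.write(out);
        }
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            System.out.println(SomeEnumType.read(in)); // prints B
        }
    }
}
```

The loop replaces the `if`/`else if` chain, so adding a constant `C(2)` needs no change to `read`.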
    