How does Hive store data, and what is a SerDe?

执笔经年 2021-02-04 12:50

When querying a table, a SerDe deserializes a row of data from the bytes in the file into objects that Hive uses internally to operate on that row. When ...

4 answers
  •  北恋 (original poster)
    2021-02-04 13:45

    Answers

    1. Yes, SerDe is a library built into Hive (its classes live under the org.apache.hadoop.hive.serde2 package), not into the core Hadoop API.
    2. Hive stores its data in a file system such as HDFS, or in other storage (for example, FTP). The data is organized as tables, which have rows and columns.
    3. SerDe stands for Serializer/Deserializer. It tells Hive how to process a record (row). This also lets Hive process semi-structured records (XML, email, etc.) and unstructured records (audio, video, etc.). For example, if you have 1000 GB of RSS feeds (RSS XML files), you can ingest them into a location in HDFS. You would then need to write a custom SerDe based on your XML structure so that Hive knows how to load the XML files into Hive tables, or the other way around.
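
    To make point 3 a bit more concrete, here is a rough Python sketch of what a SerDe conceptually does for RSS items. This is only an illustration of the idea, not Hive's API: a real custom SerDe is a Java class plugged into Hive. The column names (title, link, pub_date) and the functions deserialize and serialize are made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical table columns; not taken from the original post.
COLUMNS = ("title", "link", "pub_date")

# Maps each hypothetical column to the RSS child element it is read from.
XML_TAGS = {"title": "title", "link": "link", "pub_date": "pubDate"}


def deserialize(record: bytes) -> tuple:
    """Turn the raw bytes of one <item> element into a row (one value per column)."""
    item = ET.fromstring(record)
    # findtext returns None for a missing element, which stands in for NULL here.
    return tuple(item.findtext(XML_TAGS[col]) for col in COLUMNS)


def serialize(row: tuple) -> bytes:
    """Turn a row back into the bytes of one <item> element."""
    item = ET.Element("item")
    for col, value in zip(COLUMNS, row):
        if value is not None:  # skip NULL columns instead of writing empty tags
            ET.SubElement(item, XML_TAGS[col]).text = value
    return ET.tostring(item, encoding="utf-8")


if __name__ == "__main__":
    raw = b"<item><title>Hello</title><link>http://example.com/1</link></item>"
    row = deserialize(raw)   # read path: bytes in the file -> row object
    print(row)
    print(serialize(row))    # write path: row object -> bytes for the file
```

    The read path (a query) goes through deserialize, and the write path (an insert) goes through serialize. In Hive the same split exists, except that the storage format handles finding record boundaries in the file and the SerDe only deals with one record at a time.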

    For more information on how to write a SerDe, read this post
