How does Hive store data and what is SerDe?

执笔经年 2021-02-04 12:50

When querying a table, a SerDe deserializes a row of data from the bytes in the file into the objects Hive uses internally to operate on that row. When...

4 answers

野趣味 (original poster)

2021-02-04 13:33

In this respect, we can think of Hive as a kind of database engine. The engine works on tables, which are built from records.
When we let Hive (like any other database) work in its own internal formats, we do not need to care how the records are stored.
When we want Hive to process our own files as tables (external tables), we have to tell it how to translate the data in those files into records. This is exactly the role of a SerDe. You can think of it as a plug-in that enables Hive to read and write your data.
For example, suppose you want to work with CSV. Here is an example CSV SerDe: https://github.com/ogrodnek/csv-serde/blob/master/src/main/java/com/bizo/hive/serde/csv/CSVSerde.java The deserialize method reads the data and chops it into fields, assuming it is CSV.
The serialize method takes a record and formats it as CSV.
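
To make the two directions concrete, here is a minimal sketch in Python. It is not Hive's Java SerDe interface and not the code of the linked CSVSerde. The class name `ToyCsvSerDe` and the column names are made up for illustration. It only shows the idea: deserialize turns a line from the file into a record, and serialize turns a record back into a line.

```python
import csv
import io


class ToyCsvSerDe:
    """Hypothetical stand-in for a SerDe; not Hive's actual API."""

    def __init__(self, columns):
        # Column names play the role of the table schema.
        self.columns = list(columns)

    def deserialize(self, line):
        # File text -> record: split one CSV line into named fields.
        fields = next(csv.reader([line]))
        return dict(zip(self.columns, fields))

    def serialize(self, record):
        # Record -> file text: format the fields as one CSV line.
        buf = io.StringIO()
        csv.writer(buf).writerow([record[c] for c in self.columns])
        return buf.getvalue().rstrip("\r\n")


serde = ToyCsvSerDe(["id", "name"])
row = serde.deserialize('1,"Smith, Anna"')  # read path (query)
line = serde.serialize(row)                 # write path (insert)
print(row)
print(line)
```

In this sketch the quoted comma survives the round trip, which is the kind of format-specific detail a SerDe hides from the query engine.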
