Loading JSON file with serde in Cloudera

前端 未结 2 1762
长发绾君心
长发绾君心 2021-01-25 21:03

I am trying to work with a JSON file with this bag structure :

{
   \"user_id\": \"kim95\",
   \"type\": \"Book\",
   \"title\": \"Modern Database Systems: The O         


        
相关标签:
2条回答
  • 2021-01-25 21:46

    add jar only add to session which won't be available and finally it is getting error. Get the JAR loaded on all the nodes at Hive and Map Reduce path like the below location so that HIVE and Map Reduce component will pick this whenever it’s been called.

    1. /hadoop/CDH_5.2.0_Linux_parcel/parcels/CDH-5.2.0- 1.cdh5.2.0.p0.36/lib/hive/lib/json-serde-1.3.6-jar-with-dependencies.jar

    2. /hadoop/CDH_5.2.0_Linux_parcel/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/lib/hadoop-mapreduce/lib/json-serde-1.3.6-jar-with-dependencies.jar

    Note: This path varies to cluster.

    0 讨论(0)
  • 2021-01-25 21:54

    Hive does not have built in support for JSON. So for using JSON with Hive we need to use third part jars like: https://github.com/rcongiu/Hive-JSON-Serde

    You have couple of issues with the create table statement. It should look like this:

    CREATE EXTERNAL TABLE IF NOT EXISTS serd ( 
    user_id string,type string,title string,year string,publisher string,authors array<string>,source:string)
    ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
    LOCATION...
    

    The JSON records your are using keep each record in a single line like this:

    {"user_id": "kim95", "type": "Book", "title": "Modern Database Systems: The Object Model, Interoperability, and Beyond.", "year": "1995", "publisher": "ACM Press and Addison-Wesley", "authors": [{"name":"null"}], "source": "DBLP"} 
    {"user_id": "marshallo79", "type": "Book", "title": "Inequalities: Theory of Majorization and Its Application.", "year": "1979", "publisher": "Academic Press","authors": [{"name":"Albert W. Marshall"},{"name":"Ingram Olkin"}], "source": "DBLP"}
    

    After downloading the project from GIT you need to compile the the project which will create a jar you need to add this jar in the Hive session before running the create table statement.

    Hope it helps...!!!

    0 讨论(0)
提交回复
热议问题