I am trying to work with a JSON file with this bag structure :
{
\"user_id\": \"kim95\",
\"type\": \"Book\",
\"title\": \"Modern Database Systems: The O
add jar only add to session which won't be available and finally it is getting error. Get the JAR loaded on all the nodes at Hive and Map Reduce path like the below location so that HIVE and Map Reduce component will pick this whenever it’s been called.
/hadoop/CDH_5.2.0_Linux_parcel/parcels/CDH-5.2.0- 1.cdh5.2.0.p0.36/lib/hive/lib/json-serde-1.3.6-jar-with-dependencies.jar
/hadoop/CDH_5.2.0_Linux_parcel/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/lib/hadoop-mapreduce/lib/json-serde-1.3.6-jar-with-dependencies.jar
Note: This path varies to cluster.
Hive does not have built in support for JSON. So for using JSON with Hive we need to use third part jars like: https://github.com/rcongiu/Hive-JSON-Serde
You have couple of issues with the create table statement. It should look like this:
CREATE EXTERNAL TABLE IF NOT EXISTS serd (
user_id string,type string,title string,year string,publisher string,authors array<string>,source:string)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION...
The JSON records your are using keep each record in a single line like this:
{"user_id": "kim95", "type": "Book", "title": "Modern Database Systems: The Object Model, Interoperability, and Beyond.", "year": "1995", "publisher": "ACM Press and Addison-Wesley", "authors": [{"name":"null"}], "source": "DBLP"}
{"user_id": "marshallo79", "type": "Book", "title": "Inequalities: Theory of Majorization and Its Application.", "year": "1979", "publisher": "Academic Press","authors": [{"name":"Albert W. Marshall"},{"name":"Ingram Olkin"}], "source": "DBLP"}
After downloading the project from GIT you need to compile the the project which will create a jar you need to add this jar in the Hive session before running the create table statement.
Hope it helps...!!!