Question
The problem is the following:
After having created a table with Cygnus 0.2.1, I receive a MapReduce error when trying to select a column from Hive. If we see the files created in hadoop by Cygnus, we can see that the format used is JSON. This problem didn't appear in previous versions of Cygnus as it was creating hadoop files in CSV format.
To test this, I have left two tables in place, one created from each format. You can compare and reproduce the error with the following queries:
SELECT entitytype FROM fiware_ports_meteo; (it fails, created with 0.2.1 in JSON format)
SELECT entitytype FROM fiware_test_table; (it works, created with 0.2 in CSV format)
The paths to the HDFS files are, respectively:
/user/fiware/ports/meteo
/user/fiware/testTable/
I suspect the error comes from the MapReduce job parsing the JSON files, since the CSV format works as expected.
How can this issue be avoided?
Answer 1:
You simply have to add the JSON serde to the Hive classpath. As a non-privileged user, you can do that from the Hive CLI:
hive> ADD JAR /usr/local/hive-0.9.0-shark-0.8.0-bin/lib/json-serde-1.1.9.3-SNAPSHOT.jar;
If you have developed a remote Hive client, you can issue the same command as you would any other query. Let's say you are using Java:
Statement stmt = con.createStatement();
// Registers the serde for the current session, just like from the CLI
stmt.executeQuery("ADD JAR /usr/local/hive-0.9.0-shark-0.8.0-bin/lib/json-serde-1.1.9.3-SNAPSHOT.jar");
stmt.close();
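For completeness, here is a minimal end-to-end sketch of such a client, assuming a HiveServer1-style JDBC endpoint (the usual setup for Hive 0.9); the host, port, user and password are placeholders, not values from the original question:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Minimal sketch, not a definitive implementation: the connection URL
// and credentials below are placeholders for your own Cosmos/Hive setup.
public class JsonSerdeQuery {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");
        Connection con = DriverManager.getConnection(
                "jdbc:hive://your-hive-host:10000/default", "your-user", "your-password");
        Statement stmt = con.createStatement();
        // Register the JSON serde for this session before querying the table
        stmt.executeQuery("ADD JAR /usr/local/hive-0.9.0-shark-0.8.0-bin/lib/json-serde-1.1.9.3-SNAPSHOT.jar");
        // The query that previously failed should now run
        ResultSet res = stmt.executeQuery("SELECT entitytype FROM fiware_ports_meteo");
        while (res.next()) {
            System.out.println(res.getString(1));
        }
        res.close();
        stmt.close();
        con.close();
    }
}

Note that ADD JAR is session-scoped: every new CLI session or JDBC connection has to register the serde again, so it belongs at the start of each client session rather than being a one-off fix.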
Source: https://stackoverflow.com/questions/25024342/mapreduce-error-when-selecting-column-from-json-file-in-cosmos