NULL column names in Hive query result

离开以前 2021-01-21 01:34

I have downloaded the weather .txt files from NOAA, which look like:

WBAN,Date,Time,StationType,SkyCondition,SkyConditionFlag,Visibility,VisibilityFlag,WeatherTy         


        
2 Answers
  • 2021-01-21 02:20

    Interesting question. It took me a minute to realize what is going on, but with some knowledge of Hive it is actually obvious!

    1. The first thing to note is that the NULL values occur in columns that are not of type string.
    2. The second thing to realize is that the Hive CLI (unlike Beeline, for example) normally does NOT print column headers above your selection.

    So, putting 1 and 2 together:

    • The column names are fine, as you will see from a query like DESCRIBE Weather.
    • The file that you use as a data source appears to have the column names on its first row. That header line is now the first row of your Hive table. The columns of type string have no problem holding this data, but columns of type int show NULL because the header strings cannot be cast to int.
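
    Both observations can be checked from the Hive CLI. The following is a minimal sketch, assuming the table is named weather (as in the DESCRIBE example above) and a Hive version that allows SELECT without a FROM clause (0.13 or later); the literal values are illustrative only.

    ```sql
    -- Ask the Hive CLI to print column headers above query results.
    SET hive.cli.print.header=true;

    -- Confirm that the column names and types were defined as intended.
    DESCRIBE weather;

    -- A header token cannot be parsed as an integer, so the cast yields NULL
    -- instead of raising an error.
    SELECT CAST('WBAN' AS INT);

    -- A numeric string casts cleanly to the integer 3011.
    SELECT CAST('3011' AS INT);
    ```

    With headers switched on, the first data row still contains the header text in the string columns and NULL in the int columns, which confirms that the stray row comes from the file rather than from the table definition.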

    Suggestion:

    Try to get rid of the first row, preferably before creating the external table.

  • 2021-01-21 02:31

    To add to Dennis' comment above: if you're using a CSV SerDe, you can tell Hive to skip the first line of the file when reading the table, like so:

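    -- Note: OpenCSVSerde reads every column as STRING, whatever type is declared.
    CREATE EXTERNAL TABLE cases (
      id INT,
      case_number STRING,
      name STRING
    )
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
    STORED AS TEXTFILE
    LOCATION '/hdfs/path'
    -- Skip the first line of each file (the header row) when reading.
    tblproperties("skip.header.line.count"="1");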
    

    The operative line being:

    tblproperties("skip.header.line.count"="1")
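
    If the table already exists, the same property can be set without recreating it. This is a sketch only, assuming a text-backed table named weather (the name comes from the question, not from the answer above) and a Hive version that supports skip.header.line.count (0.13 or later).

    ```sql
    -- Set the header-skip property on an existing table.
    ALTER TABLE weather SET TBLPROPERTIES ("skip.header.line.count"="1");

    -- Check the result: the first row returned should now be real data,
    -- not the column names from the file.
    SELECT * FROM weather LIMIT 5;
    ```

    Because the table is external, this changes only how Hive reads the files; the files themselves are left untouched.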
    