Update, SET option in Hive

花落未央 2020-12-03 00:22

I know there is no updating of files in Hadoop, but in Hive it is possible, with some syntactic sugar, to merge new values with the old data in the table and then rewrite the table.

2 Answers
  • 2020-12-03 00:25
    INSERT OVERWRITE TABLE _tableName_ PARTITION (_partitionColumn_ = _partitionValue_)
    SELECT [other Things], CASE WHEN id = 206 THEN 'florida' ELSE location END AS location, [other Other Things]
    FROM _tableName_ WHERE [_whereClause_];
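
    A concrete sketch of the same pattern, with a hypothetical customers table (all names here are invented for illustration):

    -- Hypothetical schema: customers(id INT, name STRING, location STRING)
    -- PARTITIONED BY (year INT). This rewrites only the year=2020 partition,
    -- setting location to 'florida' for id 206 and passing every other row through.
    INSERT OVERWRITE TABLE customers PARTITION (year = 2020)
    SELECT id, name,
           CASE WHEN id = 206 THEN 'florida' ELSE location END AS location
    FROM customers
    WHERE year = 2020;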
    

    The PARTITION clause takes multiple partition columns separated by commas: ... PARTITION (_partitionColumnA_ = _valueA_, _partitionColumnB_ = _valueB_, ...). Note the commas separate different partition columns, not multiple values of the same column; to rewrite several values of one partition column in a single statement you need dynamic partitioning (a sketch follows below). I haven't done this with multiple partitions, just one at a time, so I'd check the results in a test/dev environment before doing all partitions at once. I had other reasons for limiting each OVERWRITE to a single partition as well.
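
    If you really do need to rewrite several values of one partition column in one statement, dynamic partitioning is the usual route. A minimal sketch, reusing the hypothetical customers table from above:

    -- Dynamic partition overwrite: leave the partition column unvalued in the
    -- PARTITION clause and supply it as the last column of the SELECT.
    SET hive.exec.dynamic.partition = true;
    SET hive.exec.dynamic.partition.mode = nonstrict;

    INSERT OVERWRITE TABLE customers PARTITION (year)
    SELECT id, name,
           CASE WHEN id = 206 THEN 'florida' ELSE location END AS location,
           year
    FROM customers;

    As I understand it, only the partitions that actually appear in the query output get overwritten, so filter the source if you want to limit the rewrite.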

    This page https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML has a little more on it.
    This site https://cwiki.apache.org/confluence/display/Hive/LanguageManual, in general, is your best friend when working with HiveQL.

    I've developed something identical to this to migrate some data, and it worked. I haven't tried it against large datasets, only a few GB, and it performed perfectly.

    To note - this will OVERWRITE the partition. It will make the previous files go bye-bye, so create backup and restore scripts/procedures first. The [other Things] and [other Other Things] are the rest of the columns from the table, and they need to be in the correct order. This is very important, or else your data will be corrupted.

    Hope this helps. :)

  • 2020-12-03 00:42

    This may be hacky, but it's worked for some things I've had to do at work.

        INSERT OVERWRITE TABLE tabletop PARTITION (partname = 'valueIwantToSet')
        SELECT things FROM databases.tables WHERE whereclause;
    

    As you might expect, this breaks your data up into partitions. If the distribution of the value you want to set lines up with "good data chunk sizes" (that part is up to you to design), then your queries on that data will be better optimized; see the example below.
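
    For instance, with the made-up names from the snippet above, a filter on the partition column lets Hive prune partitions and read only the matching files instead of scanning the whole table:

        -- Only files under partname='valueIwantToSet' are read; all other
        -- partitions are skipped entirely (partition pruning).
        SELECT things
        FROM databases.tables
        WHERE partname = 'valueIwantToSet';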

    @Jothi: Could you please post the query you used?
