Spark issues in creating HFiles - Added a key not lexically larger than previous cell

Asked by 逝去的感伤 on 2021-01-21 20:03

I am trying to create HFiles to do a bulk load into HBase, and it keeps throwing a row-key error even though everything looks fine. I am using the following code:

1 Answer
  • 2021-01-21 20:49

    After spending a couple of hours I found the solution: the root cause is that the columns are not sorted.

    An HFile needs its KeyValues in lexicographically sorted order, and in your case, while writing, HFileOutputFormat2 -> AbstractHFileWriter detected "Added a key not lexically larger than previous. Current cell". You have already applied sorting at the row level; once you also sort the columns, it will work.

    There is a question here with a good explanation of why: why-hbase-keyvaluesortreducer-need-to-sort-all-keyvalue.
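
    To see why the writer complains, keep in mind that a cell key is essentially row key + column family + qualifier (+ timestamp), compared byte by byte, so within a single row the qualifiers themselves must be emitted in byte order. A minimal sketch of that comparison, using made-up qualifiers "age" and "name" (not from the original post), just to illustrate the ordering rule:

    import org.apache.hadoop.hbase.util.Bytes

    // Two qualifiers for the same row and column family.
    val age  = Bytes.toBytes("age")
    val name = Bytes.toBytes("name")

    // "age" sorts before "name" byte-wise, so emitting a cell for "name"
    // and then one for "age" is exactly "a key not lexically larger than previous".
    println(Bytes.compareTo(age, name) < 0) // true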

    Solution:

    // Imports needed by the snippet below.
    import org.apache.hadoop.hbase.KeyValue
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable
    import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2
    import org.apache.hadoop.hbase.util.Bytes
    
    // Sort the columns so that, within each row, cells are emitted in
    // lexicographic qualifier order, which HFileOutputFormat2 requires.
    val cols = companyDs.columns.sorted
    
    // Rest of the code is the same.
    
    val output = companyDs.rdd.flatMap(x => {
      val rowKey = Bytes.toBytes(x(0).toString)
      val hkey = new ImmutableBytesWritable(rowKey)
      for (i <- cols.indices) yield {
        // Position of the i-th sorted column in the row.
        val index = x.fieldIndex(cols(i))
        // Null-safe cell value for that column.
        val value = if (x.isNullAt(index)) "".getBytes else x(index).toString.getBytes
        val kv = new KeyValue(rowKey, COLUMN_FAMILY, cols(i).getBytes(),
          System.currentTimeMillis() + i, value)
        (hkey, kv)
      }
    })
    
    output.saveAsNewAPIHadoopFile("<path>",
      classOf[ImmutableBytesWritable], classOf[KeyValue],
      classOf[HFileOutputFormat2], config)
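
    Writing the HFiles is only half of the bulk load; they still have to be handed to HBase afterwards. A minimal sketch of that step, assuming the classic LoadIncrementalHFiles tool, a table named "company", and reuse of the same config object (those names are assumptions, not part of the original answer):

    import org.apache.hadoop.fs.Path
    import org.apache.hadoop.hbase.TableName
    import org.apache.hadoop.hbase.client.ConnectionFactory
    import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles

    // Open a connection and locate the target table (table name assumed).
    val connection = ConnectionFactory.createConnection(config)
    val tableName = TableName.valueOf("company")
    val table = connection.getTable(tableName)
    val regionLocator = connection.getRegionLocator(tableName)
    val admin = connection.getAdmin

    // Move the generated HFiles into the table's regions.
    val loader = new LoadIncrementalHFiles(config)
    loader.doBulkLoad(new Path("<path>"), admin, table, regionLocator)

    On recent HBase 2.x releases LoadIncrementalHFiles is deprecated in favour of org.apache.hadoop.hbase.tool.BulkLoadHFiles, so pick whichever matches your cluster version.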
    