I am trying to create HFiles to do a bulk load into HBase, and it keeps throwing a row-key error even though everything looks fine. I am using the following code:
After spending a couple of hours I found the solution. The root cause is that the columns are not sorted.

An HFile needs its KeyValues in lexicographically sorted order, and in your case, while writing, HFileOutputFormat2 -> AbstractHFileWriter fails with "Added a key not lexically larger than previous. Current cell ...". You have already applied sorting at the row level; once you sort the columns as well, it will work.

There is a related question here with a good explanation: why-hbase-keyvaluesortreducer-need-to-sort-all-keyvalue.
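To see why plain string sorting of the column names is enough: HBase compares qualifiers as raw bytes, and for ASCII column names byte order coincides with string order. A minimal illustration (the column names here are hypothetical):

import org.apache.hadoop.hbase.util.Bytes

// HFile requires the cells of a row in lexicographic qualifier-byte order.
val qualifiers = Seq("salary", "id", "name")
val byteSorted = qualifiers.sortWith((a, b) =>
  Bytes.compareTo(Bytes.toBytes(a), Bytes.toBytes(b)) < 0)
// byteSorted == List(id, name, salary): the order in which the KeyValues
// for a row must be emitted. For ASCII names this matches qualifiers.sorted.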
Solution:
// Imports needed by the snippet below.
import org.apache.hadoop.hbase.KeyValue
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2
import org.apache.hadoop.hbase.util.Bytes

// Sort the columns so that, within each row, the KeyValues are
// emitted in lexicographic qualifier order.
val cols = companyDs.columns.sorted

// Rest of the code is the same, except that the precomputed null-safe
// value is now actually passed to the KeyValue constructor.
val output = companyDs.rdd.flatMap(x => {
  val rowKey = Bytes.toBytes(x(0).toString)
  val hkey = new ImmutableBytesWritable(rowKey)
  for (i <- cols.indices) yield {
    val index = x.fieldIndex(cols(i))
    val value = if (x.isNullAt(index)) "".getBytes else x(index).toString.getBytes
    val kv = new KeyValue(rowKey, COLUMN_FAMILY, cols(i).getBytes(),
      System.currentTimeMillis() + i, value)
    (hkey, kv)
  }
})

output.saveAsNewAPIHadoopFile("<path>",
  classOf[ImmutableBytesWritable], classOf[KeyValue],
  classOf[HFileOutputFormat2], config)
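For completeness, here is a sketch of finishing the bulk load once the HFiles are written, assuming the HBase 1.x client API (LoadIncrementalHFiles lives in org.apache.hadoop.hbase.mapreduce there; in HBase 2.x it moved to org.apache.hadoop.hbase.tool). The table name "company" is a placeholder, and <path> is the same output directory as above:

import org.apache.hadoop.fs.Path
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.ConnectionFactory
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles

// Placeholder table name; replace with your target table.
val tableName = TableName.valueOf("company")
val conf = HBaseConfiguration.create()
val connection = ConnectionFactory.createConnection(conf)
val table = connection.getTable(tableName)
val regionLocator = connection.getRegionLocator(tableName)
val admin = connection.getAdmin

// Move the generated HFiles into the table's regions.
val loader = new LoadIncrementalHFiles(conf)
loader.doBulkLoad(new Path("<path>"), admin, table, regionLocator)

Note that the config passed to saveAsNewAPIHadoopFile is typically prepared with HFileOutputFormat2.configureIncrementalLoad(job, table, regionLocator), which also ensures the output is partitioned and sorted to match the table's regions.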