问题
I have a table with two columns: id, datastring
The id column is just a bigint and the datastring column has elements that look like
{"12345":[6789,true]}
{"1234678":[5678, false]}
I would like to select a table where the first column is the id and the second column is the number in the quotes part of the datastring. However, this number is not always the same number of digits.
The result should be
id, numstring
4321, 12345
4322, 134678
Thanks in advance.
回答1:
You have at least two options. The first one is slow and clean, the second one is quick and dirty.
Let's start with the second option - built-in Hive function regexp_extract
:
SELECT id, regexp_extract(datastring, '^\\{"(\\d+)".*$', 1) AS numstring
FROM your_table
Done!
regexp_extract
takes three arguments: string, regexp, and group index. If regexp matches the string then regexp_extract
returns regexp group specified by the group index.
The first option is to write your own UDF in Java that'll extract numstring from datastring using Jackson or whatever. I would go with the regexp_extract
. But that's me.
来源:https://stackoverflow.com/questions/21129956/hive-ql-selecting-numeric-substring-of-string