Hive QL selecting numeric substring of string

Posted by 我的未来我决定 on 2021-01-28 04:10:50

Question


I have a table with two columns: id, datastring

The id column is just a bigint and the datastring column has elements that look like

{"12345":[6789,true]}

{"1234678":[5678, false]}

I would like to select a table where the first column is the id and the second column is the number inside the quotes at the start of the datastring. However, this number does not always have the same number of digits.

The result should be

id, numstring

4321, 12345
4322, 1234678

Thanks in advance.


Answer 1:


You have at least two options. The first one is slow and clean, the second one is quick and dirty.

Let's start with the second option, the built-in Hive function regexp_extract:

SELECT id, regexp_extract(datastring, '^\\{"(\\d+)".*$', 1) AS numstring 
  FROM your_table

Done!

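regexp_extract takes three arguments: the string, the regexp, and a group index. If the regexp matches the string, regexp_extract returns the capture group selected by the group index; index 1 here is the (\\d+) group, i.e. the digits between the quotes. The backslashes are doubled because they sit inside a Hive string literal.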
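Hive's regexp functions use Java regular expression syntax, so you can sanity-check the pattern outside Hive. Here is a minimal sketch in plain Java; the class and method names are made up for illustration, and returning null on a non-match is a choice of this sketch, not a statement about what Hive returns.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexpExtractDemo {
    // The Hive literal '^\\{"(\\d+)".*$' denotes the regex ^\{"(\d+)".*$ ;
    // here it is written as a Java string literal instead.
    private static final Pattern KEY = Pattern.compile("^\\{\"(\\d+)\".*$");

    // Hypothetical helper: roughly what regexp_extract(datastring, pattern, 1) computes.
    // Returns null when the pattern does not match (an assumption of this sketch).
    static String extractKey(String datastring) {
        Matcher m = KEY.matcher(datastring);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // The two sample rows from the question.
        System.out.println(extractKey("{\"12345\":[6789,true]}"));     // prints 12345
        System.out.println(extractKey("{\"1234678\":[5678, false]}")); // prints 1234678
    }
}
```

Because the group is \d+, the number of digits doesn't matter; it takes whatever run of digits sits between the first pair of quotes.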

The first option is to write your own UDF in Java that extracts numstring from datastring using Jackson or some other JSON library. I would go with regexp_extract. But that's me.
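For a rough idea of what the UDF route involves, here is a sketch of the extraction logic only. It is an assumption-laden outline: the class name is hypothetical, it uses plain string handling instead of Jackson so it stays self-contained, and the Hive wiring (extending Hive's UDF base class, packaging the jar, registering the function) is left out.

```java
class ExtractNumstring {
    // In an actual Hive UDF this class would extend Hive's UDF base class and
    // Hive would call evaluate() once per row; that wiring is omitted here.
    public String evaluate(String datastring) {
        if (datastring == null) {
            return null;
        }
        // Locate the first quoted token, which is the JSON key in the sample data.
        int open = datastring.indexOf('"');
        int close = datastring.indexOf('"', open + 1);
        if (open < 0 || close < 0) {
            return null;
        }
        String key = datastring.substring(open + 1, close);
        // Accept the key only if it is a non-empty run of ASCII digits.
        boolean allDigits = !key.isEmpty()
                && key.chars().allMatch(c -> c >= '0' && c <= '9');
        return allDigits ? key : null;
    }

    public static void main(String[] args) {
        ExtractNumstring udf = new ExtractNumstring();
        System.out.println(udf.evaluate("{\"12345\":[6789,true]}"));     // prints 12345
        System.out.println(udf.evaluate("{\"1234678\":[5678, false]}")); // prints 1234678
    }
}
```

A real JSON parser would handle escaping and nested structures properly, which is the "clean" part of this option; the cost is the extra build and deployment work compared to a one-line regexp_extract.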



Source: https://stackoverflow.com/questions/21129956/hive-ql-selecting-numeric-substring-of-string
