I have a page URL column components of which are delimited by /
. I tried to run the SPLIT()
function in BigQuery but it only gives the first value.
#standardSQL
SELECT SPLIT(path, '/')[OFFSET(0)] part1,
SPLIT(path, '/')[OFFSET(1)] part2,
SPLIT(path, '/')[OFFSET(2)] part3
FROM (SELECT "/a/b/aaaa?c" path)
Now I understand you want them in different columns.
An alternative to the query you provided:
SELECT FIRST(SPLIT(path, '/')) part1,
NTH(2, SPLIT(path, '/')) part2,
NTH(3, SPLIT(path, '/')) part3
FROM (SELECT "/a/b/aaaa?c" path)
NTH(X, SPLIT(s))
will provide the Xth value from the SPLIT. FIRST(s)
is the same as NTH(1, s)
You can also try the following with SPLIT function, however you need to know how many '/' your url would have or make enough entries so that if your url contains more '/' then you can still get those values in the seperate columns
SPLIT(`url`, '/')[safe_ordinal(1)] AS `Col1`,
SPLIT(`url`, '/')[safe_ordinal(2)] AS `Col2`,
SPLIT(`url`, '/')[safe_ordinal(3)] AS `Col3`,
SPLIT(`url`, '/')[safe_ordinal(4)] AS `Col4`,
.
.
SPLIT(`url`, '/')[safe_ordinal(N)] AS `ColN`
Solved it in a way.
SELECT
date,
hits_time,
fullVisitorId,
visitNumber,
hits_hitNumber,
X.page_path,
REGEXP_EXTRACT(X.page_path,r'/(\w*)\/') as one,
REGEXP_EXTRACT(X.page_path,r'/\w*\/(\w*)') as two,
REGEXP_EXTRACT(X.page_path,r'/\w*\/\w*\/(\w*)') as three,
REGEXP_EXTRACT(X.page_path,r'/\w*/\w*/\w*\/(\w*)\/.*') as four
from
(
select
date, hits_time, fullVisitorId, visitNumber, hits_hitNumber,
REGEXP_REPLACE (hits_page_pagePath, '-', '') as page_path
from
[Intent.All2mon]
) X
limit 1000
in standard sql, you can use the following functions:
array[OFFSET(zero_based_offset)]
array[ORDINAL(one_based_ordinal)]
so
SELECT SPLIT(path, '/')[OFFSET(1)] part2,
SPLIT(path, '/')[ORDINAL(2)] part2_again,
SPLIT(path, '/')[ORDINAL(3)] part3
FROM (SELECT "/a/b/aaaa?c" path)
part2 part2_again part3
a a b
part1
in this case, is empty string (before the first slash)
This works for me:
SELECT SPLIT(path, '/') part
FROM (SELECT "/a/b/aaaa?c" path)
Row part
1 a
2 b
3 aaaa?c
Not sure why it wouldn't work for you. What does your data look like?