Question
I have run into the following behavior.
I have a BigQuery query with hundreds of fields extracted using the REGEXP_EXTRACT function.
I added a new expression and got the following error: Failed to parse regular expression "": pattern too large - compile failed.
When I query this expression on its own, everything runs fine; in the larger query, I get the error.
Here is a reproduction of the problem, based on the GitHub sample data and a simple regex:
SELECT repository.description,
REGEXP_EXTRACT(repository.description,r'(?:\w){0}(\w)') as Pos1,
REGEXP_EXTRACT(repository.description,r'(?:\w){1}(\w)') as Pos2,
REGEXP_EXTRACT(repository.description,r'(?:\w){2}(\w)') as Pos3,
.
. here it goes on and on in the same pattern
.
REGEXP_EXTRACT(repository.description,r'(?:\w){198}(\w)') as Pos199,
REGEXP_EXTRACT(repository.description,r'(?:\w){199}(\w)') as Pos200,
REGEXP_EXTRACT(repository.description,r'(?:\w){200}(\w)') as Pos201,
FROM [publicdata:samples.github_nested] LIMIT 1000
It returns:
Failed to parse regular expression "(?:\w){162}(\w)": pattern too large - compile failed
But when I run only that expression:
SELECT repository.description,
REGEXP_EXTRACT(repository.description,r'(?:\w){162}(\w)') as Pos163,
FROM [publicdata:samples.github_nested] LIMIT 1000
everything runs OK.
Is there a limit on the number of REGEXP_EXTRACT calls, or on their combined complexity, that can be used in a single query?
Answer 1:
I'll look into the issue. As a workaround: it looks like you are trying to split the field into separate fields per character position, turning "abc" into {pos1: "a", pos2: "b", pos3: "c"}. Is that correct? If so, you might want to try the LEFT() and RIGHT() functions, as in:
LEFT(repository.description, 1) as pos1,
RIGHT(LEFT(repository.description, 2), 1) as pos2,
RIGHT(LEFT(repository.description, 3), 1) as pos3
This should use fewer resources than compiling 200 regular expressions (although it is still not likely to be fast).
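A variation on the same idea, offered only as a sketch: legacy BigQuery SQL also has SUBSTR(str, index, max_len), which counts positions from 1 and can pick out a single character directly. This assumes any character at that position is acceptable; unlike the \w-based regex, it does not restrict matches to word characters. The pos1..pos3 aliases are illustrative.

```sql
-- Sketch: one character per position using SUBSTR (1-based index, length 1).
-- Assumption: any character is acceptable, not only \w word characters.
SELECT repository.description,
  SUBSTR(repository.description, 1, 1) AS pos1,
  SUBSTR(repository.description, 2, 1) AS pos2,
  SUBSTR(repository.description, 3, 1) AS pos3
FROM [publicdata:samples.github_nested] LIMIT 1000
```

The pattern extends to later positions by changing the index, and no regular expressions are compiled at all.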
Source: https://stackoverflow.com/questions/22691498/error-failed-to-parse-regular-expression-pattern-too-large-compile-failed