问题
I have a requirement to exclude certain folder from prefix and process the data in snowflake (Copy statement)
In the below example I need to process files under emp/ and exclude files from abc/
Input :
s3://bucket1/emp/
E1.CSV
E2.CSV
/abc/E11.csv
s3://bucket1/emp/abc/ - E11.csv
Output :
s3://bucket1/emp/
E1.CSV
E2.CSV
Is there any suggestion around pattern to handle this ?
回答1:
With the pattern
keyword you can try to exclude certain files. However when using the pattern matching with the NOT syntax, you exclude any file with any of the characters.
Assuming your stage URL is defined as s3://bucket1/emp/
LS @MY_STAGE pattern = '[^abc].*';
- Excludes anything starting with a, b, or c
LS @MY_STAGE pattern = '[^a][^b][^c][^\\/].*';
- Excludes anything where:
- The first character is a, OR
- The second character is b, OR
- The third character is c, OR
- The fourth character is a forward slash /
Edit
After testing with Sharvan's example. Here is what I've found:
Doesn't work:
ls @my_stage PATTERN='^((?!/abc/).)*$';
because the first forward slash is duplicated as part of the stage URL (it is automatically appended to the stage URL if not present)
Works: ls @my_stage PATTERN='^((?!abc/).)*$';
because the first forward slash is removed
Updated as the forward slash does not need to be escaped
Snowflake does not support backreferences (per their documentation) but there is no mention of lookaheads or lookbehinds, which I thought was un-supported.
https://docs.snowflake.net/manuals/sql-reference/functions-regexp.html#backreferences
回答2:
Use this to exclude the prefix pattern
ls @stage PATTERN='^((?!/abc/).)*$'
来源:https://stackoverflow.com/questions/59417105/how-to-avoid-sub-folders-in-snowflake-copy-statement