Try something like:
where field1 ~ '^[0-9]{4}'
It will match any string, that starts with 4 digits.
Although long time has passed since this question was asked I have not found an adequate response. So I feel obliged to share my solution which works fine on my Redshift cluster today (March 2016).
The UDF function is:
create or replace function isnumeric (aval VARCHAR(20000))
returns bool
IMMUTABLE
as $$
try:
x = int(aval);
except:
return (1==2);
else:
return (1==1);
$$ language plpythonu;
Usage would be:
select isnumeric(mycolumn), * from mytable
where isnumeric(mycolumn)=false
where regexp_instr(field1,'^[0-9]{4}') = 0
will remove rows starting with 4 digits (the above regexp_instr will return 1 for the rows with field1 starting with 4 digits)
redshift should support similar to.
WHERE field1 SIMILAR TO '[0-9]{4}%'
This reads as where field1 starts with 4 characters in the range of 0-9, then anything else.
It seems that redshift doesn't support any of the following:
where left(field1,4) like '[0-9][0-9][0-9][0-9]'
where left(field1,4) ~ '^[0-9]{4}'
where left(field1,4) like '^[0-9]{4}'
what does seem to work is:
where left(field1,4) between 0 and 9999
this returns all rows that start with four numeric characters.
it seems that even though field1 is type string, the 'between' function interprets left(field1,4) as a single integer when the string characters are numeric (and does not give an error when they are not numeric). I'll follow up if I find a problem. For instance I don't deal with anything less than 1000, so I assume, but am not sure, that 0001 is interpreted as 1.
Per Amazon, the posix style ~regex style expressions are slow... https://docs.aws.amazon.com/redshift/latest/dg/pattern-matching-conditions.html
Using their own REGEXP_*
functions seems to be faster.
https://docs.aws.amazon.com/redshift/latest/dg/String_functions_header.html
For checking just a true/false for integers I've been using the following with success.
REGEXP_COUNT(my_field_to_check, '^[0-9]+$') > 0
this returns 1 if only numeric, 0 if anything else