SQL Server's isNumeric() equivalent in amazon redshift

后端 未结 8 1898
庸人自扰
庸人自扰 2021-01-04 02:17
  • I\'m using amazon redshift as my data warehouse
  • I have a field (field1)of type string. Some of the strings start with four numbers and others with letters:
相关标签:
8条回答
  • 2021-01-04 02:45

    Try something like:

    where field1 ~ '^[0-9]{4}'
    

    It will match any string, that starts with 4 digits.

    0 讨论(0)
  • 2021-01-04 02:45

    Although long time has passed since this question was asked I have not found an adequate response. So I feel obliged to share my solution which works fine on my Redshift cluster today (March 2016).

    The UDF function is:

    create or replace function isnumeric (aval VARCHAR(20000))
      returns bool
    IMMUTABLE 
    as $$
        try:
           x = int(aval);
        except:
           return (1==2);
        else:
           return (1==1);
    $$ language plpythonu;
    

    Usage would be:

    select isnumeric(mycolumn), * from mytable
        where isnumeric(mycolumn)=false
    
    0 讨论(0)
  • 2021-01-04 02:45
    where regexp_instr(field1,'^[0-9]{4}') = 0
    

    will remove rows starting with 4 digits (the above regexp_instr will return 1 for the rows with field1 starting with 4 digits)

    0 讨论(0)
  • 2021-01-04 02:52

    redshift should support similar to.

    WHERE field1 SIMILAR TO '[0-9]{4}%'
    

    This reads as where field1 starts with 4 characters in the range of 0-9, then anything else.

    0 讨论(0)
  • 2021-01-04 02:53

    It seems that redshift doesn't support any of the following:

    where left(field1,4) like '[0-9][0-9][0-9][0-9]' 
    where left(field1,4) ~ '^[0-9]{4}'
    where left(field1,4) like '^[0-9]{4}'
    

    what does seem to work is:

    where left(field1,4) between 0 and 9999
    

    this returns all rows that start with four numeric characters.

    it seems that even though field1 is type string, the 'between' function interprets left(field1,4) as a single integer when the string characters are numeric (and does not give an error when they are not numeric). I'll follow up if I find a problem. For instance I don't deal with anything less than 1000, so I assume, but am not sure, that 0001 is interpreted as 1.

    0 讨论(0)
  • 2021-01-04 02:55

    Per Amazon, the posix style ~regex style expressions are slow... https://docs.aws.amazon.com/redshift/latest/dg/pattern-matching-conditions.html

    Using their own REGEXP_* functions seems to be faster. https://docs.aws.amazon.com/redshift/latest/dg/String_functions_header.html

    For checking just a true/false for integers I've been using the following with success. REGEXP_COUNT(my_field_to_check, '^[0-9]+$') > 0

    this returns 1 if only numeric, 0 if anything else

    0 讨论(0)
提交回复
热议问题