How to identify Unicode text in SQL?

Backend · Unresolved · 2 answers · 1827 views
不知归路 2021-01-07 09:08

Table1 has an nvarchar column called umsg, which contains Unicode text and sometimes English text as well.

I want to find the rows where the umsg column contains English text.



        
2 Answers
  • 2021-01-07 09:42

    Check the query below:

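    ;WITH CTE
     AS (
     SELECT ID,
            DATE,
            umsg,
            -- Converting to varchar replaces characters the code page cannot hold
            -- with '?' (or a best-fit substitute), so the round trip only matches
            -- when nothing was lost.
            CASE
                WHEN(CAST(umsg AS VARCHAR(MAX)) COLLATE SQL_Latin1_General_Cp1251_CS_AS) = umsg
                THEN 0
                ELSE 1
            END HasSpecialChars
     FROM <table_name>)
     SELECT ID,
            DATE,
            umsg
     FROM CTE
     WHERE Date >= '20140101'      -- yyyymmdd literals are unambiguous,
           AND Date < '20170926'   -- whatever SET DATEFORMAT is in effect
           AND HasSpecialChars = 0;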
    

    Desired output:

    ID  DATE                     umsg
    1   2017-09-12 00:00:00.000  The livers detoxification processes.
    2   2017-09-11 00:00:00.000  Purposely added 1
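
    A minimal sketch of why the round-trip comparison works, run against literal values instead of <table_name> (the @samples table variable is hypothetical; the strings come from the other answer). One deliberate difference from the query above: the collation is applied before the cast, which pins the conversion to code page 1252 (Latin1_General_CS_AS) instead of leaving it to the column's collation. That assumes "English" means "fits a Western code page", so accented letters such as é would still count as English here.

```sql
-- Hypothetical sample rows (strings borrowed from the other answer).
DECLARE @samples TABLE (umsg nvarchar(200));
INSERT INTO @samples (umsg)
VALUES (N'The livers detoxification processes.'),
       (N'फेंगशुई के छोटे-छोटे टिप्स');

SELECT umsg,
       -- Characters missing from code page 1252 come back as '?' (or a best-fit letter).
       CAST(umsg COLLATE Latin1_General_CS_AS AS varchar(200)) AS as_varchar,
       CASE
           WHEN CAST(umsg COLLATE Latin1_General_CS_AS AS varchar(200)) = umsg
           THEN 0   -- round trip unchanged: every character fits the code page
           ELSE 1   -- something was lost or substituted
       END AS HasSpecialChars
FROM @samples;
```

    To apply it to your own data, swap @samples for Table1 and keep only the rows where HasSpecialChars = 0.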
    

    Hope this helps.

  • 2021-01-07 09:47

    You didn't say what you want when a string contains both Unicode and ASCII characters, so here is one idea and one solution for the case where you only want to tell "pure English" rows apart from "mixed" ones.

    You need a table of natural numbers for this. If you don't have one, you can generate it like this:

    select top 1000000  row_number() over(order by getdate()) as n
    into dbo.nums
    from sys.messages m1 cross join sys.messages m2;
    
    alter table dbo.nums alter column n int not null;
    
    alter table dbo.nums add constraint PK_nums_n primary key(n); 
    

    Now that you have a table of natural numbers, we can decompose your strings into single characters and check whether ascii(character) = unicode(character):

    declare @t table(col nvarchar(200));
    insert into @t values
    (N'ref no été'), (N'The livers detoxification processes.'), (N'फेंगशुई के छोटे-छोटे टिप्स से आप जीवन की विषमताओं से');
    
    -- Part 1: one row per character, with its ascii() and unicode() codes side by side
    select t.col, n, substring(t.col, n, 1) as nth_character,
           ascii(substring(t.col, n, 1)) as ascii,
           unicode(substring(t.col, n, 1)) as uni
    from @t t join dbo.nums n
           on n.n <= len(t.col);
    
    -- Part 2 (row-level solution): "English only" when every character has ascii() = unicode()
    with cte as
    (
    select t.col, n, substring(t.col, n, 1) as nth_character,
           ascii(substring(t.col, n, 1)) as ascii,
           unicode(substring(t.col, n, 1)) as uni
    from @t t join dbo.nums n
           on n.n <= len(t.col)
    )
    select col, 
           case
                when sum(case when ascii = uni then 1 else 0 end) = count(*) then 'English only'
                else 'Not only English'
           end as eng_or_not
    from cte
    group by col;
    

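    The first query shows each string character by character, along with each character's ascii() and unicode() codes: where the two are equal, the character is treated as ASCII. Strictly speaking, ascii() first converts the character to the code page of the column's collation, so under a Latin1 collation accented letters such as é (233 from both functions) also match; check unicode(...) < 128 instead if you need pure ASCII.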

    The second query simply checks whether every character in a row passes that test.
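
    If a yes/no per row is all you need and you want strict ASCII regardless of collation, a pattern match can replace the numbers table. This is a different technique from the one above, offered as a sketch: in the pattern [^ -~] the range runs from space (32) to tilde (126), i.e. printable ASCII, and the binary collation Latin1_General_BIN2 makes that range compare by code point, so PATINDEX returns 0 only when every character is printable ASCII. Tabs and line breaks also fall outside the range, so they too mark a row as "Not only English".

```sql
-- Same sample strings as above, in a fresh table variable.
DECLARE @t TABLE (col nvarchar(200));
INSERT INTO @t (col)
VALUES (N'ref no été'),
       (N'The livers detoxification processes.'),
       (N'फेंगशुई के छोटे-छोटे टिप्स से आप जीवन की विषमताओं से');

SELECT col,
       CASE
           -- PATINDEX returns 0 when no character falls outside printable ASCII (32-126)
           WHEN PATINDEX(N'%[^ -~]%' COLLATE Latin1_General_BIN2, col) = 0
           THEN 'English only'
           ELSE 'Not only English'
       END AS eng_or_not
FROM @t;
```

    To filter your own table, the same test can go straight into a WHERE clause: WHERE PATINDEX(N'%[^ -~]%' COLLATE Latin1_General_BIN2, umsg) = 0.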
