Table1 has nvarchar column called umsg which contains unicode text and some time english also.
I want to find out English text present in umsg column.
check below :
;WITH CTE
AS (
SELECT ID,
DATE,
umsg,
CASE
WHEN(CAST(umsg AS VARCHAR(MAX)) COLLATE SQL_Latin1_General_Cp1251_CS_AS) = umsg
THEN 0
ELSE 1
END HasSpecialChars
FROM <table_name>)
SELECT ID,
DATE,
umsg
FROM CTE
WHERE Date >= '01/01/2014'
AND Date < '09/26/2017'
AND HasSpecialChars = 0;
Desired Output :
ID DATE umsg
1 2017-09-12 00:00:00.000 The livers detoxification processes.
2 2017-09-11 00:00:00.000 Purposely added 1
Hope, it will help you.
You did not answer what you want in case there are some unicode and some ascii characters in the same string, so I give you 1 idea and 1 solution for the case if you want only to find "pure English" or "mixed" rows.
You need a table of natural numbers to do this .In case you have no such a table you can generate it like this:
select top 1000000 row_number() over(order by getdate()) as n
into dbo.nums
from sys.messages m1 cross join sys.messages m2;
alter table dbo.nums alter column n int not null;
alter table dbo.nums add constraint PK_nums_n primary key(n);
Now that you have a table of natural numbers we are going to decompose your strings into single characters to check if ascii(character)
= unicode(character)
:
declare @t table(col Nvarchar(200));
insert into @t values
(N'ref no été'), (N'The livers detoxification processes.'), (N'फेंगशुई के छोटे-छोटे टिप्स से आप जीवन की विषमताओं से')
select t.col, n, substring(t.col, n, 1) as nth_character,
ascii(substring(t.col, n, 1)) as ascii,
unicode(substring(t.col, n, 1)) as uni
from @t t join dbo.nums n
on n.n <= len(t.col); -- this is to give you an idea how to see if it's unicode character or ascii
with cte as
(
select t.col, n, substring(t.col, n, 1) as nth_character,
ascii(substring(t.col, n, 1)) as ascii,
unicode(substring(t.col, n, 1)) as uni
from @t t join dbo.nums n
on n.n <= len(t.col)
)
select col,
case
when sum(case when ascii = uni then 1 else 0 end) = count(*) then 'English only'
else 'Not only English'
end as eng_or_not
from cte
group by col -- row level solution
The first part of the code shows you your string character by character along with character's ascii ande unicode code: where they are the same it's ascii character.
The second part just check if all the characters are ascii.