Is there any way to find similar values in a column? Example:
I want the query to return the table data without row 4 (green tree), because there is no similar data to g
You could use SOUNDEX to do this.
Sample data:
CREATE TABLE #SampleData (Column1 int, Column2 varchar(10))
INSERT INTO #SampleData (Column1, Column2)
VALUES
(1,'blue car')
,(2,'red doll')
,(3,'blue cars')
,(4,'green tree')
,(5,'red dolly')
The following code uses SOUNDEX to group similar entries in Column2. It then uses a second subquery to count how many rows fall into each SOUNDEX group:
SELECT
a.GroupingField
,a.Title
,b.SimilarFields
FROM (
SELECT
SOUNDEX(Column2) GroupingField
,MAX(Column2) Title --Just return a unique title for this soundex group
FROM #SampleData
GROUP BY SOUNDEX(Column2)
) a
LEFT JOIN (
SELECT
SOUNDEX(Column2) GroupingField
,COUNT(Column2) SimilarFields --How many fields are in the soundex group?
FROM #SampleData
GROUP BY SOUNDEX(Column2)
) b
ON a.GroupingField = b.GroupingField
WHERE b.SimilarFields > 1
The results look like this (I've left the SOUNDEX field in to show you what it looks like):
GroupingField Title     SimilarFields
------------- --------- -------------
B400          blue cars 2
R300          red dolly 2
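Since the LEFT JOIN is filtered by the WHERE clause anyway, the two subqueries can probably be collapsed into a single grouped query with a HAVING clause. This is only a sketch against the same #SampleData table, not a replacement the original answer proposes; it should return the same two groups.

```sql
-- Sketch: one pass over #SampleData instead of two subqueries.
SELECT
SOUNDEX(Column2) AS GroupingField
,MAX(Column2) AS Title              --One representative title per soundex group
,COUNT(Column2) AS SimilarFields    --How many rows share this soundex code
FROM #SampleData
GROUP BY SOUNDEX(Column2)
HAVING COUNT(Column2) > 1           --Keep only groups with at least two members
```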
Some further reading on SOUNDEX:
https://msdn.microsoft.com/en-gb/library/ms187384.aspx
Edit: as per your request, to get the original data you may as well push the results into a temp table. Change the query I've given you to put an INTO clause before the FROM clause:
SELECT
a.GroupingField
,a.Title
,b.SimilarFields
INTO #Duplicates
FROM (
SELECT
SOUNDEX(Column2) GroupingField
,MAX(Column2) Title --Just return a unique title for this soundex group
FROM #SampleData
GROUP BY SOUNDEX(Column2)
) a
LEFT JOIN (
SELECT
SOUNDEX(Column2) GroupingField
,COUNT(Column2) SimilarFields --How many fields are in the soundex group?
FROM #SampleData
GROUP BY SOUNDEX(Column2)
) b
ON a.GroupingField = b.GroupingField
WHERE b.SimilarFields > 1
Then use the following query to link back to your original data;
SELECT
a.GroupingField
,a.Title
,a.SimilarFields
,b.Column1
,b.Column2
FROM #Duplicates a
JOIN #SampleData b
ON a.GroupingField = SOUNDEX(b.Column2)
ORDER BY a.GroupingField
This gives the following result:
GroupingField Title     SimilarFields Column1 Column2
------------- --------- ------------- ------- ---------
B400          blue cars 2             1       blue car
B400          blue cars 2             3       blue cars
R300          red dolly 2             5       red dolly
R300          red dolly 2             2       red doll
Remember to drop the temp table when you are done:
DROP TABLE #Duplicates
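If the temp table is only needed to get back to the original rows, a windowed COUNT may avoid it altogether. The following is a minimal sketch, assuming a SQL Server version that supports COUNT(*) OVER (PARTITION BY ...); the alias s is just an illustrative name.

```sql
-- Sketch: tag every row with the size of its soundex group,
-- then keep only rows whose group has more than one member.
SELECT
s.GroupingField
,s.Column1
,s.Column2
FROM (
SELECT
SOUNDEX(Column2) AS GroupingField
,Column1
,Column2
,COUNT(*) OVER (PARTITION BY SOUNDEX(Column2)) AS SimilarFields
FROM #SampleData
) s
WHERE s.SimilarFields > 1
ORDER BY s.GroupingField, s.Column1
```

With the sample data this should return rows 1, 3, 2 and 5 and leave out row 4 (green tree), with no cleanup step afterwards.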
This approach uses a very basic notion of similarity (one phrase being a prefix of another), but it can be extended to a better definition. It's not very efficient, mind you. The count(1) + 1 includes the base phrase.
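create table phrases ( phrase varchar(max) )
insert phrases values( 'blue car' ), ( 'blue cars' ), ('green tree' ), ( 'red doll' ), ( 'red dolly' )
-- create function must be the first statement in its batch;
-- go is the batch separator used by SSMS and sqlcmd
go
create function dbo.fnSimilar( @s1 varchar(max), @s2 varchar(max) )
returns int
as
begin
if @s1 = @s2 return 0 -- a phrase is not similar to itself
if @s1 like @s2 + '%' return 1 -- @s1 starts with @s2
if @s2 like @s1 + '%' return 2 -- @s2 starts with @s1
return 0 -- neither phrase is a prefix of the other
end
go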
select x.phrase, similar = count(1) + 1 from
(
select p1.phrase from phrases p1
inner join phrases p2 on dbo.fnSimilar( p2.phrase, p1.phrase ) = 1
) x
group by x.phrase
Result:
phrase similar
-------- -------
blue car 2
red doll 2
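Part of the inefficiency likely comes from calling a scalar function for every pair of rows. As a sketch only, the same prefix test can be written directly in the join condition; this assumes the phrases table above and, like the function, treats any %, _ or [ characters inside a phrase as LIKE wildcards.

```sql
-- Sketch: inline version of the "p2 starts with p1" test,
-- equivalent to dbo.fnSimilar( p2.phrase, p1.phrase ) = 1.
select p1.phrase, similar = count(1) + 1  -- + 1 counts the base phrase itself
from phrases p1
inner join phrases p2
    on p2.phrase <> p1.phrase             -- a phrase is not similar to itself
   and p2.phrase like p1.phrase + '%'     -- p2 begins with p1
group by p1.phrase
```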
As Gar rightly commented, you have to define what you mean by "similarity". But if all you need is a fixed number (8 in your example) of equal leading characters, you can do the following:
create table myTest
(
id int,
name varchar(20)
);
insert into myTest values(1, 'blue car');
insert into myTest values(2, 'red doll');
insert into myTest values(3, 'blue cars');
insert into myTest values(4, 'green tree');
insert into myTest values(5, 'red dolly');
select left(name,8), count(*)
from myTest
group by left(name,8)
having count(*) > 1;
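The query above returns only the shared prefixes and their counts. To get the original rows back, one option is a windowed count over the same prefix. This is a sketch against the myTest table, assuming window functions are available; the alias t and the column cnt are illustrative names.

```sql
-- Sketch: keep every row whose 8-character prefix occurs more than once.
select t.id, t.name
from (
    select id, name,
           count(*) over (partition by left(name, 8)) as cnt  -- rows sharing this prefix
    from myTest
) t
where t.cnt > 1
order by t.id;
```

With the sample data this should return ids 1, 2, 3 and 5, which is the table without the green tree row.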