The number of differences in a column

情深已故

I would like to retrieve a column showing the number of differing letters in each row. For instance:

If you have a value "test" and another row has a value "testing ", t

5 Answers

粉色の甜心

2021-01-29 07:39
You can read the next record with LEAD, then compare the two strings with LIKE or other string functions:
```
select
  id, value, category,
  case when value like next_value + '%' or next_value like value + '%' 
       then len(next_value) - len(value)
  end as differences
from
(
  select id, value, category, lead(value) over (order by id) as next_value 
  from mytable
) this_and_next;
```
If you only want to compare values within the same category use a partition clause:
```
lead(value) over (partition by category order by id)
```
UPDATE: Please see DhruvJoshi's answer on SQL Server's LEN. Contrary to what I assumed, this function doesn't count trailing blanks, so you need his trick if you want them counted. Here is the doc on LEN confirming this behaviour: https://technet.microsoft.com/en-us/library/ms190329(v=sql.105).aspx
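To make the trailing-blank behaviour concrete, here is a minimal sketch. The variable name `@v` is only illustrative, and the byte count from DATALENGTH assumes a `varchar` value in a single-byte collation (for `nvarchar` you would divide by 2):

```sql
DECLARE @v varchar(30) = 'testing ';   -- note the trailing blank

SELECT LEN(@v)           AS len_plain,       -- 7: the trailing blank is ignored
       LEN(@v + '>') - 1 AS len_sentinel,    -- 8: the appended '>' protects the blank
       DATALENGTH(@v)    AS bytes_stored;    -- 8: bytes, which equals characters here
```

Appending a sentinel character to both strings before subtracting their lengths, as in DhruvJoshi's answer, cancels out the extra character, so no `- 1` is needed there.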
谎友^

2021-01-29 07:51

I think you just want len() and lead():

```
select t.id, t.value, t.category,
       (len(lead(t.value) over (partition by t.category order by t.id)) -
        len(t.value)
       ) as difference
from t;
```

小鲜肉

2021-01-29 07:54

```
-- Sample data
create table #temp
(
  id int,
  value varchar(30),
  category int
);

insert into #temp
select 1, 'test', 1
union all
select 2, 'testing', 1
union all
select 1, 'Candy', 2
union all
select 2, 'Ca', 2;

-- Pair each row with the next value in its category, then remove the
-- current value from the next one and measure what is left.
with cte as
(
  select id, value, category,
         lead(value) over (partition by category order by id) as nxtvalue
  from #temp
)
select id, value, category, len(replace(nxtvalue, value, '')) as differences
from cte;
```

萌比男神i

2021-01-29 07:58

I think what you are looking for is a measure of edit distance rather than a count of prefix similarity. There are a few common algorithms for this. Levenshtein's method is one that I've used before, and I've seen it implemented as T-SQL functions. The answers to this SO question suggest a couple of T-SQL implementations that you might be able to take and use as-is.

(Though take time to test the code and understand the method rather than just copying the code and using it, so that you can understand the output if something seems to go wrong. Otherwise you could be creating some technical debt you'll have to pay back later.)

Exactly which distance calculation method you want will depend on how you want to count certain things. For instance, do you count a substitution as one change or as a delete plus an insert? And if your strings are long enough for it to matter, do you want to consider substring moves?
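For illustration, here is a minimal sketch of the classic Levenshtein distance as a T-SQL scalar function. It is not taken from the linked answers. The name `dbo.LevenshteinSketch` and the 100-character limit are assumptions made for this example, and it counts a substitution as one change. It favours readability over speed, so expect it to be slow on large tables.

```sql
-- Hypothetical helper: a readable sketch, not a tuned implementation.
CREATE FUNCTION dbo.LevenshteinSketch (@s nvarchar(100), @t nvarchar(100))
RETURNS int
AS
BEGIN
    IF @s IS NULL OR @t IS NULL RETURN NULL;

    -- DATALENGTH / 2 counts trailing blanks in nvarchar values, unlike LEN.
    DECLARE @m int = DATALENGTH(@s) / 2,
            @n int = DATALENGTH(@t) / 2,
            @i int = 1,
            @j int = 0,
            @del int, @ins int, @sub int;

    -- One entry per prefix length of @t: prev = previous DP row, cur = current DP row.
    DECLARE @row TABLE (j int PRIMARY KEY, prev int NOT NULL, cur int NOT NULL);

    -- Row 0: turning the empty string into the first j characters of @t costs j.
    WHILE @j <= @n
    BEGIN
        INSERT INTO @row (j, prev, cur) VALUES (@j, @j, 0);
        SET @j += 1;
    END;

    WHILE @i <= @m
    BEGIN
        -- Turning the first i characters of @s into the empty string costs i.
        UPDATE @row SET cur = @i WHERE j = 0;
        SET @j = 1;
        WHILE @j <= @n
        BEGIN
            SELECT @del = prev + 1 FROM @row WHERE j = @j;                   -- delete from @s
            SELECT @ins = cur + 1, @sub = prev FROM @row WHERE j = @j - 1;   -- insert into @s
            -- Character comparison follows the database collation (often case-insensitive).
            SET @sub = @sub + CASE WHEN SUBSTRING(@s, @i, 1) = SUBSTRING(@t, @j, 1)
                                   THEN 0 ELSE 1 END;                        -- substitute
            UPDATE @row
            SET cur = CASE WHEN @del <= @ins AND @del <= @sub THEN @del
                           WHEN @ins <= @sub THEN @ins
                           ELSE @sub END
            WHERE j = @j;
            SET @j += 1;
        END;
        UPDATE @row SET prev = cur;   -- the current row becomes the previous row
        SET @i += 1;
    END;

    RETURN (SELECT prev FROM @row WHERE j = @n);
END;
GO

-- Usage (GO is the batch separator used by SSMS and sqlcmd):
SELECT dbo.LevenshteinSketch(N'test', N'testing ') AS distance;   -- 4: three letters plus the trailing blank
```

Combined with LEAD, a call such as `dbo.LevenshteinSketch(value, lead(value) over (partition by category order by id))` would give a per-row distance to the next value. Whether that matches what the question intends by "differences" is an assumption.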

[愿得一人]

2021-01-29 08:02

You can also use a self-join query like the one below:

```
--create table tbl (id int, value nvarchar(100), category int);
--insert into tbl values
--(1,N'test',1)
--,(2,N' testing',1)
--,(11,N'candy',2)
--,(12,N'ca',2);
select A.*, LEN(B.value) - LEN(A.value) as difference
from tbl A LEFT JOIN tbl B on A.id + 1 = B.id and A.category = B.category
--drop table tbl
```

Update: I noticed that your value has a space at the end. SQL Server's LEN does not count trailing spaces when calculating length, so here is a workaround for the query above:

```
select A.*, LEN(B.value + '>') - LEN(A.value + '>') as difference
from tbl A LEFT JOIN tbl B on A.id + 1 = B.id and A.category = B.category
```

As pointed out in the comments, the ids may not be consecutive. In such cases, try this:

```
create table #temp
(
  rownum int IDENTITY(1,1) PRIMARY KEY,
  id int,
  value nvarchar(100),
  category int
);

insert into #temp (id, value, category)
select id, value, category from tbl order by id asc;

select A.id, A.value, A.category, LEN(B.value + '>') - LEN(A.value + '>') as difference
from #temp A LEFT JOIN #temp B on A.rownum + 1 = B.rownum and A.category = B.category;
```
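The temp-table approach numbers rows across all categories, so a row finds no match whenever the next id belongs to a different category. As a possible alternative, the sketch below numbers rows within each category using ROW_NUMBER and needs no temp table. It assumes the same `tbl` table as above; the CTE name `numbered` and the column `rn` are made up for this example.

```sql
WITH numbered AS
(
    -- Number the rows 1, 2, 3, ... separately inside each category.
    SELECT id, value, category,
           ROW_NUMBER() OVER (PARTITION BY category ORDER BY id) AS rn
    FROM tbl
)
SELECT A.id, A.value, A.category,
       LEN(B.value + '>') - LEN(A.value + '>') AS difference   -- '>' keeps trailing blanks counted
FROM numbered AS A
LEFT JOIN numbered AS B
       ON B.category = A.category
      AND B.rn = A.rn + 1;
```

The last row of each category has no successor, so its `difference` is NULL, which matches the behaviour of the LEAD-based answers.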
