The number of differences in a column

情深已故 2021-01-29 07:18

I would like to retrieve a column showing how many letters differ in each row. For instance:

If you have a value "test" and another row has a value "testing ", t…

5 Answers
  • 2021-01-29 07:39

    You can read the next record with LEAD and then compare the strings with LIKE or other string functions:

    select
      id, value, category,
      case when value like next_value + '%' or next_value like value + '%' 
           then len(next_value) - len(value)
      end as differences
    from
    (
      select id, value, category, lead(value) over (order by id) as next_value 
      from mytable
    ) this_and_next;
    

    If you only want to compare values within the same category, use a partition clause:

    lead(value) over (partition by category order by id)
    

    UPDATE: Please see DhruvJoshi's answer regarding SQL Server's LEN. Contrary to what I assumed, this function does not count trailing blanks, so you need his trick if you want them counted. The LEN documentation confirms this behaviour: https://technet.microsoft.com/en-us/library/ms190329(v=sql.105).aspx
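    A minimal Python sketch of the same logic, handy for checking expected results outside SQL Server. It assumes hypothetical rows of (id, value, category) and emulates LEAD partitioned by category; `startswith` stands in for the LIKE prefix test (unlike LIKE, it treats % and _ literally), and Python's len counts trailing spaces, unlike LEN.

```python
from itertools import groupby

# Hypothetical sample rows (id, value, category), mirroring the example data.
ROWS = [(1, "test", 1), (2, "testing", 1), (1, "Candy", 2), (2, "Ca", 2)]


def prefix_differences(rows):
    """Emulate LEAD(value) OVER (PARTITION BY category ORDER BY id) plus the
    prefix CASE; returns {(category, id): difference or None}."""
    result = {}
    ordered = sorted(rows, key=lambda r: (r[2], r[0]))
    for category, group in groupby(ordered, key=lambda r: r[2]):
        group = list(group)
        for i, (row_id, value, _) in enumerate(group):
            # LEAD yields NULL (None here) for the last row of a partition.
            next_value = group[i + 1][1] if i + 1 < len(group) else None
            if next_value is not None and (
                next_value.startswith(value) or value.startswith(next_value)
            ):
                result[(category, row_id)] = len(next_value) - len(value)
            else:
                # The CASE has no ELSE branch, so a non-prefix pair gives NULL.
                result[(category, row_id)] = None
    return result


print(prefix_differences(ROWS))
# {(1, 1): 3, (1, 2): None, (2, 1): -3, (2, 2): None}
```

    A negative value means the next string is shorter, and a pair such as "test"/"best" yields None because neither is a prefix of the other.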

  • 2021-01-29 07:51

    I think you just want len() and lead():

    select t.id, t.value, t.category,
           len(lead(t.value) over (partition by t.category order by t.id))
             - len(t.value) as difference
    from t;
    
  • 2021-01-29 07:54
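    -- Sample data: two categories, ids restart at 1 in each
    create table #temp
    (
    id int,
    value varchar(30),
    category int
    )
    
    insert into #temp
    select 1,'test',1
    union all
    select 2,'testing',1
    union all
    select 1,'Candy',2
    union all
    select 2,'Ca',2
    
    -- Pair each value with the next value in the same category
    ;with cte
    as
    (
    select id,value,category,lead(value) over (partition by category order by id) as nxtvalue
    from #temp
    )
    -- Remove the shorter string from the longer one and count what is left,
    -- so both 'test' -> 'testing' and 'Candy' -> 'Ca' give 3 (NULL for the last row)
    select id,value,category,
           case when len(nxtvalue) >= len(value)
                then len(replace(nxtvalue,value,''))
                else len(replace(value,nxtvalue,''))
           end as differences
    from cte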
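    A rough Python sketch of the REPLACE idea, stripping the shorter string out of the longer one so that it works whichever row is longer. The function name is hypothetical; note that str.replace is case-sensitive, whereas T-SQL REPLACE follows the collation of its input (often case-insensitive).

```python
def replace_difference(value, next_value):
    """Strip the shorter string out of the longer one and count the
    characters that remain (a sketch of LEN(REPLACE(...)))."""
    if next_value is None:  # LEAD returned NULL: no next row
        return None
    if len(next_value) >= len(value):
        longer, shorter = next_value, value
    else:
        longer, shorter = value, next_value
    # Like T-SQL REPLACE, str.replace removes every occurrence.
    return len(longer.replace(shorter, ""))


print(replace_difference("test", "testing"))  # 3
print(replace_difference("Candy", "Ca"))      # 3
```

    For "test" and "best" this returns 4, because nothing is removed, so it counts leftover characters rather than edits.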
    
  • 2021-01-29 07:58

    I think what you are looking for is a measure of edit distance, rather than just prefix similarity, and there are a few common algorithms for it. Levenshtein distance is one that I've used before, and I've seen it implemented as T-SQL functions. The answers to this SO question suggest a couple of T-SQL implementations that you might be able to take and use as-is.

    (Do take the time to test the code and understand the method rather than just copying it, so that you can interpret the output if something seems to go wrong; otherwise you could be creating technical debt that you'll have to pay back later.)

    Exactly which distance calculation you want depends on how you want to count certain things: for instance, whether a substitution counts as one change or as a deletion plus an insertion, and, if your strings are long enough for it to matter, whether you want to consider substring moves.
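    For reference, a minimal Python sketch of classic Levenshtein distance with unit cost for insertion, deletion and substitution. It is not one of the T-SQL implementations mentioned above, only an illustration of the standard dynamic-programming recurrence; the function name is hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b."""
    # previous[j] = distance between a[:i-1] and b[:j] (starts as row i = 0).
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance between a[:i] and the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca -> cb (or keep it)
            ))
        previous = current
    return previous[-1]


print(levenshtein("test", "testing"))  # 3
print(levenshtein("test", "best"))     # 1
```

    Counting a substitution as a deletion plus an insertion, as discussed above, would amount to setting the mismatch cost to 2 instead of 1.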

  • 2021-01-29 08:02

    You can also use a self-join query like the one below:

    --create table tbl (id int,  value nvarchar(100), category int);
    --insert into tbl values
    --(1,N'test',1)
    --,(2,N'testing ',1)
    --,(11,N'candy',2)      
    --,(12,N'ca',2);
    select A.*, LEN(B.value)-LEN(A.value) as difference
    from tbl A LEFT JOIN tbl B on A.id +1 =B.id and A.category=B.category
    --drop table tbl
    

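    Update: I noticed that your sample value has a trailing space. SQL Server's LEN does not count trailing spaces, so here is a workaround for the query above: append a sentinel character such as '>' to both values before measuring, so that the space is no longer trailing and the extra character cancels out in the subtraction.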

    select A.*, LEN(B.value+'>')-LEN(A.value+'>') as difference
    from tbl A LEFT JOIN tbl B on A.id +1 =B.id and A.category=B.category
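    A small Python sketch of why the sentinel works, assuming LEN can be approximated as "length after trimming trailing spaces" (its documented behaviour). Both helper names are hypothetical.

```python
def tsql_len(s):
    """Approximate SQL Server LEN(): characters excluding trailing spaces."""
    return None if s is None else len(s.rstrip(" "))


def sentinel_difference(a, b):
    """LEN(b + '>') - LEN(a + '>'): the '>' protects trailing spaces from
    being trimmed, and the extra character cancels out in the subtraction."""
    if a is None or b is None:  # NULL + '>' is NULL in T-SQL (default settings)
        return None
    return tsql_len(b + ">") - tsql_len(a + ">")


print(tsql_len("testing ") - tsql_len("test"))  # 3: trailing space ignored
print(sentinel_difference("test", "testing "))  # 4: trailing space counted
```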
    

    As pointed out in the comments, the ids may not be consecutive. In that case, try this:

    create table #temp ( rownum int PRIMARY KEY IDENTITY(1,1), id int, value nvarchar(100), category int)
    -- Ordering by category, then id keeps each category's rows in one consecutive rownum run
    insert into #temp (id, value, category)
    select id, value, category from tbl order by category, id
    
    select A.id, A.value, A.category, LEN(B.value+'>')-LEN(A.value+'>') as difference
    from #temp A LEFT JOIN #temp B on A.rownum +1 =B.rownum and A.category=B.category
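    A Python sketch contrasting the two join conditions on hypothetical rows with an id gap and interleaved categories. The function names are made up for illustration, and enumerate starts at 0 rather than 1, which does not affect the rownum + 1 comparison.

```python
# Hypothetical rows (id, value, category): ids have gaps, categories interleave.
ROWS = [(1, "test", 1), (2, "Candy", 2), (5, "testing", 1), (7, "Ca", 2)]


def join_on_id_plus_one(rows):
    """A.id + 1 = B.id AND A.category = B.category; None means no match."""
    lookup = {(row_id, cat): value for row_id, value, cat in rows}
    return {(row_id, cat): lookup.get((row_id + 1, cat)) for row_id, _, cat in rows}


def join_on_rownum(rows):
    """Number rows in (category, id) order, then join rownum + 1 in the same category."""
    ordered = sorted(rows, key=lambda r: (r[2], r[0]))
    result = {}
    for rownum, (row_id, _, cat) in enumerate(ordered):
        nxt = ordered[rownum + 1] if rownum + 1 < len(ordered) else None
        # The category test rejects the pair that straddles a category boundary.
        result[(row_id, cat)] = nxt[1] if nxt is not None and nxt[2] == cat else None
    return result


print(join_on_id_plus_one(ROWS))  # every value is None because of the gaps
print(join_on_rownum(ROWS))
```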
    