SQL Server function for displaying word frequency in a column

醉酒成梦 2020-12-06 08:18

I have a table that lists free-text input from a survey where entrants were allowed to enter their responses (regarding colours they would like to have in their wedding).

3 answers
  • 2020-12-06 08:42

    Okay this works a treat. Firstly a function to separate the values...

    Create Function dbo.SeparateValues
    (
        @data VARCHAR(MAX),
        @delimiter VARCHAR(10)
    )
    RETURNS @tbldata TABLE (col VARCHAR(MAX))
    As
    -- Example: Select col From dbo.SeparateValues('hello,how,are,you?,234234', ',')
    Begin
        DECLARE @pos INT        -- position of the current delimiter (0 = none left)
        DECLARE @prevpos INT    -- position of the last character of the previous delimiter
        DECLARE @dlen INT       -- delimiter length; DATALENGTH because LEN(' ') is 0
        SET @pos = 1
        SET @prevpos = 0
        SET @dlen = DATALENGTH(@delimiter)

        WHILE @pos > 0
        BEGIN
            SET @pos = CHARINDEX(@delimiter, @data, @prevpos + 1)
            IF @pos > 0
                -- Token between the previous delimiter and this one
                INSERT INTO @tbldata(col) VALUES(LTRIM(RTRIM(SUBSTRING(@data, @prevpos + 1, @pos - @prevpos - 1))))
            ELSE
                -- No delimiter left: take the rest of the string
                -- (SUBSTRING stops at the end, so an over-long length is safe)
                INSERT INTO @tbldata(col) VALUES(LTRIM(RTRIM(SUBSTRING(@data, @prevpos + 1, DATALENGTH(@data)))))
            -- Skip past the whole delimiter, not just its first character
            SET @prevpos = @pos + @dlen - 1
        END

        RETURN
    END
    

    Then I just apply it to my table...

    Select Count(*) As Frequency, sep.Col As Word FROM (
            Select * FROM (
                Select value = Upper(RTrim(LTrim(
                    Replace(Replace(Replace(Replace(Replace(Replace(Replace(
                    Replace(Replace(Replace(Replace(Replace(Replace(Replace(response,
                        ',', ' '), '.', ' '), '!', ' '), '+', ' '), ':', ' '), '-', ' '), ';', ' '),
                        '(', ' '), ')', ' '), '/', ' '), '&', ''), '?', ' '),
                        '  ', ' '), '  ', ' ')   -- collapse runs of spaces
                ))) FROM Responses
            ) easyValues
            Where value <> '' 
        ) actualValues 
        Cross Apply dbo.SeparateValues(value, ' ') sep
        Where sep.Col <> ''   -- skip empty tokens left by any remaining double spaces
        Group By sep.Col
        Order By Count(*) Desc
    

    Okay, so I went OTT with my nested tables, but I've stripped out all the crap characters, separated the values and counted how often each word is used, most frequent first.
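
    On newer versions you may not need the helper function at all. A minimal sketch, assuming SQL Server 2017 or later (for TRANSLATE) and database compatibility level 130 or higher (for STRING_SPLIT), with the same Responses table and response column as above. Unlike the query above it treats '&' as a separator rather than deleting it, and it has not been run against the real data:

    ```sql
    -- Assumption: SQL Server 2017+ and compatibility level 130+.
    SELECT UPPER(s.value) AS Word, COUNT(*) AS Frequency
    FROM Responses r
    CROSS APPLY STRING_SPLIT(
        -- Map the space and every punctuation character to one delimiter, '|'.
        TRANSLATE(r.response, ' ,.!+:-;()/&?', REPLICATE('|', 13)),
        '|') s
    WHERE s.value <> ''          -- consecutive delimiters yield empty strings
    GROUP BY UPPER(s.value)
    ORDER BY Frequency DESC;
    ```

    TRANSLATE needs its second and third arguments to be the same length, hence REPLICATE('|', 13) for the 13 characters listed.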

  • 2020-12-06 08:57
    DECLARE @phrases TABLE (id int, phrase varchar(max))
    INSERT @phrases values
    (1,'Red and White'  ),
    (2,'green'          ),
    (3,'White and blue' ),
    (4,'Blue'           ),
    (5,'Dark blue'      );
    
    SELECT word, COUNT(*) AS freq
    FROM @phrases
    CROSS APPLY (SELECT CAST('<a>'+REPLACE(phrase,' ','</a><a>')+'</a>' AS xml) xml1 ) t1
    CROSS APPLY (SELECT n.value('.','varchar(max)') AS word FROM xml1.nodes('a') x(n) ) t2
    GROUP BY word
    ORDER BY word
    
    -- Result (under a case-insensitive collation):
    word         freq
    ----------- -----------
    and         2
    blue        3
    Dark        1
    green       1
    Red         1
    White       2
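
    One caveat with the XML trick: a phrase containing &, < or > makes the CAST to xml fail, and double spaces produce empty words. A sketch of one way around both, using made-up sample rows; the escaping step and the WHERE filter are additions for illustration, not part of the query above:

    ```sql
    DECLARE @phrases TABLE (id int, phrase varchar(max));
    INSERT @phrases VALUES
    (1, 'Red & White'),       -- '&' would break a plain CAST(... AS xml)
    (2, 'navy  blue'),        -- the double space produces an empty <a></a> element
    (3, 'White <or> blue');   -- '<' and '>' would be read as markup

    SELECT word, COUNT(*) AS freq
    FROM @phrases
    CROSS APPLY (
        -- Escape '&' first so the entities added for '<' and '>' are not escaped again.
        SELECT REPLACE(REPLACE(REPLACE(phrase, '&', '&amp;'), '<', '&lt;'), '>', '&gt;') AS safe
    ) e
    CROSS APPLY (SELECT CAST('<a>' + REPLACE(safe, ' ', '</a><a>') + '</a>' AS xml) AS xml1) t1
    CROSS APPLY (SELECT n.value('.', 'varchar(max)') AS word FROM xml1.nodes('a') x(n)) t2
    WHERE word <> ''          -- drop the empty elements left by repeated spaces
    GROUP BY word
    ORDER BY freq DESC, word;
    ```

    The .value() call turns the entities back into the original characters, so the words come out unescaped.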
    
  • 2020-12-06 08:59

    Your main problem is that you're missing a split function in SQL Server.

    There's a sample one here that looks pretty good:

    http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=50648

    Using that, you write a stored proc along the lines of...

    CREATE TABLE #Temp (Response nvarchar(50), Frequency int) 
    
    DECLARE @response nvarchar(100)
    DECLARE db_cursor CURSOR FOR 
    SELECT response FROM YourTable
    
    OPEN db_cursor  
    FETCH NEXT FROM db_cursor INTO @response 
    
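    WHILE @@FETCH_STATUS = 0  
    BEGIN  
           -- STRING_SPLIT (SQL Server 2016+) stands in here for the split
           -- function linked above; substitute whichever splitter you use.

           -- Words already in #Temp: increment the frequency
           UPDATE t
           SET t.Frequency = t.Frequency + w.cnt
           FROM #Temp t
           JOIN (SELECT s.value AS word, COUNT(*) AS cnt
                 FROM STRING_SPLIT(@response, ' ') s
                 WHERE s.value <> ''
                 GROUP BY s.value) w ON w.word = t.Response

           -- New words: insert them with their count from this response
           INSERT INTO #Temp (Response, Frequency)
           SELECT s.value, COUNT(*)
           FROM STRING_SPLIT(@response, ' ') s
           WHERE s.value <> ''
             AND NOT EXISTS (SELECT 1 FROM #Temp t WHERE t.Response = s.value)
           GROUP BY s.value
    
           FETCH NEXT FROM db_cursor INTO @response 
    END   
    
    -- Release the cursor once the loop is done
    CLOSE db_cursor
    DEALLOCATE db_cursor
    
    SELECT * FROM #Temp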
    

    There's probably a less fugly way to do this without cursors, but if it's just something you need to run once, and your table of responses isn't phenomenally large, then this should work.
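
    For what it's worth, the cursor-free version can be quite short if STRING_SPLIT is available (SQL Server 2016+, compatibility level 130+). A sketch under that assumption, using a table variable with made-up rows in place of YourTable:

    ```sql
    -- Hypothetical stand-in for YourTable, so the example runs on its own.
    DECLARE @YourTable TABLE (response nvarchar(100));
    INSERT @YourTable VALUES (N'Red and White'), (N'green'), (N'White and blue');

    -- One pass: split every response on spaces, then count each word.
    SELECT s.value AS Response, COUNT(*) AS Frequency
    FROM @YourTable t
    CROSS APPLY STRING_SPLIT(t.response, ' ') s
    WHERE s.value <> ''       -- ignore empty strings from repeated spaces
    GROUP BY s.value
    ORDER BY Frequency DESC;
    ```

    Whether 'White' and 'white' are counted together depends on the collation in use.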
