Removing repeated duplicated characters

醉酒成梦 2021-02-08 12:12

I have a string in my stored procedure such as ',,,sam,,bob,' or ',,,'. I need to remove the repeated commas from such a string, so that it looks like <

6 answers
  •  梦如初夏
    2021-02-08 13:17

    George Mastros wrote:


    I would suggest a UDF to do this. Since the UDF I am about to suggest doesn't touch any tables, the performance should be pretty good.

    I agree that "memory only" Scalar UDFs are quite fast. In fact, I actually used one of George's Scalar UDFs, which solved the "Initial Caps" problem, to demonstrate that "Set Based" code ISN'T always the best way to go.

    However, Martin Smith (another poster on this very thread) was definitely on the right track. In this case, "Set Based" is still the way to go. Of course, anyone can make an unsubstantiated claim as to performance so let's heat this up with a performance demonstration.

    To demonstrate, we first need some test data. A LOT of test data, because both of the functions we're going to test run nasty fast. Here's the code to build a million-row test table.

    --===== Conditionally drop the test table 
         -- to make reruns in SSMS easier
         IF OBJECT_ID('tempdb..#MyHead','U') IS NOT NULL
            DROP TABLE #MyHead
    GO
    --===== Create and populate the test table on-the-fly.
         -- This builds a bunch of GUIDs and removes the dashes from them to 
         -- increase the chances of duplicating adjacent characters.
         -- Not to worry.  This takes less than 7 seconds to run because of
         -- the "Pseudo Cursor" created by the CROSS JOIN.
     SELECT TOP 1000000
            RowNum     = IDENTITY(INT,1,1),
            SomeString = REPLACE(CAST(NEWID() AS VARCHAR(36)),'-','')
       INTO #MyHead
       FROM sys.all_columns ac1
      CROSS JOIN sys.all_columns ac2
    ;
    GO
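
    If you want to see how much of that test data actually exercises the functions, a quick count like the following may help. It's only a sketch: the column aliases are made up for illustration, and it assumes the #MyHead table built above and that 'A' is the character being collapsed, as in the tests further down.

    ```sql
    --===== Hypothetical sanity check on the test table (aliases are illustrative).
         -- Counts the rows, the rows that contain at least one 'A', and the rows
         -- that contain two or more adjacent 'A's, which are the only rows where
         -- the functions under test have any collapsing to do.
     SELECT TotalRows       = COUNT(*),
            RowsWithA       = SUM(CASE WHEN SomeString LIKE '%A%'  THEN 1 ELSE 0 END),
            RowsWithDoubleA = SUM(CASE WHEN SomeString LIKE '%AA%' THEN 1 ELSE 0 END)
       FROM #MyHead
    ;
    ```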
    

    No need to repost George's fine function here, but I do need to post mine. The following function produces the same result as George's does. It looks like an "iTVF" (Inline Table Valued Function), and it is, but it returns only one value. That's why Microsoft calls them "Inline Scalar Functions" (I call them "iSFs" for short).

     CREATE FUNCTION dbo.CleanDuplicatesJBM
            (@Data VARCHAR(8000), @DuplicateChar VARCHAR(1))
    RETURNS TABLE WITH SCHEMABINDING AS
     RETURN 
     SELECT Item =  STUFF(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
                        @DuplicateChar+@Data COLLATE LATIN1_GENERAL_BIN,
                    REPLICATE(@DuplicateChar,33),@DuplicateChar),
                    REPLICATE(@DuplicateChar,17),@DuplicateChar),
                    REPLICATE(@DuplicateChar, 9),@DuplicateChar),
                    REPLICATE(@DuplicateChar, 5),@DuplicateChar),
                    REPLICATE(@DuplicateChar, 3),@DuplicateChar),
                    REPLICATE(@DuplicateChar, 2),@DuplicateChar),
                    REPLICATE(@DuplicateChar, 2),@DuplicateChar)
                    ,1,1,'')
    ;
    GO
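
    To see what the nested REPLACEs do, here's a small sketch using the two strings from the original question. It assumes the function above has been created exactly as posted; the derived-table aliases and column names are hypothetical. The prepended character guarantees that any leading run collapses to a single character, which STUFF then removes. The second query is one way to check for yourself how long a run the 33/17/9/5/3/2/2 sequence can handle. A count of 0 would mean every tested run length collapsed completely; I've left the output unstated rather than vouch for a number.

    ```sql
    --===== Worked example on the strings from the question.
         -- ',,,sam,,bob,' becomes ',,,,sam,,bob,' after the prepend, then
         -- ',,sam,,bob,' (3 -> 1), then ',sam,bob,' (2 -> 1), and STUFF drops
         -- the leading comma, leaving 'sam,bob,'.
         -- ',,,' becomes ',,,,' -> ',,' -> ',' and STUFF leaves ''.
     SELECT s.Original, cleaned.Item
       FROM (VALUES (',,,sam,,bob,'), (',,,')) s (Original)
      CROSS APPLY dbo.CleanDuplicatesJBM(s.Original, ',') cleaned
    ;
    --===== Hypothetical run-length check: feed the function strings made of
         -- 1 to 7999 commas (leaving room for the prepended character) and
         -- count any that do not come back as an empty string.
       WITH Tally AS
            (
             SELECT TOP (7999) N = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
               FROM sys.all_columns ac1
              CROSS JOIN sys.all_columns ac2
            )
     SELECT RunLengthsNotCollapsed = COUNT(*)
       FROM Tally t
      CROSS APPLY dbo.CleanDuplicatesJBM(REPLICATE(',', t.N), ',') cleaned
      WHERE cleaned.Item <> ''
    ;
    ```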
    

    First, let's test George's Scalar UDF. Please read the comments about why we're not using SET STATISTICS TIME ON here.

    /******************************************************************************
     Test George's code.
     Since Scalar Functions don't work well with SET STATISTICS TIME ON, we measure
     duration a different way.  We'll also throw away the result in a "Bit Bucket"
     variable because we're trying to measure the performance of the function 
     rather than how long it takes to display or store results.
    ******************************************************************************/
    --===== Declare some obviously named variables
    DECLARE @StartTime DATETIME,
            @BitBucket VARCHAR(36)
    ;
    --===== Start the "Timer"
     SELECT @StartTime = GETDATE()
    ;
    --===== Run the test on the function
     SELECT @BitBucket = [dbo].[CleanDuplicates](SomeString,'A')
       FROM #MyHead
    ;
    --===== Display the duration in milliseconds
      PRINT DATEDIFF(ms,@StartTime,GETDATE())
    ;
    --===== Run the test a total of 5 times
    GO 5
    

    Here are the returns from that "fiver" run...

    Beginning execution loop
    15750
    15516
    15543
    15480
    15510
    Batch execution completed 5 times.
    (Average is 15,559 on my 10-year-old, single 1.8 GHz CPU)
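
    A speed comparison only means something if the two functions return the same thing, so it may be worth checking that on the test data as well. This is a sketch, not part of the original test: it assumes George's dbo.CleanDuplicates and dbo.CleanDuplicatesJBM both exist as used in this post, and the alias names are made up. The explicit COLLATE is there as a precaution against a collation conflict between the two results.

    ```sql
    --===== Hypothetical equivalence check: count the rows where the Scalar UDF
         -- and the inline function disagree. The binary collation makes the
         -- comparison case-sensitive and avoids a possible collation conflict.
     SELECT MismatchedRows = COUNT(*)
       FROM #MyHead h
      CROSS APPLY dbo.CleanDuplicatesJBM(h.SomeString, 'A') cleaned
      WHERE dbo.CleanDuplicates(h.SomeString, 'A')
            <> cleaned.Item COLLATE Latin1_General_BIN
    ;
    ```

    If that count is anything other than 0, the timings aren't comparing like with like.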
    

    Now, we'll run the "iSF" version...

    /******************************************************************************
     Test Jeff's code.
     Even though this uses an "iSF" (Inline Scalar Function), we'll test exactly
     the same way that we tested George's code so we're comparing apples-to-apples.
     This includes throwing away the result in a "Bit Bucket" variable because 
     we're trying to measure the performance of the function rather than how long 
     it takes to display or store results.
    ******************************************************************************/
    --===== Declare some obviously named variables
    DECLARE @StartTime DATETIME,
            @BitBucket VARCHAR(36)
    ;
    --===== Start the "Timer"
     SELECT @StartTime = GETDATE()
    ;
    --===== Run the test on the function
     SELECT @BitBucket = cleaned.Item
       FROM #MyHead
      CROSS APPLY [dbo].[CleanDuplicatesJBM](SomeString,'A') cleaned
    ;
    --===== Display the duration in milliseconds
      PRINT DATEDIFF(ms,@StartTime,GETDATE())
    ;
    --===== Run the test a total of 5 times
    GO 5
    

    Here are the results from that run.

    Beginning execution loop
    6856
    6810
    7020
    7350
    6996
    Batch execution completed 5 times.
    (Average is 7,006 {more than twice as fast} on my 10-year-old, single 1.8 GHz CPU)
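
    For anyone who wants to check the arithmetic, the two averages and the speed ratio can be reproduced from the ten timings printed above. The column names here are just labels I made up for the sketch; the numbers are the ones from the two runs.

    ```sql
    --===== Recompute the averages from the printed durations (milliseconds).
         -- AVG on integers truncates, which matches the figures quoted above.
     SELECT AvgScalarUdfMs = AVG(ScalarUdfMs),  -- 77799 / 5 = 15559
            AvgInlineMs    = AVG(InlineMs),     -- 35032 / 5 = 7006
            SpeedRatio     = CAST(1.0 * SUM(ScalarUdfMs) / SUM(InlineMs)
                                  AS DECIMAL(5,2)) -- 2.22
       FROM (VALUES (15750, 6856),
                    (15516, 6810),
                    (15543, 7020),
                    (15480, 7350),
                    (15510, 6996)) r (ScalarUdfMs, InlineMs)
    ;
    ```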
    

    My point ISN'T that George's code is bad. Not at all. In fact, I use Scalar UDFs when there is no "single query" solution. I'll also back George up by saying that "single query" solutions aren't always the best.

    Just don't stop looking for them when it comes to UDFs. ;-)
