Update n random rows in SQL

予麋鹿 2021-02-06 11:00

I have a table with about 1,000 rows. I have to update a column ("X") in the table to 'Y' for n random rows. For this I can use the following query:

update         
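For context, a plain full-table-scan version of such an update (shown here only as an illustration, borrowing the xyz table and x column names used in the answer below) orders every row by dbms_random.value and keeps the first n:

    update xyz set x = 'Y'
    where rowid in (
        select r from (
            select rowid r from xyz order by dbms_random.value
        ) where rownum < 100/*n*/+1
    );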


        
3 Answers
  •  不知归路 2021-02-06 11:50

    You can improve performance by replacing the full table scan with a sample.

    The first problem you run into is that you can't use SAMPLE in a DML subquery; it raises ORA-30560: SAMPLE clause not allowed. But logically this is what is needed:

    UPDATE xyz SET x='Y' WHERE rowid IN (
        SELECT r FROM (
            SELECT ROWID r FROM xyz sample(0.15) ORDER BY dbms_random.value
        ) RNDM WHERE rownum < 100/*n*/+1
    );
    

    You can get around this by using a collection to store the rowids, and then update the rows using the rowid collection. Normally breaking a query into separate parts and gluing them together with PL/SQL leads to horrible performance. But in this case you can still save a lot of time by significantly reducing the amount of data read.

    declare
        type rowid_nt is table of rowid;
        rowids rowid_nt;
    begin
        --Get the rowids
        SELECT r bulk collect into rowids
        FROM (
            SELECT ROWID r
            FROM xyz sample(0.15)
            ORDER BY dbms_random.value
        ) RNDM WHERE rownum < 100/*n*/+1;
    
        --update the table
        forall i in 1 .. rowids.count
            update xyz set x = 'Y'
            where rowid = rowids(i);
    end;
    /
    

    I ran a simple test with 100,000 rows (on a table with only two columns), and N = 100. The original version took 0.85 seconds, @Gerrat's answer took 0.7 seconds, and the PL/SQL version took 0.015 seconds.

    But that's only one scenario; I don't have enough information to say my answer will always be better. As N increases, the sampling advantage is lost and the writing becomes more significant than the reading. If you have a very small amount of data, the PL/SQL context-switching overhead in my answer may make it slower than @Gerrat's solution.

    For performance issues, the size of the table in bytes is usually much more important than the size in rows. 1000 rows that use a terabyte of space is much larger than 100 million rows that only use a gigabyte.

    Here are some problems to consider with my answer:

    1. Sampling does not always return exactly the percent you asked for. With 100,000 rows and a 0.15% sample size, the number of rows returned was 147, not 150. That's why I used 0.15 instead of 0.10. You need to over-sample a little to ensure that you get more than N. How much do you need to over-sample? I have no idea; you'll probably have to test it and pick a safe number.
    2. You need to know the approximate number of rows to pick the percent.
    3. The percent must be a literal, so as the number of rows and N change you'll need to use dynamic SQL to change the percent. A rough sketch that combines this with over-sampling follows this list.
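
    Here is a rough PL/SQL sketch, purely as an illustration, of how points 1-3 could be handled together. It assumes the same xyz table as above, estimates the row count with a COUNT(*), over-samples by an arbitrary 1.5 safety factor (an untested guess), and builds the SAMPLE percent into the statement with dynamic SQL because SAMPLE only accepts a literal:

    declare
        type rowid_nt is table of rowid;
        rowids     rowid_nt;
        n          constant pls_integer := 100;  --number of rows to update
        total_rows number;
        pct        number;
    begin
        --Estimate the table size (table statistics could be used instead).
        select count(*) into total_rows from xyz;

        --Over-sample by an arbitrary safety factor; SAMPLE needs a percent below 100.
        pct := least(99.999999, 100 * n / greatest(total_rows, 1) * 1.5);

        --SAMPLE requires a literal percent, so build the statement dynamically.
        --(Assumes '.' is the session's decimal separator.)
        execute immediate
            'select r from (
                 select rowid r from xyz sample(' || to_char(pct, 'FM990.999999') || ')
                 order by dbms_random.value
             ) where rownum <= :n'
            bulk collect into rowids
            using n;

        --Update the sampled rows by rowid, as in the block above.
        forall i in 1 .. rowids.count
            update xyz set x = 'Y'
            where rowid = rowids(i);
    end;
    /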
