Randomly associate elements of two vectors given conditions

前端未结

关注

 5  743

心在旅途 2021-01-23 18:26

I have a data.table of capitals

capitals<-data.table(capital=c(100,50,25,5))
capitals
   capital
1:     100
2:      50
3:      25
4:       5


      
      
        
          5条回答        

        
                    
            
            
                         
                
              
              
                
                   南方客
                                             
                
                
                (楼主)
            
              
              
                2021-01-23 18:45
              

            
            
                        
The naive solution to this problem involves a loop over n capital values and, for each capital value, a search over n loss values so that the solution time varies by n^2.  Probably not much can be done about the capital loop, but the loss search time can be reduced in two ways.  First, find the upper bounds for the losses which need to be searched can be found as Alex and Shambho do by sorting and using  findInterval() and then second within the capital loop the list of possible losses to be passed to sample() can be updated as I do below rather than re-created from the entire list.  Since the size of the list of possible losses is always much smaller than n, the execution times with this approach increase more nearly linearly with n which results in significantly reduced execution times for this range of n.  It’s also helpful to create the loss tracking vector with full space rather than alloc space on each iteration in loop.  My function also returns the results in the same order as the capital values were input which seems proper.   Microbenchmark reports the times for ffben and ffwalt as shown below for both of Ben’s data sets.   Note that times are in milliseconds.

Unit: milliseconds

              expr         min         lq      median          uq         max neval
    ffben(cap2, los2)   1549.8289   1556.113   1565.7139   1592.3230   1593.9527     5
   ffwalt(cap2, los2)    205.4834    206.267    206.5975    207.0464    212.9808     5
 ffben(capital, loss) 154235.8823 154855.444 154969.9196 155052.6070 156250.5489     5
ffwalt(capital, loss)   2071.3610   2074.692   2099.4889   2100.1091   2117.4721     5


Since the capital data set is 10x the size of the cap2 data set, it appears that the times for ffben increase as n^2 while the times for ffwalt increase only linearly, both as expected.

ffwalt <- function( caps, loss) {
len_cap <- length(caps)
loss_srt <- sort(loss)
caps_ord <- order(caps)
caps_srt <- caps[caps_ord]
cap_mx_ls_idx <- findInterval(caps_srt, loss_srt)  # find upper loss bounds for each value of capital
loss_picked <- vector("numeric",len_cap)  #  alocate space for full loss vector to avoid mem alloc time in capital loop
samp <- seq_len(cap_mx_ls_idx[1])
for( i in seq_len(len_cap-1) )  {
  loss_picked[i] <- sample(x=samp,1, replace=FALSE)
  if(cap_mx_ls_idx[i+1] > cap_mx_ls_idx[i]) 
       add_samp <- seq(cap_mx_ls_idx[i]+1,cap_mx_ls_idx[i+1],1)
  else add_samp  <- NULL
  samp <- c(samp[samp != loss_picked[i]], add_samp)
}
loss_picked[len_cap] <- samp             # avoid problem with sample() when x has length 1
results <- data.frame(capital=caps_srt, loss=loss_srt[loss_picked])
results[caps_ord,] <- results            # restore original caps order
return(results)
}

    
             
                                                        
            
            
              
                
                0
              
                   
                
               讨论(0)
              
                                                  
              
              
                          
             
       
          
              
                                       
     查看其它5个回答


            
                         
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
                              			
        
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复