Create quick/reliable benchmark with java?

前端未结

关注

 3  1735

I\'m trying to create a benchmark test with java. Currently I have the following simple method:

public static long runTest(int times){
    long start = Syste


                      
              相关标签:


      
      
        
          3条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  野的像风        
                
              
                            
                2020-11-30 12:31
              
            
            
                                                                       
In short: 

(Micro-)benchmarking is very complex, so use a tool like the Benchmarking framework http://www.ellipticgroup.com/misc/projectLibrary.zip - and still be skeptical about the results ("Put micro-trust in a micro-benchmark", Dr. Cliff Click).

In detail:

There are a lot of factors that can strongly influence the results: 


The accuracy and precision of System.nanoTime: it is in the worst case as bad as of System.currentTimeMillis.
code warmup and class loading
mixed mode: JVMs JIT compile (see Edwin Buck's answer) only after a code block is called sufficiently often (1500 or 1000 times)
dynamic optimizations: deoptimization, on-stack replacement, dead code elimination (you should use the result you computed in your loop, e.g. print it)
resource reclamation: garbace collection (see Michael Borgwardt's answer) and object finalization
caching: I/O and CPU
your operating system on the whole: screen saver, power management, other processes (indexer, virus scan, ...)


Brent Boyer's article "Robust Java benchmarking, Part 1: Issues" ( http://www.ibm.com/developerworks/java/library/j-benchmark1/index.html) is a good description of all those issues and whether/what you can do against them (e.g. use JVM options or call ProcessIdleTask beforehand).

You won't be able to eliminate all these factors, so doing statistics is a good idea. But: 


instead of computing the difference between the max and min, you should put in the effort to compute the standard deviation (the results {1, 1000 times 2, 3} is different from {501 times 1, 501 times 3}). 
The reliability is taken into account by producing confidence intervals (e.g. via bootstrapping). 


The above mentioned Benchmark framework ( http://www.ellipticgroup.com/misc/projectLibrary.zip) uses these techniques. You can read about them in Brent Boyer's article "Robust Java benchmarking, Part 2: Statistics and solutions" ( https://www.ibm.com/developerworks/java/library/j-benchmark2/). 
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  悲&欢浪女        
                
              
                            
                2020-11-30 12:36
              
            
            
                                                                       
In the 10 million times run, odds are good the HotSpot compiler detected a "heavily used" piece of code and compiled it into machine native code.

JVM bytecode is interpreted, which leads it susceptible to more interrupts from other background processes occurring in the JVM (like garbage collection).

Generally speaking, these kinds of benchmarks are rife with assumptions that don't hold.  You cannot believe that a micro benchmark really proves what it set out to prove without a lot of evidence proving that the initial measurement (time) isn't actually measuring your task and possibly some other background tasks.  If you don't attempt to control for background tasks, then the measurement is much less useful.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  时光说笑        
                
              
                            
                2020-11-30 12:37
              
            
            
                                                                       
Your code ends up testing mainly garbage collection performance because appending to a String in a loop ends up creating and immediately discarding a large number of increasingly large String objects. 

This is something that inherently leads to wildly varying measurements and is influenced strongy by multi-thread activity.

I suggest you do something else in your loop that has more predictable performance, like mathematical calculations.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
                             
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复