Benchmarking

Does the by() function make a growing list?

Submitted by 自古美人都是妖i on 2019-12-22 04:39:23
Question: Does the by function make a list that grows one element at a time? I need to process a data frame with about 4M observations grouped by a factor column. The situation is similar to the example below:

```r
# Make 4M rows of data
x = data.frame(col1=1:4000000, col2=10000001:14000000)
# Make a factor
x[,"f"] = x[,"col1"] - x[,"col1"] %% 5

head(x)
#   col1     col2 f
# 1    1 10000001 0
# 2    2 10000002 0
# 3    3 10000003 0
# 4    4 10000004 0
# 5    5 10000005 5
# 6    6 10000006 5
```

Now, a tapply on one of the columns takes …
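The performance worry behind the question, a result container that is copied in full every time one element is appended, turns linear work quadratic. A small Python sketch (illustrative only, not R; the two function names are made up for this example) shows the effect:

```python
import time

def grow_by_copy(n):
    # Each iteration copies the whole list built so far: O(n^2) total work,
    # which is the failure mode the question is worried about.
    out = []
    for i in range(n):
        out = out + [i]
    return out

def grow_in_place(n):
    # list.append is amortized O(1), so the total work stays O(n).
    out = []
    for i in range(n):
        out.append(i)
    return out

n = 5000
t0 = time.perf_counter(); a = grow_by_copy(n); t_copy = time.perf_counter() - t0
t0 = time.perf_counter(); b = grow_in_place(n); t_append = time.perf_counter() - t0
assert a == b            # same result either way
print(t_copy > t_append)
```

Even at a mere 5000 elements the copy-on-append version is dramatically slower; at 4M rows the difference would be fatal, which is why it matters whether by() behaves this way internally.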

Understanding Ruby on Rails render times

Submitted by 大城市里の小女人 on 2019-12-22 03:57:11
Question: I am working on an "optimization" in my application and I am trying to understand the output that Rails (version 2.2.2) gives at the end of the render. Here is the "old" way:

```
Rendered user/_old_log (25.7ms)
Completed in 466ms (View: 195, DB: 8) | 200 OK
```

And the "new" way:

```
Rendered user/_new_log (48.6ms)
Completed in 337ms (View: 192, DB: 33) | 200 OK
```

These queries were exactly the same; the difference is that the old way parses log files while the new way queries the database log table.
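The components in each "Completed in" line do not add up to the total; the remainder is time spent outside the view and the database (controller code, log parsing, framework overhead). A quick back-of-the-envelope check on the figures above:

```python
# Totals and component times in ms, taken from the two log lines above.
old_total, old_view, old_db = 466, 195, 8
new_total, new_view, new_db = 337, 192, 33

# Whatever is neither view nor DB: controller work, log parsing, etc.
old_other = old_total - old_view - old_db
new_other = new_total - new_view - new_db
print(old_other, new_other)  # 263 112
```

So the new way spends 25 ms more in the database but saves roughly 150 ms elsewhere, presumably because the controller no longer parses log files.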

When should I use a SortedDictionary instead of a Dictionary? [duplicate]

Submitted by 浪尽此生 on 2019-12-22 01:34:19
Question: This question already has answers here: SortedList<>, SortedDictionary<> and Dictionary<> (6 answers). Closed 5 years ago.

As I wrote in some of my earlier posts, I am still quite new to the C# world, so I wrote a small benchmark to compare Dictionary, Hashtable, SortedList and SortedDictionary against each other. The test runs 8000 iterations with 50 to 100000 elements. I tested adding new elements, searching for elements, and looping through some elements, all at random. …
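The trade-off the linked answers describe, hash lookup in O(1) but unordered versus a sorted structure that pays on every insert to keep keys ordered, can be sketched in Python (whose standard library has no SortedDictionary) with a plain dict plus a separately maintained sorted key list. The class name here is made up for illustration:

```python
import bisect

class TinySortedDict:
    """Toy illustration only: O(1) hash lookups via a dict, plus a key
    list kept sorted with bisect.insort (O(n) per insert, since the
    list has to shift elements to make room)."""

    def __init__(self):
        self._data = {}
        self._keys = []

    def __setitem__(self, key, value):
        if key not in self._data:
            bisect.insort(self._keys, key)  # keep keys in sorted order
        self._data[key] = value

    def __getitem__(self, key):
        return self._data[key]  # plain hash lookup, no ordering cost

    def sorted_items(self):
        return [(k, self._data[k]) for k in self._keys]

d = TinySortedDict()
for k in [5, 1, 3, 2, 4]:
    d[k] = k * 10
print(d.sorted_items())  # [(1, 10), (2, 20), (3, 30), (4, 40), (5, 50)]
```

If you never need the keys in order, the sorted bookkeeping is pure overhead, which is the usual answer to the question in the title.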

Subprocess memory usage in Python

Submitted by 杀马特。学长 韩版系。学妹 on 2019-12-22 00:02:18
Question: How can one measure/benchmark the maximum memory usage of a subprocess executed from Python?

Answer 1: I made a little utility class that demonstrates how to do this with the psutil library:

```python
import subprocess
import time

import psutil

class ProcessTimer:
    def __init__(self, command):
        self.command = command
        self.execution_state = False

    def execute(self):
        self.max_vms_memory = 0
        self.max_rss_memory = 0
        self.t1 = None
        self.t0 = time.time()
        self.p = subprocess.Popen(self.command, shell=False)
        self.execution_state = True
```

…
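If pulling in psutil is not an option, a Unix-only alternative lives in the standard library: resource.getrusage reports the peak resident set size across terminated child processes. Note the unit is platform-dependent (kilobytes on Linux, bytes on macOS), and the figure is aggregated over all children, not per child. A minimal sketch:

```python
import resource
import subprocess
import sys

# Run a child that allocates roughly 50 MB, then ask the OS for the
# peak RSS observed across terminated children of this process.
subprocess.run(
    [sys.executable, "-c", "x = bytearray(50 * 1024 * 1024)"],
    check=True,
)
peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
print(peak > 0)  # True
```

The psutil approach in the answer above remains the better choice when you need to sample a still-running child or distinguish between several children.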

Tools to profile function execution times of a .NET program

Submitted by 烈酒焚心 on 2019-12-21 21:33:28
Question: What tools are available to profile a .NET program by measuring function execution times and generating graphs to visualize the time spent at various points in the call graph?

Answer 1: AQTime and dotTrace are two very good commercial profilers. A free option would be ProfileSharp, though I have had little luck with it. Microsoft provides the CLR Profiler as well, which works well but has fewer features.

Answer 2: It'll cost you, but Ants Performance Profiler will do the job.

Answer 3: CLR Profiler

Answer 4: …

How to accurately measure clock cycles used by a C++ function?

Submitted by 牧云@^-^@ on 2019-12-21 20:19:11
Question: I know that I have to use rdtsc. The measured function is deterministic, but the result is far from repeatable (I get 5% oscillation from run to run). Possible causes are:

- context switching
- cache misses

Do you know any other causes? How can I eliminate them?

Answer 1: TSCs (what rdtsc reads) are often not synchronized on multi-processor systems. It may help to set the CPU affinity to bind the process to a single CPU. You could also get timestamps from HPET timers if available, which …
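Whatever the clock source, one standard mitigation for this kind of jitter is to repeat the measurement many times and report the minimum (the least-disturbed run) alongside the median. A language-agnostic sketch of the idea in Python (`measure` is a name made up for this example):

```python
import statistics
import time

def measure(fn, runs=31):
    """Time fn over several runs. The minimum approximates the
    interference-free cost; the gap between minimum and median shows
    the jitter caused by context switches, cache misses, and so on."""
    samples = []
    for _ in range(runs):
        t0 = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - t0)
    return min(samples), statistics.median(samples)

best, typical = measure(lambda: sum(range(100000)))
print(best <= typical)  # True
```

The same repeat-and-take-minimum discipline applies to rdtsc-based C++ measurements; it does not remove the interference, but it stops it from dominating the reported figure.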

Vectorized C# code with SIMD using Vector<T> running slower than classic loop

Submitted by 大城市里の小女人 on 2019-12-21 18:34:43
Question: I've seen a few articles describing how Vector<T> is SIMD-enabled and is implemented using JIT intrinsics, so the compiler will correctly emit AVX/SSE/… instructions when using it, allowing much faster code than classic, linear loops (example here). I decided to try rewriting one of my methods to see if I could get some speedup, but so far I have failed: the vectorized code runs 3 times slower than the original, and I'm not exactly sure why. Here are two versions of a method …

Why does linking to librt swap performance between g++ and clang?

Submitted by 倖福魔咒の on 2019-12-21 17:02:46
Question: I just found this answer from @tony-d with benchmark code to test virtual function call overhead. I checked his benchmark using g++:

```
$ g++ -O2 -o vdt vdt.cpp -lrt
$ ./vdt
virtual dispatch: 150000000 0.128562
switched: 150000000 0.0803207
overheads: 150000000 0.0543323
...
```

I got better performance than he did (the ratio is about 2), but then I checked with clang:

```
$ clang++-3.7 -O2 -o vdt vdt.cpp -lrt
$ ./vdt
virtual dispatch: 150000000 0.462368
switched: 150000000 0.0569544
overheads: 150000000 0…
```
