I have huge amounts of data to analyze, and I tend to leave spaces between words or variable names as I write my code. So the question is: in cases where efficiency is the number
The only part this can affect is the parsing of the source code into tokens. I can't imagine that the difference in parsing time would be significant. However, you can eliminate this aspect by compiling the functions using the compile or cmpfun functions of the compiler package. Then the parsing is only done once and any whitespace difference cannot affect execution time.
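A minimal sketch of the cmpfun approach, assuming two toy functions that differ only in whitespace (the names add_spaced and add_dense are made up for illustration). Note that recent R versions byte-compile functions automatically through the JIT, so the explicit call mostly serves to make the point visible:

```r
library(compiler)  # ships with base R

# Two hypothetical definitions that differ only in whitespace
add_spaced <- function(a, b) {
  a + b
}
add_dense<-function(a,b){a+b}

# Byte-compile both; the results are called like ordinary functions
add_spaced_c <- cmpfun(add_spaced)
add_dense_c <- cmpfun(add_dense)

add_spaced_c(1, 2)
add_dense_c(1, 2)
```

Once compiled, both versions run from byte code, so the spacing of the original source text plays no part in each call.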
There should be no difference in performance, although:
fn1<-function(a,b) c<-a+b
fn2<-function(a,b) c <- a + b
library(rbenchmark)
> benchmark(fn1(1,2),fn2(1,2),replications=10000000)
test replications elapsed relative user.self sys.self user.child
1 fn1(1, 2) 10000000 53.87 1.212 53.4 0.37 NA
2 fn2(1, 2) 10000000 44.46 1.000 44.3 0.14 NA
The same with microbenchmark:
Unit: nanoseconds
expr min lq median uq max neval
fn1(1, 2) 0 467 467 468 90397803 1e+07
fn2(1, 2) 0 467 467 468 85995868 1e+07
So the first result was bogus.
TL;DR It would probably take longer just to run your script to remove the whitespace than the time saved by removing it.
@Josh O'Brien really hit the nail on the head. But I just couldn't resist benchmarking.
As you can see, if you are dealing with something on the order of 100 MILLION lines, then you will see a minuscule hindrance.
HOWEVER, with that many lines, there would be a high likelihood of there being at least one (if not hundreds) of hotspots, where simply improving the code in one of these would give you a much greater speedup than grepping out all the whitespace.
library(microbenchmark)
microbenchmark(LottaSpace = eval(LottaSpace), NoSpace = eval(NoSpace), NormalSpace = eval(NormalSpace), times=10e7)
@ 100 times; Unit: microseconds
expr min lq median uq max
1 LottaSpace 7.526 7.9185 8.1065 8.4655 54.850
2 NormalSpace 7.504 7.9115 8.1465 8.5540 28.409
3 NoSpace 7.544 7.8645 8.0565 8.3270 12.241
@ 10,000 times; Unit: microseconds
expr min lq median uq max
1 LottaSpace 7.284 7.943 8.094 8.294 47888.24
2 NormalSpace 7.182 7.925 8.078 8.276 46318.20
3 NoSpace 7.246 7.921 8.073 8.271 48687.72
WHERE:
LottaSpace <- quote({
a <- 3
b <- 4
c <- 5
for (i in 1:7)
i + i
})
NoSpace <- quote({
a<-3
b<-4
c<-5
for(i in 1:7)
i+i
})
NormalSpace <- quote({
a <- 3
b <- 4
c <- 5
for (i in 1:7)
i + i
})
To a first, second, third, ..., approximation, no, it won't cost you any time at all.
The extra time you spend pressing the space bar is orders of magnitude more costly than the cost at run time (and neither matters at all).
The much more significant cost will come from any decreased readability that results from leaving out spaces, which can make code harder (for humans) to parse.
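One way to see why the run-time cost is nil is to compare what the parser produces for spaced and unspaced source. This is a small sketch using base R's parse(); the two strings are made-up inputs, and keep.source = FALSE is set so that no source references are attached to the result:

```r
# Parse the same statement with generous spacing and with none
spaced <- parse(text = "c   <-   a   +   b", keep.source = FALSE)[[1]]
dense <- parse(text = "c<-a+b", keep.source = FALSE)[[1]]

# Compare the two parsed calls; whitespace is discarded by the parser
identical(spaced, dense)

# Show the dense version as R prints it back
deparse(dense)
```

The evaluator only ever works with the parsed call, so whatever spacing the source had is gone before any code runs.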
In a word, no!
library(microbenchmark)
f1 <- function(x){
j <- rnorm( x , mean = 0 , sd = 1 ) ;
k <- j * 2 ;
return( k )
}
f2 <- function(x){j<-rnorm(x,mean=0,sd=1);k<-j*2;return(k)}
microbenchmark( f1(1e3) , f2(1e3) , times= 1e3 )
Unit: microseconds
expr min lq median uq max neval
f1(1000) 110.763 112.8430 113.554 114.319 677.996 1000
f2(1000) 110.386 112.6755 113.416 114.151 5717.811 1000
#Even more runs and longer sampling
microbenchmark( f1(1e4) , f2(1e4) , times= 1e4 )
Unit: milliseconds
expr min lq median uq max neval
f1(10000) 1.060010 1.074880 1.079174 1.083414 66.791782 10000
f2(10000) 1.058773 1.074186 1.078485 1.082866 7.491616 10000
It seems like using microbenchmark would be unfair because the expressions are parsed before they are ever run in the loop. However, using source should mean that with each iteration the sourced code must be parsed and the whitespace removed. So I saved the functions to two separate files, with the last line of each file being a call of the function, e.g. my file f2.R looks like this:
f2 <- function(x){j<-rnorm(x,mean=0,sd=1);k<-j*2;return(k)};f2(1e3)
And I test them like so:
microbenchmark( eval(source("~/Desktop/f2.R")) , eval(source("~/Desktop/f1.R")) , times = 1e3)
Unit: microseconds
expr min lq median uq max neval
eval(source("~/Desktop/f2.R")) 649.786 658.6225 663.6485 671.772 7025.662 1000
eval(source("~/Desktop/f1.R")) 687.023 697.2890 702.2315 710.111 19014.116 1000
And a visual representation of the difference with 1e4 replications....
Maybe it does make a minuscule difference in the situation where functions are repeatedly parsed, but this wouldn't happen in normal use cases.
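The source() timings above also include reading the file and running rnorm, so a rough way to isolate the parsing step is to time parse() on the source text directly. The following is only a sketch: the strings approximate f1 and f2 (the original spacing of f1 is an assumption), time_parse is a hypothetical helper, and the numbers will vary by machine:

```r
# Approximate source text of the two functions (spacing of f1 is assumed)
src_spaced <- "f1 <- function( x ) {\n  j <- rnorm( x , mean = 0 , sd = 1 ) ;\n  k <- j * 2 ;\n  return( k )\n}"
src_dense <- "f2<-function(x){j<-rnorm(x,mean=0,sd=1);k<-j*2;return(k)}"

# Hypothetical helper: elapsed seconds for parsing `src` n times
time_parse <- function(src, n = 1e4) {
  system.time(
    for (i in seq_len(n)) parse(text = src, keep.source = FALSE)
  )[["elapsed"]]
}

t_spaced <- time_parse(src_spaced)
t_dense <- time_parse(src_dense)
c(spaced = t_spaced, dense = t_dense)
```

Dividing each total by n gives a per-parse cost, which can then be set against the run time of the function body itself.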