Does whitespace slow down processing?

孤独总比滥情好 2020-12-07 04:06

I have huge amounts of data to analyze, and I tend to leave space between words and variable names as I write my code. So the question is: in cases where efficiency is the number one priority, does all that whitespace slow down processing?

5 Answers
  • 2020-12-07 04:41

    The only part this can affect is the parsing of the source code into tokens. I can't imagine that the difference in parsing time would be significant. However, you can eliminate this aspect entirely by compiling your functions using the compile() or cmpfun() functions of the compiler package. Then the parsing is done only once, and no whitespace difference can affect execution time.
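
    For instance, a minimal sketch of byte-compiling a function up front (fn here is a made-up stand-in, not code from the question):

      library(compiler)

      fn  <- function(a, b) a + b
      fnc <- cmpfun(fn)   # parse and byte-compile once, up front
      fnc(1, 2)           # later calls never touch the parser, so source whitespace is irrelevant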

  • 2020-12-07 04:43

    There should be no difference in performance, although:

    fn1<-function(a,b) c<-a+b
    fn2<-function(a,b) c <- a + b
    
    library(rbenchmark)
    
    > benchmark(fn1(1,2),fn2(1,2),replications=10000000)
           test replications elapsed relative user.self sys.self user.child
    1 fn1(1, 2)     10000000   53.87    1.212      53.4     0.37         NA
    2 fn2(1, 2)     10000000   44.46    1.000      44.3     0.14         NA
    

    same with microbenchmark:

    Unit: nanoseconds
          expr min  lq median  uq      max neval
     fn1(1, 2)   0 467    467 468 90397803 1e+07
     fn2(1, 2)   0 467    467 468 85995868 1e+07
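
    For reference, the microbenchmark call that would produce output of that shape (times = 1e7 is an assumption, chosen to match the neval column):

      library(microbenchmark)
      microbenchmark(fn1(1, 2), fn2(1, 2), times = 1e7)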
    

    So the first result was bogus; at these scales the gap was most likely just run-to-run noise rather than any whitespace effect.

  • 2020-12-07 04:53

    YES

    But, No, not really:

    TL;DR: It would probably take longer just to run a script that removes the whitespace than the time that removing it would ever save.
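
    As a toy illustration of that point, a hypothetical sketch of what "grepping out the whitespace" could look like (note that gsub() on source code is unsafe in general, since it would also mangle string literals):

      src      <- "a  <-  3 ;  b  <-  4 ;  a + b"
      stripped <- gsub("[[:blank:]]+", " ", src)  # collapse runs of spaces/tabs
      eval(parse(text = stripped))                # parses and evaluates the same; returns 7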

    @Josh O'Brien really hit the nail on the head. But I just couldn't resist benchmarking it.

    As you can see, if you are dealing with on the order of 100 MILLION lines, you will see only a minuscule hindrance. However, with that many lines there is a high likelihood of there being at least one (if not hundreds of) hotspots, where simply improving the code in one of them would buy you far more speed than grepping out all the whitespace.

      library(microbenchmark)
    
      # times = 100 for the first table below, times = 10000 for the second
      microbenchmark(LottaSpace = eval(LottaSpace), NoSpace = eval(NoSpace), NormalSpace = eval(NormalSpace), times = 100)
    
      @ 100 times;  Unit: microseconds
               expr   min     lq median     uq    max
      1  LottaSpace 7.526 7.9185 8.1065 8.4655 54.850
      2 NormalSpace 7.504 7.9115 8.1465 8.5540 28.409
      3     NoSpace 7.544 7.8645 8.0565 8.3270 12.241
    
      @ 10,000 times;  Unit: microseconds    
               expr   min    lq median    uq      max
      1  LottaSpace 7.284 7.943  8.094 8.294 47888.24
      2 NormalSpace 7.182 7.925  8.078 8.276 46318.20
      3     NoSpace 7.246 7.921  8.073 8.271 48687.72
    

    WHERE:

      LottaSpace <- quote({
            a            <-            3
            b                  <-                  4   
            c         <-      5
            for   (i            in      1:7)
                  i         +            i
      })
    
    
      NoSpace <- quote({
      a<-3
      b<-4
      c<-5
      for(i in 1:7)
      i+i
      })
    
      NormalSpace <- quote({
       a <- 3
       b <- 4 
       c <- 5
       for (i in 1:7)
       i + i
      })
    
  • 2020-12-07 04:57

    To a first, second, third, ..., approximation, no, it won't cost you any time at all.

    The extra time you spend pressing the space bar is orders of magnitude more costly than the cost at run time (and neither matter at all).

    The much more significant cost will come from any decreased readability that results from leaving out spaces, which can make code harder (for humans) to parse.

  • 2020-12-07 05:01

    In a word, no!

    library(microbenchmark)
    
    f1 <- function(x){
        j   <- rnorm( x , mean = 0 , sd = 1 )         ;
        k   <-      j    *      2         ;
        return(    k     )
    }
    
    f2 <- function(x){j<-rnorm(x,mean=0,sd=1);k<-j*2;return(k)}
    
    
    microbenchmark( f1(1e3) , f2(1e3) , times= 1e3 )
        Unit: microseconds
         expr     min       lq  median      uq      max neval
     f1(1000) 110.763 112.8430 113.554 114.319  677.996  1000
     f2(1000) 110.386 112.6755 113.416 114.151 5717.811  1000
    
    #Even more runs and longer sampling
    microbenchmark( f1(1e4) , f2(1e4) , times= 1e4 )
      Unit: milliseconds
          expr      min       lq   median       uq       max neval
     f1(10000) 1.060010 1.074880 1.079174 1.083414 66.791782 10000
     f2(10000) 1.058773 1.074186 1.078485 1.082866  7.491616 10000
    

    EDIT

    It seems like using microbenchmark would be unfair, because the expressions are parsed before they are ever run in the loop. However, using source() means that on each iteration the sourced code must be re-parsed, whitespace and all. So I saved the functions to two separate files, with the last line of the file being a call of the function; e.g. my file f2.R looks like this:

    f2 <- function(x){j<-rnorm(x,mean=0,sd=1);k<-j*2;return(k)};f2(1e3)
    

    And I test them like so:

    microbenchmark( eval(source("~/Desktop/f2.R")) ,  eval(source("~/Desktop/f1.R")) , times = 1e3)
      Unit: microseconds
                               expr     min       lq   median      uq       max neval
     eval(source("~/Desktop/f2.R")) 649.786 658.6225 663.6485 671.772  7025.662  1000
     eval(source("~/Desktop/f1.R")) 687.023 697.2890 702.2315 710.111 19014.116  1000
    

    And a visual representation of the difference with 1e4 replications: [timing distribution plot omitted]

    Maybe it does make a minuscule difference in situations where functions are repeatedly parsed, but this wouldn't happen in normal use cases.
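
    To isolate just that repeated-parsing cost, a hypothetical sketch (the dense/spaced strings are made up for illustration):

      library(microbenchmark)

      dense  <- "j<-rnorm(1e3,mean=0,sd=1);k<-j*2"
      spaced <- "j   <-   rnorm( 1e3 ,  mean = 0 ,  sd = 1 ) ;   k   <-   j   *   2"

      # time only the parse step, with and without whitespace
      microbenchmark(parse(text = dense), parse(text = spaced), times = 1e4)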
