Does whitespace slow down processing?

孤独总比滥情好 2020-12-07 04:06

I have huge amounts of data to analyze, and I tend to leave spaces between words and variable names as I write my code. So the question is: in cases where efficiency is the number

5 answers
  • 2020-12-07 04:41

    The only part this can affect is the parsing of the source code into tokens, and I can't imagine that the difference in parsing time would be significant. However, you can eliminate this aspect by compiling the functions using the compile or cmpfun functions of the compiler package. Then the parsing is only done once, and any whitespace difference cannot affect execution time.
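
    As a minimal sketch of that suggestion (the function names spaced and dense are made up for illustration), cmpfun from the compiler package that ships with R byte-compiles each function once, after which both versions are called in exactly the same way. As far as I know, R 3.4.0 and later also byte-compile closures automatically through the JIT, so the explicit call is mostly illustrative there.

```r
library(compiler)  # part of the standard R distribution

# Two hypothetical functions that differ only in whitespace.
spaced <- function(a, b) {
    result   <-   a   +   b
    result
}
dense <- function(a,b){result<-a+b;result}

# Byte-compile each one once; the compiled code is what runs on later calls.
spaced_c <- cmpfun(spaced)
dense_c  <- cmpfun(dense)

# Both compiled versions compute the same value.
same_result <- identical(spaced_c(1, 2), dense_c(1, 2))
print(same_result)
```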

  • 2020-12-07 04:43

    There should be no difference in performance, although:

    fn1<-function(a,b) c<-a+b
    fn2<-function(a,b) c <- a + b
    
    library(rbenchmark)
    
    > benchmark(fn1(1,2),fn2(1,2),replications=10000000)
           test replications elapsed relative user.self sys.self user.child
    1 fn1(1, 2)     10000000   53.87    1.212      53.4     0.37         NA
    2 fn2(1, 2)     10000000   44.46    1.000      44.3     0.14         NA
    

    The same test with microbenchmark:

    Unit: nanoseconds
          expr min  lq median  uq      max neval
     fn1(1, 2)   0 467    467 468 90397803 1e+07
     fn2(1, 2)   0 467    467 468 85995868 1e+07
    

    So the difference in the first result was bogus.

  • 2020-12-07 04:53

    YES

    But no, not really:

    TL;DR It would probably take longer just to run a script that removes the whitespace than the time you would save by removing it.

    @Josh O'Brien really hit the nail on the head. But I just couldn't resist benchmarking.

    As you can see, if you are dealing with something on the order of 100 MILLION lines, then you will see a minuscule hindrance. HOWEVER, with that many lines, there is a high likelihood of there being at least one (if not hundreds) of hotspots, where simply improving the code in one of them would give you a much greater speedup than grepping out all the whitespace.

      library(microbenchmark)
    
      microbenchmark(LottaSpace = eval(LottaSpace), NoSpace = eval(NoSpace), NormalSpace = eval(NormalSpace), times=10e7)
    
      @ 100 times;  Unit: microseconds
               expr   min     lq median     uq    max
      1  LottaSpace 7.526 7.9185 8.1065 8.4655 54.850
      2 NormalSpace 7.504 7.9115 8.1465 8.5540 28.409
      3     NoSpace 7.544 7.8645 8.0565 8.3270 12.241
    
      @ 10,000 times;  Unit: microseconds    
               expr   min    lq median    uq      max
      1  LottaSpace 7.284 7.943  8.094 8.294 47888.24
      2 NormalSpace 7.182 7.925  8.078 8.276 46318.20
      3     NoSpace 7.246 7.921  8.073 8.271 48687.72
    

    WHERE:

      LottaSpace <- quote({
            a            <-            3
            b                  <-                  4   
            c         <-      5
            for   (i            in      1:7)
                  i         +            i
      })
    
    
      NoSpace <- quote({
      a<-3
      b<-4
      c<-5
      for(i in 1:7)
      i+i
      })
    
      NormalSpace <- quote({
       a <- 3
       b <- 4 
       c <- 5
       for (i in 1:7)
       i + i
      })
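
    One way to see why the timings above are indistinguishable: whitespace is discarded when the text is parsed, so the differently spaced versions end up as the same object. A minimal sketch in base R, assuming the code is supplied as text (the names lotta_src and no_src are made up for illustration) and parsed with keep.source = FALSE so that source references do not enter the comparison:

```r
# The same statements, once with exaggerated spacing and once with none.
lotta_src <- "{
      a            <-            3
      b                  <-                  4
      for   (i            in      1:7)
            i         +            i
}"
no_src <- "{
a<-3
b<-4
for(i in 1:7)
i+i
}"

# Parse each string into a language object, without source references.
lotta_expr <- parse(text = lotta_src, keep.source = FALSE)[[1]]
no_expr    <- parse(text = no_src, keep.source = FALSE)[[1]]

# Compare the parsed expressions themselves, not the text.
same_tree <- identical(lotta_expr, no_expr)
print(same_tree)
```

    Once the expressions are parsed, eval has the same object to work with in each case, which fits the benchmark showing no systematic difference.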
    
  • 2020-12-07 04:57

    To a first, second, third, ..., approximation, no, it won't cost you any time at all.

    The extra time you spend pressing the space bar is orders of magnitude more costly than the cost at run time (and neither matters at all).

    The much more significant cost will come from any decreased readability that results from leaving out spaces, which can make code harder (for humans) to parse.

  • 2020-12-07 05:01

    In a word, no!

    library(microbenchmark)
    
    f1 <- function(x){
        j   <- rnorm( x , mean = 0 , sd = 1 )         ;
        k   <-      j    *      2         ;
        return(    k     )
    }
    
    f2 <- function(x){j<-rnorm(x,mean=0,sd=1);k<-j*2;return(k)}
    
    
    microbenchmark( f1(1e3) , f2(1e3) , times= 1e3 )
        Unit: microseconds
         expr     min       lq  median      uq      max neval
     f1(1000) 110.763 112.8430 113.554 114.319  677.996  1000
     f2(1000) 110.386 112.6755 113.416 114.151 5717.811  1000
    
    #Even more runs and longer sampling
    microbenchmark( f1(1e4) , f2(1e4) , times= 1e4 )
      Unit: milliseconds
          expr      min       lq   median       uq       max neval
     f1(10000) 1.060010 1.074880 1.079174 1.083414 66.791782 10000
     f2(10000) 1.058773 1.074186 1.078485 1.082866  7.491616 10000
    

    EDIT

    It seems like using microbenchmark this way would be unfair, because the expressions are parsed before they are ever run in the loop. However, using source should mean that on each iteration the sourced code must be parsed again and its whitespace discarded. So I saved the functions to two separate files, with the last line of each file being a call to the function, e.g. my file f2.R looks like this:

    f2 <- function(x){j<-rnorm(x,mean=0,sd=1);k<-j*2;return(k)};f2(1e3)
    

    And I test them like so:

    microbenchmark( eval(source("~/Desktop/f2.R")) ,  eval(source("~/Desktop/f1.R")) , times = 1e3)
      Unit: microseconds
                               expr     min       lq   median      uq       max neval
     eval(source("~/Desktop/f2.R")) 649.786 658.6225 663.6485 671.772  7025.662  1000
     eval(source("~/Desktop/f1.R")) 687.023 697.2890 702.2315 710.111 19014.116  1000
    

    [Figure: visual representation of the difference with 1e4 replications]

    Maybe it does make a minuscule difference in situations where functions are repeatedly parsed, but this wouldn't happen in normal use cases.
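
    To isolate the parsing step without touching files, here is a rough sketch in base R: build one large source string with generous whitespace and one with none, then time parse() on each. The size and names used here (n_lines, spaced_src, dense_src) are arbitrary choices for illustration, and timings will vary by machine, so no expected numbers are shown.

```r
n_lines <- 20000  # arbitrary size, chosen only for illustration

# The same statement repeated many times, with and without extra whitespace.
spaced_src <- paste(rep("x        <-        1        +        2", n_lines),
                    collapse = "\n")
dense_src  <- paste(rep("x<-1+2", n_lines), collapse = "\n")

# Time only the text-to-expression step.
spaced_time <- system.time(
    spaced_parsed <- parse(text = spaced_src, keep.source = FALSE)
)
dense_time <- system.time(
    dense_parsed <- parse(text = dense_src, keep.source = FALSE)
)

# Elapsed seconds for each variant.
print(c(spaced = spaced_time[["elapsed"]], dense = dense_time[["elapsed"]]))

# Whatever the timings, the parsed results are the same expressions.
same_parse <- identical(spaced_parsed, dense_parsed)
print(same_parse)
```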
