How can I find the pixel-wise standard deviation?

后端 未结 2 1990
太阳男子
太阳男子 2021-01-03 03:54

I have 20 rasters with same resolution and extent. It\'s a time series and each raster is for one year.

And I want to calculate the pixel-wise standard deviation of

相关标签:
2条回答
  • 2021-01-03 03:58

    Unfortunately, as you note after my comment above, per-pixel analyses can be slow. I think the next thing for you to try is to parallelize the process. Assuming you have a multi-core processor you can take advantage of calc() and its built-in multi-process optimization:

    cores <- 4
    beginCluster(cores, type='SOCK')
    calc(qq2stack, fun=sd)
    endCluster()
    

    This would result in a significant speed-up if your operating/hardware environment supports it. Obviously, you can increase the number of processes as well depending on your architecture.

    0 讨论(0)
  • 2021-01-03 04:23

    To address the slow performance of raster::calc for calculating the standard deviation of large rasters, I wrote a gist (here) that does it using GDAL's gdal_calc.py.

    To calculate the sd of 5 rasters with 4 million cells (2000 x 2000), gdal_calc takes about 2 seconds, compared to about 60 seconds for raster::calc (see example below).

    You'll need gdal_calc.py on the system path so that it can be found by Sys.which, or else modify gdal_calc <- Sys.which('gdal_calc.py') to specify the full path to gdal_calc.py.

    Note that, from memory, this function supports a maximum of 26 input rasters, since gdal_calc refers to each by a letter of the alphabet. Perhaps this constraint can be circumvented - I'm not familiar enough with gdalnumeric syntax to work it out.

    E.g.:

    First, let's create a stack of five random, 2000 x 2000 rasters. We'll use raster::calc on this stack, but we'll also write them out to temp files, because gdal_calc needs file paths:

    s <- stack(replicate(5, raster(matrix(runif(4000000), 2000))))  
    ff <- replicate(5, tempfile(fileext='.tif')) # a vector of temp file paths
    writeRaster(s, ff, bylayer=TRUE)
    

    Here's the raster::calc version:

    system.time(sd_calc <- calc(s, sd))
    
    ##  user  system elapsed 
    ## 79.83    0.08   80.00
    

    and the gdal_sd (i.e., using gdal_calc.py) version:

    devtools::source_gist('61c8062938e05a4c6b92') # source the gdal_sd gist
    system.time(gdal_sd(infile=ff, outfile=f <- tempfile(fileext='.tif')))
    
    ## Calculating standard deviation and writing to file17b0146640ac.tif
    ##    user  system elapsed 
    ##    0.00    0.03    2.05 
    

    And a comparison of their values:

    range(sd_calc[] - raster(f)[])
    ## [1] -1.110223e-16  1.110223e-16
    

    Note that raster::calc is likely to be faster for small rasters, but clearly gdal_sd is a huge improvement for large rasters.

    Documentation for the standard deviation function used by gdal_calc is given here. I've specified ddof=1 (i.e. delta degrees of freedom) such that results are consistent with those returned by R's sd.


    For posterity, here is the source of gdal_sd as it was at the time of posting:

    gdal_sd <- function(infile, outfile, quiet=TRUE) {
      require(rgdal)
      # infile:   The multiband raster file (or a vector of paths to multiple raster
      #           files) for which to calculate cell standard deviations.
      # outfile:  Path to raster output file.
      # quiet:    Logical. Should gdal_calc.py output be silenced?
      gdal_calc <- Sys.which('gdal_calc.py')
      if(gdal_calc=='') stop('gdal_calc.py not found on system.')
      if(file.exists(outfile)) stop('outfile already exists.')
      nbands <- sapply(infile, function(x) nrow(attr(GDALinfo(x), 'df')))
      if(length(infile) > 26 || nbands > 26) stop('Maximum number of inputs is 26.')
      if(length(nbands) > 1 & any(nbands > 1)) 
        warning('One or more rasters have multiple bands. First band used.')
    
      if(length(infile)==1) {
        inputs <- paste0('-', LETTERS[seq_len(nbands)], ' ', infile, ' --', 
               LETTERS[seq_len(nbands)], '_band ', seq_len(nbands), collapse=' ')
        n <- nbands
      } else {
        inputs <- paste0('-', LETTERS[seq_along(nbands)], ' ', infile, ' --', 
               LETTERS[seq_along(nbands)], '_band 1', collapse=' ')
        n <- length(infile)
      }
    
      message('Calculating standard deviation and writing to ', basename(outfile))
      cmd <- 'python %s %s --outfile=%s --calc="std([%s], 0, ddof=1)"'
      out <- system(
        sprintf(cmd, gdal_calc, inputs, outfile, 
                paste0(LETTERS[seq_len(n)], collapse=',')),
        show.output.on.console=!quiet, intern=TRUE
      )
      if(any(grepl('Error', out))) stop(out, call.=FALSE) else NULL
    }
    
    0 讨论(0)
提交回复
热议问题