Question
I'm working with climate data files that contain daily data, so for most years there are 365 rasters in a brick. I want to sum the values over a subset of days, say day x to day y. This can be done with stackApply. The code below generates some rasters, creates a brick, and applies stackApply using specific values for x and y (1 and 3).
What I need, however, is for x and y to be taken from two raster layers. In the code below they are called raster.start and raster.end. Below the first set of code is a second set that works but is slow.
library(raster)
r <- raster(nrows=100, ncols=100)
s <- stack(lapply(1:5, function(i) setValues(r, runif(ncell(r), min = -10*i, max = 10))))
raster.start <- setValues(r, sample(2, ncell(r), replace=TRUE))
raster.end <- raster.start + 3
rasterb <- brick(s)
indices <- format(as.Date(names(rasterb), format = "layer.%d"), format = "%d")
indices <- c(1,1,1,1,1)
datasum.all <- stackApply(rasterb, indices, fun = sum)
datasum.sub1 <- stackApply(rasterb[[1:3]], indices[1:3], fun = sum)
The idea is to step through the rows and columns of the start and end raster to subset the brick and operate on it. Here's the code I developed to do this.
raster.out <- r
for (i in 1:nrow(r)) {
  for (j in 1:ncol(r)) {
    start <- raster.start[[1]][i, j] # get the starting day
    end <- raster.end[[1]][i, j] # get the ending day
    raster.out[i, j] <- sum(rasterb[[start:end]][i, j])
  }
}
However, even for this toy example the computation is slow: it took about 1.3 minutes to complete. I tried replacing some of the code with functions, as follows, but it had no effect on the time to completion. Any advice on how to speed up this process would be greatly appreciated.
startEnd <- function(raster.start, raster.end, i, j) {
  start <- raster.start[i, j] # get the starting day
  end <- raster.end[i, j] # get the ending day
  return(c(start, end))
}
rasterOutValue <- function(rasterb, i, j, startEnd) {
  # startEnd is c(start, end); expand it to the full range of layers
  return(sum(rasterb[[startEnd[1]:startEnd[2]]][i, j]))
}
for (i in 1:nrow(r)) {
  for (j in 1:ncol(r)) {
    raster.out[i, j] <- rasterOutValue(rasterb, i, j, startEnd(raster.start, raster.end, i, j))
  }
}
Answer 1:
Your example data
library(raster)
r <- raster(nrows=100, ncols=100)
set.seed(88)
b <- stack(lapply(1:5, function(i) setValues(r, runif(ncell(r), min = -10*i, max = 10))))
r.start <- setValues(r, sample(2, ncell(r), replace=TRUE))
r.end <- r.start + 3
First, an improved version of your loop. It works and is considerably faster than the original, but it is still rather slow.
raster.out <- r
for (i in 1:ncell(r)) {
  start <- r.start[i] # get the starting day
  end <- r.end[i] # get the ending day
  raster.out[i] <- sum(b[i][start:end])
}
That brings the time down from 74 to 5 seconds for me. But you should never loop over cells; that is always going to be too slow. Instead, you can do this (in 0.04 seconds for me):
s <- stack(r.start, r.end, b)
x <- calc(s, fun=function(x) sum(x[(x[1]:x[2])+2]))
x
#class : RasterLayer
#dimensions : 100, 100, 10000 (nrow, ncol, ncell)
#resolution : 3.6, 1.8 (x, y)
#extent : -180, 180, -90, 90 (xmin, xmax, ymin, ymax)
#crs : +proj=longlat +datum=WGS84 +no_defs
#source : memory
#names : layer
#values : -129.5758, 30.31813 (min, max)
And that seems to be correct
a <- s[1]
a
# layer.1.1 layer.2.1 layer.1.2 layer.2.2 layer.3 layer.4 layer.5
#[1,] 1 4 -1.789974 2.640807 4.431439 -23.09203 -5.688119
fun <- function(x) sum(x[(x[1]:x[2])+2])
fun(a)
#[1] -17.80976
x[1]
#[1] -17.80976
calc is to Raster objects what apply is to matrices (that is why it is called app in terra).
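To make the analogy concrete, here is a minimal base-R sketch with no raster objects involved. The matrix `m` and its values are made up for illustration: each row plays the role of one cell and holds the start index, the end index, and then five layer values, which is the layout the function sees for the stack `s` built above.

```r
# Hypothetical data: 3 "cells" (rows); columns are start, end, then 5 layer values
m <- rbind(
  c(1, 4, 10, 20, 30, 40, 50),
  c(2, 5,  1,  2,  3,  4,  5),
  c(1, 1,  7,  8,  9, 10, 11)
)

# Same function as passed to calc: the +2 skips the two index columns
fun <- function(x) sum(x[(x[1]:x[2]) + 2])

# apply() calls fun once per row, just as calc() calls it once per cell
res <- apply(m, 1, fun)
res
#[1] 100  14   7
```

Developing and checking the function on a small matrix like this is usually quicker than debugging it inside `calc`.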
The place to start is to first write a function that does what you want with a vector.
x <- 1:10
test1 <- function(start, end, values) {
  mean(values[start:end])
}
test1(2, 5, x)
test1(5, 8, x)
The function used with calc only takes one argument, so you need a function like this:
test2 <- function(values) {
  # the +2 to skip the first two elements in the computation
  start <- values[1] + 2
  end <- values[2] + 2
  mean(values[start:end])
}
test2(c(2, 5, x))
test2(c(5, 8, x))
And a more concise version
test3 <- function(v) {
  mean(v[ (v[1]:v[2])+2 ] )
}
test3(c(2, 5, x))
#[1] 3.5
test3(c(5, 8, x))
#[1] 6.5
Second addition (and a reminder to always check with NA values!): test3 breaks when one of the indices (start or end) is NA (it is OK if the other values are NA).
test3(c(NA, 5, x))
#Error in v[1]:v[2] : NA/NaN argument
So we need a function that catches these
test4 <- function(v) {
  if (any(is.na(v[1:2]))) {
    NA
  } else {
    mean(v[ (v[1]:v[2])+2 ] )
  }
}
test4(c(NA, 5, x))
#[1] NA
test4(c(1, 5, x))
#[1] 3
Typically "start" and "end" will both be NA at the same time, so a simpler version that should also work could be
test5 <- function(v) {
  if (is.na(v[1])) {
    NA
  } else {
    mean(v[ (v[1]:v[2])+2 ] )
  }
}
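These functions return NA without an error when a data value inside the selected range is NA, because mean propagates it. If you would rather skip such values, one option is to pass na.rm through. The sketch below is only an illustration of that idea: the name `test6` and the `na.rm` default are my assumptions about what you might want, not something required by calc.

```r
# Hypothetical variant of test4: NA indices give NA, NA data values can be skipped
test6 <- function(v, na.rm = TRUE) {
  if (any(is.na(v[1:2]))) {
    NA_real_
  } else {
    mean(v[(v[1]:v[2]) + 2], na.rm = na.rm)
  }
}

x <- c(1, 2, NA, 4, 5)
test6(c(2, 4, x))                 # elements 2 to 4 are 2, NA, 4; the NA is skipped
#[1] 3
test6(c(2, 4, x), na.rm = FALSE)  # the NA inside the range propagates
#[1] NA
test6(c(NA, 4, x))                # a missing start index gives NA
#[1] NA
```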
This approach with calc might be slow, as it turns a RasterBrick into a RasterStack with 365 + 2 layers. That considerably slows down reading the data. So you could try this approach with overlay instead (here using sum again)
f <- function(i, v) {
  j <- !is.na(i[,1])
  r <- rep(NA, nrow(i))
  x <- cbind(i[j,,drop=FALSE], v[j,,drop=FALSE])
  r[j] <- apply(x, 1, function(y) sum(y[ (y[1]:y[2])+2 ] ))
  r
}
cal <- stack(r.start, r.end)
x <- overlay(cal, b, fun= f, recycle=FALSE)
x
#class : RasterLayer
# ...
#values : -129.5758, 30.31813 (min, max)
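To see what the function receives here, you can call it directly on plain matrices: `i` has one row per cell with the start and end indices, and `v` has one row per cell with the layer values. The sketch below does that with made-up matrices and also compares `f` with a mask-based variant that avoids `apply`. `f_mask` is a hypothetical alternative of mine, not benchmarked, and it assumes start <= end in every cell.

```r
# f as defined above: one apply() call over the rows with non-missing indices
f <- function(i, v) {
  j <- !is.na(i[, 1])
  r <- rep(NA, nrow(i))
  x <- cbind(i[j, , drop = FALSE], v[j, , drop = FALSE])
  r[j] <- apply(x, 1, function(y) sum(y[(y[1]:y[2]) + 2]))
  r
}

# Hypothetical alternative: mask the layers outside [start, end] and use rowSums()
f_mask <- function(i, v) {
  layer <- col(v)                            # layer number of every element of v
  keep <- layer >= i[, 1] & layer <= i[, 2]  # TRUE inside the range, NA if an index is NA
  v[which(!keep)] <- 0                       # zero out values outside the range
  out <- rowSums(v)
  out[is.na(i[, 1]) | is.na(i[, 2])] <- NA   # cells without valid indices stay NA
  out
}

# Made-up input: 3 cells, 5 layers; the third cell has no start/end
i <- cbind(c(1, 2, NA), c(4, 5, NA))
v <- rbind(c(10, 20, 30, 40, 50),
           c( 1,  2,  3,  4,  5),
           c( 7,  8,  9, 10, 11))

f(i, v)
#[1] 100  14  NA
all.equal(f(i, v), f_mask(i, v))
#[1] TRUE
```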
You can speed up the algorithm by writing it in Rcpp/C++
library(Rcpp)
cppFunction('std::vector<double> gtemp(NumericMatrix cal, NumericMatrix wth) {
    std::vector<double> out(cal.nrow(), NAN);
    for (int i=0; i<cal.nrow(); i++) {
        if (!std::isnan(cal(i,0))){
            NumericVector v = wth(i,_);
            // convert the 1-based start index to a 0-based offset
            size_t start = cal(i,0)-1;
            size_t end = cal(i,1);
            out[i] = std::accumulate(v.begin()+start, v.begin()+end, 0.0);
        }
    }
    return out;
}')
x <- overlay(cal, b, fun=gtemp, recycle=FALSE)
And here is how you can do this with terra (version >= 0.6-14) and the rapp (range-apply) method.
Example data
library(terra)
d <- rast(nrows=100, ncols=100, nl=5)
rstart <- rast(d, nlyr=1)
nc <- ncell(d)
set.seed(88)
values(d) <- t(sapply(1:5, function(i) runif(nc, min = -10*i, max = 10)))
values(rstart) <- sample(2, nc, replace=TRUE)
rend <- rstart + 3
Solution
idx <- c(rstart, rend)
z <- rapp(d, idx, "sum")
z
#class : SpatRaster
#dimensions : 100, 100, 1 (nrow, ncol, nlyr)
#resolution : 3.6, 1.8 (x, y)
#extent : -180, 180, -90, 90 (xmin, xmax, ymin, ymax)
#coord. ref. : +proj=longlat +datum=WGS84 +no_defs
#data source : memory
#names : lyr1
#min values : -184.6918
#max values : 34.93876
Source: https://stackoverflow.com/questions/61578461/r-raster-brick-sum-values-in-the-cells-determined-by-two-different-rasters-how