Question
I have over 7,000 *.bil files that I'm trying to merge into one *.csv file and export. I am able to read the *.bil files using raster and as.data.frame:
library(raster)  # provides raster() and the data.frame coercion used below
setwd("/.../Prism Weather Data All/")
filenames <- list.files(path = "/.../Prism Weather Data All/", pattern = ".bil")
r = raster("PRISM_ppt_stable_4kmM2_189501_bil.bil")
test <- as.data.frame(r, na.rm=TRUE)
This sets the working directory and grabs all files matching *.bil. So far I only raster one file and run as.data.frame on it to verify it's correct, which works perfectly. What I'm trying to figure out is how to merge all 7,000 files (filenames) into one.
Any help with this would be appreciated. Thanks in advance.
Answer 1:
Assuming 7,000 is the actual number and not an approximation, and that the data in every file are identically structured (same number of columns and rows):
setwd("/.../Prism Weather Data All/")
nc<- ## put the number of columns of each file (assuming they're all the same)
nr<- ## put the number of rows of each file (assuming they're all the same)
filenames <- list.files(path = "/.../Prism Weather Data All/", pattern = ".bil")
# initialize what is likely to be a large object
final.df<-as.data.frame(matrix(NA,ncol=7000*nc,nrow=nr))
counter=1
# loop through the files
for (i in filenames){
  r <- raster(i)
  test <- as.data.frame(r, na.rm=TRUE)
  final.df[, counter:(counter+nc-1)] <- test   # fill the next nc columns
  counter <- counter + nc                      # advance to the first unused column
}
# write the csv
write.csv(final.df,"final-filename.csv")
Keep in mind that your machine has to have enough memory to hold all the data, as R needs to keep its objects in memory. If the number of columns differs from file to file, you can adjust for that by changing the indexes in the final.df assignment inside the loop and increasing counter accordingly.
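If memory is the limiting factor, here is a minimal sketch of an append-as-you-go variant; the output filename, the single "value" column, and the extra source_file column are assumptions of this sketch:
out_file <- "ppt_all_files.csv"            # placeholder output name
first <- TRUE
for (i in filenames) {
  vals <- as.data.frame(raster(i), na.rm = TRUE)
  names(vals) <- "value"                   # normalise the layer-specific column name
  vals$source_file <- i                    # record which .bil the values came from
  write.table(vals, out_file, sep = ",", row.names = FALSE,
              col.names = first,           # write the header only for the first file
              append = !first)
  first <- FALSE
}
Only one file's worth of values is held in memory at a time, at the cost of having to read the CSV back in if you want to post-process the combined table in R.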
Edit: to produce the expected result
I think a for loop is about the only way to do this sort of job. And indeed 7,000 files is quite a large set, so expect it to take a while to iterate.
setwd("/.../Prism Weather Data All/")
nc<- ## put the number of columns you expect the data in the files to have
nr<- ## put roughly the number of rows times 12 (if you plan to read a year worth of data)
## PLUS some tolerance, so you'll end up with an object actually larger than needed
filenames <- list.files(path = "/.../Prism Weather Data All/", pattern = ".bil")
# initialize what is likely to be a large object
final.df<-as.data.frame(matrix(NA,ncol=nc,nrow=nr))
counter=1
# loop through the files
for (i in filenames){
  r <- raster(i)
  test <- as.data.frame(r, na.rm=TRUE)
  numrow2 <- nrow(test)
  final.df[counter:(counter+numrow2-1), ] <- test   # fill the next block of rows
  counter <- counter + numrow2                      # advance to the first unused row
}
final.df <- final.df[1:(counter-1), , drop=FALSE]   ## remove the empty rows left at the end
# write the csv
write.csv(final.df,"final-filename.csv")
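For comparison, the same row-binding can be written more compactly with lapply and do.call; this is only a sketch, and the "value" column name plus the source_file column are my own additions so that rbind lines the files up:
library(raster)
setwd("/.../Prism Weather Data All/")
filenames <- list.files(pattern = "\\.bil$")
dfs <- lapply(filenames, function(f) {
  d <- as.data.frame(raster(f), na.rm = TRUE)
  names(d) <- "value"            # give every file the same column name
  d$source_file <- f             # keep track of the originating file
  d
})
final.df <- do.call(rbind, dfs)
write.csv(final.df, "final-filename.csv")
The memory requirement is the same as the loop above; lapply just removes the manual counter bookkeeping.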
Hope it helps.
Answer 2:
I've been working with Prism data and here is another way to do it, below. This is if you are okay with merging each of the "stations" or row names across your 7,000 .bil files. Each month, in this case, would be a separate column, corresponding to the same station ID/row.
setwd("/.../Prism Weather Data All/")
require(dplyr)
require(raster)
#This makes sure only .bil is read (not asc.bil, etc)
filenames <- dir("/.../Prism Weather Data All/", pattern = "\\.bil$")
z <- as.data.frame(matrix(NA))
#loop through the data, and name each column the name of the date in
#the spreadsheet (according to Prism's naming convention, the date
#starts at character 24 and ends at character 29)
for (file in filenames){
  r <- raster(file)    # read the current file, not the whole filenames vector
  test <- as.data.frame(r, na.rm=TRUE, row.names=TRUE, col.names=FALSE)
  names(test) <- substring(file, 24, 29)
  z <- cbind(z, test)
}
#then export the data.frame to CSV!
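A minimal sketch of that export step (the output filename is a placeholder), which also drops the all-NA column left over from initialising z:
z <- z[, -1, drop = FALSE]                 # remove the initial placeholder column
write.csv(z, "prism_merged_by_month.csv")  # placeholder filename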
Source: https://stackoverflow.com/questions/27222028/merge-multiple-bil-climate-data-into-csv