How can I subsample a SpatialPointsDataFrame in R

Submitted by 断了今生、忘了曾经 on 2020-08-09 05:25:27

Question


I am working on running a Random Forest model. I've imported point data representing used and unused sites and created a raster stack from raster GIS layers. I've created a SpatialPointsDataFrame containing all of my used and unused points, with their underlying raster values attached.

require(sp)
require(rgdal)
require(raster)

#my raster stack
xvariables <- stack(rlist)  #rlist = a list of raster layers   

# Reading in the spatial used and unused points.
ldata <- readOGR(dsn=paste(path, "DATA", sep="/"), layer=used_avail)
str(ldata@data)


#Attach raster values to point data.
v <- as.data.frame(extract(xvariables, ldata))
ldata@data = data.frame(ldata@data, v[match(rownames(ldata@data), rownames(v)),])

Next I plan to run a Random Forest using this data. The problem is that I have a very large data set (over 40,000 data points). I need to subsample my data, but I am having a hard time figuring out how to do this. I've tried using the sample() function, but I think it won't work because I have a SpatialPointsDataFrame. I'm new to R and would really appreciate any ideas.

Thanks!


Answer 1:


Subsetting a Spatial*DataFrame object is fairly simple; just use

spSubset <- spObject[<sample_criterion>,]
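
As a self-contained illustration of that indexing pattern, here is a minimal sketch on synthetic data. The object names (pts, ptsSample) and the columns used and elev are made up for this example and do not come from the question's data set.

```r
library(sp)

set.seed(42)

# Build a synthetic SpatialPointsDataFrame (hypothetical data, for illustration only)
n      <- 1000
coords <- cbind(x = runif(n), y = runif(n))
attrs  <- data.frame(used = rbinom(n, 1, 0.5), elev = rnorm(n))
pts    <- SpatialPointsDataFrame(coords, data = attrs)

# Draw 100 row indices without replacement, then subset by row
idx       <- sample(length(pts), 100)
ptsSample <- pts[idx, ]

class(ptsSample)    # the subset keeps its spatial class
length(ptsSample)   # number of points in the subsample
```

Note that sample() is applied to the row indices rather than to the spatial object itself. The attribute table of the result, ptsSample@data, is an ordinary data frame that can be passed on to a modelling function.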

Here's an example where we load all the colleges in NJ, then grab a random sample of size=20. There's also an example where we load all the US states and grab just "New Jersey".

library(rgdal)
set.seed(1)
# random sample of NJ colleges...
sampleSize <- 20
spPoints <- readOGR(dsn=".",layer="NJ_College_Univ_NAD83njsp")
spSample <- spPoints[sample(1:length(spPoints),sampleSize),]

# extract NJ from US States TIGER/Line file
states   <- readOGR(dsn=".",layer="tl_2013_us_state")
NJ       <- states[states$NAME=="New Jersey",]
NJ       <- spTransform(NJ,CRS=CRS(proj4string(spSample)))

# render the map
library(ggplot2)   # must be loaded before fortify() is called
NJ.df    <- fortify(NJ)
ggplot() +
  geom_path(data=NJ.df, aes(x=long,y=lat, group=group))+
  geom_point(data=as.data.frame(coordinates(spPoints)), 
             aes(x=coords.x1,y=coords.x2),colour="blue", size=3)+
  geom_point(data=as.data.frame(coordinates(spSample)), 
             aes(x=coords.x1,y=coords.x2),colour="red", size=3)+
  coord_fixed() + labs(x="", y="") + theme(axis.text=element_blank())

The TIGER/Line file of US States can be found here. The NJ college shapefile is here.
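
Because the question involves both used and unused points, a plain random sample may not keep the two classes in the proportions you want. One possible variation, sketched below on synthetic data, samples a fixed number of rows from each class. The column name used, the class sizes, and perClass are assumptions made for this example, not details from the question.

```r
library(sp)

set.seed(1)

# Synthetic points with an unbalanced, hypothetical "used" indicator
n   <- 2000
pts <- SpatialPointsDataFrame(
  cbind(x = runif(n), y = runif(n)),
  data = data.frame(used = rep(c(0, 1), times = c(1500, 500)))
)

# Sample the same number of row indices from each class
perClass <- 200
byClass  <- split(seq_len(length(pts)), pts$used)
idx      <- unlist(lapply(byClass, function(i) i[sample.int(length(i), perClass)]),
                   use.names = FALSE)

ptsStrat <- pts[idx, ]
table(ptsStrat$used)   # counts per class in the subsample
```

Indexing with sample.int(length(i), ...) rather than sample(i, ...) avoids the surprising behaviour of sample() when i happens to have length one.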



Source: https://stackoverflow.com/questions/21488825/how-can-i-subsample-a-spatialpointsdataframe-in-r
