Question
I am working on running a random forest model. I've imported point data representing used and unused sites and created a raster stack from raster GIS layers. I've created a SpatialPointsDataFrame with all of my used and unused points with their underlying raster values attached.
require(sp)
require(rgdal)
require(raster)
#my raster stack
xvariables <- stack(rlist) #rlist = a list of raster layers
# Reading in the spatial used and unused points.
ldata <- readOGR(dsn=paste(path, "DATA", sep="/"), layer=used_avail)
str(ldata@data)
#Attach raster values to point data.
v <- as.data.frame(extract(xvariables, ldata))
ldata@data = data.frame(ldata@data, v[match(rownames(ldata@data), rownames(v)),])
Next I plan to run a random forest using this data. The problem is that I have a very large data set (over 40,000 data points). I need to subsample my data, but I am having a really hard time figuring out how to do this. I've tried using the sample() function, but I think it won't work because I have a SpatialPointsDataFrame? I'm new to R and would really appreciate any ideas.
Thanks!
Answer 1:
Subsetting a Spatial*DataFrame object is fairly simple; just use
spSubset <- spObject[<sample_criterion>,]
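As a minimal sketch of that indexing pattern, the snippet below builds a synthetic SpatialPointsDataFrame (the points and the `elev` column are made up purely for illustration, so no shapefile is needed) and draws a random sample of rows from it. The row index works the same way it does for an ordinary data.frame, and the result keeps its spatial class.

```r
library(sp)

set.seed(42)

# Synthetic data (hypothetical): 1000 random points with one attribute column
n <- 1000
pts <- data.frame(x = runif(n), y = runif(n), elev = rnorm(n))
coordinates(pts) <- ~ x + y   # promotes the data.frame to a SpatialPointsDataFrame

# Draw 100 row indices without replacement, then subset by row
idx <- sample(length(pts), 100)
ptsSample <- pts[idx, ]

class(ptsSample)    # still a SpatialPointsDataFrame
length(ptsSample)   # number of sampled points
```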
Here's an example where we load all the colleges in NJ, then grab a random sample of size=20. There's also an example where we load all the US states and grab just "New Jersey".
library(rgdal)
set.seed(1)
# random sample of NJ colleges...
sampleSize=20
spPoints <- readOGR(dsn=".",layer="NJ_College_Univ_NAD83njsp")
spSample <- spPoints[sample(1:length(spPoints),sampleSize),]
# extract NJ from US States TIGER/Line file
states <- readOGR(dsn=".",layer="tl_2013_us_state")
NJ <- states[states$NAME=="New Jersey",]
NJ <- spTransform(NJ,CRS=CRS(proj4string(spSample)))
# render the map
library(ggplot2)   # must be loaded before fortify() is called
NJ.df <- fortify(NJ)
ggplot() +
geom_path(data=NJ.df, aes(x=long,y=lat, group=group))+
geom_point(data=as.data.frame(coordinates(spPoints)),
aes(x=coords.x1,y=coords.x2),colour="blue", size=3)+
geom_point(data=as.data.frame(coordinates(spSample)),
aes(x=coords.x1,y=coords.x2),colour="red", size=3)+
coord_fixed() + labs(x="", y="") + theme(axis.text=element_blank())
The TIGER/Line file of US States can be found here. The NJ college shapefile is here.
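For the used/unused data in the question, the same row indexing can also draw an equal number of points from each class before model fitting. This is only a sketch under assumptions: the column names `used` and `slope`, the class proportions, and the per-class sample size are all hypothetical stand-ins, since the question does not show the actual attribute table.

```r
library(sp)

set.seed(1)

# Synthetic stand-in for the point data (hypothetical columns `used` and `slope`)
n <- 5000
pts <- data.frame(x = runif(n), y = runif(n),
                  used = rbinom(n, 1, 0.3),   # 1 = used site, 0 = unused site
                  slope = rnorm(n))
coordinates(pts) <- ~ x + y

# Split the row indices by class, then sample the same number from each class
nPerClass <- 500
rowsByClass <- split(seq_len(length(pts)), pts$used)
idx <- unlist(lapply(rowsByClass, function(i) sample(i, nPerClass)))
ptsSub <- pts[idx, ]

table(ptsSub$used)

# Plain data.frame of attributes, which a model-fitting function can take directly
modelData <- ptsSub@data
```

Whether to balance the classes or to sample all rows uniformly, as in the answer above, depends on the study design; the indexing mechanics are the same either way.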
Source: https://stackoverflow.com/questions/21488825/how-can-i-subsample-a-spatialpointsdataframe-in-r