Question
My df looks like this:
      bid         ts latitude longitude
1  827566 1999-10-07 42.40944 -88.17822
2  827566 2013-04-11 41.84740 -87.63126
3 1902966 2012-05-02 45.52607 -94.20649
4 1902966 2013-03-25 41.94083 -87.65852
5 3211972 2012-08-14 43.04786 -87.96618
6 3211972 2013-08-02 41.88258 -87.63760
I want to create a new df that holds the difference in time and distance between each pair of successive points. The calculation should run down the rows, within groups of rows that share the same bid. I used the following for loop to accomplish this:
library(geosphere)
lengthdata <- nrow(twopoint)
twopointdata <- data.frame(matrix(ncol = 4, nrow =lengthdata))
x <- c("bid", "time", "d", "dsq")
colnames(twopointdata) <- x
n <- numeric()
n <- 1
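# Stop one row early: the body compares row i with row i + 1
for (i in seq_len(lengthdata - 1))
{
  if (twopoint[i+1,1] == twopoint[i,1])
  {
    twopointdata[n,1] <- twopoint[i+1,1]
    # time difference between the two successive rows
    twopointdata[n,2] <- as.numeric(twopoint[i+1,5]-twopoint[i,5])
    # Haversine distance in metres; points are passed as c(longitude, latitude)
    twopointdata[n,3] <- distm(c(twopoint[i+1,10], twopoint[i+1,9]),
                               c(twopoint[i,10], twopoint[i,9]),
                               fun = distHaversine)
    # square the distance just stored in the result, not a column of the input
    twopointdata[n,4] <- twopointdata[n,3]^2
    n <- n+1
  }
}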
attach(twopointdata)
head(twopointdata)
(some of the column numbers are off because I took out some rows to display more clearly)
My result looks like this:
      bid time        d          dsq
1  827566 4935  77159.8 5.677201e+11
2 1902966  327 660457.0 6.436004e+16
3 3211972  353 132494.8 3.540118e+12
4 3692174 4722 727359.6 6.394166e+16
5 4404655 4833 201644.7 1.092944e+13
6 6644203 4518 210485.9 6.721980e+16
It has the id for each pair of points, the time difference between them, the distance calculated from the longitude and latitude, and the squared distance. PROBLEM: it's very slow, and eventually I'll be doing this on a very large data set.
I was able to do this without a for loop successfully with the time difference using dplyr like this:
library(dplyr)
library(geosphere)
latlongdata2 <- latlongdata
latlongdata2 %>%
  group_by(bid) %>%
  transmute(
    bid = bid,
    t = c(NA, diff(ts)))
I can't figure out how to do this with the latitude and longitude because, unlike the ts values, they are in two different columns. Does anyone have any suggestions?
P.S. the overall aim of the project is to do a mean squared displacement analysis on the data.
Answer 1:
I think you're overcomplicating it a little. I wish geosphere::distHaversine had a slightly more intuitive calling method (similar to, say, diff), but it's not hard to work around it:
dat <- read.table(text = " bid ts latitude longitude
827566 1999-10-07 42.40944 -88.17822
827566 2013-04-11 41.84740 -87.63126
1902966 2012-05-02 45.52607 -94.20649
1902966 2013-03-25 41.94083 -87.65852
3211972 2012-08-14 43.04786 -87.96618
3211972 2013-08-02 41.88258 -87.63760", header = TRUE, stringsAsFactors = FALSE)
dat$ts <- as.Date(dat$ts)
library(dplyr)
library(geosphere)
group_by(dat, bid) %>%
  mutate(
    d = c(NA,
          distHaversine(cbind(longitude[-n()], latitude[-n()]),
                        cbind(longitude[  -1], latitude[  -1]))),
    dts = c(NA, diff(ts))
  ) %>%
  ungroup() %>%
  filter( ! is.na(d) )
# # A tibble: 3 × 6
#       bid         ts latitude longitude         d   dts
#     <int>     <date>    <dbl>     <dbl>     <dbl> <dbl>
# 1  827566 2013-04-11 41.84740 -87.63126  77159.35  4935
# 2 1902966 2013-03-25 41.94083 -87.65852 660457.41   327
# 3 3211972 2013-08-02 41.88258 -87.63760 132494.65   353
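If the diff-like calling style is what you're after, one option is to wrap the workaround in a small helper. The sketch below is only an illustration: dist_from_prev is a made-up name, not something geosphere provides, and the columns are named to match the question's desired output (bid, time, d, dsq). It assumes ts is a Date and that rows are already in time order within each bid. It also returns NA for a bid with only one row, so those groups drop out in the final filter.

```r
library(dplyr)
library(geosphere)

# Hypothetical helper (not part of geosphere): distance in metres from the
# previous point to each point, NA for the first one, mirroring c(NA, diff(x)).
dist_from_prev <- function(lon, lat) {
  n <- length(lon)
  if (n < 2) return(rep(NA_real_, n))   # nothing to compare against
  c(NA_real_,
    distHaversine(cbind(lon[-n], lat[-n]),
                  cbind(lon[-1], lat[-1])))
}

# Same sample rows as in the question
dat <- data.frame(
  bid = c(827566, 827566, 1902966, 1902966, 3211972, 3211972),
  ts = as.Date(c("1999-10-07", "2013-04-11", "2012-05-02",
                 "2013-03-25", "2012-08-14", "2013-08-02")),
  latitude = c(42.40944, 41.84740, 45.52607, 41.94083, 43.04786, 41.88258),
  longitude = c(-88.17822, -87.63126, -94.20649, -87.65852, -87.96618, -87.63760)
)

steps <- dat %>%
  group_by(bid) %>%
  mutate(
    time = c(NA_real_, as.numeric(diff(ts), units = "days")),  # days between rows
    d    = dist_from_prev(longitude, latitude),                # metres
    dsq  = d^2                                                 # squared displacement
  ) %>%
  ungroup() %>%
  filter(!is.na(d)) %>%
  select(bid, time, d, dsq)

print(steps)
```

Because dsq is computed from d in the same mutate call, it can't accidentally pick up a different column. From there, something like mean(steps$dsq), or a grouped summarise, would be a starting point for the mean squared displacement.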
Source: https://stackoverflow.com/questions/47115848/how-can-i-calculate-the-distance-between-latitude-and-longitude-along-rows-of-co