I have been provided with some customer data in Latitude, Longitude, and Counts format. All the data I need to create a ggplot heatmap is present, but I do not know how to put it into the format ggplot requires.
I am trying to aggregate the data by total counts within 0.01 Lat by 0.01 Lon blocks (a typical heatmap), and I instinctively thought "tapply". This creates a nice summary by block, as desired, but in the wrong format: it returns a wide table with one column per latitude rather than one row per Lat/Lon pair. Furthermore, I would like blocks with no data to be included as zeroes; otherwise the heatmap ends up looking streaky and odd.
Your help is greatly appreciated.
I have created a subset of my data for your reference in the code below:
# m is the matrix of data provided
m = matrix(c(44.9591051,44.984884,44.984884,44.9811399,
             44.9969096,44.990894,44.9797023,44.983334,
             -93.3120017,-93.297668,-93.297668,-93.2993524,
             -93.2924484,-93.282462,-93.2738911,-93.26667,
             69,147,137,22,68,198,35,138), nrow=8, ncol=3)
colnames(m) <- c("Lat", "Lon", "Count")
m <- as.data.frame(m)
s = as.data.frame(tapply(m$Count, list(round(m$Lon,2), round(m$Lat,2)), sum))
s[is.na(s)] <- 0
# Data frame "s" has all the data, but not exactly in the format desired...
# First, it has a column for each latitude, instead of one column for Lon
# and one for Lat, and second, it needs to have 0 as the entry data for
# Lat / Lon pairs that have no other data. As it is, there are only zeroes
# when one of the other entries has a Lat or Lon that matches... if there
# are no entries for a particular Lat or Lon value, then nothing at all is
# reported.
desired.format = matrix(c(44.96,44.96,44.96,44.96,44.96,
                          44.97,44.97,44.97,44.97,44.97,44.98,44.98,44.98,
                          44.98,44.98,44.99,44.99,44.99,44.99,44.99,45,45,
                          45,45,45,-93.31,-93.3,-93.29,-93.28,-93.27,-93.31,
                          -93.3,-93.29,-93.28,-93.27,-93.31,-93.3,-93.29,
                          -93.28,-93.27,-93.31,-93.3,-93.29,-93.28,-93.27,
                          -93.31,-93.3,-93.29,-93.28,-93.27,69,0,0,0,0,0,0,
                          0,0,0,0,306,0,0,173,0,0,0,198,0,0,0,68,0,0),
                        nrow=25, ncol=3)
colnames(desired.format) <- c("Lat", "Lon", "Count")
desired.format <- as.data.frame(desired.format)
library(ggmap)  # provides get_map() and ggmap(); also attaches ggplot2
minneapolis = get_map(location = "minneapolis, mn", zoom = 12)
ggmap(minneapolis) +
  geom_tile(data = desired.format, aes(x = Lon, y = Lat, alpha = Count), fill = "red")
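One possible way to get from the raw rows to that long format, sketched in base R only. The helper columns `LatBin` and `LonBin` and the result name `long` are illustrative names, not part of the original code; the bins are held as integer hundredths of a degree on the assumption that this avoids floating-point mismatches when the complete grid is merged with the sums.

```r
# Sample data from the question
m <- data.frame(
  Lat   = c(44.9591051, 44.984884, 44.984884, 44.9811399,
            44.9969096, 44.990894, 44.9797023, 44.983334),
  Lon   = c(-93.3120017, -93.297668, -93.297668, -93.2993524,
            -93.2924484, -93.282462, -93.2738911, -93.26667),
  Count = c(69, 147, 137, 22, 68, 198, 35, 138)
)

# Bin to 0.01-degree blocks, stored as integer hundredths of a degree
m$LatBin <- as.integer(round(m$Lat * 100))
m$LonBin <- as.integer(round(m$Lon * 100))

# Total count per occupied block (long format, one row per block)
sums <- aggregate(Count ~ LatBin + LonBin, data = m, FUN = sum)

# Every block in the bounding box, occupied or not
grid <- expand.grid(
  LatBin = seq(min(m$LatBin), max(m$LatBin)),
  LonBin = seq(min(m$LonBin), max(m$LonBin))
)

# Left join the sums onto the full grid; blocks with no data become 0
long <- merge(grid, sums, by = c("LatBin", "LonBin"), all.x = TRUE)
long$Count[is.na(long$Count)] <- 0

# Convert the bins back to degrees and keep the three desired columns
long$Lat <- long$LatBin / 100
long$Lon <- long$LonBin / 100
long <- long[order(long$Lat, long$Lon), c("Lat", "Lon", "Count")]
```

With the sample data this gives 25 rows, one per block, so `long` could be passed to geom_tile in place of the hand-built desired.format.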
Here is a stab with geom_hex and stat_density2d. The idea of making bins by truncating coordinates makes me a bit uneasy.
What you have is count data with lat/longs given, which means you would ideally pass the counts as a weight parameter, but as far as I know that is not implemented in geom_hex. Instead, we hack it by repeating each row as many times as its Count value, similar to the approach here.
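## hack job to repeat records to full count
library(ggplot2)  # geom_hex also needs the hexbin package installed
m <- as.data.frame(m)
m_long <- m[rep(seq_len(nrow(m)), m$Count), ]  # each row repeated Count times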
## stat_density2d
ggplot(m_long, aes(x = Lon, y = Lat)) +
  stat_density2d(aes(alpha = ..level.., fill = ..level..), size = 2,
                 bins = 10, geom = "polygon") +
  scale_fill_gradient(low = "blue", high = "red") +
  geom_density2d(colour = "black", bins = 10) +
  geom_point(data = m_long)
## geom_hex alternative
bins = 6
ggplot(m_long, aes(x = Lon, y = Lat)) +
  geom_hex(bins = bins) +
  coord_equal(ratio = 1/1) +
  scale_fill_gradient(low = "blue", high = "red") +
  geom_point(data = m_long, position = "jitter") +
  # label each hexagon with its count
  stat_binhex(aes(label = ..count..), size = 3.5, geom = "text",
              bins = bins, colour = "white")
These, respectively, produce the following:
And the binned version:
EDIT:
With basemap:
map <- ggmap(minneapolis)  # assumption: the basemap fetched in the question
map +
  stat_density2d(data = m_long,
                 aes(x = Lon, y = Lat, alpha = ..level.., fill = ..level..),
                 size = 2,
                 bins = 10,
                 geom = "polygon",
                 inherit.aes = FALSE) +
  scale_fill_gradient(low = "blue", high = "red") +
  geom_density2d(data = m_long, aes(x = Lon, y = Lat),
                 colour = "black", bins = 10, inherit.aes = FALSE) +
  geom_point(data = m_long, aes(x = Lon, y = Lat), inherit.aes = FALSE)
## and the hexbin map...
map +
  geom_hex(bins = bins, data = m_long, aes(x = Lon, y = Lat), alpha = .5,
           inherit.aes = FALSE) +
  geom_point(data = m_long, aes(x = Lon, y = Lat),
             inherit.aes = FALSE, position = "jitter") +
  scale_fill_gradient(low = "blue", high = "red")
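The row repetition above might be avoidable by letting the stat do the summing. A minimal sketch, assuming an installed ggplot2 recent enough to provide stat_summary_2d() with its fun argument. The 0.01-degree binwidth is taken from the question, and the bin edges ggplot2 chooses will not necessarily line up with coordinates rounded to two decimals.

```r
library(ggplot2)

# Sample data from the question, one row per location with its count
m <- data.frame(
  Lat   = c(44.9591051, 44.984884, 44.984884, 44.9811399,
            44.9969096, 44.990894, 44.9797023, 44.983334),
  Lon   = c(-93.3120017, -93.297668, -93.297668, -93.2993524,
            -93.2924484, -93.282462, -93.2738911, -93.26667),
  Count = c(69, 147, 137, 22, 68, 198, 35, 138)
)

# z carries the counts; fun = sum adds them up within each rectangular bin
p <- ggplot(m, aes(x = Lon, y = Lat, z = Count)) +
  stat_summary_2d(fun = sum, binwidth = c(0.01, 0.01)) +
  scale_fill_gradient(low = "blue", high = "red")
p
```

Only bins that contain data are drawn by default, so this does not produce the zero-filled grid the question asks for.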
Source: https://stackoverflow.com/questions/24600513/summarizing-latitude-longitude-and-counts-data-for-ggplot-usage