Question
I've been working on loading KML files into R to make web maps with Leaflet/Shiny. The import is pretty simple (using this sample KML):
library(rgdal)
sampleKml <- readOGR("D:/KML_Samples.kml", layer = ogrListLayers("D:/KML_Samples.kml")[1])
In this example, ogrListLayers pulls in all of the KML layers, and I subset only the first element/layer. Easy peasy.
The problem is that using this method to read KML layers only pulls in two fields: "Name" and "Description," as seen below:
> sampleKml <- readOGR("D:/KML_Samples.kml", layer = ogrListLayers("D:/KML_Samples.kml")[1])
OGR data source with driver: KML
Source: "D:/KML_Samples.kml", layer: "Placemarks"
with 3 features
It has 2 fields
> sampleKml@data
  Name               Description
1 Simple placemark   Attached to the ground. Intelligently places itself at the height of the underlying terrain.
2 Floating placemark Floats a defined distance above the ground.
3 Extruded placemark Tethered to the ground by a customizable "tail"
So R reads the KML layer as a SpatialPointsDataFrame with 3 features (3 different points) and two fields (the columns). However, when I pull the layer into QGIS and read its attribute table, there are many fields in addition to Name and Description.
From what I can tell, 'name' and 'description' are standard KML Placemark elements, and any additional data are considered ExtendedData. I want to import this extended data along with the placemark data.
Is there a way to pull ALL of these KML layer fields/attributes into R? Preferably with readOGR(), but I'm open to all suggestions.
Answer 1:
TL;DR
The underlying problem is that the LibKML library is missing on Windows. My solution is to extract the data directly from the KML with a function.
Problem
I ran into the same problem, and after some googling it appears that this has something to do with LibKML and Windows. Executing the code below on my Ubuntu machine yielded different results: the ExtendedData was retrieved when loading the saved KML file.
library(rgdal)
library(dplyr)
poly_df <- data.frame(x = c(1, 1, 0, 0), y = c(1, 0, 0, 1))
poly <- poly_df %>%
  Polygon %>%
  list %>%
  Polygons(ID = "1") %>%
  list %>%
  SpatialPolygons(proj4string = CRS("+init=epsg:4326")) %>%
  SpatialPolygonsDataFrame(data = data.frame(test = "this is a test"))
writeOGR(poly,"test.kml",driver="KML",layer="poly")
poly2<-readOGR("test.kml")
poly2@data
If one managed to build LibKML [1], one would be able to load KML files with the ExtendedData [2].
On Windows, LibKML needs to be built with Visual Studio 2005 [1]. This Visual Studio version is no longer supported [3]. In [3], user2889419 supplies a link to the 2005 version.
I downloaded and installed that version, but building LibKML eventually failed with a lot of errors and warnings (certain files do not exist). This is where I stopped, because I am way out of my comfort zone, but I wanted to share the results of my chase.
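Before attempting a build, it may be worth checking whether the GDAL that R is linked against already lists a LibKML-backed driver. The following is only a sketch: has_ogr_driver is a made-up helper name, "LIBKML" is assumed to be the driver's short name as GDAL reports it, and sf::st_drivers() is used purely as a fallback that this answer does not otherwise rely on.

```r
# Hypothetical helper: TRUE if `driver` appears in a driver table, i.e. a
# data frame with a `name` column such as the one returned by
# rgdal::ogrDrivers() or sf::st_drivers().
has_ogr_driver <- function(drivers, driver = "LIBKML") {
  driver %in% as.character(drivers$name)
}

# Query whichever package is installed; do nothing if neither is.
if (requireNamespace("rgdal", quietly = TRUE)) {
  print(has_ogr_driver(rgdal::ogrDrivers()))
} else if (requireNamespace("sf", quietly = TRUE)) {
  print(has_ogr_driver(sf::st_drivers()))
}
```

If this prints FALSE, the file is presumably read by the plain KML driver, which matches the behaviour described above where only Name and Description come through.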
Solution in R
My solution is to read the KML directly and then extract the ExtendedData while loading the Spatial Object via rgdal's readOGR. My assumption is that readOGR starts on top of the file as does my extraction routine. Both are then merged and the output is a SpatialPolygonsDataFrame.
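Because the merge relies on both readers walking the file in the same order, it can be useful to verify that assumption. The sketch below uses a made-up two-placemark KML string and a hypothetical same_order helper; in real use, ogr_names would be the Name column of the object returned by readOGR.

```r
library(xml2)

# Made-up KML with two placemarks, for illustration only
kml_txt <- '<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <Placemark>
      <name>first</name>
      <ExtendedData><Data name="height"><value>10</value></Data></ExtendedData>
    </Placemark>
    <Placemark>
      <name>second</name>
      <ExtendedData><Data name="height"><value>20</value></Data></ExtendedData>
    </Placemark>
  </Document>
</kml>'

doc <- read_xml(kml_txt)
placemarks <- xml_find_all(doc, ".//d1:Placemark")

# Placemark names in document order
pm_names <- xml_text(xml_find_first(placemarks, "./d1:name"))

# Hypothetical check: `ogr_names` stands in for as.character(sp_obj@data$Name)
same_order <- function(ogr_names, pm_names) {
  identical(as.character(ogr_names), as.character(pm_names))
}

same_order(c("first", "second"), pm_names)
```

If the check fails, binding the two tables column-wise would attach attributes to the wrong features, so a join on the name would be the safer route.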
I had some trouble extracting the nodes from the KML files at first because I was not aware of the concept of namespaces [4]. (I edited the following function because I ran into trouble with KML files of other origins.)
library(rgdal)
library(xml2)
library(dplyr)  # provides the %>% pipe
readKML <- function(file, keep_name_description = FALSE, layer, ...) {
  # Set keep_name_description = TRUE to keep "Name" and "Description" columns
  # in the resulting SpatialPolygonsDataFrame. Only works when there is
  # ExtendedData in the kml file.
  sp_obj <- readOGR(file, layer, ...)
  xml1 <- read_xml(file)
  if (!missing(layer)) {
    # each layer is stored as a Folder; select the one whose name matches
    different_layers <- xml_find_all(xml1, ".//d1:Folder")
    layer_names <- different_layers %>%
      xml_find_first(".//d1:name") %>%
      xml_contents() %>%
      xml_text()
    selected_layer <- layer_names == layer
    if (!any(selected_layer)) stop("Layer does not exist.")
    xml2 <- different_layers[selected_layer]
  } else {
    xml2 <- xml1
  }
  # extract name and type of variables
  variable_names1 <-
    xml_find_first(xml2, ".//d1:ExtendedData") %>%
    xml_children()
  # descend until every node carries a "name" attribute (or no children remain)
  while (any(is.na(xml_attr(variable_names1, "name"))) &&
         length(xml_children(variable_names1)) > 0) {
    variable_names1 <- xml_children(variable_names1)
  }
  variable_names <- variable_names1 %>%
    xml_attr("name") %>%
    unique()
  # return sp_obj if no ExtendedData is present
  if (length(variable_names) == 0) return(sp_obj)
  data1 <- xml_find_all(xml2, ".//d1:ExtendedData") %>%
    xml_children()
  # descend to the leaf nodes that hold the values
  while (length(xml_children(data1)) > 0) {
    data1 <- xml_children(data1)
  }
  data <- data1 %>%
    xml_text() %>%
    matrix(., ncol = length(variable_names), byrow = TRUE) %>%
    as.data.frame()
  colnames(data) <- variable_names
  if (keep_name_description) {
    # keep "Name" and "Description" and append the ExtendedData columns
    try(sp_obj@data <- cbind(sp_obj@data, data), silent = TRUE)
  } else {
    # replace the attribute table with the ExtendedData columns only
    sp_obj@data <- data
  }
  sp_obj
}
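The namespace issue mentioned above is easy to reproduce on a small document. KML files declare a default namespace, so an unprefixed XPath expression matches nothing; xml2 exposes the default namespace under the prefix d1, which is why the function uses paths such as ".//d1:ExtendedData". A minimal sketch on a made-up KML string (the field names owner and height are invented for illustration):

```r
library(xml2)

# Made-up KML fragment, for illustration only
kml_txt <- '<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <Placemark>
      <name>p1</name>
      <ExtendedData>
        <Data name="owner"><value>alice</value></Data>
        <Data name="height"><value>10</value></Data>
      </ExtendedData>
    </Placemark>
  </Document>
</kml>'
doc <- read_xml(kml_txt)

# xml2 labels the document's default namespace "d1"
xml_ns(doc)

# Unprefixed XPath finds nothing because the elements are namespaced
n_plain <- length(xml_find_all(doc, ".//ExtendedData"))
# With the d1 prefix the node is found
n_prefixed <- length(xml_find_all(doc, ".//d1:ExtendedData"))

# Field names come from the "name" attribute, values from the <value> child
fields <- xml_find_all(doc, ".//d1:ExtendedData/d1:Data")
vals <- setNames(xml_text(xml_find_first(fields, "./d1:value")),
                 xml_attr(fields, "name"))
vals
```

Calling xml_ns_strip(doc) before querying is another option if prefixed paths are inconvenient, at the cost of modifying the parsed document.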
Old: extracting via readLines
This earlier version follows the same approach (load the Spatial Object via rgdal's readOGR, extract the ExtendedData separately, then merge both into a SpatialPolygonsDataFrame), but it locates the ExtendedData by scanning the lines of the file with readLines instead of parsing the XML. It relies on the same assumption that readOGR and the extraction routine both process the file from top to bottom.
library(tidyverse)
library(rgdal)
readKML <- function(file, keep_name_description = FALSE, ...) {
  # Set keep_name_description = TRUE to keep "Name" and "Description" columns
  # in the resulting SpatialPolygonsDataFrame. Only works when there is
  # ExtendedData in the kml file.
  if (!grepl("\\.kml$", file)) stop("File is not a KML file.")
  if (!file.exists(file)) stop("File does not exist.")
  map <- readOGR(file, ...)
  f1 <- readLines(file)
  # get positions of ExtendedData in document
  exdata_position <- grep("ExtendedData", f1) %>%
    matrix(ncol = 2, byrow = TRUE) %>%
    apply(1, function(x) {
      pos <- x[1]:x[2]
      pos[2:(length(pos) - 1)]
    }) %>%
    t %>%
    as.data.frame
  # if there is no ExtendedData return SpatialPolygonsDataFrame
  if (ncol(exdata_position) == 0) return(map)
  # Get Name of different columns
  extract1 <- f1[exdata_position[1, ] %>%
                   unlist]
  names_of_data <- extract1 %>%
    strsplit("name=\"") %>%
    lapply(function(x) strsplit(x[[2]], split = "\"")) %>%
    unlist(recursive = FALSE) %>%
    lapply(function(x) return(x[1])) %>%
    unlist
  # Extract Extended Data
  dat <- lapply(seq(nrow(exdata_position)), function(x) {
    extract2 <- f1[exdata_position[x, ] %>%
                     unlist]
    extract2 %>%
      strsplit(">") %>%
      lapply(function(x) strsplit(x[[2]], split = "<")) %>%
      unlist(recursive = FALSE) %>%
      lapply(function(x) return(x[1])) %>%
      unlist %>%
      matrix(nrow = 1) %>%
      as.data.frame
  }) %>%
    do.call(rbind, .)
  # Rename columns
  colnames(dat) <- names_of_data
  # Check if Name and Description should be dropped
  if (keep_name_description) {
    map@data <- cbind(map@data, dat)
  } else {
    map@data <- dat
  }
  map
}
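The idea behind the line-based version can be shown on a handful of lines. This is a simplified sketch rather than the function itself: the lines are made up to resemble what readLines might return for one Placemark, and sub() with regular expressions is swapped in for the strsplit chain used above.

```r
# Made-up lines resembling one Placemark, for illustration only
f1 <- c(
  "<Placemark>",
  "<ExtendedData><SchemaData schemaUrl=\"#poly\">",
  "<SimpleData name=\"test\">this is a test</SimpleData>",
  "<SimpleData name=\"id\">1</SimpleData>",
  "</SchemaData></ExtendedData>",
  "</Placemark>"
)

# Lines mentioning ExtendedData come in open/close pairs
bounds <- matrix(grep("ExtendedData", f1), ncol = 2, byrow = TRUE)

# The lines strictly between the first pair hold one field each
inner <- f1[(bounds[1, 1] + 1):(bounds[1, 2] - 1)]

# Column names from the name="..." attribute, values from the element text
field_names <- sub('.*name="([^"]*)".*', "\\1", inner)
field_values <- sub(".*>([^<]*)<.*", "\\1", inner)
setNames(field_values, field_names)
```

This only works when every field sits on its own line between the opening and closing ExtendedData lines, which is presumably why the XML-based version above copes better with KML files of other origins.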
[1] https://github.com/google/libkml/wiki/Building-and-installing-libkml
[2] https://github.com/r-spatial/sf/issues/499
[3] Where to download visual studio express 2005?
[4] Parsing XML in R: Incorrect namespaces
Source: https://stackoverflow.com/questions/45989198/how-to-load-all-fields-extendeddata-not-just-name-and-description-from-kml