Question
I am trying to use the Hadoop GIS Framework to add spatial support to Hive. One of the things I want to do is create a spatial table from external data (from PostGIS). Unfortunately, the serializer provided by ESRI maps to an ESRI JSON format rather than to standards such as WKT or GeoJSON, so I ended up using a bit of a workaround.
First, I exported my PostGIS data as a tab-separated file, converting the geometry column to GeoJSON.
\COPY (select id, ST_AsGeoJSON(geom) from grid_10) TO '/tmp/grid_10.geojson';
Then I put the file in the S3 filesystem and loaded it as tab-delimited text. This created a table with two fields: an integer and a string (containing the GeoJSON).
CREATE EXTERNAL TABLE grid_10 (id bigint, json STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION 's3://some-url/data/grids/geojson';
I can generate geometry correctly from this GeoJSON, using this query:
SELECT ID, ST_AsText(ST_GeomFromGeoJSON(json)) from grid_10 limit 3;
Which outputs:
Next, I wanted to convert this table into an actual spatial table, where the geometry is stored as a BLOB rather than as text. I did it with the following query:
create table new_grid as SELECT ID, ST_GeomFromGeoJSON(json) as geom from grid_10;
To my surprise, the content of this table is the same geometry, repeated over and over.
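One way to confirm this symptom is to compare the number of distinct geometries in the two tables. This is only a sketch: it assumes the ST_* functions are already registered in the session and reuses the table and column names from the queries above.

```sql
-- Distinct geometries when converting the GeoJSON column on the fly
SELECT count(DISTINCT ST_AsText(ST_GeomFromGeoJSON(json))) FROM grid_10;

-- Distinct geometries actually stored in the derived table
SELECT count(DISTINCT ST_AsText(geom)) FROM new_grid;
```

If the second count is much smaller than the first, the duplication was introduced when the table was written, not in the source data.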
I tried the same approach (creating a geometry from WKT or GeoJSON and writing it into a table) with the same results. Is this a bug? Does it mean I am condemned to run spatial queries with on-the-fly conversions, as in the query below? And isn't that much more costly computationally than working with BLOBs?
create table grid_cnt as
SELECT grid_10.id, count(grid_10.id) as ptcnt
FROM grid_10 JOIN tweets
WHERE ST_Contains(ST_GeomFromGeoJSON(grid_10.json), ST_Point(tweets.longitude, tweets.latitude)) = true
GROUP BY grid_10.id;
Has anybody experienced the same issue?
Update: This problem occurred with Hive 0.11, running on Amazon's Hadoop distribution 3.3.1. I was also pulling the ESRI jars from this link:
https://github.com/Esri/gis-tools-for-hadoop/archive/master.zip
When I switched to version 2.0 of the jar and the latest Hive (0.13), the problem disappeared.
You can find my issue report here. I hope this helps someone experiencing the same issue.
Answer 1:
I ran into the same issue you describe. The solution an expert suggested to me was to store the geometry information as WKT, i.e. in text format, instead of the binary geometry format you tried.
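In HiveQL, that suggestion could look roughly like the following. This is a sketch rather than a tested fix: the table name grid_10_wkt is hypothetical, and it assumes ST_GeomFromText and the other ST_* functions from the ESRI framework are registered in the session.

```sql
-- Store the geometry as WKT text instead of the binary geometry type
CREATE TABLE grid_10_wkt AS
SELECT id, ST_AsText(ST_GeomFromGeoJSON(json)) AS wkt
FROM grid_10;

-- Rebuild the geometry from WKT at query time
SELECT g.id, count(g.id) AS ptcnt
FROM grid_10_wkt g JOIN tweets t
WHERE ST_Contains(ST_GeomFromText(g.wkt), ST_Point(t.longitude, t.latitude))
GROUP BY g.id;
```

This still parses text on every query, so it does not remove the conversion cost raised in the question; it only avoids writing the binary geometry column.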
Source: https://stackoverflow.com/questions/27147274/how-to-load-spatial-data-using-the-hadoop-gis-framework