I have a geoJSON database with lots of polygons (census tracts specifically) and I have lots of long,lat points.
I am hoping that there would exist an efficient Python c
One cannot have really fast geometric code in Python. Instead the usual approach is to use fast C/C++ library with Python wrappers.
For example, you can start with CGAL - a very comprehensive C++ geometric library. It has Python bindings for most of its routines, see the link http://code.google.com/p/cgal-bindings/.
I found an interesting article describing how to do exactly what you are looking to do.
TL;DR: Use Shapely
You will find this code at the end of the article:
import json
from shapely.geometry import shape, Point
# depending on your version, use: from shapely.geometry import shape, Point
# load GeoJSON file containing sectors
with open('sectors.json') as f:
js = json.load(f)
# construct point based on lon/lat returned by geocoder
point = Point(-122.7924463, 45.4519896)
# check each polygon to see if it contains the point
for feature in js['features']:
polygon = shape(feature['geometry'])
if polygon.contains(point):
print 'Found containing polygon:', feature
A great option for working with these types of data is PostGIS, a spatial database extender for PostgreSQL. I personally keep all of my geo data in a PostGIS database, and then reference it in python using psycopg2
. I know it's not pure python, but it's got unbelievable performance benefits (discussed below) over pure python.
PostGIS has functionality built in to determine if a point or shape is within another shape. The good documentation on the ST_Within function expands upon this simple example:
SELECT
ST_WITHIN({YOUR_POINT},boundary)
FROM census;
-- returns true or false for each of your tracts
The benifit you'll gain from PostGIS that you likely won't achieve elsewhere is indexing, which can improve your speed 1,000x [1], making it better than even the best written C program (unless the C program also creates an index for your data). The database, when properly setup, will cache information about your tracts, and when you ask if a point is within a tract, it won't have to search everything... it can take advantage of it's index.
Getting data into and out of PostGRES is pretty simple. A great tutorial that will walk you through the basics of PostGIS with sample datasets not too different from yours can be found here. It's reasonably long, but if you're new to PostGIS (as I was), you'll be very entertained and excited the entire time:
http://workshops.boundlessgeo.com/postgis-intro/
[1] Indexing decreased a nearest neighbor search in one of my huge databases (20 m from 53 seconds to 8.2 milliseconds.