Python: suggestions to improve a chunk-by-chunk code to read several millions of points

这一生的挚爱 提交于 2019-12-19 04:01:02

问题


I wrote a code to read *.las file in Python. *las file are special ascii file where each line is x,y,z value of points.

My function read N. number of points and check if they are inside a polygon with points_inside_poly.

I have the following questions:

  1. When I arrive at the end of the file I get this message: LASException: LASError in "LASReader_GetPointAt": point subscript out of range because the number of points is under the chunk dimension. I cannot figure how to resolve this problem.
  2. a = [file_out.write(c[m]) for m in xrange(len(c))] I use a = in order to avoid video print. Is it correct?
  3. In c = [chunk[l] for l in index] I create a new list c because I am not sure that replacing a new chunk is the smart solution (ex: chunk = [chunk[l] for l in index]).
  4. In a statement if else...else I use pass. Is this the right choice?

Really thank for help. It's important to improve listen suggestions from expertise!!!!

import shapefile
import numpy
import numpy as np
from numpy import nonzero
from liblas import file as lasfile
from shapely.geometry import Polygon
from matplotlib.nxutils import points_inside_poly  


# open shapefile (polygon)
sf = shapefile.Reader(poly)
shapes = sf.shapes()
# extract vertices
verts = np.array(shapes[0].points,float)

# open las file
f = lasfile.File(inFile,None,'r') # open LAS
# read "header"
h = f.header

# create a file where store the points
file_out = lasfile.File(outFile,mode='w',header= h)


chunkSize = 100000
for i in xrange(0,len(f), chunkSize):
    chunk = f[i:i+chunkSize]

    x,y = [],[]

    # extraxt x and y value for each points
    for p in xrange(len(chunk)):
        x.append(chunk[p].x)
        y.append(chunk[p].y)

    # zip all points 
    points = np.array(zip(x,y))
    # create an index where are present the points inside the polygon
    index = nonzero(points_inside_poly(points, verts))[0]

    # if index is not empty do this otherwise "pass"
    if len(index) != 0:
        c = [chunk[l] for l in index] #Is It correct to create a new list or i can replace chunck?
        # save points
        a = [file_out.write(c[m]) for m in xrange(len(c))] #use a = in order to avoid video print. Is it correct?
    else:
        pass #Is It correct to use pass?

f.close()
file_out.close()

code proposed by @Roland Smith and changed by Gianni

f = lasfile.File(inFile,None,'r') # open LAS
h = f.header
# change the software id to libLAS
h.software_id = "Gianni"
file_out = lasfile.File(outFile,mode='w',header= h)
f.close()
sf = shapefile.Reader(poly) #open shpfile
shapes = sf.shapes()
for i in xrange(len(shapes)):
    verts = np.array(shapes[i].points,float)
    inside_points = [p for p in lasfile.File(inFile,None,'r') if pnpoly(p.x, p.y, verts)]
    for p in inside_points:
        file_out.write(p)
f.close()
file_out.close()

i used these solution: 1) reading f = lasfile.File(inFile,None,'r') and after the read head because i need in the *.las output file 2) close the file 3) i used inside_points = [p for p in lasfile.File(inFile,None,'r') if pnpoly(p.x, p.y, verts)] instead of

with lasfile.File(inFile, None, 'r') as f:
...     inside_points = [p for p in f if pnpoly(p.x, p.y, verts)]
...     

because i always get this error message

Traceback (most recent call last): File "", line 1, in AttributeError: _exit_


回答1:


Regarding (1):

First, why are you using chunks? Just use the lasfile as an iterator (as shown in the tutorial), and process the points one at a time. The following should get write all the points inside the polygon to the output file, by using the pnpoly function in a list comprehension instead of points_inside_poly.

from liblas import file as lasfile
import numpy as np
from matplotlib.nxutils import pnpoly

with lasfile.File(inFile, None, 'r') as f:
    inside_points = (p for p in f if pnpoly(p.x, p.y, verts))
    with lasfile.File(outFile,mode='w',header= h) as file_out:
        for p in inside_points:
            file_out.write(p)

The five lines directly above should replace the whole big for-loop. Let's go over them one-by-one:

  • with lasfile.File(inFile...: Using this construction means that the file will be closed automatically when the with block finishes.
  • Now comes the good part, the generator expression that does all the work (the part between ()). It iterates over the input file (for p in f). Every point that is inside the polygon (if pnpoly(p.x, p.y, verts)) is added to the generator.
  • We use another with block for the output file
  • and all the points (for p in inside_points, this is were the generator is used)
  • are written to the output file (file_out.write(p))

Because this method only adds the points that are inside the polygon to the list, you don't waste memory on points that you don't need!

You should only use chunks if the method shown above doesn't work. When using chunks you should handle the exception properly. E.g:

from liblas import LASException

chunkSize = 100000
for i in xrange(0,len(f), chunkSize):
    try:
        chunk = f[i:i+chunkSize]
    except LASException:
        rem = len(f)-i
        chunk = f[i:i+rem]

Regarding (2): Sorry, but I fail to understand what you are trying to accomplish here. What do you mean by "video print"?

Regarding (3): since you are not using the original chunk anymore, you can re-use the name. Realize that in python a "variable" is just a nametag.

Regarding (4): you aren't using the else, so leave it out completely.



来源:https://stackoverflow.com/questions/12769353/python-suggestions-to-improve-a-chunk-by-chunk-code-to-read-several-millions-of

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!