Is Python 'sys.argv' limited in the maximum number of arguments?

Posted by 社会主义新天地 on 2019-12-06 07:05:36

Question


I have a Python script that needs to process a large number of files. To get around Linux's relatively small limit on the number of arguments that can be passed to a command, I am using find -print0 with xargs -0.

I know another option would be to use Python's glob module, but that won't help when I have a more advanced find command, looking for modification times, etc.

When running my script on a large number of files, Python only accepts a subset of the arguments, a limitation I first thought was in argparse, but appears to be in sys.argv. I can't find any documentation on this. Is it a bug?

Here's a sample Python script illustrating the point:

import argparse
import sys
import os

parser = argparse.ArgumentParser()
parser.add_argument('input_files', nargs='+')
args = parser.parse_args(sys.argv[1:])

print 'pid:', os.getpid(), 'argv files', len(sys.argv[1:]), 'number of files:', len(args.input_files)

I have a lot of files to run this on:

$ find ~/ -name "*" -print0 | xargs -0 ls > filelist
$ wc -l filelist
748709 filelist

But it appears xargs or Python is chunking my big list of files and processing it with several different Python runs:

$ find ~/ -name "*" -print0 | xargs -0 python test.py
pid: 4216 argv files 1819 number of files: 1819
pid: 4217 argv files 1845 number of files: 1845
pid: 4218 argv files 1845 number of files: 1845
pid: 4219 argv files 1845 number of files: 1845
pid: 4220 argv files 1845 number of files: 1845
pid: 4221 argv files 1845 number of files: 1845
...

Why are multiple processes being created to process the list? Why is it being chunked at all? I don't think there are newlines in the file names, and shouldn't -print0 and -0 take care of that issue anyway? If there were newlines, I'd expect sed -n '1810,1830p' filelist to show some weirdness for the above example. What gives?

I almost forgot:

$ python -V
Python 2.7.2+

Answer 1:


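xargs chunks its arguments by default, and each chunk is passed to a separate invocation of the command. That is why several Python processes with different PIDs appear. Have a look at the --max-args (-n) and --max-chars (-s) options of xargs. Its man page also explains the limits (under --max-chars).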
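
To illustrate why the batches in the question hold slightly different numbers of files (1819, then 1845), here is a rough sketch of size-based chunking. It is a simplified model, not xargs's actual accounting: the function name chunk_args and the cost of len(arg) + 1 bytes per argument are assumptions made for illustration, and the real tool also counts the command itself and the environment.

```python
def chunk_args(args, max_chars):
    """Split args into batches whose combined size stays within max_chars.

    Simplified model (an assumption, not xargs's exact rule): each argument
    costs its encoded length plus one byte for the terminating NUL.
    """
    batches, current, used = [], [], 0
    for arg in args:
        cost = len(arg.encode()) + 1
        # Start a new batch when adding this argument would exceed the budget.
        if current and used + cost > max_chars:
            batches.append(current)
            current, used = [], 0
        # An argument larger than max_chars still gets a batch of its own
        # here; the real xargs would report an error instead.
        current.append(arg)
        used += cost
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    names = ["file%04d.txt" % i for i in range(10)]
    for batch in chunk_args(names, max_chars=40):
        print(len(batch), batch)
```

Because the budget is in bytes rather than in arguments, batches of longer path names hold fewer files. With GNU xargs, running xargs --show-limits prints the limits it is actually using.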




Answer 2:


Everything that you want from find is available from os.walk.

Don't use find and the shell for any of this.

Use os.walk and write all your rules and filters in Python.

"looking for modification times" means that you'll be using os.stat or some similar library function.

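A minimal sketch of the os.walk approach, assuming Python 3 and assuming the filter of interest is "modified within the last N seconds". The function name find_modified_within and its parameters are hypothetical; adapt the test inside the loop to whatever find expression you were using.

```python
import os
import time


def find_modified_within(root, max_age_seconds, suffix=""):
    """Yield paths of files under root modified in the last max_age_seconds."""
    cutoff = time.time() - max_age_seconds
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(suffix):
                continue
            path = os.path.join(dirpath, name)
            try:
                mtime = os.stat(path).st_mtime
            except OSError:
                # Broken symlink, or the file vanished during the walk.
                continue
            if mtime >= cutoff:
                yield path
```

Since the paths are produced inside a single process, no command-line length limit is involved, however many files match.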



Answer 3:


Python does not seem to place a limit on the number of arguments but the operating system does.

Have a look here for a more comprehensive discussion.
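
As a quick check of the operating system's limit, the sketch below asks for SC_ARG_MAX, the POSIX limit on the combined size of the arguments and environment passed to a new program. It assumes Python 3 on a POSIX system; the helper name arg_max is hypothetical, and it returns None where the value cannot be queried.

```python
import os


def arg_max():
    """Return the OS limit on argv plus environment size in bytes, or None."""
    try:
        return os.sysconf("SC_ARG_MAX")
    except (AttributeError, ValueError, OSError):
        # os.sysconf is missing (non-POSIX) or the name is not supported.
        return None


if __name__ == "__main__":
    print("SC_ARG_MAX:", arg_max())
```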




Answer 4:


xargs will pass as much as it can, but there's still a limit. For instance,

find ~/ -name "*" -print0 | xargs -0 wc -l | grep total

will give you multiple lines of output.

You probably want to have your script either take a file containing a list of filenames, or accept filenames on its stdin.
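
One way to accept filenames on stdin is to read the NUL-delimited output of find -print0 directly, skipping xargs altogether. The following is a sketch assuming Python 3; the function name read_nul_delimited and the chunk size are illustrative choices.

```python
def read_nul_delimited(stream, chunk_size=65536):
    """Yield NUL-terminated byte strings from a binary stream."""
    pending = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        pending += chunk
        # Everything before the last NUL is complete; the rest is carried over.
        *complete, pending = pending.split(b"\0")
        for name in complete:
            if name:
                yield name
    if pending:
        # Final name without a trailing NUL.
        yield pending
```

In a script this could be driven with read_nul_delimited(sys.stdin.buffer) and invoked as find ~/ -name "*" -print0 | python3 test.py, so that a single process sees every file. Names arrive as bytes; os.fsdecode converts them to str if needed.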



Source: https://stackoverflow.com/questions/9103023/is-python-sys-argv-limited-in-the-maximum-number-of-arguments
