How can I check at runtime that a Python module is valid without importing it?


Question


I have a package containing subpackages, only one of which I need to import at runtime, but I need to test that all of them are valid. Here is my folder structure:

game/
 __init__.py
 game1/
   __init__.py
   constants.py
   ...
 game2/
   __init__.py
   constants.py
   ...

For now the code that runs on boot does:

import pkgutil
import game as _game
# Detect the known games
for importer,modname,ispkg in pkgutil.iter_modules(_game.__path__):
    if not ispkg: continue # game support modules are packages
    # Equivalent of "from game import <modname>"
    try:
        module = __import__('game',globals(),locals(),[modname],-1)
    except ImportError:
        deprint(u'Error in game support module:', modname, traceback=True)
        continue
    submod = getattr(module,modname)
    if not hasattr(submod,'fsName') or not hasattr(submod,'exe'): continue
    _allGames[submod.fsName.lower()] = submod

but this has the disadvantage that all the subpackages are imported, which in turn imports the other modules in each subpackage (such as constants.py), amounting to a few megabytes of garbage. So I want to substitute this code with a test that the submodules are valid (that they would import fine). I guess I should be using eval somehow - but how? Or what should I do?

EDIT: tl;dr

I am looking for an equivalent to the core of the loop above:

    try:
        probably_eval(game, modname) # fails iff `from game import modname` fails
        # but does _not_ import the module
    except: # I'd rather have a more specific error here but methinks not possible
        deprint(u'Error in game support module:', modname, traceback=True)
        continue

So I want a clear answer as to whether an exact equivalent to the import statement vis-à-vis error checking exists - without importing the module. That's my question; a lot of answerers and commenters answered different questions.


Answer 1:


Maybe you're looking for the py_compile or compileall modules.
Here is the documentation:
https://docs.python.org/2/library/py_compile.html
https://docs.python.org/2/library/compileall.html#module-compileall

You can import whichever one you need and call it from within your program.
For example:

import py_compile

try:
    py_compile.compile(your_py_file, doraise=True)
    module_ok = True
except py_compile.PyCompileError:
    module_ok = False
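
The answer also mentions compileall; here is a minimal sketch, assuming the game/ directory layout from the question, that byte-compiles every .py file under it without importing anything:

import compileall

# compile_dir walks the tree and byte-compiles each .py file; it returns a
# false value if any file failed to compile.
all_ok = compileall.compile_dir('game', quiet=True)
if not all_ok:
    print('at least one game support module failed to compile')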



Answer 2:


If you want to compile the file without importing it (in current interpreter), you may use py_compile.compile as:

>>> import py_compile

# valid python file
>>> py_compile.compile('/path/to/valid/python/file.py')

# invalid python file
>>> py_compile.compile('/path/to/in-valid/python/file.txt')
Sorry: TypeError: compile() expected string without null bytes

The above code writes the error to sys.stderr. If you want an exception raised instead, you will have to set doraise to True (the default is False). Hence, your code will be:

from py_compile import compile, PyCompileError

try:
    compile('/path/to/valid/python/file.py', doraise=True)
    valid_file = True
except PyCompileError:
    valid_file = False

As per py_compile.compile's documentation:

Compile a source file to byte-code and write out the byte-code cache file. The source code is loaded from the file named file. The byte-code is written to cfile, which defaults to file + 'c' ('o' if optimization is enabled in the current interpreter). If dfile is specified, it is used as the name of the source file in error messages instead of file. If doraise is true, a PyCompileError is raised when an error is encountered while compiling file. If doraise is false (the default), an error string is written to sys.stderr, but no exception is raised.
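
A small, hypothetical sketch of the parameters quoted above (the paths are placeholders): write the byte-code to an explicit cache file and report errors against a friendlier display name, raising PyCompileError on failure:

import py_compile

# cfile: where the byte-code is written; dfile: name used in error messages.
py_compile.compile('game/game1/constants.py',
                   cfile='game/game1/constants.pyc',
                   dfile='game1/constants.py',
                   doraise=True)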

A check to make sure the compiled module has not been imported (into the current interpreter):

>>> import py_compile, sys
>>> py_compile.compile('/path/to/main.py')

>>> print [key for key in locals().keys() if isinstance(locals()[key], type(sys)) and not key.startswith('__')]
['py_compile', 'sys']  # main not present



Answer 3:


You can't really do what you want efficiently. In order to see if a package is "valid", you need to run it -- not just check if it exists -- because it could have errors or unmet dependencies.

Using py_compile or compileall will only test whether you can compile a Python file, not whether you can import a module. There is a big difference between the two.

  1. That approach means you know the actual file structure of the modules -- import foo could represent /foo.py or /foo/__init__.py.
  2. That approach doesn't guarantee the module is on your interpreter's path or is the module your interpreter would load. Things get tricky if you have multiple versions in /site-packages/ or Python is looking in one of the many possible places for a module.
  3. Just because your file "compiles" doesn't mean it will "run". As a package it could have unmet dependencies or even raise errors.

Imagine this is your Python file:

 from makebelieve import nothing
 raise ValueError("ABORT")

The above will compile, but if you import it, it will raise an ImportError if you don't have makebelieve installed, and a ValueError if you do.

My suggestions are:

  1. Import the package, then unload the modules. To unload them, just iterate over the entries in sys.modules.keys(). If you're worried about external modules that get loaded, you could override import to log what your packages load. An example of this is in a terrible profiling package I wrote: https://github.com/jvanasco/import_logger [I forgot where I got the idea to override import from. Maybe celery?] As some noted, unloading modules is entirely dependent on the interpreter -- but pretty much every option you have has many drawbacks. (A sketch of this approach follows this list.)

  2. Use subprocesses to spin up a new interpreter via popen, i.e. popen('python', '-m', 'module_name'). This has a lot of overhead if you do it for every needed module (the overhead of an interpreter and the import for each one), but you could write a ".py" file that imports everything you need and just try to run that. In either case you would have to analyze the output, as importing a "valid" package could still cause acceptable errors during execution. I can't recall whether the subprocess inherits your environment vars or not, but I believe it does. The subprocess is an entirely new operating-system process/interpreter, so the modules will be loaded into that short-lived process's memory. (A sketch of this approach also follows this list.)
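
A rough sketch of suggestion 1, assuming it is called from the question's pkgutil loop (modname is the subpackage name found there):

import sys

def check_game_module(modname):
    # Import the subpackage, then drop it and everything it pulled in from
    # sys.modules. Whether the memory is actually reclaimed depends on the
    # interpreter and on no other references being held (see the caveat above).
    before = set(sys.modules)
    try:
        __import__('game.' + modname)
        return True
    except ImportError:
        return False
    finally:
        for name in set(sys.modules) - before:
            del sys.modules[name]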
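
A rough sketch of suggestion 2: attempt the import in a throwaway interpreter so nothing stays loaded in the current process. The child interpreter must be able to find the game package (run it from the same directory or set PYTHONPATH accordingly):

import subprocess
import sys

def check_game_module_subprocess(modname):
    # Exit code 0 means the child interpreter imported the subpackage cleanly.
    ret = subprocess.call([sys.executable, '-c', 'import game.%s' % modname])
    return ret == 0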




Answer 4:


I believe imp.find_module satisfies at least some of your requirements: https://docs.python.org/2/library/imp.html#imp.find_module

A quick test shows that it does not trigger an import:

>>> import imp
>>> import sys
>>> len(sys.modules)
47
>>> imp.find_module('email')
(None, 'C:\\Python27\\lib\\email', ('', '', 5))
>>> len(sys.modules)
47
>>> import email
>>> len(sys.modules)
70

Here's an example usage in some of my code (which attempts to classify modules): https://github.com/asottile/aspy.refactor_imports/blob/2b9bf8bd2cf22ef114bcc2eb3e157b99825204e0/aspy/refactor_imports/classify.py#L38-L44
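
A hedged sketch applying this to the question's loop: locate each candidate subpackage inside game/ without importing it. Note that find_module only finds the module; it does not execute it, so errors inside the module itself would not be detected:

import imp
import pkgutil
import game as _game

for importer, modname, ispkg in pkgutil.iter_modules(_game.__path__):
    if not ispkg:
        continue
    try:
        # For a package this returns (None, directory, description); for a
        # plain module the first element is an open file that must be closed.
        file_obj, pathname, description = imp.find_module(modname, _game.__path__)
        if file_obj is not None:
            file_obj.close()
    except ImportError:
        print('cannot locate game support module: %s' % modname)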




Answer 5:


We already had a custom importer (disclaimer: I did not write that code, I'm just the current maintainer) whose load_module is:

def load_module(self,fullname):
    if fullname in sys.modules:
        return sys.modules[fullname]
    else: # set to avoid reimporting recursively
        sys.modules[fullname] = imp.new_module(fullname)
    if isinstance(fullname,unicode):
        filename = fullname.replace(u'.',u'\\')
        ext = u'.py'
        initfile = u'__init__'
    else:
        filename = fullname.replace('.','\\')
        ext = '.py'
        initfile = '__init__'
    try:
        if os.path.exists(filename+ext):
            with open(filename+ext,'U') as fp:
                mod = imp.load_source(fullname,filename+ext,fp)
                sys.modules[fullname] = mod
                mod.__loader__ = self
        else:
            mod = sys.modules[fullname]
            mod.__loader__ = self
            mod.__file__ = os.path.join(os.getcwd(),filename)
            mod.__path__ = [filename]
            #init file
            initfile = os.path.join(filename,initfile+ext)
            if os.path.exists(initfile):
                with open(initfile,'U') as fp:
                    code = fp.read()
                exec compile(code, initfile, 'exec') in mod.__dict__
        return mod
    except Exception as e: # wrap in ImportError a la python2 - will keep
        # the original traceback even if import errors nest
        print 'fail', filename+ext
        raise ImportError, u'caused by ' + repr(e), sys.exc_info()[2]

So I thought I could replace the parts that access the sys.modules cache with overridable methods which, in my override, would leave that cache alone:

@@ -48,2 +55,2 @@ class UnicodeImporter(object):
-        if fullname in sys.modules:
-            return sys.modules[fullname]
+        if self._check_imported(fullname):
+            return self._get_imported(fullname)
@@ -51 +58 @@ class UnicodeImporter(object):
-            sys.modules[fullname] = imp.new_module(fullname)
+            self._add_to_imported(fullname, imp.new_module(fullname))
@@ -64 +71 @@ class UnicodeImporter(object):
-                    sys.modules[fullname] = mod
+                    self._add_to_imported(fullname, mod)
@@ -67 +74 @@ class UnicodeImporter(object):
-                mod = sys.modules[fullname]
+                mod = self._get_imported(fullname)

and define:

class FakeUnicodeImporter(UnicodeImporter):

    _modules_to_discard = {}

    def _check_imported(self, fullname):
        return fullname in sys.modules or fullname in self._modules_to_discard

    def _get_imported(self, fullname):
        try:
            return sys.modules[fullname]
        except KeyError:
            return self._modules_to_discard[fullname]

    def _add_to_imported(self, fullname, mod):
        self._modules_to_discard[fullname] = mod

    @classmethod
    def cleanup(cls):
        cls._modules_to_discard.clear()

Then I added the importer to sys.meta_path and was good to go:

importer = sys.meta_path[0]
try:
    if not hasattr(sys,'frozen'):
        sys.meta_path = [fake_importer()]
    perform_the_imports() # see question
finally:
    fake_importer.cleanup()
    sys.meta_path = [importer]

Right? Wrong!

Traceback (most recent call last):
  File "bash\bush.py", line 74, in __supportedGames
    module = __import__('game',globals(),locals(),[modname],-1)
  File "Wrye Bash Launcher.pyw", line 83, in load_module
    exec compile(code, initfile, 'exec') in mod.__dict__
  File "bash\game\game1\__init__.py", line 29, in <module>
    from .constants import *
ImportError: caused by SystemError("Parent module 'bash.game.game1' not loaded, cannot perform relative import",)

Huh? I am currently importing that very same module. Well, the answer is probably in the import system's docs:

If the module is not found in the cache, then sys.meta_path is searched (the specification for sys.meta_path can be found in PEP 302).

That's not completely to the point, but my guess is that the statement from .constants import * looks up sys.modules to check whether the parent module is there, and I see no way of bypassing that (note that our custom loader uses the builtin import mechanism for modules; mod.__loader__ = self is set after the fact).

So I updated my FakeImporter to use the sys.modules cache and then clean that up.

class FakeUnicodeImporter(UnicodeImporter):

    _modules_to_discard = set()

    def _check_imported(self, fullname):
        return fullname in sys.modules or fullname in self._modules_to_discard

    def _add_to_imported(self, fullname, mod):
        super(FakeUnicodeImporter, self)._add_to_imported(fullname, mod)
        self._modules_to_discard.add(fullname)

    @classmethod
    def cleanup(cls):
        for m in cls._modules_to_discard: del sys.modules[m]

This however blew up in a new way - or rather, in two ways:

  • a reference to the game/ package was held by the top-level bash package instance in sys.modules:

    bash\
      __init__.py
      the_code_in_question_is_here.py
      game\
        ...
    

    because game is imported as bash.game. That reference held references to all the game1, game2, ... subpackages, so those were never garbage collected.

  • a reference to another module (brec) was held as bash.brec by the same bash module instance. That reference had been created via from .. import brec in game\game1 without triggering an import, in order to update SomeClass. However, in yet another module, an import of the form from ...brec import SomeClass did trigger an import, and a second instance of the brec module ended up in sys.modules. That instance had a non-updated SomeClass and blew up with an AttributeError.

Both were fixed by manually deleting those references - so gc collected all the modules (reclaiming 5 MB of RAM out of 75) and the from .. import brec did trigger an import (this from ... import foo vs from ...foo import bar behaviour warrants a question of its own).
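
A rough sketch of that manual cleanup, using the names from this answer (bash, bash.game, bash.brec); which attributes actually need deleting depends on the package's own interdependencies:

import gc
import sys

bash_pkg = sys.modules['bash']
if hasattr(bash_pkg, 'game'):
    del bash_pkg.game   # drop the reference that kept the game.* subpackages alive
if hasattr(bash_pkg, 'brec'):
    del bash_pkg.brec   # drop the stale brec reference so a later relative import is real
gc.collect()            # let the garbage collector reclaim the module objects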

The moral of the story is that it is possible but:

  • the package and its subpackages should only reference each other
  • all references to external modules/packages should be deleted from the top-level package's attributes
  • the reference to the package itself should be deleted from the top-level package's attributes

If this sounds complicated and error-prone, it is - but at least now I have a much cleaner view of the interdependencies and their perils; time to address that.


This post was sponsored by PyDev's debugger - I found the gc module very useful in grokking what was going on (tips from here). Of course a lot of the variables belonged to the debugger itself, which complicated things.



Source: https://stackoverflow.com/questions/41897470/how-can-i-check-on-runtime-that-a-python-module-is-valid-without-importing-it
