Find all strings in python code files

清歌不尽 2021-01-11 23:26

I would like to list all strings within my large python project.

Imagine the different possibilities to create a string in python:

mystring = "hello


        
6 answers
  •  时光说笑
    2021-01-11 23:36

    unwind's suggestion of using the ast module in 2.6 is a good one. (There's also the undocumented _ast module in 2.5.) Here's example code for that:

    code = """a = 'blah'
    b = '''multi
    line
    string'''
    c = u"spam"
    """
    
    import ast
    root = ast.parse(code)
    
    class ShowStrings(ast.NodeVisitor):
        # Called once for every string literal (a Str node in Python 2's ast).
        def visit_Str(self, node):
            print "string at", node.lineno, node.col_offset, repr(node.s)
    
    show_strings = ShowStrings()
    show_strings.visit(root)
    

    The problem is multiline strings. If you run the above, you'll get:

    string at 1 4 'blah'
    string at 4 -1 'multi\nline\nstring'
    string at 5 4 u'spam'
    

    As you can see, it reports only the end of the multiline string (line 4, column -1), not its start. There's no good solution for that using the built-in Python tools.
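
That limitation applies to the Python 2 interpreters discussed here. As a minimal sketch for readers on a newer interpreter, assuming Python 3.8 or later (an assumption, not something the answer covers): string literals there are `ast.Constant` nodes, `lineno` and `col_offset` refer to the start of the literal, and `end_lineno` gives its last line. The helper name `find_strings` is made up for illustration.

```python
import ast

code = """a = 'blah'
b = '''multi
line
string'''
c = u"spam"
"""

# Assumption: Python 3.8+. find_strings is an illustrative name, not a library API.
def find_strings(source):
    """Return (lineno, col_offset, end_lineno, value) for each str literal."""
    found = []
    for node in ast.walk(ast.parse(source)):
        # Every literal is an ast.Constant here; keep only the str-valued ones.
        if isinstance(node, ast.Constant) and isinstance(node.value, str):
            found.append((node.lineno, node.col_offset, node.end_lineno, node.value))
    # ast.walk makes no ordering promise, so sort by position.
    return sorted(found)

for lineno, col, end_lineno, value in find_strings(code):
    print("string at", lineno, col, "ends on line", end_lineno, repr(value))
```

With this, the multiline string is reported at line 2, where it starts, and `end_lineno` says it runs to line 4.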

    Another option is that you can use my 'python4ply' module. This is a grammar definition for Python for PLY, which is a parser generator. Here's how you might use it:

    import compiler
    import compiler.visitor
    
    # from python4ply; requires the ply parser generator
    import python_yacc
    
    code = """a = 'blah'
    b = '''multi
    line
    string'''
    c = u"spam"
    d = 1
    """
    
    tree = python_yacc.parse(code, "")
    #print tree
    
    class ShowStrings(compiler.visitor.ASTVisitor):
        def visitConst(self, node):
            if isinstance(node.value, basestring):
                print "string at", node.lineno, repr(node.value)
    
    visitor = ShowStrings()
    compiler.walk(tree, visitor)
    

    The output from this is:

    string at 1 'blah'
    string at 2 'multi\nline\nstring'
    string at 5 u'spam'
    

    There's no support for column information. (There is some mostly complete, commented-out code to support that, but it's not fully tested.) Then again, I see you don't need it. It also means working with Python's 'compiler' module, which is clumsier than the ast module.

    Still, with 30-40 lines of code you should have exactly what you want.
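
To give a rough idea of what those 30-40 lines might look like for a whole project, here is a sketch under different assumptions from the answer: it uses the standard-library `ast` module on Python 3.8 or later rather than python4ply, and `iter_project_strings` is a hypothetical helper name. It walks a directory, parses every `.py` file, and yields each string literal with its file and starting line.

```python
import ast
import os

# Assumption: Python 3.8+. iter_project_strings is a hypothetical helper name.
def iter_project_strings(root_dir):
    """Yield (path, lineno, value) for every string literal under root_dir."""
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in sorted(filenames):
            if not name.endswith(".py"):
                continue
            path = os.path.join(dirpath, name)
            # Read bytes so ast.parse can honor the file's encoding declaration.
            with open(path, "rb") as handle:
                source = handle.read()
            try:
                tree = ast.parse(source, filename=path)
            except (SyntaxError, ValueError):
                continue  # skip files this interpreter cannot parse
            for node in ast.walk(tree):
                if isinstance(node, ast.Constant) and isinstance(node.value, str):
                    yield path, node.lineno, node.value
```

Looping over `iter_project_strings(".")` and printing each tuple gives a grep-like listing. Note that docstrings are ordinary string literals and will show up too, and files written in Python 2-only syntax are skipped rather than reported.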
