Find all strings in python code files

后端未结

关注

 6  1145

I would like to list all strings within my large python project.

Imagine the different possibilities to create a string in python:

mystring = \"hello


                      
              相关标签:


      
      
        
          6条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  闹比i        
                
              
                            
                2021-01-11 23:35
              
            
            
                                                                       
Python's included tokenize module will also do the trick.

from __future__ import with_statement
import sys
import tokenize

for filename in sys.argv[1:]:
    with open(filename) as f:
        for toktype, tokstr, (lineno, _), _, _ in tokenize.generate_tokens(f.readline):
            if toktype == tokenize.STRING:
                strrepr = repr(eval(tokstr))
                print filename, lineno, strrepr

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  时光说笑        
                
              
                            
                2021-01-11 23:36
              
            
            
                                                                       
unwind's suggestion of using the ast module in 2.6 is a good one. (There's also the undocumented _ast module in 2.5.) Here's example code for that

code = """a = 'blah'
b = '''multi
line
string'''
c = u"spam"
"""

import ast
root = ast.parse(code)

class ShowStrings(ast.NodeVisitor):
  def visit_Str(self, node):
    print "string at", node.lineno, node.col_offset, repr(node.s)

show_strings = ShowStrings()
show_strings.visit(root)


The problem is multiline strings. If you run the above you'll get.

string at 1 4 'blah'
string at 4 -1 'multi\nline\nstring'
string at 5 4 u'spam'


You see that it doesn't report the start of the multiline string, only the end. There's no good solution for that using the builtin Python tools.

Another option is that you can use my 'python4ply' module. This is a grammar definition for Python for PLY, which is a parser generator. Here's how you might use it:

import compiler
import compiler.visitor

# from python4ply; requires the ply parser generator
import python_yacc

code = """a = 'blah'
b = '''multi
line
string'''
c = u"spam"
d = 1
"""

tree = python_yacc.parse(code, "<string>")
#print tree

class ShowStrings(compiler.visitor.ASTVisitor):
    def visitConst(self, node):
        if isinstance(node.value, basestring):
            print "string at", node.lineno, repr(node.value)

visitor = ShowStrings()
compiler.walk(tree, visitor)


The output from this is

string at 1 'blah'
string at 2 'multi\nline\nstring'
string at 5 u'spam'


There's no support for column information. (There is some mostly complete commented out code to support that, but it's not fully tested.) Then again, I see you don't need it. It also means working with Python's 'compiler' module, which is clumsier than the AST module.

Still, with a 30-40 lines of code you should have exactly what you want.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  闹比i        
                
              
                            
                2021-01-11 23:39
              
            
            
                                                                       
If you can do this in Python, I'd suggest starting by looking at the ast (Abstract Syntax Tree) module, and going from there.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  隐瞒了意图╮        
                
              
                            
                2021-01-11 23:46
              
            
            
                                                                       
You may also consider to parse your code with 
pygments.

I don't know the other solution, but it sure is very
simple to use.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  耶瑟儿～        
                
              
                            
                2021-01-11 23:52
              
            
            
                                                                       
Are you asking about the I18N utilities in Python?

http://docs.python.org/library/gettext.html#internationalizing-your-programs-and-modules

There's a utility called po-utils (formerly xpot) that can help with this.

http://po-utils.progiciels-bpi.ca/README.html
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  孤街浪徒        
                
              
                            
                2021-01-11 23:54
              
            
            
                                                                       
Gettext might help you. Put your strings in _(...) structures:

a = _('Test')
b = a
c = _('Another text')


Then run in the shell prompt:

pygettext test.py


You'll get a messages.pot file with the information you need:

# SOME DESCRIPTIVE TITLE.
# Copyright (C) YEAR ORGANIZATION
# FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.
#
msgid ""
msgstr ""
"Project-Id-Version: PACKAGE VERSION\n"
"POT-Creation-Date: 2009-02-25 08:48+BRT\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: LANGUAGE <LL@li.org>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=CHARSET\n"
"Content-Transfer-Encoding: ENCODING\n"
"Generated-By: pygettext.py 1.5\n"


#: teste.py:1
msgid "Test"
msgstr ""

#: teste.py:3
msgid "Another text"
msgstr ""

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
                             
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复