Python regex to remove comments


How would I write a regex that removes all comments that start with # and run to the end of the line, but at the same time leaves alone the first two lines, which say


3 answers

不知归路

2020-12-06 15:55
```
sed -e '1,2!{/^[[:space:]]*#/d;}' infile
```
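Then wrap this in a subprocess.Popen call. Note that it only deletes lines that consist entirely of a comment; a comment that trails code on the same line is left in place.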
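As a rough sketch of that wrapping step, the snippet below calls sed through subprocess.run (a convenience layer over subprocess.Popen) and returns the filtered text. The function name strip_full_line_comments is made up for illustration, and the sed script uses the `1,2!{...}` address form so that the first two lines pass through untouched. It assumes a sed binary on the PATH and Python 3.7 or later.

```python
import subprocess


def strip_full_line_comments(path):
    """Return the contents of `path` with full-line # comments removed.

    Lines 1 and 2 are excluded from the deletion by the `1,2!` address.
    (Hypothetical helper name; not from the original answer.)
    """
    cmd = ["sed", "-e", "1,2!{/^[[:space:]]*#/d;}", path]
    # check=True raises CalledProcessError if sed exits with an error.
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


# Example call (assumes a file named infile exists):
# print(strip_full_line_comments("infile"))
```

Passing the command as a list avoids going through a shell, so the quoting inside the sed script needs no extra escaping.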

However, this is no substitute for a real parser! Why does that matter? Well, consider this Python script:
```
output = """
This is
#1 of 100"""
```
Boom: the last line starts with #, so it gets deleted along with the closing quotes of the string. Any non-parsing solution instantly breaks this script.
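To make the failure concrete, here is a small sketch that applies a naive line-based regex to that script and checks whether the result still compiles. The helper names naive_strip and is_valid_python are invented for this illustration.

```python
import re

# The script from the answer above, as a string.
source = 'output = """\nThis is\n#1 of 100"""\n'


def naive_strip(text):
    """Delete every line whose first non-blank character is #."""
    return re.sub(r"(?m)^[ \t]*#.*\n?", "", text)


def is_valid_python(text):
    """Return True if `text` compiles as Python source."""
    try:
        compile(text, "<example>", "exec")
        return True
    except SyntaxError:
        return False


stripped = naive_strip(source)
print(is_valid_python(source))    # the original script compiles
print(is_valid_python(stripped))  # the stripped one has lost its closing quotes
```

The regex cannot tell that `#1 of 100"""` is the tail of a string literal, so it removes the line and leaves an unterminated triple-quoted string behind.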

独厮守ぢ

2020-12-06 16:03

You can remove comments by parsing the Python code with tokenize.generate_tokens. The following is a slightly modified version of an example from the tokenize documentation:

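```
import tokenize
import io
import sys

# generate_tokens wants text: a unicode str on Python 3, a byte str on Python 2.
if sys.version_info[0] == 3:
    StringIO = io.StringIO
else:
    StringIO = io.BytesIO

def nocomment(s):
    """Return the source code in s with all comment tokens dropped."""
    result = []
    g = tokenize.generate_tokens(StringIO(s).readline)
    for toknum, tokval, _, _, _ in g:
        # Keep every token except comments.
        if toknum != tokenize.COMMENT:
            result.append((toknum, tokval))
    # With (type, string) pairs, untokenize rebuilds the code with its own spacing.
    return tokenize.untokenize(result)

with open('script.py', 'r') as f:
    content = f.read()

print(nocomment(content))
```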

For example, if script.py contains

```
def foo(): # Remove this comment
    ''' But do not remove this #1 docstring 
    '''
    # Another comment
    pass
```

then the output of nocomment is

```
def foo ():
    ''' But do not remove this #1 docstring 
    '''

    pass
```
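The output above loses the original spacing (note `def foo ():`), and every comment is dropped, including any on the first two lines. One possible variation, offered only as a sketch, uses the token positions to cut comments out of the original lines instead of calling untokenize. The function name strip_comments and the keep_first parameter are made up for illustration, and the code assumes Python 3.

```python
import io
import tokenize


def strip_comments(source, keep_first=2):
    """Remove # comments from `source`, leaving the first `keep_first` lines alone.

    Hypothetical helper: comments are cut out of the original text by position,
    so spacing, docstrings and string literals are left exactly as written.
    """
    # Split the same way the tokenizer reads the text, so row numbers line up.
    lines = io.StringIO(source).readlines()
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type != tokenize.COMMENT:
            continue
        row, col = tok.start  # row is 1-based, col is 0-based
        if row <= keep_first:
            continue
        line = lines[row - 1]
        newline = line[len(line.rstrip("\r\n")):]  # keep the original line ending
        # A comment always runs to the end of its line, so cut from col onward.
        lines[row - 1] = line[:col].rstrip() + newline
    return "".join(lines)


example = (
    "#!/usr/bin/env python\n"
    "# -*- coding: utf-8 -*-\n"
    "# drop me\n"
    "x = '#not a comment'  # trailing\n"
)
print(strip_comments(example))
```

A line that held only a comment becomes an empty line rather than disappearing, which keeps line numbers stable; filtering out those empty lines afterwards is a separate choice.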


故里飘歌

2020-12-06 16:04

I don't think this can be done purely with a regex, as you'd need to track quotes to make sure that a given # isn't inside a string.

I'd look into Python's built-in code parsing modules, such as tokenize or ast, for help with something like this.
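As one illustration of what a parsing module gives you, the sketch below round-trips the source through the ast module. Comments are not part of the syntax tree, so they vanish when the code is regenerated, while a # inside a string survives. This assumes Python 3.9 or later for ast.unparse; the function name drop_comments_via_ast is invented here.

```python
import ast


def drop_comments_via_ast(source):
    """Parse `source` into an AST and regenerate code from it.

    Comments never make it into the tree, so the regenerated code has none.
    """
    return ast.unparse(ast.parse(source))


sample = 'x = 1  # a real comment\ns = "#1 of 100"\n'
cleaned = drop_comments_via_ast(sample)
print(cleaned)
```

The trade-off is that ast.unparse rewrites the code in its own style and also drops comments on the first two lines, so this fits best when exact layout does not matter.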
