Python 2 newline tokens in tokenize module

Posted by 放肆的年华 on 2019-12-08 17:45:36

Question


I am using the tokenize module in Python and wonder why there are 2 different newline tokens:

NEWLINE = 4
NL = 54

Any examples of code that would produce both tokens would be appreciated.


Answer 1:


According to the Python documentation:

tokenize.NL
Token value used to indicate a non-terminating newline. The NEWLINE token indicates the end of a logical line of Python code; NL tokens are generated when a logical line of code is continued over multiple physical lines.

More here: https://docs.python.org/2/library/tokenize.html
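
As a minimal sketch of the distinction quoted above, the snippet below collects only the newline-type tokens from a piece of source. It assumes Python 3 (the thread itself targets Python 2, where tokenize.generate_tokens takes the same kind of readline callable), and newline_tokens is a hypothetical helper name, not part of the tokenize module. Note that the numeric value of NL is not the same across Python versions, so comparing against tokenize.NL is safer than comparing against 54.

```python
import io
import tokenize


def newline_tokens(source):
    """Return the names of the NEWLINE/NL tokens found in source, in order.

    Hypothetical helper for illustration; assumes Python 3.
    """
    readline = io.StringIO(source).readline
    return [
        tokenize.tok_name[tok.type]
        for tok in tokenize.generate_tokens(readline)
        if tok.type in (tokenize.NEWLINE, tokenize.NL)
    ]


# One logical line on one physical line: only a terminating NEWLINE.
print(newline_tokens("x = 1\n"))

# One logical line spread over two physical lines: NL inside the
# parentheses, then NEWLINE where the statement ends.
print(newline_tokens("x = (1,\n     2)\n"))
```

The first call should report a single NEWLINE, the second an NL followed by a NEWLINE.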




Answer 2:


There are at least 4 possible cases of '\n' in Python code; 2 of them are codified by tokens:

  1. Statement-terminating newline: tokenize.NEWLINE - this is the token more or less corresponding to the C or Java ;.

  2. Any newline that does not terminate a statement, and does not belong to cases 3 or 4: tokenize.NL.

  3. The newlines in multiline strings.

  4. A newline that occurs at line-continuation \ - contrary to what the documentation would seem to indicate, this case does not produce any token at all.

Thus the following example:

# case 1
a = 6
b = 7

# case 2
answer = (
    a * b
)

# case 3
format = """
A multiline string
"""

# case 4
print "something that is continued" \
    "on the following line."

gives all of the possible cases:

1,0-1,8:        COMMENT '# case 1'
1,8-1,9:        NL      '\n'
2,0-2,1:        NAME    'a'
2,2-2,3:        OP      '='
2,4-2,5:        NUMBER  '6'
2,5-2,6:        NEWLINE '\n'
3,0-3,1:        NAME    'b'
3,2-3,3:        OP      '='
3,4-3,5:        NUMBER  '7'
3,5-3,6:        NEWLINE '\n'
4,0-4,1:        NL      '\n'
5,0-5,8:        COMMENT '# case 2'
5,8-5,9:        NL      '\n'
6,0-6,6:        NAME    'answer'
6,7-6,8:        OP      '='
6,9-6,10:       OP      '('
6,10-6,11:      NL      '\n'
7,4-7,5:        NAME    'a'
7,6-7,7:        OP      '*'
7,8-7,9:        NAME    'b'
7,9-7,10:       NL      '\n'
8,0-8,1:        OP      ')'
8,1-8,2:        NEWLINE '\n'
9,0-9,1:        NL      '\n'
10,0-10,8:      COMMENT '# case 3'
10,8-10,9:      NL      '\n'
11,0-11,6:      NAME    'format'
11,7-11,8:      OP      '='
11,9-13,3:      STRING  '"""\nA multiline string\n"""'
13,3-13,4:      NEWLINE '\n'
14,0-14,1:      NL      '\n'
15,0-15,8:      COMMENT '# case 4'
15,8-15,9:      NL      '\n'
16,0-16,5:      NAME    'print'
16,6-16,35:     STRING  '"something that is continued"'
17,4-17,28:     STRING  '"on the following line."'
17,28-17,29:    NEWLINE '\n'
18,0-18,0:      ENDMARKER       ''
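
A listing in this shape can be reproduced with a few lines of code. The sketch below is an assumption-laden illustration rather than the script used for the answer: it targets Python 3, dump_tokens and SOURCE are hypothetical names, and the sample source is a shortened variant of the example (a comment-only line, a backslash continuation and a multiline string).

```python
import io
import tokenize


def dump_tokens(source):
    """Return one formatted line per token, similar to the listing above.

    Hypothetical helper for illustration; assumes Python 3.
    """
    lines = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        (srow, scol), (erow, ecol) = tok.start, tok.end
        lines.append("%d,%d-%d,%d:\t%s\t%r" % (
            srow, scol, erow, ecol,
            tokenize.tok_name[tok.type], tok.string))
    return lines


SOURCE = (
    "# comment-only line\n"   # newline after a comment: NL
    "total = 1 + \\\n"        # backslash continuation: no newline token
    "    2\n"                 # end of the statement: NEWLINE
    "text = '''\n"            # newlines inside the string stay in STRING
    "multi\n"
    "'''\n"                   # end of the statement: NEWLINE
)

for line in dump_tokens(SOURCE):
    print(line)
```

With this input the output should contain exactly one NL (after the comment) and two NEWLINE tokens, with nothing emitted for the backslash-continued line break or for the newlines inside the string.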



Answer 3:


In addition to the quote from the docs:

The NEWLINE token indicates the end of a logical line of Python code; NL tokens are generated when a logical line of code is continued over multiple physical lines.

here is an example:

def a_func(a, b):
    pass

This will generate

1,0-1,3:        NAME    'def'
1,4-1,10:       NAME    'a_func'
1,10-1,11:      OP      '('
1,11-1,12:      NAME    'a'
1,12-1,13:      OP      ','
1,14-1,15:      NAME    'b'
1,15-1,16:      OP      ')'
1,16-1,17:      OP      ':'
1,17-1,18:      NEWLINE '\n'
2,0-2,4:        INDENT  '    '
2,4-2,8:        NAME    'pass'
2,8-2,9:        NEWLINE '\n'
3,0-3,0:        DEDENT  ''

Whereas

def a_func(a,
           b):
    pass

will generate this

1,0-1,3:        NAME    'def'
1,4-1,10:       NAME    'a_func'
1,10-1,11:      OP      '('
1,11-1,12:      NAME    'a'
1,12-1,13:      OP      ','
1,13-1,14:      NL      '\n'
2,11-2,12:      NAME    'b'
2,12-2,13:      OP      ')'
2,13-2,14:      OP      ':'
2,14-2,15:      NEWLINE '\n'
3,0-3,4:        INDENT  '    '
3,4-3,8:        NAME    'pass'
3,8-3,9:        NEWLINE '\n'
4,0-4,0:        DEDENT  ''
4,0-4,0:        ENDMARKER       ''

Note the 1,13-1,14: NL '\n' token after a,


Basically the difference between NEWLINE and NL is that NL is generated after a line that is not 'complete':

def a_func(a, b):

results in NEWLINE, because the entire logical line fits on one physical line, whereas

def another_func(a,
                 b):

results in NL after a, because that one logical line is spread over two physical lines; the NEWLINE only comes at the end of the second line.
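
One practical consequence is that counting NEWLINE tokens counts logical lines, while NL tokens are skipped. The following is a rough sketch assuming Python 3; count_lines is a hypothetical helper, not something provided by tokenize.

```python
import io
import tokenize


def count_lines(source):
    """Return (logical, physical) line counts for source.

    Logical lines are counted as NEWLINE tokens; NL tokens are ignored.
    Hypothetical helper for illustration; assumes Python 3.
    """
    logical = sum(
        1
        for tok in tokenize.generate_tokens(io.StringIO(source).readline)
        if tok.type == tokenize.NEWLINE
    )
    physical = len(source.splitlines())
    return logical, physical


one_line = "def a_func(a, b):\n    pass\n"
two_lines = "def another_func(a,\n                 b):\n    pass\n"

print(count_lines(one_line))
print(count_lines(two_lines))
```

Both functions have two logical lines (the def line and pass), but the second one occupies three physical lines because its signature is split inside the parentheses.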



Source: https://stackoverflow.com/questions/24519046/python-2-newline-tokens-in-tokenize-module
