How can I include special characters (tab, newline) in a python doctest result string?

Given the following Python script:

# dedupe.py
import re

def dedupe_whitespace(s,spacechars='\t '):
    """Merge repeated whitespace characters.
    Examp

6 answers

星月不相逢, 2021-02-18 19:38

TL;DR: Escape the backslash: in your otherwise unmodified docstring, write \\n or \\t instead of \n or \t.

You probably don't want to make your docstrings raw, because a raw docstring can no longer use any Python string escapes, including ones you might actually want.

To keep normal escapes available, double the backslash in each escape sequence that belongs to the example. Once Python has processed the docstring, what remains is a literal backslash followed by the character, which doctest can then parse as source code.
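
A minimal sketch of the double-escaping approach, assuming a made-up function indent_with_tab and a small helper run_docstring (both names are hypothetical, not from the question). The helper feeds a single docstring to doctest's parser and runner so the result can be inspected directly:

```python
import doctest


def indent_with_tab(line):
    """Prefix a line with one tab character (hypothetical example function).

    Every backslash in the example is doubled, so the docstring keeps the
    two characters backslash + letter, which doctest evaluates as source.

    >>> indent_with_tab('x\\ny')
    '\\tx\\ny'
    """
    return '\t' + line


def run_docstring(func):
    """Run the examples in func's docstring and return doctest's TestResults."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(func.__doc__, {func.__name__: func},
                              func.__name__, None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    return runner.run(test)


result = run_docstring(indent_with_tab)
print(result.failed, result.attempted)  # 0 1
```

The docstring itself is a regular (non-raw) string, so any other escape sequence in it would still be interpreted by Python as usual.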

时光说笑, 2021-02-18 19:38

You must set the NORMALIZE_WHITESPACE option. Alternatively, capture the output and compare it to the expected value:


def dedupe_whitespace(s,spacechars='\t '):
    """Merge repeated whitespace characters.
    Example:
    >>> output = dedupe_whitespace(r"Black\t\tGround")  #doctest: +REPORT_NDIFF
    >>> output == 'Black\tGround'
    True
    """




From the doctest documentation section How are Docstring Examples Recognized?:


  All hard tab characters are expanded to spaces, using 8-column tab
  stops. Tabs in output generated by the tested code are not modified.
  Because any hard tabs in the sample output are expanded, this means
  that if the code output includes hard tabs, the only way the doctest
  can pass is if the NORMALIZE_WHITESPACE option or directive is in
  effect. Alternatively, the test can be rewritten to capture the
  output and compare it to an expected value as part of the test. This
  handling of tabs in the source was arrived at through trial and
  error, and has proven to be the least error prone way of handling
  them. It is possible to use a different algorithm for handling tabs
  by writing a custom DocTestParser class.


Edit: My mistake, I read the docs the wrong way around. Hard tabs are expanded to spaces (at 8-column tab stops) both in the string argument passed to dedupe_whitespace and in the string literal compared on the next line, so output contains:

"Black Ground"


and is being compared to:

"Black        Ground"


As long as the docstring contains hard tabs, I can't find a way around this limitation short of writing your own DocTestParser or testing for deduplicated spaces instead of tabs.
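
The tab expansion described in the quoted documentation can be observed directly. This is only a sketch: make_row and run are hypothetical names, and the tab is built with chr(9) so that no escape sequence is involved. It shows that the parser replaces the hard tab in the expected output with spaces, that the strict comparison then fails, and that both NORMALIZE_WHITESPACE and the "capture and compare" rewrite pass:

```python
import doctest

TAB = chr(9)  # a real tab character, built without any backslash escape


def make_row():
    """Hypothetical function under test: two cells separated by a tab."""
    return 'a' + TAB + 'b'


# Docstring-like text whose expected output contains a hard tab, as happens
# when '\t' is written in a regular (non-raw) docstring.
text = ">>> print(make_row())\n" + "a" + TAB + "b\n"

parser = doctest.DocTestParser()
example = parser.get_examples(text)[0]
print(TAB in example.want)  # False: the parser expanded the tab to spaces


def run(source_text, optionflags=0):
    """Run one docstring-like text and return doctest's TestResults."""
    test = parser.get_doctest(source_text, {'make_row': make_row},
                              'demo', None, 0)
    runner = doctest.DocTestRunner(verbose=False, optionflags=optionflags)
    return runner.run(test, out=lambda s: None)  # discard the failure report


strict = run(text)
relaxed = run(text, doctest.NORMALIZE_WHITESPACE)

# "Capture and compare": the expected value is built inside the example,
# so no hard tab ever appears in the docstring text.
compared = run(">>> make_row() == 'a' + chr(9) + 'b'\nTrue\n")

print(strict.failed, relaxed.failed, compared.failed)  # 1 0 0
```

Note that NORMALIZE_WHITESPACE treats any run of whitespace as equal to any other, so it cannot tell one tab from two; the comparison form is the stricter test.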

野趣味, 2021-02-18 19:41

I've gotten this to work by using raw string notation (the r prefix) for the docstring:

def join_with_tab(iterable):
    r"""
    >>> join_with_tab(['1', '2'])
    '1\t2'
    """

    return '\t'.join(iterable)

if __name__ == "__main__":
    import doctest
    doctest.testmod()

旧时难觅i, 2021-02-18 19:43

I got it to work by escaping the tab character in the expected string:

>>> function_that_returns_tabbed_text()
'\\t\\t\\tsometext\\t\\t'


instead of

>>> function_that_returns_tabbed_text()
\t\t\tsometext\t\t

离开以前, 2021-02-18 19:44

It's the raw triple-quoted string notation (r""") that did the trick:

# filename: dedupe.py
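import doctest
import re

def dedupe_whitespace(s,spacechars='\t '):
    r"""Merge repeated whitespace characters.
    Example:
    >>> dedupe_whitespace('Black\t\tGround')  #doctest: +REPORT_NDIFF
    'Black\tGround'
    """
    for w in spacechars:
        # Collapse each run of the character w into a single w;
        # re.escape keeps regex metacharacters in spacechars literal.
        s = re.sub(re.escape(w) + "+", w, s)
    return s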

if __name__ == "__main__":
    doctest.testmod()

爱一瞬间的悲伤, 2021-02-18 19:47

This is basically YatharhROCK's answer, but a bit more explicit.  You can use raw strings or double escaping.  But why?

The docstring must end up containing valid Python source: after Python has processed the string literal, what is left has to be exactly the code you want doctest to run. These both work:

#!/usr/bin/env python

def split_raw(val, sep='\n'):
  r"""Split a string on newlines (by default).

  >>> split_raw('alpha\nbeta\ngamma')
  ['alpha', 'beta', 'gamma']
  """
  return val.split(sep)


def split_esc(val, sep='\n'):
  """Split a string on newlines (by default).

  >>> split_esc('alpha\\nbeta\\ngamma')
  ['alpha', 'beta', 'gamma']
  """
  return val.split(sep)

import doctest
doctest.testmod()


Using a raw string and double-escaping (escaping the backslash) have the same effect: both leave two characters in the docstring, the backslash and the n. doctest passes that text to the Python interpreter, which reads "backslash then n" inside a string literal as a newline character.

Use whichever you prefer.
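
To see why, it can help to look at what actually ends up in `__doc__`. The sketch below reuses the two functions above (with shortened examples) and adds a third, split_plain, which is a hypothetical counter-example with a single backslash in a regular docstring. The helper example_line is also just an illustration:

```python
def split_raw(val, sep='\n'):
    r"""Split a string on newlines (by default).

    >>> split_raw('alpha\nbeta')
    ['alpha', 'beta']
    """
    return val.split(sep)


def split_esc(val, sep='\n'):
    """Split a string on newlines (by default).

    >>> split_esc('alpha\\nbeta')
    ['alpha', 'beta']
    """
    return val.split(sep)


def split_plain(val, sep='\n'):
    """Counter-example: single backslash in a regular docstring.

    >>> split_plain('alpha\nbeta')
    ['alpha', 'beta']
    """
    return val.split(sep)


BACKSLASH_N = chr(92) + 'n'  # the two characters: backslash, then n


def example_line(func):
    """Return the first '>>>' line of func's docstring."""
    return next(line for line in func.__doc__.splitlines() if '>>>' in line)


for func in (split_raw, split_esc, split_plain):
    # True for split_raw and split_esc, False for split_plain
    print(func.__name__, BACKSLASH_N in example_line(func))
```

In split_plain the escape has already become a real line break by the time doctest reads the docstring, so the example line stops after 'alpha and is no longer valid Python.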