What is the Python equivalent of Perl\'s chomp
function, which removes the last character of a string if it is a newline?
s = '''Hello World \t\n\r\tHi There'''
# import the module string
import string
# use the method translate to convert
s.translate({ord(c): None for c in string.whitespace}
>>'HelloWorldHiThere'
With regex
s = ''' Hello World
\t\n\r\tHi '''
print(re.sub(r"\s+", "", s), sep='') # \s matches all white spaces
>HelloWorldHi
Replace \n,\t,\r
s.replace('\n', '').replace('\t','').replace('\r','')
>' Hello World Hi '
With regex
s = '''Hello World \t\n\r\tHi There'''
regex = re.compile(r'[\n\r\t]')
regex.sub("", s)
>'Hello World Hi There'
with Join
s = '''Hello World \t\n\r\tHi There'''
' '.join(s.split())
>'Hello World Hi There'
If you are concerned about speed (say you have a looong list of strings) and you know the nature of the newline char, string slicing is actually faster than rstrip. A little test to illustrate this:
import time
loops = 50000000
def method1(loops=loops):
test_string = 'num\n'
t0 = time.time()
for num in xrange(loops):
out_sting = test_string[:-1]
t1 = time.time()
print('Method 1: ' + str(t1 - t0))
def method2(loops=loops):
test_string = 'num\n'
t0 = time.time()
for num in xrange(loops):
out_sting = test_string.rstrip()
t1 = time.time()
print('Method 2: ' + str(t1 - t0))
method1()
method2()
Output:
Method 1: 3.92700004578
Method 2: 6.73000001907
"line 1\nline 2\r\n...".replace('\n', '').replace('\r', '')
>>> 'line 1line 2...'
or you could always get geekier with regexps :)
have fun!
I'm bubbling up my regular expression based answer from one I posted earlier in the comments of another answer. I think using re
is a clearer more explicit solution to this problem than str.rstrip
.
>>> import re
If you want to remove one or more trailing newline chars:
>>> re.sub(r'[\n\r]+$', '', '\nx\r\n')
'\nx'
If you want to remove newline chars everywhere (not just trailing):
>>> re.sub(r'[\n\r]+', '', '\nx\r\n')
'x'
If you want to remove only 1-2 trailing newline chars (i.e., \r
, \n
, \r\n
, \n\r
, \r\r
, \n\n
)
>>> re.sub(r'[\n\r]{1,2}$', '', '\nx\r\n\r\n')
'\nx\r'
>>> re.sub(r'[\n\r]{1,2}$', '', '\nx\r\n\r')
'\nx\r'
>>> re.sub(r'[\n\r]{1,2}$', '', '\nx\r\n')
'\nx'
I have a feeling what most people really want here, is to remove just one occurrence of a trailing newline character, either \r\n
or \n
and nothing more.
>>> re.sub(r'(?:\r\n|\n)$', '', '\nx\n\n', count=1)
'\nx\n'
>>> re.sub(r'(?:\r\n|\n)$', '', '\nx\r\n\r\n', count=1)
'\nx\r\n'
>>> re.sub(r'(?:\r\n|\n)$', '', '\nx\r\n', count=1)
'\nx'
>>> re.sub(r'(?:\r\n|\n)$', '', '\nx\n', count=1)
'\nx'
(The ?:
is to create a non-capturing group.)
(By the way this is not what '...'.rstrip('\n', '').rstrip('\r', '')
does which may not be clear to others stumbling upon this thread. str.rstrip
strips as many of the trailing characters as possible, so a string like foo\n\n\n
would result in a false positive of foo
whereas you may have wanted to preserve the other newlines after stripping a single trailing one.)
And I would say the "pythonic" way to get lines without trailing newline characters is splitlines().
>>> text = "line 1\nline 2\r\nline 3\nline 4"
>>> text.splitlines()
['line 1', 'line 2', 'line 3', 'line 4']
I find it convenient to have be able to get the chomped lines via in iterator, parallel to the way you can get the un-chomped lines from a file object. You can do so with the following code:
def chomped_lines(it):
return map(operator.methodcaller('rstrip', '\r\n'), it)
Sample usage:
with open("file.txt") as infile:
for line in chomped_lines(infile):
process(line)