What is the Python equivalent of Perl's chomp function, which removes the last character of a string if it is a newline?
Just use:
line = line.rstrip("\n")
or, if leading newlines should be removed as well:
line = line.strip("\n")
You don't need any of this complicated stuff.
Try the method rstrip() (see the docs for Python 2 and Python 3):
>>> 'test string\n'.rstrip()
'test string'
Python's rstrip() method strips all kinds of trailing whitespace by default, not just one newline as Perl does with chomp.
>>> 'test string \n \r\n\n\r \n\n'.rstrip()
'test string'
To strip only newlines:
>>> 'test string \n \r\n\n\r \n\n'.rstrip('\n')
'test string \n \r\n\n\r '
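Note that rstrip('\n') still removes every trailing newline, not just one. If at most one should go, as with chomp, something along these lines may help. This is a small sketch: str.removesuffix needs Python 3.9 or later, and the variable names are only illustrative.

```python
line = "test string\n\n"

# Python 3.9+: drop exactly one trailing "\n", if there is one.
once = line.removesuffix("\n")

# Equivalent for older versions: slice only when the suffix is present.
once_compat = line[:-1] if line.endswith("\n") else line
```

Both variants leave 'test string\n' here, whereas line.rstrip("\n") would return 'test string'.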
There are also the methods strip(), lstrip() and rstrip():
>>> s = " \n\r\n \n abc def \n\r\n \n "
>>> s.strip()
'abc def'
>>> s.lstrip()
'abc def \n\r\n \n '
>>> s.rstrip()
' \n\r\n \n abc def'
Note that rstrip doesn't act exactly like Perl's chomp() because it doesn't modify the string. That is, in Perl:
$x="a\n";
chomp $x
results in $x being "a", but in Python:
x="a\n"
x.rstrip()
will mean that the value of x is still "a\n". Even x=x.rstrip() doesn't always give the same result, as it strips all whitespace from the end of the string, not just one newline at most.
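For behaviour closer to chomp, a small helper can remove at most one line ending and return the result for reassignment. This is only a sketch: the name chomp is illustrative, and treating \r\n and \r as line endings is an assumption that goes beyond Perl's default, where chomp removes only the value of $/ (normally \n).

```python
def chomp(line):
    """Return line without at most one trailing line ending."""
    # Check "\r\n" first so a Windows line ending is removed as a unit.
    for ending in ("\r\n", "\n", "\r"):
        if line.endswith(ending):
            return line[:-len(ending)]
    return line


x = "a\n\n"
x = chomp(x)  # strings are immutable, so the name has to be rebound
```

After this, x is "a\n": one newline is gone and the other is left alone, and trailing spaces are never touched.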
s = s.rstrip()
will remove all trailing whitespace, newlines included, from the end of the string s. The assignment is needed because rstrip returns a new string instead of modifying the original string.
There are three types of line endings that we normally encounter: \n, \r and \r\n. A rather simple regular expression in re.sub, namely r"\r?\n?$", is able to catch them all.
(And we gotta catch 'em all, am I right?)
import re
re.sub(r"\r?\n?$", "", the_text, 1)
With the last argument, we limit the number of occurrences replaced to one, mimicking chomp to some extent. Example:
import re
text_1 = "hellothere\n\n\n"
text_2 = "hellothere\n\n\r"
text_3 = "hellothere\n\n\r\n"
a = re.sub(r"\r?\n?$", "", text_1, 1)
b = re.sub(r"\r?\n?$", "", text_2, 1)
c = re.sub(r"\r?\n?$", "", text_3, 1)
... where a == b == c is True.
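One edge case worth knowing: without re.MULTILINE, $ also matches just before a final newline, so r"\r?\n?$" can remove a \r\n that sits in front of the last \n. The sketch below is a variation, not the pattern from this answer: it anchors with \Z (end of string only) and passes count by keyword. The names LINE_ENDING_AT_END and chomp_re are hypothetical.

```python
import re

# Hypothetical variant: one line ending, anchored at the true end of the string.
LINE_ENDING_AT_END = re.compile(r"(?:\r\n|\n|\r)\Z")


def chomp_re(text):
    """Remove at most one trailing line ending using a regex."""
    return LINE_ENDING_AT_END.sub("", text, count=1)


tricky = "a\r\n\n"
# "$" can match before the final "\n", so the "\r\n" in front of it is removed.
with_dollar = re.sub(r"\r?\n?$", "", tricky, count=1)
# "\Z" matches only at the very end, so just the last "\n" is removed.
with_z = chomp_re(tricky)
```

For the three example strings above, chomp_re gives the same result as the original pattern; the two differ only for inputs like tricky.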
This will work for both Windows and Linux line endings (a bit expensive with re.sub, if you are only looking for a regex solution):
import re
if re.search("(\\r|)\\n$", line):
    line = re.sub("(\\r|)\\n$", "", line)