Question
I'm happy to ask my first Python question! I would like to strip the beginning of the sample file below (everything before the first occurrence of "article:"). To do this I use the re.sub function.
Below is my file, sample.txt:
fdasfdadfa
adfadfasdf
afdafdsfas
adfadfadf
adfadsf
afdaf
article: name of the first article
aaaaaaa
aaaaaaa
aaaaaaa
article: name of the first article
bbbbbbb
bbbbbbb
bbbbbbb
article: name of the first article
ccccccc
ccccccc
ccccccc
And my Python code to parse this file:
import re

test = ''
for line in open('sample.txt'):
    test = test + line
result = re.sub(r'.*article:', 'article:', test, 1, flags=re.S)
print(result)
Sadly this code only displays the last article. The output of the code:
article: name of the first article
ccccccc
ccccccc
ccccccc
Does someone know how to strip only the beginning of the file and display the 3 articles?
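Answer 1:
You can use itertools.dropwhile to get this effect. It skips lines while the predicate is true and, from the first line that starts with 'article' onward, yields every remaining line unchanged: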
from itertools import dropwhile

with open('filename.txt') as f:
    # Skip lines until the first one that starts with 'article', then keep the rest
    articles = ''.join(dropwhile(lambda line: not line.startswith('article'), f))
print(articles)
This prints:
article: name of the first article
aaaaaaa
aaaaaaa
aaaaaaa
article: name of the first article
bbbbbbb
bbbbbbb
bbbbbbb
article: name of the first article
ccccccc
ccccccc
ccccccc
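If you would rather stay with re.sub, the likely reason the original pattern keeps only the last article is that `.*` is greedy: with re.S it runs to the end of the text and backtracks to the last "article:". Below is a minimal sketch of a non-greedy variant. The helper name strip_preamble and the shortened in-memory sample are made up for illustration and are not part of the question.

```python
import re

# Shortened stand-in for sample.txt (hypothetical data for illustration).
sample = (
    "fdasfdadfa\n"
    "afdaf\n"
    "article: first\n"
    "aaaaaaa\n"
    "article: second\n"
    "bbbbbbb\n"
)

def strip_preamble(text):
    """Remove everything before the first line that starts with 'article:'."""
    # '.*?' is non-greedy, so it stops at the FIRST marker instead of the last.
    # The lookahead (?=^article:) checks for the marker without consuming it,
    # and re.M lets '^' match at the start of any line.
    return re.sub(r'\A.*?(?=^article:)', '', text, count=1, flags=re.S | re.M)

print(strip_preamble(sample))
```

One difference from the dropwhile version: if the text contains no "article:" line at all, this sketch returns the text unchanged, whereas dropwhile would drop every line.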
Source: https://stackoverflow.com/questions/49525317/how-to-strip-the-beginning-of-a-file-with-python-library-re-sub