How to strip the beginning of a file with Python's re.sub?

Question

I'm happy to ask my first Python question! I would like to strip the beginning of the sample file below, that is, everything before the first occurrence of "article:". To do this I use the re.sub function.

Below is my file, sample.txt:

fdasfdadfa
adfadfasdf
afdafdsfas
adfadfadf
adfadsf
afdaf

article: name of the first article
aaaaaaa
aaaaaaa
aaaaaaa
article: name of the first article
bbbbbbb
bbbbbbb
bbbbbbb
article: name of the first article
ccccccc
ccccccc
ccccccc

And my Python code to parse this file:

import re

# Read the whole file into a single string
test = ''
for line in open('sample.txt'):
    test = test + line

result = re.sub(r'.*article:', 'article:', test, count=1, flags=re.S)
print(result)

Sadly this code only displays the last article. The output of the code:

article: name of the first article
ccccccc
ccccccc
ccccccc

Does anyone know how to strip only the beginning of the file and display all three articles?

Answer 1:

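You can use itertools.dropwhile to get this effect: it skips lines while the predicate is true (here, while a line does not start with 'article'), then yields every remaining line unchanged.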

from itertools import dropwhile

with open('filename.txt') as f:
    articles = ''.join(dropwhile(lambda line: not line.startswith('article'), f))

print(articles)

This prints:

article: name of the first article
aaaaaaa
aaaaaaa
aaaaaaa
article: name of the first article
bbbbbbb
bbbbbbb
bbbbbbb
article: name of the first article
ccccccc
ccccccc
ccccccc
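Since the question asks about re.sub specifically: the original pattern '.*article:' fails because '.*' is greedy, and with re.S it runs across newlines all the way to the last "article:" in the text, which is why only the third article survives. A minimal sketch of a re.sub-based fix, assuming every article starts at the beginning of a line, uses a non-greedy '.*?' plus a lookahead so the marker itself is kept. The function name strip_preamble and the abbreviated SAMPLE string are hypothetical stand-ins for illustration, not part of the original answer.

```python
import re

# Hypothetical, abbreviated in-memory copy of sample.txt so the sketch runs
# without a file on disk.
SAMPLE = """fdasfdadfa
adfadfasdf

article: name of the first article
aaaaaaa
article: name of the first article
bbbbbbb
article: name of the first article
ccccccc
"""


def strip_preamble(text):
    """Drop everything before the first line that starts with 'article:'."""
    # '.*?' is non-greedy, so it stops at the FIRST 'article:' rather than
    # the last one. The lookahead (?=^article:) does not consume the marker,
    # so nothing has to be re-inserted in the replacement string.
    # re.S lets '.' cross newlines; re.M makes '^' match at each line start.
    # If no line starts with 'article:', the text is returned unchanged.
    return re.sub(r'.*?(?=^article:)', '', text, count=1,
                  flags=re.S | re.M)


if __name__ == '__main__':
    print(strip_preamble(SAMPLE))
```

To apply it to the real file, read the file into one string first, for example with open('sample.txt') as f: print(strip_preamble(f.read())). The dropwhile approach above avoids regular expressions entirely and works line by line, so it may be preferable for very large files.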

Source: https://stackoverflow.com/questions/49525317/how-to-strip-the-beginning-of-a-file-with-python-library-re-sub

Tags

python

regex

substitution
