Question
I'm learning Python. To teach myself, I've decided to build a tool that gathers RSS feeds and stores each item's title, URL, and summary in a database (I will later build a tool to access the data and scrape the pages).
So far, I have created a local version that gathers content from a list of RSS feeds and puts it into a pandas DataFrame.
What I'm trying to understand next is which tools I need to turn this local script into one that runs on a schedule (say, every 30 minutes) and adds any newly found data to the database.
Any direction would be helpful.
import feedparser
import pandas as pd

rawrss = [
    'http://newsrss.bbc.co.uk/rss/newsonline_uk_edition/front_page/rss.xml',
    'https://www.yahoo.com/news/rss/',
    'http://www.huffingtonpost.co.uk/feeds/index.xml',
    'http://feeds.feedburner.com/TechCrunch/',
]

posts = []
for url in rawrss:
    feed = feedparser.parse(url)
    for post in feed.entries:
        posts.append((post.title, post.link, post.summary))

df = pd.DataFrame(posts, columns=['title', 'link', 'summary'])  # build the DataFrame from the collected tuples
df
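For context, one possible direction (a sketch, not the only approach) is to wrap the gathering code in a function, store results in SQLite via `DataFrame.to_sql`, and poll in a loop with `time.sleep`; cron or the third-party `schedule` library are common alternatives to the loop. The table name `posts`, the database file `rss.db`, and the 30-minute interval below are illustrative assumptions. Deduplicating on the link column keeps repeated runs from inserting the same entries twice.

```python
import sqlite3
import time

import pandas as pd


RAWRSS = [
    'http://feeds.feedburner.com/TechCrunch/',
]


def fetch_posts(urls):
    """Parse each feed and return a DataFrame of (title, link, summary)."""
    import feedparser  # third-party: pip install feedparser
    posts = []
    for url in urls:
        feed = feedparser.parse(url)
        for post in feed.entries:
            posts.append((post.title, post.link, post.summary))
    return pd.DataFrame(posts, columns=['title', 'link', 'summary'])


def store_new_posts(df, conn):
    """Append only rows whose link is not already in the 'posts' table."""
    conn.execute(
        'CREATE TABLE IF NOT EXISTS posts '
        '(title TEXT, link TEXT PRIMARY KEY, summary TEXT)'
    )
    known = {row[0] for row in conn.execute('SELECT link FROM posts')}
    new = df[~df['link'].isin(known)].drop_duplicates(subset='link')
    new.to_sql('posts', conn, if_exists='append', index=False)
    conn.commit()
    return len(new)


def poll_forever(db_path='rss.db', interval=30 * 60):
    """Naive scheduler: fetch, store, sleep, repeat."""
    conn = sqlite3.connect(db_path)
    while True:
        added = store_new_posts(fetch_posts(RAWRSS), conn)
        print(f'added {added} new posts')
        time.sleep(interval)


# poll_forever()  # uncomment to run; cron is a more robust alternative
```

With cron instead of the loop, the script would do a single fetch-and-store pass and exit, and a crontab entry such as `*/30 * * * *` would handle the timing.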
Source: https://stackoverflow.com/questions/45735189/script-for-feedpaser-to-regularly-gather-rss-then-storing-data-in-database