Best practice to keep RSS feeds unique in a SQL database

Posted by 牧云@^-^@ on 2019-12-24 03:03:29

Question


I am working on a project that shows RSS feeds from different sites. I keep the items in a database: every 3 hours my program fetches the feeds and inserts them into a SQL database. I want the records for each provider to be unique so that duplicate content is not shown.

The problem is that some providers do not supply a GUID field, others supply a GUID but no pubDate, and some supply neither a GUID nor a pubDate, just a title and a link.

So what would be the best way to keep RSS feed items unique in SQL Server?

Should I check the GUID first, then pubDate, then link, then title? Would it be good practice to compare link fields in SQL to check uniqueness?

Thanks.


Answer 1:


I would develop a routine that takes certain key parameters, such as the title, source and body, and combines them to create a CRC hash. Then store the hash as an attribute with the feed item and check for a matching hash before adding a new item.
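As a rough illustration of that routine, here is a minimal sketch in Python rather than C#. The function names `feed_fingerprint` and `is_new`, the inclusion of the link as an extra field, and the normalization rules (collapsing whitespace, lowercasing) are all assumptions made for the example, not something a feed provider or library prescribes. `zlib.crc32` is the standard-library CRC-32 function.

```python
import zlib


def _norm(text):
    # Collapse runs of whitespace and lowercase, so trivial formatting
    # differences do not produce a different fingerprint (an assumption).
    return " ".join(text.split()).lower()


def feed_fingerprint(source, title, link="", body=""):
    """Combine key fields of one feed item into a CRC-32 value."""
    # The link is only stripped, not lowercased: URL paths can be case-sensitive.
    key = "\x1f".join([_norm(source), _norm(title), link.strip(), _norm(body)])
    return zlib.crc32(key.encode("utf-8"))


# Hypothetical in-memory stand-in for "hashes already stored with the feed".
seen_hashes = set()


def is_new(source, title, link="", body=""):
    """Return True the first time an item's fingerprint is seen."""
    fingerprint = feed_fingerprint(source, title, link, body)
    if fingerprint in seen_hashes:
        return False
    seen_hashes.add(fingerprint)
    return True
```

Keep in mind that CRC-32 produces only 32 bits, so two unrelated items can end up with the same value. Comparing the underlying fields when the hashes match, or swapping in a longer digest such as SHA-256 from `hashlib`, would reduce that risk.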

I'm not sure what your environment constraints are, but here is an example of calculating CRC-32 in C#: http://damieng.com/blog/2006/08/08/calculating_crc32_in_c_and_net
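For the "store the hash and check before adding" part, the following is a sketch only. It uses Python's built-in `sqlite3` module as a stand-in for SQL Server, and the table name `feed_items`, its columns, and the helper `insert_if_new` are hypothetical. The same idea should carry over to SQL Server with a unique constraint on the hash column.

```python
import sqlite3
import zlib

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE feed_items ("
    " id INTEGER PRIMARY KEY,"
    " source TEXT NOT NULL,"
    " title TEXT NOT NULL,"
    " link TEXT,"
    " content_hash INTEGER NOT NULL,"
    # The constraint is a safety net in case two writers race each other.
    " UNIQUE (source, content_hash))"
)


def insert_if_new(conn, source, title, link):
    """Insert the item unless a row with the same source and hash exists."""
    content_hash = zlib.crc32(f"{source}|{title}|{link}".encode("utf-8"))
    exists = conn.execute(
        "SELECT 1 FROM feed_items WHERE source = ? AND content_hash = ?",
        (source, content_hash),
    ).fetchone()
    if exists:
        return False
    conn.execute(
        "INSERT INTO feed_items (source, title, link, content_hash)"
        " VALUES (?, ?, ?, ?)",
        (source, title, link, content_hash),
    )
    conn.commit()
    return True
```

Calling `insert_if_new` twice with the same source, title and link stores one row: the first call returns `True` and the second returns `False`.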



Source: https://stackoverflow.com/questions/11953807/best-practice-to-keep-rss-feeds-unique-in-sql-database
