Looking for dataset to test FULLTEXT style searches on [closed]

Posted by 纵然是瞬间 on 2019-12-30 08:50:50

Question


I am looking for a corpus of text to run some trial full-text searches against: either something I can download or a system that generates it. Something a bit more random would be better, e.g. 1,000,000 Wikipedia articles in a format that is easy to insert into a two-column database table (id, text).

Any ideas or suggestions?


Answer 1:


I'll throw this out there since I'm familiar with it: Prosper.com makes its member loan listings available for analysis through an XML export. The export would have about 50,000 loan requests with descriptions and over 1,000,000 member profiles (although many of those are empty).




Answer 2:


Project Gutenberg has 32,000 books available.

Edit: As of now (17.06.16) there are 52,284 free ebooks available to download as plain-text UTF-8 files, covering a wide variety of topics (from science to religion). They are also available in EPUB, Kindle, and HTML formats. See Project Gutenberg.
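
If you go this route, getting the plain-text files into an (id, text) table takes only a few lines. The following is a minimal sketch, assuming the books have already been downloaded as `.txt` files into a single local directory; the function name `load_text_files` and the table name `books` are made up for illustration, and SQLite is used only because it ships with Python.

```python
import sqlite3
from pathlib import Path


def load_text_files(directory, conn):
    """Insert every *.txt file in `directory` as one row of (id, text).

    Assumes the files are UTF-8; undecodable bytes are replaced rather
    than aborting the import. Returns the number of files loaded.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS books (id INTEGER PRIMARY KEY, text TEXT)"
    )
    paths = sorted(Path(directory).glob("*.txt"))
    with conn:  # one transaction for the whole import
        for path in paths:
            text = path.read_text(encoding="utf-8", errors="replace")
            # id is left out so SQLite assigns the next integer key
            conn.execute("INSERT INTO books (text) VALUES (?)", (text,))
    return len(paths)
```

Something like `load_text_files("gutenberg_txt", sqlite3.connect("corpus.db"))` would then fill the table, where both names are placeholders for your own paths.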




Answer 3:


Why not use a Wikipedia dump?
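
Wikipedia dumps are distributed as MediaWiki XML exports, where each `<page>` element carries an `<id>` and a `<revision>` with the article `<text>`. As a rough sketch of how such a file could be streamed into the two-column table from the question: the helper names (`iter_pages`, `load_dump`) and the table name `articles` are invented here, SQLite is just a convenient stand-in for whatever database you are testing, and the stored text is raw wikitext markup rather than cleaned prose.

```python
import sqlite3
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the XML namespace: '{http://...}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source):
    """Yield (page_id, text) for each <page> in a MediaWiki XML export.

    `source` is a filename or a binary file object. The file is parsed
    incrementally, so the whole dump is never held in memory at once.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        page_id = None
        text = ""
        for child in elem:
            name = local_name(child.tag)
            if name == "id":  # direct child of <page>, not the revision id
                page_id = int(child.text)
            elif name == "revision":
                for sub in child:
                    if local_name(sub.tag) == "text":
                        text = sub.text or ""
        if page_id is not None:
            yield page_id, text
        elem.clear()  # drop the page's children once they have been read


def load_dump(source, conn):
    """Load all pages into articles(id, text); return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS articles (id INTEGER PRIMARY KEY, text TEXT)"
    )
    with conn:  # single transaction keeps the bulk insert fast
        conn.executemany(
            "INSERT OR REPLACE INTO articles (id, text) VALUES (?, ?)",
            iter_pages(source),
        )
    return conn.execute("SELECT COUNT(*) FROM articles").fetchone()[0]
```

The dumps are usually compressed, so in practice you would pass an opened `bz2.open(path, "rb")` file object (or a decompressed file) as `source`.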



Source: https://stackoverflow.com/questions/3095813/looking-for-dataset-to-test-fulltext-style-searches-on
