I am looking for a corpus of text to run some trial full-text searches against: either something I can download, or a system that generates it. Something fairly random would be best, e.g. 1,000,000 Wikipedia articles in a format that is easy to insert into a two-column database table (id, text).
Any ideas or suggestions?
I'll throw this out there since I'm familiar with it: Prosper.com makes its member loan listings available for analysis through an XML export. The export has about 50,000 loan requests with descriptions and over 1,000,000 member profiles (although many of those are empty).
Project Gutenberg has 32,000 books available.
Edit: As of now (17.06.16), there are 52,284 free ebooks available to download as UTF-8 plain-text files on a wide variety of topics (from science to religion). They are also available in EPUB, Kindle, and HTML formats. See Project Gutenberg.
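If you go the plain-text route, getting the files into an (id, text) table takes only a few lines. Here is a minimal sketch, assuming the books have already been downloaded as `.txt` files into a local directory and that SQLite is good enough for a trial run. The table name `docs` and the function name `load_text_files` are made up for illustration.

```python
import sqlite3
from pathlib import Path


def load_text_files(directory, db_path=":memory:"):
    """Insert every *.txt file under `directory` as one (id, text) row."""
    conn = sqlite3.connect(db_path)
    # Hypothetical schema matching the two-column layout from the question.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS docs (id INTEGER PRIMARY KEY, text TEXT)"
    )
    for path in sorted(Path(directory).rglob("*.txt")):
        # errors="replace" keeps the load going if a file is not valid UTF-8.
        body = path.read_text(encoding="utf-8", errors="replace")
        conn.execute("INSERT INTO docs (text) VALUES (?)", (body,))
    conn.commit()
    return conn
```

Each file becomes one row, and SQLite assigns the id automatically. For a different database, the same loop should work with that database's driver and placeholder syntax.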
Why not use a Wikipedia dump?
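The dumps are large XML files with one `<page>` element per article, so it is worth streaming them rather than parsing the whole file at once. A rough sketch of that, assuming the standard `<page>`/`<title>`/`<text>` layout of the export and using SQLite for the two-column table. The names `iter_pages`, `load_dump`, and `docs` are made up for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source):
    """Yield (title, text) for each <page> element, streaming the input."""
    title, text = None, None
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text or ""
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text or ""
            title, text = None, None
            elem.clear()  # drop the page's children to limit memory use


def load_dump(source, conn, limit=None):
    """Insert up to `limit` pages into a hypothetical docs(id, text) table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS docs (id INTEGER PRIMARY KEY, text TEXT)"
    )
    count = 0
    for _title, text in iter_pages(source):
        if limit is not None and count >= limit:
            break
        conn.execute("INSERT INTO docs (text) VALUES (?)", (text,))
        count += 1
    conn.commit()
    return count
```

`source` can be a filename or a binary file object, so a bz2-compressed dump can be passed as `bz2.open(path, "rb")` without unpacking it first. The text column will contain raw wiki markup, which is probably fine for search testing but would need cleaning for anything else.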
Source: https://stackoverflow.com/questions/3095813/looking-for-dataset-to-test-fulltext-style-searches-on