NLP: Building (small) corpora, or “Where to get lots of not-too-specialized English-language text files?”

温柔的废话 2021-01-13 03:41

Does anyone have a suggestion for where to find archives or collections of everyday English text for use in a small corpus? I have been using Project Gutenberg books for a

7 Answers
  •  野的像风
    2021-01-13 04:07

    You've covered the obvious ones. The only other areas that I can think of to supplement them:

    1) News articles / blogs.

    2) Magazines are posting a lot of free material online, and you can get a good cross section of topics.
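News articles, blogs, and online magazine pieces usually arrive as HTML, so they need to be reduced to plain text before they can sit alongside Gutenberg-style text files in a corpus. Below is a minimal sketch of that step using only Python's standard library. The names `TextExtractor` and `html_to_text` and the sample page are made up for illustration; real pages will likely also need navigation menus, comments, and ads removed, which this sketch does not attempt.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects visible text, skipping script/style."""

    SKIP = {"script", "style"}
    # Tags treated as word boundaries so adjacent blocks do not run together.
    BLOCK = {"p", "div", "br", "li", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag in self.BLOCK:
            self._chunks.append(" ")

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag in self.BLOCK:
            self._chunks.append(" ")

    def handle_data(self, data):
        # Keep text only when we are not inside <script> or <style>.
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Collapse runs of whitespace into single spaces.
        return re.sub(r"\s+", " ", "".join(self._chunks)).strip()


def html_to_text(html):
    """Return the visible text of an HTML string as one line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


# Made-up sample page standing in for a downloaded article.
sample = (
    "<html><head><style>p{color:red}</style></head><body>"
    "<h1>Local news</h1><p>The council met on <b>Tuesday</b>.</p>"
    "<script>track()</script></body></html>"
)
print(html_to_text(sample))  # Local news The council met on Tuesday.
```

Each cleaned article can then be written to its own `.txt` file, which keeps the corpus in the same one-document-per-file layout as a folder of plain-text books.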
