Extracting whole words
Question
I have a large set of real-world text that I need to extract words from, to feed into a spell checker. I'd like to extract as many meaningful words as possible without too much noise. I know there are plenty of regex ninjas around here, so hopefully someone can help me out. Currently I'm extracting all alphabetical sequences with '[a-z]+'. This is an okay approximation, but it drags a lot of rubbish out with it. Ideally I would like some regex (doesn't have to be pretty or efficient) that