This is a classic problem in the field of streaming algorithms. No exact method can work on every input in limited memory (certain degenerate cases will always defeat it), so you'll need to settle for a set of elements that are approximately (in a well-defined sense) the top k words in your stream. I don't know any classic references, but a quick Google brought me to this. It seems to have a nice survey of various techniques for doing streaming top-k. You might check the references therein for other ideas.
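To give a feel for what "approximately the top k" can mean, here is a minimal sketch of one well-known streaming technique, the Misra-Gries frequent-items summary. I'm picking it for illustration; it isn't necessarily one of the methods the survey covers. The names `misra_gries`, `approx_top_k`, and `num_counters` are made up for this example.

```python
def misra_gries(stream, num_counters):
    """One pass over `stream`, keeping at most `num_counters` counters.

    Each stored count underestimates the true count by at most
    n / (num_counters + 1), where n is the stream length, so every word
    occurring more often than that is guaranteed to survive.
    """
    counters = {}
    for word in stream:
        if word in counters:
            counters[word] += 1
        elif len(counters) < num_counters:
            counters[word] = 1
        else:
            # No free counter: decrement everything and drop the zeros.
            # The incoming word itself is not stored in this step.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


def approx_top_k(stream, k, num_counters):
    """Return up to k candidate words, ordered by estimated count."""
    summary = misra_gries(stream, num_counters)
    return sorted(summary, key=summary.get, reverse=True)[:k]
```

Choose `num_counters` comfortably larger than k. The result is only a candidate set: words that are rare in the stream can still show up in it, which is exactly the kind of degenerate behaviour you have to accept in one pass.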
One other idea (one that doesn't fit the streaming model, since it needs two passes) is to randomly sample as many words as will fit into memory, sort-and-uniq them, and then make a second pass over the file counting the occurrences of each word in your sample. Then you can easily find the top k among the sampled words.
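A rough sketch of that two-pass idea might look like the following. It assumes a whitespace-separated UTF-8 text file, and it uses reservoir sampling to draw the sample, which is my choice rather than anything the idea requires. The function names and the `sample_size` and `seed` parameters are hypothetical.

```python
import random
from collections import Counter


def iter_words(path):
    """Yield whitespace-separated words from a text file, one line at a time."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            yield from line.split()


def reservoir_sample(words, sample_size, rng):
    """Uniformly sample `sample_size` word occurrences in one pass."""
    reservoir = []
    for index, word in enumerate(words):
        if index < sample_size:
            reservoir.append(word)
        else:
            # Keep the new word with probability sample_size / (index + 1).
            slot = rng.randint(0, index)
            if slot < sample_size:
                reservoir[slot] = word
    return reservoir


def two_pass_top_k(path, k, sample_size, seed=None):
    rng = random.Random(seed)
    # Pass 1: sample, then deduplicate (the "sort-and-uniq" step).
    candidates = set(reservoir_sample(iter_words(path), sample_size, rng))
    # Pass 2: count exact occurrences, but only for sampled words.
    counts = Counter(w for w in iter_words(path) if w in candidates)
    return counts.most_common(k)
```

The counts that come back are exact for the words that made it into the sample. The approximation is in which words get sampled: a frequent word is very likely to be picked at least once, but nothing guarantees it.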