We were set an algorithm problem in class today, as a \"if you figure out a solution you don\'t have to do this subject\". SO of course, we all thought we will give it a go.
Interesting problem. What you're looking at is word classification. While you can learn and use traditional information retrieval methods like LSA and categorization based on such - I'm not sure if that is your intent (if it is, then do so by all means! :)
Since you say you can use external data, I would suggest using wordnet and its link between words. For instance, using wordnet,
# S: (n) **fishing**, sportfishing (the act of someone who fishes as a diversion)
* direct hypernym / inherited hypernym / sister term
o S: (n) **outdoor sport, field sport** (a sport that is played outdoors)
+ direct hypernym / inherited hypernym / sister term
# S: (n) **sport**, athletics
(an active diversion requiring physical exertion and competition)
What we see here is a list of relationships between words. The term fishing relates to outdoor sport, which relates to sport.
Now, if you get the drift - it is possible to use this relationship to compute a probability of classifying "fishing" to "sport" - say, based on the linear distance of the word-chain, or number of occurrences, et al. (should be trivial to find resources on how to construct similarity measures using wordnet. when the prof says "not to use google", I assume he means programatically and not as a means to get information to read up on!)
As for C# with wordnet - how about http://opensource.ebswift.com/WordNet.Net/