Categorizing Words and Category Values

温柔的废话 2021-01-31 05:49

We were set an algorithm problem in class today, as an "if you figure out a solution, you don't have to do this subject" challenge. So of course, we all thought we would give it a go.

21 answers
  • 2021-01-31 06:40

    My attempt would be to use the CRM114 toolset to analyze a big corpus of text. You can then use the matches it produces to make a guess.

  • 2021-01-31 06:42

    Since solving this 'riddle' gets you out of the subject, I don't think it's supposed to be easy. Nevertheless, I would do something like this (described very simplistically):

    Build a neural network and give it some input (an e-book, or several e-books), so no Google is needed.

    This network classifies words (neural networks are great for 'unsure' classification). I think you can tell which word belongs to which category from the occurrences in the text ('fishing' is likely to be mentioned near 'sports'). After some training, the neural network should "link" the words to the categories.
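
    The co-occurrence signal mentioned above can be sketched without any network at all. The snippet below is not a neural network; it is a plain windowed co-occurrence count, shown only to illustrate the kind of evidence such a network would be trained on. The corpus, the function names and the window size are all made up for the example.

```python
import re

# Tiny hand-written corpus, invented for illustration (stands in for the e-books).
TOY_CORPUS = (
    "Fishing is a popular outdoor sports activity. "
    "Many sports clubs organise fishing trips. "
    "Good food matters too, and bread is a staple food. "
    "Fresh bread is the food I like best."
)

def cooccurrence_scores(text, word, categories, window=5):
    """Count how often each category name occurs within `window` tokens of `word`."""
    tokens = re.findall(r"[a-z]+", text.lower())
    scores = {category: 0 for category in categories}
    for i, token in enumerate(tokens):
        if token != word:
            continue
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for neighbour in tokens[lo:hi]:
            if neighbour in scores:
                scores[neighbour] += 1
    return scores

def classify(text, word, categories, window=5):
    """Return the category that co-occurs most often with `word`, or None if none does."""
    scores = cooccurrence_scores(text, word, categories, window)
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

if __name__ == "__main__":
    for w in ("fishing", "bread", "chess"):
        print(w, "->", classify(TOY_CORPUS, w, ["sports", "food"]))
```

    With a real corpus the counts would probably need normalising (common category words would otherwise win everywhere), and a trained model could replace the raw counts.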

  • 2021-01-31 06:43

    Interesting problem. What you're looking at is word classification. You could learn and use traditional information retrieval methods like LSA, and categorization based on them, but I'm not sure that is your intent. If it is, then do so by all means! :)

    Since you say you can use external data, I would suggest using WordNet and its links between words. For instance, using WordNet:

    # S: (n) **fishing**, sportfishing (the act of someone who fishes as a diversion)
    * direct hypernym / inherited hypernym / sister term
          o S: (n) **outdoor sport, field sport** (a sport that is played outdoors)
          + direct hypernym / inherited hypernym / sister term
                # S: (n) **sport**, athletics 
                (an active diversion requiring physical exertion and competition) 
    

    What we see here is a list of relationships between words. The term fishing relates to outdoor sport, which relates to sport.

    Now, if you get the drift, it is possible to use this relationship to compute a probability of classifying "fishing" as "sport", say, based on the linear distance along the word chain, the number of occurrences, and so on. (It should be trivial to find resources on how to construct similarity measures using WordNet. When the prof says "not to use Google", I assume he means programmatically, not as a means of finding information to read up on!)
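
    As a rough sketch of the chain-distance idea, the snippet below walks a hand-written hypernym graph and turns the number of links into a score. It does not call WordNet: the first two graph entries mirror the excerpt above, the others are invented, and the 1 / (1 + distance) formula is just one simple choice of similarity measure, not a calibrated probability.

```python
from collections import deque

# Hand-written toy graph. "fishing" and "outdoor sport" follow the WordNet
# excerpt above; the remaining entries are invented for illustration only.
HYPERNYMS = {
    "fishing": ["outdoor sport"],
    "outdoor sport": ["sport"],
    "baking": ["cooking"],
    "cooking": ["food preparation"],
}

def hypernym_distance(word, category, hypernyms=HYPERNYMS):
    """Number of hypernym links from `word` up to `category`, or None if unreachable."""
    queue = deque([(word, 0)])
    seen = {word}
    while queue:
        term, dist = queue.popleft()
        if term == category:
            return dist
        for parent in hypernyms.get(term, []):
            if parent not in seen:
                seen.add(parent)
                queue.append((parent, dist + 1))
    return None

def similarity(word, category, hypernyms=HYPERNYMS):
    """Score in (0, 1]: shorter chains score higher; 0.0 when there is no chain."""
    dist = hypernym_distance(word, category, hypernyms)
    return 0.0 if dist is None else 1.0 / (1 + dist)

def best_category(word, categories, hypernyms=HYPERNYMS):
    """Pick the category with the highest similarity, or None if nothing is linked."""
    score, category = max((similarity(word, c, hypernyms), c) for c in categories)
    return category if score > 0 else None

if __name__ == "__main__":
    print(hypernym_distance("fishing", "sport"))
    print(best_category("fishing", ["sport", "food preparation"]))
```

    Swapping the toy dictionary for real WordNet lookups would leave the search and scoring functions unchanged.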

    As for C# with WordNet, how about http://opensource.ebswift.com/WordNet.Net/

  • 2021-01-31 06:47

    You might be able to use the WordNet database: create some metric to determine how closely linked two words (the word and the category) are, and then choose the best category to put the word in.

  • 2021-01-31 06:51

    Use WordNet (either online or downloaded), and count the number of relationships you have to follow between each word and each category.

  • 2021-01-31 06:52

    Scrape delicious.com and search for each word, looking at collective tag counts, etc.

    Not much more I can say about that, but delicious is old, huge, incredibly heavily tagged, and contains a wealth of current, relevant semantic information to draw from. It would be very easy to build a semantics database this way, using your word list as the basis for scraping.

    The knowledge is in the tags.
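
    To make the tag-count idea concrete, here is a minimal sketch of the scoring step only. It does no scraping and assumes nothing about how delicious exposes its data: the tag counts are invented numbers standing in for whatever a scrape would return, and the function names are hypothetical.

```python
# Invented tag counts per word, standing in for scraped results.
TOY_TAG_COUNTS = {
    "fishing": {"sport": 120, "outdoors": 85, "travel": 30},
    "bread": {"food": 200, "recipes": 150, "sport": 2},
}

def tag_scores(word, categories, tag_counts):
    """Fraction of a word's total tag count that falls on each category name."""
    counts = tag_counts.get(word, {})
    total = sum(counts.values())
    if total == 0:
        return {category: 0.0 for category in categories}
    return {category: counts.get(category, 0) / total for category in categories}

def classify_by_tags(word, categories, tag_counts):
    """Return the category with the largest share of tags, or None if no tag matches."""
    scores = tag_scores(word, categories, tag_counts)
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

if __name__ == "__main__":
    for w in ("fishing", "bread", "chess"):
        print(w, "->", classify_by_tags(w, ["sport", "food"], TOY_TAG_COUNTS))
```

    Matching category names against tags exactly is the crudest option; in practice related tags (such as "outdoors" for "sport") would likely need to be folded in as well.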
