Comparing list of strings with an available dictionary/thesaurus

[愿得一人] 2021-01-19 16:21

I have a program (C#) that generates a list of strings (permutations of an original string). Most of the strings are random groupings of the original letters, as expected (ie

2 answers
  • 2021-01-19 17:08

    You could download a list of words from the web (say, one of the files mentioned here: http://www.outpost9.com/files/WordLists.html), then do a quick:

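    // Requires: using System; using System.Collections.Generic;

    // Read words from file (ReadFromFile is a placeholder for your own loader).
    string[] words = ReadFromFile();

    // Key: a word's letters, lower-cased and sorted. Value: every word made of those letters.
    Dictionary<string, List<string>> permuteDict = new Dictionary<string, List<string>>(StringComparer.OrdinalIgnoreCase);

    foreach (string word in words) {
        string sortedWord = SortLetters(word);
        if (!permuteDict.ContainsKey(sortedWord)) {
            permuteDict[sortedWord] = new List<string>();
        }
        permuteDict[sortedWord].Add(word);
    }

    // To do a lookup you can just use

    string sortedWordToLook = SortLetters(wordToLook);

    List<string> outWords;
    if (permuteDict.TryGetValue(sortedWordToLook, out outWords)) {
        foreach (string outWord in outWords) {
            Console.WriteLine(outWord);
        }
    }

    // Helper (put it in the same class). Array.Sort sorts in place and returns void,
    // so it cannot be chained. Lower-casing first makes "God" and "dog" share a key.
    static string SortLetters(string word) {
        char[] letters = word.ToLowerInvariant().ToCharArray();
        Array.Sort(letters);
        return new string(letters);
    }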
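
    If the permutations have already been generated, as in the question, a plain set-membership check is another option (this is a different technique from the sorted-letter index above, not a replacement for it). A minimal sketch, assuming the word list is already in memory; `WordFilter`, `FilterKnownWords` and `knownWords` are hypothetical names:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WordFilter
{
    // Keeps only the candidates that appear in the known-word list.
    // knownWords stands in for whatever word list you loaded (hypothetical input).
    public static List<string> FilterKnownWords(IEnumerable<string> candidates, IEnumerable<string> knownWords)
    {
        // OrdinalIgnoreCase so that "Dog" in the word list still matches "dog".
        HashSet<string> lookup = new HashSet<string>(knownWords, StringComparer.OrdinalIgnoreCase);

        // Distinct drops the duplicates that permutations of repeated letters produce.
        return candidates.Where(c => lookup.Contains(c)).Distinct().ToList();
    }
}
```

    Building the HashSet costs one pass over the word list; each permutation is then checked in roughly constant time.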
    
  • 2021-01-19 17:18

    You can also use Wiktionary. Wiktionary runs on MediaWiki, whose API lets you query for a list of article titles. On Wiktionary, article titles are (among other things) the dictionary's word entries. The only catch is that foreign words are also in the dictionary, so you might sometimes get "incorrect" matches. Your users will also need internet access, of course. You can get help and info on the API at: http://en.wiktionary.org/w/api.php

    Here's an example of your query URL:

    http://en.wiktionary.org/w/api.php?action=query&format=xml&titles=dog|god|ogd|odg|gdo
    

    This returns the following XML:

    <?xml version="1.0"?>
    <api>
      <query>
        <pages>
          <page ns="0" title="ogd" missing=""/>
          <page ns="0" title="odg" missing=""/>
          <page ns="0" title="gdo" missing=""/>
          <page pageid="24" ns="0" title="dog"/>
          <page pageid="5015" ns="0" title="god"/>
        </pages>
      </query>
    </api>
    

    In C#, you can then use System.Xml.XPath to pick out the parts you need: the page elements that have a pageid attribute. Those are the "real words".

    I wrote an implementation and tested it (using the simple "dog" example from above). It returned just "dog" and "god". You should test it more extensively.

    // Requires: using System; using System.Collections.Generic; using System.IO;
    //           using System.Linq; using System.Net; using System.Text; using System.Xml.XPath;
    public static IEnumerable<string> FilterRealWords(IEnumerable<string> testWords)
    {
        string baseUrl = "http://en.wiktionary.org/w/api.php?action=query&format=xml&titles=";
        // Escape each word so characters such as '&' or spaces cannot break the query string.
        string queryUrl = baseUrl + string.Join("|", testWords.Select(w => Uri.EscapeDataString(w)).ToArray());

        string rawXml;
        using (WebClient client = new WebClient())
        {
            client.Encoding = Encoding.UTF8; // this is very important or the text will be junk
            rawXml = client.DownloadString(queryUrl);
        }

        TextReader reader = new StringReader(rawXml);
        XPathDocument doc = new XPathDocument(reader);
        XPathNavigator nav = doc.CreateNavigator();
        XPathNodeIterator iter = nav.Select(@"//page");

        List<string> realWords = new List<string>();
        while (iter.MoveNext())
        {
            // if the pageid attribute has a value
            // add the article title to the list.
            if (!string.IsNullOrEmpty(iter.Current.GetAttribute("pageid", "")))
            {
                realWords.Add(iter.Current.GetAttribute("title", ""));
            }
        }

        return realWords;
    }
    

    Call it like this:

    IEnumerable<string> input = new string[] { "dog", "god", "ogd", "odg", "gdo" };
    IEnumerable<string> output = FilterRealWords(input);
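
    One thing to watch for with longer words: the MediaWiki API caps the number of titles per request (50 for ordinary clients, as far as I know), and the number of permutations grows quickly. A rough sketch of splitting the input into chunks before querying; `Batching` and `InBatches` are hypothetical helper names, and the batch size of 50 is an assumption you should check against the API help page:

```csharp
using System;
using System.Collections.Generic;

public static class Batching
{
    // Splits a sequence into consecutive lists of at most batchSize items.
    public static IEnumerable<List<string>> InBatches(IEnumerable<string> items, int batchSize)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));

        List<string> batch = new List<string>(batchSize);
        foreach (string item in items)
        {
            batch.Add(item);
            if (batch.Count == batchSize)
            {
                yield return batch;
                batch = new List<string>(batchSize);
            }
        }

        // Emit the final, partially filled batch if there is one.
        if (batch.Count > 0) yield return batch;
    }
}
```

    You would then call FilterRealWords once per batch and concatenate the results, ideally with a short pause between requests to be polite to the server.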
    

    I tried using LINQ to XML, but I'm not that familiar with it, so it was a pain and I gave up on it.
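
    For anyone who does want the LINQ to XML route, the parsing step might look roughly like this. It is only a sketch of the XML handling (no download), assuming the response has the shape shown above; `WiktionaryXml` and `ExistingTitles` are hypothetical names:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class WiktionaryXml
{
    // Returns the titles of the <page> elements that carry a pageid attribute,
    // i.e. the pages that exist. rawXml is the response body as a string.
    public static List<string> ExistingTitles(string rawXml)
    {
        XDocument doc = XDocument.Parse(rawXml);
        return doc.Descendants("page")
                  .Where(p => p.Attribute("pageid") != null)
                  .Select(p => (string)p.Attribute("title"))
                  .ToList();
    }
}
```

    This would replace the XPathDocument/XPathNavigator loop in FilterRealWords; the download part stays the same.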
