Programmatically determine whether to describe an object with “a” or “an”?

耶瑟儿~ 2020-12-01 21:13

I have a database of nouns (e.g. "house", "exclamation point", "apple") that I need to output and describe in my application. It's hard to put together a natural-soundi

8 Answers
  • 2020-12-01 21:45

    Make an array with the vowels in it, then check whether the first letter of the word is in that array. This works for most words, but not for acronyms or for words whose first letter does not match their first sound (for example "hour" or "university").
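
    A minimal sketch of this check in JavaScript, assuming the input is a single word. The names VOWELS and naiveArticle are illustrative only; the last two calls show inputs where the letter-based rule picks the wrong article.

```javascript
// Naive rule: 'an' before a vowel letter, 'a' before anything else.
var VOWELS = ["a", "e", "i", "o", "u"];

function naiveArticle(word) {
    var first = word.charAt(0).toLowerCase();
    return VOWELS.indexOf(first) >= 0 ? "an" : "a";
}

console.log(naiveArticle("apple"));  // an
console.log(naiveArticle("house"));  // a
console.log(naiveArticle("hour"));   // a  (wrong: "an hour")
console.log(naiveArticle("FBI"));    // a  (wrong: "an FBI agent")
```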

  • 2020-12-01 21:46

    I've written a PHP port of the popular JS a-vs-an code, as described in this Stack Overflow post: https://stackoverflow.com/a/1288473/1526020.

    GitHub page: https://github.com/UseAllFive/a-vs-an.

    E.g.

    $result = $aVsAn->query('0800 number');
    print_r($result);
    

    Returns

    Array
    (
        [aCount] => 8
        [anCount] => 25
        [prefix] => 08
        [article] => an
    )
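
    The field names in this output suggest a lookup on the leading characters of the phrase, with counts of how often "a" and "an" were seen before that prefix. The following is only a sketch of that idea in JavaScript, not the library's actual code: PREFIX_COUNTS and articleFromCounts are hypothetical names, and the table holds just the one entry shown in the output above.

```javascript
// Hypothetical prefix table; the single entry mirrors the output shown above.
var PREFIX_COUNTS = {
    "08": { aCount: 8, anCount: 25 }
};

function articleFromCounts(phrase) {
    // Try the longest prefix first and shorten it until the table has an entry.
    for (var len = phrase.length; len > 0; len--) {
        var prefix = phrase.substring(0, len);
        if (Object.prototype.hasOwnProperty.call(PREFIX_COUNTS, prefix)) {
            var counts = PREFIX_COUNTS[prefix];
            return {
                aCount: counts.aCount,
                anCount: counts.anCount,
                prefix: prefix,
                // Pick whichever article was counted more often for this prefix.
                article: counts.anCount > counts.aCount ? "an" : "a"
            };
        }
    }
    return null; // no data for this phrase
}

console.log(articleFromCounts("0800 number").article); // an
```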
    
  • 2020-12-01 21:52

    I needed this for a C# project so here's the C# port of the Python code mentioned above. Make sure to include using System.Text.RegularExpressions; in your source file.

    private string GetIndefiniteArticle(string noun_phrase)
    {
        string word = null;
        var m = Regex.Match(noun_phrase, @"\w+");
        if (m.Success)
            word = m.Groups[0].Value;
        else
            return "an";
    
        var wordi = word.ToLower();
        foreach (string anword in new string[] { "euler", "heir", "honest", "hono" })
            if (wordi.StartsWith(anword))
                return "an";
    
        if (wordi.StartsWith("hour") && !wordi.StartsWith("houri"))
            return "an";
    
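        // Letters whose spoken names start with a vowel sound ("ef", "aitch", ...); 'd' ("dee") is not one of them
        var char_list = new char[] { 'a', 'e', 'f', 'h', 'i', 'l', 'm', 'n', 'o', 'r', 's', 'x' };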
        if (wordi.Length == 1)
        {
            if (wordi.IndexOfAny(char_list) == 0)
                return "an";
            else
                return "a";
        }
    
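        // Anchored with ^ so that only the start of the word is tested (Regex.Match otherwise searches the whole string)
        if (Regex.Match(word, "^(?!FJO|[HLMNS]Y.|RY[EO]|SQU|(F[LR]?|[HL]|MN?|N|RH?|S[CHKLMNPTVW]?|X(YL)?)[AEIOU])[FHLMNRSX][A-Z]").Success)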
            return "an";
    
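        // The pattern containing \b needs a verbatim string; in a regular string "\b" is a backspace character
        foreach (string regex in new string[] { "^e[uw]", @"^onc?e\b", "^uni([^nmd]|mo)", "^u[bcfhjkqrst][aeiou]" })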
        {
            if (Regex.IsMatch(wordi, regex))
                return "a";
        }
    
        if (Regex.IsMatch(word, "^U[NK][AIEO]"))
            return "a";
        else if (word == word.ToUpper())
        {
            if (wordi.IndexOfAny(char_list) == 0)
                return "an";
            else
                return "a";
        }
    
        if (wordi.IndexOfAny(new char[] { 'a', 'e', 'i', 'o', 'u' }) == 0)
            return "an";
    
        if (Regex.IsMatch(wordi, "^y(b[lor]|cl[ea]|fere|gg|p[ios]|rou|tt)"))
            return "an";
    
        return "a";
    }
    
  • 2020-12-01 21:53

    It should be pretty easy to write from scratch, tbh. If a word starts with a vowel, it gets an 'an'; if it begins with a consonant, it gets an 'a'. Programmatically it's easy to do, and if you have any edge cases (for example, you might use the BBC English-style 'an historic occasion') you can handle them individually.

    Kind of like using an inflector, only with the 'a'/'an' grammar rule instead of plurals. Look into how CakePHP or Rails handle inflection for a more thorough discussion of the concept, including how to handle edge cases - you don't want to inflect 'deer' as 'deers' in the plural, for example, or 'goose' as 'gooses', so they need to be handled individually, just like your own edge cases like 'universe' or aspirated/non-aspirated 'H's.
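
    A rough JavaScript sketch of that exceptions-first idea: look the word up in lists of known edge cases, and only fall back to the first-letter rule when it is not listed. The list contents and the names AN_EXCEPTIONS, A_EXCEPTIONS and articleWithExceptions are assumptions for illustration; a real list would be filled with whatever edge cases show up in your own data.

```javascript
// Hypothetical exception lists; extend them with the edge cases in your data.
var AN_EXCEPTIONS = ["hour", "honest", "heir"];     // silent 'h'
var A_EXCEPTIONS = ["universe", "unicorn", "one"];  // vowel letter, consonant sound

function articleWithExceptions(word) {
    var lower = word.toLowerCase();

    // Edge cases are handled individually, before the general rule.
    if (AN_EXCEPTIONS.indexOf(lower) >= 0) return "an";
    if (A_EXCEPTIONS.indexOf(lower) >= 0) return "a";

    // General rule: vowel letter takes 'an', anything else takes 'a'.
    var first = lower.charAt(0);
    return first !== "" && "aeiou".indexOf(first) >= 0 ? "an" : "a";
}

console.log(articleWithExceptions("apple"));    // an
console.log(articleWithExceptions("hour"));     // an
console.log(articleWithExceptions("universe")); // a
console.log(articleWithExceptions("house"));    // a
```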

  • 2020-12-01 21:55

    I was looking for just such a solution, so thanks marcog. Here's an attempt to port your friend's Python version to PHP (I don't know Python or Perl, so there are probably some mistakes):

    function indefinite_article($word) {
        // Lowercase version of the word
        $word_lower = strtolower($word);

        // Nothing to inspect: return "an", as the C# and JS ports do when no word is found
        if($word_lower === '') return "an";

        // An 'an' word (specific start of words that should be preceded by 'an')
        $an_words = array('euler', 'heir', 'honest', 'hono');
        foreach($an_words as $an_word) {
            if(substr($word_lower,0,strlen($an_word)) == $an_word) return "an";
        }
        if(substr($word_lower,0,4) == "hour" and substr($word_lower,0,5) != "houri") return "an";

        // An 'an' letter (single letter word which should be preceded by 'an')
        $an_letters = array('a','e','f','h','i','l','m','n','o','r','s','x');
        if(strlen($word) == 1) {
            if(in_array($word_lower,$an_letters)) return "an";
            else return "a";
        }

        // Capital words which should likely be preceded by 'an'
        // (anchored with ^ so that only the start of the word is tested)
        if(preg_match('/^(?!FJO|[HLMNS]Y.|RY[EO]|SQU|(F[LR]?|[HL]|MN?|N|RH?|S[CHKLMNPTVW]?|X(YL)?)[AEIOU])[FHLMNRSX][A-Z]/', $word)) return "an";

        // Special cases where a word that begins with a vowel should be preceded by 'a'
        $regex_array = array('^e[uw]','^onc?e\b','^uni([^nmd]|mo)','^u[bcfhjkqrst][aeiou]');
        foreach($regex_array as $regex) {
            if(preg_match('/'.$regex.'/',$word_lower)) return "a";
        }

        // Special capital words
        if(preg_match('/^U[NK][AIEO]/',$word)) return "a";
        // Not sure what this does
        else if($word == strtoupper($word)) {
            // Same letter list as the single-letter check ('f', not 'd')
            if(in_array($word_lower[0],$an_letters)) return "an";
            else return "a";
        }

        // Basic method of words that begin with a vowel being preceded by 'an'
        $vowels = array('a','e','i','o','u');
        if(in_array($word_lower[0],$vowels)) return "an";

        // Instances where y followed by specific letters is preceded by 'an'
        if(preg_match('/^y(b[lor]|cl[ea]|fere|gg|p[ios]|rou|tt)/', $word_lower)) return "an";

        // Default to 'a'
        return "a";
    }
    

    There's one bit (below the comment "// Not sure what this does") that I was unsure of what it did. If anyone can figure it out, I'd be happy to know.
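
    For what it's worth, that branch appears to treat a word written entirely in capitals as an initialism that is read out letter by letter, so the article depends on how the first letter's name sounds ("ef", "em", "ess" start with a vowel sound; "bee", "cee", "dee" do not). A small JavaScript sketch of that reading follows; it uses the letter set from the single-letter check in the code above, and the names VOWEL_SOUND_LETTERS and articleForInitialism are illustrative only.

```javascript
// Letters whose spoken names begin with a vowel sound ("ef", "aitch", "el", ...).
var VOWEL_SOUND_LETTERS = "aefhilmnorsx";

function articleForInitialism(word) {
    // Only meaningful for words written entirely in capitals, e.g. "FBI" or "CIA".
    var first = word.charAt(0).toLowerCase();
    return first !== "" && VOWEL_SOUND_LETTERS.indexOf(first) >= 0 ? "an" : "a";
}

console.log(articleForInitialism("FBI")); // an
console.log(articleForInitialism("CIA")); // a
console.log(articleForInitialism("MRI")); // an
```

    Note that this reading misfires for capitalised acronyms that are pronounced as words: "NASA" would get "an" under this rule, although people say "a NASA mission".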

  • 2020-12-01 21:56

    I was also looking for such a solution, but in JavaScript, so I ported it over to JS. You can check out the actual project on GitHub: https://github.com/rigoneri/indefinite-article.js

    Here is the code snippet:

    function indefinite_article(phrase) {

        // Getting the first word
        var match = /\w+/.exec(phrase);
        if (!match)
            return "an";
        var word = match[0];

        var l_word = word.toLowerCase();
        // Specific start of words that should be preceded by 'an'
        var alt_cases = ["euler", "heir", "honest", "hono"];
        for (var i = 0; i < alt_cases.length; i++) {
            if (l_word.indexOf(alt_cases[i]) == 0)
                return "an";
        }
        // Words starting with "hour" take 'an', except "houri"
        if (l_word.indexOf("hour") == 0 && l_word.indexOf("houri") != 0)
            return "an";

        // Single letter word which should be preceded by 'an'
        if (l_word.length == 1) {
            if ("aefhilmnorsx".indexOf(l_word) >= 0)
                return "an";
            else
                return "a";
        }

        // Capital words which should likely be preceded by 'an'
        // (anchored with ^ so that only the start of the word is tested)
        if (word.match(/^(?!FJO|[HLMNS]Y.|RY[EO]|SQU|(F[LR]?|[HL]|MN?|N|RH?|S[CHKLMNPTVW]?|X(YL)?)[AEIOU])[FHLMNRSX][A-Z]/)) {
            return "an";
        }

        // Special cases where a word that begins with a vowel should be preceded by 'a'
        var regexes = [/^e[uw]/, /^onc?e\b/, /^uni([^nmd]|mo)/, /^u[bcfhjkqrst][aeiou]/];
        for (var j = 0; j < regexes.length; j++) {
            if (l_word.match(regexes[j]))
                return "a";
        }

        // Special capital words (UK, UN)
        if (word.match(/^U[NK][AIEO]/)) {
            return "a";
        }
        else if (word == word.toUpperCase()) {
            // All-caps word: decide by the sound of the first letter's name
            if ("aefhilmnorsx".indexOf(l_word[0]) >= 0)
                return "an";
            else
                return "a";
        }

        // Basic method of words that begin with a vowel being preceded by 'an'
        if ("aeiou".indexOf(l_word[0]) >= 0)
            return "an";

        // Instances where y followed by specific letters is preceded by 'an'
        if (l_word.match(/^y(b[lor]|cl[ea]|fere|gg|p[ios]|rou|tt)/))
            return "an";

        return "a";
    }
    