ReverseString, a C# interview question

予麋鹿 2021-01-31 06:15

I had an interview question that asked me for my 'feedback' on a piece of code a junior programmer wrote. They hinted there may be a problem and said it will be used heavily o

12 answers
  •  无人及你
    2021-01-31 07:06

    Necromancing.
    As a public service, this is how you actually CORRECTLY reverse a string
    (reversing a string is NOT the same as reversing a sequence of chars):

    public static class Test
    {
    
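        // Splits s into text elements (grapheme clusters) as defined by .NET's StringInfo.
        private static System.Collections.Generic.List<string> GraphemeClusters(string s)
        {
            System.Collections.Generic.List<string> ls = new System.Collections.Generic.List<string>();
    
            System.Globalization.TextElementEnumerator enumerator = System.Globalization.StringInfo.GetTextElementEnumerator(s);
            while (enumerator.MoveNext())
            {
                // Current is typed as object; each element is a string holding one cluster
                ls.Add((string)enumerator.Current);
            }
    
            return ls;
        }
    
    
        // Reverses the order of the grapheme clusters while keeping each cluster's
        // code points (e.g. a base letter and its combining accent) in their original order.
        private static string ReverseGraphemeClusters(string s)
        {
            // Null, empty, and single-char strings are their own reverse.
            if (string.IsNullOrEmpty(s) || s.Length == 1)
                return s;
    
            System.Collections.Generic.List<string> ls = GraphemeClusters(s);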
            ls.Reverse();
    
            return string.Join("", ls.ToArray());
        }
    
        public static void TestMe()
        {
            string s = "Les Mise\u0301rables";
            // s = "noël";
            string r = ReverseGraphemeClusters(s);
    
            // This would be wrong:
            // char[] a = s.ToCharArray();
            // System.Array.Reverse(a);
            // string r = new string(a);
    
            System.Console.WriteLine(r);
        }
    }
    

    See: https://vimeo.com/7403673

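    By the way, in Golang you can approximate this by grouping each base character with the combining marks (Unicode category M) that follow it. Two ways to do that: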

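    package main
    
    import (
        "regexp"
        "strings"
        "unicode"
    )
    
    func main() {
        str := "\u0308" + "a\u0308" + "o\u0308" + "u\u0308"
        want := "u\u0308" + "o\u0308" + "a\u0308" + "\u0308"
        println(want == ReverseGrapheme(str))  // true
        println(want == ReverseGrapheme2(str)) // true
    }
    
    // ReverseGrapheme groups each base character with the combining marks
    // (Unicode category M) that follow it and emits the groups in reverse order.
    // Marks that precede the first base character are reversed one by one.
    func ReverseGrapheme(str string) string {
        var buf []rune // the cluster currently being collected
        seenBase := false
        ret := ""
    
        for _, c := range str {
            if !unicode.Is(unicode.M, c) {
                // A new base character closes the previous cluster.
                if len(buf) > 0 {
                    ret = string(buf) + ret
                }
                buf = append(buf[:0], c)
                seenBase = true
            } else if !seenBase {
                // Leading mark with no base character to attach to.
                ret = string(c) + ret
            } else {
                buf = append(buf, c)
            }
        }
    
        return string(buf) + ret
    }
    
    // markCluster matches one non-mark code point followed by any marks,
    // or any single code point as a fallback (e.g. a leading mark).
    var markCluster = regexp.MustCompile(`\PM\pM*|.`)
    
    // ReverseGrapheme2 splits str into clusters with a regular expression
    // (compiled once above) and writes them back in reverse order.
    func ReverseGrapheme2(str string) string {
        clusters := markCluster.FindAllString(str, -1)
        var sb strings.Builder
        sb.Grow(len(str))
        for i := len(clusters) - 1; i >= 0; i-- {
            sb.WriteString(clusters[i])
        }
        return sb.String()
    }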
    

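    And this is the incorrect way (the Go equivalent of ToCharArray + Array.Reverse), which reverses code points and therefore moves combining marks onto the wrong base character: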

    func Reverse(s string) string {
        runes := []rune(s)
        for i, j := 0, len(runes)-1; i < j; i, j = i+1, j-1 {
            runes[i], runes[j] = runes[j], runes[i]
        }
        return string(runes)
    }
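    To make the difference concrete, here is a small sketch that runs both approaches on the "Les Misérables" string from the C# example. The helper names `reverseRunes` and `reverseClusters` are made up for this illustration, and the cluster regex is the same combining-mark approximation as ReverseGrapheme2, not full Unicode grapheme segmentation (UAX #29).

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// cluster approximates a grapheme: one non-mark code point followed by any
// combining marks (category M), or any single code point as a fallback.
var cluster = regexp.MustCompile(`\PM\pM*|.`)

// reverseRunes reverses code points, the Go analogue of ToCharArray + Array.Reverse.
func reverseRunes(s string) string {
	r := []rune(s)
	for i, j := 0, len(r)-1; i < j; i, j = i+1, j-1 {
		r[i], r[j] = r[j], r[i]
	}
	return string(r)
}

// reverseClusters reverses whole base-plus-marks groups instead.
func reverseClusters(s string) string {
	parts := cluster.FindAllString(s, -1)
	var sb strings.Builder
	sb.Grow(len(s))
	for i := len(parts) - 1; i >= 0; i-- {
		sb.WriteString(parts[i])
	}
	return sb.String()
}

func main() {
	s := "Les Mise\u0301rables"
	// %+q escapes non-ASCII, so the position of U+0301 is visible.
	fmt.Printf("%+q\n", reverseRunes(s))    // "selbar\u0301esiM seL" (accent now on the r)
	fmt.Printf("%+q\n", reverseClusters(s)) // "selbare\u0301siM seL" (accent still on the e)
}
```

    The naive version puts U+0301 right after the "r", so it renders as "ŕ"; the cluster version keeps the accent with its "e".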
    

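    Note that you need to know the difference between
    - a character and a glyph
    - a byte (8 bit) and a code point/rune (a Go rune is a 32-bit integer)
    - a UTF-16 code unit (a 16-bit C# char) and a code point: code points above U+FFFF take two chars (a surrogate pair)
    - a code point and a grapheme cluster (one or more code points, often just called a grapheme; not the same thing as a glyph)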
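    A quick way to see the byte / code point / cluster distinction in Go is to count all three for "noël" written with a decomposed ë. `countMarkClusters` is a hypothetical helper that, like ReverseGrapheme above, treats a base character plus its trailing combining marks as one unit; real grapheme segmentation (Unicode UAX #29) also covers emoji sequences, Hangul syllables and more, so this is only an approximation.

```go
package main

import (
	"fmt"
	"unicode"
	"unicode/utf8"
)

// countMarkClusters counts base characters, folding each combining mark
// (Unicode category M) into the preceding character. A mark with no base
// before it is counted on its own, matching ReverseGrapheme's behavior.
func countMarkClusters(s string) int {
	n := 0
	seenBase := false
	for _, r := range s {
		if !unicode.Is(unicode.M, r) {
			n++
			seenBase = true
		} else if !seenBase {
			n++ // leading mark: nothing to attach it to
		}
	}
	return n
}

func main() {
	s := "noe\u0308l" // "noël" with e + U+0308 COMBINING DIAERESIS
	fmt.Println(len(s))                    // 6 bytes (U+0308 takes 2 bytes in UTF-8)
	fmt.Println(utf8.RuneCountInString(s)) // 5 code points
	fmt.Println(countMarkClusters(s))      // 4 user-perceived characters
}
```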

    Reference:

    Character is an overloaded term that can mean many things.

    A code point is the atomic unit of information. Text is a sequence of code points. Each code point is a number which is given meaning by the Unicode standard.

    A grapheme is a sequence of one or more code points that are displayed as a single, graphical unit that a reader recognizes as a single element of the writing system. For example, both a and ä are graphemes, but they may consist of multiple code points (e.g. ä may be two code points, one for the base character a followed by one for the diaeresis; but there's also an alternative, legacy, single code point representing this grapheme). Some code points are never part of any grapheme (e.g. the zero-width non-joiner, or directional overrides).

    A glyph is an image, usually stored in a font (which is a collection of glyphs), used to represent graphemes or parts thereof. Fonts may compose multiple glyphs into a single representation, for example, if the above ä is a single code point, a font may choose to render that as two separate, spatially overlaid glyphs. For OTF, the font's GSUB and GPOS tables contain substitution and positioning information to make this work. A font may contain multiple alternative glyphs for the same grapheme, too.
