ReverseString, a C# interview-question

前端未结

关注

 12  1871

予麋鹿 2021-01-31 06:15

I had an interview question that asked me for my \'feedback\' on a piece of code a junior programmer wrote. They hinted there may be a problem and said it will be used heavily o

12条回答

无人及你 (楼主)

2021-01-31 07:06

Necromancing.
As a public service, this is how you actually CORRECTLY reverse a string
(reversing a string is NOT equal to reversing a sequence of chars)

public static class Test
{

    private static System.Collections.Generic.List GraphemeClusters(string s)
    {
        System.Collections.Generic.List ls = new System.Collections.Generic.List();

        System.Globalization.TextElementEnumerator enumerator = System.Globalization.StringInfo.GetTextElementEnumerator(s);
        while (enumerator.MoveNext())
        {
            ls.Add((string)enumerator.Current);
        }

        return ls;
    }


    // this 
    private static string ReverseGraphemeClusters(string s)
    {
        if(string.IsNullOrEmpty(s) || s.Length == 1)
             return s;

        System.Collections.Generic.List ls = GraphemeClusters(s);
        ls.Reverse();

        return string.Join("", ls.ToArray());
    }

    public static void TestMe()
    {
        string s = "Les Mise\u0301rables";
        // s = "noël";
        string r = ReverseGraphemeClusters(s);

        // This would be wrong:
        // char[] a = s.ToCharArray();
        // System.Array.Reverse(a);
        // string r = new string(a);

        System.Console.WriteLine(r);
    }
}

See: https://vimeo.com/7403673

By the way, in Golang, the correct way is this:

package main

import (
  "unicode"
  "regexp"
)

func main() {
    str := "\u0308" + "a\u0308" + "o\u0308" + "u\u0308"
    println("u\u0308" + "o\u0308" + "a\u0308" + "\u0308" == ReverseGrapheme(str))
    println("u\u0308" + "o\u0308" + "a\u0308" + "\u0308" == ReverseGrapheme2(str))
}

func ReverseGrapheme(str string) string {

  buf := []rune("")
  checked := false
  index := 0
  ret := "" 

    for _, c := range str {

        if !unicode.Is(unicode.M, c) {

            if len(buf) > 0 {
                ret = string(buf) + ret
            }

            buf = buf[:0]
            buf = append(buf, c)

            if checked == false {
                checked = true
            }

        } else if checked == false {
            ret = string(append([]rune(""), c)) + ret
        } else {
            buf = append(buf, c)
        }

        index += 1
    }

    return string(buf) + ret
}

func ReverseGrapheme2(str string) string {
    re := regexp.MustCompile("\\PM\\pM*|.")
    slice := re.FindAllString(str, -1)
    length := len(slice)
    ret := ""

    for i := 0; i < length; i += 1 {
        ret += slice[length-1-i]
    }

    return ret
}

And the incorrect way is this (ToCharArray.Reverse):

func Reverse(s string) string {
    runes := []rune(s)
    for i, j := 0, len(runes)-1; i < j; i, j = i+1, j-1 {
        runes[i], runes[j] = runes[j], runes[i]
    }
    return string(runes)
}

Note that you need to know the difference between
- a character and a glyph
- a byte (8 bit) and a codepoint/rune (32 bit)
- a codepoint and a GraphemeCluster [32+ bit] (aka Grapheme/Glyph)

Reference:

Character is an overloaded term than can mean many things.

A code point is the atomic unit of information. Text is a sequence of code points. Each code point is a number which is given meaning by the Unicode standard.

A grapheme is a sequence of one or more code points that are displayed as a single, graphical unit that a reader recognizes as a single element of the writing system. For example, both a and ä are graphemes, but they may consist of multiple code points (e.g. ä may be two code points, one for the base character a followed by one for the diaresis; but there's also an alternative, legacy, single code point representing this grapheme). Some code points are never part of any grapheme (e.g. the zero-width non-joiner, or directional overrides).

A glyph is an image, usually stored in a font (which is a collection of glyphs), used to represent graphemes or parts thereof. Fonts may compose multiple glyphs into a single representation, for example, if the above ä is a single code point, a font may chose to render that as two separate, spatially overlaid glyphs. For OTF, the font's GSUB and GPOS tables contain substitution and positioning information to make this work. A font may contain multiple alternative glyphs for the same grapheme, too.

0 讨论(0)

查看其它12个回答