Invalid character exception when adding Metadata to a CloudBlob

南方客 2021-01-01 15:59

Task

Upload a file to Azure Blob Storage under its original filename, and also store that filename as metadata on the CloudBlob.

4 Answers
  • 2021-01-01 16:34

    To expand on the answer by bPratik, we've found that Base64-encoding the metadata works nicely. We use these extension methods to encode and decode:

        using System;
        using System.Text;

        public static class Base64Extensions
        {
            public static string ToBase64(this string input)
            {
                var bytes = Encoding.UTF8.GetBytes(input);
                return Convert.ToBase64String(bytes);
            }
    
            public static string FromBase64(this string input)
            {
                var bytes = Convert.FromBase64String(input);
                return Encoding.UTF8.GetString(bytes);
            }
        }
    

    and then when setting blob metadata:

    blobReference.Metadata["Filename"] = filename.ToBase64();
    

    and when retrieving it:

    var filename = blobReference.Metadata["Filename"].FromBase64();
    

    For search, you would have to decode the filename before presenting it to the indexer, or use the blob's actual filename assuming you're still using the original filename there.

  • 2021-01-01 16:45

    If the above list is exhaustive, it should be possible to encode the metadata to HTML and then decode it when you need it:

    var htmlEncodedValue = System.Web.HttpUtility.HtmlEncode(value);
    var originalValue = System.Web.HttpUtility.HtmlDecode(htmlEncodedValue);
    
  • 2021-01-01 16:49

    Until I get an answer that actually solves the issue, this workaround will have to do.

    Workaround

    To get this to work, I am using a combination of the below methods to:

    1. Convert all possible characters to their ASCII/English equivalent
    2. Strip out any invalid characters that survive this cleanup

    But this isn't ideal as we are losing data!

    Diacritics to ASCII

    /// <summary>
    /// Converts all Diacritic characters in a string to their ASCII equivalent
    /// Courtesy: http://stackoverflow.com/a/13154805/476786
    /// A quick explanation:
    /// * Normalizing to form D splits characters like è into an e and a nonspacing `
    /// * From this, the nonspacing characters are removed
    /// * The result is normalized back to form C (I'm not sure if this is necessary)
    /// </summary>
    /// <param name="value"></param>
    /// <returns></returns>
    public static string ConvertDiacriticToASCII(this string value)
    {
        if (value == null) return null;
        var chars =
            value.Normalize(NormalizationForm.FormD)
                 .ToCharArray()
                 .Select(c => new {c, uc = CharUnicodeInfo.GetUnicodeCategory(c)})
                 .Where(@t => @t.uc != UnicodeCategory.NonSpacingMark)
                 .Select(@t => @t.c);
        var cleanStr = new string(chars.ToArray()).Normalize(NormalizationForm.FormC);
        return cleanStr;
    }
    

    Non-ASCII Burninator

    /// <summary>
    /// Removes all non-ASCII characters from the string
    /// Courtesy: http://stackoverflow.com/a/135473/476786
    /// Uses the .NET ASCII encoding to convert a string. 
    /// UTF8 is used during the conversion because it can represent any of the original characters. 
    /// It uses an EncoderReplacementFallback to convert any non-ASCII character to an empty string.
    /// </summary>
    /// <param name="value"></param>
    /// <returns></returns>
    public static string RemoveNonASCII(this string value)
    {
        string cleanStr = 
               Encoding.ASCII
                       .GetString(
                                  Encoding.Convert(Encoding.UTF8,
                                                   Encoding.GetEncoding(Encoding.ASCII.EncodingName,
                                                                        new EncoderReplacementFallback(string.Empty),
                                                                        new DecoderExceptionFallback()
                                                                        ),
                                                   Encoding.UTF8.GetBytes(value)
                                                   )
                                  );
        return cleanStr;
    }
    

    I really hope to get an answer as the workaround is obviously not ideal, and it also doesn't make sense why this is not possible!
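
    As a rough illustration of what the two steps above do to a filename, here is a compact, self-contained sketch that folds them into one pass. The class and method names (`MetadataSanitizer`, `ToAsciiLossy`) are hypothetical and not part of the original answer; the normalization and category check mirror the approach described above.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text;

public static class MetadataSanitizer
{
    // Hypothetical helper: decomposes accented characters, drops the
    // combining marks, then discards anything still outside 7-bit ASCII.
    public static string ToAsciiLossy(string value)
    {
        if (value == null) return null;

        var kept = value
            .Normalize(NormalizationForm.FormD)
            .Where(c => CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            .Where(c => c <= 0x7F);

        return new string(kept.ToArray());
    }
}
```

    With this sketch, "Crème brûlée.txt" becomes "Creme brulee.txt", while a name such as "日本語.txt" collapses to ".txt", which is exactly the data loss mentioned above.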

  • 2021-01-01 16:52

    I have just had confirmation from the azure-sdk-for-net team on GitHub that only ASCII characters are valid as data within blob metadata.

    joeg commented:
    The supported characters in the blob metadata must be ASCII characters. To work around this you can either escape the string (percent encode), base64 encode etc.

    Source on GitHub

    So as a work-around, either:

    • escape the string (percent encode), base64 encode, etc, as suggested by joeg
    • use the techniques that I have mentioned in my other answer.
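
    The percent-encoding option could be sketched as follows, using the framework's `Uri.EscapeDataString` and `Uri.UnescapeDataString`. The wrapper class and method names (`MetadataEncoding`, `ToMetadataValue`, `FromMetadataValue`, `IsAscii`) are hypothetical, and this is only one possible way to apply joeg's suggestion, not something confirmed by the SDK team.

```csharp
using System;
using System.Linq;

public static class MetadataEncoding
{
    // Hypothetical helper: percent-encodes the value so that only ASCII
    // characters remain (non-ASCII characters become %XX UTF-8 escapes).
    public static string ToMetadataValue(string raw)
    {
        return Uri.EscapeDataString(raw);
    }

    // Hypothetical helper: reverses ToMetadataValue.
    public static string FromMetadataValue(string stored)
    {
        return Uri.UnescapeDataString(stored);
    }

    // Hypothetical guard: true when every character is 7-bit ASCII.
    public static bool IsAscii(string s)
    {
        return s.All(c => c <= 0x7F);
    }
}
```

    Unlike the stripping workaround, this is lossless: `FromMetadataValue(ToMetadataValue(name))` gives back the original name. Compared with Base64, the stored value also stays partly readable for names that are mostly ASCII.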