Upload a file to Azure Blob Storage with the original filename and also assign the filename as metadata to the CloudBlob
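For reference, a minimal sketch of the upload itself, assuming the classic WindowsAzure.Storage SDK; the connection string and container name are placeholders:

using System.IO;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;

var account = CloudStorageAccount.Parse("UseDevelopmentStorage=true");
var container = account.CreateCloudBlobClient().GetContainerReference("uploads");
container.CreateIfNotExists();

var filename = "report.pdf"; // the original filename
CloudBlockBlob blob = container.GetBlockBlobReference(filename);
blob.Metadata["Filename"] = filename; // non-ASCII characters here cause the request to fail (see below)
using (var stream = File.OpenRead(filename))
{
    blob.UploadFromStream(stream); // metadata set before upload is sent with the request
}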
To expand on the answer by bPratik, we've found that Base64-encoding the metadata works nicely. We use these extension methods to do the encoding and decoding:
using System;
using System.Text;

public static class Base64Extensions
{
    /// <summary>Encodes a string's UTF-8 bytes as Base64.</summary>
    public static string ToBase64(this string input)
    {
        var bytes = Encoding.UTF8.GetBytes(input);
        return Convert.ToBase64String(bytes);
    }

    /// <summary>Decodes a Base64 string back to its original UTF-8 form.</summary>
    public static string FromBase64(this string input)
    {
        var bytes = Convert.FromBase64String(input);
        return Encoding.UTF8.GetString(bytes);
    }
}
and then when setting blob metadata:
blobReference.Metadata["Filename"] = filename.ToBase64();
and when retrieving it:
var filename = blobReference.Metadata["Filename"].FromBase64();
For search, you would have to decode the filename before presenting it to the indexer, or use the blob's actual filename assuming you're still using the original filename there.
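For example, recovering the original filenames when listing blobs for the indexer might look like this (a sketch, assuming the classic SDK container reference and the Base64Extensions above):

// List blobs and decode the stored filenames before indexing.
foreach (var item in container.ListBlobs(useFlatBlobListing: true))
{
    var blob = item as CloudBlockBlob;
    if (blob == null) continue;
    blob.FetchAttributes(); // populates blob.Metadata
    var original = blob.Metadata["Filename"].FromBase64();
    // hand "original" to the indexer here
}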
If the above list is exhaustive, it should be possible to HTML-encode the metadata and then decode it when you need it:

var htmlEncodedValue = System.Web.HttpUtility.HtmlEncode(value);
var originalValue = System.Web.HttpUtility.HtmlDecode(value);
Unless I get an answer that actually solves the problem, this workaround is the solution for now!
To get this to work, I am using a combination of the two methods below: first converting diacritics to their ASCII equivalents, then stripping any remaining non-ASCII characters. But this isn't ideal, as we are losing data!
using System.Globalization;
using System.Linq;
using System.Text;

/// <summary>
/// Converts all diacritic characters in a string to their ASCII equivalent
/// Courtesy: http://stackoverflow.com/a/13154805/476786
/// A quick explanation:
/// * Normalizing to form D splits characters like è into an e and a nonspacing `
/// * From this, the nonspacing characters are removed
/// * The result is normalized back to form C (I'm not sure if this is necessary)
/// </summary>
/// <param name="value"></param>
/// <returns></returns>
public static string ConvertDiacriticToASCII(this string value)
{
    if (value == null) return null;
    var chars =
        value.Normalize(NormalizationForm.FormD)
             .ToCharArray()
             .Select(c => new { c, uc = CharUnicodeInfo.GetUnicodeCategory(c) })
             .Where(t => t.uc != UnicodeCategory.NonSpacingMark)
             .Select(t => t.c);
    var cleanStr = new string(chars.ToArray()).Normalize(NormalizationForm.FormC);
    return cleanStr;
}
/// <summary>
/// Removes all non-ASCII characters from the string
/// Courtesy: http://stackoverflow.com/a/135473/476786
/// Uses the .NET ASCII encoding to convert a string.
/// UTF8 is used during the conversion because it can represent any of the original characters.
/// It uses an EncoderReplacementFallback to convert any non-ASCII character to an empty string.
/// </summary>
/// <param name="value"></param>
/// <returns></returns>
public static string RemoveNonASCII(this string value)
{
    // ASCII encoding whose fallback silently drops unmappable characters
    var asciiWithFallback = Encoding.GetEncoding(
        Encoding.ASCII.EncodingName,
        new EncoderReplacementFallback(string.Empty),
        new DecoderExceptionFallback());
    var asciiBytes = Encoding.Convert(Encoding.UTF8, asciiWithFallback, Encoding.UTF8.GetBytes(value));
    return Encoding.ASCII.GetString(asciiBytes);
}
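Chaining the two methods (a sketch, assuming both extensions are in scope) makes the data loss visible:

var filename = "Résumé ©.pdf";
var safe = filename.ConvertDiacriticToASCII() // "Resume ©.pdf" (é folded to e)
                   .RemoveNonASCII();         // "Resume .pdf"  (© is dropped entirely)
blobReference.Metadata["Filename"] = safe;    // ASCII-safe, but lossy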
I really hope to get an answer, as the workaround is obviously not ideal, and it makes no sense that this is not possible!
I've just had confirmation from the azure-sdk-for-net team on GitHub that only ASCII characters are valid as data within blob metadata.
joeg commented:
The supported characters in the blob metadata must be ASCII characters. To work around this you can either escape the string (percent encode), base64 encode etc.
Source on GitHub
So as a work-around, either percent-encode the filename or Base64-encode it (as shown above) before storing it as blob metadata, and decode it again when reading it back.
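For the percent-encoding route, a minimal sketch using Uri.EscapeDataString / Uri.UnescapeDataString, with blobReference as used above:

// Percent-encode on write: the stored value is pure ASCII.
blobReference.Metadata["Filename"] = Uri.EscapeDataString(filename);
blobReference.SetMetadata();

// Decode on read to recover the original filename.
blobReference.FetchAttributes();
var original = Uri.UnescapeDataString(blobReference.Metadata["Filename"]);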