Regex to parse image data URI

野性不改 2021-02-07 04:29

If I have :



        
5 Answers
  • 2021-02-07 04:43

    EDIT: expanded to show usage

    var regex = new Regex(@"data:(?<mime>[\w/\-\.]+);(?<encoding>\w+),(?<data>.*)", RegexOptions.Compiled);
    
    var match = regex.Match(input);
    
    var mime = match.Groups["mime"].Value;
    var encoding = match.Groups["encoding"].Value;
    var data = match.Groups["data"].Value;
    

    NOTE: The regex applies to the input shown in the question. If a charset were also specified, it would not match and the regex would have to be rewritten.
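
To illustrate the limitation noted above, here is a minimal JavaScript sketch of a pattern that also tolerates optional parameters such as `;charset=utf-8`. The pattern, the name `splitDataUri`, and the shape of the returned object are assumptions made for this example, not part of the original answer, and quoted parameter values are not handled.

```javascript
// Hypothetical variant of the pattern above: the media type, the parameters
// (e.g. ;charset=utf-8) and the ;base64 marker are all optional.
const DATA_URI = /^data:(?<mime>[\w\/\-.+]+)?(?<params>(?:;[\w-]+=[\w-]+)*)(?<encoding>;base64)?,(?<data>.*)$/s;

function splitDataUri(input) {
  const match = DATA_URI.exec(input);
  if (!match) return null; // not a data URI this pattern recognizes
  const { mime = '', params, encoding, data } = match.groups;
  return {
    mime,
    // ";a=b;c=d" -> ["a=b", "c=d"]
    params: params ? params.slice(1).split(';') : [],
    // ";base64" -> "base64"
    encoding: encoding ? encoding.slice(1) : '',
    data,
  };
}
```

A parameter only counts as one if it contains `=`, which is what keeps `;base64` from being swallowed by the `params` group.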

  • 2021-02-07 04:58

    Actually, you don't need a regex for that. According to Wikipedia, the data URI format is

    data:[<MIME-type>][;charset=<encoding>][;base64],<data>
    

    so just do the following:

    byte[] imagedata = Convert.FromBase64String(imageSrc.Substring(imageSrc.IndexOf(",") + 1));
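
The same regex-free idea can be sketched in JavaScript with a couple of sanity checks added. This assumes Node.js (for `Buffer`) and that the header contains no quoted commas. `decodeBase64DataUri` is a hypothetical helper name.

```javascript
// Sketch for Node.js; decodeBase64DataUri is a hypothetical helper name.
function decodeBase64DataUri(uri) {
  const comma = uri.indexOf(',');
  if (!uri.startsWith('data:') || comma === -1) {
    throw new Error('Not a data URI');
  }
  // Everything between "data:" and the first comma is the header
  // (media type, parameters, optional ;base64 marker).
  const header = uri.slice(5, comma);
  if (!header.endsWith(';base64')) {
    throw new Error('Payload is not base64-encoded');
  }
  // Everything after the first comma is the payload.
  return Buffer.from(uri.slice(comma + 1), 'base64');
}
```

Checking for the `;base64` marker avoids silently decoding a percent-encoded payload as if it were base64.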
    
  • Here is my regular expression; I also needed to split the MIME type (e.g. image/jpg) into its two parts.

    ^data:(?<mimeType>(?<mime>\w+)\/(?<extension>\w+));(?<encoding>\w+),(?<data>.*)
    
  • 2021-02-07 05:03

    Data URIs have a bit of complexity to them: they can contain parameters, a media type, and so on, and sometimes you need this information, not just the data.

    To parse a data URI and extract all of the relevant parts, try this:

    /**
     * Parse a data uri and return an object with information about the different parts
     * @param {*} data_uri 
     */
    function parseDataURI(data_uri) {
        let regex = /^\s*data:(?<media_type>(?<mime_type>[a-z\-]+\/[a-z\-\+]+)(?<params>(;[a-z\-]+\=[a-z\-]+)*))?(?<encoding>;base64)?,(?<data>[a-z0-9\!\$\&\'\,\(\)\*\+\,\;\=\-\.\_\~\:\@\/\?\%\s]*\s*)$/i;
        let result = regex.exec(data_uri);
        if (!result) return null; // input did not match the data URI pattern
        let info = {
            media_type: result.groups.media_type,
            mime_type: result.groups.mime_type,
            params: result.groups.params,
            encoding: result.groups.encoding,
            data: result.groups.data
        }
        if(info.params)
            info.params = Object.fromEntries(info.params.split(';').slice(1).map(param => param.split('=')));
        if(info.encoding)
            info.encoding = info.encoding.replace(';','');
        return info;
    }

    This will give you an object that has all the relevant bits parsed out, with the params as a dictionary, e.g. {foo: 'bar', bar: 'baz'}.

    Example (mocha test with assert):

    describe("Parse data URI", () => {
        it("Should extract data URI parts correctly",
            async ()=> {
                let uri = 'data:text/vnd-example+xyz;foo=bar;bar=baz;base64,R0lGODdh';
                let info = parseDataURI(uri);
                assert.equal(info.media_type,'text/vnd-example+xyz;foo=bar;bar=baz');
                assert.equal(info.mime_type,'text/vnd-example+xyz');
                assert.equal(info.encoding, 'base64');
                assert.equal(info.data, 'R0lGODdh');
                assert.equal(info.params.foo, 'bar');
                assert.equal(info.params.bar, 'baz');
            }
        );
    });
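
Once the parts are separated, the `encoding` field indicates how to turn `data` into bytes. The following is a minimal sketch, assuming Node.js (`Buffer`) and that non-base64 payloads are percent-encoded UTF-8 text. `decodePayload` is a hypothetical helper, not part of the answer above.

```javascript
// Hypothetical helper: converts the `encoding` and `data` fields of a parsed
// data URI into a Buffer. Assumes Node.js.
function decodePayload({ encoding, data }) {
  if (encoding === 'base64') {
    return Buffer.from(data, 'base64');
  }
  // Without ;base64 the payload is percent-encoded. decodeURIComponent
  // throws a URIError if the escapes are not valid UTF-8.
  return Buffer.from(decodeURIComponent(data), 'utf8');
}
```

It accepts any object with `encoding` and `data` properties, so the result of a parser like the one above could be passed in directly.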

  • 2021-02-07 05:04

    I also needed to parse the data URI scheme. As a result, I improved the regular expression given on this page, specifically for C#, so that it fits any data URI (to check the scheme, you can take it from here or here).

    Here is my solution for C#:

    private class DataUriModel {
      public string MediaType { get; set; }
      public string Type { get; set; }
      public string[] Tree { get; set; }
      public string Subtype { get; set; }
      public string Suffix { get; set; }
      public string[] Params { get; set; }
      public string Encoding { get; set; }
      public string Data { get; set; }
    }
    
    static void Main(string[] args) {
      string s = "data:image/prs.jpeg+gzip;charset=UTF-8;page=21;page=22;base64,/9j/4AAQSkZJRgABAQAAAQABAAD";
      var parsedUri = GetDataURI(s);
      Console.WriteLine(parsedUri.Type);
      Console.WriteLine(parsedUri.Subtype);
      Console.WriteLine(parsedUri.Encoding);
    }
    
    private static DataUriModel GetDataURI(string data) {
      var result = new DataUriModel();
      Regex regex = new Regex(@"^\s*data:(?<media_type>(?<type>[a-z\-]+){1}\/(?<tree>([a-z\-]+\.)+)?(?<subtype>[a-z\-]+){1}(?<suffix>\+[a-z]+)?(?<params>(;[a-z\-]+\=[a-z0-9\-\+]+)*)?)?(?<encoding>;base64)?(?<data>,+[a-z0-9\\\!\$\&\'\,\(\)\*\+\,\;\=\-\.\~\:\@\/\?\%\s]*\s*)?$", RegexOptions.IgnoreCase | RegexOptions.Compiled | RegexOptions.Multiline);
      var match = regex.Match(data);
    
      if (!match.Success)
        return result;
    
      var names = regex.GetGroupNames();
      foreach (var name in names) {
        var group = match.Groups[name];
        switch (name) {
          case "media_type": result.MediaType = group.Value; break;
          case "type": result.Type = group.Value; break;
          case "tree": result.Tree = !string.IsNullOrWhiteSpace(group.Value) && group.Value.Length > 1 ? group.Value[0..^1].Split(".") : null; break;
          case "subtype": result.Subtype = group.Value; break;
          case "suffix": result.Suffix = !string.IsNullOrWhiteSpace(group.Value) && group.Value.Length > 1 ? group.Value[1..] : null; break;
          case "params": result.Params = !string.IsNullOrWhiteSpace(group.Value) && group.Value.Length > 1 ? group.Value[1..].Split(";") : null; break;
          case "encoding": result.Encoding = !string.IsNullOrWhiteSpace(group.Value) && group.Value.Length > 1 ? group.Value[1..] : null; break;
          case "data": result.Data = !string.IsNullOrWhiteSpace(group.Value) && group.Value.Length > 1 ? group.Value[1..] : null; break;
        }
      }
    
      return result;
    }
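
The sample URI above repeats the `page` parameter, so collapsing the parameters into a plain key/value object would keep only one value per name. As a rough sketch in JavaScript, a list of `name=value` strings can be grouped so that repeats are preserved. `groupParams` is a hypothetical helper, and lowercasing the names is an assumption of this example.

```javascript
// Hypothetical helper: groups "name=value" strings by name, keeping every
// value when a name is repeated.
function groupParams(params) {
  const grouped = new Map();
  for (const param of params) {
    const eq = param.indexOf('=');
    if (eq === -1) continue; // skip entries without a value
    const name = param.slice(0, eq).toLowerCase();
    const value = param.slice(eq + 1);
    if (!grouped.has(name)) grouped.set(name, []);
    grouped.get(name).push(value);
  }
  return grouped;
}
```

For the parameters in the sample URI, `groupParams(['charset=UTF-8', 'page=21', 'page=22'])` yields a map in which `page` holds both `'21'` and `'22'`.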
    