Data validation in AVRO

Submitted by 痞子三分冷 on 2020-01-11 05:26:07

Question


I am new to Avro, so please excuse me if this is a simple question. I have a use case where I am using an Avro schema for record calls.

Let's say I have this Avro schema:

{
    "name": "abc",
    "namespace": "xyz",
    "type": "record",
    "fields": [
        {"name": "CustId", "type": "string"},
        {"name": "SessionId", "type": "string"}
    ]
}

Now suppose the input looks like this:

{
    "CustId" : "abc1234",
    "SessionId" : "000-0000-00000"
}

I want to apply regex validation to these fields and accept this input only if it comes in the particular format shown above. Is there any way to include a regular expression in an Avro schema?

Are there any other data serialization formats that support something like this?


Answer 1:


You should be able to use a custom logical type for this. You would then include the regular expressions directly in the schema.

For example, here's how you would implement one in JavaScript:

var avro = require('avsc'),
    util = require('util');

/**
 * Sample logical type that validates strings using a regular expression.
 *
 */
function ValidatedString(attrs, opts) {
  avro.types.LogicalType.call(this, attrs, opts);
  this._pattern = new RegExp(attrs.pattern);
}
util.inherits(ValidatedString, avro.types.LogicalType);

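// Convert from the underlying string (e.g. when decoding): reject
// values that do not match the schema's pattern.
ValidatedString.prototype._fromValue = function (val) {
  if (!this._pattern.test(val)) {
    throw new Error('invalid string: ' + val);
  }
  return val;
};

// Apply the same check in the other direction (e.g. when encoding or
// calling isValid), so invalid strings are rejected both ways.
ValidatedString.prototype._toValue = ValidatedString.prototype._fromValue;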

And how you would use it:

var type = avro.parse({
  name: 'Example',
  type: 'record',
  fields: [
    {
      name: 'custId',
      type: 'string' // Normal (free-form) string.
    },
    {
      name: 'sessionId',
      type: {
        type: 'string',
        logicalType: 'validated-string',
        pattern: '^\\d{3}-\\d{4}-\\d{5}$' // Validation pattern.
      }
    },
  ]
}, {logicalTypes: {'validated-string': ValidatedString}});

type.isValid({custId: 'abc', sessionId: '123-1234-12345'}); // true
type.isValid({custId: 'abc', sessionId: 'foobar'}); // false
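One detail worth keeping in mind: `RegExp.prototype.test` returns `true` if the pattern matches anywhere in the string, so an unanchored pattern would accept values that merely contain a valid ID. That is why the `sessionId` pattern in the schema above is wrapped in `^` and `$`. The short plain-JavaScript sketch below (no avsc needed) compares the same pattern with and without anchors; the sample strings are made up for illustration.

```javascript
// The sessionId pattern from the schema, without and with anchors.
const unanchored = new RegExp('\\d{3}-\\d{4}-\\d{5}');
const anchored = new RegExp('^\\d{3}-\\d{4}-\\d{5}$');

const samples = ['123-1234-12345', 'junk 123-1234-12345 junk', '12-34-56'];

for (const s of samples) {
  // test() searches the whole string; only ^...$ forces a full match.
  console.log(JSON.stringify(s), unanchored.test(s), anchored.test(s));
}
```

Only the anchored version rejects the second sample, which contains a valid ID surrounded by other text.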

You can read more about implementing and using logical types here.
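If you only need the check and not the encoding, the same idea (a `pattern` attribute stored in the schema) can be applied without avsc, since the Avro specification permits attributes it does not define as metadata. The following is a minimal sketch, assuming a hypothetical helper `validateRecord` (not part of avsc or the Avro specification) that only understands flat records of `string` fields.

```javascript
// Hypothetical helper (not an avsc API): validates a flat record against an
// Avro-style record schema whose string fields may carry a `pattern` attribute.
function validateRecord(schema, record) {
  const errors = [];
  for (const field of schema.fields) {
    // A field type is either a primitive name ('string') or an object such as
    // {type: 'string', logicalType: ..., pattern: ...}.
    const fieldType = typeof field.type === 'string' ? {type: field.type} : field.type;
    const value = record[field.name];
    if (fieldType.type !== 'string') {
      errors.push(field.name + ': unsupported type in this sketch');
    } else if (typeof value !== 'string') {
      errors.push(field.name + ': expected a string');
    } else if (fieldType.pattern !== undefined && !new RegExp(fieldType.pattern).test(value)) {
      errors.push(field.name + ': does not match ' + fieldType.pattern);
    }
  }
  return errors;
}

// Same schema as in the avsc example, with the pattern kept in the schema.
const schema = {
  name: 'Example',
  type: 'record',
  fields: [
    {name: 'custId', type: 'string'},
    {
      name: 'sessionId',
      type: {type: 'string', logicalType: 'validated-string', pattern: '^\\d{3}-\\d{4}-\\d{5}$'},
    },
  ],
};

console.log(validateRecord(schema, {custId: 'abc', sessionId: '123-1234-12345'})); // []
console.log(validateRecord(schema, {custId: 'abc', sessionId: 'foobar'})); // one error, for sessionId
```

This sketch collects every error instead of stopping at the first one, which is a design choice for reporting; the avsc logical type above instead throws from `_toValue`.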

Edit: For the Java implementation, I believe you will want to look at the following classes:

  • LogicalType, the base you'll need to extend.
  • Conversion, to perform the conversion (or validation in your case) of the data.
  • LogicalTypes and Conversions, a few examples of existing implementations.
  • TestGenericLogicalTypes, relevant tests which could provide a helpful starting point.


Source: https://stackoverflow.com/questions/37279096/data-validation-in-avro
