Nesting Avro schemas

Posted by 走远了吗 on 2019-12-21 09:28:47

Question


According to this question on nesting Avro schemas, the right way to nest a record schema is as follows:

{
    "name": "person",
    "type": "record",
    "fields": [
        {"name": "firstname", "type": "string"},
        {"name": "lastname", "type": "string"},
        {
            "name": "address",
            "type": {
                        "type" : "record",
                        "name" : "AddressUSRecord",
                        "fields" : [
                            {"name": "streetaddress", "type": "string"},
                            {"name": "city", "type": "string"}
                        ]
                    }
        }
    ]
}

I don't like giving the field the name address and having to give a different name (AddressUSRecord) to the field's schema. Can I give the field and schema the same name, address?

What if I want to use the AddressUSRecord schema in multiple other schemas, not just person? If I want to use AddressUSRecord in another schema, let's say business, do I have to name it something else?

Ideally, I'd like to define AddressUSRecord in a separate schema, then let the type of address reference AddressUSRecord. However, it's not clear that Avro 1.8.1 supports this out of the box. This 2014 article shows that sub-schemas need to be handled with custom code. What is the best way to define reusable schemas in Avro 1.8.1?

Note: I'd like a solution that works with Confluent Inc.'s Schema Registry. There's a Google Groups thread that seems to suggest that Schema Registry does not play nice with schema references.


Answer 1:


Can I give the field and schema the same name, address?

Yes, you can give the record the same name as the field.

What if I want to use the AddressUSRecord schema in multiple other schemas, not just person?

You can reuse a schema across several others using a couple of techniques. The Avro schema parsers (JVM and others) let you supply multiple schemas, usually through a names parameter; in Java, for example, a Schema.Parser instance keeps track of the named types it has already parsed, and its parse method accepts multiple String arguments.

You can then define the dependent schema as a named type:

{
  "type": "record",
  "name": "Address",
  "fields": [
    {
      "name": "streetaddress",
      "type": "string"
    },
    {
      "name": "city",
      "type": "string"
    }
  ]
}

Run it through the parser before the parent schema, which then refers to it by name:

{
  "name": "person",
  "type": "record",
  "fields": [
    {
      "name": "firstname",
      "type": "string"
    },
    {
      "name": "lastname",
      "type": "string"
    },
    {
      "name": "address",
      "type": "Address"
    }
  ]
}

Incidentally, this allows you to parse from separate files.
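To illustrate what the parser's table of named types is doing, here is a minimal hand-rolled sketch in plain Python. It is not the Avro library's API: inline_named_types is a hypothetical helper that replaces the first reference to a known name with its full definition and leaves later references as names, since Avro allows a named type to be defined only once per schema. The business record and its billing and shipping fields are made up for the example.

```python
import copy
import json


def inline_named_types(schema, known, defined=None):
    """Return a copy of `schema` with the first reference to each name in
    `known` replaced by its definition. Later references stay as names,
    because a named type may be defined only once per Avro schema."""
    if defined is None:
        defined = set()
    if isinstance(schema, str):
        if schema in known and schema not in defined:
            defined.add(schema)
            return inline_named_types(copy.deepcopy(known[schema]), known, defined)
        return schema  # primitive, or a name that is already defined
    if isinstance(schema, list):  # a union: resolve each branch
        return [inline_named_types(branch, known, defined) for branch in schema]
    out = dict(schema)
    kind = out.get("type")
    if kind in ("record", "enum", "fixed"):
        defined.add(out["name"])
    if kind == "record":
        out["fields"] = [
            dict(field, type=inline_named_types(field["type"], known, defined))
            for field in out["fields"]
        ]
    elif kind == "array":
        out["items"] = inline_named_types(out["items"], known, defined)
    elif kind == "map":
        out["values"] = inline_named_types(out["values"], known, defined)
    return out


ADDRESS = {
    "type": "record",
    "name": "Address",
    "fields": [
        {"name": "streetaddress", "type": "string"},
        {"name": "city", "type": "string"},
    ],
}
PERSON = {
    "type": "record",
    "name": "person",
    "fields": [
        {"name": "firstname", "type": "string"},
        {"name": "lastname", "type": "string"},
        {"name": "address", "type": "Address"},
    ],
}
# Hypothetical second consumer of Address, with two references to it.
BUSINESS = {
    "type": "record",
    "name": "business",
    "fields": [
        {"name": "billing", "type": "Address"},
        {"name": "shipping", "type": "Address"},
    ],
}

known_types = {"Address": ADDRESS}
resolved_person = inline_named_types(PERSON, known_types)
resolved_business = inline_named_types(BUSINESS, known_types)
print(json.dumps(resolved_business, indent=2))
```

Each resolved schema is self-contained, so Address can live in its own file and still be shared by person and business.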

Alternatively, you can parse a single union schema that defines the types in dependency order and references them in the same way:

[
  {
    "type": "record",
    "name": "Address",
    "fields": [
      {
        "name": "streetaddress",
        "type": "string"
      },
      {
        "name": "city",
        "type": "string"
      }
    ]
  },
  {
    "type": "record",
    "name": "person",
    "fields": [
      {
        "name": "firstname",
        "type": "string"
      },
      {
        "name": "lastname",
        "type": "string"
      },
      {
        "name": "address",
        "type": "Address"
      }
    ]
  }
]

I'd like a solution that works with Confluent Inc.'s Schema Registry.

The schema registry does not support parsing schemas separately, but it does support the latter example of parsing into a union type.
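If the record definitions are kept separately, the union form can be assembled just before registration. The following is a rough sketch using only the Python standard library; build_union_schema is a hypothetical helper, it only checks field types written as plain names, and how the resulting string is submitted to the registry depends on your client and is not shown.

```python
import json

PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}


def build_union_schema(*records):
    """Serialize record definitions as one top-level Avro union (a JSON array).

    Records must be passed in dependency order: a record may only refer
    by name to itself or to records that appear earlier in the list.
    """
    seen = set()
    for record in records:
        seen.add(record["name"])  # a record may refer to itself
        for field in record["fields"]:
            ref = field["type"]
            # Only plain-name references are checked in this sketch.
            if isinstance(ref, str) and ref not in PRIMITIVES and ref not in seen:
                raise ValueError(
                    f"{record['name']}.{field['name']} refers to {ref!r} "
                    "before it is defined"
                )
    return json.dumps(list(records))


ADDRESS = {
    "type": "record",
    "name": "Address",
    "fields": [
        {"name": "streetaddress", "type": "string"},
        {"name": "city", "type": "string"},
    ],
}
PERSON = {
    "type": "record",
    "name": "person",
    "fields": [
        {"name": "firstname", "type": "string"},
        {"name": "lastname", "type": "string"},
        {"name": "address", "type": "Address"},
    ],
}

union_json = build_union_schema(ADDRESS, PERSON)
print(union_json)
```

Passing the records in the wrong order, for example build_union_schema(PERSON, ADDRESS), raises ValueError in this sketch rather than producing a union the parser would reject.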




Answer 2:


You can set a namespace on the record type and then, in subsequent fields, use {namespace}.{name} as the type. Unfortunately, there is currently no way to reference types from other schema files.
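As a small illustration of the {namespace}.{name} form, the sketch below computes the full name a reference would need. The fullname helper and the com.example namespace are assumptions made up for this example; the rule it follows is that a name containing a dot is already a full name, while a name without one takes the namespace of the enclosing definition.

```python
PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}


def fullname(name, enclosing_namespace=None):
    """Resolve an Avro type name against the enclosing namespace."""
    if name in PRIMITIVES:
        return name  # primitive type names have no namespace
    if "." in name or not enclosing_namespace:
        return name  # already a full name, or no namespace in effect
    return f"{enclosing_namespace}.{name}"


# Hypothetical namespace, chosen only for this example.
ADDRESS = {
    "type": "record",
    "name": "Address",
    "namespace": "com.example",
    "fields": [
        {"name": "streetaddress", "type": "string"},
        {"name": "city", "type": "string"},
    ],
}
PERSON = {
    "type": "record",
    "name": "person",
    "namespace": "com.example",
    "fields": [
        {"name": "firstname", "type": "string"},
        {"name": "lastname", "type": "string"},
        # Reference to the earlier record by its full name.
        {"name": "address", "type": "com.example.Address"},
    ],
}

address_fullname = fullname(ADDRESS["name"], ADDRESS["namespace"])
print(address_fullname)
```

The Address definition still has to be known to the parser before person is parsed; the namespace only makes the reference unambiguous.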



Source: https://stackoverflow.com/questions/40854529/nesting-avro-schemas
